Message-ID: <4A26217D.5000002@xunil.at>
Date: Wed, 03 Jun 2009 09:08:45 +0200
From: "Stefan G. Weichinger"
Organization: oops!
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Serious stability problems, including freezes
In-Reply-To: <200906030844.34437.alexander.puchmayr@linznet.at>

Alexander Puchmayr wrote:
> On Wednesday, 03 June 2009, Florian Philipp wrote:
>> Do you have a spare network adapter, maybe an older 100MBit PCI card?
>> Maybe we should rule out a hardware fault on your ethernet chipset first.
>>
> I already thought of this, but the results of my tests don't indicate a
> hardware fault on the ethernet chipset, because:
>
> * I can run a ping -f to the machine; it runs for hours without the
>   slightest problem.
> * As long as the files transferred are small enough (i.e. they fit in the
>   cache buffer on the server) and the server has enough time to write them
>   back to the disk, there is no problem.
> * If I explicitly force the ethernet link to 100FD instead of gigabit,
>   there is also no problem. So I don't expect any error using another
>   100MBit card.

I would cross-check that anyway just to be sure. Other NIC, other
kernel module, etc.

> To me it looks as if the following is happening:
>
> * Memory gets filled up with cached files, no problem so far.
> * If no more physical RAM is available, the system tries to free some memory
>   internally, e.g. by flushing the caches.
> * If releasing cache entries and writing data back to their respective
>   files is not fast enough, an internal memory allocation may not
>   succeed, and I see the "page allocation failure" messages, with different
>   processes/kernel threads in the first line.
> * I assume that most of the internal kernel threads don't get a problem in
>   this situation, but there may be some critical parts where we do. Hence, it
>   might just be a matter of probability whether it encounters such a critical
>   part, and the probability increases with the MB/s at which data is written
>   to the NFS server.

Erm, I don't know ... but how would smaller and slower NFS servers then run
fine? Sounds unlikely to me.

Any special network settings in use? Buffer sizes, MTU, jumbo frames?
Switch problems? (You seem to have tried turning auto-negotiation off already.)

Stefan
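
One way to test the theory quoted above would be to watch how much dirty and
writeback data piles up on the server while an NFS transfer is running. A
minimal sketch in Python, assuming a Linux /proc/meminfo in the usual
"Name:  value kB" layout; the function names and the sample text are made up
for illustration and are not taken from the machine discussed in this thread:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are reported in kB; a few (e.g. HugePages counters) are
    plain counts and are stored as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def writeback_pressure(fields):
    """Return (Dirty + Writeback) as a fraction of MemTotal."""
    total = fields.get("MemTotal", 0)
    if total == 0:
        return 0.0
    pending = fields.get("Dirty", 0) + fields.get("Writeback", 0)
    return pending / total


# Hypothetical sample values, used only when /proc/meminfo is unavailable.
SAMPLE = """MemTotal:        2048000 kB
MemFree:           20480 kB
Dirty:            409600 kB
Writeback:        102400 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    info = parse_meminfo(text)
    print("MemFree kB:", info.get("MemFree"))
    print("pending writeback fraction: %.3f" % writeback_pressure(info))
```

Run repeatedly during a large transfer, this only observes and changes no
settings. If the fraction climbs and MemFree drops shortly before the "page
allocation failure" messages appear, that would point towards the writeback
theory; if not, the network settings asked about above are the more likely
place to look.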