From: Alexander Puchmayr
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Serious stability problems, including freezes
Date: Wed, 3 Jun 2009 08:44:32 +0200

On Wednesday, 3 June 2009, Florian Philipp wrote:
>
> Do you have a spare network adapter, maybe an older 100MBit PCI card?
> Maybe we should rule out a hardware fault on your ethernet chipset first.
>

I already thought about this, but the results of my tests don't indicate a hardware fault on the ethernet chipset, because:

* I can run a ping -f to the machine; it runs for hours without the slightest problem.
* As long as the files transferred are small enough (i.e. they fit in the cache buffer on the server) and the server has enough time to write them back to disk, there is no problem.
* If I explicitly force the ethernet link to 100FD (100 Mbit/s full duplex) instead of gigabit, there is also no problem.

So I don't expect any error using another 100MBit card.

To me it looks as if the following is happening:

* Memory gets filled up with cached files; no problem so far.
* If no more physical RAM is available, the system tries to free some memory internally, e.g. by flushing the caches.
* If releasing cache entries and writing the data back to their respective files is not fast enough, an internal memory allocation may fail, and I see the "page allocation failure" messages, with different processes/kernel threads in the first line.
* I assume that most of the internal kernel threads have no problem in this situation, but there may be some critical parts that do. Hence, it might just be a matter of probability whether the system hits such a critical part, and the probability increases with the rate (MB/s) at which data is written to the NFS server.

Greetings
Alex
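
P.S. One way to check the guess above would be to watch the memory counters in /proc/meminfo on the server while a large transfer is running. If the guess is right, I would expect MemFree to drop very low while Dirty and Writeback stay high shortly before the "page allocation failure" messages show up. Below is a minimal sketch in Python. The function names, the choice of fields and the one-second interval are only assumptions for illustration; it just reads the standard MemFree, Cached, Dirty and Writeback lines and proves nothing by itself.

```python
import time

# Fields of /proc/meminfo that seem relevant to the cache/writeback guess.
FIELDS = ("MemFree", "Cached", "Dirty", "Writeback")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most lines look like 'Dirty:   4096 kB'; the unit suffix is ignored.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def snapshot(path="/proc/meminfo"):
    """Return the selected counters (in kB) from the given meminfo file."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    return {field: info.get(field) for field in FIELDS}


def watch(interval=1.0, count=10):
    """Print one line of counters every `interval` seconds, `count` times."""
    for _ in range(count):
        snap = snapshot()
        print(" ".join("%s=%s kB" % (field, snap[field]) for field in FIELDS))
        time.sleep(interval)
```

Calling watch(interval=1.0, count=600) on the NFS server during a big copy would give roughly ten minutes of samples to compare against the timestamps of the failures in the kernel log.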