Subject: Re: [gentoo-user] random, hard lockups
From: "W.Kenworthy"
To: gentoo-user@lists.gentoo.org
In-Reply-To: <20050707194438.GA9708@raw-sewage.net>
References: <20050707194438.GA9708@raw-sewage.net>
Date: Fri, 08 Jul 2005 07:48:05 +0800
Message-Id: <1120780086.31360.58.camel@bunyip>

Check that your CHOST and kernel host type haven't changed
recently. I have had this happen in the past, and the system only crashes
when it reaches some incompatible code, which makes it hard to track down.

BillK

On Thu, 2005-07-07 at 14:44 -0500, Matt Garman wrote:
> My system has been experiencing random, hard (must physically
> reboot) lockups over the last year or so. The lockups are thus far
> completely unpredictable, and they always occur when I'm not at my
> computer (during the night, at work, etc). When the computer goes
> into this hard lockup state, the monitor is blank (but not in power
> save mode); the computer will respond to pings; I cannot ssh into
> the computer.
>
> I just ran 14 hours of memtest86+ and found no errors.
>
> I also checked the logs---nothing unusual there (I can't even
> pinpoint exactly when the lockups occur).
>
> Even worse, my computer may be fine for weeks or even months (i.e.
> completely stable), then suddenly start locking up about once a
> day.
>
> Does anyone have any idea what the problem may be? For what it's
> worth, I have a very high ERR count in /proc/interrupts:
>
> # uptime
> 08:58:35 up 1:29, 12 users, load average: 1.22, 1.28, 1.20
>
> # cat /proc/interrupts
>            CPU0
>   0:    5391962          XT-PIC  timer
>   1:       3486          XT-PIC  i8042
>   2:          0          XT-PIC  cascade
>   5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
>   8:          2          XT-PIC  rtc
>   9:          0          XT-PIC  acpi
>  10:          0          XT-PIC  ohci_hcd
>  11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
>  12:     115771          XT-PIC  i8042
>  14:        473          XT-PIC  ide0
>  15:         11          XT-PIC  ide1
> NMI:          0
> LOC:    5391944
> ERR:      33336
> MIS:          0
>
> Note that the machine has only been up for 90 minutes and it's
> already logged 33k ERRs (though I don't exactly know what that
> means; my other two nforce2 boards have a zero ERR count).
>
> For what it's worth, this computer has the following hardware: Asus
> A7N8X Deluxe, AMD Athlon XP 2500 (Barton core), 2x512 MB RAM,
> GeForce4 ti4200 AGP 8x video card, LSI Logic SCSI controller,
> Fujitsu SCSI Drive, Samsung IDE drive.
>
> Another idea. I see the following in my dmesg:
>
> PCI: Using ACPI for IRQ routing
> ** PCI interrupts are no longer routed automatically. If this
> ** causes a device to stop working, it is probably because the
> ** driver failed to call pci_enable_device(). As a temporary
> ** workaround, the "pci=routeirq" argument restores the old
> ** behavior. If this argument makes the device work again,
> ** please email the output of "lspci" to bjorn.helgaas@hp.com
> ** so I can fix the driver.
>
> In my kernel config, I have Processor Type and Features -> Local
> APIC support on uniprocessors and IO-APIC support on uniprocessors
> both enabled. However, as you can see above, the kernel is still
> using XT-PIC. My other two nforce2 boards (with the same kernel
> config) use IO-APIC. I'm not sure exactly what all this means, but
> it may mean something to somebody. :)
>
> Thanks for any help or suggestions!
> Matt
>
> p.s. I'd be happy to post my complete dmesg if anyone would like to
> see it. --MG
>
> --
> Matt Garman
> email at: http://raw-sewage.net/index.php?file=email
--
gentoo-user@gentoo.org mailing list
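
For comparing the ERR count and the interrupt controller across several
machines, as Matt does with his three nforce2 boards, a small script may
be easier than reading /proc/interrupts by eye. The following is only a
rough sketch and was not part of the thread. The function name
summarize_interrupts is made up for illustration, and it assumes the
layout shown in the quoted output: numbered IRQ lines carrying one count
per CPU followed by the controller name, and named counters such as ERR
carrying one count per CPU.

```python
def summarize_interrupts(text):
    """Return (controllers, counters) parsed from /proc/interrupts text.

    controllers: set of controller names seen on numbered IRQ lines,
                 e.g. {"XT-PIC"}.
    counters:    dict of named counters (NMI, LOC, ERR, MIS, ...),
                 summed over all CPU columns.
    """
    controllers = set()
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips blank lines and the "CPU0" header row
        label, rest = line.split(":", 1)
        label = label.strip()
        fields = rest.split()

        # Collect the leading numeric fields (one per CPU column).
        counts = []
        for field in fields:
            if not field.isdigit():
                break
            counts.append(int(field))

        if label.isdigit():
            # Numbered IRQ: the first field after the counts is assumed
            # to be the interrupt controller name.
            remainder = fields[len(counts):]
            if remainder:
                controllers.add(remainder[0])
        elif counts:
            # Named counter such as ERR.
            counters[label] = sum(counts)
    return controllers, counters
```

On a live system, pass it the contents of the file, for example
summarize_interrupts(open("/proc/interrupts").read()). Running it twice a
few minutes apart and comparing counters["ERR"] shows whether the count
is still climbing.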