From: Matt Garman
To: gentoo-user
Date: Thu, 7 Jul 2005 14:44:38 -0500
Subject: [gentoo-user] random, hard lockups

My system has been experiencing random, hard (must physically reboot)
lockups over the last year or so. The lockups are so far completely
unpredictable, and they always occur when I'm not at the computer
(during the night, while I'm at work, etc.).

When the machine is in this locked-up state, the monitor is blank (but
not in power-save mode) and the machine still responds to pings, but I
cannot ssh into it.

I just ran 14 hours of memtest86+ and it found no errors. I also checked
the logs; there is nothing unusual there (I can't even pinpoint exactly
when the lockups occur). Even worse, the computer may be completely
stable for weeks or even months, then suddenly start locking up about
once a day.

Does anyone have any idea what the problem might be?

For what it's worth, I have a very high ERR count in /proc/interrupts:

    # uptime
     08:58:35 up  1:29, 12 users,  load average: 1.22, 1.28, 1.20
    # cat /proc/interrupts
               CPU0
      0:    5391962          XT-PIC  timer
      1:       3486          XT-PIC  i8042
      2:          0          XT-PIC  cascade
      5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
      8:          2          XT-PIC  rtc
      9:          0          XT-PIC  acpi
     10:          0          XT-PIC  ohci_hcd
     11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
     12:     115771          XT-PIC  i8042
     14:        473          XT-PIC  ide0
     15:         11          XT-PIC  ide1
    NMI:          0
    LOC:    5391944
    ERR:      33336
    MIS:          0

Note that the machine has only been up for about 90 minutes and has
already logged 33k ERRs. I don't know exactly what that counter means,
but my other two nforce2 boards have an ERR count of zero.

For what it's worth, this computer has the following hardware:

  - Asus A7N8X Deluxe motherboard
  - AMD Athlon XP 2500 (Barton core)
  - 2x512 MB RAM
  - GeForce4 Ti4200 AGP 8x video card
  - LSI Logic SCSI controller
  - Fujitsu SCSI drive
  - Samsung IDE drive

Another idea: I see the following in my dmesg:

    PCI: Using ACPI for IRQ routing
    ** PCI interrupts are no longer routed automatically. If this
    ** causes a device to stop working, it is probably because the
    ** driver failed to call pci_enable_device(). As a temporary
    ** workaround, the "pci=routeirq" argument restores the old
    ** behavior. If this argument makes the device work again,
    ** please email the output of "lspci" to bjorn.helgaas@hp.com
    ** so I can fix the driver.

In my kernel config, under "Processor Type and Features", I have both
"Local APIC support on uniprocessors" and "IO-APIC support on
uniprocessors" enabled. However, as you can see above, the kernel is
still using XT-PIC. My other two nforce2 boards (with the same kernel
config) use IO-APIC. I'm not sure exactly what all this means, but it
may mean something to somebody. :)

Thanks for any help or suggestions!
Matt

p.s. I'd be happy to post my complete dmesg if anyone would like to see it.

--MG

-- 
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
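For anyone comparing boards, the two observations in the post (the ERR count and the XT-PIC vs IO-APIC routing) can be checked with a small script. This is a minimal sketch in Python, assuming the 2.6-era /proc/interrupts layout shown above (a CPU header line, then "label: count [count ...] [controller] [devices]"). The names SAMPLE, parse_interrupts and errs_per_minute are hypothetical, not part of any existing tool.

```python
from typing import Dict, Set, Tuple

# Output copied from the post above (single CPU, uptime "up 1:29").
SAMPLE = """\
           CPU0
  0:    5391962          XT-PIC  timer
  1:       3486          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
  8:          2          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:          0          XT-PIC  ohci_hcd
 11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
 12:     115771          XT-PIC  i8042
 14:        473          XT-PIC  ide0
 15:         11          XT-PIC  ide1
NMI:          0
LOC:    5391944
ERR:      33336
MIS:          0
"""


def parse_interrupts(text: str) -> Tuple[Dict[str, int], Set[str]]:
    """Return (counter totals by label, set of controller names).

    Hypothetical helper; assumes the layout shown in SAMPLE.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())  # one count column per "CPUn" header
    counters: Dict[str, int] = {}
    controllers: Set[str] = set()
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        fields = rest.split()
        # Sum whatever per-CPU count columns are present (ERR/MIS have one).
        counters[label] = sum(int(f) for f in fields[:ncpu] if f.isdigit())
        # Numbered IRQ lines name their controller: XT-PIC, IO-APIC-edge, ...
        if label.isdigit() and len(fields) > ncpu:
            controllers.add(fields[ncpu])
    return counters, controllers


def errs_per_minute(counters: Dict[str, int], uptime_minutes: float) -> float:
    """ERR counter normalized by uptime, so boards can be compared."""
    return counters.get("ERR", 0) / uptime_minutes


if __name__ == "__main__":
    # On a live system, pass open("/proc/interrupts").read() instead.
    counters, controllers = parse_interrupts(SAMPLE)
    print("ERR total:", counters["ERR"])
    print("ERR/min:", round(errs_per_minute(counters, 89), 1))  # up 1:29
    print("controllers:", sorted(controllers))
```

The 89 minutes comes from the "up 1:29" uptime above. Normalizing by uptime is what makes the count comparable with the zero ERR counts on the other two nforce2 boards, and a controller set containing only XT-PIC is the symptom described in the last paragraph.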