public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] random, hard lockups
@ 2005-07-07 19:44 Matt Garman
From: Matt Garman @ 2005-07-07 19:44 UTC
  To: gentoo-user


My system has been experiencing random, hard (must physically
reboot) lockups over the last year or so. The lockups are thus far
completely unpredictable, and they always occur when I'm not at my
computer (during the night, at work, etc). When the computer goes
into this hard lockup state, the monitor is blank (but not in power
save mode); the computer will respond to pings, but I cannot ssh
into it.

I just ran 14 hours of memtest86+ and found no errors.

I also checked the logs---nothing unusual there (I can't even
pinpoint exactly when the lockups occur).
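
One way to narrow down when a lockup happens is a heartbeat file: a small process appends a timestamp at a fixed interval and forces it to disk, so after the forced reboot the last line gives an approximate time of the hang. The following is a minimal sketch only; the function names and the idea of running it from a loop or from cron are assumptions, not something used in this thread.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line to `path` and force it to disk."""
    seconds = time.time() if now is None else now
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    with open(path, "a", encoding="ascii") as fh:
        fh.write(stamp.strftime("%Y-%m-%dT%H:%M:%SZ") + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # ask the kernel to write the line out now


def last_heartbeat(path):
    """Return the last non-empty line of `path`, or None if there is none."""
    try:
        with open(path, encoding="ascii") as fh:
            lines = [ln.strip() for ln in fh if ln.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

Calling write_heartbeat() once a minute and reading last_heartbeat() after the next forced reboot bounds the lockup time to within about one interval. Note that if the disk controller itself is what hangs, the final entries may never reach the disk.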

Even worse, my computer may be fine for weeks or even months (i.e.
completely stable), then suddenly start locking up about once a
day.

Does anyone have any idea what the problem may be? For what it's
worth, I have a very high ERR count in /proc/interrupts:

# uptime
08:58:35 up  1:29, 12 users,  load average: 1.22, 1.28, 1.20

# cat /proc/interrupts
           CPU0
  0:    5391962          XT-PIC  timer
  1:       3486          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
  8:          2          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:          0          XT-PIC  ohci_hcd
 11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
 12:     115771          XT-PIC  i8042
 14:        473          XT-PIC  ide0
 15:         11          XT-PIC  ide1
NMI:          0
LOC:    5391944
ERR:      33336
MIS:          0


Note that the machine has only been up for 90 minutes and has
already logged 33k ERRs. I don't know exactly what that count
means, but my other two nForce2 boards have a zero ERR count.
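
To put the ERR figure in perspective, the count can be turned into a rate using the uptime shown above. This is a rough sketch under stated assumptions: the function names are made up for illustration, and the uptime parser only handles the "up H:MM" form seen in this post (not "up 3 days" or "up 5 min").

```python
import re


def parse_uptime_minutes(uptime_line):
    """Return uptime in minutes from a line containing 'up H:MM'."""
    m = re.search(r"up\s+(\d+):(\d+)", uptime_line)
    if m is None:
        raise ValueError("unsupported uptime format")
    return int(m.group(1)) * 60 + int(m.group(2))


def err_count(interrupts_text):
    """Return the integer on the 'ERR:' line of /proc/interrupts output."""
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*ERR:\s+(\d+)", line)
        if m:
            return int(m.group(1))
    raise ValueError("no ERR line found")


# Values copied from the output posted above.
uptime_line = "08:58:35 up  1:29, 12 users,  load average: 1.22, 1.28, 1.20"
sample = "NMI:          0\nLOC:    5391944\nERR:      33336\nMIS:          0\n"

minutes = parse_uptime_minutes(uptime_line)
rate = err_count(sample) / minutes
print(f"{rate:.1f} ERR per minute over {minutes} minutes")
# prints "374.6 ERR per minute over 89 minutes"
```

Sampling the counter twice, some minutes apart, would show whether the errors accumulate steadily or arrive in bursts.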

For what it's worth, this computer has the following hardware: Asus
A7N8X Deluxe, AMD Athlon XP 2500 (Barton core), 2x512 MB RAM,
GeForce4 ti4200 AGP 8x video card, LSI Logic SCSI controller,
Fujitsu SCSI Drive, Samsung IDE drive.

Another idea: I see the following in my dmesg:


PCI: Using ACPI for IRQ routing
** PCI interrupts are no longer routed automatically.  If this
** causes a device to stop working, it is probably because the
** driver failed to call pci_enable_device().  As a temporary
** workaround, the "pci=routeirq" argument restores the old
** behavior.  If this argument makes the device work again,
** please email the output of "lspci" to bjorn.helgaas@hp.com
** so I can fix the driver.

In my kernel config, I have Processor Type and Features -> Local
APIC support on uniprocessors and IO-APIC support on uniprocessors
both enabled. However, as you can see above, the kernel is still
using XT-PIC. My other two nForce2 boards (with the same kernel
config) use IO-APIC. I'm not sure exactly what all this means, but
it may mean something to somebody. :)
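
Which interrupt controller each IRQ is routed through can be read from the third column of /proc/interrupts, which makes it easy to compare this board with the other two. The sketch below is illustrative only: the function name is invented, and it assumes a single-CPU layout like the one posted above (one count column before the controller name).

```python
def controller_types(interrupts_text):
    """Map each numbered IRQ to the controller name in its row."""
    result = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Numbered rows look like: "5:  481356  XT-PIC  sym53c8xx, ..."
        # Summary rows such as "ERR: 33336" are skipped.
        if len(fields) >= 3 and fields[0].endswith(":") and fields[0][:-1].isdigit():
            result[int(fields[0][:-1])] = fields[2]
    return result


# A few rows copied from the output posted above.
sample = """\
0:    5391962          XT-PIC  timer
5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
ERR:      33336
"""

types = controller_types(sample)
print(sorted(set(types.values())))  # prints ['XT-PIC']
```

Running the same function on the other two boards should list IO-APIC names instead, which confirms the difference without reading the table by eye.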

Thanks for any help or suggestions!
Matt

p.s. I'd be happy to post my complete dmesg if anyone would like to
see it.  --MG

-- 
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
-- 
gentoo-user@gentoo.org mailing list




* Re: [gentoo-user] random, hard lockups
From: Brett I. Holcomb @ 2005-07-07 20:10 UTC
  To: gentoo-user

Are your SCSI drives terminated properly?

On Thu, 7 Jul 2005, Matt Garman wrote:

>
> My system has been experiencing random, hard (must physically
> reboot) lockups over the last year or so. The lockups are thus far
> completely unpredictable, and it always occurs when I'm not at my
> computer (during the night, at work, etc). When the computer goes
> into this hard lock up state, the monitor is blank (but not in power
> save mode); the computer will respond to pings; I cannot ssh into
> the computer.
>
> I just ran 14 hours of memtest86+ and found no errors.
-- 

Brett I. Holcomb
brettholcomb@R777bellsouth.net
Registered Linux User #188143
Remove R777 to email

* Re: [gentoo-user] random, hard lockups
From: Richard Fish @ 2005-07-07 23:03 UTC
  To: gentoo-user

Matt Garman wrote:

># cat /proc/interrupts
>CPU0       
>0:    5391962          XT-PIC  timer
>1:       3486          XT-PIC  i8042
>2:          0          XT-PIC  cascade
>5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
>8:          2          XT-PIC  rtc
>9:          0          XT-PIC  acpi
>10:          0          XT-PIC  ohci_hcd
>11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
>  
>

I think you have a problem here.  Two devices, your SCSI controller
(sym53c8xx, on IRQs 5 and 11) and the USB 1.1 driver (ohci_hcd, on IRQs
10 and 11), both think they need two different interrupts.  That just
doesn't seem right!!
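
The same observation can be made mechanically by listing every driver name that shows up on more than one IRQ line. This is a minimal sketch with an invented function name, assuming the single-CPU column layout quoted above. A driver appearing twice is not proof of a fault by itself, since one driver can serve several devices.

```python
from collections import defaultdict


def drivers_on_multiple_irqs(interrupts_text):
    """Return {driver: [irqs]} for drivers listed on more than one IRQ."""
    seen = defaultdict(list)
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header and rows like "ERR:"
        fields = rest.split(None, 2)  # count, controller, device list
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        for name in fields[2].split(","):
            seen[name.strip()].append(int(head))
    return {name: irqs for name, irqs in seen.items() if len(irqs) > 1}


# The rows quoted above.
sample = """\
0:    5391962          XT-PIC  timer
1:       3486          XT-PIC  i8042
2:          0          XT-PIC  cascade
5:     481356          XT-PIC  sym53c8xx, NVidia nForce2, ohci1394
8:          2          XT-PIC  rtc
9:          0          XT-PIC  acpi
10:          0          XT-PIC  ohci_hcd
11:     534284          XT-PIC  sym53c8xx, ohci_hcd, ehci_hcd, eth0, nvidia
"""

print(drivers_on_multiple_irqs(sample))
# prints {'sym53c8xx': [5, 11], 'ohci_hcd': [10, 11]}
```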

-Richard


* Re: [gentoo-user] random, hard lockups
From: W.Kenworthy @ 2005-07-07 23:48 UTC
  To: gentoo-user

Check that your CHOST and kernel host type haven't changed recently.  I
have had this happen in the past, and the system only crashes when it
reaches some incompatible code, which makes it hard to track down.
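
A quick way to do that check is to pull the CHOST line out of make.conf and the enabled processor-family option out of the kernel .config, then compare them by eye. The sketch below rests on several assumptions: "kernel host type" is read here as the kernel's processor family setting, the option names are an illustrative subset from 2.6-era i386 kernels, the function names are invented, and the sample file contents are hypothetical (they are not Matt's actual configuration).

```python
import re

# Illustrative subset of i386 processor-family options (assumption).
CPU_FAMILY_OPTIONS = {
    "CONFIG_M386", "CONFIG_M486", "CONFIG_M586", "CONFIG_M686",
    "CONFIG_MPENTIUMII", "CONFIG_MPENTIUMIII", "CONFIG_MPENTIUM4",
    "CONFIG_MK6", "CONFIG_MK7", "CONFIG_MK8",
}


def read_chost(make_conf_text):
    """Return the CHOST value from make.conf text, or None if absent."""
    m = re.search(r'^\s*CHOST\s*=\s*"?([^"\s]+)"?', make_conf_text, re.MULTILINE)
    return m.group(1) if m else None


def enabled_cpu_families(kernel_config_text):
    """Return the processor-family options set to 'y' in a kernel .config."""
    enabled = []
    for line in kernel_config_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and value == "y" and name in CPU_FAMILY_OPTIONS:
            enabled.append(name)
    return enabled


# Hypothetical file contents, for illustration only.
make_conf = 'CFLAGS="-O2 -march=athlon-xp -pipe"\nCHOST="i686-pc-linux-gnu"\n'
kernel_config = "# CONFIG_M686 is not set\nCONFIG_MK7=y\nCONFIG_X86_GENERIC=y\n"

print(read_chost(make_conf))                # prints i686-pc-linux-gnu
print(enabled_cpu_families(kernel_config))  # prints ['CONFIG_MK7']
```

Keeping a dated copy of both values makes it possible to tell later whether either one changed between a stable period and an unstable one.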

BillK
