public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user]  spontaneous reboots.. what to look for
@ 2009-02-15 23:42 Harry Putnam
  2009-02-15 23:56 ` Mark Knecht
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Harry Putnam @ 2009-02-15 23:42 UTC
  To: gentoo-user

I've been experiencing spontaneous reboots on one gentoo machine
lately.  Looking thru /var/log/messages... I see the restarts but
looking above that... I'm not seeing anything I recognize as being a
culprit. 
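One way to narrow that search is to print only the lines logged just before each boot. The sketch below is a rough illustration, assuming the kernel's boot banner (a line containing "Linux version") ends up in /var/log/messages; the function name and the 20-line window are made up for the example, and a different syslog setup may need a different marker.

```shell
#!/bin/sh
# Print the lines logged just before each boot marker in a syslog file.
# Assumption: every boot writes a kernel banner containing "Linux version".
# show_pre_reboot_context is a hypothetical helper name for this sketch.
show_pre_reboot_context() {
    logfile=${1:-/var/log/messages}   # log file to search
    lines=${2:-20}                    # lines of context before each marker
    grep -B "$lines" 'Linux version' "$logfile"
}

# Example: show_pre_reboot_context /var/log/messages 30
```

If the last lines before a banner are ordinary activity with no shutdown messages, the reboot was probably not a clean one.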

It's been happening for a few weeks... but I've been busy and am only now
digging into it ( The machine is no kind of server ).

It appears to only happen in X (I'm using xfce4) and I've only noticed
it since I started running 2.6.28 kernels.  Although I couldn't say
that it seemed to be directly related.

I mean I didn't boot into 2.6.28 and suddenly notice spontaneous
rebooting. 

It does not appear to be heat related... but I am only now using
lm_sensors to keep an accurate record and see if there appears to be a
relationship. 
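A minimal sketch of that kind of record keeping, assuming the `sensors` command from lm_sensors is installed and already configured. The log path, the script path in the cron line, and the SENSORS_CMD/TEMP_LOG variables are placeholders invented for this example.

```shell
#!/bin/sh
# Append a timestamped temperature reading to a log file.
# SENSORS_CMD and TEMP_LOG are hypothetical overrides for this sketch.
log_temps() {
    cmd=${SENSORS_CMD:-sensors}
    log=${TEMP_LOG:-/var/log/temps.log}
    {
        date '+%Y-%m-%d %H:%M:%S'   # timestamp to line up with reboot times
        $cmd                        # raw sensor readings
        echo                        # blank line between samples
    } >> "$log"
}

# When installed as a script, call log_temps at the end, then schedule it
# every 10 minutes with a crontab entry such as (path is an assumption):
# */10 * * * * /usr/local/bin/log-temps.sh
```

The last sample before a reboot is the interesting one: a reading well above the others right before the machine went down would point toward heat.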

I've had two today so either it's happening more often or I'm just
spending more time on that machine.

It may also be only the first or second time it's happened while I was
actually right at the keyboard.

I'm sorry to be so vague about it, but in truth, I've been pretty lazy
about it... since no real harm comes of an unexpected reboot on that
machine (so far anyway).  But clearly it's something that has to be figured
out.

The only things I've checked so far... 
1) browsing thru /var/log/messages (Having trouble recognizing anything
   that looks suspicious.)

   I have noticed what appears to be a time/date anomaly where the
   progression of time is suddenly irregular.  That is, an earlier
   time shows up amongst some later times.

   It appears to have been me sudoing to visudo.  And apparently
   having /etc/sudoers open long enough for the closing of it to be
   earlier than other events taking place.
   
   Again ... I'm not real sure exactly what happened there but it
   does not appear to coincide with a reboot anyway.
 
2) checking how hot the cpu is getting (Doesn't appear to be a
   problem) But now running a cron job recording temperatures every 10
   minutes. So that may turn up something.

3) checking for overfilled disks.  (none show in df -h)
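
The time/date anomaly mentioned under 1) can be hunted for mechanically. This is only a sketch: it assumes the classic syslog prefix ("Feb 15 17:42:44 ...") and a log that stays within one calendar year, and the function name is invented for the example.

```shell
#!/bin/sh
# Report syslog lines whose timestamp is earlier than the line before them.
# Assumes a "Mon DD HH:MM:SS" prefix and no year rollover inside the file.
find_time_jumps() {
    awk '
    BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    ($1 in month) {
        # Fixed-width key such as 02-15-17:42:44 sorts chronologically
        key = sprintf("%02d-%02d-%s", month[$1], $2, $3)
        if (prev != "" && key < prev)
            printf "line %d goes back in time: %s\n", NR, $0
        prev = key
    }' "${1:-/var/log/messages}"
}

# Example: find_time_jumps /var/log/messages
```

A backwards step next to a reboot could hint at a clock problem, while one in the middle of normal activity (like the visudo case) is more likely harmless.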





* Re: [gentoo-user] spontaneous reboots.. what to look for
  2009-02-15 23:42 [gentoo-user] spontaneous reboots.. what to look for Harry Putnam
@ 2009-02-15 23:56 ` Mark Knecht
  2009-02-16  0:26   ` [gentoo-user] " Harry Putnam
  2009-02-16  0:26   ` [gentoo-user] " Dale
  2009-02-16  0:16 ` Volker Armin Hemmann
  2009-02-16  0:57 ` [gentoo-user] " Neil Bothwick
  2 siblings, 2 replies; 13+ messages in thread
From: Mark Knecht @ 2009-02-15 23:56 UTC
  To: gentoo-user

Reseat memory and PCI cards, etc. Consider removing for a period of
time any hardware not absolutely necessary to debug the problem (e.g.
second video card, extra disk drives, extra network adapters, etc.).
Run memtest86 for a few days if you can spare the machine. Run
spinrite, etc., to look for drive problems. Open the box up and place
a fan blowing extra air for additional cooling.

good luck,
Mark




* Re: [gentoo-user]  spontaneous reboots.. what to look for
  2009-02-15 23:42 [gentoo-user] spontaneous reboots.. what to look for Harry Putnam
  2009-02-15 23:56 ` Mark Knecht
@ 2009-02-16  0:16 ` Volker Armin Hemmann
  2009-02-16  0:22   ` Saphirus Sage
  2009-02-16 17:30   ` [gentoo-user] " Harry Putnam
  2009-02-16  0:57 ` [gentoo-user] " Neil Bothwick
  2 siblings, 2 replies; 13+ messages in thread
From: Volker Armin Hemmann @ 2009-02-16  0:16 UTC
  To: gentoo-user

So the problem started recently.

That means it is either:
a cap going bad.
oxidized contacts.
dust clogging the fans.
PSU is going bad.
something obscure.

Do the easy thing first. Clean your case, reseat all cards and memory modules
and check all caps while doing so. Any of them deformed? The 'head' going up?
Strange stuff around its feet? Congratulations, you need new hardware.

If you don't find a bad cap and the problem persists, get a new PSU. A good
one. Not big - most PSUs are oversized - but good quality. Anandtech has
something about PSUs, so does tomshardware (most of their tests are rubbish,
but their PSU tests are ok). If the problem goes away, congratulations!
If not, well, then report back ;)




* Re: [gentoo-user]  spontaneous reboots.. what to look for
  2009-02-16  0:16 ` Volker Armin Hemmann
@ 2009-02-16  0:22   ` Saphirus Sage
  2009-02-16 17:30   ` [gentoo-user] " Harry Putnam
  1 sibling, 0 replies; 13+ messages in thread
From: Saphirus Sage @ 2009-02-16  0:22 UTC
  To: gentoo-user@lists.gentoo.org

I had a similar issue even when not running X. To be honest, I can't
say I have a concrete idea of exactly what caused it. I simply became
security-nuts and began wondering if it wasn't someone just toying
with me; hardened my sshd config and installed denyhosts to monitor
failed logins. This was a month ago and my uptime has been perfect,
with no restarts.
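
For anyone who wants a quick look before installing denyhosts, failed SSH password attempts can be tallied straight from the log. This sketch assumes sshd's usual "Failed password for ... from ADDRESS port ..." lines land in /var/log/messages; the function name is invented for the example and the log location depends on the syslog configuration.

```shell
#!/bin/sh
# Tally failed SSH password attempts per source address, busiest first.
# Assumes sshd lines of the form "Failed password for USER from ADDR port N".
failed_login_sources() {
    grep 'Failed password' "${1:-/var/log/messages}" |
        sed -n 's/.* from \([^ ]*\) port.*/\1/p' |   # keep only the address
        sort | uniq -c | sort -rn |                  # count per address
        awk '{print $1, $2}'                         # normalise spacing
}

# Example: failed_login_sources /var/log/messages
```

A handful of entries is normal background noise for any machine reachable from the internet; it says nothing by itself about the reboots.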




* [gentoo-user]  Re: spontaneous reboots.. what to look for
  2009-02-15 23:56 ` Mark Knecht
@ 2009-02-16  0:26   ` Harry Putnam
  2009-02-16  1:01     ` Mark Knecht
  2009-02-16  0:26   ` [gentoo-user] " Dale
  1 sibling, 1 reply; 13+ messages in thread
From: Harry Putnam @ 2009-02-16  0:26 UTC
  To: gentoo-user

Mark Knecht <markknecht@gmail.com> writes:

> Reseat memory and PCI cards, etc. Consider removing for a period of
> time any hardware not absolutely necessary to debug the problem. (I.e.
> - second video card, extra disk drives, extra network adapters, etc.)
> Run memtest86 for a few days if you can spare the machine. Run
> spinrite, etc., to look for drive problems. Open the box up and place
> a fan blowing extra air for additional cooling.

That all sounds fairly drastic... wouldn't any or all of those problems
leave some kind of track?  Something I can look for short of tearing
up the whole machine?

I have had the experience of breaking something in the hardware by
handling it when I really didn't need to.  An expensive video card I
had (a few yrs ago) comes to mind.  The fit was so close that, dicking
around with it, I broke off a small piece with some bit of circuitry on
it.

Of course I had problems with getting a viewable screen so ended up
soldering it back on... (the piece, not card to pci slot.. hehe) That
fell apart again in the same place later on and I ended up using a
piece of baling wire to wire it in place.

Surprisingly it worked for a long time that way.

Another time... I took my wife's computer apart (bad idea), ostensibly
adding memory and somehow broke one of the clamps holding the heatsink and
fan onto the cpu.  It could flop around quite a bit... but it actually
worked like that. 

Eventually I wired it down too...  Lasted a year or so.

But in both cases it was quite a bit of grief.

Before I retired.. I was a field construction boilermaker (weldor and
rigger).   For most of my time in that trade, anything less than 1/2"
steel plate was viewed as sheet metal.. Most of the work was 1" and up.

I didn't develop a nice light touch .. needless to say.





* Re: [gentoo-user] spontaneous reboots.. what to look for
  2009-02-15 23:56 ` Mark Knecht
  2009-02-16  0:26   ` [gentoo-user] " Harry Putnam
@ 2009-02-16  0:26   ` Dale
  1 sibling, 0 replies; 13+ messages in thread
From: Dale @ 2009-02-16  0:26 UTC
  To: gentoo-user


To add another test.  I had this issue once before and it was a faulty
driver for my hard drives.  I ran a command like this to test mine:

hdparm -Tt /dev/hda && hdparm -Tt /dev/hda && hdparm -Tt /dev/hda &&
hdparm -Tt /dev/hda && hdparm -Tt /dev/hda

If it can pass that then it should be all right and you can look
elsewhere.  Mine would only fail when the drives were very busy, and that
test should keep them busy pretty well.
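
The repeated command above can also be written as a loop. This is just a sketch of the same idea: the function name and the HDPARM override (there so the loop can be dry-run) are invented for the example, and the device name has to match the machine.

```shell
#!/bin/sh
# Run "hdparm -Tt" several times in a row against one drive, stopping at
# the first failure. HDPARM is a hypothetical override used for dry runs.
repeat_read_test() {
    dev=$1            # e.g. /dev/hda
    runs=${2:-5}      # number of back-to-back runs
    i=1
    while [ "$i" -le "$runs" ]; do
        ${HDPARM:-hdparm} -Tt "$dev" || return 1
        i=$((i + 1))
    done
}

# Example (as root): repeat_read_test /dev/hda 5
```

Like the && chain, the loop stops as soon as one run fails, so a failure is easy to spot.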

Hope that helps.

Dale

:-)  :-) 




* Re: [gentoo-user]  spontaneous reboots.. what to look for
  2009-02-15 23:42 [gentoo-user] spontaneous reboots.. what to look for Harry Putnam
  2009-02-15 23:56 ` Mark Knecht
  2009-02-16  0:16 ` Volker Armin Hemmann
@ 2009-02-16  0:57 ` Neil Bothwick
  2 siblings, 0 replies; 13+ messages in thread
From: Neil Bothwick @ 2009-02-16  0:57 UTC
  To: gentoo-user


On Sun, 15 Feb 2009 17:42:44 -0600, Harry Putnam wrote:

> 2) checking how hot the cpu is getting (Doesn't appear to be a
>    problem) But now running a cron job recording temperatures every 10
>    minutes. So that may turn up something.

You could also check disk temperatures with app-admin/hddtemp. I've had
random crashes due to an overheating drive before. I'd also run smartctl
(emerge smartmontools) over the drive, just to be sure.
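
As a rough sketch, both checks can be wrapped in one helper. It assumes hddtemp and smartmontools are installed and that the commands are run as root; the function name and the HDDTEMP/SMARTCTL overrides are invented for the example, and the device name has to match the machine.

```shell
#!/bin/sh
# Quick health look at one drive: temperature, SMART verdict, error log.
# HDDTEMP and SMARTCTL are hypothetical overrides used for dry runs.
check_drive() {
    dev=$1                                     # e.g. /dev/sda
    ${HDDTEMP:-hddtemp} "$dev"                 # current drive temperature
    ${SMARTCTL:-smartctl} -H "$dev"            # overall SMART health verdict
    ${SMARTCTL:-smartctl} -l error "$dev"      # errors the drive has logged
}

# Example (as root): check_drive /dev/sda
```

A passing health verdict does not prove the drive is fine, so the error log and the temperature are worth reading too.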

memtest is a must; bad RAM can easily cause crashes. And take Volker's
advice on PSUs.


-- 
Neil Bothwick

What if there were no hypothetical situations?



* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
  2009-02-16  0:26   ` [gentoo-user] " Harry Putnam
@ 2009-02-16  1:01     ` Mark Knecht
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Knecht @ 2009-02-16  1:01 UTC
  To: gentoo-user

On Sun, Feb 15, 2009 at 4:26 PM, Harry Putnam <reader@newsguy.com> wrote:
> Mark Knecht <markknecht@gmail.com> writes:
>
>> Reseat memory and PCI cards, etc. Consider removing for a period of
>> time any hardware not absolutely necessary to debug the problem. (I.e.
>> - second video card, extra disk drives, extra network adapters, etc.)
>> Run memtest86 for a few days if you can spare the machine. Run
>> spinrite, etc., to look for drive problems. Open the box up and place
>> a fan blowing extra air for additional cooling.
>
> That all sound fairly drastic... wouldn't any or all of those problems
> leave some kind of track?  Something I can look for short of tearing
> up the whole machine?
>

If it's contact oxidation then no, you won't be able to see it
visually. If you're nervous about this then save it for later.

Do as Volker and Neil say. (PSU, smartmon) Run tests for a while.
memtest is a must.

Always pays to go slow but you might want to make sure your backups
are VERY good right now.

Good luck,
Mark




* [gentoo-user]  Re: spontaneous reboots.. what to look for
  2009-02-16  0:16 ` Volker Armin Hemmann
  2009-02-16  0:22   ` Saphirus Sage
@ 2009-02-16 17:30   ` Harry Putnam
  2009-02-16 17:55     ` Mark Knecht
                       ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: Harry Putnam @ 2009-02-16 17:30 UTC
  To: gentoo-user

Volker Armin Hemmann <volkerarmin@googlemail.com> writes:

> Do the easy thing first. Clean your case, reseat all cards and
> memory modules and check all caps while doing so. Any of them
> deformed? The 'head' going up?  Strange stuff around its feet?
> Congratulation, you need new hardware.

Sorry to be a numbskull here but what do you mean by `caps'?





* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
  2009-02-16 17:30   ` [gentoo-user] " Harry Putnam
@ 2009-02-16 17:55     ` Mark Knecht
  2009-02-16 17:57     ` Volker Armin Hemmann
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Mark Knecht @ 2009-02-16 17:55 UTC
  To: gentoo-user

On Mon, Feb 16, 2009 at 9:30 AM, Harry Putnam <reader@newsguy.com> wrote:
> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
>
>> Do the easy thing first. Clean your case, reseat all cards and
>> memory modules and check all caps while doing so. Any of them
>> deformed? The 'head' going up?  Strange stuff around its feet?
>> Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'

Capacitors. They are small electronic components on your circuit board
that hold charge and tend to help smooth out noise on power circuits,
among other things. Sometimes they start to break down, overheat,
etc., and if they do then you might be able to spot this change
physically. In my experience the ones that go bad and that you have
some small chance of fixing are generally little cylinders sitting
upright so you see the circle on top.

If you're old like me you might need a magnifying glass to look
closely. They can be quite small and they are likely sitting all
around your processor, etc.

Good luck,
Mark




* Re: [gentoo-user]  Re: spontaneous reboots.. what to look for
  2009-02-16 17:30   ` [gentoo-user] " Harry Putnam
  2009-02-16 17:55     ` Mark Knecht
@ 2009-02-16 17:57     ` Volker Armin Hemmann
  2009-02-16 17:59     ` Stroller
  2009-02-16 17:59     ` Joseph Davis
  3 siblings, 0 replies; 13+ messages in thread
From: Volker Armin Hemmann @ 2009-02-16 17:57 UTC
  To: gentoo-user

On Montag 16 Februar 2009, Harry Putnam wrote:
> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
> > Do the easy thing first. Clean your case, reseat all cards and
> > memory modules and check all caps while doing so. Any of them
> > deformed? The 'head' going up?  Strange stuff around its feet?
> > Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'

capacitors.

http://en.wikipedia.org/wiki/Capacitor

those little black&white or green&white or black&silver or all silver
cylindrical thingies that are all over your mainboard. Some of them are on
your cards too. And these little guys aren't known for their robustness. In
fact they don't like heat - dying very fast when things get hot (except
polymer/'solid' caps).




* Re: [gentoo-user]  Re: spontaneous reboots.. what to look for
  2009-02-16 17:30   ` [gentoo-user] " Harry Putnam
  2009-02-16 17:55     ` Mark Knecht
  2009-02-16 17:57     ` Volker Armin Hemmann
@ 2009-02-16 17:59     ` Stroller
  2009-02-16 17:59     ` Joseph Davis
  3 siblings, 0 replies; 13+ messages in thread
From: Stroller @ 2009-02-16 17:59 UTC
  To: gentoo-user


On 16 Feb 2009, at 17:30, Harry Putnam wrote:

> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
>
>> Do the easy thing first. Clean your case, reseat all cards and
>> memory modules and check all caps while doing so. Any of them
>> deformed? The 'head' going up?  Strange stuff around its feet?
>> Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'

Capacitors.
http://images.google.com/images?&q=bad%20capacitors

But don't rely on this - a component can fail without it being
visible.

IME the most common cure for nonspecific hardware failures is  
replacing the PSU, but in your case I would also swap out the graphics  
card early.

Stroller.




* Re: [gentoo-user]  Re: spontaneous reboots.. what to look for
  2009-02-16 17:30   ` [gentoo-user] " Harry Putnam
                       ` (2 preceding siblings ...)
  2009-02-16 17:59     ` Stroller
@ 2009-02-16 17:59     ` Joseph Davis
  3 siblings, 0 replies; 13+ messages in thread
From: Joseph Davis @ 2009-02-16 17:59 UTC
  To: gentoo-user

I believe he is referring to capacitors, you should be able to google 
for some pictures of common capacitors.

They look like little barrels, usually dark blue as I look at my
motherboard... They have electrolyte in them; if it leaks,
they are dead.

Harry Putnam wrote:
> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
> 
>> Do the easy thing first. Clean your case, reseat all cards and
>> memory modules and check all caps while doing so. Any of them
>> deformed? The 'head' going up?  Strange stuff around its feet?
>> Congratulation, you need new hardware.
> 
> Sorry to be a numb skull here but what do you mean by `caps'
> 
> 
> 


