* [gentoo-user] spontaneous reboots.. what to look for
From: Harry Putnam @ 2009-02-15 23:42 UTC
To: gentoo-user

I've been experiencing spontaneous reboots on one gentoo machine
lately. Looking thru /var/log/messages... I see the restarts but
looking above that... I'm not seeing anything I recognize as being a
culprit.

It's been happening for a few weeks... but I've been busy and am only
now digging into it (the machine is no kind of server).

It appears to only happen in X (I'm using xfce4) and I've only noticed
it since I started running 2.6.28 kernels. Although I couldn't say
that it seemed to be directly related.

I mean I didn't boot into 2.6.28 and suddenly notice spontaneous
rebooting.

It does not appear to be heat related... but I am only now using
lm_sensors to keep an accurate record and see if there appears to be a
relationship.

I've had two today so either it's happening more often or I'm just
spending more time on that machine.

It may also be only the first or second time it's happened while I was
actually right at the keyboard.

I'm sorry to be so vague about it, but in truth, I've been pretty lazy
about it... since no real harm comes of an unexpected reboot on that
machine (so far anyway). But clearly it's something that has to be
figured out.

The only things I've checked so far...

1) browsing thru /var/log/messages (Having trouble recognizing
   anything that looks suspicious.)

   I have noticed what appears to be a time/date anomaly where the
   progression of time is suddenly irregular. That is, an earlier
   time shows up amongst some later times.

   It appears to have been me sudoing to visudo. And apparently
   having /etc/sudoers open long enough for the closing of it to be
   earlier than other events taking place.

   Again ... I'm not real sure exactly what happened there but it
   does not appear to coincide with a reboot anyway.

2) checking how hot the cpu is getting (Doesn't appear to be a
   problem) But now running a cron job recording temperatures every 10
   minutes. So that may turn up something.

3) checking for overfilled disks. (none show in df -h)
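The two checks described in the message above (reading what was logged just before each restart, and recording temperatures from cron) could be sketched in shell roughly as follows. This is only a sketch under assumptions: it assumes the kernel banner containing "Linux version" is logged at every boot, the function name context_before_boot and the log path /var/log/sensors.log are made up for illustration, and grep -B needs GNU grep.

```shell
#!/bin/sh
# Print the N lines logged before each boot marker in a syslog-style file.
# Assumption: a line containing "Linux version" is written at every boot;
# adjust the pattern if your logger writes something else first.
context_before_boot() {
    logfile=$1
    lines=${2:-20}
    grep -B "$lines" 'Linux version' "$logfile"
}

# Example call (commented out; needs read access to the log):
# context_before_boot /var/log/messages 30

# Hypothetical crontab entry that appends a timestamped lm_sensors reading
# every 10 minutes (the log path is an assumption):
# */10 * * * * (date; sensors) >> /var/log/sensors.log 2>&1
```

If nothing suspicious shows up in the lines before a boot marker, that by itself is a hint: a machine that resets without logging anything often points at hardware or power rather than software.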
* Re: [gentoo-user] spontaneous reboots.. what to look for
From: Mark Knecht @ 2009-02-15 23:56 UTC
To: gentoo-user

On Sun, Feb 15, 2009 at 3:42 PM, Harry Putnam <reader@newsguy.com> wrote:
> I've been experiencing spontaneous reboots on one gentoo machine
> lately. Looking thru /var/log/messages... I see the restarts but
> looking above that... I'm not seeing anything I recognize as being a
> culprit.
>
> It's been happening for a few weeks... but I've been busy and only now
> digging into it ( The machine is no kind of server ).
>
> It appears to only happen in X (I'm using xfce4) and I've only noticed
> it since I started running 2.6.28 kernels. Although I couldn't say
> that it seemed to be directly related.
>
> I mean I didn't boot into 2.6.28 and suddenly notice spontaneous
> rebooting.
>
> It does not appear to be heat related... but I am only now using
> lm_sensors to keep an accurate record and see if there appears to be a
> relationship.
>
> I've had two today so either it's happening more often or I'm just
> spending more time on that machine.
>
> It may also be only the first or second time it's happened while I was
> actually right at the keyboard.
>
> I'm sorry to be so vague about it, but in truth, I've been pretty lazy
> about it... since no real harm comes of an unexpected reboot on that
> machine (so far anyway). But clearly something that has to be figured
> out.
>
> The only things I've checked so far...
> 1) browsing thru /var/log/messages (Having trouble recognizing any
>    thing that looks suspicious.
>
>    I have noticed what appears to be a time/date anomaly where the
>    progression of time is suddenly irregular. That is, an earlier
>    time shows up amongst some later times.
>
>    It appears to have been me sudoing to visudo. And apparently
>    having /etc/sudoers open long enough for the closing of it to be
>    earlier than other events taking place.
>
>    Again ... I'm not real sure exactly what happened there but it
>    does not appear to coincide with a reboot anyway.
>
> 2) checking how hot the cpu is getting (Doesn't appear to be a
>    problem) But now running a cron job recording temperatures every 10
>    minutes. So that may turn up something.
>
> 3) checking for overfilled disks. (none show in df -h)
>

Reseat memory and PCI cards, etc. Consider removing for a period of
time any hardware not absolutely necessary to debug the problem (i.e.
second video card, extra disk drives, extra network adapters, etc.).
Run memtest86 for a few days if you can spare the machine. Run
spinrite, etc., to look for drive problems. Open the box up and place
a fan blowing extra air for additional cooling.

Good luck,
Mark
* [gentoo-user] Re: spontaneous reboots.. what to look for
From: Harry Putnam @ 2009-02-16 0:26 UTC
To: gentoo-user

Mark Knecht <markknecht@gmail.com> writes:

> Reseat memory and PCI cards, etc. Consider removing for a period of
> time any hardware not absolutely necessary to debug the problem. (I.e.
> - second video card, extra disk drives, extra network adapters, etc.)
> Run memtest86 for a few days if you can spare the machine. Run
> spinrite, etc., to look for drive problems. Open the box up and place
> a fan blowing extra air for additional cooling.

That all sounds fairly drastic... wouldn't any or all of those problems
leave some kind of track? Something I can look for short of tearing
up the whole machine?

I have had the experience of breaking something in the hardware by
handling it when I really didn't need to.

An expensive video card I had (a few yrs ago) comes to mind. The fit
was so close that, dicking around with it, I broke off a small piece
with some bit of circuitry in it. Of course I had problems with
getting a viewable screen so ended up soldering it back in... (the
piece, not card to pci slot.. hehe)

That fell apart again in the same place later on and I ended up using
a piece of baling wire to wire it in place. Surprisingly it worked
for a long time that way.

Another time... I took my wife's computer apart (bad idea), ostensibly
adding memory, and somehow broke one of the clamps holding the heatsink
and fan onto the cpu. It could flop around quite a bit... but it
actually worked like that. Eventually I wired it down too... Lasted a
year or so.

But in both cases it was quite a bit of grief.

Before I retired.. I was a field construction boilermaker (weldor and
rigger). For most of my time in that trade, anything less than 1/2"
steel plate was viewed as sheet metal.. Most of the work was 1" and
up. I didn't develop a nice light touch .. needless to say.
* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
From: Mark Knecht @ 2009-02-16 1:01 UTC
To: gentoo-user

On Sun, Feb 15, 2009 at 4:26 PM, Harry Putnam <reader@newsguy.com> wrote:
> Mark Knecht <markknecht@gmail.com> writes:
>
>> Reseat memory and PCI cards, etc. Consider removing for a period of
>> time any hardware not absolutely necessary to debug the problem. (I.e.
>> - second video card, extra disk drives, extra network adapters, etc.)
>> Run memtest86 for a few days if you can spare the machine. Run
>> spinrite, etc., to look for drive problems. Open the box up and place
>> a fan blowing extra air for additional cooling.
>
> That all sound fairly drastic... wouldn't any or all of those problems
> leave some kind of track? Something I can look for short of tearing
> up the whole machine?
>

If it's contact oxidation then no, you won't be able to see it visually.

If you're nervous about this then save it for later. Do as Volker and
Neil say (PSU, smartmon). Run tests for a while. memtest is a must.

It always pays to go slow, but you might want to make sure your backups
are VERY good right now.

Good luck,
Mark
* Re: [gentoo-user] spontaneous reboots.. what to look for
From: Dale @ 2009-02-16 0:26 UTC
To: gentoo-user

Mark Knecht wrote:
> On Sun, Feb 15, 2009 at 3:42 PM, Harry Putnam <reader@newsguy.com> wrote:
>
>> I've been experiencing spontaneous reboots on one gentoo machine
>> lately. Looking thru /var/log/messages... I see the restarts but
>> looking above that... I'm not seeing anything I recognize as being a
>> culprit.
>>
>> It's been happening for a few weeks... but I've been busy and only now
>> digging into it ( The machine is no kind of server ).
>>
>> It appears to only happen in X (I'm using xfce4) and I've only noticed
>> it since I started running 2.6.28 kernels. Although I couldn't say
>> that it seemed to be directly related.
>>
>> I mean I didn't boot into 2.6.28 and suddenly notice spontaneous
>> rebooting.
>>
>> It does not appear to be heat related... but I am only now using
>> lm_sensors to keep an accurate record and see if there appears to be a
>> relationship.
>>
>> I've had two today so either it's happening more often or I'm just
>> spending more time on that machine.
>>
>> It may also be only the first or second time it's happened while I was
>> actually right at the keyboard.
>>
>> I'm sorry to be so vague about it, but in truth, I've been pretty lazy
>> about it... since no real harm comes of an unexpected reboot on that
>> machine (so far anyway). But clearly something that has to be figured
>> out.
>>
>> The only things I've checked so far...
>> 1) browsing thru /var/log/messages (Having trouble recognizing any
>>    thing that looks suspicious.
>>
>>    I have noticed what appears to be a time/date anomaly where the
>>    progression of time is suddenly irregular. That is, an earlier
>>    time shows up amongst some later times.
>>
>>    It appears to have been me sudoing to visudo. And apparently
>>    having /etc/sudoers open long enough for the closing of it to be
>>    earlier than other events taking place.
>>
>>    Again ... I'm not real sure exactly what happened there but it
>>    does not appear to coincide with a reboot anyway.
>>
>> 2) checking how hot the cpu is getting (Doesn't appear to be a
>>    problem) But now running a cron job recording temperatures every 10
>>    minutes. So that may turn up something.
>>
>> 3) checking for overfilled disks. (none show in df -h)
>>
>
> Reseat memory and PCI cards, etc. Consider removing for a period of
> time any hardware not absolutely necessary to debug the problem. (I.e.
> - second video card, extra disk drives, extra network adapters, etc.)
> Run memtest86 for a few days if you can spare the machine. Run
> spinrite, etc., to look for drive problems. Open the box up and place
> a fan blowing extra air for additional cooling.
>
> good luck,
> Mark
>

To add another test: I had this issue once before and it was a faulty
driver for my hard drives. I ran a command like this to test mine:

    hdparm -Tt /dev/hda && hdparm -Tt /dev/hda && hdparm -Tt /dev/hda && hdparm -Tt /dev/hda && hdparm -Tt /dev/hda

If it can pass that then it should be all right and you can look
elsewhere. Mine would only fail when the drives were very busy, and
that test should keep them busy pretty well.

Hope that helps.

Dale

:-) :-)
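The chained hdparm runs in Dale's message can also be written as a small loop that stops at the first failure. A minimal sketch, assuming a POSIX shell; the helper name repeat_until_fail is made up for illustration, and /dev/hda is simply the device from the message. Note the hedge: a driver fault under load may well show up as kernel errors or a reset rather than as a non-zero exit status from hdparm, so it is worth watching dmesg while this runs.

```shell
#!/bin/sh
# Run a command up to COUNT times, stopping at the first failure.
# Returns 0 if every run succeeded, otherwise the failing exit status.
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" || return $?
        i=$((i + 1))
    done
    return 0
}

# Equivalent of the five chained runs (needs root; commented out so the
# sketch can be sourced safely):
# repeat_until_fail 5 hdparm -Tt /dev/hda
```

Raising the count is an easy way to keep the drive busy for longer than five passes.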
* Re: [gentoo-user] spontaneous reboots.. what to look for
From: Volker Armin Hemmann @ 2009-02-16 0:16 UTC
To: gentoo-user

So the problem started recently.

That means it is either:
a cap going bad.
oxidized contacts.
dust clogging the fans.
PSU is going bad.
something obscure.

Do the easy thing first. Clean your case, reseat all cards and memory
modules and check all caps while doing so. Any of them deformed? The
'head' going up? Strange stuff around its feet? Congratulations, you
need new hardware.

If you don't find a bad cap and the problem persists, get a new PSU. A
good one. Not big - most PSUs are oversized - but good quality.
Anandtech has something about PSUs, so does tomshardware (most of their
tests are rubbish, but their PSU tests are ok). If the problem goes
away, congratulations! If not, well, then report back ;)
* Re: [gentoo-user] spontaneous reboots.. what to look for
From: Saphirus Sage @ 2009-02-16 0:22 UTC
To: gentoo-user@lists.gentoo.org

On Feb 15, 2009, at 7:16 PM, Volker Armin Hemmann
<volkerarmin@googlemail.com> wrote:

> So the problem started recently.
>
> That means it is either:
> a cap going bad.
> oxidized contacts.
> dust clogging the fans.
> PSU is going bad.
> something obscure.
>
> Do the easy thing first. Clean your case, reseat all cards and
> memory modules and check all caps while doing so. Any of them
> deformed? The 'head' going up? Strange stuff around its feet?
> Congratulation, you need new hardware.
>
> If you don't find a bad cap and the problem persists, get a new PSU.
> A good one. Not big - most PSUs are oversized, but good quality.
> Anandtech has something about psu's, so does tomshardware (most of
> their tests are rubbish, but their psu tests are ok). If the problem
> goes away, congratulation!
> If not, well, then report back ;)
>

I had a similar issue even when not running X. To be honest, I can't
say I have a concrete idea of exactly what caused it. I simply became
security-nuts and began wondering if it wasn't someone just toying
with me; I hardened my sshd config and installed denyhosts to monitor
failed logins. That was a month ago and my uptime has been perfect
since, with no restarts.
* [gentoo-user] Re: spontaneous reboots.. what to look for
From: Harry Putnam @ 2009-02-16 17:30 UTC
To: gentoo-user

Volker Armin Hemmann <volkerarmin@googlemail.com> writes:

> Do the easy thing first. Clean your case, reseat all cards and
> memory modules and check all caps while doing so. Any of them
> deformed? The 'head' going up? Strange stuff around its feet?
> Congratulation, you need new hardware.

Sorry to be a numb skull here but what do you mean by `caps'
* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
From: Mark Knecht @ 2009-02-16 17:55 UTC
To: gentoo-user

On Mon, Feb 16, 2009 at 9:30 AM, Harry Putnam <reader@newsguy.com> wrote:
> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
>
>> Do the easy thing first. Clean your case, reseat all cards and
>> memory modules and check all caps while doing so. Any of them
>> deformed? The 'head' going up? Strange stuff around its feet?
>> Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'

Capacitors. They are small electronic components on your circuit
board that hold charge and tend to help smooth out noise on power
circuits, among other things. Sometimes they start to break down,
overheat, etc., and if they do then you might be able to spot this
change physically.

In my experience the ones that go bad, and that you have some small
chance of fixing, are generally little cylinders sitting upright so
you see the circle on top.

If you're old like me you might need a magnifying glass to look
closely. They can be quite small and they are likely sitting all
around your processor, etc.

Good luck,
Mark
* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
From: Volker Armin Hemmann @ 2009-02-16 17:57 UTC
To: gentoo-user

On Monday 16 February 2009, Harry Putnam wrote:
> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
> > Do the easy thing first. Clean your case, reseat all cards and
> > memory modules and check all caps while doing so. Any of them
> > deformed? The 'head' going up? Strange stuff around its feet?
> > Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'

capacitors.

http://en.wikipedia.org/wiki/Capacitor

Those little black&white or green&white or black&silver or all-silver
cylindrical thingies that are all over your mainboard. Some of them
are on your cards too. And these little guys aren't known for their
robustness. In fact they don't like heat - dying very fast when things
get hot (except polymer/'solid' caps).
* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
From: Stroller @ 2009-02-16 17:59 UTC
To: gentoo-user

On 16 Feb 2009, at 17:30, Harry Putnam wrote:

> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
>
>> Do the easy thing first. Clean your case, reseat all cards and
>> memory modules and check all caps while doing so. Any of them
>> deformed? The 'head' going up? Strange stuff around its feet?
>> Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'

Capacitors.

http://images.google.com/images?&q=bad%20capacitors

But don't rely on this - a component can fail without it being
visible.

IME the most common cure for nonspecific hardware failures is
replacing the PSU, but in your case I would also swap out the graphics
card early.

Stroller.
* Re: [gentoo-user] Re: spontaneous reboots.. what to look for
From: Joseph Davis @ 2009-02-16 17:59 UTC
To: gentoo-user

I believe he is referring to capacitors; you should be able to google
for some pictures of common capacitors. They look like little
barrels, usually dark blue as I look at my motherboard...

They have special electrical paste in them; if it leaks, they are
dead.

Harry Putnam wrote:
> Volker Armin Hemmann <volkerarmin@googlemail.com> writes:
>
>> Do the easy thing first. Clean your case, reseat all cards and
>> memory modules and check all caps while doing so. Any of them
>> deformed? The 'head' going up? Strange stuff around its feet?
>> Congratulation, you need new hardware.
>
> Sorry to be a numb skull here but what do you mean by `caps'
>
* Re: [gentoo-user] spontaneous reboots.. what to look for
From: Neil Bothwick @ 2009-02-16 0:57 UTC
To: gentoo-user

On Sun, 15 Feb 2009 17:42:44 -0600, Harry Putnam wrote:

> 2) checking how hot the cpu is getting (Doesn't appear to be a
> problem) But now running a cron job recording temperatures every 10
> minutes. So that may turn up something.

You could also check disk temperatures with app-admin/hddtemp. I've had
random crashes due to an overheating drive before. I'd also run
smartctl (emerge smartmontools) over the drive, just to be sure.

memtest is a must, bad RAM can easily cause crashes, and take Volker's
advice on PSUs.

--
Neil Bothwick

What if there were no hypothetical situations?
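The drive checks Neil mentions might look roughly like this from a root shell. This is a sketch, not a recipe: /dev/hda is only a placeholder device taken from elsewhere in the thread, the helper name is_too_hot is made up for illustration, and the 50 degree limit is an arbitrary assumption, so check the drive's datasheet for a real figure.

```shell
#!/bin/sh
# Return success (0) when a temperature reading is above a limit.
# Both arguments are whole degrees Celsius.
is_too_hot() {
    [ "$1" -gt "$2" ]
}

# Commented out: these need root and the real tools installed.
# smartctl -H /dev/hda            # overall SMART health verdict
# smartctl -t long /dev/hda       # start an extended self-test
# smartctl -l selftest /dev/hda   # read the self-test log afterwards
# temp=$(hddtemp -n /dev/hda)     # numeric temperature only
# is_too_hot "$temp" 50 && echo "drive at ${temp}C, above the assumed limit"
```

The hddtemp reading could be appended to the same log as the lm_sensors cron job, so CPU and drive temperatures can be compared against the times of the reboots.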