[gentoo-user] System reboot

* [gentoo-user] System reboot
@ 2018-12-16  3:54 Dale
  2018-12-16 12:18 ` Rich Freeman
  2018-12-16 15:45 ` [gentoo-user] " Nikos Chantziaras
  0 siblings, 2 replies; 6+ messages in thread
From: Dale @ 2018-12-16  3:54 UTC (permalink / raw
  To: gentoo-user

Howdy,

As some know, I've done some upgrades recently.  A month or so back, I
had reboots due to Dolphin using up all the system memory, which I have
since upgraded from 16GB to 32GB.  I stopped using Dolphin for that
reason.  I'll test it someday to see if the bug is fixed.  Just a bit
ago I had my system power off.  On one hand, I think it is a hardware
issue with the computer.  On the other hand, I noticed something odd.
The display on my UPS was lit when my system went down.  It usually only
comes on when I push the button or the power fails.  Thing is, the power
hadn't failed.  Sort of makes me wonder.  I have a CyberPower CP 1350C
that has that digital display thing.  The display is blue.  Some may
have seen one before.  While the system was shut down, I unplugged the
UPS to make sure it switches to battery as it should.  It did.  My
speakers, modem, router etc didn't lose power.  When I got booted up, I
ran upsc and it shows a complete charge on the battery.  Like this:
battery.charge: 100  The most load I've ever had on it was about 250
watts.  That's with the new CPU and it compiling heavily.  It's not even
close to being overloaded.

I checked the messages log.  Before with the memory hogging Dolphin it
had logged the problem.  Today, it shows this:

Dec 15 20:40:01 fireball CROND[30668]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Dec 15 20:50:01 fireball CROND[1532]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts /etc/cron.hourly)
Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up; version='3.17.2'
Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: *** info [daemon/startup.c(136)]:
Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: Started gpm successfully. Entered daemon mode.

As you can see, it went from running a normal cron job to me booting
back up.  I don't see any error at all.  Not even one electron.  One
other odd thing.  When I first boot up, my ethernet port, net.eth1,
fails to come up.  I have to reboot and then it comes up fine.  It comes
up the second time even if I do a complete power off.  When I logged
into KDE and started my web browsers, they asked about recovering the
last session.  Obviously they were not closed in a normal way.
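
The jump in the log excerpt above, from a routine cron entry straight to
syslog-ng starting up, can also be located by a script when the log is
long.  What follows is only a rough sketch, not a definitive tool.  It
assumes classic syslog timestamps ("Dec 15 20:40:01"), an English locale
for month names, and a year supplied by the caller, since the log format
does not record one.  The function name find_unlogged_gaps is made up
for illustration.

```python
from datetime import datetime

# Marker text copied from the log excerpt in this message.
STARTUP_MARKER = "syslog-ng starting up"


def find_unlogged_gaps(lines, year=2018):
    """Return (last_entry, logger_start) timestamp pairs.

    Each pair brackets a window in which nothing was logged before the
    logger started again.  A clean shutdown normally leaves shutdown
    messages just before that point; an abrupt power off leaves none.
    """
    gaps = []
    previous = None
    for line in lines:
        try:
            # Syslog omits the year, so the assumed one is prepended.
            stamp = datetime.strptime(f"{year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation line without a timestamp
        if STARTUP_MARKER in line and previous is not None:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


SAMPLE = [
    "Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)",
    "Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts",
    "/etc/cron.hourly)",
    "Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up;",
    "version='3.17.2'",
]

for last, start in find_unlogged_gaps(SAMPLE):
    print(f"nothing logged between {last} and {start} ({start - last})")
```

The script only narrows down when the system went away.  It cannot say
why, which is why the lines just before the gap still need to be read by
hand.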

When I hit the power button to turn it back on the first time, I went
into the BIOS and checked the temps of the CPU etc.  They all looked
normal and all fans were working.  I thought maybe it got hot or
something but that doesn't seem to be the case either. 

What should I suspect?  Is the UPS display a sign the UPS itself did
something?  Could the power supply in my puter be going out?
Something else I should check?  Something I should check next time
before powering up?  Ideas?  Thoughts?

Thanks

Dale

:-)  :-) 


* Re: [gentoo-user] System reboot
  2018-12-16  3:54 [gentoo-user] System reboot Dale
@ 2018-12-16 12:18 ` Rich Freeman
  2018-12-16 15:17   ` Dale
  2018-12-16 15:45 ` [gentoo-user] " Nikos Chantziaras
  1 sibling, 1 reply; 6+ messages in thread
From: Rich Freeman @ 2018-12-16 12:18 UTC (permalink / raw
  To: gentoo-user

On Sat, Dec 15, 2018 at 10:54 PM Dale <rdalek1967@gmail.com> wrote:
>
> I checked the messages log.  Before with the memory hogging Dolphin it
> had logged the problem.  Today, it shows this:
>
>
> Dec 15 20:40:01 fireball CROND[30668]: (root) CMD (/usr/lib64/sa/sa1 1 1)
> Dec 15 20:50:01 fireball CROND[1532]: (root) CMD (/usr/lib64/sa/sa1 1 1)
> Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)
> Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts
> /etc/cron.hourly)
> Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up;
> version='3.17.2'
> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: *** info
> [daemon/startup.c(136)]:
> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: Started gpm successfully.
> Entered daemon mode.
>
>
> As you can see, it went from running a normal cron job to me booting
> back up.  I don't see any error at all.  Not even one electron.

This is pretty typical if you aren't taking special steps to log this
sort of thing.  There are a few ways the system can go down:

1.  OOPS/BUG - these are semi-recoverable errors.  I believe they can
get logged unless they occur in a manner that disrupts your userspace
logger, vfs, filesystem, or disk.  If the error happens in one of
those subsystems then your filesystems will stop syncing and it won't
be logged normally.

2.  PANIC - these are unrecoverable and are NOT logged.  When the
kernel PANICs it halts all disk IO and just about everything else.
This is to prevent damage to anything already written on disk.  You
don't want a corrupt OS trying to write to your disk - that makes a
bad situation MUCH worse.  It would be like sending a drunk surgeon
into the operating room to fix up a trauma patient.

3.  Hardware reset.  This isn't a kernel issue but I'll toss it in.
If your CPU gets a reset signal it forgets it was ever running linux
and starts executing the firmware as if it had been freshly powered
on.  There is no opportunity to capture anything.  Only way to log
something like this is hardware-level monitoring.

Issues #1-2 CAN be logged, but not conventionally.  There are a few
routes for this:

1.  Remote console logging.  Serial and network are the two main
options for this.  If you have a hardware serial port you can capture
its output, and any kernel errors will be written to it (just the
text/backtrace/etc).  A network console is very easy to set up if you
have a remote host that can run netcat on the same LAN:
https://www.kernel.org/doc/Documentation/networking/netconsole.txt

2.  Recovery kernel.  Gentoo doesn't have tooling for this but you can
follow https://wiki.gentoo.org/wiki/Kernel_Crash_Dumps .  Disclaimer -
I haven't done this in ages so it could be dated in parts.  If the
kernel panics then it will run the recovery kernel, which boots in a
clean state and dumps the old kernel's RAM to disk for subsequent
analysis.

#1 gets the job done most of the time, but #2 is more thorough.  If
you have a host that is tending to reset you should consider network
logging as a starting point - it is easy to set up.
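
For the receiving end of a network console, netcat is the usual tool.
As a rough illustration of what that listener does, here is a small
Python stand-in.  Treat it as a sketch under assumptions: port 6666 is
taken to be the default target port from the netconsole document linked
above (check it against your own netconsole= setting), and the names
capture and netconsole.log are made up for the example.  The sending
side is configured on the crashing host as that document describes.

```python
import socket

# Assumed default netconsole target port; must match the sender's setting.
NETCONSOLE_PORT = 6666


def capture(sock, out, max_packets=None):
    """Copy UDP payloads from sock to out, flushing after every packet.

    Flushing each packet matters here: the last lines before a crash
    are the interesting ones and must not sit in a buffer.
    Returns the number of packets written.
    """
    count = 0
    while max_packets is None or count < max_packets:
        payload, _sender = sock.recvfrom(65535)
        out.write(payload)
        out.flush()
        count += 1
    return count


def main():
    """Listen on all interfaces and append everything to a log file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", NETCONSOLE_PORT))
        with open("netconsole.log", "ab") as log:
            capture(sock, log)  # runs until interrupted
```

Calling main() on the remote host leaves it waiting for packets.
Whatever the kernel on the crashing machine manages to print before it
stops should end up in netconsole.log on the other box, independent of
the crashing machine's disks.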

I'm not sure why your UPS display is coming on.  It might be some kind
of spurious data on the USB port if it is connected.  It might be a
result of something the PC is doing.  It is also possible it is due to
a brownout or other power issues going into your house, but if your
UPS is in good shape and not overloaded then it should be shielding
your PC from the effects of these.  A PC power supply issue sounds
plausible.  I've had my CP UPS flicker its display and a light might
flicker a bit at the same time, but the PC was unaffected.  I'll also
note that these kinds of transient issues are often mitigated by
having a good quality PC power supply that is not overloaded, and that
this probably also helps with any latency in the UPS switching in.  If
your PC power supply is strained to the point of breaking then any
transients in the input supply are going to get through to the output
rails.  This is one of those areas where spending an extra $30 on your
build can make a significant difference.

-- 
Rich


* Re: [gentoo-user] System reboot
  2018-12-16 12:18 ` Rich Freeman
@ 2018-12-16 15:17   ` Dale
  0 siblings, 0 replies; 6+ messages in thread
From: Dale @ 2018-12-16 15:17 UTC (permalink / raw
  To: gentoo-user

Rich Freeman wrote:
> On Sat, Dec 15, 2018 at 10:54 PM Dale <rdalek1967@gmail.com> wrote:
>> I checked the messages log.  Before with the memory hogging Dolphin it
>> had logged the problem.  Today, it shows this:
>>
>>
>> Dec 15 20:40:01 fireball CROND[30668]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>> Dec 15 20:50:01 fireball CROND[1532]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>> Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>> Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts
>> /etc/cron.hourly)
>> Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up;
>> version='3.17.2'
>> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: *** info
>> [daemon/startup.c(136)]:
>> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: Started gpm successfully.
>> Entered daemon mode.
>>
>>
>> As you can see, it went from running a normal cron job to me booting
>> back up.  I don't see any error at all.  Not even one electron.
> This is pretty typical if you aren't taking special steps to log this
> sort of thing.  There are a couple of ways the kernel can crash:
>
> 1.  OOPS/BUG - these are semi-recoverable errors.  I believe they can
> get logged unless they occur in a manner that disrupt your userspace
> logger, vfs, filesystem, or disk.  If the error happens in one of
> those subsystems then your filesystems will stop syncing and it won't
> be logged normally.
>
> 2.  PANIC - these are unrecoverable and are NOT logged.  When the
> kernel PANICs it halts all disk IO and just about everything else.
> This is to prevent damage to anything already written on disk.  You
> don't want a corrupt OS trying to write to your disk - that makes a
> bad situation MUCH worse.  It would be like sending a drunk surgeon
> into the operating room to fix up a trauma patient.
>
> 3.  Hardware reset.  This isn't a kernel issue but I'll toss it in.
> If your CPU gets a reset signal it forgets it was ever running linux
> and starts executing the firmware as if it had been freshly powered
> on.  There is no opportunity to capture anything.  Only way to log
> something like this is hardware-level monitoring.
>
> Issues #1-2 CAN be logged, but not conventionally.  There are few
> routes for this:
>
> 1.  Remote console logging.  Serial and network are the two main
> options for this.  If you have a hardware serial port you can capture
> its output and any kernel errors will be output to these (just the
> text/backtrace/etc).  A network console is very easy to set up if you
> have a remote host that can run netcat on the same LAN:
> https://www.kernel.org/doc/Documentation/networking/netconsole.txt
>
> 2.  Recovery kernel.  Gentoo doesn't have tooling for this but you can
> follow https://wiki.gentoo.org/wiki/Kernel_Crash_Dumps .  Disclaimer -
> I haven't done this in ages so it could be dated in parts.  If the
> kernel panics then it will run the recovery kernel, which boots in a
> clean state and dumps the old kernel's RAM to disk for subsequent
> analysis.
>
> #1 gets the job done most of the time, but #2 is more thorough.  If
> you have a host that is tending to reset you should consider network
> logging as a starting point - it is easy to set up.
>
> I'm not sure why your UPS display is coming on.  It might be some kind
> of spurious data on the USB port if it is connected.  It might be a
> result of something the PC is doing.  It is also possible it is due to
> a brownout or other power issues going into your house, but if your
> UPS is in good shape and not overloaded then it should be shielding
> your PC from the effects of these.  A PC power supply issue sounds
> plausible.  I've had my CP UPS flicker its display and a light might
> flicker a bit at the same time, but the PC was unaffected.  I'll also
> note that these kinds of transient issues are often mitigated by
> having a good quality PC power supply that is not overloaded, and that
> this probably also helps with any latency in the UPS switching in.  If
> your PC power supply is strained to the point of breaking then any
> transients in the input supply are going to get through to the output
> rails.  This is one of those areas where spending an extra $30 on your
> build can make a significant difference.
>


I've seen kernel panics in the past.  Keep in mind, different panics can
behave differently, but in the past I got a console type screen with
some weird error messages.  Those are what I usually see.  This time
tho, it was as if the power button was pushed and held down.  The system
didn't reboot, it powered off.  I was asleep but it did beep, which is
what woke me up.  Generally in the past when I've seen something like
this, it either goes to the console and sits there until I hit reset or
just reboots.  This is the first time I've seen my system power off like
this.  This is what has me curious.

My BIOS is set to remain off after a power failure.  A power glitch or
even a short power outage shouldn't reach the computer because of the
UPS.  However, if power does fail and it shuts off, it is set to stay
off.  This is what makes me think power supply.  It's not that old but
that doesn't rule it out either.  I've read about units that were bad
out of the box before.  Thing is, that is what it sort of acts like.

My power supply is a 650 watt unit.  It can power over twice what I
pull.  When I built this rig, it could power up to three times what I
pulled.  Keep in mind, I'm measuring not only the puter but also the
monitor, speakers, modem and router as well.  That wattage is from the
UPS itself.  I try to allow for a lot of head room power wise to
compensate for that turn on surge when several drives and fans are
spinning up.  I've got five hard drives, three 230mm fans and a 140mm
fan just for the case.  Then comes the CPU etc.  I haven't calculated
the surge or anything but I figure it is a good bit more than what it
pulls when already running.
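
The "over twice" figure can be checked with one division.  A tiny
sketch, using the 650 watt rating from this message and the roughly 250
watt peak reported by the UPS earlier in the thread.  The function name
headroom_ratio is made up for the example.  The result covers steady
load only and says nothing about the turn on surge.

```python
def headroom_ratio(psu_rated_watts, measured_load_watts):
    """Return how many times the measured load fits into the PSU rating."""
    if measured_load_watts <= 0:
        raise ValueError("measured load must be positive")
    return psu_rated_watts / measured_load_watts


# 650 W rating, about 250 W peak load as read from the UPS.
print(f"{headroom_ratio(650, 250):.1f}x")  # prints 2.6x
```

Since the 250 watt reading also includes the monitor, speakers, modem
and router, the computer alone draws less than that, so the real margin
on the power supply should be somewhat larger than 2.6.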

It may be that this has to happen a few times to see if anything can be
narrowed down.  Maybe it will do it while I'm sitting at it next time
and I can see from start to finish what it is doing.  May help, may
not.  One reason for the thread, tips on what to look for.  A good tip
could come in handy.  ;-)  Plus, I thought there may be another log I
wasn't aware of to look at. 

Thanks.  Gives me things to think on. 

Dale

:-)  :-) 





* [gentoo-user] Re: System reboot
  2018-12-16  3:54 [gentoo-user] System reboot Dale
  2018-12-16 12:18 ` Rich Freeman
@ 2018-12-16 15:45 ` Nikos Chantziaras
  2018-12-16 17:54   ` Jack
  1 sibling, 1 reply; 6+ messages in thread
From: Nikos Chantziaras @ 2018-12-16 15:45 UTC (permalink / raw
  To: gentoo-user

On 16/12/2018 05:54, Dale wrote:
> Just a bit ago I had my system power off.  On one hand, I think it is
> a hardware issue with the computer.  On the other hand, I noticed
> something odd.  The display was on on my UPS when my system went
> down.  It usually only comes on when I push the button or the power
> fails.
To me it sounds like you got a power drop, but not a complete power
loss, which triggered the UPS and told your PC to shut down.

I could be wrong.




* Re: [gentoo-user] Re: System reboot
  2018-12-16 15:45 ` [gentoo-user] " Nikos Chantziaras
@ 2018-12-16 17:54   ` Jack
  2018-12-16 18:28     ` Dale
  0 siblings, 1 reply; 6+ messages in thread
From: Jack @ 2018-12-16 17:54 UTC (permalink / raw
  To: gentoo-user

On 2018.12.16 10:45, Nikos Chantziaras wrote:
> On 16/12/2018 05:54, Dale wrote:
>> Just a bit ago I had my system power off.  On one hand, I think it  
>> is a hardware issue with the computer.  On the other hand, I noticed  
>> something odd.  The display was on on my UPS when my system went  
>> down.  It usually only comes on when I push the button or the power  
>> fails.
> To me it sounds you got a power drop, but not a complete power loss,  
> which triggered the UPS and told your PC to shut down.
> 
> I could be wrong.
If the PC did a controlled shutdown based on a signal from the UPS, I
think it would be properly logged, not an immediate off with no trace.


* Re: [gentoo-user] Re: System reboot
  2018-12-16 17:54   ` Jack
@ 2018-12-16 18:28     ` Dale
  0 siblings, 0 replies; 6+ messages in thread
From: Dale @ 2018-12-16 18:28 UTC (permalink / raw
  To: gentoo-user

Jack wrote:
> On 2018.12.16 10:45, Nikos Chantziaras wrote:
>> On 16/12/2018 05:54, Dale wrote:
>>> Just a bit ago I had my system power off.  On one hand, I think it
>>> is a hardware issue with the computer.  On the other hand, I noticed
>>> something odd.  The display was on on my UPS when my system went
>>> down.  It usually only comes on when I push the button or the power
>>> fails.
>> To me it sounds you got a power drop, but not a complete power loss,
>> which triggered the UPS and told your PC to shut down.
>>
>> I could be wrong.
> If the PC did a controlled shutdown based on signal from the UPS, I
> think it would be properly logged, not an immediate off with no trace.
>

That would be my thinking.  Still, one never knows.  I just find it odd
that my puter powered off and stayed off and the display was lit up on
the UPS.  Maybe one has nothing to do with the other or maybe they do. 
It's weird tho.  It makes me go hmmmmm. 

I do know that when the power blinks and the UPS has to kick in, it does
log it to the messages log file.  It also logs when power returns even
if it is only a second or so apart.  It logs brownouts and over-voltage
conditions as well.  Basically, if the UPS has to kick in for some
reason, it logs it.
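
Besides those log entries, the UPS state can be polled directly with
upsc, the same tool mentioned at the start of the thread.  A minimal
sketch, assuming NUT's upsc is on the PATH and prints "key: value"
lines like the battery.charge: 100 line quoted earlier.  The UPS name
"cyberpower@localhost" in the usage note and the function names are
hypothetical.  The "OB" and "OL" flags are NUT's status codes for on
battery and on line power.

```python
import subprocess


def parse_upsc(text):
    """Turn upsc's 'key: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a key/value line
            values[key.strip()] = value.strip()
    return values


def on_battery(values):
    """True if ups.status carries the 'OB' (on battery) flag."""
    # ups.status is a space separated list of flags, e.g. "OL" or "OB LB".
    return "OB" in values.get("ups.status", "").split()


def read_ups(name):
    """Run upsc for the given UPS name and return the parsed values."""
    result = subprocess.run(["upsc", name], capture_output=True,
                            text=True, check=True)
    return parse_upsc(result.stdout)
```

Something like on_battery(read_ups("cyberpower@localhost")) could be run
from cron and the result written to a separate file, so there is a
second record of what the UPS reported around the time of a power off.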

I hope it never does this again because I don't want to risk losing any
data, even tho I have a good backup.  That said, I do wish I could
figure out what happened if it does it again.  Could the new CPU I just
put in have an issue?  Could a hard drive be doing something weird with
the power, like shorting out for a second or so?  Mobo going out?  Power
supply going out?  Lots of options here.  When I do my next upgrade, I'm
going to reseat and blow out all the power connectors.

Dale

:-)  :-) 

