public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] Kernel crash - howto find out what happened?
From: Alexander Puchmayr @ 2008-10-12  9:08 UTC
  To: gentoo-user

Hi there!

My Gentoo system (an amd64@4400+, 2GB RAM, nforce4 chipset) worked fine for
nearly two years, but now it frequently freezes, sometimes (not always) with
the Scroll Lock and Caps Lock LEDs blinking.

Since I'm using the box as a desktop, I have only a frozen X server and no
way to switch to a console (maybe there's some hint there about what
happened?).

How do I find out what happened and why it crashed? Modern systems have
MCE logs, but how do I read them in this case? After a reboot, all
information seems to be gone, since mcelog is always empty.

I assume there's a hardware problem. I already tested the RAM with
memtest86, but it found no errors.

Thanks for any suggestions
    Alex




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Erik Hahn @ 2008-10-12  9:16 UTC
  To: gentoo-user

On Sun, Oct 12, 2008 at 11:08:57AM +0200, Alexander Puchmayr wrote:
> Since I'm using the box as a desktop, I have only a frozen X server and no
> way to switch to a console (maybe there's some hint there about what
> happened?).
>
> How do I find out what happened and why it crashed? Modern systems have
> MCE logs, but how do I read them in this case? After a reboot, all
> information seems to be gone, since mcelog is always empty.
>     Alex

If it's a kernel panic you actually get debugging information on the
console. It's just hidden "behind" the X server. Maybe you can reproduce
the problem working without X (If you can do your work purely from the
VTs)

Do you use any proprietary drivers?

	-Erik

-- 
hackerkey://v4sw5hw2ln3pr5ck0ma2u7LwXm4l7Gi2e2t4b7Ken4/7a16s0r1p-5.62/-6.56g5OR




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Alexander Puchmayr @ 2008-10-12 11:12 UTC
  To: gentoo-user

On Sunday, 12 October 2008, Erik Hahn wrote:
> On Sun, Oct 12, 2008 at 11:08:57AM +0200, Alexander Puchmayr wrote:
> > Since I'm using the box as a desktop, I have only a frozen X server and
> > no way to switch to a console (maybe there's some hint there about what
> > happened?).
> >
> > How do I find out what happened and why it crashed? Modern systems have
> > MCE logs, but how do I read them in this case? After a reboot, all
> > information seems to be gone, since mcelog is always empty.
> >     Alex
>
> If it's a kernel panic you actually get debugging information on the
> console. It's just hidden "behind" the X server. Maybe you can reproduce
> the problem working without X (If you can do your work purely from the
> VTs)

I've tried, but unfortunately the X driver on my laptop (i965) also seems
to have stability problems: after about an hour it froze using 100% CPU
time and could not be killed (neither kill nor kill -9 worked). I guess it
didn't wake up from DPMS :-(

>
> Do you use any proprietary drivers?
>
On the desktop I have an nvidia card with the proprietary driver.

	Alex






* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Alan McKinnon @ 2008-10-12 11:35 UTC
  To: gentoo-user

On Sunday 12 October 2008 13:12:20 Alexander Puchmayr wrote:
> > If it's a kernel panic you actually get debugging information on the
> > console. It's just hidden "behind" the X server. Maybe you can reproduce
> > the problem working without X (If you can do your work purely from the
> > VTs)
>
> I've tried, but unfortunately the X driver on my laptop (i965) also seems
> to have stability problems: after about an hour it froze using 100% CPU
> time and could not be killed (neither kill nor kill -9 worked). I guess it
> didn't wake up from DPMS :-(

Here's a thought: if you have a spare machine, you could ssh into your
desktop and continue to work normally. The ssh session would be tailing an
appropriate log, so even if the desktop goes south there's a good chance the
error log is visible.

For something more persistent, you could try temporarily sending all logs to a
remote log server. Remote logging is quite efficient; I usually find the only
thing that gets in its way is a complete, instant kernel halt that brings the
whole machine down without warning. This is extremely rare on production
kernels.

-- 
alan dot mckinnon at gmail dot com




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Alexander Puchmayr @ 2008-10-12 13:00 UTC
  To: gentoo-user

On Sunday, 12 October 2008, Alan McKinnon wrote:
> On Sunday 12 October 2008 13:12:20 Alexander Puchmayr wrote:
> > > If it's a kernel panic you actually get debugging information on the
> > > console. It's just hidden "behind" the X server. Maybe you can
> > > reproduce the problem working without X (If you can do your work
> > > purely from the VTs)
> >
> > I've tried, but unfortunately the X driver on my laptop (i965) also
> > seems to have stability problems: after about an hour it froze using
> > 100% CPU time and could not be killed (neither kill nor kill -9 worked).
> > I guess it didn't wake up from DPMS :-(
>
> Here's a thought: if you have a spare machine, you could ssh into your
> desktop and continue to work normally. The ssh session would be tailing
> an appropriate log, so even if the desktop goes south there's a good
> chance the error log is visible.
>
> For something more persistent, you could try temporarily sending all logs
> to a remote log server. Remote logging is quite efficient; I usually find
> the only thing that gets in its way is a complete, instant kernel halt
> that brings the whole machine down without warning. This is extremely
> rare on production kernels.

I really doubt that this works: the logger does not get the chance to write
anything once the kernel has crashed, neither to a local disk nor to a
remote host. It seems to be what you called the "instant kernel halt", and
I have the luck to be dealing with one of these rare cases :-(

But to give it a chance, I'm running a "cat /proc/kmsg" on the desktop, 
started via ssh as you suggested.
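
If the ssh session runs on the spare machine, it may help to write each
kernel line to disk there as soon as it arrives, so that the last messages
before a freeze survive. A minimal sketch in Python, assuming the output of
"ssh desktop cat /proc/kmsg" is piped into it; the host name "desktop", the
function name persist_lines and the file name are made up for illustration:

```python
import os


def persist_lines(stream, path):
    """Append every line from `stream` to `path` and force it to disk.

    Flushing and fsync-ing after each line is slow, but it means the
    log file on the spare machine already holds the last line that was
    received when the connection dies.
    Returns the number of lines written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for line in stream:
            out.write(line)
            out.flush()              # push Python's buffer to the OS
            os.fsync(out.fileno())   # ask the OS to write it to disk
            count += 1
    return count
```

In a script at the end of the pipe this would be called as
persist_lines(sys.stdin, "kmsg.log"). Reading /proc/kmsg normally needs
root on the desktop, and this only captures what the kernel still manages
to send before it stops.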

	Alex






* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Duane Griffin @ 2008-10-13 15:09 UTC
  To: gentoo-user

2008/10/12 Alexander Puchmayr <alexander.puchmayr@linznet.at>:
> My Gentoo system (an amd64@4400+, 2GB RAM, nforce4 chipset)
> worked fine for nearly two years, but now it frequently freezes, sometimes
> (not always) with the Scroll Lock and Caps Lock LEDs blinking.

If you have another machine lying around, try setting up netconsole
and/or serial console logging. They should catch any dying messages
from your kernel. Blinking LEDs indicate a panic, which means you
should get a message in those cases, at least.

Using a serial console is the easiest and most reliable way, but
requires a serial cable. Netconsole just uses Ethernet but isn't as
reliable. Take a look at Documentation/serial-console.txt and
Documentation/networking/netconsole.txt under your kernel source
directory for more info.
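
As a rough sketch of the netconsole side (check netconsole.txt for your
kernel version before relying on it): the parameter has the form
[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], and the
receiving machine only has to listen for UDP packets. The Python below
builds such a string and prints whatever arrives. The function names are
made up for illustration, and 6665/6666 are the default ports as far as I
know:

```python
import socket


def netconsole_param(src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac=""):
    """Build the value for the netconsole= module/boot parameter.

    Layout: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    An empty tgt_mac leaves the MAC field blank.
    """
    return f"{src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def listen(port=6666, bind_addr="0.0.0.0"):
    """Run on the second machine: print every UDP datagram received.

    Loops forever; stop it with Ctrl-C.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, (sender, _port) = sock.recvfrom(4096)
            text = data.decode("utf-8", errors="replace").rstrip()
            print(f"{sender}: {text}")
```

With example addresses (assumptions, adjust to your network),
netconsole_param(6665, "192.168.0.10", "eth0", 6666, "192.168.0.20") gives
"6665@192.168.0.10/eth0,6666@192.168.0.20/", which would be passed on the
crashing box as "modprobe netconsole netconsole=<that string>" while
listen() runs on the other machine.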

Cheers,
Duane.

-- 
"I never could learn to drink that blood and call it wine" - Bob Dylan




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Daniel da Veiga @ 2008-10-13 23:30 UTC
  To: gentoo-user

On Sun, Oct 12, 2008 at 07:08, Alexander Puchmayr
<alexander.puchmayr@linznet.at> wrote:
> Hi there!
>
> My Gentoo system (an amd64@4400+, 2GB RAM, nforce4 chipset) worked fine for
> nearly two years, but now it frequently freezes, sometimes (not always) with
> the Scroll Lock and Caps Lock LEDs blinking.
>
> Since I'm using the box as a desktop, I have only a frozen X server and no
> way to switch to a console (maybe there's some hint there about what
> happened?).
>
> How do I find out what happened and why it crashed? Modern systems have
> MCE logs, but how do I read them in this case? After a reboot, all
> information seems to be gone, since mcelog is always empty.
>
> I assume there's a hardware problem. I already tested the RAM with
> memtest86, but it found no errors.
>

I had one of these freezes today.
I simply killed X using ALT+SysRq+K and got back a console with error messages.

Have you tried the SysRq keys?

-- 
Daniel da Veiga




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Alexander Puchmayr @ 2008-10-14 16:31 UTC
  To: gentoo-user

On Tuesday, 14 October 2008, Daniel da Veiga wrote:
> On Sun, Oct 12, 2008 at 07:08, Alexander Puchmayr
>
> <alexander.puchmayr@linznet.at> wrote:
> > Hi there!
> >
> > My Gentoo system (an amd64@4400+, 2GB RAM, nforce4 chipset) worked fine
> > for nearly two years, but now it frequently freezes, sometimes (not
> > always) with the Scroll Lock and Caps Lock LEDs blinking.
> >
> > Since I'm using the box as a desktop, I have only a frozen X server and
> > no way to switch to a console (maybe there's some hint there about what
> > happened?).
> >
> > How do I find out what happened and why it crashed? Modern systems have
> > MCE logs, but how do I read them in this case? After a reboot, all
> > information seems to be gone, since mcelog is always empty.
> >
> > I assume there's a hardware problem. I already tested the RAM with
> > memtest86, but it found no errors.
>
> I had one of these freezes today.
> I simply killed X using ALT+SysRq+K and got back a console with error
> messages.
>
> Have you tried the SysRq keys?

How does this work? I've tried it, but I couldn't get it working at all.
AFAIK, the first step is to compile CONFIG_MAGIC_SYSRQ into the kernel.
Then, make sure there's a "1" in /proc/sys/kernel/sysrq; there is.
/usr/src/linux/Documentation/sysrq.txt says to press "ALT-SysRq-<command
key>". I've tried it with SysRq=PrintScreen and cmd='h' for help, but
nothing happens, even under normal conditions. What did I do wrong?

Alex




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Alex Schuster @ 2008-10-14 17:09 UTC
  To: gentoo-user

Alexander Puchmayr writes:

> On Tuesday, 14 October 2008, Daniel da Veiga wrote:
> >
> > I had one of these freezes today.
> > I simply killed X using ALT+SysRq+K and got back a console with error
> > messages.
> >
> > Have you tried the SysRq keys?
>
> How does this work? I've tried it, but I couldn't get it working at all.
> AFAIK, the first step is to compile CONFIG_MAGIC_SYSRQ into the kernel.
> Then, make sure there's a "1" in /proc/sys/kernel/sysrq; there is.
> /usr/src/linux/Documentation/sysrq.txt says to press "ALT-SysRq-<command
> key>". I've tried it with SysRq=PrintScreen and cmd='h' for help, but
> nothing happens, even under normal conditions. What did I do wrong?

Try a key other than 'h'. The space key will show a little help, probably
what you expected to see with 'h'. Oh, and you need to be on a text
console (ctrl-alt-f1) to get visible output.
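
To double-check the setting itself: the value in /proc/sys/kernel/sysrq is
not just on/off. As far as I remember from sysrq.txt, 0 disables SysRq, 1
enables all functions, and larger values are a bitmask of allowed
functions. A small Python sketch that decodes it (the bit descriptions are
paraphrased from memory of that document, and the function names are made
up):

```python
# Bit values as listed in the kernel's sysrq documentation (from memory).
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Return a list of human-readable descriptions for a sysrq value."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    # Any other value is treated as a bitmask of individual functions.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

On the affected machine, describe_sysrq(read_sysrq()) shows what the
kernel currently allows. With the "1" you report, everything should be
enabled, so the setting itself is probably not what stops it from working.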

http://en.wikipedia.org/wiki/Magic_SysRq_key

	Wonko




* Re: [gentoo-user] Kernel crash - howto find out what happened?
From: Alexander Puchmayr @ 2008-10-19  9:58 UTC
  To: gentoo-user

Hi there!

As my system froze again just now, I tried to reproduce the problem using
some of the hints given to me in this thread, and made the following
observations:

* The system freezes under heavy I/O on my SATA hard disks, especially when
copying MPEG files (>2GB) from one disk to another.

* A "cat /proc/kmsg" started via ssh from another machine showed these as
its last lines:

<4>ata6: timeout waiting for ADMA IDLE, stat=0x440
<4>ata6: timeout waiting for ADMA LEGACY, stat=0x440

* SysRq does not work at all. (Why? I configured it identically to my
notebook; it works on the notebook but not on the desktop. There is simply
no reaction when pressing alt-sysrq-something, even under normal conditions.)
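
The <4> in front of the two ata6 lines above is the kernel log level (4 is
a warning). A small Python sketch for splitting such lines into level and
message, assuming the classic "<N>text" format that /proc/kmsg prints; the
names are made up for illustration:

```python
import re

# Kernel log levels 0..7, in order.
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# A single digit 0-7 in angle brackets, followed by the message text.
_PREFIX = re.compile(r"^<([0-7])>(.*)$")


def parse_kmsg_line(line):
    """Split '<N>message' into (level_name, message).

    Lines without a recognisable prefix are returned as (None, line).
    """
    line = line.rstrip("\n")
    match = _PREFIX.match(line)
    if match is None:
        return None, line
    return LEVELS[int(match.group(1))], match.group(2)
```

For the first line this returns ("warning", "ata6: timeout waiting for
ADMA IDLE, stat=0x440"), which makes it easy to filter a long capture for
warnings and worse.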

The SATA controller is an nvidia one (onboard on my nforce-based mainboard),
driven by the sata_nv driver (the one from the kernel; no proprietary nvidia
chipset/SATA driver is installed). The kernel in question is
gentoo-2.6.24-r8; I'll try upgrading to the latest stable Gentoo kernel.

Thanks to all who gave suggestions
	Alex


