[gentoo-user] Random Kernel Crashes ... Need more info

public inbox for gentoo-user@lists.gentoo.org

* [gentoo-user] Random Kernel Crashes ... Need more info
@ 2005-09-07 18:36 Kris Kerwin
  2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman
  2005-09-07 18:55 ` gentuxx
  0 siblings, 2 replies; 17+ messages in thread
From: Kris Kerwin @ 2005-09-07 18:36 UTC (permalink / raw
  To: gentoo-user

Hey all,

I've been experiencing some random kernel crashes, and need a way of finding 
out what happened.

I can't find any information in /var/log/lastlog or 
in /var/log/messages.*.bz2. 

Is there any way that I can monitor kernel messages during a crash and recover 
this information on the next boot?

Thanks in advance.

Kris Kerwin

PS: Please CC me in your response.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-07 18:36 [gentoo-user] Random Kernel Crashes ... Need more info Kris Kerwin
@ 2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman
  2005-09-07 19:09   ` Arturo 'Buanzo' Busleiman
  2005-09-07 18:55 ` gentuxx
  1 sibling, 1 reply; 17+ messages in thread
From: Arturo 'Buanzo' Busleiman @ 2005-09-07 18:53 UTC (permalink / raw
  To: gentoo-user

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kris Kerwin wrote:
> Is there any way that I can monitor kernel messages during a crash and recover 
> this information on the next boot?

You could snapshot dmesg's output at regular intervals, à la crontab.

- --
Arturo "Buanzo" Busleiman - www.buanzo.com.ar
Consultor en Seguridad Informatica
KTP Consultores - info AT ktpconsultores.com.ar
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDHzceAlpOsGhXcE0RAkQiAJ9DsaMLMkb57V1pe1auCGI+SLGBfACfdR29
61J2yLxkK4mbCi8bZPEMmok=
=6InN
-----END PGP SIGNATURE-----
-- 
gentoo-user@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman
@ 2005-09-07 19:09   ` Arturo 'Buanzo' Busleiman
  0 siblings, 0 replies; 17+ messages in thread
From: Arturo 'Buanzo' Busleiman @ 2005-09-07 19:09 UTC (permalink / raw
  To: gentoo-user

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Arturo 'Buanzo' Busleiman wrote:
> You may snapshot dmesg's output in a timely manner, ala crontab.

Additionally, you may wish to play with the log_buf_len kernel parameter:

log_buf_len=n   Sets the size of the printk ring buffer, in bytes.
                        Format is n, nk, nM.  n must be a power of two.  The
                        default is set in kernel config.
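
As far as I know, this parameter goes on the kernel command line in the
bootloader config. Since n must be a power of two, here is a minimal shell
sketch that checks a candidate size before you use it. The helper name
is_power_of_two and the 128k size are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: validate a printk buffer size before adding it to the kernel
# command line. is_power_of_two is a hypothetical helper, not a real tool.
is_power_of_two() {
    n=$1
    # Reject zero, negative and non-numeric input.
    [ "$n" -gt 0 ] 2>/dev/null || return 1
    # Halve the value while it is even; a power of two ends at exactly 1.
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

size=131072   # bytes (128k); example value only
if is_power_of_two "$size"; then
    echo "log_buf_len=$size"   # prints: log_buf_len=131072
else
    echo "refusing: $size is not a power of two" >&2
fi
```

The printed string is what you would append to the kernel line of your
bootloader entry; the exact file and entry depend on your setup.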


- --
Arturo "Buanzo" Busleiman - www.buanzo.com.ar
Consultor en Seguridad Informatica
KTP Consultores - info AT ktpconsultores.com.ar
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDHzreAlpOsGhXcE0RApV0AJ4nJs869Ichp2EOBhZ/FGCGsbi32wCfbqqC
CRLg1gGzxLmj6Xa3kya56Gc=
=SDxa
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-07 18:36 [gentoo-user] Random Kernel Crashes ... Need more info Kris Kerwin
  2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman
@ 2005-09-07 18:55 ` gentuxx
  2005-09-17 18:42   ` Kris Kerwin
  1 sibling, 1 reply; 17+ messages in thread
From: gentuxx @ 2005-09-07 18:55 UTC (permalink / raw
  To: gentoo-user

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kris Kerwin wrote:

>Hey all,
>
>I've been experiencing some random kernel crashes, and need a way of
finding
>out what happened.
>
>I can't find any information in /var/log/lastlog or
>in /var/log/messages.*.bz2.
>
>Is there any way that I can monitor kernel messages during a crash and
recover
>this information on the next boot?
>
>Thanks in advance.
>
>Kris Kerwin
>
>PS: Please CC me in your response.

The /var/log/dmesg log contains more specific kernel messages.  You
can also get the messages by running `dmesg` (which basically `cat`s that
file).  If the kernel crashes, messages from previous boots should be
stored there.  Also, if you're running a custom kernel, you may want to
turn on the "kernel debugging" option.  (I haven't used it, but I
remember seeing it the last time I compiled my kernel.)

HTH.

- --
gentux
echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge'

gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40  9795 2D81 924A
6996 0993
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDHzewLYGSSmmWCZMRAhXAAKCUTIBHs3S89XKfxBHpWEpjsr4fdQCgybvw
YJ0oXp8+mZkHbg9GNOu6px4=
=DEsO
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-07 18:55 ` gentuxx
@ 2005-09-17 18:42   ` Kris Kerwin
  2005-09-17 19:01     ` Dave Nebinger
  2005-09-17 19:10     ` Jonathan Wright
  0 siblings, 2 replies; 17+ messages in thread
From: Kris Kerwin @ 2005-09-17 18:42 UTC (permalink / raw
  To: gentoo-user; +Cc: gentuxx, Arturo 'Buanzo' Busleiman

All,

I apologize for getting back so late. It's tough being a college student. ;-)

Thanks to Arturo and gentux for helping out so far.

I've tried catting the output from dmesg and running it regularly with 
crontab, as was advised below. This, unfortunately, doesn't work because cron 
can run at most once a minute. This means that if a crash happens in 
between these dmesg snapshots, the debugging information is lost. The only 
way that catting dmesg to a file will work is if the crash just so happens to 
occur right as dmesg is being logged. I might be able to increase my chances 
if there were any way to set up vixie-cron to run more often than once a minute 
(once a second? more?)

Also, it seems that the kernel's ring buffer in /var/log/dmesg gets cleared 
with every boot, so I can't check it after a crash. Is there some other place 
that old kernel logs get stored? Maybe I have a problem in my syslog-ng 
setup? I don't see anything out of the ordinary 
in /etc/syslog-ng/syslog-ng.conf. One thing that I am going to try is instead 
of having messages sent to tty12, I'm logging them to a file. We'll see if 
this doesn't solve the problem.

I've also added the "kernel debugging" option to my kernel, but have no idea 
how to get at this kernel debugging info. Can someone please point me to a 
good manpage?

As to the log_buf_len=n option, how do I set it? Is it added to the kernel 
command line?

Thanks again, all, for your timely help. As always, please be sure to CC me in 
your response.

Kris Kerwin
kkerwin@insightbb.com

------ Original Email -----

Hey all,

I've been experiencing some random kernel crashes, and need a way of finding 
out what happened.

I can't find any information in /var/log/lastlog or 
in /var/log/messages.*.bz2. 

Is there any way that I can monitor kernel messages during a crash and recover 
this information on the next boot?

Thanks in advance.

Kris Kerwin

PS: Please CC me in your response.

---------------------

On Wednesday 07 September 2005 14:09, Arturo 'Buanzo' Busleiman wrote:
> Arturo 'Buanzo' Busleiman wrote:
> > You may snapshot dmesg's output in a timely manner, ala crontab.
>
> Additionally, you my wish to play with the log_buf_len kernel parameter:
>
> log_buf_len=n   Sets the size of the printk ring buffer, in bytes.
>                         Format is n, nk, nM.  n must be a power of two. 
> The default is set in kernel config.
>
>
> --
> Arturo "Buanzo" Busleiman - www.buanzo.com.ar
> Consultor en Seguridad Informatica
> KTP Consultores - info AT ktpconsultores.com.ar

-----------------------

On Wednesday 07 September 2005 13:55, gentuxx wrote:
> Kris Kerwin wrote:
> >Hey all,
> >
> >I've been experiencing some random kernel crashes, and need a way of
>
> finding
>
> >out what happened.
> >
> >I can't find any information in /var/log/lastlog or
> >in /var/log/messages.*.bz2.
> >
> >Is there any way that I can monitor kernel messages during a crash and
>
> recover
>
> >this information on the next boot?
> >
> >Thanks in advance.
> >
> >Kris Kerwin
> >
> >PS: Please CC me in your response.
>
> The /var/log/dmesg log contains more specific kernel messages.  You
> can also get the messages by running `dmesg` (basically `cat`'s that
> file).  If the kernel crashes, messages from previous boots should be
> store there.  Also, if you're running a custom kernel, you may want to
> turn on the "kernel debugging" option on.  (I haven't used that, but I
> remember seeing the last time I compiled my kernel.)
>
> HTH.
>
> --
> gentux
> echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge'
>
> gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40  9795 2D81 924A
> 6996 0993

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 18:42   ` Kris Kerwin
@ 2005-09-17 19:01     ` Dave Nebinger
  2005-09-17 19:29       ` Kris Kerwin
  2005-09-17 19:10     ` Jonathan Wright
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Nebinger @ 2005-09-17 19:01 UTC (permalink / raw
  To: gentoo-user; +Cc: kkerwin

> I've been experiencing some random kernel crashes, and need a way of 
> finding
> out what happened.

Kris, I'd start by answering the following:

1. What version of the kernel are you using?  Your OP is quite old, and many 
releases of the kernel have come out since then.  Have you tried a newer 
kernel?  Do the crashes keep happening regardless of the kernel version?

2. If the kernel version doesn't matter, then that would most likely indicate 
a hardware failure of some kind.  It could be as simple as a flakey 
memory module, or something more extreme such as a motherboard and/or chipset 
issue, some device flaking out, etc.

3. Have you looked at crashes due to heat?  Is your box clean, and does it have 
proper airflow?

4. Are you running any esoteric or rare hardware components in the box?

5. Have you ensured that your kernel config matches the hardware?  In some 
cases the selection of drivers is not as simple as selecting a card vendor; 
you sometimes need to go beyond that and know exactly what the device has 
installed.

6. "random kernel crashes" really doesn't provide a lot of info.  How 
frequently does it occur?  Every other month or every 3 minutes?  What 
happens to the box, a total lockup, a powerdown, etc.?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:01     ` Dave Nebinger
@ 2005-09-17 19:29       ` Kris Kerwin
  2005-09-17 19:52         ` Dave Nebinger
  0 siblings, 1 reply; 17+ messages in thread
From: Kris Kerwin @ 2005-09-17 19:29 UTC (permalink / raw
  To: Dave Nebinger; +Cc: gentoo-user

Thanks Dave.

1) The problem appears to be independent of the kernel version, as I've had it 
occur on both a 2.6.10 and a 2.6.12 kernel.

2) How might I check for flakey hardware?

3) I have had my BIOS respond after 3 crashes that the computer crashed due to 
excessive heat. I think that this may be independent of the problem as well, 
because I haven't had this BIOS message in conjunction with a crash for 
several months. I've also had a crash occur when I flipped my laptop 
upside-down and placed an ice pack over the portion that produced the most 
heat (don't worry, I placed a plastic baggie over the computer hardware to 
help to protect from condensation, though I don't think that condensed 
moisture from the air would be able to conduct enough electricity to produce 
a short).

4) The only rare hardware that I have is a Broadcom wireless card, for which I 
use ndiswrapper to load a module into the kernel. The problem is independent 
of this as well, because I have had the same crash without the module 
loaded.

5) I have not yet ensured that the kernel config matches the hardware 100%, 
though I feel 90% confident in the kernel config that I've custom made for 
this box.

6) I apologize, but as a college student, I'm often away from my computer. I 
come back to find that it has crashed, but have no way of determining how long 
it has been sitting since the crash. I am now at the point 
that I only turn it on when I need it, once a day, and so it crashes only 
once a day. As for the lockup itself, it is a total lockup, without a 
powerdown.

I apologize for not providing enough information, but that is because I myself 
didn't have enough information (hence the "Need more info" in my subject). As 
you may have read from my previous posts, the purpose of my writing was not 
so much to solve the kernel crash (though that is certainly the ultimate 
goal) but rather to figure out how to recover data about this crash on a 
subsequent boot. Perhaps I should have made my subject clearer by writing 
something along the lines of "How to trace a kernel oops?". I apologize.

Once I have this information, we can go ahead and figure out why my kernel 
keeps crashing. But first, I have to figure out how to trace my kernel's oops 
message. Without that information, the above answers don't really mean much.

If you could please help me to figure out a way to log old kernel messages and 
find them on subsequent boots, that would be most appreciated.

Thanks again for your help Dave.

Kris

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:29       ` Kris Kerwin
@ 2005-09-17 19:52         ` Dave Nebinger
  2005-09-18  0:20           ` Kris Kerwin
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Nebinger @ 2005-09-17 19:52 UTC (permalink / raw
  To: Kris Kerwin; +Cc: gentoo-user

> 1) The problem appears to be independant of the kernel version, as I've 
> had it
> occur on a 2.6.10 and 2.6.12 kernel.
>
> 2) How might I check for flakey hardware?

I would guess hardware problem (unless 3 applies below), but actually 
finding the errant component can be quite a task.  For a desktop you can 
strip down to bare minimum, let it run, add a component, let it run, and 
repeat until you find one that causes the crash, although that might either 
be due to the component or interactions between components, so even that's 
not reliable.

Sounds like you have a laptop which makes that scenario harder.  Did it come 
with any diagnostic tools, ones that know how to check out the hardware 
components and look for errors?

> 3) I have had my BIOS respond after 3 crashes that the computer crashed 
> due to
> excessive heat. I think that this maybe independant of the problem as 
> well,
> because I haven't had this BIOS message in conjunction with a crash for
> several months. I've also had a crash occur when I flipped my laptop
> upside-down and placed an ice pack over the portion that produced the most
> heat

Heat can really be an issue, especially for laptops.  And the icepack 
wouldn't necessarily keep all of the components inside below the threshold 
when the crash occurs, if it is heat related.

> Once I have this information, we can go ahead and figure out why my kernel
> keeps crashing. But first, I have to figure out how to trace my kernel's 
> oops
> message. Without that information, the above answers don't really mean 
> much.
>
> If you could please help me to figure out a way to log old kernel messages 
> and
> find them on subsequent boots, that would be most appreciated.

Depending upon the fault that occurs, if it is hardware related, you might 
never get any worthwhile information out of the kernel even if you could get 
this information...  If the computer just locks up (due to heat or 
hardware), it would do so w/o giving the kernel time to log anything that 
might be of value.

I guess I would try to rule out heat as the problem first.  If your laptop 
is a newer model, you should be able to access the on-board temperature 
sensors (there's been a recent thread on that on the list, and I am by no 
means an expert on it).  Get them running via a cron task to collect info over 
time; that way you should be able to see the temp values right before a 
crash kicks in.  If they don't really change, you can probably rule heat out 
as the issue.
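
A rough sketch of what such a cron task could run, under stated
assumptions: log_temp is a hypothetical helper, and the sensor path in the
comment is only a guess at what your kernel exposes (newer kernels publish
readings under /sys/class/thermal; older ACPI setups use other paths).

```shell
#!/bin/sh
# Sketch: append one timestamped temperature reading to a log file.
# log_temp is a hypothetical helper; both arguments are paths you choose.
log_temp() {
    sensor=$1    # file that holds the current reading
    logfile=$2   # file that collects the history
    # Fail quietly if the sensor file is missing or unreadable.
    [ -r "$sensor" ] || return 1
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(cat "$sensor")" >> "$logfile"
    # Flush to disk so the last reading has a chance to survive a lockup.
    sync
}

# Example call (assumed paths, adjust to your hardware):
#   log_temp /sys/class/thermal/thermal_zone0/temp /var/log/temp.log
```

Run once a minute from cron, this gives a history whose last lines show
the readings shortly before a lockup.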

If it is a hardware problem, you're stuck with what the vendor provided. 
I'm not certain there are any diagnostic tools under linux that would do any 
of this for you.  The vendor's probably going to thumb their nose at you, as 
they gave it to you with windows on it and you're running the 'unsupported' 
os.  Perhaps there's some happy middleman out there that handles hardware 
issues on laptops with linux, but that would be a service that would cost 
you.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:52         ` Dave Nebinger
@ 2005-09-18  0:20           ` Kris Kerwin
  2005-09-19  2:09             ` Dave Nebinger
  0 siblings, 1 reply; 17+ messages in thread
From: Kris Kerwin @ 2005-09-18  0:20 UTC (permalink / raw
  To: gentoo-user, Jonathan Wright; +Cc: Dave Nebinger

Alright! Thanks to Jonathan Wright and his program, I think that I may have 
found something. See what you guys think:

Before the crash, the following three lines appeared (in this order) nearly 
53,000 times for a total of 16MB of text:

> Sep 17 13:45:51 kerwin [4314362.567000] ip_local_deliver: bad skb: PRE_ROUTING LOCAL_IN LOCAL_OUT POST_ROUTING
> Sep 17 13:45:51 kerwin [4314362.567000] skb: pf=2 (unowned) dev=lo len=60
> Sep 17 13:45:51 kerwin [4314362.567000] PROTO=6 127.0.0.1:34134 127.0.0.1:111 L=60 S=0x00 I=15872 F=0x4000 T=64

These messages occurred over the course of two hours before the crash at a 
rate of more than 20 times per second. They are the messages that appeared 
just before the crash. Apparently, my computer was trying to tell me 
something pretty important.
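
For anyone who wants to spot this kind of flood in their own logs, here
is a minimal sketch that counts how often each distinct message occurs.
It assumes the layout of the lines quoted above (timestamp, host, then an
uptime value in square brackets); summarize_log is a hypothetical helper
name.

```shell
#!/bin/sh
# Sketch: count repeated kernel messages in a syslog-style file.
# Assumes lines of the form "Mon DD HH:MM:SS host [uptime] message".
summarize_log() {
    # Drop everything up to and including the "[uptime] " field, then
    # group identical messages and list the most frequent ones first.
    sed 's/^.*\[[0-9.]*] //' "$1" | sort | uniq -c | sort -rn
}

# Example call (assumed path): summarize_log /var/log/messages | head -n 5
```

A message that turns up tens of thousands of times will sit at the top of
the output.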

Since _this_ problem (still not sure if it is THE problem that is causing the 
crashes to occur; as Dave pointed out, it could be hardware or heat as well) 
appears to be something in networking, I'm going to recompile a kernel 
without all of the complex networking stuff, but one that includes my 
ethernet card's driver.

I'll let you know how it goes, and if the problem persists.

Thanks again.

Kris


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-18  0:20           ` Kris Kerwin
@ 2005-09-19  2:09             ` Dave Nebinger
  2005-09-19 22:56               ` Kris Kerwin
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Nebinger @ 2005-09-19  2:09 UTC (permalink / raw
  To: Kris Kerwin, gentoo-user, Jonathan Wright

> Before the crash, the following three lines appeared (in this order) 
> nearly
> 53,000 times for a total of 16MB of text:
>
>> Sep 17 13:45:51 kerwin [4314362.567000] ip_local_deliver: bad skb:
> PRE_ROUTING LOCAL_IN LOCAL_OUT POST_ROUTING
>> Sep 17 13:45:51 kerwin [4314362.567000] skb: pf=2 (unowned) dev=lo len=60
>> Sep 17 13:45:51 kerwin [4314362.567000] PROTO=6 127.0.0.1:34134
> 127.0.0.1:111 L=60 S=0x00 I=15872 F=0x4000 T=64

Don't assume this is your answer, Kris.  This was a known problem on one of 
the 2.6.12 kernels (2.6.12.4, I believe, but don't hold me to it).

I had many of these in my logs also.  A patch to the networking layer was 
only partially applied and missed some components.  It was fixed by the 
2.6.13 kernel series.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-19  2:09             ` Dave Nebinger
@ 2005-09-19 22:56               ` Kris Kerwin
  2005-09-19 23:07                 ` Volker Armin Hemmann
  0 siblings, 1 reply; 17+ messages in thread
From: Kris Kerwin @ 2005-09-19 22:56 UTC (permalink / raw
  To: Dave Nebinger; +Cc: gentoo-user

Dave,

Yup. Had a feeling that you might be right about that one.

It seems that the computer will still crash, but certainly not as often. My 
guess: there is a bigger problem that is aggravated when the computer is 
under more stress; ie: tracking excessive amounts of kernel complaints, etc. 
I've also noticed difficulties with the sound system and have had the 
computer crash a number of times when playing music (could be the media 
player or the sound system itself, but I still think that the problem is 
bigger yet).

Ideas for a next step? Is there more information that I can submit to 
<hopefully> throw out the possibility of a hardware problem, or to determine 
which piece of hardware is at fault?

Thanks again for all of your help.

Kris

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-19 22:56               ` Kris Kerwin
@ 2005-09-19 23:07                 ` Volker Armin Hemmann
  2005-09-20  2:14                   ` Dave Nebinger
  0 siblings, 1 reply; 17+ messages in thread
From: Volker Armin Hemmann @ 2005-09-19 23:07 UTC (permalink / raw
  To: gentoo-user

On Tuesday 20 September 2005 00:56, Kris Kerwin wrote:
> Dave,
>
> Yup. Had a feeling that you might be right about that one.
>
> It seems that the computer will still crash, but certainly not as often. My
> guess: there is a bigger problem that is aggravated when the computer is
> under more stress; ie: tracking excessive amounts of kernel complaints,
> etc. I've also noticed difficulties with the sound system and have had the
> computer crash a number of times when playing music (could be the media
> player or the sound system itself, but I still think that the problem is
> bigger yet).
>
> Ideas for a next step? Is there more information that I can submit to
> <hopefully> throw out the possibility of a hardware problem, or to
> determine which piece of hardware is at fault?
>
>
Well, first, let memtest86(+) run for some hours.

Second, check that your box does not get too hot. Crashes under stress are mostly 
caused by overheating or a PSU going bad.

Third, try a different PSU if possible. Manufacturers like to use the cheapest 
components for what is arguably the most important part of a computer.

Fourth, check your board and cards for 'funny looking' capacitors: look for 
deformation, rounded tops, or even some brown 'dirt' at their base. 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-19 23:07                 ` Volker Armin Hemmann
@ 2005-09-20  2:14                   ` Dave Nebinger
  2005-09-20  2:26                     ` Volker Armin Hemmann
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Nebinger @ 2005-09-20  2:14 UTC (permalink / raw
  To: gentoo-user; +Cc: Kris Kerwin

> well, at first, let memtest86(+) run for some hours.

Volker's got a good point here...

> second, check that your box does not get too hot. Crashes on stress are 
> mostly
> overheating or PSU going bad.

Mentioned that to him about the heat... Kris, were you able to get 
lm_sensors running on the box?

> third, try a different PSU - the manufacturers like to use the cheapest
> components for this almost most important part of a computer, if possible 
> try
> another one.
>
> fourth, check your board and cards for 'funny looking' condensators - like
> deformation, round tops, or even some brown 'dirt' at their base.

This I think will be hard for him, Volker, as I believe he's running a 
laptop.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-20  2:14                   ` Dave Nebinger
@ 2005-09-20  2:26                     ` Volker Armin Hemmann
  0 siblings, 0 replies; 17+ messages in thread
From: Volker Armin Hemmann @ 2005-09-20  2:26 UTC (permalink / raw
  To: gentoo-user

On Tuesday 20 September 2005 04:14, Dave Nebinger wrote:

> > fourth, check your board and cards for 'funny looking' condensators -
> > like deformation, round tops, or even some brown 'dirt' at their base.
>
> This I think will be hard for him, Volker, as I believe he's running a
> laptop.

In that case, just open what can be opened and see if there are any suspicious 
capacitors. 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 18:42   ` Kris Kerwin
  2005-09-17 19:01     ` Dave Nebinger
@ 2005-09-17 19:10     ` Jonathan Wright
  2005-09-17 19:34       ` Kris Kerwin
  1 sibling, 1 reply; 17+ messages in thread
From: Jonathan Wright @ 2005-09-17 19:10 UTC (permalink / raw
  To: gentoo-user

Kris Kerwin wrote:
> I've tried catting the output from dmesg and running it regularly with 
> crontab, as was advised below. This, unfortunately doesn't work because cron 
> can only run as often as once a minute. This means that if a crash happens in 
> between these dmesg snapshots, the debugging information is lost. The only 
> way that catting dmesg to a file will work is if the crash just so happens to 
> occur right as dmesg is being logged. I might be able to increase my chances 
> if there was anyway to set up vixie-cron to run more often than once a minute 
> (once a second? more?)

Why not run a bash script, something like this (not tested or debugged! And I 
can't remember how to do a while loop in bash;)

#!/bin/bash
# Snapshot the kernel ring buffer every 5 seconds until /tmp/stopdmesg exists.
while true; do
   if [ -e /tmp/stopdmesg ]; then
     exit
   else
     # %M is minutes and %S is seconds (%m would be the month).
     dmesg > dmesg-$(date +%Y%m%d%H%M%S)
     sleep 5
   fi
done

Open up your terminal and run the script (and append & to send it to the 
background). If need be, lower the sleep interval from 5 seconds to whatever 
you need to capture the dmesg information.
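
One caveat: a loop like this writes a new file every few seconds, so the
directory fills up quickly. A minimal sketch of a cleanup step, assuming
the snapshots are named dmesg-YYYYMMDDHHMMSS so that sorting by name is
sorting by time; prune_snapshots is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: keep only the newest N dmesg snapshots in a directory.
# Assumes file names of the form dmesg-YYYYMMDDHHMMSS without spaces.
prune_snapshots() {
    dir=$1    # directory that holds the snapshots
    keep=$2   # number of newest snapshots to keep
    # Newest first by name, skip the first $keep entries, delete the rest.
    ls "$dir"/dmesg-* 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example call: prune_snapshots . 20
```

Calling it from inside the loop after each snapshot keeps only the last
few minutes of history, which is the part that matters after a crash.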

-- 
  Jonathan Wright                           ~ mail at djnauk.co.uk
                                            ~ www.djnauk.co.uk
--
  2.6.12-gentoo-r6-djnauk-b2 AMD Athlon(tm) XP 2100+
  up 1 day,  9:03,  3 users,  load average: 3.62, 2.94, 2.42
--
  "The Bible contains six admonishments to homosexuals  and  three
  hundred sixty two admonishments to heterosexuals.  That  doesn't
  mean that God doesn't love heterosexuals. It's  just  that  they
  need more supervision."

                                                    ~ Lynne Lavner

* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:10     ` Jonathan Wright
@ 2005-09-17 19:34       ` Kris Kerwin
  2005-09-17 22:24         ` Jonathan Wright
  0 siblings, 1 reply; 17+ messages in thread
From: Kris Kerwin @ 2005-09-17 19:34 UTC (permalink / raw
  To: gentoo-user; +Cc: Jonathan Wright

Thanks Jonathan.

Anyone have any thoughts on this? I'm not a bash or any other programmer, and 
was wondering if this would work. And how might I code that while loop?

Thanks again, all, for your help.

Kris

On Saturday 17 September 2005 14:10, Jonathan Wright wrote:
> Kris Kerwin wrote:
> > I've tried catting the output from dmesg and running it regularly with
> > crontab, as was advised below. This, unfortunately doesn't work because
> > cron can only run as often as once a minute. This means that if a crash
> > happens in between these dmesg snapshots, the debugging information is
> > lost. The only way that catting dmesg to a file will work is if the crash
> > just so happens to occur right as dmesg is being logged. I might be able
> > to increase my chances if there was anyway to set up vixie-cron to run
> > more often than once a minute (once a second? more?)
>
> Why not run a bash script, something like (not tested or debugged! And I
> can't remember how to do a while loop in bash;)
>
> while true; do
>    if [ -e /tmp/stopdmesg ]; then
>      exit
>    else
>      dmesg > dmesg-$(date +%Y%m%d%H%M%S)
>      sleep 5
>    fi
> done
>
> Open up your terminal and run the script (append & to send it to the
> background). If need be, lower the 5 in sleep 5 to whatever interval you
> need to catch the dmesg information.


* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:34       ` Kris Kerwin
@ 2005-09-17 22:24         ` Jonathan Wright
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Wright @ 2005-09-17 22:24 UTC (permalink / raw
  To: gentoo-user; +Cc: kkerwin

Kris Kerwin wrote:
> Thanks Jonathan.
> 
> Anyone have any thoughts on this? I'm not a bash or any other programmer, and 
> was wondering if this would work. And how might I code that while loop?

Actually, the while loop was fine. It was originally while (1); I wrote that 
line, thought "I can't do that, I'll have to look it up before I send it 
out", and after fixing it I forgot to delete the remark.

>>while true; do
>>   if [ -e /tmp/stopdmesg ]; then
>>     exit;
>>   else
>>     dmesg > dmesg-$(date +%Y%m%d%H%M%S)
>>     sleep 5
>>   fi
>>done

In theory, the following code should do it:

--cut-------------------
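#!/bin/bash

# The sleep time in seconds is the first argument.
if [ -z "$1" ]; then
   echo "sleep time not given" >&2
   exit 1
fi

# Snapshot dmesg until /tmp/stopdmesg exists.
while true; do
    if [ -e /tmp/stopdmesg ]; then
      exit
    else
      dmesg > "/tmp/dmesg-$(date +%Y%m%d%H%M%S)"
      echo -n "."
      sleep "$1"
    fi
done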
--cut-------------------

You can then run the program (say it's in a file called dcat, made 
executable with chmod +x dcat):

$ ./dcat 5

which writes the dmesg contents to /tmp/dmesg-(time) (e.g. 
/tmp/dmesg-20050917231913) and then sleeps for 5 seconds before taking the 
next snapshot.

For each output, you'll see a period on screen, e.g.

$ ./dcat 5
..................................

That lets you track progress, but you can delete the 'echo -n "."' line if 
you want to stop that. Finally, to stop it, you can either kill the process 
or create an empty file called stopdmesg in /tmp:

$ touch /tmp/stopdmesg

which will terminate the loop and the program. Remember to delete 
/tmp/stopdmesg before the next run, or the script will exit immediately.
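
One caveat with /tmp: on systems where /tmp is cleared at boot or kept in
RAM, the snapshots would not survive the crash they are meant to record. A
possible variation, sketched under that assumption, writes to a directory
that persists across reboots, calls sync after each snapshot, and prunes old
files so the disk does not fill up. The names take_snapshot, prune_snapshots,
and DMESG_CMD are illustrative and not part of the script above:

```shell
#!/bin/bash
# Sketch only: take_snapshot, prune_snapshots and DMESG_CMD are
# illustrative names, not part of the dcat script.

# Write one snapshot into directory $1 and flush it to disk.
take_snapshot() {
    local snapshot_dir=$1
    local out="$snapshot_dir/dmesg-$(date +%Y%m%d%H%M%S)"
    # DMESG_CMD defaults to the real dmesg; unquoted so it may carry arguments.
    ${DMESG_CMD:-dmesg} > "$out"
    sync   # push buffered data to disk before a possible crash
}

# Keep only the newest $2 snapshots in directory $1.
# The timestamped names sort chronologically, so a reverse sort
# lists the newest first and tail selects everything older.
prune_snapshots() {
    local snapshot_dir=$1 keep=$2
    ls -1 "$snapshot_dir"/dmesg-* 2>/dev/null | sort -r |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

Inside the while loop, the dmesg line would then become something like
take_snapshot /var/tmp/dmesg followed by prune_snapshots /var/tmp/dmesg 100,
where the directory and the count of 100 are arbitrary choices.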

Hope that all helps and gets you the information you're after!

-- 
  Jonathan Wright                           ~ mail at djnauk.co.uk
                                            ~ www.djnauk.co.uk
--
  2.6.12-gentoo-r6-djnauk-b2 AMD Athlon(tm) XP 2100+
  up 1 day, 12:08,  4 users,  load average: 4.58, 2.76, 2.62
--
  "Labels can also be misleading. I saw  a  news  report  about  a
  lesbian protest march, and the reporter said, 'Coming up next, a
  lesbian demonstration.' My first thought was,  'Cool.  I  always
  wondered how those things work.'"

                                          ~ Michael Dane, Comedian

Thread overview: 17+ messages
2005-09-07 18:36 [gentoo-user] Random Kernel Crashes ... Need more info Kris Kerwin
2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman
2005-09-07 19:09   ` Arturo 'Buanzo' Busleiman
2005-09-07 18:55 ` gentuxx
2005-09-17 18:42   ` Kris Kerwin
2005-09-17 19:01     ` Dave Nebinger
2005-09-17 19:29       ` Kris Kerwin
2005-09-17 19:52         ` Dave Nebinger
2005-09-18  0:20           ` Kris Kerwin
2005-09-19  2:09             ` Dave Nebinger
2005-09-19 22:56               ` Kris Kerwin
2005-09-19 23:07                 ` Volker Armin Hemmann
2005-09-20  2:14                   ` Dave Nebinger
2005-09-20  2:26                     ` Volker Armin Hemmann
2005-09-17 19:10     ` Jonathan Wright
2005-09-17 19:34       ` Kris Kerwin
2005-09-17 22:24         ` Jonathan Wright
