* [gentoo-user] Random Kernel Crashes ... Need more info @ 2005-09-07 18:36 Kris Kerwin 2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman 2005-09-07 18:55 ` gentuxx 0 siblings, 2 replies; 17+ messages in thread From: Kris Kerwin @ 2005-09-07 18:36 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 391 bytes --] Hey all, I've been experiencing some random kernel crashes, and need a way of finding out what happened. I can't find any information in /var/log/lastlog or in /var/log/messages.*.bz2. Is there any way that I can monitor kernel messages during a crash and recover this information on the next boot? Thanks in advance. Kris Kerwin PS: Please CC me in your response. [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-07 18:36 [gentoo-user] Random Kernel Crashes ... Need more info Kris Kerwin @ 2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman 2005-09-07 19:09 ` Arturo 'Buanzo' Busleiman 2005-09-07 18:55 ` gentuxx 1 sibling, 1 reply; 17+ messages in thread From: Arturo 'Buanzo' Busleiman @ 2005-09-07 18:53 UTC (permalink / raw To: gentoo-user -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Kris Kerwin wrote: > Is there any way that I can monitor kernel messages during a crash and recover > this information on the next boot? You may snapshot dmesg's output in a timely manner, ala crontab. - -- Arturo "Buanzo" Busleiman - www.buanzo.com.ar Consultor en Seguridad Informatica KTP Consultores - info AT ktpconsultores.com.ar -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDHzceAlpOsGhXcE0RAkQiAJ9DsaMLMkb57V1pe1auCGI+SLGBfACfdR29 61J2yLxkK4mbCi8bZPEMmok= =6InN -----END PGP SIGNATURE----- -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman @ 2005-09-07 19:09 ` Arturo 'Buanzo' Busleiman 0 siblings, 0 replies; 17+ messages in thread From: Arturo 'Buanzo' Busleiman @ 2005-09-07 19:09 UTC (permalink / raw To: gentoo-user -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Arturo 'Buanzo' Busleiman wrote: > You may snapshot dmesg's output in a timely manner, ala crontab. Additionally, you may wish to play with the log_buf_len kernel parameter: log_buf_len=n Sets the size of the printk ring buffer, in bytes. Format is n, nk, nM. n must be a power of two. The default is set in kernel config. - -- Arturo "Buanzo" Busleiman - www.buanzo.com.ar Consultor en Seguridad Informatica KTP Consultores - info AT ktpconsultores.com.ar -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDHzreAlpOsGhXcE0RApV0AJ4nJs869Ichp2EOBhZ/FGCGsbi32wCfbqqC CRLg1gGzxLmj6Xa3kya56Gc= =SDxa -----END PGP SIGNATURE----- -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
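A note on using log_buf_len: it is a boot-time kernel parameter, so it is appended to the kernel line in the boot loader configuration and can be confirmed after reboot with `cat /proc/cmdline`. The sketch below is only an illustration of the "n, nk, nM, power of two" rule quoted above. The function name log_buf_len_bytes is hypothetical and not part of any package, and the kernel path and root device in the comment are made-up examples.

```shell
#!/bin/bash
# Hypothetical helper: convert a log_buf_len value (n, nk or nM) to bytes
# and check that the result is a power of two, as the parameter requires.
#
# Example boot loader line (paths are assumptions, adjust to your system):
#   kernel /boot/kernel-2.6.12 root=/dev/hda3 log_buf_len=1M
log_buf_len_bytes() {
    local value=$1 number multiplier=1 bytes
    # Accept only digits with an optional k/K/m/M suffix.
    [[ $value =~ ^[0-9]+[kKmM]?$ ]] || { echo "bad format: $value" >&2; return 1; }
    case $value in
        *[kK]) multiplier=1024 ;;
        *[mM]) multiplier=$(( 1024 * 1024 )) ;;
    esac
    number=${value%[kKmM]}
    bytes=$(( 10#$number * multiplier ))
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if (( bytes > 0 && (bytes & (bytes - 1)) == 0 )); then
        echo "$bytes"
    else
        echo "not a power of two: $value" >&2
        return 1
    fi
}
```

For example, `log_buf_len_bytes 128k` prints 131072, while `log_buf_len_bytes 100k` is rejected because 102400 is not a power of two.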
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-07 18:36 [gentoo-user] Random Kernel Crashes ... Need more info Kris Kerwin 2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman @ 2005-09-07 18:55 ` gentuxx 2005-09-17 18:42 ` Kris Kerwin 1 sibling, 1 reply; 17+ messages in thread From: gentuxx @ 2005-09-07 18:55 UTC (permalink / raw To: gentoo-user -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Kris Kerwin wrote: >Hey all, > >I've been experiencing some random kernel crashes, and need a way of finding >out what happened. > >I can't find any information in /var/log/lastlog or >in /var/log/messages.*.bz2. > >Is there any way that I can monitor kernel messages during a crash and recover >this information on the next boot? > >Thanks in advance. > >Kris Kerwin > >PS: Please CC me in your response. The /var/log/dmesg log contains more specific kernel messages. You can also get the messages by running `dmesg` (basically `cat`'s that file). If the kernel crashes, messages from previous boots should be stored there. Also, if you're running a custom kernel, you may want to turn on the "kernel debugging" option. (I haven't used that, but I remember seeing it the last time I compiled my kernel.) HTH. - -- gentux echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge' gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40 9795 2D81 924A 6996 0993 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFDHzewLYGSSmmWCZMRAhXAAKCUTIBHs3S89XKfxBHpWEpjsr4fdQCgybvw YJ0oXp8+mZkHbg9GNOu6px4= =DEsO -----END PGP SIGNATURE----- -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-07 18:55 ` gentuxx @ 2005-09-17 18:42 ` Kris Kerwin 2005-09-17 19:01 ` Dave Nebinger 2005-09-17 19:10 ` Jonathan Wright 0 siblings, 2 replies; 17+ messages in thread From: Kris Kerwin @ 2005-09-17 18:42 UTC (permalink / raw To: gentoo-user; +Cc: gentuxx, Arturo 'Buanzo' Busleiman All, I apologize for getting back so late. It's tough being a college student. ;-) Thanks to Arturo and gentux for helping out so far. I've tried catting the output from dmesg and running it regularly with crontab, as was advised below. This, unfortunately, doesn't work because cron can run at most once a minute. This means that if a crash happens in between these dmesg snapshots, the debugging information is lost. The only way that catting dmesg to a file will work is if the crash just so happens to occur right as dmesg is being logged. I might be able to increase my chances if there were any way to set up vixie-cron to run more often than once a minute (once a second? more?). Also, it seems that the kernel's ring buffer in /var/log/dmesg gets cleared with every boot, so I can't check it after a crash. Is there some other place that old kernel logs get stored? Maybe I have a problem in my syslog-ng setup? I don't see anything out of the ordinary in /etc/syslog-ng/syslog-ng.conf. One thing that I am going to try is logging messages to a file instead of having them sent to tty12. We'll see if this solves the problem. I've also added the "kernel debugging" option to my kernel, but have no idea how to get at this kernel debugging info. Can someone please point me to a good manpage? As to the log_buf_len=n option, how do I do this? Is this added at the kernel command line? Thanks again, all, for your timely help. As always, please be sure to CC me in your response. 
Kris Kerwin kkerwin@insightbb.com -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-17 18:42 ` Kris Kerwin @ 2005-09-17 19:01 ` Dave Nebinger 2005-09-17 19:29 ` Kris Kerwin 2005-09-17 19:10 ` Jonathan Wright 1 sibling, 1 reply; 17+ messages in thread From: Dave Nebinger @ 2005-09-17 19:01 UTC (permalink / raw To: gentoo-user; +Cc: kkerwin

> I've been experiencing some random kernel crashes, and need a way of finding
> out what happened.

Kris, I'd start by answering the following:

1. What version of the kernel are you using? Your OP is quite old, and many releases of the kernel have come out since then. Have you tried a newer kernel? Do the crashes keep happening regardless of the kernel version?

2. If the kernel version doesn't matter, that most likely indicates a hardware failure of some kind. It could be as simple as a flaky memory module, or something more extreme such as a motherboard and/or chipset issue, some device flaking out, etc.

3. Have you looked at crashes due to heat? Is your box clean, and does it have proper airflow?

4. Are you running any esoteric or rare hardware components in the box?

5. Have you ensured that your kernel config matches the hardware? In some cases the selection of drivers is not as simple as selecting a card vendor; you sometimes need to get beyond that and know exactly what the device has installed.

6. "random kernel crashes" really doesn't provide a lot of info. How frequently does it occur? Every other month or every 3 minutes? What happens to the box: a total lockup, a powerdown, etc.?

-- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-17 19:01 ` Dave Nebinger @ 2005-09-17 19:29 ` Kris Kerwin 2005-09-17 19:52 ` Dave Nebinger 0 siblings, 1 reply; 17+ messages in thread From: Kris Kerwin @ 2005-09-17 19:29 UTC (permalink / raw To: Dave Nebinger; +Cc: gentoo-user

Thanks Dave.

1) The problem appears to be independent of the kernel version, as I've had it occur on both a 2.6.10 and a 2.6.12 kernel.

2) How might I check for flaky hardware?

3) I have had my BIOS respond after 3 crashes that the computer crashed due to excessive heat. I think that this may be independent of the problem as well, because I haven't had this BIOS message in conjunction with a crash for several months. I've also had a crash occur when I flipped my laptop upside-down and placed an ice pack over the portion that produced the most heat (don't worry, I placed a plastic baggie over the computer hardware to help to protect from condensation, though I don't think that condensed moisture from the air would be able to conduct enough electricity to produce a short).

4) The only rare hardware that I have is a Broadcom wireless card, for which I use ndiswrapper to load a module into the kernel. The problem is independent of this, as well, because I have had the same crash without the module loaded.

5) I have not yet ensured that the kernel config matches the hardware 100%, though I feel 90% confident in the kernel config that I've custom made for this box.

6) I apologize, but as a college student, I'm often away from my computer. I find that my computer has crashed, but have no method of determining how long it has been sitting since it crashed. I am now at the point that I only turn it on when I need it, once a day, and so it crashes only once a day. As for the lockup itself, it is a total lockup, without a powerdown.

I apologize for not providing enough information, but that is because I myself didn't have enough information (hence the "Need more info" in my subject). As you may have read from my previous posts, the purpose of my writing was not so much to solve the kernel crash (though that is certainly the ultimate goal) but rather to figure out how to recover data about this crash on a subsequent boot. Perhaps I should have made my subject clearer by writing something along the lines of "How to trace a kernel oops?". I apologize. Once I have this information, we can go ahead and figure out why my kernel keeps crashing. But first, I have to figure out how to trace my kernel's oops message. Without that information, the above answers don't really mean much. If you could please help me to figure out a way to log old kernel messages and find them on subsequent boots, that would be most appreciated. Thanks again for your help Dave. Kris -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-17 19:29 ` Kris Kerwin @ 2005-09-17 19:52 ` Dave Nebinger 2005-09-18 0:20 ` Kris Kerwin 0 siblings, 1 reply; 17+ messages in thread From: Dave Nebinger @ 2005-09-17 19:52 UTC (permalink / raw To: Kris Kerwin; +Cc: gentoo-user > 1) The problem appears to be independant of the kernel version, as I've > had it > occur on a 2.6.10 and 2.6.12 kernel. > > 2) How might I check for flakey hardware? I would guess hardware problem (unless 3 applies below), but actually finding the errant component can be quite a task. For a desktop you can strip down to bare minimum, let it run, add a component, let it run, and repeat until you find one that causes the crash, although that might either be due to the component or interactions between components, so even that's not reliable. Sounds like you have a laptop which makes that scenario harder. Did it come with any diagnostic tools, ones that know how to check out the hardware components and look for errors? > 3) I have had my BIOS respond after 3 crashes that the computer crashed > due to > excessive heat. I think that this maybe independant of the problem as > well, > because I haven't had this BIOS message in conjunction with a crash for > several months. I've also had a crash occur when I flipped my laptop > upside-down and placed an ice pack over the portion that produced the most > heat Heat can really be an issue, especially for laptops. And the icepack wouldn't necessarily keep all of the components inside below the threshold when the crash occurs, if it is heat related. > Once I have this information, we can go ahead and figure out why my kernel > keeps crashing. But first, I have to figure out how to trace my kernel's > oops > message. Without that information, the above answers don't really mean > much. > > If you could please help me to figure out a way to log old kernel messages > and > find them on subsequent boots, that would be most appreciated. 
Depending upon the fault that occurs, if it is hardware related, you might never get any worthwhile information out of the kernel even if you could get this information... If the computer just locks up (due to heat or hardware), it would do so w/o giving the kernel time to log anything that might be of value. I guess I would try to rule out heat as the problem first. If your laptop is a newer model, you should be able to access the on-board temperature sensors (there's been a recent thread on that on the list, and I am by far no expert on it). Get them running via a cron task to collect info over time, that way you should be able to see the temp values right before a crash kicks in; if they don't really change, you can probably rule heat out as the issue. If it is a hardware problem, you're stuck with what the vendor provided. I'm not certain there's any diagnostic tools under linux that would do any of this for you. The vendor's probably going to snub their nose at you as they gave it to you with windows on it and you're running the 'unsupported' os. Perhaps there's some happy middleman out there that does hardware issues on laptops with linux, but that would be a service that would cost you. -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
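The "collect temperatures over time via cron" idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested recipe for this laptop: it assumes the kernel exposes thermal zones as thermal_zone*/temp files holding millidegrees Celsius (the sysfs layout on current kernels; 2.6 kernels of this era reported temperatures under /proc/acpi/thermal_zone instead, and lm_sensors has its own tools). The function name log_temps and the default log path are hypothetical.

```shell
#!/bin/bash
# Append one timestamped line per thermal zone to a log file.
# Assumption: each zone is <zone_dir>/thermal_zone*/temp and holds the
# temperature in millidegrees Celsius.
log_temps() {
    local zone_dir=${1:-/sys/class/thermal} log_file=${2:-/var/log/temps.log}
    local zone_file milli
    for zone_file in "$zone_dir"/thermal_zone*/temp; do
        # Skip zones that are missing or unreadable (also covers "no match").
        [ -r "$zone_file" ] || continue
        read -r milli < "$zone_file" || continue
        printf '%s %s %d C\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(basename "$(dirname "$zone_file")")" \
            $(( milli / 1000 )) >> "$log_file"
    done
    # Flush to disk so the last reading has a chance to survive a hard lockup.
    sync
}
```

Saved as a script that ends by calling log_temps, this could be run from a once-a-minute cron entry; the last few lines of the log after a crash then show whether the temperature was climbing beforehand.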
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-17 19:52 ` Dave Nebinger @ 2005-09-18 0:20 ` Kris Kerwin 2005-09-19 2:09 ` Dave Nebinger 0 siblings, 1 reply; 17+ messages in thread From: Kris Kerwin @ 2005-09-18 0:20 UTC (permalink / raw To: gentoo-user, Jonathan Wright; +Cc: Dave Nebinger Alright! Thanks to Jonathan Wright and his program, I think that I may have found something. See what you guys think: Before the crash, the following three lines appeared (in this order) nearly 53,000 times for a total of 16MB of text: > Sep 17 13:45:51 kerwin [4314362.567000] ip_local_deliver: bad skb: PRE_ROUTING LOCAL_IN LOCAL_OUT POST_ROUTING > Sep 17 13:45:51 kerwin [4314362.567000] skb: pf=2 (unowned) dev=lo len=60 > Sep 17 13:45:51 kerwin [4314362.567000] PROTO=6 127.0.0.1:34134 127.0.0.1:111 L=60 S=0x00 I=15872 F=0x4000 T=64 These messages occurred over the course of two hours before the crash at a rate of more than 20 times per second. They are the messages that appeared just before the crash. Apparently, my computer was trying to tell me something pretty important. Since _this_ problem (still not sure if it is THE problem that is causing the crashes to occur; as Dave pointed out, it could be hardware or heat as well) appears to be something in networking, I'm going to recompile a kernel without all of the complex networking stuff, but one that includes my ethernet card's driver. I'll let you know how it goes, and if the problem persists. Thanks again. Kris -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
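Counting how often a kernel message repeats, as was done by hand for the three "bad skb" lines above, can be sketched in shell. This is a minimal sketch that assumes every log line starts with five variable fields (month, day, time, host, and the bracketed printk timestamp) as in the excerpt; the function name count_repeats is hypothetical.

```shell
#!/bin/bash
# Rank the most frequently repeated messages in a syslog-style file.
# Usage: count_repeats LOGFILE [TOP_N]
count_repeats() {
    local log_file=$1 top=${2:-10}
    # Blank the first five fields (month, day, time, host, [uptime]) so
    # that identical messages compare equal, then count and rank them.
    awk '{ $1=$2=$3=$4=$5=""; sub(/^ +/, ""); print }' "$log_file" \
        | sort | uniq -c | sort -rn | head -n "$top"
}
```

Each output line is a count followed by the message text, most frequent first, so a flood like the one described shows up at the top of the list.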
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-18 0:20 ` Kris Kerwin @ 2005-09-19 2:09 ` Dave Nebinger 2005-09-19 22:56 ` Kris Kerwin 0 siblings, 1 reply; 17+ messages in thread From: Dave Nebinger @ 2005-09-19 2:09 UTC (permalink / raw To: Kris Kerwin, gentoo-user, Jonathan Wright > Before the crash, the following three lines appeared (in this order) > nearly > 53,000 times for a total of 16MB of text: > >> Sep 17 13:45:51 kerwin [4314362.567000] ip_local_deliver: bad skb: > PRE_ROUTING LOCAL_IN LOCAL_OUT POST_ROUTING >> Sep 17 13:45:51 kerwin [4314362.567000] skb: pf=2 (unowned) dev=lo len=60 >> Sep 17 13:45:51 kerwin [4314362.567000] PROTO=6 127.0.0.1:34134 > 127.0.0.1:111 L=60 S=0x00 I=15872 F=0x4000 T=64 Don't assume this is your answer, Kris. This was a known problem on one of the 2.6.12 kernels (2.6.12.4, I believe, but don't hold me to it). I had many of these in my logs also. It was a partial network patch applied to the networking layer but missed some components. It was fixed by the 2.6.13 kernel series. -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-19 2:09 ` Dave Nebinger @ 2005-09-19 22:56 ` Kris Kerwin 2005-09-19 23:07 ` Volker Armin Hemmann 0 siblings, 1 reply; 17+ messages in thread From: Kris Kerwin @ 2005-09-19 22:56 UTC (permalink / raw To: Dave Nebinger; +Cc: gentoo-user Dave, Yup. Had a feeling that you might be right about that one. It seems that the computer will still crash, but certainly not as often. My guess: there is a bigger problem that is aggravated when the computer is under more stress; ie: tracking excessive amounts of kernel complaints, etc. I've also noticed difficulties with the sound system and have had the computer crash a number of times when playing music (could be the media player or the sound system itself, but I still think that the problem is bigger yet). Ideas for a next step? Is there more information that I can submit to <hopefully> throw out the possibility of a hardware problem, or to determine which piece of hardware is at fault? Thanks again for all of your help. Kris On Sunday 18 September 2005 21:09, Dave Nebinger wrote: > > Before the crash, the following three lines appeared (in this order) > > nearly > > > > 53,000 times for a total of 16MB of text: > >> Sep 17 13:45:51 kerwin [4314362.567000] ip_local_deliver: bad skb: > > > > PRE_ROUTING LOCAL_IN LOCAL_OUT POST_ROUTING > > > >> Sep 17 13:45:51 kerwin [4314362.567000] skb: pf=2 (unowned) dev=lo > >> len=60 Sep 17 13:45:51 kerwin [4314362.567000] PROTO=6 127.0.0.1:34134 > > > > 127.0.0.1:111 L=60 S=0x00 I=15872 F=0x4000 T=64 > > Don't assume this is your answer, Kris. This was a known problem on one of > the 2.6.12 kernels (2.6.12.4, I believe, but don't hold me to it). > > I had many of these in my logs also. It was a partial network patch > applied to the networking layer but missed some components. It was fixed > by the 2.6.13 kernel series. -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-19 22:56 ` Kris Kerwin @ 2005-09-19 23:07 ` Volker Armin Hemmann 2005-09-20 2:14 ` Dave Nebinger 0 siblings, 1 reply; 17+ messages in thread From: Volker Armin Hemmann @ 2005-09-19 23:07 UTC (permalink / raw To: gentoo-user On Tuesday 20 September 2005 00:56, Kris Kerwin wrote: > Dave, > > Yup. Had a feeling that you might be right about that one. > > It seems that the computer will still crash, but certainly not as often. My > guess: there is a bigger problem that is aggravated when the computer is > under more stress; ie: tracking excessive amounts of kernel complaints, > etc. I've also noticed difficulties with the sound system and have had the > computer crash a number of times when playing music (could be the media > player or the sound system itself, but I still think that the problem is > bigger yet). > > Ideas for a next step? Is there more information that I can submit to > <hopefully> throw out the possibility of a hardware problem, or to > determine which piece of hardware is at fault? > > Well, first, let memtest86(+) run for some hours. Second, check that your box does not get too hot. Crashes under stress are mostly caused by overheating or a PSU going bad. Third, if possible, try a different PSU - the manufacturers like to use the cheapest components for this, which is close to the most important part of a computer. Fourth, check your board and cards for 'funny looking' capacitors - deformation, rounded tops, or even some brown 'dirt' at their base. -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-19 23:07 ` Volker Armin Hemmann @ 2005-09-20 2:14 ` Dave Nebinger 2005-09-20 2:26 ` Volker Armin Hemmann 0 siblings, 1 reply; 17+ messages in thread From: Dave Nebinger @ 2005-09-20 2:14 UTC (permalink / raw To: gentoo-user; +Cc: Kris Kerwin > well, at first, let memtest86(+) run for some hours. Volker's got a good point here... > second, check that your box does not get too hot. Crashes on stress are > mostly > overheating or PSU going bad. Mentioned that to him about the heat... Kris, were you able to get lm_sensors running on the box? > third, try a different PSU - the manufacturers like to use the cheapest > components for this almost most important part of a computer, if possible > try > another one. > > fourth, check your board and cards for 'funny looking' condensators - like > deformation, round tops, or even some brown 'dirt' at their base. This I think will be hard for him, Volker, as I believe he's running a laptop. -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-20 2:14 ` Dave Nebinger @ 2005-09-20 2:26 ` Volker Armin Hemmann 0 siblings, 0 replies; 17+ messages in thread From: Volker Armin Hemmann @ 2005-09-20 2:26 UTC (permalink / raw To: gentoo-user On Tuesday 20 September 2005 04:14, Dave Nebinger wrote: > > fourth, check your board and cards for 'funny looking' condensators - > > like deformation, round tops, or even some brown 'dirt' at their base. > > This I think will be hard for him, Volker, as I believe he's running a > laptop. In that case, just open, what can be opened and see if there is any suspicious capacitor. -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Random Kernel Crashes ... Need more info 2005-09-17 18:42 ` Kris Kerwin 2005-09-17 19:01 ` Dave Nebinger @ 2005-09-17 19:10 ` Jonathan Wright 2005-09-17 19:34 ` Kris Kerwin 1 sibling, 1 reply; 17+ messages in thread From: Jonathan Wright @ 2005-09-17 19:10 UTC (permalink / raw To: gentoo-user

Kris Kerwin wrote:
> I've tried catting the output from dmesg and running it regularly with
> crontab, as was advised below. This, unfortunately doesn't work because cron
> can only run as often as once a minute. This means that if a crash happens in
> between these dmesg snapshots, the debugging information is lost. The only
> way that catting dmesg to a file will work is if the crash just so happens to
> occur right as dmesg is being logged. I might be able to increase my chances
> if there was anyway to set up vixie-cron to run more often than once a minute
> (once a second? more?)

Why not run a bash script, something like this (not tested or debugged! And I can't remember how to do a while loop in bash ;)

while true; do
    # Create /tmp/stopdmesg to make the loop exit.
    if [ -e /tmp/stopdmesg ]; then
        exit
    else
        # One snapshot per pass, named by date and time (HHMMSS).
        dmesg > dmesg-$(date +%Y%m%d%H%M%S)
        sleep 5
    fi
done

Open up your terminal and run the script (and append & to send it to the background). If need be, lower the `sleep 5` interval as far as you need to capture the dmesg information.

-- Jonathan Wright ~ mail at djnauk.co.uk ~ www.djnauk.co.uk -- 2.6.12-gentoo-r6-djnauk-b2 AMD Athlon(tm) XP 2100+ up 1 day, 9:03, 3 users, load average: 3.62, 2.94, 2.42 -- "The Bible contains six admonishments to homosexuals and three hundred sixty two admonishments to heterosexuals. That doesn't mean that God doesn't love heterosexuals. It's just that they need more supervision." ~ Lynne Lavner -- gentoo-user@gentoo.org mailing list ^ permalink raw reply [flat|nested] 17+ messages in thread
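One caveat with any snapshot loop like the one above: a file written moments before a hard lockup may still be sitting in the page cache, and anything under a memory-backed /tmp is gone after reboot. The following is a hedged variant of a single snapshot step that writes to a persistent directory, calls sync, and prunes old files. The function name snapshot_dmesg, the default directory, and the keep count are all illustrative assumptions, and `xargs -r` relies on GNU xargs.

```shell
#!/bin/bash
# Save one dmesg snapshot to a directory that survives a reboot, flush it
# to disk, and keep only the newest few files.
# Usage: snapshot_dmesg [SNAP_DIR] [KEEP]
snapshot_dmesg() {
    local snap_dir=${1:-/var/log/dmesg-snapshots} keep=${2:-20}
    mkdir -p "$snap_dir" || return 1
    dmesg > "$snap_dir/dmesg-$(date +%Y%m%d%H%M%S)" || return 1
    # Force the snapshot out of the page cache before the next crash.
    sync
    # List newest first and delete everything after the first $keep files.
    ls -1t "$snap_dir"/dmesg-* | tail -n +"$(( keep + 1 ))" | xargs -r rm -f
}
```

Called from a loop such as `while true; do snapshot_dmesg; sleep 5; done`, the newest file in the directory after a reboot is the last snapshot that reached the disk.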
* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:10 ` Jonathan Wright
@ 2005-09-17 19:34 ` Kris Kerwin
  2005-09-17 22:24 ` Jonathan Wright
  0 siblings, 1 reply; 17+ messages in thread
From: Kris Kerwin @ 2005-09-17 19:34 UTC (permalink / raw)
  To: gentoo-user; +Cc: Jonathan Wright

Thanks Jonathan.

Anyone have any thoughts on this? I'm not a bash programmer (or any other
kind), and was wondering if this would work. And how might I code that
while loop?

Thanks again, all, for your help.

Kris

On Saturday 17 September 2005 14:10, Jonathan Wright wrote:
> Why not run a bash script, something like (not tested or debugged! And I
> can't remember how to do a while loop in bash;)
>
> while true; do
>     if [ -e /tmp/stopdmesg ]; then
>         exit
>     else
>         dmesg > dmesg-$(date +%Y%m%d%H%M%S)
>         sleep 5
>     fi
> done
>
> Open up your terminal and run the script (and append & to send it to the
> background). If need be, change "sleep 5" to as low a value as you need to
> get the dmesg information.
* Re: [gentoo-user] Random Kernel Crashes ... Need more info
  2005-09-17 19:34 ` Kris Kerwin
@ 2005-09-17 22:24 ` Jonathan Wright
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Wright @ 2005-09-17 22:24 UTC (permalink / raw)
  To: gentoo-user; +Cc: kkerwin

Kris Kerwin wrote:
> Thanks Jonathan.
>
> Anyone have any thoughts on this? I'm not a bash or any other programmer, and
> was wondering if this would work. And how might I code that while loop?

Actually, the while loop was fine. I originally wrote that line as
"while (1)", thought "I can't do that, I'll have to look it up before I
send it out", and then, after correcting it, forgot to delete the remark.

>>while true; do
>>    if [ -e /tmp/stopdmesg ]; then
>>        exit
>>    else
>>        dmesg > dmesg-$(date +%Y%m%d%H%M%S)
>>        sleep 5
>>    fi
>>done

In theory, the following code should do it:

--cut-------------------
#!/bin/bash

# The sleep time in seconds is the first (and only) argument.
if [ -z "$1" ]; then
    echo "sleep time not given"
    exit 1
fi

while true; do
    # Stop as soon as the marker file exists.
    if [ -e /tmp/stopdmesg ]; then
        exit
    else
        # One snapshot per iteration, named after the current time.
        dmesg > "/tmp/dmesg-$(date +%Y%m%d%H%M%S)"
        echo -n "."
        sleep "$1"
    fi
done
--cut-------------------

You can then run the program (say it's in a file called dcat):

$ ./dcat 5

which writes the dmesg contents to /tmp/dmesg-(time) (e.g.
/tmp/dmesg-20050917231913) and then sleeps for 5 seconds before taking the
next snapshot.

For each snapshot, you'll see a period on screen, e.g.

$ ./dcat 5
..................................

so you can track progress. You can delete the 'echo -n "."' line if you
want to stop that.

Finally, to stop it, you can either kill the process or create an empty
file called stopdmesg in /tmp:

$ touch /tmp/stopdmesg

which will terminate the loop and the program. Remember to delete
/tmp/stopdmesg before the next run, or the script will exit immediately.

Hope that all helps and gets you the information you're after!

--
 Jonathan Wright                           ~ mail at djnauk.co.uk
                                           ~ www.djnauk.co.uk
--
 2.6.12-gentoo-r6-djnauk-b2 AMD Athlon(tm) XP 2100+
 up 1 day, 12:08, 4 users, load average: 4.58, 2.76, 2.62
--
 "Labels can also be misleading. I saw a news report about a
 lesbian protest march, and the reporter said, 'Coming up next,
 a lesbian demonstration.' My first thought was, 'Cool. I always
 wondered how those things work.'"

                                           ~ Michael Dane, Comedian
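
A possible refinement of the loop above, offered as a sketch under stated assumptions and not something anyone posted in this thread: a snapshot that is still buffered in memory when the machine locks up may never reach the disk, and one file every few seconds adds up quickly. The variant below calls sync after each snapshot and keeps only the newest few files. SNAP_DIR, SNAP_CMD, KEEP and the function names are made up for illustration; the /tmp/stopdmesg stop file is the one used in the script above. The directory is deliberately not /tmp, on the assumption that some setups clear /tmp at boot.

```shell
#!/bin/bash
# Sketch only: SNAP_DIR, SNAP_CMD, KEEP and the function names below are
# illustrative assumptions, not part of the script posted in this thread.

SNAP_DIR=${SNAP_DIR:-/var/tmp/dmesg-snapshots}  # hypothetical snapshot directory
SNAP_CMD=${SNAP_CMD:-dmesg}                     # command whose output is saved
KEEP=${KEEP:-20}                                # number of snapshots to retain
SNAP_SEQ=0                                      # counter that keeps names unique

# Write one snapshot and remember its name in LAST_SNAPSHOT.
take_snapshot() {
    mkdir -p "$SNAP_DIR" || return 1
    SNAP_SEQ=$(( SNAP_SEQ + 1 ))
    LAST_SNAPSHOT=$(printf '%s/dmesg-%s-%06d' \
        "$SNAP_DIR" "$(date +%Y%m%d%H%M%S)" "$SNAP_SEQ")
    $SNAP_CMD > "$LAST_SNAPSHOT" || return 1
    # Ask the kernel to flush buffered writes so the file has a better
    # chance of being on disk if the machine crashes shortly afterwards.
    sync
    return 0
}

# Delete the oldest snapshots so that at most KEEP files remain.
# The glob expands in sorted order, and the names sort chronologically.
prune_snapshots() {
    local files=("$SNAP_DIR"/dmesg-*)
    local excess=$(( ${#files[@]} - KEEP ))
    local i
    for (( i = 0; i < excess; i++ )); do
        rm -f -- "${files[i]}"
    done
}

# Snapshot every $1 seconds until /tmp/stopdmesg exists.
snapshot_loop() {
    local interval=$1
    while [ ! -e /tmp/stopdmesg ]; do
        take_snapshot || return 1
        prune_snapshots
        sleep "$interval"
    done
}

# Start the loop only when an interval is given on the command line.
if [ "$#" -ge 1 ]; then
    snapshot_loop "$1"
fi
```

Saved as, say, dcat2 (a hypothetical name), it would be started with "./dcat2 5" and stopped with "touch /tmp/stopdmesg", just like the original. Note that sync flushes all pending writes on the system, so a very short interval may cost noticeable disk activity, and no polling loop can capture messages printed after the kernel has stopped scheduling user processes.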
end of thread, other threads:[~2005-09-20 2:31 UTC | newest]

Thread overview: 17+ messages
2005-09-07 18:36 [gentoo-user] Random Kernel Crashes ... Need more info Kris Kerwin
2005-09-07 18:53 ` Arturo 'Buanzo' Busleiman
2005-09-07 19:09 ` Arturo 'Buanzo' Busleiman
2005-09-07 18:55 ` gentuxx
2005-09-17 18:42 ` Kris Kerwin
2005-09-17 19:01 ` Dave Nebinger
2005-09-17 19:29 ` Kris Kerwin
2005-09-17 19:52 ` Dave Nebinger
2005-09-18  0:20 ` Kris Kerwin
2005-09-19  2:09 ` Dave Nebinger
2005-09-19 22:56 ` Kris Kerwin
2005-09-19 23:07 ` Volker Armin Hemmann
2005-09-20  2:14 ` Dave Nebinger
2005-09-20  2:26 ` Volker Armin Hemmann
2005-09-17 19:10 ` Jonathan Wright
2005-09-17 19:34 ` Kris Kerwin
2005-09-17 22:24 ` Jonathan Wright