* [gentoo-user] log messages
From: Harry Putnam @ 2010-02-16 22:36 UTC
To: gentoo-user
Hundreds, maybe thousands of lines like this (wrapped for mail):
Feb 16 09:38:47 reader kernel: [162289.090685] usb 4-2.1:1.1: uevent
Feb 16 09:38:48 reader kernel: [162289.467065] hdc: status error:
status=0x00 { }
Feb 16 09:38:48 reader kernel: [162289.467071] hdc: possibly failed
opcode: 0xa0
Feb 16 09:38:48 reader kernel: [162289.467079] ide-atapi: hdc:
Strange, packet command initiated yet DRQ isn't asserted
When I noticed this output involving the cdrom I wondered if I might
have left something in it but that was not the case.
* Re: [gentoo-user] log messages
From: Alan McKinnon @ 2010-02-16 23:00 UTC
To: gentoo-user
On Wednesday 17 February 2010 00:36:42 Harry Putnam wrote:
> Hundreds, maybe thousands of lines like this (wrapped for mail):
>
> Feb 16 09:38:47 reader kernel: [162289.090685] usb 4-2.1:1.1: uevent
>
> Feb 16 09:38:48 reader kernel: [162289.467065] hdc: status error:
> status=0x00 { }
>
> Feb 16 09:38:48 reader kernel: [162289.467071] hdc: possibly failed
> opcode: 0xa0
>
> Feb 16 09:38:48 reader kernel: [162289.467079] ide-atapi: hdc:
> Strange, packet command initiated yet DRQ isn't asserted
>
> When I noticed this output involving the cdrom I wondered if I might
> have left something in it but that was not the case.
Do you have hal configured to poll your cdrom drive every two seconds, to see
if a disc is inserted? And if so, is the logging verbosity cranked up way
higher than it should be?
I haven't had to fix this myself (so I can't give pointers on where to
fix it), but it seems to be a common occurrence, judging from posts I see
here and on other forums.
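If the hal-polling theory is right, the hdc errors should arrive in bursts at a roughly fixed cadence (the two-second figure is only a guess in this thread, not a measurement). A rough Python sketch, assuming the log has already been read into a list of lines, groups hdc lines by their bracketed kernel timestamp and reports the gap between bursts. The function name and the 0.5 s grouping threshold are hypothetical choices, not anything hal or the kernel defines:

```python
import re

# A kernel timestamp such as "[162289.467065]" followed by an hdc message,
# optionally prefixed by "ide-atapi: " as in the log excerpt above.
HDC_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(?:ide-atapi: )?hdc:")


def burst_intervals(lines, gap=0.5):
    """Return the seconds between the starts of successive hdc error bursts.

    hdc lines closer than `gap` seconds to the previous hdc line are counted
    as part of the same burst (the 0.5 s default is an arbitrary choice).
    """
    starts = []
    last = None
    for line in lines:
        match = HDC_LINE.search(line)
        if match is None:
            continue  # not an hdc line (e.g. the usb uevent messages)
        t = float(match.group(1))
        if last is None or t - last > gap:
            starts.append(t)  # a new burst begins here
        last = t
    return [round(b - a, 3) for a, b in zip(starts, starts[1:])]
```

A steady run of values close to one number would be consistent with something polling the drive; irregular gaps would point elsewhere.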
--
alan dot mckinnon at gmail dot com
* [gentoo-user] Re: log messages
From: Harry Putnam @ 2010-02-17 6:49 UTC
To: gentoo-user
Alan McKinnon <alan.mckinnon@gmail.com> writes:
> On Wednesday 17 February 2010 00:36:42 Harry Putnam wrote:
>> Hundreds, maybe thousands of lines like this (wrapped for mail):
>>
>> Feb 16 09:38:47 reader kernel: [162289.090685] usb 4-2.1:1.1: uevent
>>
>> Feb 16 09:38:48 reader kernel: [162289.467065] hdc: status error:
>> status=0x00 { }
>>
>> Feb 16 09:38:48 reader kernel: [162289.467071] hdc: possibly failed
>> opcode: 0xa0
>>
>> Feb 16 09:38:48 reader kernel: [162289.467079] ide-atapi: hdc:
>> Strange, packet command initiated yet DRQ isn't asserted
>>
>> When I noticed this output involving the cdrom I wondered if I might
>> have left something in it but that was not the case.
>
> Do you have hal configured to poll your cdrom drive every two seconds, to see
> if a disc is inserted? And if so, is the logging verbosity cranked up way
> higher than it should be?
>
> I haven't had to fix this myself (so I can't give pointers on where to
> fix it), but it seems to be a common occurrence, judging from posts I see
> here and on other forums.
I do have hald running, but made no special config regarding cdrom
polling. At least not on purpose.
The messages do appear to be continuous. I will execute a reboot soon
but don't want to right now.
The reason I'm pondering and following this up is that I experience a
serious freeze after some unpredictable amount of uptime. Mouse and
keyboard become unresponsive... and eventually the OS cannot be
accessed at all.
SSH appears to stop as well, so I cannot connect remotely either.
This began happening quite some time ago... on a different, earlier
install. I could never see anything in the logs that gave a clue as to
why.
I created a script that ran from cron every 5 minutes. It pinged a
remote host and used `logger' to write a unique, easily findable
string into the log. With that I was able to narrow down the time of
the freeze to within 5 minutes of the last marker in the log.
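For illustration, a heartbeat along these lines might look like the Python sketch below. It is not the actual script from this thread (which pinged and used `logger'); it swaps in Python's `syslog` module for `logger`, and the marker string, the placeholder host 192.0.2.10 and the crontab path are all assumptions. The helper `last_heartbeat()` pulls the timestamp of the last marker out of a log, which bounds the freeze to the following 5 minutes.

```python
"""Hypothetical heartbeat sketch: ping a host and drop a marker into syslog.

Intended to be run from cron every 5 minutes, e.g. with a crontab line like
    */5 * * * * /usr/bin/python3 /usr/local/bin/heartbeat.py
(the script path is an assumption).
"""
import subprocess
import syslog

MARKER = "HEARTBEAT-7f3a"   # assumed unique, easily grep-able string
REMOTE_HOST = "192.0.2.10"  # placeholder (TEST-NET-1) address, not from the thread


def ping_ok(host: str) -> bool:
    """Return True if a single ping to `host` succeeds within a few seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ping missing, not permitted, or hung
    return result.returncode == 0


def heartbeat(host: str = REMOTE_HOST) -> str:
    """Ping `host`, then log a marker line to syslog (roughly what `logger` does)."""
    status = "ping-ok" if ping_ok(host) else "ping-FAILED"
    message = f"{MARKER} {status} {host}"
    syslog.syslog(syslog.LOG_NOTICE, message)
    return message


def last_heartbeat(log_lines, marker: str = MARKER):
    """Return the syslog timestamp (first 15 chars) of the last marker line, or None."""
    last = None
    for line in log_lines:
        if marker in line:
            last = line[:15]  # e.g. "Feb 16 09:38:47"
    return last


if __name__ == "__main__":
    print(heartbeat())
```

After a freeze, feeding the lines of /var/log/messages (or wherever the syslog daemon writes) to `last_heartbeat()` gives the last moment the box was known to be alive.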
Even then, there was nothing to indicate a problem. This was an OS
that had been running a very long time with upgrade after upgrade.
Though I hated having to rebuild all the customizations etc., I finally
completely reinstalled from scratch, hoping to get rid of the problem
with the shotgun approach.
By the way, the earlier install generated no log messages about hdc.
Shortly after completing the new install and spending a couple of weeks
getting it set up the way I wanted, I began to experience the freezes
again.
I have caught the freeze in its early stages, when only the mouse and
keyboard had become unresponsive and the network was still up. I was
able to ssh in, and noticed that restarting hald held off the freeze
for some (again unpredictable) amount of time.
So, cutting the lengthy narrative down a bit and briefly put, I'm
looking for anything unusual that could be causing this. The hdc
messages are the only odd thing I'm seeing.
Something appears to be jamming up the hal layer somehow, but without
leaving findable tracks. At least not tracks findable by someone with
many years of experience with Linux but not much real debugging of
complicated problems under his belt.
* Re: [gentoo-user] Re: log messages
From: Alan McKinnon @ 2010-02-17 8:47 UTC
To: gentoo-user
On Wednesday 17 February 2010 08:49:28 Harry Putnam wrote:
> I have caught the freeze in its early stages, when only the mouse and
> keyboard had become unresponsive and the network was still up. I was
> able to ssh in, and noticed that restarting hald held off the freeze
> for some (again unpredictable) amount of time.
>
> So, cutting the lengthy narrative down a bit and briefly put, I'm
> looking for anything unusual that could be causing this. The hdc
> messages are the only odd thing I'm seeing.
>
> Something appears to be jamming up the hal layer somehow, but without
> leaving findable tracks. At least not tracks findable by someone with
> many years of experience with Linux but not much real debugging of
> complicated problems under his belt.
You say the box runs ssh, implying that other hosts are nearby, so what I
would suggest is to configure your syslogger to send all logs to another host
and have that host write them to a known location.
I find that machines that freeze often still send logs to syslog properly
right up to the moment of the freeze, but these do not get written to disk as
IO is blocked. Then we restart the box, guaranteeing that the logs are lost
:-)
Setting up remote logging and just leaving it until the machine freezes
again will hopefully give you the useful logs you need to identify the
problem. To save disk space you can configure logrotate on the remote
logger to delete the previous days' stuff - you don't need logs from days
when the box was working fine.
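In practice the forwarding is a syslog daemon setting (in rsyslog's traditional syntax, a rule such as `*.* @loghost:514` forwards everything over UDP; `loghost` is a placeholder). The Python sketch below only illustrates the mechanism under those assumptions: a `logging.handlers.SysLogHandler` sends records over UDP, and a toy receiver standing in for the other host appends each datagram to a file and flushes at once, so whatever arrives before a freeze is already on the remote disk. The logger name, port and file path are hypothetical.

```python
"""Toy illustration of remote syslog over UDP (not a replacement for rsyslog)."""
import logging
import logging.handlers
import socket


def receive(sock: socket.socket, path: str, count: int) -> None:
    """Append `count` received syslog datagrams to `path`, one line each."""
    with open(path, "a", encoding="utf-8") as out:
        for _ in range(count):
            data, (addr, _port) = sock.recvfrom(4096)
            # SysLogHandler appends a NUL byte by default; strip it.
            text = data.decode("utf-8", "replace").rstrip("\x00")
            out.write(f"{addr} {text}\n")
            out.flush()  # push each line out immediately instead of buffering


def make_remote_logger(host: str, port: int) -> logging.Logger:
    """Return a logger whose records are sent to a syslog receiver at host:port."""
    logger = logging.getLogger(f"remote_syslog_{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # only send to the remote handler
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        # An (host, port) address makes SysLogHandler use UDP by default.
        logger.addHandler(logging.handlers.SysLogHandler(address=(host, port)))
    return logger
```

The receiver is deliberately minimal (no rotation, no TCP, no parsing); a real remote logger would be a proper syslog daemon, with logrotate handling cleanup as described above.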
Another option is to look at the pattern here: one day, out of the blue, a
stable system developed problems, and they still surface at random times. This
is one of the characteristics of failing hardware. Have you done a full,
thorough hardware test, including such things as memtest and SMART?
--
alan dot mckinnon at gmail dot com
* [gentoo-user] Re: log messages
From: Harry Putnam @ 2010-02-17 14:32 UTC
To: gentoo-user
Alan McKinnon <alan.mckinnon@gmail.com> writes:
> Setting up remote logging and just leaving it until the machine
> freezes again will hopefully give you the useful logs you need to
> identify the problem. To save disk space you can configure logrotate
> on the remote logger to delete the previous days' stuff - you don't
> need logs from days when the box was working fine.
Thanks, that may be worth a try... I wonder whether rsyslog (my logger
of choice) can log to localhost as well as to a remote host.
I think I'll look into that too.
> Another option is to look at the pattern here: one day, out of the
> blue, a stable system developed problems, and they still surface at
> random times. This is one of the characteristics of failing
> hardware. Have you done a full, thorough hardware test, including
> such things as memtest and SMART?
I agree that it sounds like hardware, but even then some tracks should
appear in the logs, right? (Maybe I'll see them with the remote logging
you suggested.)
> [...] Have you done a full, thorough
> hardware test
I haven't run memtest or the SMART checks yet.
But as far as `full' goes, what other tests might I try?
ps - I did find some reiserfs errors and, after a full backup, am currently running
reiserfsck --rebuild-tree
on that (now unmounted) disk, so maybe that is related and will cure
the problem (fingers crossed hard).
* [gentoo-user] Re: log messages
From: Jörg Schaible @ 2010-02-17 18:51 UTC
To: gentoo-user
Hi Harry,
Harry Putnam wrote:
> Alan McKinnon <alan.mckinnon@gmail.com> writes:
>
>> On Wednesday 17 February 2010 00:36:42 Harry Putnam wrote:
>>> Hundreds, maybe thousands of lines like this (wrapped for mail):
>>>
>>> Feb 16 09:38:47 reader kernel: [162289.090685] usb 4-2.1:1.1: uevent
>>>
>>> Feb 16 09:38:48 reader kernel: [162289.467065] hdc: status error:
>>> status=0x00 { }
>>>
>>> Feb 16 09:38:48 reader kernel: [162289.467071] hdc: possibly failed
>>> opcode: 0xa0
>>>
>>> Feb 16 09:38:48 reader kernel: [162289.467079] ide-atapi: hdc:
>>> Strange, packet command initiated yet DRQ isn't asserted
>>>
>>> When I noticed this output involving the cdrom I wondered if I might
>>> have left something in it but that was not the case.
>>
>> Do you have hal configured to poll your cdrom drive every two seconds, to
>> see if a disc is inserted? And if so, is the logging verbosity cranked up
>> way higher than it should be?
>>
>> I haven't had to fix this myself (so I can't give pointers on where to
>> fix it), but it seems to be a common occurrence, judging from posts I
>> see here and on other forums.
>
> I do have hald running, but made no special config regarding cdrom
> polling. At least not on purpose.
>
> The messages do appear to be continuous. I will execute a reboot soon
> but don't want to right now.
>
> The reason I'm pondering and following this up is that I experience a
> serious freeze after some unpredictable amount of uptime. Mouse and
> keyboard become unresponsive... and eventually the OS cannot be
> accessed at all.
>
> SSH appears to stop as well, so I cannot connect remotely either.
>
> This began happening quite some time ago... on a different, earlier
> install. I could never see anything in the logs that gave a clue as to
> why.
>
> I created a script that ran from cron every 5 minutes. It pinged a
> remote host and used `logger' to write a unique, easily findable
> string into the log. With that I was able to narrow down the time of
> the freeze to within 5 minutes of the last marker in the log.
>
> Even then, there was nothing to indicate a problem. This was an OS
> that had been running a very long time with upgrade after upgrade.
>
> Though I hated having to rebuild all the customizations etc., I finally
> completely reinstalled from scratch, hoping to get rid of the problem
> with the shotgun approach.
>
> By the way, the earlier install generated no log messages about hdc.
>
> Shortly after completing the new install and spending a couple of weeks
> getting it set up the way I wanted, I began to experience the freezes
> again.
>
> I have caught the freeze in its early stages, when only the mouse and
> keyboard had become unresponsive and the network was still up. I was
> able to ssh in, and noticed that restarting hald held off the freeze
> for some (again unpredictable) amount of time.
>
> So, cutting the lengthy narrative down a bit and briefly put, I'm
> looking for anything unusual that could be causing this. The hdc
> messages are the only odd thing I'm seeing.
>
> Something appears to be jamming up the hal layer somehow, but without
> leaving findable tracks. At least not tracks findable by someone with
> many years of experience with Linux but not much real debugging of
> complicated problems under his belt.
I once had similar freezes and symptoms, until I noticed that our rabbit
had bitten into a USB cable and the computer was getting undefined signals
on the USB bus because the bare wires in the cable were touching.
Try disconnecting all external USB devices first and check whether the
problem persists.
- Jörg