public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] Processes hang - system dies
@ 2013-01-05 14:05 Robin Atwood
  2013-01-08  1:20 ` Adam Carter
  0 siblings, 1 reply; 4+ messages in thread
From: Robin Atwood @ 2013-01-05 14:05 UTC
  To: gentoo-user

I have a very severe problem after a recent disk replacement. After a few days 
running, all new processes just hang. The kernel reports:

Jan  5 02:25:36 opal kernel: INFO: task mysqld:11387 blocked for more than 120 seconds.
Jan  5 02:25:36 opal kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  5 02:25:36 opal kernel: mysqld          D 0000000000000000     0 11387      1 0x00000000
Jan  5 02:25:36 opal kernel: ffff880012caccc0 0000000000000082 0000000000011280 ffff88012f08c660
Jan  5 02:25:36 opal kernel: 0000000000011280 ffff88012920dfd8 0000000000011280 ffff88012920c010
Jan  5 02:25:36 opal kernel: ffff88012920dfd8 0000000000011280 ffff880012caccc0 0000000000011280
Jan  5 02:25:36 opal kernel: Call Trace:
Jan  5 02:25:36 opal kernel: [<ffffffff810b9caf>] ? find_get_pages_tag+0xef/0x1a0
Jan  5 02:25:36 opal kernel: [<ffffffff8102c455>] ? default_spin_lock_flags+0x5/0x10
Jan  5 02:25:36 opal kernel: [<ffffffff8143401b>] ? _raw_spin_lock_irqsave+0x3b/0x60
Jan  5 02:25:36 opal kernel: [<ffffffff81046ec3>] ? lock_timer_base+0x33/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff8107a609>] ? debug_mutex_add_waiter+0x29/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff814312cf>] ? __mutex_lock_slowpath+0x22f/0x310
Jan  5 02:25:36 opal kernel: [<ffffffff8102c455>] ? default_spin_lock_flags+0x5/0x10
Jan  5 02:25:36 opal kernel: [<ffffffff8143401b>] ? _raw_spin_lock_irqsave+0x3b/0x60
Jan  5 02:25:36 opal kernel: [<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0
Jan  5 02:25:36 opal kernel: [<ffffffff81066a80>] ? try_to_wake_up+0x2b0/0x2b0
Jan  5 02:25:36 opal kernel: [<ffffffff81192e10>] ? reiserfs_commit_for_inode+0x140/0x230
Jan  5 02:25:36 opal kernel: [<ffffffff81179e87>] ? reiserfs_sync_file+0x97/0x120
Jan  5 02:25:36 opal kernel: [<ffffffff811290b1>] ? do_fsync+0x31/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff810ff76c>] ? sys_pwrite64+0x7c/0xb0
Jan  5 02:25:36 opal kernel: [<ffffffff8112911b>] ? sys_fsync+0xb/0x20
Jan  5 02:25:36 opal kernel: [<ffffffff81434a39>] ? system_call_fastpath+0x16/0x1b
Jan  5 02:25:36 opal kernel: INFO: task kworker/1:1:27685 blocked for more than 120 seconds.
Jan  5 02:25:36 opal kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  5 02:25:36 opal kernel: kworker/1:1     D ffff880005ee5980     0 27685      2 0x00000000
Jan  5 02:25:36 opal kernel: ffff880005ee5980 0000000000000046 0000000000000000 ffff880128354660
Jan  5 02:25:36 opal kernel: 0000000000011280 ffff8801018e3fd8 0000000000011280 ffff8801018e2010
Jan  5 02:25:36 opal kernel: ffff8801018e3fd8 0000000000011280 ffff880005ee5980 0000000000011280
Jan  5 02:25:36 opal kernel: Call Trace:
Jan  5 02:25:36 opal kernel: [<ffffffff81066a80>] ? try_to_wake_up+0x2b0/0x2b0
Jan  5 02:25:36 opal kernel: [<ffffffff8106ac92>] ? load_balance+0x102/0x790
Jan  5 02:25:36 opal kernel: [<ffffffff8105fbd0>] ? __wake_up_common+0x50/0x80
Jan  5 02:25:36 opal kernel: [<ffffffff8107a609>] ? debug_mutex_add_waiter+0x29/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff814312cf>] ? __mutex_lock_slowpath+0x22f/0x310
Jan  5 02:25:36 opal kernel: [<ffffffff8107a609>] ? debug_mutex_add_waiter+0x29/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff814312cf>] ? __mutex_lock_slowpath+0x22f/0x310
Jan  5 02:25:36 opal kernel: [<ffffffff8102c455>] ? default_spin_lock_flags+0x5/0x10
Jan  5 02:25:36 opal kernel: [<ffffffff8143401b>] ? _raw_spin_lock_irqsave+0x3b/0x60
Jan  5 02:25:36 opal kernel: [<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0
Jan  5 02:25:36 opal kernel: [<ffffffff81066a80>] ? try_to_wake_up+0x2b0/0x2b0
Jan  5 02:25:36 opal kernel: [<ffffffff8118f3f6>] ? do_journal_end+0x1d6/0xf00
Jan  5 02:25:36 opal kernel: [<ffffffff8117ff20>] ? reiserfs_sync_fs+0x70/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff8117ff00>] ? reiserfs_sync_fs+0x50/0x70
Jan  5 02:25:36 opal kernel: [<ffffffff8117ff5e>] ? flush_old_commits+0x3e/0x60
Jan  5 02:25:36 opal kernel: [<ffffffff8105054c>] ? process_one_work+0x14c/0x450
Jan  5 02:25:36 opal kernel: [<ffffffff81050c8f>] ? worker_thread+0x13f/0x4d0
Jan  5 02:25:36 opal kernel: [<ffffffff81050b50>] ? manage_workers+0x300/0x300
Jan  5 02:25:36 opal kernel: [<ffffffff81050b50>] ? manage_workers+0x300/0x300
Jan  5 02:25:36 opal kernel: [<ffffffff810578de>] ? kthread+0x9e/0xb0
Jan  5 02:25:36 opal kernel: [<ffffffff81435ac4>] ? kernel_thread_helper+0x4/0x10
Jan  5 02:25:36 opal kernel: [<ffffffff81057840>] ? kthread_freezable_should_stop+0x60/0x60
Jan  5 02:25:36 opal kernel: [<ffffffff81435ac0>] ? gs_change+0x13/0x13
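
Both traces appear to pass through reiserfs journal code (queue_log_writer, reiserfs_commit_for_inode, do_journal_end), which fits the impression that the stuck tasks are waiting on a write. The 120-second limit comes from /proc/sys/kernel/hung_task_timeout_secs, and writing 0 there only silences the warning; it does not release the task. When the log is long, it can help to pull out which task hung and which kernel functions its trace goes through. The following is a minimal sketch with hypothetical helper names (summarize_hung_tasks, storage_frames); it assumes one kernel message per line, as syslog writes them (mail clients often wrap long lines), and the keyword list used to flag storage-related frames is an assumption, not a kernel convention.

```python
import re

# "INFO: task mysqld:11387 blocked for more than 120 seconds."
TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0"; the "? " marks a frame
# the unwinder is unsure about and may be absent.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?:\? )?(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Hypothetical heuristic: frame names that suggest filesystem or journal I/O.
STORAGE_KEYWORDS = ("reiserfs", "journal", "fsync")


def summarize_hung_tasks(lines):
    """Collect each hung task together with the function names in its trace."""
    tasks = []
    current = None
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            current = {"comm": m.group("comm"), "pid": int(m.group("pid")),
                       "secs": int(m.group("secs")), "frames": []}
            tasks.append(current)
            continue
        f = FRAME_RE.search(line)
        if f and current is not None:
            # Frames belong to the most recent "INFO: task ..." header.
            current["frames"].append(f.group("func"))
    return tasks


def storage_frames(task):
    """Return the frames whose names contain one of STORAGE_KEYWORDS."""
    return [fn for fn in task["frames"] if any(k in fn for k in STORAGE_KEYWORDS)]
```

For example, iterating over an open /var/log/messages (the path depends on the syslog setup) and printing task["comm"], task["pid"] and storage_frames(task) gives a compact view of where each hung task is blocked.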

I think it only occurs when I am using the machine in graphical mode (NVidia
binary drivers), but I am not positive. I have rebuilt the system, assuming some
corruption after the disk restore, and built a new kernel, but it makes no
difference. The only sure thing is that this never happened before the new disk.
The tracebacks do seem to indicate it's trying to write, but I did manage to
write a small file to a partition, so the filesystem seems OK. Once this
happens the system is toast: sync, reboot and umount commands just hang, and
only Alt-SysRq-B does anything. I would be grateful for any suggestions!
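
Processes that hang like this are normally in uninterruptible sleep, the "D" state shown in the mysqld and kworker/1:1 lines of the kernel report. A minimal sketch for listing such tasks from /proc follows. It assumes Linux, where /proc/<pid>/stat begins with "pid (comm) state"; d_state_tasks is a hypothetical name, and proc_root is a parameter only so the function can be pointed at a test directory.

```python
import os


def d_state_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                data = fh.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The line starts "pid (comm) state ...". comm may itself contain
        # spaces or ')', so split at the LAST ')' rather than on whitespace.
        head, sep, tail = data.rpartition(")")
        if not sep or "(" not in head:
            continue  # unexpected format; ignore rather than crash
        comm = head.split("(", 1)[1]
        fields = tail.split()
        if fields and fields[0] == "D":
            stuck.append((int(entry), comm))
    return sorted(stuck)
```

If the magic SysRq key is enabled, Alt-SysRq-W (or, as root, echo w > /proc/sysrq-trigger) asks the kernel itself to log the stacks of all blocked tasks, which is often more useful than a bare list of PIDs.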

TIA
-Robin
-- 
----------------------------------------------------------------------
Robin Atwood.

"Ship me somewheres east of Suez, where the best is like the worst,
 Where there ain't no Ten Commandments an' a man can raise a thirst"
         from "Mandalay" by Rudyard Kipling
----------------------------------------------------------------------

* Re: [gentoo-user] Processes hang - system dies
  2013-01-05 14:05 [gentoo-user] Processes hang - system dies Robin Atwood
@ 2013-01-08  1:20 ` Adam Carter
  2013-01-08 11:43   ` Robin Atwood
  0 siblings, 1 reply; 4+ messages in thread
From: Adam Carter @ 2013-01-08  1:20 UTC
  To: gentoo-user@lists.gentoo.org

On Sun, Jan 6, 2013 at 1:05 AM, Robin Atwood <robin.atwood@attglobal.net> wrote:

> I have a very severe problem after a recent disk replacement. After a few
> days running, all new processes just hang. The kernel reports:

My guess is disk failing or kernel bug. Install smartmontools and see if
smartctl -H <devicename> returns anything interesting.

What kernel are you using? Try 3.7.1 if you're not already using that.

* Re: [gentoo-user] Processes hang - system dies
  2013-01-08  1:20 ` Adam Carter
@ 2013-01-08 11:43   ` Robin Atwood
  2013-01-08 11:54     ` Kevin Chadwick
  0 siblings, 1 reply; 4+ messages in thread
From: Robin Atwood @ 2013-01-08 11:43 UTC
  To: gentoo-user

On Tuesday 08 January 2013, Adam Carter wrote:
> On Sun, Jan 6, 2013 at 1:05 AM, Robin Atwood <robin.atwood@attglobal.net> wrote:
> > I have a very severe problem after a recent disk replacement. After a few
> > days running, all new processes just hang. The kernel reports:
> My guess is disk failing or kernel bug. Install smartmontools and see if
> smartctl -H <devicename> returns anything interesting.
> 
> What kernel are you using? Try 3.7.1 if you're not already using that.

That's my feeling too, since smartd is reporting sectors failing by the dozen. 
However the smartctl -H test gave me a clean bill of health. The kernel is 
3.6.8, I have already upgraded with no improvement.
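
The apparent contradiction between smartd and smartctl -H is common. The overall health status is the drive's own verdict, and it typically only turns to FAILED once a pre-fail attribute's normalized value drops to its vendor threshold, so the raw counts of reallocated or pending sectors can climb for a long time while -H still reports PASSED. The attribute table printed by smartctl -A is usually more informative. Below is a minimal sketch that extracts the sector-related raw counts from it; the function names are hypothetical, the choice of attributes is an assumption (vendors differ), and the parser assumes the usual ten-column ATA attribute table.

```python
import subprocess

# Attributes whose raw values count bad, pending or unreadable sectors.
# This selection is an assumption; vendors differ in what they report.
SECTOR_ATTRS = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def read_smart_attributes(device):
    """Run `smartctl -A <device>` (normally needs root) and return its output."""
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # output is usable, so check=True is deliberately not used here.
    result = subprocess.run(["smartctl", "-A", device],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME -> integer raw value from the `smartctl -A` table."""
    attrs = {}
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("ID#"):
            in_table = True  # header row of the attribute table
            continue
        if not in_table:
            continue
        if not line.strip():
            in_table = False  # a blank line ends the table
            continue
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE;
        # RAW_VALUE may contain spaces, e.g. "36 (Min/Max 20/45)".
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw_first = fields[9].split()[0]
        if raw_first.isdigit():
            attrs[fields[1]] = int(raw_first)
    return attrs


def sector_warnings(attrs):
    """Return the sector-related attributes whose raw count is non-zero."""
    return {name: attrs[name] for name in SECTOR_ATTRS if attrs.get(name, 0) > 0}
```

For example, sector_warnings(parse_smart_attributes(read_smart_attributes("/dev/sda"))), with /dev/sda standing in for the suspect drive. A non-zero Current_Pending_Sector in particular means the drive holds sectors it could not read back.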

Cheers
-Robin
-- 
----------------------------------------------------------------------
Robin Atwood.

"Ship me somewheres east of Suez, where the best is like the worst,
 Where there ain't no Ten Commandments an' a man can raise a thirst"
         from "Mandalay" by Rudyard Kipling
----------------------------------------------------------------------

* Re: [gentoo-user] Processes hang - system dies
  2013-01-08 11:43   ` Robin Atwood
@ 2013-01-08 11:54     ` Kevin Chadwick
  0 siblings, 0 replies; 4+ messages in thread
From: Kevin Chadwick @ 2013-01-08 11:54 UTC
  To: gentoo-user

> > > I have a very severe problem after a recent disk replacement. After a few
> > > days running, all new processes just hang. The kernel reports:
> > My guess is disk failing or kernel bug. Install smartmontools and see if
> > smartctl -H <devicename> returns anything interesting.
> > 
> > What kernel are you using? Try 3.7.1 if you're not already using that.  
> 
> That's my feeling too, since smartd is reporting sectors failing by the dozen. 
> However the smartctl -H test gave me a clean bill of health. The kernel is 
> 3.6.8, I have already upgraded with no improvement.

Personally I wouldn't try changing anything initially if it worked
before the disk change.

I would try a read-write test of the disk, or use dd to write or read
many sectors, possibly under more than one OS and on more than one machine,
depending on what happens. Is SMART enabled in your BIOS?
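
The read half of such a surface test can be sketched in a few lines. This is a minimal, read-only sketch assuming Linux, with a hypothetical helper name, scan_readable; it never writes, because a write test on the live disk would destroy data. Reads go through the page cache, so data that is already cached will not touch the platters, and a failing region is only located to within chunk_size bytes. In practice dd if=/dev/sdX of=/dev/null bs=1M, or badblocks (whose default mode is read-only), does the same job.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` from start to end without writing.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the starting
    offset of every chunk whose read raised an OSError (typically EIO).
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # For block devices on Linux, seeking to the end yields the device size.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                bytes_read += len(os.pread(fd, length, offset))
            except OSError:
                bad_offsets.append(offset)
            offset += length
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Run as root against the whole device, e.g. scan_readable("/dev/sdX") with the real device name substituted; every offset returned marks a chunk whose read failed and is worth comparing with the sectors smartd reports.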

-- 
_______________________________________________________________________

'Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface'

(Doug McIlroy)
_______________________________________________________________________

