From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robin Atwood
To: gentoo-user@lists.gentoo.org
Subject: [gentoo-user] Processes hang - system dies
Date: Sat, 5 Jan 2013 21:05:30 +0700
User-Agent: KMail/1.13.7 (Linux/3.6.8-gentoo; KDE/4.9.4; x86_64; ; )
List-Id: Gentoo Linux mail
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="Boundary-01=_qMD6Q9BWdSJo8bR"
Content-Transfer-Encoding: 7bit
Message-Id: <201301052105.30959.robin.atwood@attglobal.net>

--Boundary-01=_qMD6Q9BWdSJo8bR
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: 7bit

I have a very severe problem after a recent disk replacement. After a few days running, all new processes just hang. The kernel reports:

 

Jan 5 02:25:36 opal kernel: INFO: task mysqld:11387 blocked for more than 120 seconds.
Jan 5 02:25:36 opal kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 5 02:25:36 opal kernel: mysqld D 0000000000000000 0 11387 1 0x00000000
Jan 5 02:25:36 opal kernel: ffff880012caccc0 0000000000000082 0000000000011280 ffff88012f08c660
Jan 5 02:25:36 opal kernel: 0000000000011280 ffff88012920dfd8 0000000000011280 ffff88012920c010
Jan 5 02:25:36 opal kernel: ffff88012920dfd8 0000000000011280 ffff880012caccc0 0000000000011280
Jan 5 02:25:36 opal kernel: Call Trace:
Jan 5 02:25:36 opal kernel: [<ffffffff810b9caf>] ? find_get_pages_tag+0xef/0x1a0
Jan 5 02:25:36 opal kernel: [<ffffffff8102c455>] ? default_spin_lock_flags+0x5/0x10
Jan 5 02:25:36 opal kernel: [<ffffffff8143401b>] ? _raw_spin_lock_irqsave+0x3b/0x60
Jan 5 02:25:36 opal kernel: [<ffffffff81046ec3>] ? lock_timer_base+0x33/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff8107a609>] ? debug_mutex_add_waiter+0x29/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff814312cf>] ? __mutex_lock_slowpath+0x22f/0x310
Jan 5 02:25:36 opal kernel: [<ffffffff8102c455>] ? default_spin_lock_flags+0x5/0x10
Jan 5 02:25:36 opal kernel: [<ffffffff8143401b>] ? _raw_spin_lock_irqsave+0x3b/0x60
Jan 5 02:25:36 opal kernel: [<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0
Jan 5 02:25:36 opal kernel: [<ffffffff81066a80>] ? try_to_wake_up+0x2b0/0x2b0
Jan 5 02:25:36 opal kernel: [<ffffffff81192e10>] ? reiserfs_commit_for_inode+0x140/0x230
Jan 5 02:25:36 opal kernel: [<ffffffff81179e87>] ? reiserfs_sync_file+0x97/0x120
Jan 5 02:25:36 opal kernel: [<ffffffff811290b1>] ? do_fsync+0x31/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff810ff76c>] ? sys_pwrite64+0x7c/0xb0
Jan 5 02:25:36 opal kernel: [<ffffffff8112911b>] ? sys_fsync+0xb/0x20
Jan 5 02:25:36 opal kernel: [<ffffffff81434a39>] ? system_call_fastpath+0x16/0x1b
Jan 5 02:25:36 opal kernel: INFO: task kworker/1:1:27685 blocked for more than 120 seconds.
Jan 5 02:25:36 opal kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 5 02:25:36 opal kernel: kworker/1:1 D ffff880005ee5980 0 27685 2 0x00000000
Jan 5 02:25:36 opal kernel: ffff880005ee5980 0000000000000046 0000000000000000 ffff880128354660
Jan 5 02:25:36 opal kernel: 0000000000011280 ffff8801018e3fd8 0000000000011280 ffff8801018e2010
Jan 5 02:25:36 opal kernel: ffff8801018e3fd8 0000000000011280 ffff880005ee5980 0000000000011280
Jan 5 02:25:36 opal kernel: Call Trace:
Jan 5 02:25:36 opal kernel: [<ffffffff81066a80>] ? try_to_wake_up+0x2b0/0x2b0
Jan 5 02:25:36 opal kernel: [<ffffffff8106ac92>] ? load_balance+0x102/0x790
Jan 5 02:25:36 opal kernel: [<ffffffff8105fbd0>] ? __wake_up_common+0x50/0x80
Jan 5 02:25:36 opal kernel: [<ffffffff8107a609>] ? debug_mutex_add_waiter+0x29/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff814312cf>] ? __mutex_lock_slowpath+0x22f/0x310
Jan 5 02:25:36 opal kernel: [<ffffffff8107a609>] ? debug_mutex_add_waiter+0x29/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff814312cf>] ? __mutex_lock_slowpath+0x22f/0x310
Jan 5 02:25:36 opal kernel: [<ffffffff8102c455>] ? default_spin_lock_flags+0x5/0x10
Jan 5 02:25:36 opal kernel: [<ffffffff8143401b>] ? _raw_spin_lock_irqsave+0x3b/0x60
Jan 5 02:25:36 opal kernel: [<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0
Jan 5 02:25:36 opal kernel: [<ffffffff81066a80>] ? try_to_wake_up+0x2b0/0x2b0
Jan 5 02:25:36 opal kernel: [<ffffffff8118f3f6>] ? do_journal_end+0x1d6/0xf00
Jan 5 02:25:36 opal kernel: [<ffffffff8117ff20>] ? reiserfs_sync_fs+0x70/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff8117ff00>] ? reiserfs_sync_fs+0x50/0x70
Jan 5 02:25:36 opal kernel: [<ffffffff8117ff5e>] ? flush_old_commits+0x3e/0x60
Jan 5 02:25:36 opal kernel: [<ffffffff8105054c>] ? process_one_work+0x14c/0x450
Jan 5 02:25:36 opal kernel: [<ffffffff81050c8f>] ? worker_thread+0x13f/0x4d0
Jan 5 02:25:36 opal kernel: [<ffffffff81050b50>] ? manage_workers+0x300/0x300
Jan 5 02:25:36 opal kernel: [<ffffffff81050b50>] ? manage_workers+0x300/0x300
Jan 5 02:25:36 opal kernel: [<ffffffff810578de>] ? kthread+0x9e/0xb0
Jan 5 02:25:36 opal kernel: [<ffffffff81435ac4>] ? kernel_thread_helper+0x4/0x10
Jan 5 02:25:36 opal kernel: [<ffffffff81057840>] ? kthread_freezable_should_stop+0x60/0x60
Jan 5 02:25:36 opal kernel: [<ffffffff81435ac0>] ? gs_change+0x13/0x13

 

I think it only occurs when I am using the machine in graphical mode (NVidia binary drivers), but I am not positive. Assuming some corruption after the disk restore, I have rebuilt the system and built a new kernel, but it makes no difference. The only sure thing is that this never happened before the new disk. The tracebacks do seem to indicate it is trying to write (both blocked tasks show reiserfs journal functions in their traces), but I did manage to write a small file to a partition, so the file system seems OK. Once this happens the system is toast: sync, reboot and umount commands just hang, and only Alt-SysRq-B does anything. I would be grateful for any suggestions!
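
For anyone comparing reports like the ones above, the following is a minimal Python sketch that groups "blocked for more than N seconds" messages by task and lists the function names in each call trace. It assumes syslog lines in the format shown in this message; `summarize_hung_tasks`, `HUNG_RE`, `FRAME_RE` and `SAMPLE` are illustrative names, not part of any kernel tooling, and the sample is a shortened excerpt of the log above.

```python
import re
from collections import OrderedDict

# "INFO: task NAME:PID blocked for more than N seconds."
# NAME may itself contain colons (e.g. "kworker/1:1"), so match greedily.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One call-trace frame, e.g. "[<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0".
# The address is optional in case it was stripped from the log.
FRAME_RE = re.compile(
    r"\[<?[0-9a-f]*>?\]\s+(?:\?\s+)?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_hung_tasks(log_text):
    """Return {(task_name, pid): [function names in trace order]}."""
    reports = OrderedDict()
    current = None
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            # A new hung-task report starts here.
            current = (hung.group("name"), int(hung.group("pid")))
            reports[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            reports[current].append(frame.group("func"))
    return reports


# Shortened excerpt of the log in this message (assumption: same line format).
SAMPLE = """\
Jan 5 02:25:36 opal kernel: INFO: task mysqld:11387 blocked for more than 120 seconds.
Jan 5 02:25:36 opal kernel: Call Trace:
Jan 5 02:25:36 opal kernel: [<ffffffff8118cd81>] ? queue_log_writer+0x91/0xe0
Jan 5 02:25:36 opal kernel: [<ffffffff81192e10>] ? reiserfs_commit_for_inode+0x140/0x230
Jan 5 02:25:36 opal kernel: [<ffffffff81179e87>] ? reiserfs_sync_file+0x97/0x120
Jan 5 02:25:36 opal kernel: INFO: task kworker/1:1:27685 blocked for more than 120 seconds.
Jan 5 02:25:36 opal kernel: [<ffffffff8118f3f6>] ? do_journal_end+0x1d6/0xf00
Jan 5 02:25:36 opal kernel: [<ffffffff8117ff00>] ? reiserfs_sync_fs+0x50/0x70
"""

if __name__ == "__main__":
    for (name, pid), funcs in summarize_hung_tasks(SAMPLE).items():
        fs_frames = [f for f in funcs if f.startswith("reiserfs_")]
        print(f"{name} (pid {pid}): {len(funcs)} frames, reiserfs frames: {fs_frames}")
```

Frames marked with "?" in a kernel trace are not guaranteed to be part of the real call chain, so a summary like this is only a starting point for narrowing down where the tasks are waiting.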

 

TIA
-Robin
--
----------------------------------------------------------------------
Robin Atwood.

"Ship me somewheres east of Suez, where the best is like the worst,
Where there ain't no Ten Commandments an' a man can raise a thirst"
from "Mandalay" by Rudyard Kipling
----------------------------------------------------------------------

--Boundary-01=_qMD6Q9BWdSJo8bR--