From: Mark Knecht <markknecht@gmail.com>
To: gentoo-amd64@lists.gentoo.org
Subject: Re: [gentoo-amd64] Re: RAID1 boot - no bootable media found
Date: Thu, 1 Apr 2010 11:57:47 -0700
Message-ID: <o2q5bdc1c8b1004011157if9fb419ey3a777f4fd3743c46@mail.gmail.com>
In-Reply-To: <pan.2010.03.31.06.56.05@cox.net>
A bit long in response. Sorry.
On Tue, Mar 30, 2010 at 11:56 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Mark Knecht posted on Tue, 30 Mar 2010 13:26:59 -0700 as excerpted:
>
>> I've set up a duplicate boot partition on sdb and it boots. However one
>> thing I'm unsure if when I change the hard drive boot does the old sdb
>> become the new sda because it's what got booted? Or is the order still
>> as it was? The answer determines what I do in grub.conf as to which
>> drive I'm trying to use. I can figure this out later by putting
>> something different on each drive and looking. Might be system/BIOS
>> dependent.
>
> That depends on your BIOS. My current system (the workstation, now 6+
> years old but still going strong as it was a $400+ server grade mobo) will
> boot from whatever disk I tell it to, but keeps the same BIOS disk order
> regardless -- unless I physically turn one or more of them off, of
> course. My previous system would always switch the chosen boot drive to
> be the first one. (I suppose it could be IDE vs. SATA as well, as the
> switcher was IDE, the stable one is SATA-1.)
>
> So that's something I guess you figure out for yourself. But it sounds
> like you're already well on your way...
>
The mapping seems to be constant, which means (I guess) that I need to
change the drive specs in grub.conf on the second drive so that it
actually uses the second drive.
I gave the boot titles in each grub.conf file a different name to
make sure I was really getting grub from the second drive. The grub
boot menu on the first drive says "2.6.33-gentoo booting from sda",
the one on the second drive says sdb, etc.
<SNIP>
>
> The point being... it /is/ actually possible to verify that they're
> working well before you fdisk/mkfs and load data. Tho it does take
> awhile... days... on drives of modern size.
>
I'm running badblocks on sdc right now, using
badblocks -v /dev/sdc
Maybe I need to do something more strenuous? It looks like it will be
done in an hour or two. (It's an i7-920 with SATA drives, so it should be
fast, as long as I'm not just reading the buffers or something like that.)
Roughly speaking, 1TB read at 100MB/s should take 10,000 seconds, or about
2.8 hours. I'm at 18% after 28 minutes, so that seems about right. (No
errors so far, assuming I'm using the right command.)
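The timing estimate above can be checked with a few lines of Python. This is only a sketch: the 1 TB, 100 MB/s, and 18%-after-28-minutes figures are taken from the message, while the decimal units, the constant-throughput assumption, and the function names are mine.

```python
# Back-of-the-envelope timing for a full read-only surface scan.
# Capacity, throughput, and progress figures come from the message;
# decimal (SI) units and constant throughput are assumptions.

TB = 10**12  # bytes in a decimal terabyte
MB = 10**6   # bytes in a decimal megabyte


def full_scan_seconds(capacity_bytes, bytes_per_second):
    """Time to read the whole device once at a constant throughput."""
    return capacity_bytes / bytes_per_second


def projected_total_minutes(elapsed_minutes, fraction_done):
    """Extrapolate total run time linearly from the progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done


nominal_s = full_scan_seconds(1 * TB, 100 * MB)
projected_min = projected_total_minutes(28, 0.18)

print(f"nominal:   {nominal_s:.0f} s = {nominal_s / 3600:.2f} h")
print(f"projected: {projected_min:.0f} min = {projected_min / 60:.2f} h")
```

Both numbers land between two and a half and three hours, so the observed progress is consistent with a real sequential read of the disk rather than a read served from cache.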
>>> 3) suspend the disks after a period of inactivity
>>
>> This could be part of what's going on, but I don't think it's the whole
>> story. My drives (WD Green 1TB drives) apparently park the heads after 8
>> seconds (yes 8 seconds!) of inactivity to save power. Each time it parks
>> it increments the Load_Cycle_Count SMART parameter. I've been tracking
>> this on the three drives in the system. The one I'm currently using is
>> incrementing while the 2 that sit unused until I get RAID going again
>> are not. Possibly there is something about how these drives come out of
>> park that creates large delays once in awhile.
>
> You may wish to take a second look at that, for an entirely /different/
> reason. If those are the ones I just googled on the WD site, they're
> rated 300K load/unload cycles. Take a look at your BIOS spin-down
> settings, and use hdparm to get a look at the disk's powersaving and
> spindown settings. You may wish to set the disks to something more
> reasonable, as with 8 second timeouts, that 300k cycles isn't going to
> last so long...
Very true. Here is the same drive model, which I put in a new machine for
my dad. It's been powered up and running Gentoo as a typical desktop
machine for about 50 days. He doesn't use it more than about an hour a
day on average. It's already hit 31K load/unload cycles. At 10% of
300K in 50 days, that's about 500 days (under 1.5 years) of life in
total before it hits that spec. I've watched his system a bit, and it
seems to add 1 to the count almost exactly every 2 minutes on average.
Is that a common cron job maybe?
I looked up the spec on all three WD lines - Green, Blue and Black.
All three were 300K cycles. This issue has come up on the RAID list.
It seems that some other people are seeing this and aren't exactly
sure what Linux is doing to cause this.
I'll study hdparm and BIOS when I can reboot.
My dad's current data:
gandalf ~ # smartctl -A /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   129   128   021    Pre-fail  Always       -       6525
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1183
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31240
194 Temperature_Celsius     0x0022   121   116   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
gandalf ~ #
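For anyone who wants to track this over time, here is a rough Python sketch that pulls Power_On_Hours and Load_Cycle_Count out of `smartctl -A` style text and extrapolates to the 300K rating mentioned above. The sample rows are copied from the output above, written one attribute per line. The helper name, the linear extrapolation, and the assumption that the machine stays powered on around the clock are mine, not anything from smartmontools.

```python
# Extrapolate load/unload wear from two SMART attributes.
# RATED_LOAD_CYCLES is the 300K figure quoted in the thread; everything
# else here is an illustrative assumption, not a smartmontools API.

RATED_LOAD_CYCLES = 300_000

SAMPLE = """\
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1183
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31240
"""


def raw_values(smart_text):
    """Map attribute name -> integer raw value for each attribute row."""
    values = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            values[fields[1]] = int(fields[9])
        except ValueError:
            continue  # skip raw values that are not plain integers
    return values


vals = raw_values(SAMPLE)
cycles = vals["Load_Cycle_Count"]
hours = vals["Power_On_Hours"]

minutes_per_cycle = hours * 60 / cycles
# Total powered-on days until the rated count, at the observed rate.
days_to_rated = RATED_LOAD_CYCLES / cycles * hours / 24

print(f"one load cycle every {minutes_per_cycle:.2f} minutes")
print(f"rated count reached after about {days_to_rated:.0f} powered-on days")
```

With these numbers the rate works out to a little over two minutes per cycle, which matches what I saw by watching the counter, and the rated count is reached after roughly 470 days of continuous power-on time.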
>
> You may recall a couple years ago when Ubuntu accidentally shipped with
> laptop mode (or something, IDR the details) turned on by default, and
> people were watching their drives wear out before their eyes. That's
> effectively what you're doing, with an 8-second idle timeout. Most laptop
> drives (2.5" and 1.8") are designed for it. Most 3.5" desktop/server
> drives are NOT designed for that tight an idle timeout spec, and in fact,
> may well last longer spinning at idle overnight, as opposed to shutting
> down every day even.
>
> I'd at least look into it, as there's no use wearing the things out
> unnecessarily. Maybe you'll decide to let them run that way and save the
> power, but you'll know about the available choices then, at least.
>
Yeah, that's important. Thanks. If I can solve all these RAID problems
then maybe I'll look at adding RAID to his box with better drives or
something.
Note that only on my system am I seeing real problems in
/var/log/messages, even without RAID, like 1000's of these:
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46309336 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567488 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567680 on sda3
or
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555752 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555760 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555768 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555776 on sda3
However I see NONE of that on my dad's machine using the same drive
but different chipset.
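With thousands of these lines, a per-process tally is easier to read than the raw log. The following Python sketch counts them by process, direction, and partition. The regular expression is inferred only from the sample lines above, so treat the line format as an assumption, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines: "comm(pid): OP block N on dev".
LINE_RE = re.compile(
    r"kernel: (?P<comm>[^\s()]+)\((?P<pid>\d+)\): "
    r"(?P<op>READ|WRITE) block (?P<block>\d+) on (?P<dev>\S+)"
)

SAMPLE_LOG = """\
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46309336 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555752 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555760 on sda3
"""


def summarize(log_text):
    """Count matching lines per (process name, operation, device)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match["comm"], match["op"], match["dev"])] += 1
    return counts


counts = summarize(SAMPLE_LOG)
for (comm, op, dev), n in counts.most_common():
    print(f"{n:6d}  {op:5s}  {dev}  {comm}")
```

To run it on a real log, read the file's contents into a string and pass that to `summarize` in place of `SAMPLE_LOG`.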
The above problems seem to lead to this sort of failure when I try
going with RAID, as I did again this morning:
INFO: task kjournald:5064 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880028351580 0 5064 2 0x00000000
ffff8801ac91a190 0000000000000046 0000000000000000 ffffffff81067110
000000000000dcf8 ffff880180863fd8 0000000000011580 0000000000011580
ffff88014165ba20 ffff8801ac89a834 ffff8801af920150 ffff8801ac91a418
Call Trace:
[<ffffffff81067110>] ? __alloc_pages_nodemask+0xfa/0x58c
[<ffffffff8129174a>] ? md_make_request+0xde/0x119
[<ffffffff810a9576>] ? sync_buffer+0x0/0x40
[<ffffffff81334305>] ? io_schedule+0x3e/0x54
[<ffffffff810a95b1>] ? sync_buffer+0x3b/0x40
[<ffffffff81334789>] ? __wait_on_bit+0x41/0x70
[<ffffffff810a9576>] ? sync_buffer+0x0/0x40
[<ffffffff81334823>] ? out_of_line_wait_on_bit+0x6b/0x77
[<ffffffff81040a66>] ? wake_bit_function+0x0/0x23
[<ffffffff8111f400>] ? journal_commit_transaction+0xb56/0x1112
[<ffffffff81334280>] ? schedule+0x8f4/0x93b
[<ffffffff81335e3d>] ? _raw_spin_lock_irqsave+0x18/0x34
[<ffffffff81040a38>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff81335bcc>] ? _raw_spin_unlock_irqrestore+0x12/0x2c
[<ffffffff8112278c>] ? kjournald+0xe2/0x20a
[<ffffffff81040a38>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff811226aa>] ? kjournald+0x0/0x20a
[<ffffffff81040665>] ? kthread+0x79/0x81
[<ffffffff81002c94>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810405ec>] ? kthread+0x0/0x81
[<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10
Thanks,
Mark