From: Duncan <1i5t5.duncan@cox.net>
To: gentoo-amd64@lists.gentoo.org
Subject: [gentoo-amd64] Re: RAID1 boot - no bootable media found
Date: Fri, 2 Apr 2010 09:43:22 +0000 (UTC)
Message-ID: <pan.2010.04.02.09.43.22@cox.net>
In-Reply-To: <o2q5bdc1c8b1004011157if9fb419ey3a777f4fd3743c46@mail.gmail.com>

Mark Knecht posted on Thu, 01 Apr 2010 11:57:47 -0700 as excerpted:

> A bit long in response. Sorry.
> 
> On Tue, Mar 30, 2010 at 11:56 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Mark Knecht posted on Tue, 30 Mar 2010 13:26:59 -0700 as excerpted:
>>
>>> [W]hen I change the hard drive boot does the old sdb become the new
>>> sda because it's what got booted? Or is the order still as it was?

>> That depends on your BIOS.

> It seems to be constant mapping meaning (I guess) that I need to change
> the drive specs in grub.conf on the second drive to actually use the
> second drive.
> 
> I made the titles for booting different for each grub.conf file to
> ensure I was really getting grub from the second drive. My sda grub boot
> menu says "2.6.33-gentoo booting from sda" on the first drive, sdb on
> the second drive, etc.

Making the titles different is a very good idea.  It's what I ended up 
doing too, as otherwise, it can get confusing pretty fast.

Something else you might want to do, to identify the drives at the 
grub boot prompt if something goes wrong or you're otherwise trying to 
boot from another drive, is to create a (probably empty) differently 
named file on each one, say grub.sda, grub.sdb, etc.

That way, if you end up at the boot prompt you can do a find /grub.sda 
(or /grub/grub.sda, or wherever you put it), and grub will return the 
list of drives containing that file, in this case just one, thus 
identifying your normal sda drive.

You can of course do something similar by cat-ing the grub.conf on each 
drive, since you're keeping your titles different, but that's a bit 
more involved than a simple find on the marker file when you're trying 
to get your bearings on which drive is which after something screws up.
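
For instance, at a grub-legacy shell it would look something like this 
(a sketch; the filenames are whatever markers you created, and the 
(hdX,Y) output is purely illustrative):

  grub> find /grub.sda
   (hd0,0)
  grub> find /grub.sdb
   (hd1,0)

Whichever (hdX,Y) comes back is grub's name for the drive/partition 
holding that marker file.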

>>
>> The point being... [using badblocks] it /is/ actually possible to
>> verify that they're working well before you fdisk/mkfs and load data.
>> Tho it does take awhile... days... on drives of modern size.
>>
> I'm trying badblocks right now on sdc. using
> 
> badblocks -v /dev/sdc
> 
> Maybe I need to do something more strenuous? It looks like it will be
> done in an hour or two. (i7-920 with SATA drives so it should be fast,
> as long as I'm not just reading the buffers or something like that.)
> 
> Roughly speaking 1TB read at 100MB/S should take 10,000 seconds or 2.7
> hours. I'm at 18% after 28 minutes so that seems about right. (With no
> errors so far assuming I'm using the right command)

I used the -w switch here, which actually goes over the disk a total of 
8 times, alternating a write pass with a read pass that verifies the 
written pattern, for four different patterns (0xaa, 0x55, 0xff, 0x00: 
alternating 10101010, alternating 01010101, all ones, all zeroes).

But that's "data destructive".  IOW, it effectively wipes the disk.  
Doing it when the disks were new, before I'd fdisked them let alone 
mkfs-ed and started loading data, was fine, but it's not something you 
do if there's unbacked-up data on them that you want to keep!
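
If you do want to run the destructive test on a new or otherwise 
expendable drive, it's just something like this (a sketch; replace sdX, 
and only run it on a disk holding nothing you want to keep, since -w 
overwrites everything):

  badblocks -wsv /dev/sdX

The -s just adds a progress indicator, which is nice on a test this 
long.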

Incidentally, that's not /quite/ the infamous US-DOD 7-pass wipe, as 
it's only 4 passes, but it should reasonably ensure against ordinary 
recovery, if you have reason to wipe your disks...  Well, except for 
any blocks the drive firmware has already detected as bad and remapped 
elsewhere; you can check the SMART data for that.  But a 4-pass wipe, 
as badblocks -w does, should certainly be good for the normal case, and 
is already way better than just an fdisk, which wipes nothing but the 
partition table!

But back to the timing.  Since the -w switch does a total of 8 passes 
(4 writes and 4 reads, alternating) while you're doing just one pass 
with -v, it'll obviously take about 8 times as long.  So 80K 
seconds... 22+ hours.

So I guess it's not days... just about a day.  (Probably something more, 
as the first part of the disk, near the outside edge, should go faster 
than the last part, so figure a bit over a day, maybe 30 hours...)


[8 second spin-down timeouts]

> Very true. Here is the same drive model I put in a new machine for my
> dad. It's been powered up and running Gentoo as a typical desktop
> machine for about 50 days. He doesn't use it more than about an hour a
> day on average. It's already hit 31K load/unload cycles. At 10% of 300K
> that about 1.5 years of life before I hit that spec. I've watched his
> system a bit and his system seems to add 1 to the count almost exactly
> every 2 minutes on average. Is that a common cron job maybe?

It's unlikely to be a cron job.  But check your logging, and check what 
sort of atime handling you're using on your mounts.  (relatime is the 
new kernel default, but the default was plain atime until relatively 
recently, 2.6.30 or .31 or some such.  noatime is recommended unless 
you have something that actually depends on atime: alpine is known to 
need it for mail, and some backup software uses it, but little else on 
a modern system will.  I always use noatime on my real disk mounts 
here, as opposed to, say, tmpfs.)  If there's something writing to the 
log every two minutes or less, and the buffers are set to time out 
dirty data and flush it to disk every two minutes...  And with atime 
turned on, simply accessing a file changes its atime, which forces a 
write to disk to update it, with those dirty buffers again flushed 
every X minutes or seconds.
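
As a sketch of what I mean (the device, mountpoint and filesystem are 
just examples, adjust to taste):

  # /etc/fstab: noatime on the real-disk mounts
  /dev/sda3   /home   ext3   defaults,noatime   0 2

You can also look at how often the kernel flushes dirty buffers; the 
values are in centiseconds:

  cat /proc/sys/vm/dirty_writeback_centisecs
  cat /proc/sys/vm/dirty_expire_centisecs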

> I looked up the spec on all three WD lines - Green, Blue and Black. All
> three were 300K cycles. This issue has come up on the RAID list. It
> seems that some other people are seeing this and aren't exactly sure
> what Linux is doing to cause this.

It's probably not just Linux, but a combination of Linux and the drive 
defaults.
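
One thing worth trying from userspace, at least on drives that actually 
honor the ATA power-management settings (a sketch; replace sdX, and 
note that some of the greener WD models reportedly ignore APM and need 
WD's own idle-timer utility instead):

  # least-aggressive APM short of turning it off; usually stops the
  # quick head parking on drives that implement APM
  hdparm -B 254 /dev/sdX

  # disable the standby (spin-down) timeout entirely
  hdparm -S 0 /dev/sdX

These settings generally don't survive a power cycle, so if they help, 
add them to a boot script.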

> I'll study hdparm and BIOS when I can reboot.
> 
> My dad's current data:

> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED 
> WHEN_FAILED RAW_VALUE

>   4 Start_Stop_Count        0x0032   100   100   000    Old_age
> Always       -       21

>   9 Power_On_Hours          0x0032   099   099   000    Old_age
> Always       -       1183

>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age
> Always       -       20

> 193 Load_Cycle_Count        0x0032   190   190   000    Old_age Always  
>     -       31240

Here are my comparable numbers, from a several-years-old Seagate 
7200.8 series drive:

  4 Start_Stop_Count        0x0032   100   100   020    Old_age   
Always       -       996

  9 Power_On_Hours          0x0032   066   066   000    Old_age   
Always       -       30040

 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   
Always       -       1045

Note that I don't have #193, the load-cycle count, at all.  There are a 
couple of different technologies here.  The ramp-type load/unload yours 
use is typical of the smaller 2.5" laptop drives.  Those are designed 
for far shorter idle/standby timeouts and are thus rated for far more 
load cycles, typically 300,000 to 600,000.  Standard desktop/server 
drives use a contact-park method with a much lower start/stop rating, 
typically 50,000 or so.  That's the difference.
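
Both of the attribute tables above come straight out of smartctl, BTW.  
To watch just the interesting counters without the full dump, something 
like this works (sdX being whichever drive):

  smartctl -A /dev/sdX | egrep 'Start_Stop|Power_On|Power_Cycle|Load_Cycle'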

At 300,000 load cycle count rating, your WDs are at the lower end of the 
ramp-type ratings, but still far higher than comparable contact power-
cycle ratings.  Even tho the ramp-type they use is good for far more 
cycles, as you mentioned, you're already at 10% after only 50 days.

My old Seagates, OTOH, are about 4.5 years old, best I can figure (I 
bought them around October; 30K operating hours is ~3.5 years, and I 
run them most but not all of the time, so 4.5 years is a good 
estimate).  They're rated for only 50,000 contact start/stop cycles 
(they're NOT the ramp type), but SMART says only about 1,000, or 2% of 
the rating, is gone.  (If you check the threshold, they seem to 
recommend replacing at 20%, assuming that's a percentage, which looks 
likely, but either way, that's a metric I don't need to worry about any 
time soon.)

OTOH, at 30,000+ operating hours (about 3.5 years if on constantly, as 
I mentioned above), that attribute is running rather lower.  Again 
assuming it's a percentage metric, it would appear they rate them at 
90,000 hours.  (I looked up the spec sheets tho, and couldn't see 
anything like that listed, only a 5-year lifetime and warranty, which 
would be about half that, 45,000 hours.  But given the 0 threshold 
there, it appears they expect the start/stop cycles to be the more 
critical metric, so they may indeed rate it at 90,000 operating hours.)  
That 30,000 hours would be three and a half years of use straight thru, 
so yeah, I've had them probably four and a half years now, probably 
five come October.  I don't have them spin down at all, and I often 
leave the system on for days at a time but not /all/ the time, so 3.5 
years of use in 4.5 years sounds reasonable.

> Yeah, that's important. Thanks. If I can solve all these RAID problems
> then maybe I'll look at adding RAID to his box with better drives or
> something.

One thing they recommend with RAID, which I did NOT do, BTW, and which 
I'm beginning to worry about since I'm approaching the end of my 5-year 
warranties, is buying either different brands or models, or at least 
making sure you get different lot numbers of the same model.  The idea 
is that if they're all the same model and lot number, and they're all 
part of the same RAID and thus in similar operating conditions, they're 
all likely to fail pretty close together.  That's one reason to be glad 
I'm running 4-way RAID-1, I suppose: one hopes that when they start 
going, even if they are the same model and lot number, at least one of 
the four can hang on long enough for me to buy replacements and 
transfer the critical data.

But I also have an external 1 TB USB drive, kept off most of the time 
as opposed to the RAID disks which are on most of the time, that 
carries a further backup in addition to the backups on the RAIDs, tho 
the external one isn't synced as regularly.  In the event all four RAID 
drives die on me, I HAVE test-booted from a USB thumb drive (the 
external 1 TB isn't bootable; good thing I tested, eh!) into the 
external 1 TB, and CAN recover from it if I HAVE to.

> Note that on my system only I'm seeing real problems in
> /var/log/message, non-RAID, like 1000's of these:
> 
> Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3
> Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46309336 on sda3
> Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567488 on sda3
> Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567680 on sda3
> 
> or
> 
> Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555752 on
> sda3
> Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555760 on
> sda3
> Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555768 on
> sda3
> Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555776 on
> sda3

That doesn't look so good...

> However I see NONE of that on my dad's machine using the same drive but
> different chipset.
> 
> The above problems seem to result in this sort of problem when I try
> going with RAID as I tried again this morning:
> 
> INFO: task kjournald:5064 blocked for more than 120 seconds. "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[snipped the trace]

Ouch!  Blocked for 2 minutes...

Yes, between the logs and the 2-minute hung-task, that does look like some 
serious issues, chipset or other...

Speaking of which...

Can you try different SATA cables?  I'm assuming you and your dad aren't 
using the same cables.  Maybe it's the cables, not the chipset.

Also, consider slowing the data down.  Disable UDMA or reduce it to a 
lower speed, or check the pinouts and try jumpering OPT1 to force 
SATA-1 speed (150 MB/sec instead of 300 MB/sec), as detailed here:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1337

If that solves the issue, then you know it's related to signal timing.  
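
To see what link speed the drive actually negotiated, before and after 
jumpering, the kernel logs it at device detection time (the exact 
wording varies a bit by kernel version):

  dmesg | grep -i 'SATA link up'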

Unfortunately, this can be mobo related.  I had very similar issues 
with memory at one point, and had to slow it down from the rated PC3200 
to PC3000 speed (declock it from 200 MHz to 183 MHz) in the BIOS.  
Unfortunately, the BIOS initially didn't have a setting for that; it 
wasn't until a BIOS update that I got one.  Until I got the update and 
declocked it, the machine would work most of the time, but it was 
borderline.  The thing was, the memory was solid, and tested as such in 
memtest86+, but that tests memory cells, not speed, and at the rated 
speed that memory and that board just didn't like each other, so there 
were occasional issues (bunzip2 erroring out due to checksum mismatch 
was a common one, plus occasional crashes).  Ultimately, I fixed the 
problem when I upgraded the memory.

So having experienced the issue with memory, I know exactly how 
frustrating it can be.  But if you slow it down with the jumper and it 
works, then you can try different cables, or take off the jumper and try 
lower UDMA speeds (but still higher than SATA-1/150MB/sec), using hdparm 
or something.  Or exchange either the drives or the mobo, if you can, or 
buy an add-on SATA card and disable the onboard one.

Oh, and double-check the kernel driver you are using for it as well.  
Maybe there's another that'll work better, or driver options you can feed 
to it, or something.
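
To see which driver ended up bound to the controller, something like 
this works (a sketch; the grep is just a convenience):

  lspci -k | grep -i -A 3 'sata'

The "Kernel driver in use" line is the one to look at.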

Oh, and if you didn't re-fdisk, re-create the md devices, re-mkfs, and 
reload the system from backup after you switched to AHCI, try that.  
AHCI and the kernel driver for it are almost certainly what you want, 
not compatibility mode, but the switch itself could potentially screw 
things up too, if you changed it and didn't redo the disks afterward.

I do wish you luck!  Seeing those errors brought back BAD memories of the 
memory problems I had, so while yours is disk not memory, I can definitely 
sympathize!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



