Date: Fri, 21 Jun 2013 11:50:43 -0700
From: "Gary E. Miller"
To: gentoo-amd64@lists.gentoo.org
Cc: markknecht@gmail.com
Subject: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Message-ID: <20130621115043.32b99d94.gem@rellim.com>
Organization: Rellim

Yo Mark!

On Fri, 21 Jun 2013 11:38:00 -0700
Mark Knecht wrote:

> On the read side I'm not sure if I'm understanding your point. I agree
> that a so-designed RAID1 system could/might read smaller portions of a
> larger read from RAID1 drives in parallel, taking some data from one
> drive and some from another drive, and then only take corrective
> action if one of the drives had troubles. However I don't know
> that mdadm-based RAID1 does anything like that. Does it?

It surely does. I have confirmed that at least monthly since md has
existed in the kernel.

> It seems to me that unless I at least _request_ all data from all
> drives and minimally compare at least some error flag from the
> controller telling me one drive had trouble reading a sector then how
> do I know if anything bad is happening?

Correct. You can't tell if you can read something without trying to
read it. Which is why you should do a full raid rebuild every week.

>
> Or maybe you're saying it's RAID1 and I don't know if anything bad is
> happening _unless_ I do a scrub and specifically check all the drives
> for consistency?

No. A simple read will find the problem. But given it is RAID1 the only
way to be sure to read from both drives is a raid rebuild.

> I do mdadm scrubs at least once a week. I still do them by hand. They
> have never appeared terribly expensive watching top or iotop but
> sometimes when I'm watching NetFlix or Hulu in a VM I get more pauses
> when the scrub is taking place, but it's not huge.

Which is why you should cron them at oh-dark-thirty.

>
> I agree that RAID5 gives you an opportunity to get things fixed, but
> there are folks who lose a disk in a RAID5, start the rebuild, and
> then lose a second disk during the rebuild.

Because they failed to do weekly rebuilds.

> Not that I would ever run the array degraded but that I
> could still tolerate a second loss while the rebuild was happening and
> hopefully get by.

Sadly most people make their RAID5 or RAID6 out of brand new,
consecutively serial numbered drives. They then get exactly the same
temp, voltage, humidity, seek stress until they all fail within days of
each other. I have personally seen 4 of 5 drives in a RAID5 fail within
3 days many times. Usually on a Friday where the tech decides the drive
replacement can wait until Monday.

Your only protection against a full RAIDx failure is an offsite backup.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
gem@rellim.com Tel:+1(541)382-8588
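
The advice to cron the weekly scrub rather than run it by hand could be
sketched roughly as follows. This is only an illustration, not anything
posted in the thread: it assumes the kernel's md sysfs files
(/sys/block/<array>/md/sync_action and mismatch_cnt), an array named md0
(a placeholder), and root privileges. The function names are made up for
this example. Writing "check" asks md to read every member and compare
them, which is what the thread calls a scrub.

```python
#!/usr/bin/env python3
"""Sketch: start an md consistency check and report the mismatch count."""
import sys
import time
from pathlib import Path

SYSFS_ROOT = Path("/sys/block")  # usual sysfs location for block devices


def md_dir(array, root=SYSFS_ROOT):
    """Directory holding the md control files for one array."""
    return Path(root) / array / "md"


def start_check(array, root=SYSFS_ROOT):
    """Request a check of all members; return False if the array is busy."""
    action = md_dir(array, root) / "sync_action"
    if action.read_text().strip() != "idle":
        return False  # a check, resync or recovery is already running
    action.write_text("check\n")
    return True


def wait_until_idle(array, root=SYSFS_ROOT, poll_seconds=60):
    """Poll sync_action until md reports the array idle again."""
    action = md_dir(array, root) / "sync_action"
    while action.read_text().strip() != "idle":
        time.sleep(poll_seconds)


def mismatch_count(array, root=SYSFS_ROOT):
    """Value md recorded in mismatch_cnt during the last check."""
    return int((md_dir(array, root) / "mismatch_cnt").read_text().strip())


if __name__ == "__main__":
    array = sys.argv[1] if len(sys.argv) > 1 else "md0"  # placeholder name
    if not start_check(array):
        sys.exit(f"{array}: array is busy, not starting a check")
    wait_until_idle(array)
    print(f"{array}: mismatch_cnt = {mismatch_count(array)}")
```

Saved under a hypothetical path such as /usr/local/sbin/md_scrub.py, a
root crontab line like `30 3 * * 0 /usr/local/sbin/md_scrub.py md0` would
run it every Sunday at 03:30, well away from the evening video watching.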