From: Marc Joliet
To: gentoo-amd64@lists.gentoo.org
Date: Wed, 28 May 2014 21:20:18 +0200
Subject: Re: [gentoo-amd64] Soliciting new RAID ideas

On Wed, 28 May 2014 08:26:58 -0700, Bob Sanders wrote:

> Marc Joliet, mused, then expounded:
> > On Tue, 27 May 2014 15:39:38 -0700, Bob Sanders wrote:
> >
> > While I am far from a filesystem/storage expert (I see myself as a mere user),
> > the cited threads lead me to believe that this is most likely an
> > overhyped/misunderstood class of errors (e.g., posts [1] and [2]), so I would
> > suggest reading them in their entirety.
> >
> > [0] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31832
> > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31871
> > [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31877
> > [3] http://comments.gmane.org/gmane.comp.file-systems.btrfs/31821
>
> FWIW - here's the FreeNAS ZFS ECC discussion on what happens with a bad
> memory bit and no ECC memory:
>
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/

Thanks for explicitly linking that. I didn't read it the first time around,
but I have now read through most of it and reread threads [0] and [3] above,
and I *think* that I understand the problem (and why it doesn't apply to
BTRFS) better now.

IIUC, the claim is: data is written to disk, but it must obviously go through
RAM first, where it is corrupted (due to a permanent bit flip caused, e.g., by
deteriorating hardware). At some later point, when the data is read back from
disk, it might happen to be loaded around the damaged location in RAM, where
it is further corrupted. At this point the checksum fails, and ZFS corrects
the data in RAM (using parity information!), where it is immediately
corrupted again (because apparently it is corrected at the same physical
location in RAM? Perhaps this is specific to correction via parity?). This
*additionally* corrupted data is then written back to disk (without any
further checks).

So the point is that, apparently, without ECC RAM, you could get a (long-term)
cascade of errors, especially during a scrub.
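
To make the claimed chain of events concrete, here is a toy Python sketch. It
is purely illustrative and is not based on the actual ZFS or BTRFS code: the
stuck bit, the through_bad_ram helper, the dict-based "disk" and the
verify_before_write switch are all assumptions made up for this example, and a
plain mirror copy stands in for parity reconstruction to keep it short.

```python
import zlib

STUCK_MASK = 0x01  # hypothetical fault: bit 0 of byte 0 is stuck at 1


def through_bad_ram(data: bytes) -> bytes:
    """Model a faulty buffer: byte 0 of anything loaded has bit 0 forced on."""
    return bytes([data[0] | STUCK_MASK]) + data[1:]


def scrub(disk: dict, verify_before_write: bool) -> None:
    """Scrub one block; `disk` holds 'data', 'mirror' and 'crc' of good data."""
    loaded = through_bad_ram(disk["data"])
    if zlib.crc32(loaded) == disk["crc"]:
        return  # checksum matches, nothing to repair
    # Checksum mismatch: fetch the redundant copy, which has to pass
    # through the same faulty buffer and is corrupted in the same way.
    repaired = through_bad_ram(disk["mirror"])
    if verify_before_write and zlib.crc32(repaired) != disk["crc"]:
        return  # the "repair" is itself bad, so leave the disk untouched
    disk["data"] = repaired  # unverified write-back of the bad "repair"


# Byte 0 has bit 0 clear, so the stuck bit actually changes this block.
good = b"\x40hello"


def run(verify_before_write: bool) -> bool:
    """Return True if the on-disk block is still intact after one scrub."""
    disk = {"data": good, "mirror": good, "crc": zlib.crc32(good)}
    scrub(disk, verify_before_write)
    return disk["data"] == good
```

In this toy model, run(False) returns False (the scrub damages a block that
was fine on disk), while run(True) returns True (the bad repair is detected
and never written). Whether either filesystem really behaves like one of the
two branches is exactly what the linked threads argue about.
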
The likelihood of such permanent RAM corruption happening in the first place
is another question entirely.

The various posts in [0] then basically say that regardless of whether this
really is true of ZFS, it certainly doesn't apply to BTRFS, for various
reasons. I suppose this quote from [1] (see above) says it most clearly:

> In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449, they talk about
> reconstructing corrupted data from parity information:
>
> > Ok, no problem. ZFS will check against its parity. Oops, the parity failed since we have a new corrupted
> > bit. Remember, the checksum data was calculated after the corruption from the first memory error
> > occurred. So now the parity data is used to "repair" the bad data. So the data is "fixed" in RAM.
>
> i.e. that there is parity information stored with every piece of data, and ZFS will "correct" errors
> automatically from the parity information. I start to suspect that there is confusion here between
> checksumming for data integrity and parity information. If this is really how ZFS works, then if memory
> corruption interferes with this process, then I can see how a scrub could be devastating. I don't know if
> ZFS really works like this. It sounds very odd to do this without an additional checksum check. This sounds
> very different to what you say below that btrfs does, which is only to check against redundantly-stored
> copies, which I agree sounds much safer.

The rest is also relevant, but I think the key to understanding the
(supposed) difference in behaviour between ZFS and BTRFS is that the data is
corrected via parity information, as opposed to using a known-good redundant
copy of the data (a point I originally missed, which is why I got confused).

All this assumes, of course, that the FreeNAS forum post that ignited this
discussion is correct in the first place.

> Thanks Mark! Interesting discussion on btrfs.
>
> Bob

You're welcome!
I agree, it's an interesting discussion. And regarding the misspelling of my
name: no problem :-)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup