From: Brian Harring
To: gentoo-portage-dev@lists.gentoo.org
Date: Tue, 1 Jun 2010 20:42:00 -0700
Subject: Re: [gentoo-portage-dev] Package compression header for binhosts

On Tue, Jun 01, 2010 at 04:53:31PM -0700, Zac Medico wrote:
> On 06/01/2010 02:52 PM, Brian Harring wrote:
> > That bug isn't about a collision, it's about files being replaced underneath
> > Packages' feet. Even with the tricks you've leveled, the issue of things
> > changing underfoot is still possible; you've just made the race less
> > likely.
>
> AFAIK the race is completely eliminated by the RCU-like snapshot
> mechanism. I think I like your hash-in-the-filename idea better
> though, since it seems simpler to implement and maintain.

You're forgetting about how one actually updates the snapshot. A client
grabs the Packages cache and starts pulling binpkgs. During that time
the snapshot is being updated, so the client now has a stale view of the
repo. Since the repo's structure is based on cpv (which doesn't change
regardless of metadata changes such as USE configuration), the client
can grab a binpkg that has the wrong metadata/checksums.
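To make the race concrete, here is a minimal in-memory simulation of the sequence of events described above. The `CpvKeyedBinhost` class, the `app-misc/foo-1.0` cpv, and the payload bytes are hypothetical stand-ins for a real binhost, its Packages index, and its binpkgs; this only models the ordering of events, not any real Portage or pkgcore API.

```python
import hashlib


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class CpvKeyedBinhost:
    """Hypothetical binhost that stores binpkgs by cpv (existing layout)."""

    def __init__(self):
        self.files = {}  # cpv -> binpkg bytes

    def publish(self, cpv: str, data: bytes) -> None:
        # A rebuild (e.g. different USE flags) overwrites the same cpv path.
        self.files[cpv] = data

    def index(self) -> dict:
        # Snapshot of a Packages-style index: cpv -> MD5 of the binpkg.
        return {cpv: md5(data) for cpv, data in self.files.items()}

    def fetch(self, cpv: str) -> bytes:
        # Clients can only address a binpkg by its cpv.
        return self.files[cpv]


host = CpvKeyedBinhost()
host.publish("app-misc/foo-1.0", b"foo built with USE=-ssl")

# 1. The client downloads the index...
snapshot = host.index()
# 2. ...meanwhile the server rebuilds foo with different USE flags...
host.publish("app-misc/foo-1.0", b"foo built with USE=ssl")
# 3. ...and the client fetches by cpv, getting bytes the index doesn't describe.
data = host.fetch("app-misc/foo-1.0")
print(md5(data) == snapshot["app-misc/foo-1.0"])  # False: stale checksum
```

Writing the rebuild to a temp file first and renaming it into place narrows the window for a torn file, but step 3 still returns the new build, because the cpv path is the only handle the client has.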
It's racy in exactly the same way as before; the only difference is that
you switched it to rewrite the tbz2 in a temp file instead of writing
directly to the tbz2. That's a reduction, but it's the same level of risk
for any form of update. The snapshot script just duct-tapes around the
issue while leaving the core flaw intact.

> > What I was talking about was solving this issue once and for all via
> > restructuring, and specifically referring to the potential of an md5
> > collision in the URI space. Specifically, what I'm implementing for
> > pkgcore is the ability to do stupid stuff like this:
> >
> > http://host/binpkg-store/$MD5.{txz,tbz2,tgz}
>
> That would be the MD5 of the entire file, after compression and
> having the xpak segment appended, right?

Yep. The only potential issue here is the unlikely case of a CHF
collision. There is a way to resolve that one too, although it is
outside of what I'm willing to do format-wise (namely a secondary URL
fallback).

> > then have multiple views accessible just via pointing the binpkg repo
> > remote url at
> >
> > http://host/views/license/oss-approved/
> > http://host/views/keywords/amd64/stable/
> > http://host/views/raw/  # no filtering on the view of the binpkg repo,
> > see everything.
>
> So, the default path of a package would come from looking at the MD5
> in the Packages file and then mapping that to a path?

The default path would be defined in the preamble by a string-interpolated
pattern, whatever folks want to use. A sane default, imo, is

    %(host)s/raw-pkgs/%(md5)s.%(compressor_ext)s

> > Via restructuring where the binpkgs are stored and doing this approach,
> > multiple views can be had easily into the repo. An additional benefit of
> > this approach is that via making URI able to point outside the host, you
> > could combine multiple separate repositories into one just via a view.
>
> This might also be useful for creating per-profile views while
> allowing packages to be shared between profiles in cases when
> hosting a separate build would be redundant. It might be possible to
> save lots of build time, disk space, and testing that way.
>
> Being able to have multiple builds of the same package with
> different USE settings also solves bug 150031 [1].

Yep and yep.

> >> Eventually, I'd like to see gentoo
> >> officially distributing binary packages, so that we'll be able to
> >> get a slice of the binary distribution pie. When that happens, we're
> >> certainly not going to want to have race conditions like these in
> >> our public binhosts.
> >>
> >
> > I'd suggest abandoning the current repository layout of Packages then,
> > since it's irrevocably flawed. You can hack around it via jamming
> > timestamp/md5 info into URI, but that's not a sane solution.
>
> Shrug, it's a handy way to solve race conditions given the existing
> version 0 format. It's not optimal, so we'll surely want something
> better in version 1.

The problem here is that version 0 still maps down to the existing
on-disk binpkg layout. That layout is the core flaw here: as long as
binpkgs are stored cpv-oriented, version 0 isn't able to do the crazy
things I'm intending.

~harring
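A minimal in-memory sketch of the content-addressed layout and views discussed in this thread. Only the URI pattern `%(host)s/raw-pkgs/%(md5)s.%(compressor_ext)s` comes from the thread; the `ContentAddressedBinhost` class, its methods, the metadata keys (`keywords`, `license`), and the `http://host` value are hypothetical. It assumes old files are kept until no published index references them (garbage collection isn't shown) and, as noted above, ignores the unlikely case of an MD5 collision.

```python
import hashlib

# The "sane default" pattern suggested in the thread.
DEFAULT_URI_PATTERN = "%(host)s/raw-pkgs/%(md5)s.%(compressor_ext)s"


class ContentAddressedBinhost:
    """Hypothetical binhost whose files are keyed by MD5, not by cpv."""

    def __init__(self, host, pattern=DEFAULT_URI_PATTERN):
        self.host = host
        self.pattern = pattern
        self.store = {}    # md5 -> binpkg bytes; a key is never rewritten
        self.entries = []  # Packages-style index entries

    def publish(self, cpv, data, compressor_ext, **metadata):
        digest = hashlib.md5(data).hexdigest()
        # A rebuild with different USE flags lands under a new key, so a
        # file referenced by an older index snapshot stays intact.
        self.store[digest] = data
        entry = dict(metadata, cpv=cpv, md5=digest,
                     compressor_ext=compressor_ext)
        self.entries.append(entry)
        return entry

    def uri(self, entry):
        # %-interpolation with a mapping ignores keys the pattern doesn't use.
        return self.pattern % dict(entry, host=self.host)

    def fetch(self, uri):
        # Recover the digest from ".../<md5>.<ext>".
        digest = uri.rsplit("/", 1)[-1].split(".", 1)[0]
        return self.store[digest]

    def view(self, predicate):
        # A "view" is just a filtered index over the same shared store.
        return [e for e in self.entries if predicate(e)]


host = ContentAddressedBinhost("http://host")
old = host.publish("app-misc/foo-1.0", b"foo USE=-ssl", "tbz2",
                   keywords=["amd64"], license="GPL-2")
snapshot = list(host.entries)  # the client grabs the index
host.publish("app-misc/foo-1.0", b"foo USE=ssl", "txz",
             keywords=["~amd64"], license="GPL-2")

# The stale snapshot still resolves to the bytes it describes.
data = host.fetch(host.uri(snapshot[0]))
print(hashlib.md5(data).hexdigest() == snapshot[0]["md5"])  # True

stable = host.view(lambda e: "amd64" in e["keywords"])
print([e["cpv"] for e in stable])  # ['app-misc/foo-1.0']
```

Because both builds of `app-misc/foo-1.0` coexist under different digests, per-profile or per-keyword views can share one store without overwriting each other, which is the property the cpv-keyed layout lacks.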