public inbox for gentoo-portage-dev@lists.gentoo.org
From: Brian Harring <ferringb@gmail.com>
To: gentoo-portage-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] Package compression header for binhosts
Date: Tue, 1 Jun 2010 20:42:00 -0700
Message-ID: <20100602034200.GA6316@hrair>
In-Reply-To: <4C059D7B.4080804@gentoo.org>


On Tue, Jun 01, 2010 at 04:53:31PM -0700, Zac Medico wrote:
> On 06/01/2010 02:52 PM, Brian Harring wrote:
> > That bug isn't about a collision, it's about files being replaced underneath
> > Packages' feet.  Even with the tricks you've leveled, the issue of things
> > changing underfoot is still possible; you've just made the race less
> > likely.
> 
> AFAIK the race is completely eliminated by the RCU-like snapshot
> mechanism. I think I like your hash-in-the-filename idea better
> though, since it seems simpler to implement and maintain.

You're forgetting about how one actually updates the snapshot: the 
client grabs the Packages cache and starts pulling binpkgs.  During 
that time the snapshot is updated, so the client now has a stale view 
of the repo.  Since the repo's structure is keyed on cpv (which doesn't 
change when metadata such as USE configuration changes), the client 
can grab a binpkg that has the wrong metadata/checksums.

It's racy in exactly the same way as before; the only difference is 
that you now rewrite the tbz2 in a temp file instead of writing 
directly to the tbz2.  That narrows the window, but the same risk 
remains for any form of update.

The snapshot script just duct-tapes around the issue while leaving 
the core flaw intact.
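
To make the race concrete, here's a rough sketch in plain Python.  It is 
not portage code; the `store` dict, the cpv name and the file contents 
are all made up for illustration, and the dict simply stands in for a 
binhost that stores one file per cpv.

```python
import hashlib

def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Hypothetical cpv-keyed store: one path per cpv, contents replaced in place.
store = {"app-misc/foo-1.0.tbz2": b"built with USE=ssl"}

# Client grabs the Packages cache: a snapshot of checksums keyed by path.
client_index = {path: md5(data) for path, data in store.items()}

# Meanwhile the server rebuilds the same cpv with different USE.
# The path does not change, so the old file is simply gone.
store["app-misc/foo-1.0.tbz2"] = b"built with USE=-ssl"

# Client now fetches by path, using its stale index.
fetched = store["app-misc/foo-1.0.tbz2"]
mismatch = md5(fetched) != client_index["app-misc/foo-1.0.tbz2"]
print("checksum mismatch:", mismatch)
```

Writing the replacement to a temp file first only shortens the window 
in which the swap happens; the client's index still points at a path 
whose contents have changed.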

> > What I was talking about was solving this issue once and for all via
> > restructuring, and specifically referring to the potential of an md5
> > collision in the URI space- specifically what I'm implementing for pkgcore
> > is the ability to do stupid stuff like this-
> > 
> > http://host/binpkg-store/$MD5.{txz,tbz2,tgz}
> 
> That would be the MD5 of the entire file, after compression and
> having the xpak segment appended, right?

Yep.  The only potential issue here is the unlikely case of a CHF 
(cryptographic hash function) collision.  There is a way to resolve 
that one too, although it is outside of what I'm willing to do 
format-wise (namely a secondary url fallback).
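
As a sketch of why naming by content sidesteps the race (again plain 
Python, not pkgcore's implementation; `publish` and the in-memory 
`store` are hypothetical stand-ins for writing into 
http://host/binpkg-store/):

```python
import hashlib

# Hypothetical in-memory stand-in for the binpkg-store directory.
store = {}

def publish(blob: bytes, ext: str = "tbz2") -> str:
    """Store a finished binpkg under the MD5 of its complete bytes
    (compressed data plus the appended xpak segment)."""
    name = "%s.%s" % (hashlib.md5(blob).hexdigest(), ext)
    store[name] = blob  # a rebuild gets a new name; nothing is replaced
    return name

old = publish(b"foo-1.0 built with USE=ssl")
new = publish(b"foo-1.0 built with USE=-ssl")

# A client holding a stale index still gets exactly the bytes it was promised.
assert hashlib.md5(store[old]).hexdigest() == old.split(".")[0]
print(old != new, len(store))
```

Both builds coexist, so the only way a name can resolve to the wrong 
bytes is the hash collision mentioned above.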



> > then have multiple views accessible just via pointing the binpkg repo remote
> > url at
> > 
> > http://host/views/license/oss-approved/
> > http://host/views/keywords/amd64/stable/
> > http://host/views/raw/ # no filtering on the view of the binpkg repo, see
> > everything.
> 
> So, the default path of a package would come from looking at the MD5
> in the Packages file and then mapping that to a path?

The default path would be defined in the preamble by a 
string-interpolated pattern; whatever folk wanted to use.

A sane default is %(host)s/raw-pkgs/%(md5)s.%(compressor_ext)s imo.
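
That pattern is ordinary Python %-style interpolation, so expanding it 
could look roughly like this.  The helper name `binpkg_uri` is made up 
for the example, and the md5 value is just the digest of an empty file 
used as filler:

```python
DEFAULT_PATTERN = "%(host)s/raw-pkgs/%(md5)s.%(compressor_ext)s"

def binpkg_uri(pattern: str, **fields: str) -> str:
    """Expand a preamble pattern with the per-package fields."""
    return pattern % fields

uri = binpkg_uri(
    DEFAULT_PATTERN,
    host="http://host",
    md5="d41d8cd98f00b204e9800998ecf8427e",  # filler: md5 of zero bytes
    compressor_ext="tbz2",
)
print(uri)
# http://host/raw-pkgs/d41d8cd98f00b204e9800998ecf8427e.tbz2
```

A repo that wanted a different layout would only change the pattern in 
its preamble; the client code stays the same.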


> > Via restructuring where the binpkgs are stored and doing this approach,
> > multiple views can be had easily into the repo.  An additional benefit of
> > this approach is that via making URI able to point outside the host, you
> > could combine multiple separate repositories into one just via a view.
> 
> This might also be useful for creating per-profile views while
> allowing packages to be shared between profiles in cases when
> hosting a separate build would be redundant. It might be possible to
> save lots of build time, disk space, and testing that way.
> 
> Being able to have multiple builds of the same package with
> different USE settings also solves bug 150031 [1].

Yep and Yep.
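
Roughly, a view is then just a filtered index over the same store.  A 
toy sketch, where the entry fields, the placeholder md5 strings and the 
`OSS_APPROVED` license set are all invented for the example:

```python
# Hypothetical index entries; the md5 values are placeholders, not real digests.
ENTRIES = [
    {"cpv": "app-misc/foo-1.0", "use": "ssl", "keywords": "amd64",
     "license": "GPL-2", "md5": "md5-a"},
    {"cpv": "app-misc/foo-1.0", "use": "-ssl", "keywords": "amd64",
     "license": "GPL-2", "md5": "md5-b"},
    {"cpv": "app-misc/bar-2.0", "use": "", "keywords": "~amd64",
     "license": "proprietary", "md5": "md5-c"},
]

OSS_APPROVED = {"GPL-2"}  # assumed license group, for illustration only

# Each view is a predicate over entries; the binpkgs themselves are shared.
VIEWS = {
    "views/raw/": lambda e: True,
    "views/keywords/amd64/stable/": lambda e: e["keywords"] == "amd64",
    "views/license/oss-approved/": lambda e: e["license"] in OSS_APPROVED,
}

def render_view(name: str) -> list:
    """Return the index entries a client pointed at this view would see."""
    return [e for e in ENTRIES if VIEWS[name](e)]

for name in VIEWS:
    print(name, [e["md5"] for e in render_view(name)])
```

Note the two foo-1.0 entries: same cpv, different USE, different md5, 
so both builds can sit in the store without stepping on each other.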

> >> Eventually, I'd like to see gentoo
> >> officially distributing binary packages, so that we'll be able to
> >> get a slice of the binary distribution pie. When that happens, we're
> >> certainly not going to want to have race conditions like these in
> >> our public binhosts.
> >>
> > 
> > I'd suggest abandoning the current repository layout of Packages then, since
> > it's irrevocably flawed.  You can hack around it via jamming timestamp/md5
> > info into URI, but that's not a sane solution.
> 
> Shrug, it's a handy way to solve race conditions given the existing
> version 0 format. It's not optimal, so we'll surely want something
> better in version 1.

The problem here is that version 0 still maps down to the existing 
on-disk binpkg layout.  That layout is the core flaw here: as long as 
binpkgs are stored cpv-oriented, version 0 isn't able to do the crazy 
things I'm intending.
~harring



Thread overview: 8+ messages
2010-06-01  3:32 [gentoo-portage-dev] Package compression header for binhosts Zac Medico
2010-06-01  5:16 ` Brian Harring
2010-06-01 20:01   ` Ned Ludd
2010-06-01 21:22     ` Brian Harring
2010-06-01 21:37       ` Zac Medico
2010-06-01 21:52         ` Brian Harring
2010-06-01 23:53           ` Zac Medico
2010-06-02  3:42             ` Brian Harring [this message]
