References: <752be6c75f337df8ee8124a804247d2fb27e73b4.camel@gentoo.org>
From: Rich Freeman
Date: Tue, 22 Oct 2019 21:21:12 -0400
Subject: Re: [gentoo-dev] New distfile mirror layout
To: gentoo-dev

On Mon, Oct 21, 2019 at 12:42 PM Richard Yao wrote:
>
> Also, another idea is to use a cheap hash function (e.g. fletcher) and just have the mirrors do the hashing behind the scenes. Then we would have the best of both worlds.

I think something that is getting missed in this discussion is that we don't control all of our mirrors, and they're generally donated resources. Somebody has some webserver, and they stick a Debian mirror in one directory tree, and an Arch one in another, and they're kind enough to give us one too.

That is why we're seeing odder situations like NTFS and so on being mentioned. They're not necessarily even running Linux, let alone ZFS or some other optimized filesystem. And their webserver might be set up to do browsable directory indexes, which could perform terribly even if the filesystem itself is fine with direct filename lookups. It doesn't matter if you have hashed B-trees or whatever for filename lookups if you're going to ask the filesystem to give you a list of every file in a large directory - it is going to have to traverse whatever data structure it uses entirely to do so.

If we want to start putting requirements on hosting a mirror, then we'll end up with fewer mirrors, and with mirrors more is usually better. Ideally a mirror should just be a black box to us - we don't really care what they're running because we don't depend on any mirror individually.

Likewise, if we negatively impact mirror hosts we'll end up with fewer mirrors.
Sure, maybe those hosts have odd configurations, but we're still better off with them than without.

That said we do seem to have a lot of mirrors so it probably isn't the end of the world if we lose a limited number. And there is nothing to say that we can't have some infra mirror set up more for interactive browsing that we don't have people fetch from but which dispenses with all the hashing or which bins by the first letter of the filename/etc. It seems like most of the use cases where hashing is inconvenient are for more casual use.

To avoid another reply, people are talking about having utilities that can fetch distfiles using the new scheme. I'd think that "ebuild foo.ebuild fetch" is probably the simplest solution for this. Chances are that you're dealing with SRC_URI strings that have variable substitution in them anyway, so just letting ebuild do the fetching means you're not substituting ${PV} and so on, let alone all the stuff versionator and its ilk do. And of course you can always just fetch from upstream anyway if you do have a clean URI.

--
Rich