From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <0a281500f949432666cda1e948db6b062e99ced5.camel@gentoo.org>
Subject: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
From: Michał Górny
To: gentoo-mirrors@lists.gentoo.org
Cc: infrastructure
Date: Tue, 08 Oct 2019 05:33:52 +0200
Organization: Gentoo
Content-Type: multipart/signed; micalg="pgp-sha512"; protocol="application/pgp-signature"; boundary="=-z6vsQgeJLz7REWiGjw2R"
User-Agent: Evolution 3.32.4
Precedence: bulk
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-mirrors@lists.gentoo.org
Reply-to: gentoo-mirrors@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
MIME-Version: 1.0
X-Archives-Salt: 0901d807-654f-4684-8636-68c127fae4f2
X-Archives-Hash: 113ee7cb8e89266f2b64a8f48512cfc8

--=-z6vsQgeJLz7REWiGjw2R
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hello, everyone.

TL;DR: shortly, distfiles will need to be present under two paths for
the transitional period. Would you prefer that we use hardlinks or
symlinks for that?

We're planning to start deploying a new GLEP 75-based [1] mirror layout
to our mirrors soonish. This implies a transitional period during which
we'll be using both the old and the new layout, so all file entries
will be duplicated.

The plan is roughly to:

1. Enable the new split layout in emirrordist, and start using both
   layouts simultaneously for newly-mirrored files.

2. Duplicate the existing distfiles to the new layout.

3. Live with both layouts for some longish time, to support people
   using old Portage versions.

4. Eventually disable the old (flat) layout and start removing files.

The basic problem is whether to use hardlinks or symlinks for
the duplicate files. I've elaborated more on both solutions in [2] but
I'll summarize briefly here.

Hardlinks have the advantage that for mirrors enabling -H, they avoid
extra space usage and extra traffic. However, we don't really know how
many mirrors enable that, and I suspect it's around half of them.
At initial deployment time, rsync will just hardlink files in the new
layout to existing entries, and at cleanup time it will just unlink
the old entries.

For mirrors not enabling -H, hardlinks will mean that all distfiles are
transferred again at deployment time. Furthermore, throughout the
transitional period all files will be duplicated, and so space usage
will double. Cleanup should be lightweight though.

Symlinks have the advantage that we know that all or almost all mirrors
enable them. They are lightweight at deployment time since it's just
a matter of rsync copying symlinks, and they definitely won't cause
double space usage. However, they will cause all files to be
retransferred at cleanup time -- due to symlinks being replaced by real
files.

Technically, I suppose we could avoid that by splitting the cleanup
into two stages, repeated for smaller groups of files. First, replace
symlinks with hardlinks, which will make it light for at least some of
the mirrors. Then, remove the old files and move on to the next group.
For mirrors not using -H, this will still mean a double transfer but
we'd limit double space usage to one group at a time, and only for
a short period.

If any mirrors sync over rsync without using -l (talking about private
mirrors here), they will not get the new layout at all, which is going
to suck for their users.

Which way do you prefer?
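
To make the space trade-off above concrete, here is a minimal Python
sketch. It is an illustration only, not how emirrordist or rsync work
internally. All function names and the example file name are
hypothetical. The split_path() scheme (a two-hex-digit subdirectory
taken from the BLAKE2B hash of the file name) is an assumption about
the split layout; the authoritative definition is in [1]. usage()
reports two figures: bytes stored when hardlinks are preserved (as on
a mirror using -H), and bytes stored when every regular-file path is
kept as a separate copy (as on a mirror not using -H).
finish_migration() sketches the two-stage cleanup for a single file.

```python
import hashlib
import os
import stat
import tempfile


def split_path(filename):
    # ASSUMPTION: subdirectory = first two hex digits of BLAKE2B(file name).
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return os.path.join(digest[:2], filename)


def duplicate(root, filename, mode):
    """Expose root/filename under the split layout as a hardlink or symlink."""
    old = os.path.join(root, filename)
    new = os.path.join(root, split_path(filename))
    os.makedirs(os.path.dirname(new), exist_ok=True)
    if mode == "hardlink":
        os.link(old, new)  # second name for the same inode
    elif mode == "symlink":
        # Relative target, so the tree stays valid wherever it is synced.
        os.symlink(os.path.join("..", filename), new)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return new


def usage(root):
    """Return (bytes with hardlinks preserved, bytes with every path separate)."""
    seen, shared, separate = set(), 0, 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISLNK(st.st_mode):
                continue  # a symlink carries no file data of its own
            separate += st.st_size
            if (st.st_dev, st.st_ino) not in seen:
                seen.add((st.st_dev, st.st_ino))
                shared += st.st_size
    return shared, separate


def finish_migration(root, filename):
    """Cleanup for one file: turn a symlink into a hardlink, then drop the old path."""
    old = os.path.join(root, filename)
    new = os.path.join(root, split_path(filename))
    if os.path.islink(new):
        os.unlink(new)
        os.link(old, new)
    os.unlink(old)
    return new


def demo(mode, size=1000):
    """Build a one-file tree, duplicate it with the given mode, report usage."""
    with tempfile.TemporaryDirectory() as root:
        name = "example-1.0.tar.gz"  # hypothetical distfile name
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"\0" * size)
        duplicate(root, name, mode)
        return usage(root)


if __name__ == "__main__":
    for mode in ("hardlink", "symlink"):
        shared, separate = demo(mode)
        print(f"{mode}: {shared} bytes with links preserved, "
              f"{separate} bytes if every path is a separate file")
```

With hardlinks the second figure is twice the first, which corresponds
to the doubled space on mirrors not using -H. With symlinks both
figures are equal until cleanup replaces the links with real files.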
[1] https://www.gentoo.org/glep/glep-0075.html
[2] https://bugs.gentoo.org/534528#c38

-- 
Best regards,
Michał Górny

--=-z6vsQgeJLz7REWiGjw2R
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iQGTBAABCgB9FiEEx2qEUJQJjSjMiybFY5ra4jKeJA4FAl2cA6BfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEM3
NkE4NDUwOTQwOThEMjhDQzhCMjZDNTYzOUFEQUUyMzI5RTI0MEUACgkQY5ra4jKe
JA6S5gf+M9LdFjGPi27YhbVPL3t+wNsJnvYPkL+wb2meY0XlCQtYD6a3GKbEjsPz
8Zsh3IKBdBzPhvpQF+3kTz2EnbPVC9nL6aHDQ3ECfmd/DfsRweC2Co3cEMfujVXA
F254Zj47TP1yXsxio/rVil2SLUfVMAjUPk1Fo6pMewbWoswZZSzxreQB/Gqf4Jfl
aFOmFxEK+1lv4VHH2Ibud4S205++iTC9i/GKsz29d8rvmjna/7/poFVn07QfQGST
dkjRn4wjI/ZhUUO2rHaKKez8dUy1UprxdU3h/E3V2ZbIoF+JlEtCkgeqdn4pY70U
CzOMkF/FZBH/zhtjYa/p9FJBkCRorQ==
=FYW7
-----END PGP SIGNATURE-----

--=-z6vsQgeJLz7REWiGjw2R--