public inbox for gentoo-mirrors@lists.gentoo.org
* [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
@ 2019-10-08  3:33 Michał Górny
  2019-10-08  7:13 ` SoEasyTo Mirrors Manager
  2019-10-08 13:36 ` Carlos Carvalho
  0 siblings, 2 replies; 6+ messages in thread
From: Michał Górny @ 2019-10-08  3:33 UTC
  To: gentoo-mirrors; +Cc: infrastructure

Hello, everyone.

TL;DR: shortly, distfiles will need to be present under two paths for
the transitional period.  Would you prefer us using hardlinks or
symlinks for that?


We're planning to start deploying a new GLEP 75-based [1] mirror layout
to our mirrors soonish.  This implies a transitional period during which
we'll be using both old and new layouts, so all file entries will be
duplicated.  The plan is roughly to:

1. Enable new split layout in emirrordist, and start using both
simultaneously for newly-mirrored files.

2. Duplicate the existing distfiles to new layout.

3. Live with both layouts for some longish time, to support people using
old Portage versions.

4. Eventually disable the old (flat) layout and start removing files.
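
To make "both layouts" concrete, here is a minimal Python sketch of where one distfile would live under each layout.  It assumes the `filename-hash BLAKE2B 8` structure as I understand it from GLEP 75 [1]: hash the file name and use the first 8 bits (two hex digits) as the directory.  The helper names and the example file name are made up for illustration.

```python
import hashlib


def flat_path(filename: str) -> str:
    """Old layout: every distfile sits directly in the top directory."""
    return filename


def split_path(filename: str, cutoff_bits: int = 8) -> str:
    """New layout (assumed): directory = leading hex digits of the
    BLAKE2B hash of the file name, one hex digit per 4 bits."""
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    prefix = digest[: cutoff_bits // 4]
    return f"{prefix}/{filename}"


name = "foo-1.0.tar.gz"  # hypothetical distfile
print(flat_path(name))   # path in the flat layout
print(split_path(name))  # path in the split layout: "<2 hex digits>/<name>"
```

During the transitional period both printed paths would have to resolve to the same content.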


The basic problem is whether to use hardlinks or symlinks
for the duplicate files.  I've elaborated more on both solutions in [2]
but I'll summarize them briefly here.

Hardlinks have the advantage that for mirrors enabling -H, they avoid
extra space usage and extra traffic.  However, we don't really know how
many mirrors enable that, and I suspect it's around half of them.
At initial deployment time, rsync will just hardlink files in new layout
to existing entries, and at cleanup time it will just unlink old
entries.

For mirrors not enabling -H, hardlinks will mean all distfiles being
transferred again at deployment time.  Furthermore, throughout the
transitional period all files will be stored twice, so space usage
will double as well.  Cleanup should be lightweight though.
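
As a rough illustration of the hardlink case, the Python sketch below creates a file in a flat directory, hardlinks it into a split-style subdirectory, and then drops the old name.  It only shows the filesystem side on a single machine and does not involve rsync; the directory name "ab", the file name and the helper names are invented for the example.

```python
import os
import tempfile


def duplicate_with_hardlink(root: str, filename: str, subdir: str) -> str:
    """Expose root/filename under root/subdir/filename via a hardlink."""
    old = os.path.join(root, filename)
    new_dir = os.path.join(root, subdir)
    os.makedirs(new_dir, exist_ok=True)
    new = os.path.join(new_dir, filename)
    os.link(old, new)  # second name for the same inode, no data copied
    return new


def hardlink_demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "foo-1.0.tar.gz")
        with open(old, "wb") as f:
            f.write(b"x" * 1024)
        new = duplicate_with_hardlink(root, "foo-1.0.tar.gz", "ab")
        same_inode = os.stat(old).st_ino == os.stat(new).st_ino
        links_during = os.stat(new).st_nlink
        os.unlink(old)  # cleanup step: only the old name goes away
        links_after = os.stat(new).st_nlink
        return same_inode, links_during, links_after


print(hardlink_demo())
```

A mirror that preserves hardlinks can reproduce this state without storing the data twice; a mirror that does not will see two unrelated regular files.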

Symlinks have the advantage that we know that all or almost all mirrors
enable them.  They are lightweight at deployment time since it's just
a matter of rsync copying symlinks, and they definitely won't cause
double space usage.  However, they will cause all files to be
retransferred at cleanup time, because the symlinks get replaced by real
files.
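
The symlink case can be sketched the same way.  The snippet below is only a single-machine illustration with invented names (directory "ab", helper functions): it places a relative symlink in the split-style directory, then performs the cleanup step in which the real file takes over the new path.  From a mirror's point of view, the new path changes from a symlink into a regular file at that moment, which is why its content has to be fetched again.

```python
import os
import tempfile


def duplicate_with_symlink(root: str, filename: str, subdir: str) -> str:
    """Expose root/filename under root/subdir/filename via a relative symlink."""
    new_dir = os.path.join(root, subdir)
    os.makedirs(new_dir, exist_ok=True)
    new = os.path.join(new_dir, filename)
    os.symlink(os.path.join("..", filename), new)
    return new


def finish_cleanup(root: str, filename: str, subdir: str) -> str:
    """Move the real file onto the new path, replacing the symlink."""
    old = os.path.join(root, filename)
    new = os.path.join(root, subdir, filename)
    os.replace(old, new)  # rename acts on the symlink itself, not its target
    return new


def symlink_demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "foo-1.0.tar.gz"), "wb") as f:
            f.write(b"payload")
        new = duplicate_with_symlink(root, "foo-1.0.tar.gz", "ab")
        link_before = os.path.islink(new)
        finish_cleanup(root, "foo-1.0.tar.gz", "ab")
        link_after = os.path.islink(new)
        with open(new, "rb") as f:
            content = f.read()
        return link_before, link_after, content


print(symlink_demo())
```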

Technically, I suppose we could avoid that by splitting the cleanup into
two stages, repeated for smaller groups of files.  First, replace
symlinks with hardlinks, which will make it light for at least some of
the mirrors.  Then, remove old files and move on to the next group.  For
mirrors not using -H, this will still mean double transfer but we'd limit
double space usage to one group at a time, and only for a short period.
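
A minimal sketch of that two-stage cleanup for one group, assuming a group is simply one subdirectory of the split layout.  All names here (migrate_group, the "ab" directory, the file names) are hypothetical, and the point where mirrors would get a chance to sync between the stages is only marked with a comment.

```python
import os
import tempfile


def symlink_to_hardlink(root: str, subdir: str, filename: str) -> None:
    """Stage 1 for one file: swap the symlink for a hardlink to the old file."""
    old = os.path.join(root, filename)
    new = os.path.join(root, subdir, filename)
    tmp = new + ".tmp"
    os.link(old, tmp)     # create the hardlink under a temporary name
    os.replace(tmp, new)  # then rename it over the symlink


def migrate_group(root: str, subdir: str) -> None:
    group_dir = os.path.join(root, subdir)
    names = sorted(os.listdir(group_dir))
    for name in names:  # stage 1: symlinks become hardlinks
        if os.path.islink(os.path.join(group_dir, name)):
            symlink_to_hardlink(root, subdir, name)
    # (mirrors would sync here before the old names disappear)
    for name in names:  # stage 2: remove the old flat entries
        old = os.path.join(root, name)
        if os.path.exists(old):
            os.unlink(old)


def staged_demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "ab"))
        for name in ("a.tar.gz", "b.tar.gz"):
            with open(os.path.join(root, name), "wb") as f:
                f.write(name.encode())
            os.symlink(os.path.join("..", name), os.path.join(root, "ab", name))
        migrate_group(root, "ab")
        top = sorted(os.listdir(root))
        group = sorted(os.listdir(os.path.join(root, "ab")))
        any_links = any(os.path.islink(os.path.join(root, "ab", n)) for n in group)
        return top, group, any_links


print(staged_demo())
```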

If any mirrors sync over rsync without using -l (I'm talking about
private mirrors here), they will not get the new layout at all, which is
going to suck for their users.


Which way do you prefer?


[1] https://www.gentoo.org/glep/glep-0075.html
[2] https://bugs.gentoo.org/534528#c38

-- 
Best regards,
Michał Górny



* Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
  2019-10-08  3:33 [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks? Michał Górny
@ 2019-10-08  7:13 ` SoEasyTo Mirrors Manager
  2019-10-08  7:23   ` Martin Kubiak (TUBS)
  2019-10-08  9:24   ` Michał Górny
  2019-10-08 13:36 ` Carlos Carvalho
  1 sibling, 2 replies; 6+ messages in thread
From: SoEasyTo Mirrors Manager @ 2019-10-08  7:13 UTC
  To: gentoo-mirrors

On 2019-10-08 05:33, Michał Górny wrote :

> Which way do you prefer?

For the soeasyto mirror, we are already using both -H and --links, and the
mirror is hosted on a single partition, so, in order to save bandwidth as you
suggested, hardlinks are the better choice for us.  We are keeping in mind
that this could cause server "overload" as per [1], but that is not an issue
here.

One question remains, though: how will layout.conf be created?  Is it created
by the mirror maintainer, or only by the master distfiles mirror, with all
mirrors then replicating it automatically?  It could be interesting to let each
mirror maintainer decide whether to use the split or the flat layout depending
on their use of hardlinks / symlinks, and to offer that choice by providing a
master for each of the flat, hybrid, and split layouts.

> [1] https://www.gentoo.org/glep/glep-0075.html
> [2] https://bugs.gentoo.org/534528#c38



* Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
  2019-10-08  7:13 ` SoEasyTo Mirrors Manager
@ 2019-10-08  7:23   ` Martin Kubiak (TUBS)
  2019-10-08  9:24   ` Michał Górny
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Kubiak (TUBS) @ 2019-10-08  7:23 UTC
  To: gentoo-mirrors

Hi,

We do both @ ftp.rz.tu-bs.de

Martin


Sent while on the move.

> On 08.10.2019 at 09:13, SoEasyTo Mirrors Manager <mirrors@soeasyto.com> wrote:
>
> For soeasyto mirror, we are already using both -H and --links, and the mirror
> is hosted on a single partition, so, in order to preserve bandwidth as you
> suggested, it's better to use hardlinks, keeping in mind that could cause
> server "overload" as per [1], but it is not an issue here.

* Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
  2019-10-08  7:13 ` SoEasyTo Mirrors Manager
  2019-10-08  7:23   ` Martin Kubiak (TUBS)
@ 2019-10-08  9:24   ` Michał Górny
  1 sibling, 0 replies; 6+ messages in thread
From: Michał Górny @ 2019-10-08  9:24 UTC
  To: gentoo-mirrors, SoEasyTo Mirrors Manager

On October 8, 2019 7:13:03 AM UTC, SoEasyTo Mirrors Manager <mirrors@soeasyto.com> wrote:
>One question remains though: how will the layout.conf be created? Is it by the
>mirror maintainer, or only by the master distfiles, and then all mirrors will
>automatically replicate it? Because it could be interesting to let the mirror
>maintainer decide whether to use split or flat layout depending on their usage
>of hardlinks / symlinks, and leave the choice by providing a master for flat,
>hybrid, and split layouts?

We are replicating layout.conf along with distfiles from the master mirror. This makes sense for the majority of the mirrors since they're going to replicate the structure of master mirror as well.

You can technically override this locally but you'd also have to adjust fetch procedure to account for the changed layout, i.e. run your custom tooling.

If you do not rsync from master mirror and e.g. use emirrordist locally, you'd have to create layout.conf yourself. The future version of emirrordist will respect your setting there (the patches are not merged yet). 
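
For those creating layout.conf themselves, the sketch below shows what a transitional file might look like and how a client could read it.  The [structure] section with numbered entries is my understanding of the format from GLEP 75; treat the exact contents as an assumption and check the GLEP before relying on them.  The read_layouts helper is made up for the example and is not how Portage itself parses the file.

```python
import configparser

# Assumed transitional layout.conf: prefer the split layout, fall back to flat.
LAYOUT_CONF = """\
[structure]
0=filename-hash BLAKE2B 8
1=flat
"""


def read_layouts(text: str) -> list:
    """Return the layouts in priority order, each as a list of tokens."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["structure"]
    return [section[key].split() for key in sorted(section, key=int)]


print(read_layouts(LAYOUT_CONF))
# [['filename-hash', 'BLAKE2B', '8'], ['flat']]
```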


--
Best regards, 
Michał Górny



* Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
  2019-10-08  3:33 [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks? Michał Górny
  2019-10-08  7:13 ` SoEasyTo Mirrors Manager
@ 2019-10-08 13:36 ` Carlos Carvalho
  2019-10-22 10:05   ` Michał Górny
  1 sibling, 1 reply; 6+ messages in thread
From: Carlos Carvalho @ 2019-10-08 13:36 UTC
  To: gentoo-mirrors; +Cc: infrastructure

Michał Górny (mgorny@gentoo.org) wrote on Tue, Oct 08, 2019 at 12:33:52AM -03:
> TL;DR: shortly, distfiles will need to be present under two paths for
> the transitional period.  Would you prefer us using hardlinks or
> symlinks for that?

We (gentoo.c3sl.ufpr.br) prefer symlinks.

BTW, I think we're a packages mirror now; we'd be happy to also be a distfiles mirror.
How can we proceed with that?  Is it of interest to the community?



* Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
  2019-10-08 13:36 ` Carlos Carvalho
@ 2019-10-22 10:05   ` Michał Górny
  0 siblings, 0 replies; 6+ messages in thread
From: Michał Górny @ 2019-10-22 10:05 UTC
  To: Carlos Carvalho, gentoo-mirrors; +Cc: infrastructure

On Tue, 2019-10-08 at 10:36 -0300, Carlos Carvalho wrote:
> Michał Górny (mgorny@gentoo.org) wrote on Tue, Oct 08, 2019 at 12:33:52AM -03:
> > TL;DR: shortly, distfiles will need to be present under two paths for
> > the transitional period.  Would you prefer us using hardlinks or
> > symlinks for that?
> 
> We (gentoo.c3sl.ufpr.br) prefer symlinks.
> 
> BTW, I think we're a packages mirror now; we'd be happy to also be a distfiles mirror.
> How can we proceed with that?  Is it of interest to the community?

I'm sorry for replying this late.  I had hoped somebody more
knowledgeable about the process would answer.

I think the relevant documentation is at:

https://wiki.gentoo.org/wiki/Project:Infrastructure/Mirrors/Source

I can't speak for the whole community but I think they would appreciate
every new mirror ;-).

-- 
Best regards,
Michał Górny


