* [gentoo-project] RFC: Dropping rsync as a tree distribution method
@ 2018-12-16 4:15 Alec Warner
  2018-12-16 4:40 ` Matt Turner
  ` (4 more replies)
  0 siblings, 5 replies; 40+ messages in thread
From: Alec Warner @ 2018-12-16 4:15 UTC (permalink / raw)
To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 947 bytes --]

Hi,

I am currently embarking on a plan to redo our existing rsync[0] mirror
network. The current network has aged a bit. It's likely too large and is
under-maintained. I think in the ideal case we would instead pivot this
project to scaling out our git mirror capabilities and slowly migrate all
consumers to pulling the git tree directly.

To that end, I'm looking for blockers as to why various customers cannot
switch to pulling the gentoo ebuild repository from git[1] instead of rsync.

So for example:
 - Bandwidth concerns (preferably with documentation / data)
 - Firewall concerns
 - CPU concerns (e.g. rsync is great for tiny systems?)
 - Disk usage for git vs rsync
 - Other things I have not thought of

-A

[0] This excludes emerge-webrsync, which I don't plan on touching.
[1] Rich talked about some downsides earlier at
https://lwn.net/Articles/759539/, but while these are challenges (some
fixable) they are not necessarily blockers.

[-- Attachment #2: Type: text/html, Size: 1216 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-16 4:15 [gentoo-project] RFC: Dropping rsync as a tree distribution method Alec Warner
@ 2018-12-16 4:40 ` Matt Turner
  2018-12-16 5:13 ` Georgy Yakovlev
  2018-12-16 11:34 ` Rich Freeman
  ` (3 subsequent siblings)
  4 siblings, 1 reply; 40+ messages in thread
From: Matt Turner @ 2018-12-16 4:40 UTC (permalink / raw)
To: Gentoo project list

On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote:
> - Disk usage for git vs rsync

This is why I have not switched. With git you pull down increasing
amounts of history, whereas with rsync the data fits easily in a <1GB
partition.

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-16 4:40 ` Matt Turner
@ 2018-12-16 5:13 ` Georgy Yakovlev
  2018-12-16 5:17 ` Alec Warner
  ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Georgy Yakovlev @ 2018-12-16 5:13 UTC (permalink / raw)
To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote:
> On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote:
> > - Disk usage for git vs rsync
>
> This is why I have not switched. With git you pull down increasing
> amounts of history, whereas with rsync the data fits easily in a <1GB
> partition.

Recent Portage can use sync-depth = 1. The repo dir no longer grows as it
used to, and it works fine, unlike the initial implementation that was
giving trouble:

https://bugs.gentoo.org/552814

du -hs /var/db/repos/gentoo
350M    /var/db/repos/gentoo

Example /etc/portage/repos.conf/gentoo.conf:

[DEFAULT]
main-repo = gentoo

[gentoo]
auto-sync = yes
location = /var/db/repos/gentoo
sync-type = git
sync-uri = https://github.com/gentoo-mirror/gentoo.git
sync-depth = 1
sync-git-clone-extra-opts = -b master
sync-git-verify-commit-signature = true

Sync is almost instantaneous compared to rsync, but some folks are not
going to like GitHub as a mirror in this case.

--
Georgy Yakovlev
Gentoo Linux Developer

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
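[Editorial sketch] The repos.conf example in the message above can be
sanity-checked before switching a machine over. This is a minimal sketch,
assuming Python's standard configparser is close enough to how Portage
reads the file (Portage's own parser may differ); the helper name
describe_repo is hypothetical and only illustrates what the keys mean.

```python
import configparser

# Copy of the example /etc/portage/repos.conf/gentoo.conf quoted above.
EXAMPLE_CONF = """\
[DEFAULT]
main-repo = gentoo

[gentoo]
auto-sync = yes
location = /var/db/repos/gentoo
sync-type = git
sync-uri = https://github.com/gentoo-mirror/gentoo.git
sync-depth = 1
sync-git-clone-extra-opts = -b master
sync-git-verify-commit-signature = true
"""


def describe_repo(conf_text, name):
    """Return the sync-related settings of one repository section.

    Hypothetical helper: it only reads the text, it does not touch Portage.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser[name]
    return {
        "location": section["location"],
        "sync_type": section["sync-type"],
        "sync_uri": section["sync-uri"],
        # fallback=0 just marks the key as absent in this sketch.
        "sync_depth": section.getint("sync-depth", fallback=0),
        "verify_signature": section.getboolean(
            "sync-git-verify-commit-signature", fallback=False
        ),
    }


if __name__ == "__main__":
    for key, value in describe_repo(EXAMPLE_CONF, "gentoo").items():
        print(f"{key}: {value}")
```

A check like this only confirms that the file says what you intended
(shallow depth, signature verification on); it says nothing about whether
the mirror in sync-uri is the one you want to trust.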
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 5:13 ` Georgy Yakovlev @ 2018-12-16 5:17 ` Alec Warner 2018-12-16 6:50 ` Raymond Jennings ` (2 more replies) 2018-12-16 6:55 ` Raymond Jennings ` (2 subsequent siblings) 3 siblings, 3 replies; 40+ messages in thread From: Alec Warner @ 2018-12-16 5:17 UTC (permalink / raw To: gentoo-project, Zac Medico [-- Attachment #1: Type: text/plain, Size: 1649 bytes --] On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: > > > - Disk usage for git vs rsync > > > > This is why I have not switched. With git you pull down increasing > > amounts of history, whereas with rsync the data fits easily in a <1GB > > partition. > > Recent portage can use sync-depth = 1 > repo dir no longer grows as it used to and it's works fine unlike initial > implementation that was giving trouble > > https://bugs.gentoo.org/552814 > > du -hs /var/db/repos/gentoo > 350M /var/db/repos/gentoo > > example /etc/portage/repos.conf/gentoo.conf : > [DEFAULT] > main-repo = gentoo > > [gentoo] > auto-sync = yes > location = /var/db/repos/gentoo > sync-type = git > sync-uri = https://github.com/gentoo-mirror/gentoo.git > sync-depth = 1 > sync-git-clone-extra-opts = -b master > sync-git-verify-commit-signature = true > > > sync is almost instantaneous compared to rsync, but some folks not going > to > like github as a mirror in this case. > I don't plan on using github for the mirror, so I'm not overly worried about that portion. +Zac Medico <zmedico@gentoo.org> My recollection was that git doesn't ship with ebuild metadata by default, so even if we make the first sync fast (by using depth=1 in the clone) do we have a good story for ebuild metadata? Is portage just faster than in the past for ebuilds with missing metadata? 
Does emerge --sync handle metadata regen for syncs with git origins? -A > > > -- > Georgy Yakovlev > Gentoo Linux Developer [-- Attachment #2: Type: text/html, Size: 2652 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 5:17 ` Alec Warner @ 2018-12-16 6:50 ` Raymond Jennings 2018-12-16 6:52 ` Raymond Jennings 2018-12-16 7:38 ` Zac Medico 2018-12-16 7:42 ` Zac Medico 2 siblings, 1 reply; 40+ messages in thread From: Raymond Jennings @ 2018-12-16 6:50 UTC (permalink / raw To: gentoo-project, antarus; +Cc: Zac Medico I filed a bug on this suggestion myself recently, here: https://bugs.gentoo.org/671174 The commentary there from the others may prove useful in this conversation. On Sat, Dec 15, 2018 at 9:18 PM Alec Warner <antarus@gentoo.org> wrote: > > > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: >> >> On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: >> > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: >> > > - Disk usage for git vs rsync >> > >> > This is why I have not switched. With git you pull down increasing >> > amounts of history, whereas with rsync the data fits easily in a <1GB >> > partition. >> >> Recent portage can use sync-depth = 1 >> repo dir no longer grows as it used to and it's works fine unlike initial >> implementation that was giving trouble >> >> https://bugs.gentoo.org/552814 >> >> du -hs /var/db/repos/gentoo >> 350M /var/db/repos/gentoo >> >> example /etc/portage/repos.conf/gentoo.conf : >> [DEFAULT] >> main-repo = gentoo >> >> [gentoo] >> auto-sync = yes >> location = /var/db/repos/gentoo >> sync-type = git >> sync-uri = https://github.com/gentoo-mirror/gentoo.git >> sync-depth = 1 >> sync-git-clone-extra-opts = -b master >> sync-git-verify-commit-signature = true >> >> >> sync is almost instantaneous compared to rsync, but some folks not going to >> like github as a mirror in this case. > > > I don't plan on using github for the mirror, so I'm not overly worried about that portion. 
> > +Zac Medico > > My recollection was that git doesn't ship with ebuild metadata by default, so even if we make the first sync fast (by using depth=1 in the clone) do we have a good story for ebuild metadata? Is portage just faster than in the past for ebuilds with missing metadata? Does emerge --sync handle metadata regen for syncs with git origins? > > -A > >> >> >> >> -- >> Georgy Yakovlev >> Gentoo Linux Developer ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 6:50 ` Raymond Jennings @ 2018-12-16 6:52 ` Raymond Jennings 0 siblings, 0 replies; 40+ messages in thread From: Raymond Jennings @ 2018-12-16 6:52 UTC (permalink / raw To: gentoo-project, antarus; +Cc: Zac Medico s/on/for On Sat, Dec 15, 2018 at 10:50 PM Raymond Jennings <shentino@gmail.com> wrote: > > I filed a bug on this suggestion myself recently, here: > > https://bugs.gentoo.org/671174 > > The commentary there from the others may prove useful in this conversation. > > On Sat, Dec 15, 2018 at 9:18 PM Alec Warner <antarus@gentoo.org> wrote: > > > > > > > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > >> > >> On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > >> > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: > >> > > - Disk usage for git vs rsync > >> > > >> > This is why I have not switched. With git you pull down increasing > >> > amounts of history, whereas with rsync the data fits easily in a <1GB > >> > partition. > >> > >> Recent portage can use sync-depth = 1 > >> repo dir no longer grows as it used to and it's works fine unlike initial > >> implementation that was giving trouble > >> > >> https://bugs.gentoo.org/552814 > >> > >> du -hs /var/db/repos/gentoo > >> 350M /var/db/repos/gentoo > >> > >> example /etc/portage/repos.conf/gentoo.conf : > >> [DEFAULT] > >> main-repo = gentoo > >> > >> [gentoo] > >> auto-sync = yes > >> location = /var/db/repos/gentoo > >> sync-type = git > >> sync-uri = https://github.com/gentoo-mirror/gentoo.git > >> sync-depth = 1 > >> sync-git-clone-extra-opts = -b master > >> sync-git-verify-commit-signature = true > >> > >> > >> sync is almost instantaneous compared to rsync, but some folks not going to > >> like github as a mirror in this case. > > > > > > I don't plan on using github for the mirror, so I'm not overly worried about that portion. 
> > > > +Zac Medico > > > > My recollection was that git doesn't ship with ebuild metadata by default, so even if we make the first sync fast (by using depth=1 in the clone) do we have a good story for ebuild metadata? Is portage just faster than in the past for ebuilds with missing metadata? Does emerge --sync handle metadata regen for syncs with git origins? > > > > -A > > > >> > >> > >> > >> -- > >> Georgy Yakovlev > >> Gentoo Linux Developer ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 5:17 ` Alec Warner 2018-12-16 6:50 ` Raymond Jennings @ 2018-12-16 7:38 ` Zac Medico 2018-12-16 7:42 ` Zac Medico 2 siblings, 0 replies; 40+ messages in thread From: Zac Medico @ 2018-12-16 7:38 UTC (permalink / raw To: Alec Warner, gentoo-project, Zac Medico [-- Attachment #1.1: Type: text/plain, Size: 2649 bytes --] On 12/15/18 9:17 PM, Alec Warner wrote: > > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org > <mailto:gyakovlev@gentoo.org>> wrote: > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org > <mailto:antarus@gentoo.org>> wrote: > > > - Disk usage for git vs rsync > > > > This is why I have not switched. With git you pull down increasing > > amounts of history, whereas with rsync the data fits easily in a <1GB > > partition. > > Recent portage can use sync-depth = 1 > repo dir no longer grows as it used to and it's works fine unlike > initial > implementation that was giving trouble > > https://bugs.gentoo.org/552814 > > du -hs /var/db/repos/gentoo > 350M /var/db/repos/gentoo > > example /etc/portage/repos.conf/gentoo.conf : > [DEFAULT] > main-repo = gentoo > > [gentoo] > auto-sync = yes > location = /var/db/repos/gentoo > sync-type = git > sync-uri = https://github.com/gentoo-mirror/gentoo.git > sync-depth = 1 > sync-git-clone-extra-opts = -b master > sync-git-verify-commit-signature = true > > > sync is almost instantaneous compared to rsync, but some folks not > going to > like github as a mirror in this case. > > > I don't plan on using github for the mirror, so I'm not overly worried > about that portion. > > +Zac Medico <mailto:zmedico@gentoo.org> > > My recollection was that git doesn't ship with ebuild metadata by > default, so even if we make the first sync fast (by using depth=1 in the > clone) do we have a good story for ebuild metadata? 
Is portage just > faster than in the past for ebuilds with missing metadata? Does emerge > --sync handle metadata regen for syncs with git origins?

The metadata has to be included in the git repository, and we currently
have "master" and "stable" branches which include everything that the
rsync tree has:

https://gitweb.gentoo.org/repo/sync/gentoo.git/log/?h=master
https://gitweb.gentoo.org/repo/sync/gentoo.git/log/?h=stable

Both branches are also mirrored on GitHub:

https://github.com/gentoo-mirror/gentoo/commits/master
https://github.com/gentoo-mirror/gentoo/commits/stable

It would be interesting to see some garbage collection stats for
sync-depth = 1. People using it should post the output of this command:

  git count-objects -v

> -A
>
>     --
>     Georgy Yakovlev
>     Gentoo Linux Developer
>

--
Thanks,
Zac

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 981 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 5:17 ` Alec Warner 2018-12-16 6:50 ` Raymond Jennings 2018-12-16 7:38 ` Zac Medico @ 2018-12-16 7:42 ` Zac Medico 2018-12-18 17:28 ` Andrew Savchenko 2 siblings, 1 reply; 40+ messages in thread From: Zac Medico @ 2018-12-16 7:42 UTC (permalink / raw To: Alec Warner, gentoo-project, Zac Medico [-- Attachment #1.1: Type: text/plain, Size: 2653 bytes --] On 12/15/18 9:17 PM, Alec Warner wrote: > > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org > <mailto:gyakovlev@gentoo.org>> wrote: > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org > <mailto:antarus@gentoo.org>> wrote: > > > - Disk usage for git vs rsync > > > > This is why I have not switched. With git you pull down increasing > > amounts of history, whereas with rsync the data fits easily in a <1GB > > partition. > > Recent portage can use sync-depth = 1 > repo dir no longer grows as it used to and it's works fine unlike > initial > implementation that was giving trouble > > https://bugs.gentoo.org/552814 > > du -hs /var/db/repos/gentoo > 350M /var/db/repos/gentoo > > example /etc/portage/repos.conf/gentoo.conf : > [DEFAULT] > main-repo = gentoo > > [gentoo] > auto-sync = yes > location = /var/db/repos/gentoo > sync-type = git > sync-uri = https://github.com/gentoo-mirror/gentoo.git > sync-depth = 1 > sync-git-clone-extra-opts = -b master > sync-git-verify-commit-signature = true > > > sync is almost instantaneous compared to rsync, but some folks not > going to > like github as a mirror in this case. > > > I don't plan on using github for the mirror, so I'm not overly worried > about that portion. 
> > +Zac Medico <mailto:zmedico@gentoo.org> > > My recollection was that git doesn't ship with ebuild metadata by > default, so even if we make the first sync fast (by using depth=1 in the > clone) do we have a good story for ebuild metadata? Is portage just > faster than in the past for ebuilds with missing metadata? Does emerge > --sync handle metadata regen for syncs with git origins? > > -A

The metadata has to be included in the git repository, and we currently
have "master" and "stable" branches which include everything that the
rsync tree has:

https://gitweb.gentoo.org/repo/sync/gentoo.git/log/?h=master
https://gitweb.gentoo.org/repo/sync/gentoo.git/log/?h=stable

Both branches are also mirrored on GitHub:

https://github.com/gentoo-mirror/gentoo/commits/master
https://github.com/gentoo-mirror/gentoo/commits/stable

It would be interesting to see some garbage collection stats for
sync-depth = 1. People using it should post the output of this command:

  git count-objects -v

>     --
>     Georgy Yakovlev
>     Gentoo Linux Developer
>

--
Thanks,
Zac

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 981 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-16 7:42 ` Zac Medico
@ 2018-12-18 17:28 ` Andrew Savchenko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrew Savchenko @ 2018-12-18 17:28 UTC (permalink / raw)
To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 879 bytes --]

On Sat, 15 Dec 2018 23:42:01 -0800 Zac Medico wrote:
> It would be interesting to see some garbage collection stats for
> sync-depth = 1, people using it should post the output of this command:
>
> git count-objects -v

I have used sync-depth = 1 for /usr/portage from
git://anongit.gentoo.org/repo/sync/gentoo.git almost since its inception.
So my stats are:

$ git count-objects -v
count: 28
size: 184
in-pack: 592843
packs: 35
size-pack: 353388
prune-packable: 20
garbage: 0
size-garbage: 0

$ du -hs /usr/portage/ --exclude=/usr/portage/packages --exclude=/usr/portage/distfiles
1.1G    /usr/portage/

The largest dirs are:

157     /usr/portage/metadata/md5-cache
171     /usr/portage/metadata
346     /usr/portage/.git/objects
346     /usr/portage/.git/objects/pack
361     /usr/portage/.git
1044    /usr/portage/

Best regards,
Andrew Savchenko

[-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
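[Editorial sketch] For anyone collecting the stats Zac asked for, the
output of `git count-objects -v` is easy to turn into a single number.
The sketch below parses the figures reported in the message above and
adds up loose objects and packs. It relies on git's documented behaviour
of reporting `size` and `size-pack` in KiB when `-H` is not given; the
function names are hypothetical.

```python
# Output of `git count-objects -v` as reported in the message above.
REPORTED = """\
count: 28
size: 184
in-pack: 592843
packs: 35
size-pack: 353388
prune-packable: 20
garbage: 0
size-garbage: 0
"""


def parse_count_objects(text):
    """Parse `git count-objects -v` output into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a plain "name: number" line.
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def object_store_mib(stats):
    """Loose objects plus packs, converted from KiB to MiB."""
    return (stats.get("size", 0) + stats.get("size-pack", 0)) / 1024


if __name__ == "__main__":
    stats = parse_count_objects(REPORTED)
    print(f"object store: {object_store_mib(stats):.1f} MiB "
          f"in {stats['packs']} packs")
```

For the numbers above this comes to roughly 345 MiB, which lines up with
the 346 listed for /usr/portage/.git/objects. The 35 separate packs may
suggest that repeated shallow fetches accumulate packs between garbage
collections, but that is a guess from one data point.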
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 5:13 ` Georgy Yakovlev 2018-12-16 5:17 ` Alec Warner @ 2018-12-16 6:55 ` Raymond Jennings 2018-12-16 10:22 ` Toralf Förster 2018-12-17 17:26 ` Matt Turner 3 siblings, 0 replies; 40+ messages in thread From: Raymond Jennings @ 2018-12-16 6:55 UTC (permalink / raw To: gentoo-project Instead of the github mirror, how about infra's native version, git://anongit.gentoo.org/repo/sync/gentoo.git? I think that one's even QA filtered and metadata primed on top of the regular dev branch hosted on github. On Sat, Dec 15, 2018 at 9:13 PM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: > > > - Disk usage for git vs rsync > > > > This is why I have not switched. With git you pull down increasing > > amounts of history, whereas with rsync the data fits easily in a <1GB > > partition. > > Recent portage can use sync-depth = 1 > repo dir no longer grows as it used to and it's works fine unlike initial > implementation that was giving trouble > > https://bugs.gentoo.org/552814 > > du -hs /var/db/repos/gentoo > 350M /var/db/repos/gentoo > > example /etc/portage/repos.conf/gentoo.conf : > [DEFAULT] > main-repo = gentoo > > [gentoo] > auto-sync = yes > location = /var/db/repos/gentoo > sync-type = git > sync-uri = https://github.com/gentoo-mirror/gentoo.git > sync-depth = 1 > sync-git-clone-extra-opts = -b master > sync-git-verify-commit-signature = true > > > sync is almost instantaneous compared to rsync, but some folks not going to > like github as a mirror in this case. > > > -- > Georgy Yakovlev > Gentoo Linux Developer ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-16 5:13 ` Georgy Yakovlev
  2018-12-16 5:17 ` Alec Warner
  2018-12-16 6:55 ` Raymond Jennings
@ 2018-12-16 10:22 ` Toralf Förster
  2018-12-17 17:26 ` Matt Turner
  3 siblings, 0 replies; 40+ messages in thread
From: Toralf Förster @ 2018-12-16 10:22 UTC (permalink / raw)
To: gentoo-project

[-- Attachment #1.1: Type: text/plain, Size: 347 bytes --]

On 12/16/18 6:13 AM, Georgy Yakovlev wrote:
> du -hs /var/db/repos/gentoo
> 350M /var/db/repos/gentoo

I have:

# du -hs /var/db/repos/*
667M    /var/db/repos/gentoo
2.0M    /var/db/repos/libressl
28K     /var/db/repos/local

Apart from that, I like your config (and, BTW, the new repo path).

--
Toralf
PGP 23217DA7 9B888F45

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 5:13 ` Georgy Yakovlev ` (2 preceding siblings ...) 2018-12-16 10:22 ` Toralf Förster @ 2018-12-17 17:26 ` Matt Turner 2018-12-17 17:43 ` Raymond Jennings 3 siblings, 1 reply; 40+ messages in thread From: Matt Turner @ 2018-12-17 17:26 UTC (permalink / raw To: Gentoo project list On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: > > > - Disk usage for git vs rsync > > > > This is why I have not switched. With git you pull down increasing > > amounts of history, whereas with rsync the data fits easily in a <1GB > > partition. > > Recent portage can use sync-depth = 1 > repo dir no longer grows as it used to and it's works fine unlike initial > implementation that was giving trouble > > https://bugs.gentoo.org/552814 > > du -hs /var/db/repos/gentoo > 350M /var/db/repos/gentoo > > example /etc/portage/repos.conf/gentoo.conf : > [DEFAULT] > main-repo = gentoo > > [gentoo] > auto-sync = yes > location = /var/db/repos/gentoo > sync-type = git > sync-uri = https://github.com/gentoo-mirror/gentoo.git > sync-depth = 1 > sync-git-clone-extra-opts = -b master > sync-git-verify-commit-signature = true > > > sync is almost instantaneous compared to rsync, but some folks not going to > like github as a mirror in this case. Thanks for the information. That seems to work great! ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-17 17:26 ` Matt Turner @ 2018-12-17 17:43 ` Raymond Jennings 2018-12-18 3:57 ` Georgy Yakovlev 0 siblings, 1 reply; 40+ messages in thread From: Raymond Jennings @ 2018-12-17 17:43 UTC (permalink / raw To: gentoo-project On Mon, Dec 17, 2018 at 9:26 AM Matt Turner <mattst88@gentoo.org> wrote: > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > > > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: > > > > - Disk usage for git vs rsync > > > > > > This is why I have not switched. With git you pull down increasing > > > amounts of history, whereas with rsync the data fits easily in a <1GB > > > partition. > > > > Recent portage can use sync-depth = 1 > > repo dir no longer grows as it used to and it's works fine unlike initial > > implementation that was giving trouble > > > > https://bugs.gentoo.org/552814 > > > > du -hs /var/db/repos/gentoo > > 350M /var/db/repos/gentoo > > > > example /etc/portage/repos.conf/gentoo.conf : > > [DEFAULT] > > main-repo = gentoo > > > > [gentoo] > > auto-sync = yes > > location = /var/db/repos/gentoo > > sync-type = git > > sync-uri = https://github.com/gentoo-mirror/gentoo.git > > sync-depth = 1 > > sync-git-clone-extra-opts = -b master > > sync-git-verify-commit-signature = true > > > > > > sync is almost instantaneous compared to rsync, but some folks not going to > > like github as a mirror in this case. Would I be correct to say they won't need github if they use infra's own native anongit server? > Thanks for the information. That seems to work great! > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-17 17:43 ` Raymond Jennings @ 2018-12-18 3:57 ` Georgy Yakovlev 2018-12-18 4:02 ` Raymond Jennings ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Georgy Yakovlev @ 2018-12-18 3:57 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 1877 bytes --] On Monday, December 17, 2018 9:43:05 AM PST Raymond Jennings wrote: > On Mon, Dec 17, 2018 at 9:26 AM Matt Turner <mattst88@gentoo.org> wrote: > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> wrote: > > > > > - Disk usage for git vs rsync > > > > > > > > This is why I have not switched. With git you pull down increasing > > > > amounts of history, whereas with rsync the data fits easily in a <1GB > > > > partition. > > > > > > Recent portage can use sync-depth = 1 > > > repo dir no longer grows as it used to and it's works fine unlike > > > initial > > > implementation that was giving trouble > > > > > > https://bugs.gentoo.org/552814 > > > > > > du -hs /var/db/repos/gentoo > > > 350M /var/db/repos/gentoo > > > > > > example /etc/portage/repos.conf/gentoo.conf : > > > [DEFAULT] > > > main-repo = gentoo > > > > > > [gentoo] > > > auto-sync = yes > > > location = /var/db/repos/gentoo > > > sync-type = git > > > sync-uri = https://github.com/gentoo-mirror/gentoo.git > > > sync-depth = 1 > > > sync-git-clone-extra-opts = -b master > > > sync-git-verify-commit-signature = true > > > > > > > > > sync is almost instantaneous compared to rsync, but some folks not going > > > to > > > like github as a mirror in this case. > > Would I be correct to say they won't need github if they use infra's > own native anongit server? 

I'm guessing, but the infra server is probably not meant to handle load
from all the users, and it will temporarily ban anyone who tries to sync
more than several times per day (like the rsync master does). But don't
quote me on that; better ask infra.

> > Thanks for the information. That seems to work great!

--
Georgy Yakovlev
Gentoo Linux Developer

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 3:57 ` Georgy Yakovlev @ 2018-12-18 4:02 ` Raymond Jennings 2018-12-18 8:06 ` Robin H. Johnson 2018-12-20 1:18 ` Kent Fredric 2 siblings, 0 replies; 40+ messages in thread From: Raymond Jennings @ 2018-12-18 4:02 UTC (permalink / raw To: gentoo-project My assumption here is that infra is the one hosting anongit.gentoo.org On Mon, Dec 17, 2018 at 7:57 PM Georgy Yakovlev <gyakovlev@gentoo.org> wrote: > > On Monday, December 17, 2018 9:43:05 AM PST Raymond Jennings wrote: > > On Mon, Dec 17, 2018 at 9:26 AM Matt Turner <mattst88@gentoo.org> wrote: > > > On Sun, Dec 16, 2018 at 12:13 AM Georgy Yakovlev <gyakovlev@gentoo.org> > wrote: > > > > On Saturday, December 15, 2018 8:40:38 PM PST Matt Turner wrote: > > > > > On Sat, Dec 15, 2018 at 11:16 PM Alec Warner <antarus@gentoo.org> > wrote: > > > > > > - Disk usage for git vs rsync > > > > > > > > > > This is why I have not switched. With git you pull down increasing > > > > > amounts of history, whereas with rsync the data fits easily in a <1GB > > > > > partition. > > > > > > > > Recent portage can use sync-depth = 1 > > > > repo dir no longer grows as it used to and it's works fine unlike > > > > initial > > > > implementation that was giving trouble > > > > > > > > https://bugs.gentoo.org/552814 > > > > > > > > du -hs /var/db/repos/gentoo > > > > 350M /var/db/repos/gentoo > > > > > > > > example /etc/portage/repos.conf/gentoo.conf : > > > > [DEFAULT] > > > > main-repo = gentoo > > > > > > > > [gentoo] > > > > auto-sync = yes > > > > location = /var/db/repos/gentoo > > > > sync-type = git > > > > sync-uri = https://github.com/gentoo-mirror/gentoo.git > > > > sync-depth = 1 > > > > sync-git-clone-extra-opts = -b master > > > > sync-git-verify-commit-signature = true > > > > > > > > > > > > sync is almost instantaneous compared to rsync, but some folks not going > > > > to > > > > like github as a mirror in this case. 
> > > > Would I be correct to say they won't need github if they use infra's > > own native anongit server? > I'm guessing, but probably infra server is not supposed to handle load from > all the users and will temporarily ban if one tries to sync more than several > times per day (like rsync master does). But don't quote me on that, better ask > infra. > > > > > > Thanks for the information. That seems to work great! > > > -- > Georgy Yakovlev > Gentoo Linux Developer ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 3:57 ` Georgy Yakovlev 2018-12-18 4:02 ` Raymond Jennings @ 2018-12-18 8:06 ` Robin H. Johnson 2018-12-20 1:18 ` Kent Fredric 2 siblings, 0 replies; 40+ messages in thread From: Robin H. Johnson @ 2018-12-18 8:06 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 1002 bytes --] On Mon, Dec 17, 2018 at 07:57:21PM -0800, Georgy Yakovlev wrote: > > Would I be correct to say they won't need github if they use infra's > > own native anongit server? > I'm guessing, but probably infra server is not supposed to handle load from > all the users and will temporarily ban if one tries to sync more than several > times per day (like rsync master does). But don't quote me on that, better ask > infra. anongit.gentoo.org is already 3 servers, depending where in the world you are. It would continue to scale: possibly selectively (some instances only having a subset of repos). Beyond that, I could also see offering pre-built git-bundle outputs as snapshot points, specifically because they can be mirrored as static files by HTTP systems. -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Treasurer E-Mail : robbat2@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 1113 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
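[Editorial sketch] The git-bundle idea in the message above could look
roughly like this. The sketch only builds the command lines and prints
them, it does not run git. All paths and the function names are
hypothetical; the git subcommands themselves (`bundle create`,
`bundle verify`, `clone`, `remote set-url`, `fetch`) are standard. The
upstream URL is the anongit address mentioned elsewhere in this thread.
Nothing here reflects how infra would actually deploy it.

```python
import shlex


def bundle_commands(repo_dir, bundle_path, branch="master"):
    """Mirror side: write one branch into a single static bundle file."""
    return [
        ["git", "-C", repo_dir, "bundle", "create", bundle_path, branch],
        # verify must run inside a repository, hence -C again.
        ["git", "-C", repo_dir, "bundle", "verify", bundle_path],
    ]


def bootstrap_commands(bundle_path, target_dir, upstream_url, branch="master"):
    """Client side: clone from a downloaded bundle, then follow upstream."""
    return [
        ["git", "clone", "-b", branch, bundle_path, target_dir],
        ["git", "-C", target_dir, "remote", "set-url", "origin", upstream_url],
        ["git", "-C", target_dir, "fetch", "origin"],
    ]


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    steps = bundle_commands("/srv/git/gentoo.git", "/srv/www/gentoo.bundle")
    steps += bootstrap_commands(
        "/tmp/gentoo.bundle",
        "/var/db/repos/gentoo",
        "git://anongit.gentoo.org/repo/sync/gentoo.git",
    )
    for argv in steps:
        print(shlex.join(argv))
```

The appeal of this design is that the bundle is an ordinary file, so the
expensive first clone can be served by any static HTTP mirror and only
the small follow-up fetch reaches the git servers.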
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-18 3:57 ` Georgy Yakovlev
  2018-12-18 4:02 ` Raymond Jennings
  2018-12-18 8:06 ` Robin H. Johnson
@ 2018-12-20 1:18 ` Kent Fredric
  2 siblings, 0 replies; 40+ messages in thread
From: Kent Fredric @ 2018-12-20 1:18 UTC (permalink / raw)
To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]

On Mon, 17 Dec 2018 19:57:21 -0800 Georgy Yakovlev <gyakovlev@gentoo.org> wrote:
> I'm guessing, but probably infra server is not supposed to handle load from
> all the users and will temporarily ban if one tries to sync more than several
> times per day (like rsync master does). But don't quote me on that, better ask
> infra.

I'd imagine the server requirements with regard to load are lower for git
than they are for rsync.

Partly, this is because I believe rsync requires tree traversal and
dynamic checksumming of data on the server side for each sync.

With git, on the other hand, that checksumming and traversal are
essentially precomputed, and the backing store can be efficiently
condensed to a single file, with much more efficient IO.

That is, instead of iterating through 9k+ inodes, it just opens the one
and chases the parent SHA1 chains.

Then your restrictions seem to amount to total bandwidth available, with
a little CPU and IO overhead, as opposed to a larger bandwidth, CPU and
IO requirement.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 40+ messages in thread
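[Editorial sketch] The "precomputed checksum" point in the message above
rests on git being content-addressed: an object's name is the SHA-1 of a
short header plus its contents, computed once when the object is written.
A minimal sketch of that rule for blobs follows; git_blob_id is a
hypothetical name, and the sketch says nothing about how a git server
actually schedules its work.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob with these contents.

    git hashes the header "blob <size>" followed by a NUL byte, then the data.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id depends only on the contents, two trees can be compared by
comparing ids, without re-reading and re-hashing every file on each sync.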
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 4:15 [gentoo-project] RFC: Dropping rsync as a tree distribution method Alec Warner 2018-12-16 4:40 ` Matt Turner @ 2018-12-16 11:34 ` Rich Freeman 2018-12-16 21:10 ` Matthew Thode 2018-12-20 1:26 ` Kent Fredric 2018-12-16 17:15 ` Toralf Förster ` (2 subsequent siblings) 4 siblings, 2 replies; 40+ messages in thread From: Rich Freeman @ 2018-12-16 11:34 UTC (permalink / raw To: gentoo-project On Sat, Dec 15, 2018 at 11:15 PM Alec Warner <antarus@gentoo.org> wrote: > > [1] Rich talked about some downsides earlier at https://lwn.net/Articles/759539/; but while these are challenges (some fixable) they are not necessarily blockers. The thread has already touched on a few of those comments. Despite only six months elapsing since I wrote that email, #1 no longer applies, and it sounds like #4 may not be as much of a concern. As you've already stated #3 can be easily addressed - setting up a git mirror is very easy. I think #2 is more of a fundamental design difference that probably will never go away. If your tree is a year old then git WILL take longer and transfer more data than rsync. My guess is that it will also cost more IO server-side than rsync, but it probably will be cheaper in CPU. However, I bet that 95% of our users sync weekly or daily and in that use case it is going to go a lot faster, and probably be less mirror load as well, and it will be a TON less IO load on the client side. I'm not sure how much IO cost there is to git garbage collection - that might offset this in the common shallow clone scenario. I'd suggest that those with concerns give it a shot using Zac's suggested settings and see how it goes. Really all you have to do is delete your local repo and adjust your sync settings and resync. I think the local disk use is going to be the biggest source of user objection and I'm interested in what people observe here. 
-- Rich
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 11:34 ` Rich Freeman @ 2018-12-16 21:10 ` Matthew Thode 2018-12-20 1:26 ` Kent Fredric 1 sibling, 0 replies; 40+ messages in thread From: Matthew Thode @ 2018-12-16 21:10 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 1956 bytes --] On 18-12-16 06:34:07, Rich Freeman wrote: > On Sat, Dec 15, 2018 at 11:15 PM Alec Warner <antarus@gentoo.org> wrote: > > > > [1] Rich talked about some downsides earlier at https://lwn.net/Articles/759539/; but while these are challenges (some fixable) they are not necessarily blockers. > > The thread has already touched on a few of those comments. Despite > only six months elapsing since I wrote that email, #1 no longer > applies, and it sounds like #4 may not be as much of a concern. As > you've already stated #3 can be easily addressed - setting up a git > mirror is very easy. > > I think #2 is more of a fundamental design difference that probably > will never go away. If your tree is a year old then git WILL take > longer and transfer more data than rsync. My guess is that it will > also cost more IO server-side than rsync, but it probably will be > cheaper in CPU. However, I bet that 95% of our users sync weekly or > daily and in that use case it is going to go a lot faster, and > probably be less mirror load as well, and it will be a TON less IO > load on the client side. I'm not sure how much IO cost there is to > git garbage collection - that might offset this in the common shallow > clone scenario. > > I'd suggest that those with concerns give it a shot using Zac's > suggested settings and see how it goes. Really all you have to do is > delete your local repo and adjust your sync settings and resync. I > think the local disk use is going to be the biggest source of user > objection and I'm interested in what people observe here. 
>

I wonder if we can add a little logic to help the yearly syncers at least a little bit. If the last sync is more than 6 months old, remove the old git-synced dir and replace it with a new shallow clone? Not perfect, but workable maybe.

Do we need to tell users to set up a git gc cron job, or does portage handle that for us now?

-- Matthew Thode (prometheanfire)
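The "re-clone if the tree is too old" idea above could look roughly like the following. This is only a sketch of the decision itself, not of anything Portage does today: the function name, the return values and the 182-day stand-in for "6 months" are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "over 6 months" is approximated as 182 days.
RECLONE_AFTER = timedelta(days=182)


def sync_strategy(last_sync: Optional[datetime],
                  now: datetime,
                  threshold: timedelta = RECLONE_AFTER) -> str:
    """Pick how to bring a git-synced tree up to date.

    Returns "reclone" (drop the old checkout and make a fresh shallow
    clone) when the last sync is unknown or older than `threshold`,
    otherwise "fetch" (a normal incremental update).
    Both labels are hypothetical, chosen only for this sketch.
    """
    if last_sync is None:
        return "reclone"
    return "reclone" if now - last_sync > threshold else "fetch"


if __name__ == "__main__":
    now = datetime(2018, 12, 16, tzinfo=timezone.utc)
    print(sync_strategy(now - timedelta(days=7), now))    # weekly syncer
    print(sync_strategy(now - timedelta(days=365), now))  # yearly syncer
```

Where the timestamp of the last sync would come from, and whether a fresh shallow clone really beats a long fetch for a given user, are open questions the sketch does not answer.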
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 11:34 ` Rich Freeman 2018-12-16 21:10 ` Matthew Thode @ 2018-12-20 1:26 ` Kent Fredric 1 sibling, 0 replies; 40+ messages in thread From: Kent Fredric @ 2018-12-20 1:26 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 836 bytes --] On Sun, 16 Dec 2018 06:34:07 -0500 Rich Freeman <rich0@gentoo.org> wrote: > My guess is that it will > also cost more IO server-side than rsync, Surely that's dependent on how much of the rsync mirror is retained in the VFS cache, and how efficiently the server in question avoids paging. To the best of my understanding, server-side of rsync requires IO on *thousands* of files, (lots of stat, open(), checksum), whereas server-side for git can be reduced to only a handful of large files (packs). Even if we assume in both cases everything needed fits in VFS cache, the rsync option still has reams of stat and open syscalls, that the git option avoids, surely. ( My observations made with vmtouch indicate that git doesn't even need to load the entire pack into memory for a large majority of operations ) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
From: Toralf Förster @ 2018-12-16 17:15 UTC
To: gentoo-project

On 12/16/18 5:15 AM, Alec Warner wrote:
> - Other things i have not thought of.
>

IMO git is not in the current stage3 image, is it?

-- Toralf PGP 23217DA7 9B888F45
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
From: M. J. Everitt @ 2018-12-16 17:38 UTC
To: gentoo-project

On 16/12/18 17:15, Toralf Förster wrote:
> On 12/16/18 5:15 AM, Alec Warner wrote:
>> - Other things i have not thought of.
>>
> IMO git is not in the current stage3 image, isn't it?
>
>

It's certainly not in the current install ISO images .. pretty sure it's not in stage3 either IIRC...
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
From: M. J. Everitt @ 2018-12-16 18:05 UTC
To: gentoo-project

On 16/12/18 17:38, M. J. Everitt wrote:
> On 16/12/18 17:15, Toralf Förster wrote:
>> On 12/16/18 5:15 AM, Alec Warner wrote:
>>> - Other things i have not thought of.
>>>
>> IMO git is not in the current stage3 image, isn't it?
>>
>>
> It's certainly not in the current install ISO images .. pretty sure its not
> in stage3 either IIRC...
>

Nor is GPG at present either .. in case you start having more thoughts about increasing @system's scope (enjoy the bikeshed on that).

FWIW, there are issues with e.g. git with musl libc, so that wants sorting out whilst you're at it .. (although it's one motivation to get the musl patches into the main tree ..)
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 18:05 ` M. J. Everitt @ 2018-12-16 18:36 ` Rich Freeman 2018-12-16 18:41 ` M. J. Everitt 0 siblings, 1 reply; 40+ messages in thread From: Rich Freeman @ 2018-12-16 18:36 UTC (permalink / raw To: gentoo-project On Sun, Dec 16, 2018 at 1:05 PM M. J. Everitt <m.j.everitt@iee.org> wrote: > > Nor is GPG at present either .. in case you start having more thoughts > about increasing @system's scope (enjoy the bikeshed on that). > If we are going to do this might I suggest that it would be nice to create a new set for things that we want to be present by default, but which are not part of @system. Some things like a libc virtual make more sense in @system. You can't run without them, and devs don't want to specify them as dependencies (though I personally think we'd be better served by making them explicit deps anyway). However, there are always things like editors, sshd, and now gpg/git/etc that are sensible defaults, but there really is no harm if you uninstall them and no reason to give them special treatment for parallel builds or dependency specifications. So, having an additional set would make sense. This set would be part of the stage3 and livecd, but could be more easily uninstalled without as many scary warnings, and dependencies would have to be explicit, and parallel builds would work fine. So, how is that for a bikeshed? -- Rich ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 18:36 ` Rich Freeman @ 2018-12-16 18:41 ` M. J. Everitt 0 siblings, 0 replies; 40+ messages in thread From: M. J. Everitt @ 2018-12-16 18:41 UTC (permalink / raw To: gentoo-project [-- Attachment #1.1: Type: text/plain, Size: 1848 bytes --] On 16/12/18 18:36, Rich Freeman wrote: > On Sun, Dec 16, 2018 at 1:05 PM M. J. Everitt <m.j.everitt@iee.org> wrote: >> Nor is GPG at present either .. in case you start having more thoughts >> about increasing @system's scope (enjoy the bikeshed on that). >> > If we are going to do this might I suggest that it would be nice to > create a new set for things that we want to be present by default, but > which are not part of @system. > > Some things like a libc virtual make more sense in @system. You can't > run without them, and devs don't want to specify them as dependencies > (though I personally think we'd be better served by making them > explicit deps anyway). > > However, there are always things like editors, sshd, and now > gpg/git/etc that are sensible defaults, but there really is no harm if > you uninstall them and no reason to give them special treatment for > parallel builds or dependency specifications. So, having an > additional set would make sense. This set would be part of the stage3 > and livecd, but could be more easily uninstalled without as many scary > warnings, and dependencies would have to be explicit, and parallel > builds would work fine. > > So, how is that for a bikeshed? > By the same token, the standard install image should become a stage4 with all these extra components included, and leave the existing stage3 as a bare-bones image. 
I've long thought that a system logger, ssh and one or two other packages should be 'core tools' in the stage3 (and have a custom stage4 spec set up for this all-but) but I hear the argument that the @system set should be genuinely minimal (and is already excessive with an init system for container installs) so perhaps I'm opening up the bikeshed here for a bigger debate/discussion on the 'correct' way forward here ... [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-16 4:15 [gentoo-project] RFC: Dropping rsync as a tree distribution method Alec Warner ` (2 preceding siblings ...) 2018-12-16 17:15 ` Toralf Förster @ 2018-12-18 9:55 ` Andrew Savchenko 2018-12-18 11:36 ` Raymond Jennings ` (2 more replies) 2018-12-18 18:14 ` Brian Evans 4 siblings, 3 replies; 40+ messages in thread From: Andrew Savchenko @ 2018-12-18 9:55 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 1152 bytes --] On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote: > Hi, > > I am currently embarking on a plan to redo our existing rsync[0] mirror > network. The current network has aged a bit. Its likely too large and is > under-maintained. I think in the ideal case we would instead pivot this > project to scaling out our git mirror capabilities and slowly migrate all > consumers to pulling the git tree directly. To that end, I'm looking for > blockers as to why various customers cannot switch to pulling the gentoo > ebuild repository from git[1] instead of rsync. > > So for example: > > - bandwidth concerns (preferably with documentation / data.) > - Firewall concerns > - CPU concerns (e.g. rsync is great for tiny systems?) > - Disk usage for git vs rsync > - Other things i have not thought of. My main concern with git is downlink fault tolerance. If rsync connection is broken, it can be easily restored without much data retransmission. If git download connection is broken, it has to start all over again. So there are cases where rsync will be always much more preferable than git. Best regards, Andrew Savchenko [-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 9:55 ` Andrew Savchenko @ 2018-12-18 11:36 ` Raymond Jennings 2018-12-18 17:14 ` Andrew Savchenko 2018-12-18 11:55 ` Michał Górny 2018-12-20 1:43 ` Kent Fredric 2 siblings, 1 reply; 40+ messages in thread From: Raymond Jennings @ 2018-12-18 11:36 UTC (permalink / raw To: gentoo-project On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@gentoo.org> wrote: > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote: > > Hi, > > > > I am currently embarking on a plan to redo our existing rsync[0] mirror > > network. The current network has aged a bit. Its likely too large and is > > under-maintained. I think in the ideal case we would instead pivot this > > project to scaling out our git mirror capabilities and slowly migrate all > > consumers to pulling the git tree directly. To that end, I'm looking for > > blockers as to why various customers cannot switch to pulling the gentoo > > ebuild repository from git[1] instead of rsync. > > > > So for example: > > > > - bandwidth concerns (preferably with documentation / data.) > > - Firewall concerns > > - CPU concerns (e.g. rsync is great for tiny systems?) > > - Disk usage for git vs rsync > > - Other things i have not thought of. > > My main concern with git is downlink fault tolerance. If rsync > connection is broken, it can be easily restored without much data > retransmission. If git download connection is broken, it has to > start all over again. So there are cases where rsync will be always > much more preferable than git. Are you talking about in comparison to the initial clone? If so, would having the clone default to shallow mitigate this? For the curious, I ran a benchmark. 
With a completely purged /usr/portage:

emerge-webrsync took 30.302s
emerge-sync (with git clone --depth 1) took 33.902s
emerge-sync (with regular rsync) took a whopping 1m25.863s

After a fresh sync:

emerge-sync (with regular rsync) took 7.564s
emerge-sync (with git fetch --depth 1, and after priming the repo with a full clone) took 2.086s

Up front, webrsync seems to be a small winner for initial setups, with git clone a close second, and regular rsync roughly 3-fold worse.

Routine syncs would seem to prefer git, especially if they are done with persistent regularity, which IMO would amortize things. My opinion is that over time git would also place less stress on the servers, since it only has to look at the commit chain instead of checksumming every single file.

That said, would I be correct to surmise that you're advancing a robustness issue and not simply a performance issue?

> Best regards,
> Andrew Savchenko
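To make the amortization argument a bit more concrete, here is a small back-of-the-envelope calculation using only the timings quoted above. It is a sketch under heavy assumptions: each figure is a single measurement on one machine, every routine sync is assumed to cost the same as the one measured, and the routine git figure was taken after priming with a full clone, so pairing it with the shallow-clone initial time is itself an approximation. The function and dictionary names are invented for this example.

```python
# Timings in seconds, copied from the benchmark above (one run each).
INITIAL = {"webrsync": 30.302, "git": 33.902, "rsync": 60 + 25.863}
ROUTINE = {"git": 2.086, "rsync": 7.564}


def total_time(method: str, routine_syncs: int) -> float:
    """Initial sync plus `routine_syncs` follow-up syncs.

    Assumes every follow-up sync costs exactly the single measured
    routine time, which real-world syncs will not.
    """
    return INITIAL[method] + routine_syncs * ROUTINE[method]


def slowdown(slower: str, faster: str) -> float:
    """How many times longer the initial sync took with `slower`."""
    return INITIAL[slower] / INITIAL[faster]


if __name__ == "__main__":
    print(f"initial rsync vs webrsync: {slowdown('rsync', 'webrsync'):.2f}x")
    print(f"initial rsync vs git:      {slowdown('rsync', 'git'):.2f}x")
    for syncs in (0, 12, 52):
        print(f"{syncs:3d} routine syncs: "
              f"git {total_time('git', syncs):8.3f}s, "
              f"rsync {total_time('rsync', syncs):8.3f}s")
```

Wall-clock time on the client says nothing about server load or bytes transferred, so this only illustrates the client-side half of the argument.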
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 11:36 ` Raymond Jennings @ 2018-12-18 17:14 ` Andrew Savchenko 2018-12-18 18:00 ` Alec Warner 0 siblings, 1 reply; 40+ messages in thread From: Andrew Savchenko @ 2018-12-18 17:14 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 2923 bytes --] On Tue, 18 Dec 2018 03:36:14 -0800 Raymond Jennings wrote: > On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@gentoo.org> wrote: > > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote: > > > Hi, > > > > > > I am currently embarking on a plan to redo our existing rsync[0] mirror > > > network. The current network has aged a bit. Its likely too large and is > > > under-maintained. I think in the ideal case we would instead pivot this > > > project to scaling out our git mirror capabilities and slowly migrate all > > > consumers to pulling the git tree directly. To that end, I'm looking for > > > blockers as to why various customers cannot switch to pulling the gentoo > > > ebuild repository from git[1] instead of rsync. > > > > > > So for example: > > > > > > - bandwidth concerns (preferably with documentation / data.) > > > - Firewall concerns > > > - CPU concerns (e.g. rsync is great for tiny systems?) > > > - Disk usage for git vs rsync > > > - Other things i have not thought of. > > > > My main concern with git is downlink fault tolerance. If rsync > > connection is broken, it can be easily restored without much data > > retransmission. If git download connection is broken, it has to > > start all over again. So there are cases where rsync will be always > > much more preferable than git. > > Are you talking about in comparison to the initial clone? > If so, would having the clone default to shallow mitigate this? > > For the curious, I ran a benchmark. 
> > With a completely purged /usr/portage: > > emerge-webrsync took 30.302s > emerge-sync (with git clone --depth 1) took 33.902s > emerge-sync (with regular rsync) took a whoping 1m25.863s > > After a fresh sync: > > emerge-sync (with regular rsync) took 7.564s > emerge-sync (with git fetch --depth 1, and after priming the repo with > a full clone) took 2.086s > > > > Up front, webrsync seems to be a small winner for initial setups, with > git clone a close second, and regular rsync is 3 fold worse > > Routine syncs would seem to prefer git, especially if they are done > with presistent regularity which IMO would amortize things. My > opinion is that over time git would also place less stress on the > servers since it only has to look at the commit chain instead of > checksumming every single file. > > > > That said, would I be correct to surmise that you're advancing a > robustness issue and not simply a performance issue? Yes, my interest here is in robustness, not performance. Sometimes I have to use unreliable uplink and other users may face the same problem. I agree that in most cases git should be a preferred way to go, but there are exceptions. So it would be nice to have rsync backup just in case. Daily or weekly portage snapshots available via rsync should be a solution as well. Best regards, Andrew Savchenko [-- Attachment #2: Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 17:14 ` Andrew Savchenko @ 2018-12-18 18:00 ` Alec Warner 2018-12-18 22:13 ` M. J. Everitt 0 siblings, 1 reply; 40+ messages in thread From: Alec Warner @ 2018-12-18 18:00 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 4133 bytes --] On Tue, Dec 18, 2018 at 12:14 PM Andrew Savchenko <bircoph@gentoo.org> wrote: > On Tue, 18 Dec 2018 03:36:14 -0800 Raymond Jennings wrote: > > On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@gentoo.org> > wrote: > > > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote: > > > > Hi, > > > > > > > > I am currently embarking on a plan to redo our existing rsync[0] > mirror > > > > network. The current network has aged a bit. Its likely too large > and is > > > > under-maintained. I think in the ideal case we would instead pivot > this > > > > project to scaling out our git mirror capabilities and slowly > migrate all > > > > consumers to pulling the git tree directly. To that end, I'm looking > for > > > > blockers as to why various customers cannot switch to pulling the > gentoo > > > > ebuild repository from git[1] instead of rsync. > > > > > > > > So for example: > > > > > > > > - bandwidth concerns (preferably with documentation / data.) > > > > - Firewall concerns > > > > - CPU concerns (e.g. rsync is great for tiny systems?) > > > > - Disk usage for git vs rsync > > > > - Other things i have not thought of. > > > > > > My main concern with git is downlink fault tolerance. If rsync > > > connection is broken, it can be easily restored without much data > > > retransmission. If git download connection is broken, it has to > > > start all over again. So there are cases where rsync will be always > > > much more preferable than git. > > > > Are you talking about in comparison to the initial clone? > > If so, would having the clone default to shallow mitigate this? > > > > For the curious, I ran a benchmark. 
> > > > With a completely purged /usr/portage: > > > > emerge-webrsync took 30.302s > > emerge-sync (with git clone --depth 1) took 33.902s > > emerge-sync (with regular rsync) took a whoping 1m25.863s > > > > After a fresh sync: > > > > emerge-sync (with regular rsync) took 7.564s > > emerge-sync (with git fetch --depth 1, and after priming the repo with > > a full clone) took 2.086s > > > > > > > > Up front, webrsync seems to be a small winner for initial setups, with > > git clone a close second, and regular rsync is 3 fold worse > > > > Routine syncs would seem to prefer git, especially if they are done > > with presistent regularity which IMO would amortize things. My > > opinion is that over time git would also place less stress on the > > servers since it only has to look at the commit chain instead of > > checksumming every single file. > > > > > > > > That said, would I be correct to surmise that you're advancing a > > robustness issue and not simply a performance issue? > > Yes, my interest here is in robustness, not performance. Sometimes I > have to use unreliable uplink and other users may face the same > problem. > > I agree that in most cases git should be a preferred way to go, but > there are exceptions. So it would be nice to have rsync backup just > in case. > Daily or weekly portage snapshots available via rsync should be a > solution as well. > Two things here. One is that in an ideal world we would run no rsync service and any design should keep that outcome in mind. Operationally we should continue to offer rsync until these types of problems are addressed by the new system. The second is that in this case I think the plan is to, as Robin mentioned, offer "git bundles" that are over raw http and support resume-able downloads. So instead of downloading an "rsync snapshot" you download a git bundle over http. Infra would offer these git bundles in a similar way to existing rsync snapshot offerings[0]. 
These bundles would be applied to a machine local clone of a git repo. Does this conceptually address your problem? I agree it will be difficult to know outside of actual practical testing. -A [0] http://gentoo.ussg.indiana.edu/snapshots/ is one example of the current system. Instead of tarballs of an 'rsync tree' these would be git bundles[1] that you fetch and apply locally. We would support a worldwide mirror network for these bundles. [1] https://git-scm.com/docs/git-bundle > > Best regards, > Andrew Savchenko > [-- Attachment #2: Type: text/html, Size: 5570 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 18:00 ` Alec Warner @ 2018-12-18 22:13 ` M. J. Everitt 0 siblings, 0 replies; 40+ messages in thread From: M. J. Everitt @ 2018-12-18 22:13 UTC (permalink / raw To: gentoo-project [-- Attachment #1.1.1: Type: text/plain, Size: 2428 bytes --] On 18/12/18 18:00, Alec Warner wrote: > > On Tue, Dec 18, 2018 at 12:14 PM Andrew Savchenko <bircoph@gentoo.org > <mailto:bircoph@gentoo.org>> wrote: > > > > > > That said, would I be correct to surmise that you're advancing a > > robustness issue and not simply a performance issue? > > Yes, my interest here is in robustness, not performance. Sometimes I > have to use unreliable uplink and other users may face the same > problem. > > I agree that in most cases git should be a preferred way to go, but > there are exceptions. So it would be nice to have rsync backup just > in case. > > > Daily or weekly portage snapshots available via rsync should be a > solution as well. > > > Two things here. One is that in an ideal world we would run no rsync > service and any design should keep that outcome in mind. Operationally we > should continue to offer rsync until these types of problems are > addressed by the new system. > > The second is that in this case I think the plan is to, as Robin > mentioned, offer "git bundles" that are over raw http and support > resume-able downloads. So instead of downloading an "rsync snapshot" you > download a git bundle over http. Infra would offer these git bundles in a > similar way to existing rsync snapshot offerings[0]. These bundles would > be applied to a machine local clone of a git repo. Does this conceptually > address your problem? I agree it will be difficult to know outside of > actual practical testing. > > -A > > [0] http://gentoo.ussg.indiana.edu/snapshots/ is one example of the > current system. Instead of tarballs of an 'rsync tree' these would be git > bundles[1] that you fetch and apply locally. 
We would support a worldwide
> mirror network for these bundles.
> [1] https://git-scm.com/docs/git-bundle
>
>
>     Best regards,
>     Andrew Savchenko
>

I'm inclined to suggest that perhaps you set up the necessary infra to do the git bundles, etc, and we give it a trial - we can postulate and pontificate as long as we like (otherwise known simply as 'bikeshedding') .. but we'll have no "real world data" until we actually implement it (and discover all the pitfalls en route). We then have the option of pushing through the migration process if it works, or we revert back if it doesn't.

How does that grab you Alec?! :)

MJE/veremitz.
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 9:55 ` Andrew Savchenko 2018-12-18 11:36 ` Raymond Jennings @ 2018-12-18 11:55 ` Michał Górny 2018-12-20 1:43 ` Kent Fredric 2 siblings, 0 replies; 40+ messages in thread From: Michał Górny @ 2018-12-18 11:55 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 1487 bytes --] On Tue, 2018-12-18 at 12:55 +0300, Andrew Savchenko wrote: > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote: > > Hi, > > > > I am currently embarking on a plan to redo our existing rsync[0] mirror > > network. The current network has aged a bit. Its likely too large and is > > under-maintained. I think in the ideal case we would instead pivot this > > project to scaling out our git mirror capabilities and slowly migrate all > > consumers to pulling the git tree directly. To that end, I'm looking for > > blockers as to why various customers cannot switch to pulling the gentoo > > ebuild repository from git[1] instead of rsync. > > > > So for example: > > > > - bandwidth concerns (preferably with documentation / data.) > > - Firewall concerns > > - CPU concerns (e.g. rsync is great for tiny systems?) > > - Disk usage for git vs rsync > > - Other things i have not thought of. > > My main concern with git is downlink fault tolerance. If rsync > connection is broken, it can be easily restored without much data > retransmission. If git download connection is broken, it has to > start all over again. So there are cases where rsync will be always > much more preferable than git. > I think this mostly applies to the initial clone, and in this case the git bundles (that will be) offered by Infra should solve it. You'd download them over regular HTTP(S) connection which you can freely resume. 
-- Best regards, Michał Górny
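As a rough illustration of what "download over regular HTTP(S) and resume freely" means in practice, the sketch below continues a partially downloaded file with an HTTP Range request. Nothing here is an Infra tool or a published URL: the function names are invented, the bundle location would be whatever Infra ends up offering, and the sketch assumes the server honours Range requests and that the remote file has not changed between attempts.

```python
import os
import urllib.error
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Header asking the server for everything after what we already have."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 64 * 1024) -> None:
    """Download `url` to `dest`, continuing a partial file if one exists."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416 and have > 0:
            # Range not satisfiable: assume the file is already complete.
            return
        raise
    with response:
        # 206 means the server sent only the missing tail, so append.
        # Anything else (typically 200) is the whole file, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Once the file is complete, `git bundle verify` can check it, and a bundle file can be used as the source for `git clone` or `git fetch`; how that would be wired into a sync workflow is left open here.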
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-18 9:55 ` Andrew Savchenko 2018-12-18 11:36 ` Raymond Jennings 2018-12-18 11:55 ` Michał Górny @ 2018-12-20 1:43 ` Kent Fredric 2018-12-20 2:33 ` Rich Freeman 2 siblings, 1 reply; 40+ messages in thread From: Kent Fredric @ 2018-12-20 1:43 UTC (permalink / raw To: gentoo-project [-- Attachment #1: Type: text/plain, Size: 1123 bytes --] On Tue, 18 Dec 2018 12:55:55 +0300 Andrew Savchenko <bircoph@gentoo.org> wrote: > My main concern with git is downlink fault tolerance. If rsync > connection is broken, it can be easily restored without much data > retransmission. If git download connection is broken, it has to > start all over again. So there are cases where rsync will be always > much more preferable than git. I suspect there's a mechanism available to get git to sync forward only "n-much", but not entirely sure. I'll have to re-read and re-comprehend `git help fetch` though to be sure. But if there was, an alternative for "I have problems with links flaking" would be to do batches of smaller fast-forwards. This option would *theoretically* be equivalent to having published bundles, except of course allowing you to jump forward an arbitrary step-size. I suspect a published list of SHA1's broken down by time might also help here in conjunction with passing required ones as "refspec" values to fetch, which would also approximate the bundle strategy, albeit using substantially less server-side storage space. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
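The "batches of smaller fast-forwards" idea could be planned along these lines, assuming a published, date-sorted list of checkpoint commits existed. Everything in this sketch is hypothetical: the list format, the placeholder commit ids and the function name are invented, and whether a server accepts fetching a bare commit id at all is a server configuration matter that is simply assumed here.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical published checkpoints, sorted by date.
# The commit ids are placeholders, not real hashes.
CHECKPOINTS = [
    (date(2018, 9, 1), "commit-id-2018-09"),
    (date(2018, 10, 1), "commit-id-2018-10"),
    (date(2018, 11, 1), "commit-id-2018-11"),
    (date(2018, 12, 1), "commit-id-2018-12"),
]


def fetch_steps(checkpoints, synced_to: date, step: int = 1) -> list:
    """Commit ids to fetch, in order, to catch up from `synced_to`.

    `step` is how many checkpoints to jump per fetch; the newest
    checkpoint is always the final step so the tree ends up current.
    """
    dates = [d for d, _ in checkpoints]
    start = bisect_right(dates, synced_to)  # first checkpoint after our date
    pending = [commit for _, commit in checkpoints[start:]]
    if not pending:
        return []
    steps = pending[step - 1::step]
    if not steps or steps[-1] != pending[-1]:
        steps.append(pending[-1])
    return steps


if __name__ == "__main__":
    # A flaky link might take one checkpoint at a time...
    print(fetch_steps(CHECKPOINTS, date(2018, 9, 15), step=1))
    # ...while a better one could jump two at a time.
    print(fetch_steps(CHECKPOINTS, date(2018, 9, 15), step=2))
```

Each returned id would then be fetched in turn, so an interrupted transfer only loses the current, smaller step rather than the whole catch-up.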
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method 2018-12-20 1:43 ` Kent Fredric @ 2018-12-20 2:33 ` Rich Freeman 2018-12-20 16:21 ` Kent Fredric 0 siblings, 1 reply; 40+ messages in thread From: Rich Freeman @ 2018-12-20 2:33 UTC (permalink / raw To: gentoo-project On Wed, Dec 19, 2018 at 8:43 PM Kent Fredric <kentnl@gentoo.org> wrote: > > I suspect a published list of SHA1's broken down by time might also > help here in conjunction with passing required ones as "refspec" values > to fetch, which would also approximate the bundle strategy, albeit > using substantially less server-side storage space. I'm not sure how necessary this is, but another way to do this is to just use tags, perhaps date-based (eg year-month). Perhaps this could be combined with some level of QA as well to ensure the tree is clean at the time it was tagged. From the command line this would be simpler than copy/pasting hashes from some webpage, but it obviously clutters the repo. Granted, it isn't much clutter if you only do it monthly. Git fetch does not seem to support any kind of relative refspec. You need a hash/branch/tag/ref. Git ls-remote just lists refs and not history. If super-unreliable connections are the concern it probably would be cleaner to just use the previous suggestion of providing bundles with resume support. They can be downloaded and then pulled/fetched from. Do we really have that much of a need for this? -- Rich ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-20 2:33 ` Rich Freeman
@ 2018-12-20 16:21 ` Kent Fredric
  0 siblings, 0 replies; 40+ messages in thread
From: Kent Fredric @ 2018-12-20 16:21 UTC
To: gentoo-project

On Wed, 19 Dec 2018 21:33:29 -0500 Rich Freeman <rich0@gentoo.org> wrote:
> I'm not sure how necessary this is, but another way to do this is to
> just use tags, perhaps date-based (eg year-month). Perhaps this could
> be combined with some level of QA as well to ensure the tree is clean
> at the time it was tagged. From the command line this would be
> simpler than copy/pasting hashes from some webpage, but it obviously
> clutters the repo. Granted, it isn't much clutter if you only do it
> monthly.

Ew. Please no. Even when used appropriately, tags create a lot of mess when dealing with repos on a regular basis. Using them to simply communicate metadata is just wrong.

My suggestion would probably be easier with some instrumentation in portage if we worked out how to do it, e.g.:

  emerge --sync-to=2018-12-21

*maybe* it could be done with refs that don't collide with the tag/head space, enough that they show up in git ls-remote, but otherwise don't involve reference copying when people do naive git clones on stock configuration ( because syncing a bunch of tags that will never be useful after you've synced them is um... )

The downside of that, though, is that using non-standard ref names will mean mirrors won't clone them by default.

> Git fetch does not seem to support any kind of relative refspec. You
> need a hash/branch/tag/ref. Git ls-remote just lists refs and not
> history.
> If super-unreliable connections are the concern it probably would be
> cleaner to just use the previous suggestion of providing bundles with
> resume support. They can be downloaded and then pulled/fetched from.
> Do we really have that much of a need for this?

Indeed, there's also the opportunity to replicate bundles via BitTorrent, but I'm not sure how much demand there is for that either.
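To make the non-standard ref idea above a little more concrete, here is a rough sketch of how a date could be mapped to a checkpoint ref. The `refs/checkpoints/YYYY-MM-DD` namespace is purely an assumption for illustration, as are the function names; the only real interface relied on is the tab-separated `<hash>\t<ref>` output of `git ls-remote`.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical namespace, chosen so it collides with neither heads nor tags.
CHECKPOINT_PREFIX = "refs/checkpoints/"


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map ref name to commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def pick_checkpoint(refs: Dict[str, str], target: date,
                    prefix: str = CHECKPOINT_PREFIX) -> Optional[str]:
    """Return the newest checkpoint ref dated on or before `target`."""
    best = None
    for ref in refs:
        if not ref.startswith(prefix):
            continue
        try:
            day = date.fromisoformat(ref[len(prefix):])
        except ValueError:
            continue  # not a date-named checkpoint, ignore it
        if day <= target and (best is None or day > best[0]):
            best = (day, ref)
    return best[1] if best else None
```

A tool would feed it the output of `git ls-remote origin 'refs/checkpoints/*'` and then run `git fetch origin <ref>` with the ref that `pick_checkpoint` returns.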
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-16 4:15 [gentoo-project] RFC: Dropping rsync as a tree distribution method Alec Warner
  ` (3 preceding siblings ...)
  2018-12-18 9:55 ` Andrew Savchenko
@ 2018-12-18 18:14 ` Brian Evans
  2018-12-18 18:37 ` Alec Warner
  ` (2 more replies)
  4 siblings, 3 replies; 40+ messages in thread
From: Brian Evans @ 2018-12-18 18:14 UTC
To: gentoo-project

On 12/15/2018 11:15 PM, Alec Warner wrote:
> Hi,
>
> I am currently embarking on a plan to redo our existing rsync[0] mirror
> network. The current network has aged a bit. Its likely too large and is
> under-maintained. I think in the ideal case we would instead pivot this
> project to scaling out our git mirror capabilities and slowly migrate
> all consumers to pulling the git tree directly. To that end, I'm looking
> for blockers as to why various customers cannot switch to pulling the
> gentoo ebuild repository from git[1] instead of rsync.
>
> So for example:
>
> - bandwidth concerns (preferably with documentation / data.)
> - Firewall concerns
> - CPU concerns (e.g. rsync is great for tiny systems?)
> - Disk usage for git vs rsync
> - Other things i have not thought of.
>
> -A
>
> [0] This excludes emerge-webrsync; which I don't plan on touching.
> [1] Rich talked about some downsides earlier
> at https://lwn.net/Articles/759539/; but while these are challenges
> (some fixable) they are not necessarily blockers.

I personally would be sad to see rsync go as I use the git developer tree as my main repository on 2 machines. This is so I can develop and update from the single source. These have no news or md5-cache and it can be painful to generate metadata on one of them.

I rely on scripts to pull down the rsync metadata to expedite this process, e.g. rsync <host>/gentoo-portage/metadata/md5-cache/. Git has no easy sub-tree download equivalent that I know of.

Brian
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-18 18:14 ` Brian Evans
@ 2018-12-18 18:37 ` Alec Warner
  2018-12-18 18:38 ` Raymond Jennings
  2018-12-18 18:42 ` Rich Freeman
  2018-12-19 23:46 ` Robin H. Johnson
  2 siblings, 1 reply; 40+ messages in thread
From: Alec Warner @ 2018-12-18 18:37 UTC
To: gentoo-project

On Tue, Dec 18, 2018 at 1:15 PM Brian Evans <grknight@gentoo.org> wrote:
> I personally would be sad to see rsync go as I use the git developer
> tree as my main repository on 2 machines. This is so I can develop and
> update from the single source. These have no news or md5-cache and it
> can be painful to generate metadata on one of them.

So my strawperson response is that you should have 2 repos.

  PORTDIR=https://gitweb.gentoo.org/repo/sync/gentoo.git/log/?h=master # a local copy of this thing.
  PORTDIR_OVERLAY=/path/to/your/checkout/of/gentoo.git

I suspect however that this likely performs ...poorly, particularly in worst case situations as the 'overlay' would of course be massive in this configuration.

> I rely on scripts to pull down the rsync metadata to expedite this
> process. eg. rsync <host>/gentoo-portage/metadata/md5-cache/. Git has
> no easy sub-tree download equivalent that I know of.

So I think overlaying the news and GLSA bits is easy (you have a post-sync script that cd's into various directories and clones the news and GLSA repos.) The costly bit is likely the metadata regeneration for your development branch of the tree. I'd be curious to see how much this costs (both cold and hot) for you to generate locally.

-A
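The post-sync script described above could be sketched along these lines. The subdirectory names and repository URLs in the usage note are illustrative assumptions, not taken from the thread, and the function is a made-up helper that only decides which git command to run for each extra repository.

```python
import os
from typing import Callable, Dict, List


def postsync_commands(tree_root: str, extra_repos: Dict[str, str],
                      exists: Callable[[str], bool] = os.path.isdir
                      ) -> List[List[str]]:
    """Decide, per extra repo, whether to clone it or update it in place.

    `extra_repos` maps a subdirectory of the tree to a clone URL; both are
    supplied by the caller and are hypothetical here.
    """
    commands = []
    for subdir, url in extra_repos.items():
        target = os.path.join(tree_root, subdir)
        if exists(os.path.join(target, ".git")):
            # Already cloned on an earlier sync: just fast-forward it.
            commands.append(["git", "-C", target, "pull", "--ff-only"])
        else:
            # First run: clone into place.
            commands.append(["git", "clone", url, target])
    return commands
```

For example, calling it with `{"metadata/news": "<news repo URL>", "metadata/glsa": "<GLSA repo URL>"}` yields one command per directory, which a hook could then run after each sync. Note that nested clones would show up as untracked directories in a development checkout unless they are ignored.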
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-18 18:37 ` Alec Warner
@ 2018-12-18 18:38 ` Raymond Jennings
  2018-12-18 20:29 ` Alec Warner
  0 siblings, 1 reply; 40+ messages in thread
From: Raymond Jennings @ 2018-12-18 18:38 UTC
To: gentoo-project

What if as a first step, rsync was only dropped as the default?

If you change the default from rsync to git, you'd be closer to removing rsync, but it's not as drastic as a sudden removal. Would give time to make sure it works properly without the risk of breaking everything.
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-18 18:38 ` Raymond Jennings
@ 2018-12-18 20:29 ` Alec Warner
  0 siblings, 0 replies; 40+ messages in thread
From: Alec Warner @ 2018-12-18 20:29 UTC
To: gentoo-project

On Tue, Dec 18, 2018 at 1:39 PM Raymond Jennings <shentino@gmail.com> wrote:
> What if as a first step, rsync was only dropped as the default?
>
> If you change the default from rsync to git, you'd be closer to
> removing rsync, but it's not as drastic as a sudden removal. Would
> give time to make sure it works properly without the risk of breaking
> everything.

To clarify, my proposal is not a sudden removal of the rsync network. Cost-wise it is cheap to operate. Operationally, I'd prefer to operate fewer systems out of human concerns (fewer moving parts are better.) I'm trying to ascertain what use cases need to be taken into account before rsync is discontinued, hence this thread.

-A
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-18 18:14 ` Brian Evans
  2018-12-18 18:37 ` Alec Warner
@ 2018-12-18 18:42 ` Rich Freeman
  2018-12-19 23:46 ` Robin H. Johnson
  2 siblings, 0 replies; 40+ messages in thread
From: Rich Freeman @ 2018-12-18 18:42 UTC
To: gentoo-project

On Tue, Dec 18, 2018 at 1:14 PM Brian Evans <grknight@gentoo.org> wrote:
>
> I personally would be sad to see rsync go as I use the git developer
> tree as my main repository on 2 machines. This is so I can develop and
> update from the single source. These have no news or md5-cache and it
> can be painful to generate metadata on one of them.
>

The stable git repos contain news and cache. Users would sync from these.

Also, people have mentioned concerns with load on infra, but presumably if we have dozens of people willing to host rsync mirrors, I'd think that we'd find enough willing to host git mirrors. And of course there are a ton of semi-proprietary services that are free to mirror on. I don't really see how it matters that much if we have some mirrors that are proprietary - I doubt we have many servers with FOSS firmware and CPUs and so on.

> Git has no easy sub-tree download equivalent that I know of.

The nature of git would make it very difficult to only clone part of a repo as it is structured at the top level by commit, not directory. Of course somebody could create their own mirror of only part of the tree, but I'm not sure what the value of that would be. Your use case of downloading metadata/etc isn't needed since we already have git repos containing this.

-- 
Rich
* Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
  2018-12-18 18:14 ` Brian Evans
  2018-12-18 18:37 ` Alec Warner
  2018-12-18 18:42 ` Rich Freeman
@ 2018-12-19 23:46 ` Robin H. Johnson
  2 siblings, 0 replies; 40+ messages in thread
From: Robin H. Johnson @ 2018-12-19 23:46 UTC
To: gentoo-project

On Tue, Dec 18, 2018 at 01:14:44PM -0500, Brian Evans wrote:
> I personally would be sad to see rsync go as I use the git developer
> tree as my main repository on 2 machines. This is so I can develop and
> update from the single source. These have no news or md5-cache and it
> can be painful to generate metadata on one of them.
> I rely on scripts to pull down the rsync metadata to expedite this
> process. eg. rsync <host>/gentoo-portage/metadata/md5-cache/.

As pointed out elsewhere, news and md5-cache ARE available in the git sync repos.

> Git has no easy sub-tree download equivalent that I know of.

Upstream Git does have development efforts going on towards this goal, spearheaded by developers at Google & Microsoft, who want to work with sub-trees in massive repos.

Even without those new enhancements, it is already possible to check out only a subtree (but you still have to download all of it).

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
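The "check out only a subtree" approach mentioned above can be sketched with git's sparse checkout mechanism, using the `core.sparseCheckout` setting and the `.git/info/sparse-checkout` pattern file. The helper names are made up for this sketch, and the `metadata/md5-cache` path in the usage note is simply the directory from Brian's message. As the message says, the full history is still downloaded; only the working tree is narrowed.

```python
import os
import subprocess
from typing import Iterable, List


def sparse_patterns(paths: Iterable[str]) -> str:
    """Render directory paths as anchored sparse-checkout patterns."""
    return "".join("/" + path.strip("/") + "/\n" for path in paths)


def sparse_commands() -> List[List[str]]:
    """Git commands that enable sparse checkout and re-populate the tree."""
    return [
        ["git", "config", "core.sparseCheckout", "true"],
        ["git", "read-tree", "-m", "-u", "HEAD"],
    ]


def enable_sparse_checkout(repo_dir: str, paths: Iterable[str]) -> None:
    """Restrict an existing clone's working tree to the given directories."""
    config_cmd, read_tree_cmd = sparse_commands()
    subprocess.run(config_cmd, cwd=repo_dir, check=True)
    info_dir = os.path.join(repo_dir, ".git", "info")
    os.makedirs(info_dir, exist_ok=True)
    with open(os.path.join(info_dir, "sparse-checkout"), "w") as handle:
        handle.write(sparse_patterns(paths))
    # Re-read HEAD so files outside the patterns leave the working tree.
    subprocess.run(read_tree_cmd, cwd=repo_dir, check=True)
```

Calling `enable_sparse_checkout(repo_dir, ["metadata/md5-cache"])` on an existing clone would leave only that directory in the working tree, while the object database stays complete.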