* Re: [gentoo-user] an efficient idea for an alternative portage synchronisation
From: Michael Jones @ 2021-06-18 14:16 UTC
To: gentoo-user
On Fri, Jun 18, 2021, 07:10 caveman رَجُلُ الْكَهْفِ 穴居人 <
toraboracaveman@protonmail.com> wrote:
> tl;dr - i'm suggesting a new file syncing protocol
> for portage syncing. details of this one are in
> section 2.
>
>
> 1. background
> -------------
> rsync needs to examine every file in the tree in
> order to compare them. this is too expensive and
> doesn't scale as portage's tree grows in size.
>
> on the other hand, git gets away with this, by
> maintaining a history of edits. so git doesn't
> need to compare all files, instead it walks
> through the history.
>
> but git has another issue: the history getting
> too big. this causes:
> - `git clone` to needlessly take too long, as
> many old histories become irrelevant as they
> get fully overridden by newer ones.
> - this also causes `git pull` to be slower
> than needed, as the history is not ideally
> compressed.
> - plus, the disk space that's wasted for
> histories.
>
>
> 2. new protocol
> ---------------
> to solve issues above, i think the ideal solution
> is this protocol:
> - each history is a number representing a
> logical clock. 1st history is 0, 2nd is 1,
> etc.
> - the server maintains a list of the past N
> histories of the portage tree.
> - when a client requests to update its portage
> tree, it tells the server its current
> history. e.g. say client is currently
> located in logical time 1234567.
> - the server is maintaining only the past N
> histories:
> - if 1234567 is behind those maintained N
> ones, then the server sends a full
> portage tree from scratch.
> - if 1234567 is within those maintained N
> ones, then the server has two options:
> (1) either send all changes since
> 1234567, as they happened
> historically. this is a bad idea.
> no good reason for it.
>
> (2) better: the server can send the
> compressed histories. compressed
> histories are done once, and
> cached, in a scalable way. the
> cache itself is incremental, so
> updating the cache is cheap
> (details in section 2.2).
>
> e.g. if there are 5000 histories
> that the client lacks since time
> 1234567, then there is a chance
> that many of the changes are just
> a waste of time. e.g. add a file,
> then delete the same file, then
> add a different file again. so
> why not just lie about the
> history, and send the last file,
> skipping the ones in the middle? same
> can be thought about diffs to code
> blocks.
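
The server-side decision described above (full tree versus compressed history) could be sketched roughly as follows. This is only an illustration under assumptions that are not in the proposal: the tree is modelled as an in-memory dict, a deleted file is represented by None, and the names SyncServer, squash and update_for are made up for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# One change is (path, new_content); new_content of None means "file deleted".
Change = Tuple[str, Optional[bytes]]


def squash(histories: List[List[Change]]) -> Dict[str, Optional[bytes]]:
    """Collapse consecutive histories into their net effect (last write per path wins)."""
    net: Dict[str, Optional[bytes]] = {}
    for history in histories:
        for path, content in history:
            net[path] = content
    return net


class SyncServer:
    """Hypothetical server that keeps only the past N histories."""

    def __init__(self, max_histories: int) -> None:
        self.max_histories = max_histories           # N in the proposal
        self.clock = 0                               # logical time of the current tree
        self.tree: Dict[str, bytes] = {}             # current full tree
        self.histories: Dict[int, List[Change]] = {} # clock -> changes that produced it

    def commit(self, changes: List[Change]) -> None:
        self.clock += 1
        for path, content in changes:
            if content is None:
                self.tree.pop(path, None)
            else:
                self.tree[path] = content
        self.histories[self.clock] = list(changes)
        # Forget the history that just fell out of the N-entry window.
        self.histories.pop(self.clock - self.max_histories, None)

    def update_for(self, client_clock: int):
        """Return ("full", clock, tree) or ("delta", clock, net_changes)."""
        needed = range(client_clock + 1, self.clock + 1)
        if client_clock > self.clock or any(t not in self.histories for t in needed):
            return ("full", self.clock, dict(self.tree))
        return ("delta", self.clock, squash([self.histories[t] for t in needed]))
```

A file that was added and then deleted inside the window shows up in the delta as a plain deletion, which a client that never had the file can apply as a no-op. The intermediate contents are never sent.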
>
> 2.1. properties of this new protocol
> ------------------------------------
> so this new protocol has these properties:
> - unlike rsync, it doesn't need to compare all files
> individually.
> - unlike git, the history doesn't grow on the
> client. history remains only a single
> number representing a logical clock.
> - the history on the server is limited to N
> past entries. no devs will cry, because
> this is not a code collaboration app, but
> simply a file synchronisation app to replace
> rsync. so the admins are free to set N as
> small as they please, without worrying about
> harming collaborating devs.
> - server has the option to compress histories
> to clients, and these histories are
> cacheable for more performance.
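
To illustrate the point that the client keeps nothing but a single number, a client-side sketch might look like this. The file name .sync-clock, the value -1 for "no tree yet", and the delta layout (path mapped to new bytes, or None for a deletion) are all assumptions made for the example, not part of the proposal. Path sanitising is left out.

```python
from pathlib import Path
from typing import Dict, Optional

CLOCK_FILE = ".sync-clock"  # hypothetical file holding the client's logical clock


def read_clock(root: Path) -> int:
    """Return the stored logical clock, or -1 if the tree was never synced."""
    clock_path = root / CLOCK_FILE
    return int(clock_path.read_text()) if clock_path.exists() else -1


def apply_delta(root: Path, new_clock: int, delta: Dict[str, Optional[bytes]]) -> None:
    """Apply net changes to the tree, then record the new clock as the only history."""
    for rel_path, content in delta.items():
        target = root / rel_path
        if content is None:
            target.unlink(missing_ok=True)  # deletion; requires Python 3.8+
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
    (root / CLOCK_FILE).write_text(str(new_clock))
```

Writing the clock last means an interrupted update leaves the old number in place, so the next sync simply asks for the same range again.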
>
>
> 2.2. how it will feel to admins/devs
> ------------------------------------
> - the devs simply commit their changes to the
> portage tree via git.
> - the git server will have hooks to execute an
> external command for this new protocol, that
> will calculate all diffs necessary in order
> to build a new history.
>
> e.g. if current history is 30000, and a dev
> makes a new commit via git, then the git
> hooks will execute the external command to
> calculate the diff for the affected files by
> the git commit, such that history 30001 is
> created.
>
> the hooked external command will also see if
> it can compress the histories for the past
> M entries up to 30001.
>
> so that clients that live in time 30001-M,
> who ask for 30001, can get the compressed
> history instead of raw actual histories from
> 30001-M to 30001.
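
One way the hook could keep the compressed histories cached incrementally is sketched below. It assumes the hook hands over each commit as a mapping from path to new content (None for a deleted file). DeltaCache, on_commit and delta_for are hypothetical names, and the real git hook wiring is left out.

```python
from typing import Dict, Optional

Delta = Dict[str, Optional[bytes]]  # path -> new content, None = file deleted


class DeltaCache:
    """Hypothetical cache of ready-made deltas for clients up to M histories behind."""

    def __init__(self, window: int) -> None:
        self.window = window               # M in the proposal
        self.clock = 0                     # logical time of the newest history
        self.deltas: Dict[int, Delta] = {} # client clock -> net delta up to self.clock

    def on_commit(self, changes: Delta) -> None:
        """Meant to be called once per commit, e.g. from a server-side hook."""
        # Extend every cached delta by the new commit; later writes win.
        for delta in self.deltas.values():
            delta.update(changes)
        # Clients sitting at the previous clock need exactly this commit.
        self.deltas[self.clock] = dict(changes)
        self.clock += 1
        # Drop the entry for clients that are now more than M histories behind.
        self.deltas.pop(self.clock - self.window - 1, None)

    def delta_for(self, client_clock: int) -> Optional[Delta]:
        """Return the cached net delta, or None if a full tree is needed."""
        return self.deltas.get(client_clock)
```

Each commit only touches the M cached entries and the paths it changed, so the cost of keeping the cache current does not depend on the size of the whole tree.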
>
> ty,
> cm
>
It seems like you are almost asking for git's shallow clones
(git clone --depth), which Portage exposes as the clone-depth and
sync-depth settings in repos.conf.
It's not an exact match for your proposal, but it's very close.
>
2021-06-18 12:10 [gentoo-user] an efficient idea for an alternative portage synchronisation caveman رَجُلُ الْكَهْفِ 穴居人
2021-06-18 14:16 99% ` Michael Jones