* Re: [gentoo-user] an efficient idea for an alternative portage synchronisation
From: Michael Jones @ 2021-06-18 14:16 UTC
  To: gentoo-user

On Fri, Jun 18, 2021, 07:10 caveman رَجُلُ الْكَهْفِ 穴居人 <
toraboracaveman@protonmail.com> wrote:

> tl;dr - i'm suggesting a new file syncing protocol
> for portage syncing.  details are in section 2.
>
>
> 1. background
> -------------
> rsync needs to read all files in order to compare
> them.  this is too expensive and doesn't scale as
> portage's tree grows in size.
>
> on the other hand, git gets away with this, by
> maintaining a history of edits.  so git doesn't
> need to compare all files, instead it walks
> through the history.
>
> but git has another issue:  the history getting
> too big.  this causes:
>     - `git clone` to needlessly take too long, as
>       many old histories become irrelevant as they
>       get fully overridden by newer ones.
>     - this also causes `git pull` to be slower
>       than needed, as the history is not ideally
>       compressed.
>     - plus, the disk space that's wasted for
>       histories.
>
>
> 2. new protocol
> ---------------
> to solve issues above, i think the ideal solution
> is this protocol:
>     - each history is a number representing a
>       logical clock.  1st history is 0, 2nd is 1,
>       etc.
>     - the server maintains a list of the past N
>       histories of the portage tree.
>     - when a client requests to update its portage
>       tree, it tells the server its current
>       history.  e.g. say client is currently
>       located in logical time 1234567.
>     - the server is maintaining only the past N
>       histories:
>         - if 1234567 is behind those maintained N
>           ones, then the server sends a full
>           portage tree from scratch.
>         - if 1234567 is within those maintained N
>           ones, then the server has two options:
>             (1) either send all changes since
>                 1234567, as they happened
>                 historically.  this is a bad idea.
>                 no good reason for it.
>
>             (2) better: the server can send the
>                 compressed histories.  compressed
>                 histories are done once, and
>                 cached, in a scalable way.  the
>                 cache itself is incremental, so
>                 updating the cache is cheap
>                 (details in section 2.2).
>
>                 e.g. if there are 5000 histories
>                 that the client lacks since time
>                 1234567, then there is a chance
>                 that many of the changes are just
>                 a waste of time.  e.g. add a file,
>                 then delete the same file, then
>                 add a different file again.  so
>                 why not just lie about the
>                 history, and send the last file,
>                 skipping the ones in the middle?  the
>                 same applies to diffs of code
>                 blocks.
>
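
A minimal Python sketch of the server-side choice described in section 2
above. Everything named here is an assumption for illustration and is not
part of the proposal: a "history" is modelled as a dict mapping a path to
its new content (or None for a deletion), `kept` holds the N retained
histories keyed by logical time, and `squash` / `plan_update` are
hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: path -> new content, or None if the file was deleted.
ChangeSet = Dict[str, Optional[str]]


def squash(change_sets: List[ChangeSet]) -> ChangeSet:
    """Collapse consecutive change sets; only the last state of each path survives."""
    merged: ChangeSet = {}
    for changes in change_sets:
        merged.update(changes)  # later entries override earlier ones
    return merged


def plan_update(client_time: int, head: int, kept: Dict[int, ChangeSet],
                full_tree: Dict[str, str]) -> Tuple[str, ChangeSet]:
    """Decide what to send to a client sitting at logical time client_time.

    kept[t] is the change set that moves the tree from time t-1 to time t;
    only the N most recent entries are retained by the server.
    """
    if client_time >= head:
        return ("up-to-date", {})
    needed = range(client_time + 1, head + 1)
    if any(t not in kept for t in needed):
        # Client is behind the retained window: send the whole tree.
        return ("full", dict(full_tree))
    # Option (2): one compressed change set instead of replaying each step.
    return ("delta", squash([kept[t] for t in needed]))


kept = {3: {"a.ebuild": "v1"}, 4: {"a.ebuild": None}, 5: {"b.ebuild": "v1"}}
tree = {"b.ebuild": "v1"}
print(plan_update(2, 5, kept, tree))
print(plan_update(1, 5, kept, tree))
```

In this sketch a file that was added and later deleted still shows up in
the delta as a deletion; a client that never had the file can ignore it.
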
> 2.1. properties of this new protocol
> ------------------------------------
> so this new protocol has these properties:
>     - unlike rsync, it doesn't need to compare all files
>       individually.
>     - unlike git, the history doesn't grow on the
>       client.  history remains only a single
>       number representing a logical clock.
>     - the history on the server is limited to N
>       past entries.  no devs will cry, because
>       this is not a code collaboration app, but
>       simply a file synchronisation app to replace
>       rsync.  so the admins are free to set N as
>       small as they please, without worrying about
>       harming collaborating devs.
>     - server has the option to compress histories
>       to clients, and these histories are
>       cacheable for more performance.
>
>
> 2.2. how it will feel to admins/devs
> ------------------------------------
>     - the devs simply commit their changes to the
>       portage tree via git.
>     - the git server will have hooks to execute an
>       external command for this new protocol, that
>       will calculate all diffs necessary in order
>       to build a new history.
>
>       e.g. if current history is 30000, and a dev
>       makes a new commit via git, then the git
>       hooks will execute the external command to
>       calculate the diff for the files affected by
>       the git commit, such that history 30001 is
>       created.
>
>       the hooked external command will also see if
>       it can compress the histories, for the past
>       M entries up to 30001.
>
>       so that clients that live in time 30001-M,
>       who ask for 30001, can get the compressed
>       history instead of the raw histories from
>       30001-M to 30001.
>
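
One way the hook in section 2.2 might keep the compressed histories
incremental, sketched in Python. This is an assumption about how such a
cache could work, not something the proposal specifies: `cache[t]` holds
the single squashed change set that takes a client from time t straight
to the current head, and `on_commit` is a hypothetical name for the
command the git hook would run.

```python
from typing import Dict, Optional

ChangeSet = Dict[str, Optional[str]]  # path -> new content, or None if deleted


def on_commit(cache: Dict[int, ChangeSet], head: int,
              new_changes: ChangeSet, m: int) -> int:
    """Advance the head by one and refresh the cached squashed deltas.

    cache[t] is the squashed change set from time t to the current head.
    """
    new_head = head + 1
    for delta in cache.values():
        delta.update(new_changes)        # fold the new commit into each delta
    cache[head] = dict(new_changes)      # clients at the old head need only this
    for t in [t for t in cache if t < new_head - m]:
        del cache[t]                     # older than the window of M entries
    return new_head


cache: Dict[int, ChangeSet] = {}
head = 30000
head = on_commit(cache, head, {"x.ebuild": "1"}, 2)
head = on_commit(cache, head, {"x.ebuild": None, "y.ebuild": "2"}, 2)
head = on_commit(cache, head, {"z.ebuild": "3"}, 2)
print(head, sorted(cache))
```

Each commit touches at most M cached entries and never re-reads the raw
histories, which is the sense in which the update stays cheap here.
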
> ty,
> cm
>


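It seems like you are almost asking for Portage's clone-depth and
sync-depth settings for git-synced repositories (set in repos.conf,
and built on git's shallow-clone --depth option).

It's not an exact match for your proposal, but it's very close.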
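
For reference, a sketch of how the clone-depth and sync-depth settings
might look in a repos.conf entry for a git-synced tree. The location and
sync-uri values below are assumptions for illustration; adjust them to
your setup. A depth of 1 asks for only the newest commit.

```ini
# /etc/portage/repos.conf/gentoo.conf (example path, adjust as needed)
[gentoo]
location = /var/db/repos/gentoo
sync-type = git
# Assumed mirror URL; any git mirror of the tree should work here.
sync-uri = https://anongit.gentoo.org/git/repo/sync/gentoo.git
# Limit how much history is fetched on the initial clone and on later syncs.
clone-depth = 1
sync-depth = 1
```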

2021-06-18 12:10     [gentoo-user] an efficient idea for an alternative portage synchronisation caveman رَجُلُ الْكَهْفِ 穴居人
2021-06-18 14:16 99% ` Michael Jones
