On Fri, Jun 18, 2021, 07:10 caveman رَجُلُ الْكَهْفِ 穴居人 <toraboracaveman@protonmail.com> wrote:
> tl;dr - i'm suggesting a new file syncing protocol
> for portage syncing. details are in section 2.
>
>
> 1. background
> -------------
> rsync needs to read all files in order to compare
> them. this is too expensive and doesn't scale as
> portage's tree grows in size.
>
> on the other hand, git gets away with this, by
> maintaining a history of edits. so git doesn't
> need to compare all files, instead it walks
> through the history.
>
> but git has another issue: the history getting
> too big. this causes:
>   - `git clone` to needlessly take too long, as
>     many old histories become irrelevant once they
>     get fully overwritten by newer ones.
>   - this also causes `git pull` to be slower
>     than needed, as the history is not ideally
>     compressed.
>   - plus, the disk space that's wasted on
>     histories.
>
>
> 2. new protocol
> ---------------
> to solve the issues above, i think the ideal solution
> is this protocol:
>   - each history is a number representing a
>     logical clock. 1st history is 0, 2nd is 1,
>     etc.
>   - the server maintains a list of the past N
>     histories of the portage tree.
>   - when a client requests to update its portage
>     tree, it tells the server its current
>     history. e.g. say the client is currently
>     located at logical time 1234567.
>   - the server is maintaining only the past N
>     histories:
>       - if 1234567 is behind those maintained N
>         ones, then the server sends a full
>         portage tree from scratch.
>       - if 1234567 is within those maintained N
>         ones, then the server has two options:
>           (1) either send all changes since
>               1234567, as they happened
>               historically. this is a bad idea.
>               no good reason for it.
>
>           (2) better: the server can send the
>               compressed histories. compressed
>               histories are computed once, and
>               cached, in a scalable way. the
>               cache itself is incremental, so
>               updating the cache is cheap
>               (details in section 2.2).
>
>               e.g. if there are 5000 histories
>               that the client lacks since time
>               1234567, then there is a chance
>               that many of the changes are just
>               a waste of time. e.g. add a file,
>               then delete the same file, then
>               add a different file again. so
>               why not just lie about the
>               history, and send the last file,
>               skipping the ones in the middle? the
>               same can be said about diffs to code
>               blocks.
>
> 2.1. properties of this new protocol
> ------------------------------------
> so this new protocol has these properties:
>   - unlike rsync, it doesn't need to compare all files
>     individually.
>   - unlike git, the history doesn't grow on the
>     client. history remains only a single
>     number representing a logical clock.
>   - the history on the server is limited to N
>     past entries. no devs will cry, because
>     this is not a code collaboration app, but
>     simply a file synchronisation app to replace
>     rsync. so the admins are free to set N as
>     small as they please, without worrying about
>     harming collaborating devs.
>   - server has the option to send compressed
>     histories to clients, and these histories are
>     cacheable for more performance.
>
>
> 2.2. how it will feel to admins/devs
> ------------------------------------
>   - the devs simply commit their changes to the
>     portage tree via git.
>   - the git server will have hooks to execute an
>     external command for this new protocol, that
>     will calculate all diffs necessary in order
>     to build a new history.
>
>     e.g. if current history is 30000, and a dev
>     makes a new commit via git, then the git
>     hooks will execute the external command to
>     calculate the diff for the files affected by
>     the git commit, such that history 30001 is
>     created.
>
>     the hooked external command will also see if
>     it can compress the histories, for the past
>     M entries leading up to 30001.
>
>     so that clients that live in time 30001-M,
>     who ask for 30001, can get the compressed
>     history instead of raw actual histories from
>     30001-M to 30001.
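
To make sure I am reading sections 2 and 2.2 the way you intend, here is a rough Python sketch of the server side. Nothing in it comes from an existing tool: `SyncServer`, `squash`, `apply_update` and `max_histories` (your N) are names I made up, a "history" is modelled as a plain dict of path to new content (or None for a deletion), and the caching of compressed histories is left out.

```python
from typing import Dict, List, Optional, Tuple

# One history entry (hypothetical model): path -> new content, None = file deleted.
Change = Dict[str, Optional[bytes]]


def squash(changes: List[Change]) -> Change:
    """Compress consecutive histories: the last change to each path wins."""
    merged: Change = {}
    for change in changes:
        merged.update(change)
    return merged


class SyncServer:
    """Keeps the current tree plus only the last `max_histories` changes."""

    def __init__(self, max_histories: int) -> None:
        if max_histories < 1:
            raise ValueError("max_histories must be at least 1")
        self.max_histories = max_histories  # the N of the proposal
        self.clock = -1                     # first commit becomes history 0
        self.tree: Dict[str, bytes] = {}
        self.log: Dict[int, Change] = {}    # logical time -> change

    def commit(self, change: Change) -> int:
        """What the git hook would call: record one new history."""
        self.clock += 1
        self.log[self.clock] = dict(change)
        for path, content in change.items():
            if content is None:
                self.tree.pop(path, None)
            else:
                self.tree[path] = content
        # Forget the entry that just fell out of the window of N histories.
        self.log.pop(self.clock - self.max_histories, None)
        return self.clock

    def update(self, client_clock: int) -> Tuple[str, int, Change]:
        """Answer a client that reports the logical time it is currently at."""
        if client_clock >= self.clock:
            return ("delta", self.clock, {})           # already up to date
        if client_clock + 1 < min(self.log):
            return ("full", self.clock, dict(self.tree))  # too far behind
        missing = [self.log[t] for t in range(client_clock + 1, self.clock + 1)]
        return ("delta", self.clock, squash(missing))  # option (2)


def apply_update(tree: Dict[str, bytes], kind: str, payload: Change) -> Dict[str, bytes]:
    """Client side: replace the tree on "full", patch it on "delta"."""
    new_tree: Dict[str, bytes] = {} if kind == "full" else dict(tree)
    for path, content in payload.items():
        if content is None:
            new_tree.pop(path, None)  # deleting a file we never had is harmless
        else:
            new_tree[path] = content
    return new_tree
```

Note that in this sketch a file that was added and then deleted inside the window still shows up in the delta as a deletion. That is harmless for the client but not minimal, so a real implementation would probably want to know what the client already has at its logical time.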
>
> ty,
> cm

It seems like you are almost asking for the clone-depth and sync-depth settings that Portage already offers for git syncing (set per repository in repos.conf). It's not an exact match for your proposal, but it's very close.