public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Bug #565566: Why is it still not fixed?
@ 2016-02-23 17:14 Patrick Lauer
  2016-02-23 18:07 ` Alec Warner
  2016-02-23 18:46 ` [gentoo-dev] " Alexis Ballier
  0 siblings, 2 replies; 37+ messages in thread
From: Patrick Lauer @ 2016-02-23 17:14 UTC (permalink / raw)
  To: gentoo-dev

See https://bugs.gentoo.org/show_bug.cgi?id=565566

Since we have ChangeLogs again (November) they've been in backwards
order. Which is not really good - it breaks tools (like emerge
--changelog) and makes it harder to read for humans.

As a bonus it's inconsistent because the old Changelog-2015 files are in
normal order, and the new ones are reversed. Which sense no makes.

The suggestions in the bug are of great entertainment value, but they
all avoid the simple idea of generating ChangeLogs in changelog (reverse
chronological) order. Which would fix all tools and make almost every
consumer of changelogs happy.

So, can we please, after over 4 months of stalling, just fix this
embarrassment?



* Re: [gentoo-dev] Bug #565566: Why is it still not fixed?
  2016-02-23 17:14 [gentoo-dev] Bug #565566: Why is it still not fixed? Patrick Lauer
@ 2016-02-23 18:07 ` Alec Warner
  2016-02-23 21:53   ` Patrick Lauer
  2016-02-23 18:46 ` [gentoo-dev] " Alexis Ballier
  1 sibling, 1 reply; 37+ messages in thread
From: Alec Warner @ 2016-02-23 18:07 UTC (permalink / raw)
  To: Gentoo Dev

On Tue, Feb 23, 2016 at 9:14 AM, Patrick Lauer <patrick@gentoo.org> wrote:

> See https://bugs.gentoo.org/show_bug.cgi?id=565566
>
> Since we have ChangeLogs again (November) they've been in backwards
> order. Which is not really good - it breaks tools (like emerge
> --changelog) and makes it harder to read for humans.
>
> As a bonus it's inconsistent because the old Changelog-2015 files are in
> normal order, and the new ones are reversed. Which sense no makes.
>
> The suggestions in the bug are of great entertainment value, but they
> all avoid the simple idea of generating ChangeLogs in changelog (reverse
> chronological) order. Which would fix all tools and make almost every
> consumer of changelogs happy.
>
> So, can we please, after over 4 months of stalling, just fix this
> embarrassment?
>
>
I don't see any attached patches...so it looks like there is room for you
to contribute.

-A


* Re: [gentoo-dev] Bug #565566: Why is it still not fixed?
  2016-02-23 17:14 [gentoo-dev] Bug #565566: Why is it still not fixed? Patrick Lauer
  2016-02-23 18:07 ` Alec Warner
@ 2016-02-23 18:46 ` Alexis Ballier
  2016-02-23 21:54   ` Patrick Lauer
  1 sibling, 1 reply; 37+ messages in thread
From: Alexis Ballier @ 2016-02-23 18:46 UTC (permalink / raw)
  To: gentoo-dev

On Tue, 23 Feb 2016 18:14:36 +0100
Patrick Lauer <patrick@gentoo.org> wrote:

> See https://bugs.gentoo.org/show_bug.cgi?id=565566
> 
> Since we have ChangeLogs again (November) they've been in backwards
> order. Which is not really good - it breaks tools (like emerge
> --changelog) and makes it harder to read for humans.
> 
> As a bonus it's inconsistent because the old Changelog-2015 files are
> in normal order, and the new ones are reversed. Which sense no makes.
> 
> The suggestions in the bug are of great entertainment value, but they
> all avoid the simple idea of generating ChangeLogs in changelog
> (reverse chronological) order. Which would fix all tools and make
> almost every consumer of changelogs happy.
> 
> So, can we please, after over 4 months of stalling, just fix this
> embarrassment?
> 

As much as I agree with you there, please use proper communication
channels and avoid spamming -dev list for single issues.



* Re: [gentoo-dev] Bug #565566: Why is it still not fixed?
  2016-02-23 18:07 ` Alec Warner
@ 2016-02-23 21:53   ` Patrick Lauer
  2016-02-24  0:33     ` [gentoo-dev] " Duncan
  0 siblings, 1 reply; 37+ messages in thread
From: Patrick Lauer @ 2016-02-23 21:53 UTC (permalink / raw)
  To: gentoo-dev

On 02/23/2016 07:07 PM, Alec Warner wrote:
> On Tue, Feb 23, 2016 at 9:14 AM, Patrick Lauer <patrick@gentoo.org> wrote:
>
>     See https://bugs.gentoo.org/show_bug.cgi?id=565566
>
>     Since we have ChangeLogs again (November) they've been in backwards
>     order. Which is not really good - it breaks tools (like emerge
>     --changelog) and makes it harder to read for humans.
>
>     As a bonus it's inconsistent because the old Changelog-2015 files
>     are in normal order, and the new ones are reversed. Which sense no
>     makes.
>
>     The suggestions in the bug are of great entertainment value, but
>     they all avoid the simple idea of generating ChangeLogs in
>     changelog (reverse chronological) order. Which would fix all tools
>     and make almost every consumer of changelogs happy.
>
>     So, can we please, after over 4 months of stalling, just fix this
>     embarrassment?
>
>
> I don't see any attached patches...so it looks like there is room for
> you to contribute.
>
> -A
>
>
from gitweb.gentoo.org -
infra/mastermirror-scripts.git

~line 167:

case $HOURS in
	3|9|15|21) EGENCACHE_CHANGELOG="--update-changelogs --changelog-reversed --changelog-output ChangeLog" ;;
esac


remove "--changelog-reversed"

Enjoy cookie.
(Of course this may require removing the existing changelogs first etc. ...)



* Re: [gentoo-dev] Bug #565566: Why is it still not fixed?
  2016-02-23 18:46 ` [gentoo-dev] " Alexis Ballier
@ 2016-02-23 21:54   ` Patrick Lauer
  0 siblings, 0 replies; 37+ messages in thread
From: Patrick Lauer @ 2016-02-23 21:54 UTC (permalink / raw)
  To: gentoo-dev

On 02/23/2016 07:46 PM, Alexis Ballier wrote:
> On Tue, 23 Feb 2016 18:14:36 +0100
> Patrick Lauer <patrick@gentoo.org> wrote:
>
>> See https://bugs.gentoo.org/show_bug.cgi?id=565566
>>
>> Since we have ChangeLogs again (November) they've been in backwards
>> order. Which is not really good - it breaks tools (like emerge
>> --changelog) and makes it harder to read for humans.
>>
>> As a bonus it's inconsistent because the old Changelog-2015 files are
>> in normal order, and the new ones are reversed. Which sense no makes.
>>
>> The suggestions in the bug are of great entertainment value, but they
>> all avoid the simple idea of generating ChangeLogs in changelog
>> (reverse chronological) order. Which would fix all tools and make
>> almost every consumer of changelogs happy.
>>
>> So, can we please, after over 4 months of stalling, just fix this
>> embarrassment?
>>
> As much as I agree with you there, please use proper communication
> channels and avoid spamming -dev list for single issues.
>
Proper communication channels have failed for 3+ months, maybe putting a
spotlight on things gets things moving.

User-visible breakage for half a year is just insane, I have no idea why
this is "impossible to fix".

Maybe people have gotten used to things being clunky and
not-really-working, I don't get used to it and will rub capsaicin in the
wounds until situation improves.



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-23 21:53   ` Patrick Lauer
@ 2016-02-24  0:33     ` Duncan
  2016-02-24  0:50       ` Kristian Fiskerstrand
                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Duncan @ 2016-02-24  0:33 UTC (permalink / raw)
  To: gentoo-dev

Patrick Lauer posted on Tue, 23 Feb 2016 22:53:32 +0100 as excerpted:

> On 02/23/2016 07:07 PM, Alec Warner wrote:
>> On Tue, Feb 23, 2016 at 9:14 AM, Patrick Lauer <patrick@gentoo.org> wrote:
>>
>>     See https://bugs.gentoo.org/show_bug.cgi?id=565566
>>
>>     Since we have ChangeLogs again (November) they've been in backwards
>>     order. Which is not really good - it breaks tools (like emerge
>>     --changelog) and makes it harder to read for humans.
>>
>>     As a bonus it's inconsistent because the old Changelog-2015 files
>>     are in normal order, and the new ones are reversed. Which sense no
>>     makes.
>>
>>     The suggestions in the bug are of great entertainment value, but
>>     they all avoid the simple idea of generating ChangeLogs in
>>     changelog (reverse chronological) order. Which would fix all tools
>>     and make almost every consumer of changelogs happy.
>>
>>     So, can we please, after over 4 months of stalling, just fix this
>>     embarrassment?
>>
>>
>> I don't see any attached patches...so it looks like there is room for
>> you to contribute.
>>
>> -A
>>
>>
> from gitweb.gentoo.org -
> infra/mastermirror-scripts.git
> 
> ~line 167:
> 
> case $HOURS in
> 	3|9|15|21) EGENCACHE_CHANGELOG="--update-changelogs --changelog-reversed --changelog-output ChangeLog" ;;
> esac
> 
> 
> remove "--changelog-reversed"
> 
> Enjoy cookie.
> (Of course this may require removing the existing changelogs first etc.
> ...)

If you read the previous threads on the topic, here and on the portage-
devel list...

That option is there, and indeed, a patch providing it was specifically 
added to portage for infra to use, because appending entries to existing 
files is vastly easier and more performant than trying to prepend entries 
and having to rewrite the entire file as a result.
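
To make the cost difference concrete, here is a minimal Python sketch
(not the actual infra script or portage code; the function names and
file names are made up for illustration). Appending writes only the new
entry, while prepending has to write the new entry plus everything that
was already in the file:

```python
import os
import tempfile


def append_entry(path, entry):
    """Add an entry at the end: only the new bytes are written."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return len(entry.encode("utf-8"))


def prepend_entry(path, entry):
    """Add an entry at the top: the whole file has to be rewritten."""
    with open(path, "r", encoding="utf-8") as f:
        old = f.read()
    new = entry + old
    with open(path, "w", encoding="utf-8") as f:
        f.write(new)
    return len(new.encode("utf-8"))


def demo():
    """Return (bytes written by append, bytes written by prepend)."""
    existing = "old entry\n" * 1000   # stand-in for an existing ChangeLog
    entry = "new entry\n"
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "ChangeLog.append")
        p = os.path.join(tmp, "ChangeLog.prepend")
        for path in (a, p):
            with open(path, "w", encoding="utf-8") as f:
                f.write(existing)
        return append_entry(a, entry), prepend_entry(p, entry)
```

The gap grows with the size of the existing file, which is the point
being made about doing this for every package in the tree.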


So chronological order (not traditional changelog reverse chronological 
order) is deliberate, helping to get the changelogs out there in the 
first place, given that they were missing /entirely/ for a while (as of 
course you well know, given previous threads, one of which did seem to 
help get the ball moving on getting changelogs of /any/ kind available 
again).

And it's unlikely to change, as long as infra is generating the 
changelogs from git logs at least, because prepends aren't ever going to 
be as cheap as appends.

Which means it's the tools that expect reverse-chronological order that 
must change.  Either that, or people /that/ concerned about the 
changelogs can simply switch to the git repos and use the existing git 
tools to read their changelogs, as many (including me, as I regularly 
check changelog entries, and now that I can, sometimes the actual diff, 
on one or more packages at nearly every update) already are.
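
As a rough idea of what "the tools must change" could look like, here
is a small Python sketch that flips a chronological changelog back to
newest-first for display. It assumes, purely for illustration, that
entries are separated by blank lines and that there is no header block
at the top; the real generated files may need a smarter split.

```python
def reverse_entries(text):
    """Flip the order of blank-line-separated entries in a changelog."""
    # Assumption: one blank line between entries, no file header.
    entries = [block.strip("\n") for block in text.split("\n\n") if block.strip()]
    return "\n\n".join(reversed(entries)) + "\n"
```

Indentation inside each entry is left alone; only the order of the
entries changes.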


IMO, what's actually happening here is the slow deprecation of rsync 
mirrors in favor of git.  I doubt they'd be created at all if gentoo were 
being created today -- git would be the only choice and they'd be git 
mirrors.  Just like the switch to git for the main tree itself, it's 
taking time, but /unlike/ the main tree switch, it doesn't have to be as 
formalized and is happening in a much more ad hoc manner.

At some point, gentoo will need to start mirroring its git repo (with 
metadata etc. added), much as it does the rsync repo, and the various 
handbook documentation etc. will need to change.  Right now, it's 
available, but AFAIK, only via the github mirror, at least at any serious 
scaling level, tho admittedly that leaves gentoo in a somewhat precarious 
position with github dependencies for those doing git.  But when the git 
mirror structure is up and people are switching, the rsync mirrors can 
gradually be phased out as people do switch and the rsync demand requires 
fewer servers.  Tho I don't believe there's anything yet replacing the 
signed security of webrsync with the appropriate security features turned 
on, so that will need to stay around for a while.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  0:33     ` [gentoo-dev] " Duncan
@ 2016-02-24  0:50       ` Kristian Fiskerstrand
  2016-02-24  2:53         ` Rich Freeman
  2016-02-24  2:39       ` Rich Freeman
  2016-02-27 13:14       ` Luca Barbato
  2 siblings, 1 reply; 37+ messages in thread
From: Kristian Fiskerstrand @ 2016-02-24  0:50 UTC (permalink / raw)
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 02/24/2016 01:33 AM, Duncan wrote:
> 
> IMO, what's actually happening here is the slow deprecation of 
> rsync mirrors in favor of git.  I doubt they'd be created at all
> if gentoo were

I don't agree with this at all. For one thing git is very resource
intensive compared to rsync mirroring, and there are in any case things
that need to be properly prepared in a staging area before being
presented to a user. For one thing we can't expect users to keep an up
to date copy of all gentoo developers' OpenPGP keys to verify each git
commit, additionally this will cause issues with retirement and
similar situations (certificate revocation, subkey rotations, expiries).

Git is a good tool for revision control (if used properly), but it is
not a panacea.

- -- 
Kristian Fiskerstrand
Public PGP key 0xE3EDFAE3 at hkp://pool.sks-keyservers.net
fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJWzP5gAAoJECULev7WN52F9BsIAJ/0lCFUYEttFkMU4rsQ2mKY
C8fWgtelOxTQoyqDHuQAGnYRbGoxNe8IfgtlYEwfHtH4C0aZfGr/AwDfo6FmM+nm
ChpyQIFX/V4SaoP+kBoK2ER1nhexWYCADMvIweqzgJwOYaPJfD5/dhJj38cmfkaq
5uvredv3UqwZOcMLexqp2N1X29qDneMve4RDElIp8O4hh344H5Ffonhht+AI7hj0
kqXyHXFtsP1Hq3NB7OdkWfkzcZnG9DZwRmFL3DJ6HXRmXcjV8JPeC4SAGt4/Ea/x
3ck8VRlhCeHMKcwC2pqxmBGnuXNpxVkPXfV4D48ukjt8SfaJbkM7EM/asAlN98A=
=2qTt
-----END PGP SIGNATURE-----



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  0:33     ` [gentoo-dev] " Duncan
  2016-02-24  0:50       ` Kristian Fiskerstrand
@ 2016-02-24  2:39       ` Rich Freeman
  2016-02-27 13:14       ` Luca Barbato
  2 siblings, 0 replies; 37+ messages in thread
From: Rich Freeman @ 2016-02-24  2:39 UTC (permalink / raw)
  To: gentoo-dev

On Tue, Feb 23, 2016 at 7:33 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Which means it's the tools that expect reverse-chronological order that
> must change.  Either that, or people /that/ concerned about the
> changelogs can simply switch to the git repos and use the existing git
> tools to read their changelogs, as many (including me, as I regularly
> check changelog entries, and now that I can, sometimes the actual diff,
> on one or more packages at nearly every update) already are.

Setting aside the whole git-vs-rsync debate, I'd generally recommend
that anybody interested in programmatic analysis of changes in the
tree use git anyway, because there are far better ways to walk git
commits/etc programmatically than parsing changelogs.  In python you
can trivially iterate over commits, access the content of files, all
the metadata, and so on.
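
A minimal sketch of that, using nothing but the git command line via
subprocess (a binding library would work as well). The function names,
the choice of fields and the separator bytes are just illustrative
assumptions; the repo path and package directory are whatever you pass
in.

```python
import subprocess

# Field and record separators that are unlikely to appear in commit text.
FIELD = "\x1f"
RECORD = "\x1e"
# %H = commit hash, %an = author name, %aI = author date (ISO 8601),
# %s = subject line.
LOG_FORMAT = FIELD.join(["%H", "%an", "%aI", "%s"]) + RECORD


def parse_log(raw):
    """Turn `git log --format=LOG_FORMAT` output into a list of dicts."""
    commits = []
    for record in raw.split(RECORD):
        record = record.strip("\n")
        if not record:
            continue
        sha, author, date, subject = record.split(FIELD)
        commits.append(
            {"sha": sha, "author": author, "date": date, "subject": subject}
        )
    return commits


def package_history(repo, package):
    """Commits touching one package directory, newest first."""
    raw = subprocess.run(
        ["git", "-C", repo, "log", "--format=" + LOG_FORMAT, "--", package],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_log(raw)
```

Since git log already lists newest commits first, the result comes out
in the order a traditional changelog reader expects, with no file
rewriting involved.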

I'm not against devs doing the work to provide changelogs for those
who prefer them, but I'd just go right to git if I were writing tools.

-- 
Rich



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  0:50       ` Kristian Fiskerstrand
@ 2016-02-24  2:53         ` Rich Freeman
  2016-02-24  4:24           ` Duncan
  2016-02-24  4:38           ` Vadim A. Misbakh-Soloviov
  0 siblings, 2 replies; 37+ messages in thread
From: Rich Freeman @ 2016-02-24  2:53 UTC (permalink / raw)
  To: gentoo-dev

On Tue, Feb 23, 2016 at 7:50 PM, Kristian Fiskerstrand <k_f@gentoo.org> wrote:
>
> On 02/24/2016 01:33 AM, Duncan wrote:
>>
>> IMO, what's actually happening here is the slow deprecation of
>> rsync mirrors in favor of git.  I doubt they'd be created at all
>> if gentoo were
>
> I don't agree with this at all. For one thing git is very resource
> intensive compared to rsync mirroring,

Is this actually true?  For the typical use case of daily or close to
daily updates I'd think that git would be much more efficient.

rsync has to traverse an entire directory tree (both client and
server-side, though of course either could have it cached) and
synchronize across the network the metadata for every file to
determine what has changed, and then figure out what changed in each
file and transfer it.  With a large git repository with only a few
hundred new commits the client just tells the server what its last
commit is, the server walks back in history to find it, and then the
server can quickly identify all the new commits/trees/blobs and send
just those.  With the COW design of git this is very efficient, not
requiring traversing any subdirectory in which no files have changed.

In the degenerate case where nothing has changed, an rsync still needs
to walk the full tree and send a file list, while git just sends a
commit ID and terminates.

Now, for an infrequent sync (think months) where most of the tree has
changed I could certainly buy that a webrsync would be far more
efficient for everybody.

And just like rsync git is easy to mirror, with github being an
example of a service that will mirror anybody's repo for free and they
seem to have no end to their bandwidth (though I've found that pushing
a full historical gentoo git tree to them does make them choke on it
for about 30min before it shows up).

So, while I'll agree with the validity of your other points, I'd be
interested in actual data to back up the resource claim.  I could see
that going either way, and that is likely to be based on how
well-optimized everything is.  Linus did a pretty good job with git.

> For one thing we can't expect users to keep an up
> to date copy of all gentoo developers' OpenPGP keys to verify each git
> commit, additionally this will cause issues with retirement and
> similar situations (certificate revocation, subkey rotations, expiries).

Well, we could do something (eventually) to make tracking keys easier,
but I'll still buy that the thick manifests are more secure.  Git
commit signatures are only bound to their contents with sha1.  I get
that nobody has demonstrated a practical attack on that, but I think
most crypto experts wouldn't heartily endorse the design.

Keep in mind that we do have git mirrors that include metadata/etc
hosted on Github.  I know people have concerns with their software
being proprietary but as far as syncing goes it is just a mirror.  I
doubt most of us audit all the distfiles mirrors we use to make sure
they're only using FOSS ftp/http servers and so on.  There really
isn't any reason that it couldn't be hosted on infra either, assuming
they wanted the extra load (and I don't see the point in it, since it
is just a mirror, and if it ever goes away it is trivial to just point
the scripts that generate it to push to some other mirror instead -
git itself is completely FOSS).

Again, I have nothing against devs maintaining rsync and changelogs,
and users making use of them.  I just don't see it as the end of the
world if devs decide to stop taking care of them.

-- 
Rich



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  2:53         ` Rich Freeman
@ 2016-02-24  4:24           ` Duncan
  2016-02-24  5:49             ` Kent Fredric
  2016-02-24  4:38           ` Vadim A. Misbakh-Soloviov
  1 sibling, 1 reply; 37+ messages in thread
From: Duncan @ 2016-02-24  4:24 UTC (permalink / raw)
  To: gentoo-dev

Rich Freeman posted on Tue, 23 Feb 2016 21:53:45 -0500 as excerpted:

> In the degenerate case where nothing has changed, an rsync still needs
> to walk the full tree and send a file list, while git just sends a
> commit ID and terminates.

Technicality:  While I believe you're correct for pure rsync, AFAIK, 
portage (and presumably the others) and the gentoo mirrors use a hybrid 
rsync method, where first the timestamp file is compared, and if it 
hasn't changed the rsync itself doesn't occur.
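
Roughly, the logic looks like the Python sketch below. This is not
portage's actual code, just the idea; sync(), fetch_remote_stamp and
run_rsync are hypothetical names standing in for "read the mirror's
timestamp file" and "do the full rsync".

```python
def sync(local_stamp, fetch_remote_stamp, run_rsync):
    """Skip the expensive tree walk when the timestamps already match."""
    remote = fetch_remote_stamp()          # one small file, cheap to fetch
    if local_stamp is not None and local_stamp.strip() == remote.strip():
        return "up to date"                # nothing changed, no rsync run
    run_rsync()                            # full directory walk happens here
    return "synced"
```

So the zero-change case costs one small file transfer either way, and
the comparison only gets interesting once at least one commit has
landed.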

So for gentoo rsync you'd need to argue the single-file single-line 
single-commit change case, not the zero-change case.  But your point 
stands.

>> For one thing we can't expect users to keep an up to date copy of all
>> gentoo developers' OpenPGP keys to verify each git commit, additionally
>> this will cause issues with retirement and similar situations
>> (certificate revocation, subkey rotations, expiries).
> 
> Well, we could do something (eventually) to make tracking keys easier,
> but I'll still buy that the thick manifests are more secure.  Git commit
> signatures are only bound to their contents with sha1.  I get that
> nobody has demonstrated a practical attack on that, but I think most
> crypto experts wouldn't heartily endorse the design.

Which is why I mentioned that there isn't a proper replacement for secure 
webrsync yet, so it'd have to stay around.  But git, synced over a secure 
connection at least, is certainly not /worse/ than normal rsync, and it 
arguably has at least the potential to be far better.

> Keep in mind that we do have git mirrors that include metadata/etc
> hosted on Github.  I know people have concerns with their software being
> proprietary but as far as syncing goes it is just a mirror.  I doubt
> most of us audit all the distfiles mirrors we use to make sure they're
> only using FOSS ftp/http servers and so on.

This is why I don't have a problem syncing from github any more than I do 
from whatever rsync mirror, despite freedomware being a relatively high 
priority concern of mine in general.  As long as the protocols are open 
and there's freedomware solutions available, whether a particular host 
I'm connecting to actually runs 100% freedomware isn't typically 
something I worry about... unless of course I'm the admin responsible for 
deciding what that host runs, in which case I'm unlikely to run anything 
/but/ freedomware on it (above BIOS/firmware level, anyway).

> There really isn't any
> reason that it couldn't be hosted on infra either, assuming they wanted
> the extra load (and I don't see the point in it, since it is just a
> mirror, and if it ever goes away it is trivial to just point the scripts
> that generate it to push to some other mirror instead -
> git itself is completely FOSS).

I'd say doable, but wouldn't call it "trivial".  Consider the difficulty 
the kernel had when bitkeeper pulled the rug out from under the Linux 
kernel, thus creating the need for git in the first place.  The switch 
was doable, and eventually done (and like Linux itself, it ultimately 
became the world standard), but I wouldn't exactly call it "trivial".

Of course in this case we're talking repo mirrors not repo software, but 
if gentoo's full rsync volume were to switch to git using github, and 
then github were to pull the rug out from under us... procuring and 
getting up and running that sort of hosting power on a week or even 90-
day notice wouldn't be exactly "trivial", unless of course we suddenly 
have Trump's credit card or similar to charge it on!  (Sorry, I'm 
following Nevada Republican caucus results 2nite as well, so it's Trump's 
CC, not Gates' or Ellison's or ...)

> Again, I have nothing against devs maintaining rsync and changelogs,
> and users making use of them.  I just don't see it as the end of the
> world if devs decide to stop taking care of them.

Particularly when the basic changelog information is there, it's simply 
quibbling about chronological or reverse-chronological order we're doing 
now, and people who /really/ care about it by rights should be going 
straight to the git logs in the first place.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  2:53         ` Rich Freeman
  2016-02-24  4:24           ` Duncan
@ 2016-02-24  4:38           ` Vadim A. Misbakh-Soloviov
  2016-02-24  5:36             ` Duncan
  1 sibling, 1 reply; 37+ messages in thread
From: Vadim A. Misbakh-Soloviov @ 2016-02-24  4:38 UTC (permalink / raw)
  To: gentoo-dev

> Is this actually true?  For the typical use case of daily or close to
> daily updates I'd think that git would be much more efficient.

As has been noted multiple times on the list already, this should
never happen, at least not until git supports resumable
fetches/clones/whatever. Otherwise you'll frustrate a lot of people
who have poor-quality internet access.



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  4:38           ` Vadim A. Misbakh-Soloviov
@ 2016-02-24  5:36             ` Duncan
  0 siblings, 0 replies; 37+ messages in thread
From: Duncan @ 2016-02-24  5:36 UTC (permalink / raw)
  To: gentoo-dev

Vadim A. Misbakh-Soloviov posted on Wed, 24 Feb 2016 10:38:55 +0600 as
excerpted:

>> Is this actually true?  For the typical use case of daily or close to
>> daily updates I'd think that git would be much more efficient.

> As has been noted multiple times on the list already, this should
> never happen, at least not until git supports resumable
> fetches/clones/whatever. Otherwise you'll frustrate a lot of people
> who have poor-quality internet access.

[I was confused at first as your response has little or nothing to do 
with the bit you quoted, but rather, with the idea of switching from 
rsync to git, in general.]

While I agree that git not being able to resume is certainly a serious 
pain for those with unreliable connections, they should be able to switch 
to webrsync, should the rsync method itself be deprecated.  As I've said, 
git syncing can't replace webrsync security, so webrsync will need to 
stay around for the foreseeable future.

Meanwhile, deprecated doesn't necessarily mean shut down or entirely 
unsupported.  Indeed, the implication is that it continues to stay around 
for quite some time, or rsync (or at least support for it) would be 
dropped, not deprecated.  It simply means that the handbook, etc, will 
stress non-deprecated alternatives instead of deprecated ones, and that 
over a period of years, the need for rsync mirrors will go down such that 
eventually, perhaps one per region/continent will suffice, and perhaps 
ultimately, only one, period, instead of the multiple per continent we 
tend to have now.

Most of the others would presumably be converted to git and tarball 
mirror resources, with a few converted to webrsync instead, since it'll 
no doubt get some uptick in usage as rsync fades out.

But given the installed base and the number of folks already using rsync 
that wouldn't see a reason to change, this deprecation and phase down 
would be on the scale of years, three years at absolute minimum I 
suppose, and more likely 5-10 years.

Do you realize just how long 10 years is in Linux distro terms?  Gentoo's 
certainly past that now, but who knows what will happen in ten years?  
Gentoo itself may no longer be around by then, or maybe it'll be around, 
but will only have enough users for a single mirror or two.  Or maybe 
hardware advances will be such that building from sources will be trivial 
by then and gentoo will either be a top-three distro again or everybody 
and their brother will be doing from-source distros and there will be as 
many of them as there are binary distros now.

And of course what happens to a good portion of those presently flaky 
connections over another decade is just as up in the air.  Ideally, it 
wouldn't be a problem we'd have to worry about by then, but I don't think 
anyone considers that likely.  OTOH, it could be that more people are 
moving to mobile-only by then, and a /lower/ percentage of gentooers have 
reliable high-bandwidth connections that don't cost an arm and a leg per 
gigabyte.

But regardless of why or how, I don't expect gentoo rsync syncing to have 
anything like the same level of usage, a decade from now.  Maybe it'll 
still be around, with one or a half dozen gentoo rsync mirrors, but I 
don't expect there will be a need for the several per region that many 
regions have now.  Of course I could be wrong, but I just don't see it.

(And yes, I'm posting this in the full awareness that someone could 
dredge it from the archives a decade from now and point out how wrong I 
ended up being.  In part that's why the "I could be wrong", to cover my 
bases, but I /still/ don't see it, and if it happens to be, my present 
self will be quite surprised, tho my future self might well find that in 
hindsight it should have been predictable, and I just didn't see it.)

IOW, it's nothing individual users need to be concerned about in the near 
term (out to three years or so), /probably/ nothing they need to be 
concerned about in the intermediate term (say 3-6 years), and beyond 
that, so much is likely to have changed that any predictions made now are 
certain to have missed important events that drastically change the way 
we look at things in the mean time, and thus be rather off base.

After all, git itself is only from 2005, 11 years ago, and ten years ago 
today, while it was definitely used for the kernel, I doubt that anyone 
would have predicted it would have pretty much taken over the (D)VCS 
landscape like it did, github and all.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  4:24           ` Duncan
@ 2016-02-24  5:49             ` Kent Fredric
  2016-02-24  7:29               ` Duncan
  0 siblings, 1 reply; 37+ messages in thread
From: Kent Fredric @ 2016-02-24  5:49 UTC (permalink / raw)
  To: gentoo-dev

On 24 February 2016 at 17:24, Duncan <1i5t5.duncan@cox.net> wrote:
> Particularly when the basic changelog information is there, it's simply
> quibbling about chronological or reverse-chronological order we're doing
> now, and people who /really/ care about it by rights should be going
> straight to the git logs in the first place.


Gentoo actually makes this problem worse than it should be.

Most of the suffering with having Changelogs in tree is due to the
whole "Every commit must have a changelog entry" madness, and the
natural consequence of having a lot of those is lots of merge
collisions.

By comparison, in other places ( for example, CPAN ), having a dumb
git -> changelog mapping tends to be right up there on the list of
dumb ideas, and the natural response I have ( and most CPAN hackers
have ) upon seeing a git output based changelog is typically "close
page, assume there were no changes".

Like from a changelog perspective, I don't think anyone cares about
stabilization changes.

It's either stabilized, or it isn't; changelog entries indicating you
tweaked a flag tend to not be the sort of thing people go looking at
Changelogs for.

If you want granular, commit-by-commit details about what changed,
yes, Git _is_ the right option for that.

But having that level of detail in the changelog is itself the madness
we should avoid.

Changelogs are really supposed to be _for humans_ giving changes that
_humans_ will care about.

Like on published Open Source software, the things you tend to look at
the changelog for are:
- What are the new features in this new version
- What bugs were fixed in this new version
- What security concerns were resolved by this new version.

The point being "If I just look at the diff directly I might not
understand what is happening"

And that's why there's the convention of being recent-first.

Because you open the changelog at the top, and you read down consuming
the aggregate changes of relevance to you mentally, and then you stop
when you reach a version you've already seen.
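
As a rough sketch of that reading pattern (the entry format here is an
assumption for illustration, not anything Portage defines), a consumer of
a recent-first changelog only has to walk down from the top and stop at
the first version it already knows:

```python
def unseen_entries(entries, last_seen_version):
    """Return the entries that are newer than last_seen_version.

    `entries` is assumed to be a list of (version, text) tuples in
    reverse-chronological order, i.e. newest first.
    """
    fresh = []
    for version, text in entries:
        if version == last_seen_version:
            break  # everything from here down has been read before
        fresh.append((version, text))
    return fresh
```

With a chronological (oldest-first) file the same reader would have to
scan the whole log before finding anything new.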

A big log of "Stabilized X" is just ... a waste of time IME.

But I'm sure at least one person out there has probably gone looking
for a changelog to see when something got stabilized/keyworded.

If we released ourselves from this inanity of annotating every change
at a level beyond what normal people could care about, we could
probably get away with manually maintained Changelogs again.

Because not *every* change warrants telling an end user "Hey, we changed this".

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  5:49             ` Kent Fredric
@ 2016-02-24  7:29               ` Duncan
  2016-02-24 10:35                 ` Kent Fredric
  0 siblings, 1 reply; 37+ messages in thread
From: Duncan @ 2016-02-24  7:29 UTC (permalink / raw)
  To: gentoo-dev

Kent Fredric posted on Wed, 24 Feb 2016 18:49:06 +1300 as excerpted:

> But I'm sure at least one person out there has probably gone looking for
> a changelog to see when something got stabilized/keyworded.

<raises hand>

In particular, I tend to be looking for this level of "introduced-on ymd 
to the kde overlay, tho masked as unreleased upstream, unmasked to ~arch 
in the overlay on ymd after upstream public release, introduced to the 
main tree on ymd, revision-bumped to get the patch fixing problem to 
users on ymd, first-arch-stabilized on ymd, x86 stabilized-on ymd, amd64-
stabilized on ymd, removed from tree as obsolete on ymd" type detail, 
when comparing notes with for instance users running LFS and Arch on the 
fresh end, and users running RHEL/CentOS/Debian-stable on the stale end.

Comparing this sort of version and stabilization history across distros, 
cross-referenced with when particular bugs showed up or disappeared, can 
be extremely helpful in tracking down whether problematic behavior is 
version-limited or perhaps doesn't originate with the leaf app in 
question at all, but instead, with some library that happened to be 
updated on one distro at the particular time a bug appeared, that either 
hasn't appeared or has been updated with a fix, on some other distro.

And I imagine tracking such ~arch keywording and stable-event dates could 
be even more critical on less common archs with bugs that don't tend to 
trigger on the common archs and that thus don't get fixed until someone 
experiences and reports them on the trigger archs.


Tho I understand your point, and have had the same experience when 
looking at, for instance, upstream kde git-level logs.

But to some extent it can be argued that at the distro packaging level as 
opposed to the upstream code development level, every commit /is/ 
potentially of interest to users of that package, at least of the arch 
involved if it's something like keywording/stabilizing, or the commit 
likely wouldn't be worth making in the first place.

Which is one reason it's so frustrating that gentoo's git guidelines 
recommend /against/ using merges and merge-comments as used with the 
upstream Linux kernel, for instance.  There, if I as an amd64 aka x86_64 
user am not interested in say arm commits, they're all sectioned off in 
big giant merges that I can look for and skip over perhaps hundreds of 
arm commits once I see that merge is from arm.  On gentoo, by contrast, 
every single arm stabilization commit tends to be its own individual 
commit to the tree, perhaps pushed as what /would/ be a single merge, but 
with a recommended rebase, so each one appears individually instead of 
under a single arm merge with that noted in the merge comment so I can 
skip them all at once!

As a result I have to use a different log tracking strategy on the gentoo 
git tree compared to the kernel.  Where on the kernel I'll often hit b 
to go to the bottom and then crawl up the entire update log, checking for 
merges and drilling down into components I'm interested in, on gentoo 
I have to do an emerge --update --deep --newuse --ask in one terminal, 
wait for it to generate its list of updates, then do a git log 
ORIG_HEAD.. in another terminal, hit b to seek to the bottom and cache it 
all, hit t to return to the top, and hit / and enter specific package 
versions I'm interested in to search for.  As a result, I don't pick up 
the general community status information on packages/components I'm not 
specifically interested in, that I do following the actually far more 
detailed kernel commit logs, because to keep things manageable I have to 
search for specific packages of interest instead of being able to exclude 
entire merge trees with just a high level glance at the main merge 
comment, where I actually /do/ pick up lots of general status information 
about kernel subsystems I'm not generally interested in.

I guess another way of putting it in the context of changelogs, would be 
that if gentoo were using git merges correctly, a changelog summary 
generator could simply take the high-level merge summary comments and 
turn that into its changelog summary.  Instead of ten dozen individual 
"cat-egory/pkg-x.y.z arm-stable" entries, there'd be one or two "arm-
stable various packages in these categories: xxx, yyy, zzz, aaa" entries, 
and people who don't care about arm could skip the further detail while 
still getting an overall idea of arm activity, while those who do care 
about arm and want further information could drill down further as 
necessary, but would be able to skip the corresponding merge entries for 
x86 and amd64.

With proper git usage, the information would already be there in the git 
log merge commit comments for people like me who like to read those, but 
it would also be not only far simpler, but actually /possible/ to 
automate a summarizer that generates summaries from only those merge 
entries, that then could be stored in the rsync tree or published to 
packages.gentoo.org or the gentoo front page, or wherever.
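
A minimal sketch of such a summarizer, assuming merge subjects followed a
"topic: detail" convention (that convention and both function names are
hypothetical, gentoo's tree does not currently work this way):

```python
import subprocess
from collections import OrderedDict


def merge_subjects(rev_range="ORIG_HEAD..", repo="."):
    """Return the subject line of every merge commit in rev_range."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--merges", "--pretty=format:%s", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def summarize(subjects):
    """Group merge subjects by their assumed 'topic:' prefix."""
    groups = OrderedDict()
    for subject in subjects:
        topic, sep, detail = subject.partition(":")
        if not sep:
            # No prefix at all, e.g. a bare "Merge branch x/y/z".
            topic, detail = "misc", subject
        groups.setdefault(topic.strip(), []).append(detail.strip())
    return ["%s: %s" % (topic, "; ".join(details))
            for topic, details in groups.items()]
```

Running summarize(merge_subjects()) after a sync would then give one line
per topic rather than one line per commit.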

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  7:29               ` Duncan
@ 2016-02-24 10:35                 ` Kent Fredric
  2016-02-24 19:18                   ` Raymond Jennings
  2016-02-25  5:03                   ` Duncan
  0 siblings, 2 replies; 37+ messages in thread
From: Kent Fredric @ 2016-02-24 10:35 UTC (permalink / raw)
  To: gentoo-dev

On 24 February 2016 at 20:29, Duncan <1i5t5.duncan@cox.net> wrote:
> I guess another way of putting it in the context of changelogs, would be
> that if gentoo were using git merges correctly, a changelog summary
> generator could simply take the high-level merge summary comments and
> turn that into its changelog summary.  Instead of ten dozen individual
> "cat-egory/pkg-x.y.z arm-stable" entries, there'd be one or two "arm-
> stable various packages in these categories: xxx, yyy, zzz, aaa" entries,
> and people who don't care about arm could skip the further detail while
> still getting an overall idea of arm activity, while those who do care
> about arm and want further information could drill down further as
> necessary, but would be able to skip the corresponding merge entries for
> x86 and amd64.
>
> With proper git usage, the information would already be there in the git
> log merge commit comments for people like me who like to read those, but
> it would also be not only far simpler, but actually /possible/ to
> automate a summarizer that generates summaries from only those merge
> entries, that then could be stored in the rsync tree or published to
> packages.gentoo.org or the gentoo front page, or wherever.


Indeed, we could probably establish better conventions for identifying
certain kinds of commits such that a static log analyser would be able
to give a good result.

And you can probably trivially filter out  ( or in your case, filter
exclusively for ) changes that relate to a specific arch simply by
examining the "DIFF" data.

And we could probably do much better at formatting merge commits as
well ( I've been encouraging such a thing already because "merged
branch x/y/z" is not informative enough )

"Good" changelog automation pretty much relies on the quality of the
underlying data and the ability to identify commits that are to be
included/excluded smartly, and pick the data out of those commits that
are relevant.

Though personally I feel for the goal of stabilization tracking, you
ought to be analysing the git repo. Not only can you then see when a
given package was stabilised, but you can see the other packages that
were stabilized in its proximity, which is way too hard to do with the
Changelogs.


-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24 10:35                 ` Kent Fredric
@ 2016-02-24 19:18                   ` Raymond Jennings
  2016-02-24 20:16                     ` Luis Ressel
  2016-02-25  5:03                   ` Duncan
  1 sibling, 1 reply; 37+ messages in thread
From: Raymond Jennings @ 2016-02-24 19:18 UTC (permalink / raw)
  To: gentoo-dev

Seems like there's a trade off in resource usage re: git vs rsync

Rsync seems to be relatively cheap, but has a fixed part of its overhead.
Probably one of the reasons that you get temp-banned from the mirrors if
you sync too often.

Git overhead appears to be higher on the variable parts but lower on the
fixed parts, and from what I gather, the more often you sync, the lower the
overhead.

As far as changelog generation, what about causing the changelogs to be
autogenerated by the end user's computer?  Divide and conquer.


* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24 19:18                   ` Raymond Jennings
@ 2016-02-24 20:16                     ` Luis Ressel
  2016-02-24 21:15                       ` Daniel Campbell
                                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Luis Ressel @ 2016-02-24 20:16 UTC (permalink / raw)
  To: gentoo-dev

On Wed, 24 Feb 2016 11:18:55 -0800
Raymond Jennings <shentino@gmail.com> wrote:

> As far as changelog generation, what about causing the changelogs to
> be autogenerated by the end user's computer?  Divide and conquer.

That would require a local git clone. And that's exactly what those who
still want Changelogs are trying to avoid.

-- 
Regards,
Luis Ressel


* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24 20:16                     ` Luis Ressel
@ 2016-02-24 21:15                       ` Daniel Campbell
  2016-02-24 22:16                       ` Brian Dolbec
  2016-02-25  7:12                       ` Martin Vaeth
  2 siblings, 0 replies; 37+ messages in thread
From: Daniel Campbell @ 2016-02-24 21:15 UTC (permalink / raw)
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 02/24/2016 12:16 PM, Luis Ressel wrote:
> On Wed, 24 Feb 2016 11:18:55 -0800 Raymond Jennings
> <shentino@gmail.com> wrote:
> 
>> As far as changelog generation, what about causing the changelogs
>> to be autogenerated by the end user's computer?  Divide and
>> conquer.
> 
> That would require a local git clone. And that's exactly what those
> who still want Changelogs are trying to avoid.
> 
What are some arguments/reasonings for that? Whether it's a dependency
on rsync or a dependency on git, a new Gentoo machine will need one of
them in order to sync.

I can understand mirrors may not want to run git cloning on their
infra, that's a fair point as it requires additional setup (afaict).
And syncing is technically a separate concern than version control,
but if the entire point is to retain a changelog, and we're generating
changelogs from git commits, then it seems to me that version control
is the correct tool for the job.

I'm not advocating for rsync to be done away with, as it has its
benefits, but changelogs are logically related to version control.

- -- 
Daniel Campbell - Gentoo Developer
OpenPGP Key: 0x1EA055D6 @ hkp://keys.gnupg.net
fpr: AE03 9064 AE00 053C 270C  1DE4 6F7A 9091 1EA0 55D6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJWzh2HAAoJEAEkDpRQOeFwL0QP/jf6pL3ZzKwtYYZBVIhe74eE
09R9zeTdnzSi5yyVUi2nVXDogBe6+bafwLDA/dzS9iskhzrznzKAnaUroOtnx6rN
8QVe3ojy9DxhvmnQSPUGEEjMe70kIyGM3Z+enOg59k6VGl97x+f53xvQkj3oZAId
W6aGYCi7m0ApsqdLrYoNhcE6toNHrpd/YhzS7bJTnhaNezx523EWzYsk5ej/Vyyt
GpjEJMEpiiU3KkjoiVS+sb3SYJ+VneIq7n3mszmw+O/pFbGX76lxoywCVx+P1Z8K
1aGoG9ZcYPij4jQWXdIdB8Rhw+DQF6FIYW3A1aw3hmnQsQFM31tt6V6wJKzWEjO8
xDZ69iIYqQevUHSUanXm/p5BGumF6HOq+DS0A0+gpFz/+FlxULj97eKMotaO0R1v
BPIbGXGNASXz62kYG8SJmA6KU8mc622JnZ9dY1XMLzY6vcTM89vbkudZXBq2Wyyc
CnbEuC9+1eBrOXOIWTPZ0+8XVaScz9kiBvgeQXYOd8VbgKS+GuFGOjyh1JWZdPyY
LAyarpAVmhUpRwBWw3oXeKUm50h2WXiBQWJnELYBWO9sNG7Z4g1u3QAY5zzYuYJr
VdphdxUzRtsMLI86oP/Lr7Hw5v3wMhebwsPeuvHtebU63A4noGpWyeokuZ9tfrjS
ecd156K/a1QezkWwAY0D
=Bw//
-----END PGP SIGNATURE-----



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24 20:16                     ` Luis Ressel
  2016-02-24 21:15                       ` Daniel Campbell
@ 2016-02-24 22:16                       ` Brian Dolbec
  2016-02-25  7:12                       ` Martin Vaeth
  2 siblings, 0 replies; 37+ messages in thread
From: Brian Dolbec @ 2016-02-24 22:16 UTC (permalink / raw)
  To: gentoo-dev

On Wed, 24 Feb 2016 21:16:13 +0100
Luis Ressel <aranea@aixah.de> wrote:

> On Wed, 24 Feb 2016 11:18:55 -0800
> Raymond Jennings <shentino@gmail.com> wrote:
> 
> > As far as changelog generation, what about causing the changelogs to
> > be autogenerated by the end user's computer?  Divide and conquer.  
> 
> That would require a local git clone. And that's exactly what those
> who still want Changelogs are trying to avoid.
> 

Not only that, but their generation along with thick manifests are
already quite resource intensive and time consuming for a relatively
high powered server (a big reason behind this thread).

Now make some older user's system or low powered arm system do that with
much lower resources and you are talking about a long time for
completion.

-- 
Brian Dolbec <dolsen>



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24 10:35                 ` Kent Fredric
  2016-02-24 19:18                   ` Raymond Jennings
@ 2016-02-25  5:03                   ` Duncan
  2016-02-25  5:46                     ` Kent Fredric
  1 sibling, 1 reply; 37+ messages in thread
From: Duncan @ 2016-02-25  5:03 UTC (permalink / raw)
  To: gentoo-dev

Kent Fredric posted on Wed, 24 Feb 2016 23:35:57 +1300 as excerpted:

> Though personally I feel for the goal of stabilization tracking, you
> ought to be analysing the git repo. Not only can you then see when a
> given package was stabilised, but you can see the other packages that
> were stabilized in its proximity, which is way too hard to do with the
> Changelogs.

Which I am (running from the git repo), and that ability to (as a user, 
easily) actually track all that extra data was one of my own biggest 
reasons for so looking forward to the git switch for so long, and is now 
one of the biggest reasons I'm a /huge/ supporter of the new git repo, 
in spite of the time it took and the imperfections it still has.

=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-25  5:03                   ` Duncan
@ 2016-02-25  5:46                     ` Kent Fredric
  2016-02-25  8:02                       ` Consus
  0 siblings, 1 reply; 37+ messages in thread
From: Kent Fredric @ 2016-02-25  5:46 UTC (permalink / raw)
  To: gentoo-dev

On 25 February 2016 at 18:03, Duncan <1i5t5.duncan@cox.net> wrote:
> Which I am (running from the git repo), and that ability to (as a user,
> easily) actually track all that extra data was one of my own biggest
> reasons for so looking forward to the git switch for so long, and is now
> one of the biggest reasons I'm a /huge/ supporter of the new git repo,
> in spite of the time it took and the imperfections it still has.


I'm considering bolting together some Perl that would allow you to run
a small HTTP service rooted in a git repo dir, and would then generate
given changes files on demand and then cache their results somehow.


Then you could have a "Live changes as a service" where interested
parties could simply do:

 curl http://thing.gentoo.org/changes/dev-lang/perl

and get a changelog spewed out instead of burdening the rsync server
with generating them for every sync.

That way the aggregate CPU Load would be grossly reduced because the
sync server wouldn't have to spend time generating changes for every
update/update window, and it wouldn't have to be full-tree aware.
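
Very roughly, and in Python rather than Perl just to show the shape, the
generate-on-demand-and-cache part could look like the sketch below. The
class and helper names are made up, the HTTP layer is left out entirely,
and keying the cache on the current HEAD is only one assumed policy:

```python
import subprocess


def git(repo, *args):
    """Run a git command inside repo and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args], check=True,
                          capture_output=True, text=True).stdout


def head_commit(repo):
    return git(repo, "rev-parse", "HEAD").strip()


def render_changes(repo, package):
    # Newest first, one line per commit touching the package directory.
    return git(repo, "log", "--date=short",
               "--pretty=format:%ad %an: %s", "--", package)


class ChangesCache:
    """Generate a package's changes text once per repository HEAD."""

    def __init__(self, repo, head=head_commit, render=render_changes):
        self.repo, self._head, self._render = repo, head, render
        self._cache = {}

    def get(self, package):
        key = (self._head(self.repo), package)
        if key not in self._cache:
            self._cache[key] = self._render(self.repo, package)
        return self._cache[key]
```

Only packages somebody actually asks about ever get rendered, and each
one at most once per HEAD. A real service would also need to expire
entries for old HEADs, which this sketch never does.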

But thinking about it makes me go "eeeh, that's a lot of effort really"

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24 20:16                     ` Luis Ressel
  2016-02-24 21:15                       ` Daniel Campbell
  2016-02-24 22:16                       ` Brian Dolbec
@ 2016-02-25  7:12                       ` Martin Vaeth
  2016-02-25 23:12                         ` Gordon Pettey
  2 siblings, 1 reply; 37+ messages in thread
From: Martin Vaeth @ 2016-02-25  7:12 UTC (permalink / raw)
  To: gentoo-dev

Luis Ressel <aranea@aixah.de> wrote:
>
> That would require a local git clone. And that's exactly what those who
> still want Changelogs are trying to avoid.

You even need a deep git clone with full history.

Already now this means that you need 2 (or already 3?) times the
disk space of an rsync mirror; multiply all numbers by 4
if you used squashfs to store the tree.

In the course of the years the factor will continue to increase;
I guess at least by 1 for every year (there is possibility of some
compression of history, but OTOH, many packages are added and
removed, eclasses keep changing, etc.)

So in 2-3 years, it can be, for some users, 20 times the disk storage
it needs now.

I think there must be a very good reason to require that amount of
disk space on everybody's machine. ChangeLogs IMHO aren't one.




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-25  5:46                     ` Kent Fredric
@ 2016-02-25  8:02                       ` Consus
  2016-02-25  8:59                         ` Kent Fredric
  0 siblings, 1 reply; 37+ messages in thread
From: Consus @ 2016-02-25  8:02 UTC (permalink / raw)
  To: gentoo-dev

On 18:46 Thu 25 Feb, Kent Fredric wrote:
> I'm considering bolting together some Perl that would allow you to run
> a small HTTP service rooted in a git repo dir, and would then generate
> given changes files on demand and then cache their results somehow.
> 
> Then you could have a "Live changes as a service" where interested
> parties could simply do:
> 
>  curl http://thing.gentoo.org/changes/dev-lang/perl
> 
> and get a changelog spewed out instead of burdening the rsync server
> with generating them for every sync.

Well, we do have one

        https://gitweb.gentoo.org/repo/gentoo.git/log/dev-lang/perl
    
I bet folks want to check out what's new in their local copy of Portage
tree.



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-25  8:02                       ` Consus
@ 2016-02-25  8:59                         ` Kent Fredric
  2016-02-25 10:48                           ` M. J. Everitt
  0 siblings, 1 reply; 37+ messages in thread
From: Kent Fredric @ 2016-02-25  8:59 UTC (permalink / raw)
  To: gentoo-dev

On 25 February 2016 at 21:02, Consus <consus@gmx.com> wrote:
> Well, we do have one
>
>         https://gitweb.gentoo.org/repo/gentoo.git/log/dev-lang/perl
>
> I bet folks want to check out what's new in their local copy of Portage
> tree.


With a custom, portage oriented, on-demand log generator you could
produce a lot more detail ( and in a text format that doesn't require
a web browser to view ) , and potentially use understanding of portage
conventions to generate change data outside those explicitly stated.

Though that would be a "later feature" you could potentially bolt on
after the main logic was sorted out.

The idea being you could request a changelog for a package with a list
of "interest aspects" and have the log reduced to changes that affect
those interests.

For instance, you could do :

   curl http://thing.gentoo.org/changes/dev-lang/perl?arch=~x86

And with a bit of effort, you could generate a changelog that is only
relevant for somebody who is on ~x86, eliding changes that x86 didn't
get yet.

For instance, an ~x86 filter would elide stabilizations for x86,
because you don't care about stabilizations if you're assuming ~arch.
( And it would elide changes that were only visible for other arches )

And this filter wouldn't necessarily be implemented as "grep for
keywords in the commit message", but by *analysing the change in the
directory*, which would give the ability to do things that would
otherwise only be possible with a git clone.
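
To make that concrete, here is a small sketch of the kind of check I
mean. It only compares the KEYWORDS string of an ebuild before and after
a commit; the function names and the "other_changes" flag are
hypothetical, and special tokens like "-*" are ignored:

```python
def keyword_state(keywords, arch):
    """Classify arch within a KEYWORDS string such as '~amd64 x86'."""
    for token in keywords.split():
        if token == arch:
            return "stable"
        if token == "~" + arch:
            return "testing"
        if token == "-" + arch:
            return "disabled"
    return "absent"


def relevant_to_testing_user(old_keywords, new_keywords, arch,
                             other_changes=False):
    """Decide whether a change matters to somebody running ~arch."""
    visible = {"stable", "testing"}
    before = keyword_state(old_keywords, arch) in visible
    after = keyword_state(new_keywords, arch) in visible
    if not before and not after:
        return False  # only ever visible to other arches
    if before != after:
        return True   # newly keyworded for, or dropped from, this arch
    # Visible both before and after: a testing -> stable flip changes
    # nothing for an ~arch user, so only other edits count.
    return other_changes
```

So a commit that turns "~amd64 ~x86" into "~amd64 x86" is dropped for an
~x86 reader, while one that adds ~x86 for the first time is kept.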



-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-25  8:59                         ` Kent Fredric
@ 2016-02-25 10:48                           ` M. J. Everitt
  0 siblings, 0 replies; 37+ messages in thread
From: M. J. Everitt @ 2016-02-25 10:48 UTC (permalink / raw)
  To: gentoo-dev

On 25/02/16 08:59, Kent Fredric wrote:
> On 25 February 2016 at 21:02, Consus <consus@gmx.com> wrote:
>> Well, we do have one
>> 
>> https://gitweb.gentoo.org/repo/gentoo.git/log/dev-lang/perl
>> 
>> I bet folks want to check out what's new in their local copy of 
>> Portage tree.
> 
> 
> With a custom, portage oriented, on-demand log generator you could
>  produce a lot more detail ( and in a text format that doesn't 
> require a web browser to view ) , and potentially use
> understanding of portage conventions to generate change data
> outside those explicitly stated.
> 
> Though that would be a "later feature" you could potentially bolt 
> on after the main logic was sorted out.
> 
> The idea being you could request a changelog for a package with a 
> list of "interest aspects" and have the log reduced to changes
> that affect those interests.
> 
> For instance, you could do :
> 
> curl http://thing.gentoo.org/changes/dev-lang/perl?arch=~x86
> 
> And with a bit of effort, you could generate a changelog that is 
> only relevant for somebody who is on ~x86, eliding changes that
> x86 didn't get yet.
> 
> For instance, an ~x86 filter would elide stabilizations for ~x86, 
> because you don't care about stabilizations if you're assuming 
> ~arch. ( And it would elide changes that were only visible for 
> other arches )
> 
> And this filter wouldn't necessarily be implemented in "grep for 
> keywords in the commit message", but *analyse the change in the 
> directory* based, which would give the ability to do things that 
> would otherwise only be possible with a git clone.
> 
> 
> 
This idea is quite neat - you could do some basic User-Agent
check and either render a web page for viewing changes online, or
even have a specifier that gave you some other output options .. eg.
ChangeLog (rev. chron) or basic web or XML or JSON which you could
then post-process if you desired.

I know this is kind of bloating the idea, but the flexibility and such
would make it Really Useful .. I think, anyhow ...

MJE



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-25  7:12                       ` Martin Vaeth
@ 2016-02-25 23:12                         ` Gordon Pettey
  2016-02-26 11:00                           ` Martin Vaeth
  0 siblings, 1 reply; 37+ messages in thread
From: Gordon Pettey @ 2016-02-25 23:12 UTC (permalink / raw)
  To: gentoo-dev

On Thu, Feb 25, 2016 at 1:12 AM, Martin Vaeth <martin@mvath.de> wrote:

> Luis Ressel <aranea@aixah.de> wrote:
> >
> > That would require a local git clone. And that's exactly what those who
> > still want Changelogs are trying to avoid.
>
> You need even a deep git clone with full history.
>
> Already now this means that you need 2 (or already 3?) times the
> disk space as for an rsync mirror; multiply all numbers by 4
> if you used squashfs to store the tree.
>
> In the course of the years the factor will continue to increase;
> I guess at least by 1 for every year (there is possibility of some
> compression of history, but OTOH, many packages are added and
> removed, eclasses keep changing, etc.)
>
> So in 2-3 years, it can be for some users 20 times the disk storage
> than what it needs now.
>

Or, in 2-3 years, maybe people will stop with the hyperbole. Hopefully
sooner. The tree is a bunch of text files, of which a whole lot of text is
repeated (esomewrapper, eclass-based builds which are identical but for a
single line, version updates to packages that make no changes at all to the
ebuild, etc.) which is great for compression, which git does.


* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-25 23:12                         ` Gordon Pettey
@ 2016-02-26 11:00                           ` Martin Vaeth
  2016-02-26 11:11                             ` Rich Freeman
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Vaeth @ 2016-02-26 11:00 UTC (permalink / raw)
  To: gentoo-dev

Gordon Pettey <petteyg359@gmail.com> wrote:
>>
>> Already now this means that you need 2 (or already 3?) times the
>> disk space as for an rsync mirror; multiply all numbers by 4
>> if you used squashfs to store the tree. [...]
>
> Or, in 2-3 years, maybe people will stop with the hyperbole

Hyperbole? Really?

Let's first look at the current data.
Instead of guessing I now fetched the git tree
to get the exact number:

git on ext2 (8K blocks): 704 M
squashfs with lz4: 120 M

lz4 is the fastest algorithm, but not the best concerning space.
More seriously: The git data is still missing metadata information
which will add some more.

It seems my estimate of the factor 2*4 = 8
for the current state was rather realistic.

Not to forget that this was a fresh checkout where the .git
data itself is fully compressed in one file (which is by default
not the case when you update frequently - it depends on your
git configuration and perhaps whether you use a cron job for
recompression). So perhaps for some git users the bracket in
my estimate (3*4=12) is already correct.

Whether 1 GB of permanent disk space only for the
overhead of package management is appropriate, everybody
must decide by himself. Compared to other distributions,
this is an awful lot.
Only for getting ChangeLogs it is IMHO way too much.

And currently the git history is still almost empty...

Before I turn to the future, some remarks:

> The tree is a bunch of text files, of which a whole lot of text is
> repeated

That's why squashfs is so effective already compared to plain rsync.
Of course, a lot of the *current* factor comes from this.

>  which is great for compression, which git does.

You seem to pretend that I ignored this, but I did not:

>> (there is possibility of some
>> compression of history, but OTOH, many packages are added and
>> removed, eclasses keep changing, etc.)

Of course, concerning the future, one must make some assumptions.
Perhaps it is reasonable to assume that roughly a constant amount of
new data is added every year, i.e., the quotient (git data/squashfs)
increases every year by a constant summand.

Compression will not change this "constantness", but at most influence
the summand itself. Quite the opposite, in the moment when the history
exceeds a certain size - depending on the memory window size used by
the gzip implementation of git - compression will eventually
become much less effective: You can see the difference essentially
in the gzip vs. xz compression size, because the main difference
here is the size of the mentioned memory window.

And as mentioned above, unless you are regularly recompressing
(by a cron job or by git configuration after updating) you hardly
profit from the git compression at all.

How large the yearly summand is, can only be guessed, currently.
I think my assumption that after 1 year the number of new/modified files
is roughly the total amount of files in the tree is realistic, perhaps
even too low. (Not to forget that also every commit adds data by
itself.)

So in 2-3 years, the factor (compared to squashfs) might be roughly:
8*2.5 = 20 without recompressing .git
8 + 2.5 = 10.5 with fully compressed .git
(The latter factor is unrealistically low, because git's gzip compression
is less effective than lz4 and *much* less effective than xz).

And even if I should have overestimated the yearly summand by the
factor 2, you only need to double the number of years which you
have to wait...
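
For anybody who wants to redo the arithmetic with their own numbers, the
calculation above is just the following. The variable names are mine, and
2.5 is simply taken as the midpoint of "2-3 years":

```python
git_mb = 704        # fresh git clone on ext2, as measured above
squashfs_mb = 120   # squashfs with lz4, as measured above

# Ratio given by the two measurements alone, before any metadata is added.
measured_ratio = git_mb / squashfs_mb

assumed_factor = 2 * 4   # the estimate for the current state
years = 2.5              # midpoint of "2-3 years"

no_repack = assumed_factor * years   # "without recompressing .git"
repacked = assumed_factor + years    # "with fully compressed .git"

print(round(measured_ratio, 2), no_repack, repacked)
```

Replace git_mb and squashfs_mb with sizes from your own system to see
where you currently stand.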




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-26 11:00                           ` Martin Vaeth
@ 2016-02-26 11:11                             ` Rich Freeman
  2016-02-26 12:59                               ` Martin Vaeth
  0 siblings, 1 reply; 37+ messages in thread
From: Rich Freeman @ 2016-02-26 11:11 UTC (permalink / raw)
  To: gentoo-dev

On Fri, Feb 26, 2016 at 6:00 AM, Martin Vaeth <martin@mvath.de> wrote:
>
> And currently the git history is still almost empty...
>

If you want pre-migration history you need to fetch that separately.
It is about 1.7G.

Considering that this represents a LOT more than 2-3 years of history
(including periods where the commit rate was higher than it is today)
I think your estimates of where the migrated repo will be in 2-3 years
are too high.  It will of course be larger than the space required for
an rsync squashfs.

-- 
Rich



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-26 11:11                             ` Rich Freeman
@ 2016-02-26 12:59                               ` Martin Vaeth
  2016-02-26 13:37                                 ` Rich Freeman
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Vaeth @ 2016-02-26 12:59 UTC (permalink / raw)
  To: gentoo-dev

Rich Freeman <rich0@gentoo.org> wrote:
>>
>> And currently the git history is still almost empty...
>>
>
> If you want pre-migration history you need to fetch that separately.

How? I found no obvious repository with this data on either
gitweb.gentoo.org or GitHub.

> It is about 1.7G.
> Considering that this represents a LOT more than 2-3 years of history

If the 1.7G are fully compressed history, this would confirm
my estimate rather precisely, if it represents (1700/120 - 1) ~ 13 years.

Gentoo has existed since 2002, so it seems my estimate was very good.

> (including periods where the commit rate was higher than it is today)

One of my assumptions for the estimate was that this rate is
constant on average. Also, I am not sure whether you are right
that this rate was really higher previously: Nowadays, even a
rather trivial eclass-update is separated into several commits,
increasing the amount of data needed for storage.

> I think your estimates of where the migrated repo will be in 2-3 years
> is too high.

Note that I compared squashfs with a git user who does not even
care about git-internal recompression. Of course, you can decrease
the factor somewhat if e.g. your checked-out tree is still stored
on squashfs. This does not change the fact that the factor will
increase every year by about 1 (or probably more, because git
uses only the ineffective gzip compression).




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-26 12:59                               ` Martin Vaeth
@ 2016-02-26 13:37                                 ` Rich Freeman
  2016-02-27 10:30                                   ` Martin Vaeth
  0 siblings, 1 reply; 37+ messages in thread
From: Rich Freeman @ 2016-02-26 13:37 UTC (permalink / raw)
  To: gentoo-dev

On Fri, Feb 26, 2016 at 7:59 AM, Martin Vaeth <martin@mvath.de> wrote:
> Rich Freeman <rich0@gentoo.org> wrote:
>>>
>>> And currently the git history is still almost empty...
>>>
>>
>> If you want pre-migration history you need to fetch that separately.
>
> How? Neither on gitweb.gentoo.org nor on github I found an obvious
> repository with this data.

https://wiki.gentoo.org/wiki/Gentoo_git_workflow#Grafting_Gentoo_History_Onto_the_Active_Repo

If you're interested in history it is easy to do, and the repo on
github works fine for web access or the various github stats/etc.
Well, sort-of - I get the impression that github doesn't host a lot of
repos with that much history and when you push that repo to github for
the first time it will time out and die and the repo will appear on the
site 30-60min later (I imagine subsequent pushes would be fine).  I
think we actually have one of the largest git repos out there in terms
of number of objects.  At least, when I was keeping tabs on other
migration efforts there weren't many that came close (including some
projects that you'd think of as having a lot of history).  The fact
that every package revision+patch+etc is a file in Gentoo is a big
part of that.

>
>> It is about 1.7G.
>> Considering that this represents a LOT more than 2-3 years of history
>
> If the 1.7G are fully compressed history, this would confirm
> my estimate rather precisely, if it represents (1700/120 - 1) ~ 13 years.

Perhaps I misread your post then.  I saw lots of numbers but not many
units, and I probably didn't follow what you intended to say.

>
> Note that I compared squashfs with a git user who does not even
> care about git-internal recompression. Of course, you can decrease
> the factor somewhat if e.g. your checked-out tree is still stored
> on squashfs. This does not change the fact that the factor will
> increase every year by about 1 (or probably more, because git
> uses the ineffective gzip compression, only).
>

A checkout of gentoo-x86 is about 590M.  If you use the repo that
includes cache/etc it expands to 1.2G.  13 years of history is 1.7G.
Clearly it doesn't increase by a factor of 1 every year, unless again
I'm misunderstanding what you're intending to communicate.

A git checkout consists of two parts.  It has the .git directory, which
contains all the data, and it has the working tree.  In the
case of gentoo-x86 the working tree is about 440MB and the history is
about 150M.

The working tree doesn't really change in size much - it just reflects
the size of the current revision of the tree. It is also not
compressed (unless you stick the whole thing in a squashfs, which you
could certainly do).  It is the history which continuously grows.
However, the history IS compressed and the reality is that most new
ebuilds are similar to ebuilds that are already in the history, so it
compresses very well.  Of course it would be nice if you could use
something other than gzip to compress it.

There is no reason that somebody couldn't distribute squashfs versions
of a git /usr/portage, but if you want the full history it would still
be around 1.7G.  It would still be smaller than a checked-out tree
(the 1.7G figure is just history - it doesn't include the extra 440MB
or so for the checkout).

My point wasn't so much that there aren't size benefits to squashfs
and no history.  I'm just saying that git is pretty efficient for what
it does do.

-- 
Rich



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-26 13:37                                 ` Rich Freeman
@ 2016-02-27 10:30                                   ` Martin Vaeth
  0 siblings, 0 replies; 37+ messages in thread
From: Martin Vaeth @ 2016-02-27 10:30 UTC (permalink / raw)
  To: gentoo-dev

Rich Freeman <rich0@gentoo.org> wrote:
>
> Clearly it doesn't increase by a factor of 1 every year

The yearly increase of the factor is rather precisely 1:
According to current data, it is .95, see below.
With xz compression for squashfs, it is even 1.4!

(Note: increase _of_ the factor, not _by_ the factor, of course;
we are speaking about a linear increase, not an exponential one.)

More precisely: If in both cases you optimize extremely for space
(for details see below) then a change from rsync to git (non-shallow)
costs you

a) now: the factor 2.6 of needed disk space

b) in future for every year this factor is increased
by the summand 1.4. For example, in 2.5 years you will need roughly
2.6 + (1.4 * 2.5) = 6.1 times the disk space needed for rsync.
After 2.5 more years, the factor will be close to 10.

For a) I assumed that in both cases the current repository is kept
compressed with squashfs (xz). This first factor will be much
larger, of course, if you omit squashfs when you switch to git.
(You must take measures to keep the checked-out repository separate:
you cannot use standard emerge --sync to get this optimization.)

For both numbers, I even optimized the .git compression by
executing repeatedly
	git prune; git repack -a -d; git gc --aggressive
which for the historical repository took several hours;
thus, unless you use a cron-job, this is not realistic.
Without this optimization, both numbers would be even larger.

Here are the plain data I used for the calculation:

1. RSYNC =    84,062,208
   (rsync gentoo repository, compressed with squashfs (-comp xz).)

2. GIT   =   136,322,616
   (Current .git data, without checked-out tree;
   compression optimized by the time-costly commands above.)

3. FULL  = 1,923,685,435
   (.git data as in 2, but with history added)

4. YEARS = 15
   (length of the historical data: first checkin was June 2000;
   change to git was IIRC sometime in mid-2015).

So the number from a) is

 size with git      $GIT + $RSYNC
 --------------- =  ------------- ~ 2.6
 size with rsync       $RSYNC

The number from b) is

 size of history increase per year    ($FULL - $GIT) / $YEARS
 --------------------------------- = ------------------------ ~ 1.4
         size with rsync                      $RSYNC

In the previous postings, I was assuming the much faster squashfs
compression -comp lz4 -Xhc instead of -comp xz. In this case,
the number from 1 changes to

   RSYNC =   125,784,064

which leads to the factor .95 ~ 1 for b) which I mentioned in the
beginning.
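
As a cross-check, the arithmetic above can be redone mechanically. The
following Python sketch is only an illustration: the sizes are the ones
listed above, while the function and variable names are made up for this
example.

```python
# Sizes in bytes, copied from the data listed above.
RSYNC_XZ = 84_062_208      # rsync tree in squashfs, -comp xz
RSYNC_LZ4 = 125_784_064    # rsync tree in squashfs, -comp lz4 -Xhc
GIT = 136_322_616          # current .git data, repacked
FULL = 1_923_685_435       # .git data with the history added
YEARS = 15                 # length of the historical data


def initial_factor(rsync_size: int) -> float:
    """Number a): squashfs tree plus .git, relative to the squashfs tree."""
    return (GIT + rsync_size) / rsync_size


def yearly_summand(rsync_size: int) -> float:
    """Number b): average yearly history growth, relative to the squashfs tree."""
    return (FULL - GIT) / YEARS / rsync_size


def factor_after(years: float, rsync_size: int) -> float:
    """Linear projection: initial factor plus the yearly summand per year."""
    return initial_factor(rsync_size) + yearly_summand(rsync_size) * years


if __name__ == "__main__":
    print(f"a) with xz:        {initial_factor(RSYNC_XZ):.2f}")
    print(f"b) with xz:        {yearly_summand(RSYNC_XZ):.2f}")
    print(f"b) with lz4:       {yearly_summand(RSYNC_LZ4):.2f}")
    print(f"xz, in 2.5 years:  {factor_after(2.5, RSYNC_XZ):.2f}")
    print(f"xz, in 5 years:    {factor_after(5.0, RSYNC_XZ):.2f}")
```

Rounded to one decimal this gives 2.6 for a) and 1.4 for b) with xz, and
about .95 for b) with lz4. The projection assumes that history keeps
growing at the 15-year average rate, which is itself only an estimate.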




* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-24  0:33     ` [gentoo-dev] " Duncan
  2016-02-24  0:50       ` Kristian Fiskerstrand
  2016-02-24  2:39       ` Rich Freeman
@ 2016-02-27 13:14       ` Luca Barbato
  2016-02-27 22:35         ` Raymond Jennings
                           ` (2 more replies)
  2 siblings, 3 replies; 37+ messages in thread
From: Luca Barbato @ 2016-02-27 13:14 UTC (permalink / raw)
  To: gentoo-dev

On 24/02/16 01:33, Duncan wrote:
> That option is there, and indeed, a patch providing it was specifically 
> added to portage for infra to use, because appending entries to existing 
> files is vastly easier and more performant than trying to prepend entries 
> and having to rewrite the entire file as a result.

This sounds wrong in many different ways. The changelog files are tiny,
and it makes next to no difference whether you truncate+write or append.

lu



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-27 13:14       ` Luca Barbato
@ 2016-02-27 22:35         ` Raymond Jennings
  2016-02-27 22:50         ` Robin H. Johnson
  2016-02-28  3:28         ` Duncan
  2 siblings, 0 replies; 37+ messages in thread
From: Raymond Jennings @ 2016-02-27 22:35 UTC (permalink / raw)
  To: gentoo-dev


Especially if the changelog files are broken up by year or so.

On Sat, Feb 27, 2016 at 5:14 AM, Luca Barbato <lu_zero@gentoo.org> wrote:

> On 24/02/16 01:33, Duncan wrote:
> > That option is there, and indeed, a patch providing it was specifically
> > added to portage for infra to use, because appending entries to existing
> > files is vastly easier and more performant than trying to prepend entries
> > and having to rewrite the entire file as a result.
>
> This sounds wrong in many different ways. The changelog files are tiny
> and makes next to no difference truncate+write or append.
>
> lu
>
>



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-27 13:14       ` Luca Barbato
  2016-02-27 22:35         ` Raymond Jennings
@ 2016-02-27 22:50         ` Robin H. Johnson
  2016-02-27 23:08           ` Patrick Lauer
  2016-02-28  3:28         ` Duncan
  2 siblings, 1 reply; 37+ messages in thread
From: Robin H. Johnson @ 2016-02-27 22:50 UTC (permalink / raw)
  To: gentoo-dev

On Sat, Feb 27, 2016 at 02:14:12PM +0100, Luca Barbato wrote:
> On 24/02/16 01:33, Duncan wrote:
> > That option is there, and indeed, a patch providing it was specifically 
> > added to portage for infra to use, because appending entries to existing 
> > files is vastly easier and more performant than trying to prepend entries 
> > and having to rewrite the entire file as a result.
> This sounds wrong in many different ways. The changelog files are tiny
> and makes next to no difference truncate+write or append.
Prior to separating ChangeLog files into years, this was way worse:
a kernel bump present in any of gentoo-sources, hardened-sources,
vanilla-sources meant another 100k of data to send. It's not a lot
overall, but here's some quick stats from one of our rsync servers, on
bytes sent.

Stats for Feb 25, from one of the 3 primary rsync.g.o servers, on the
'bytes sent' output from rsyncd.

rsyncd example output:
Feb 25 00:03:17 quetzal rsyncd[27280]: sent 4930260 bytes  received 32215 bytes  total size 408174052

3909 entries.

Min RAW size: 4833709 bytes [1]
Median RAW size: 22436094 bytes.
Mean RAW size: 45652781 bytes.
Sum of RAW size: 178456721459 bytes = ~166GiB (per day!)

The minimum possible transfer size comes from forcing an rsync with no
changes; it just sends the metadata about the files (path, mtime, size, etc).

Let's subtract that from all the rest of the entries, to get stats about
the data transfer.

Median data size: 17602385 bytes
Mean data size: 40819072 bytes
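
For anyone who wants to reproduce figures like these from their own
rsyncd logs, here is a minimal Python sketch. It assumes log lines shaped
like the example above; the function names are invented for this
illustration, and this is not the script that produced the numbers above.

```python
import re
import statistics

# Matches the "sent N bytes  received M bytes  total size T" part of a line.
SENT_RE = re.compile(r"sent (\d+) bytes\s+received (\d+) bytes\s+total size (\d+)")


def sent_bytes(log_lines):
    """Return the 'sent' byte count of every matching rsyncd log line."""
    sizes = []
    for line in log_lines:
        match = SENT_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


def summarize(sizes):
    """Raw statistics, plus the same with the no-change floor subtracted."""
    floor = min(sizes)  # a sync with no changes sends metadata only
    median = statistics.median(sizes)
    mean = statistics.mean(sizes)
    return {
        "entries": len(sizes),
        "min_raw": floor,
        "median_raw": median,
        "mean_raw": mean,
        "sum_raw": sum(sizes),
        "median_data": median - floor,
        "mean_data": mean - floor,
    }


if __name__ == "__main__":
    example = [
        "Feb 25 00:03:17 quetzal rsyncd[27280]: sent 4930260 bytes  "
        "received 32215 bytes  total size 408174052",
    ]
    print(summarize(sent_bytes(example)))
```

Subtracting the minimum from every entry shifts the median and the mean
by the same amount, which is why the sketch subtracts it once at the end.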

So, now the question:
If we use appending changelogs, the large changelogs only differ by a
few hundred bytes. If we instead have to rewrite them, it's 50k+ per
changelog.

For each 50k changelog, the median transfer would get 0.25% larger.
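
The percentage is just the ratio of one rewritten changelog to the median
transfer. As a rough check in Python (reading "50k" as 50,000 bytes is an
assumption, and the names are made up for this example):

```python
# Figures from the stats above, in bytes.
MEDIAN_RAW = 22_436_094    # median bytes sent per sync
MEDIAN_DATA = 17_602_385   # the same, minus the no-change metadata floor
CHANGELOG = 50_000         # assumption: "50k" taken as 50,000 bytes


def percent_increase(extra_bytes: int, baseline_bytes: int) -> float:
    """Extra transfer as a percentage of the baseline transfer."""
    return 100.0 * extra_bytes / baseline_bytes


if __name__ == "__main__":
    print(f"vs. median raw size:  {percent_increase(CHANGELOG, MEDIAN_RAW):.2f}%")
    print(f"vs. median data size: {percent_increase(CHANGELOG, MEDIAN_DATA):.2f}%")
```

Depending on which median is used as the baseline, the result lands
between roughly 0.2% and 0.3% per rewritten changelog.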

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85



* Re: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-27 22:50         ` Robin H. Johnson
@ 2016-02-27 23:08           ` Patrick Lauer
  2016-02-28  8:27             ` Martin Vaeth
  0 siblings, 1 reply; 37+ messages in thread
From: Patrick Lauer @ 2016-02-27 23:08 UTC (permalink / raw)
  To: gentoo-dev

On 02/27/2016 11:50 PM, Robin H. Johnson wrote:
> On Sat, Feb 27, 2016 at 02:14:12PM +0100, Luca Barbato wrote:
>> On 24/02/16 01:33, Duncan wrote:
>>> That option is there, and indeed, a patch providing it was specifically 
>>> added to portage for infra to use, because appending entries to existing 
>>> files is vastly easier and more performant than trying to prepend entries 
>>> and having to rewrite the entire file as a result.
>> This sounds wrong in many different ways. The changelog files are tiny
>> and makes next to no difference truncate+write or append.
> Prior to separating ChangeLog files into years, this was way worse:
> a kernel bump present in any of gentoo-sources, hardened-sources,
> vanilla-sources meant another 100k of data to send. It's not a lot
> overall, but here's some quick stats from one of our rsync servers, on
> bytes sent.
[snip]
>
> So, now the question:
> If we use appending changelogs, the large changelogs only differ by a
> few hundred bytes. If we instead have to rewrite them, it's 50k+ per
> changelog.
from /usr/share/portage/config/make.globals:

PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times
--omit-dir-times --compress --force --whole-file --delete --stats
--human-readable --timeout=180 --exclude=/distfiles --exclude=/local
--exclude=/packages --exclude=/.git"

Notice the --whole-file part there.

>
> For each 50k changelog, the median transfer would get 0.25% larger.
>
Well, we could just have fewer changes ;)

160GB/day per server is about 2MB/s, ~16Mbit, or about 5TB/month. That's
still included in the 'free' bandwidth that el cheapo hosters like
Hetzner provide with their smallest servers ...
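
The conversion is easy to redo. A small Python sketch, using the daily
byte sum quoted earlier in the thread and assuming a 30-day month and
decimal units (the names are made up for this example):

```python
BYTES_PER_DAY = 178_456_721_459   # "Sum of RAW size" from the rsyncd stats
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30               # assumption: a 30-day month


def megabytes_per_second(bytes_per_day: int) -> float:
    """Average throughput in MB/s (1 MB = 10**6 bytes)."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


def megabits_per_second(bytes_per_day: int) -> float:
    """Average throughput in Mbit/s."""
    return megabytes_per_second(bytes_per_day) * 8


def terabytes_per_month(bytes_per_day: int) -> float:
    """Monthly volume in TB (1 TB = 10**12 bytes)."""
    return bytes_per_day * DAYS_PER_MONTH / 1e12


if __name__ == "__main__":
    print(f"{megabytes_per_second(BYTES_PER_DAY):.2f} MB/s")
    print(f"{megabits_per_second(BYTES_PER_DAY):.1f} Mbit/s")
    print(f"{terabytes_per_month(BYTES_PER_DAY):.2f} TB/month")
```

This is an average over the whole day; it says nothing about peak load.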



* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-27 13:14       ` Luca Barbato
  2016-02-27 22:35         ` Raymond Jennings
  2016-02-27 22:50         ` Robin H. Johnson
@ 2016-02-28  3:28         ` Duncan
  2 siblings, 0 replies; 37+ messages in thread
From: Duncan @ 2016-02-28  3:28 UTC (permalink / raw)
  To: gentoo-dev

Luca Barbato posted on Sat, 27 Feb 2016 14:14:12 +0100 as excerpted:

> On 24/02/16 01:33, Duncan wrote:
>> That option is there, and indeed, a patch providing it was specifically
>> added to portage for infra to use, because appending entries to
>> existing files is vastly easier and more performant than trying to
>> prepend entries and having to rewrite the entire file as a result.
> 
> This sounds wrong in many different ways. The changelog files are tiny
> and makes next to no difference truncate+write or append.

FWIW, here's the egencache --updatechangelogs related patch-thread 
starters (from which the links can be followed to the threads) from back 
in early November, on the portage-dev list.  The 4/4 patch in the first 
thread added --reverse-order, tho as the intro suggests, there was prior 
discussion, which my quick "changelog" search didn't pick up.  IIRC  the 
original discussion was triggered on the gentoo-dev list by... patrick@, 
and infra/robbat2's replies started there.  

http://permalink.gmane.org/gmane.linux.gentoo.portage.devel/5899
http://permalink.gmane.org/gmane.linux.gentoo.portage.devel/5934
http://permalink.gmane.org/gmane.linux.gentoo.portage.devel/5964
http://permalink.gmane.org/gmane.linux.gentoo.portage.devel/5989

The last two also mention:

https://bugs.gentoo.org/show_bug.cgi?id=565540
(egencache --update-changelog: parallel support, bug still, BTW,
status: IN_PROGRESS)

OK, here's the original gentoo-dev thread OP by patrick@:

http://permalink.gmane.org/gmane.linux.gentoo.devel/98287

And here's the beginning of the "infra response" subthread with infra/
robbat2@'s responses:

http://permalink.gmane.org/gmane.linux.gentoo.devel/98337

And... as it happens that first infra response explains the oldest-first, 
straight from the horse's mouth, as the saying goes. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




* [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
  2016-02-27 23:08           ` Patrick Lauer
@ 2016-02-28  8:27             ` Martin Vaeth
  0 siblings, 0 replies; 37+ messages in thread
From: Martin Vaeth @ 2016-02-28  8:27 UTC (permalink / raw)
  To: gentoo-dev

Patrick Lauer <patrick@gentoo.org> wrote:
>
> Notice the --whole-file part there.

Are there perhaps plans to remove this?

Before the reversed ChangeLogs, this option was useful,
but perhaps now removing it would really lower the traffic?

One would have to make a bunch of tests over 1-2 months, perhaps:

a) two with rsyncing every day
b) two with rsyncing 1/week, and perhaps even
c) two with rsyncing 1/month.

(where "two" means one with and one without --whole-file)
and add up the traffic.

I am not sure whether servers can be set up to implicitly assume
--whole-file to save possibly some resources needed to calculate
the checksums. This would give wrong results, of course.
In this case, it would be necessary for the tests to set up a
server with full support, locally. In the latter case, one could
also use historical webrsync data if they are available somewhere
which would mean that the tests could be done rather quickly...




end of thread, other threads:[~2016-02-28  8:27 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-23 17:14 [gentoo-dev] Bug #565566: Why is it still not fixed? Patrick Lauer
2016-02-23 18:07 ` Alec Warner
2016-02-23 21:53   ` Patrick Lauer
2016-02-24  0:33     ` [gentoo-dev] " Duncan
2016-02-24  0:50       ` Kristian Fiskerstrand
2016-02-24  2:53         ` Rich Freeman
2016-02-24  4:24           ` Duncan
2016-02-24  5:49             ` Kent Fredric
2016-02-24  7:29               ` Duncan
2016-02-24 10:35                 ` Kent Fredric
2016-02-24 19:18                   ` Raymond Jennings
2016-02-24 20:16                     ` Luis Ressel
2016-02-24 21:15                       ` Daniel Campbell
2016-02-24 22:16                       ` Brian Dolbec
2016-02-25  7:12                       ` Martin Vaeth
2016-02-25 23:12                         ` Gordon Pettey
2016-02-26 11:00                           ` Martin Vaeth
2016-02-26 11:11                             ` Rich Freeman
2016-02-26 12:59                               ` Martin Vaeth
2016-02-26 13:37                                 ` Rich Freeman
2016-02-27 10:30                                   ` Martin Vaeth
2016-02-25  5:03                   ` Duncan
2016-02-25  5:46                     ` Kent Fredric
2016-02-25  8:02                       ` Consus
2016-02-25  8:59                         ` Kent Fredric
2016-02-25 10:48                           ` M. J. Everitt
2016-02-24  4:38           ` Vadim A. Misbakh-Soloviov
2016-02-24  5:36             ` Duncan
2016-02-24  2:39       ` Rich Freeman
2016-02-27 13:14       ` Luca Barbato
2016-02-27 22:35         ` Raymond Jennings
2016-02-27 22:50         ` Robin H. Johnson
2016-02-27 23:08           ` Patrick Lauer
2016-02-28  8:27             ` Martin Vaeth
2016-02-28  3:28         ` Duncan
2016-02-23 18:46 ` [gentoo-dev] " Alexis Ballier
2016-02-23 21:54   ` Patrick Lauer
