public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] A few modest suggestions regarding tree size
@ 2004-10-12 21:37 Ciaran McCreesh
  2004-10-12 21:59 ` Roman Gaufman
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Ciaran McCreesh @ 2004-10-12 21:37 UTC
  To: gentoo-dev

It has come to my attention that, during recent weeks, a small number of
users have been complaining about the size of the rsync tree.
My august colleagues have proposed many ingenious solutions, but
misfortunately they are all complicated and involve a lot of manual
work. I believe the following small changes (which can mostly be
automated) would prove of much larger benefit to the community for a
vastly reduced cost.

To begin with, I'd like to draw your attention to comments in ebuilds.
It is an oft-forgotten fact that these items provide absolutely no
benefit to the end user. "Surely", I hear you say, "it is not worth
getting hung up over such an insignificant triviality! What harm do a
few trifling little remarks do?".  Yet, when actually measured, these
'innocent minutiae' (as you might call them had you a penchant for
obsolete vocabulary or a predilection for pomposity) account for
approximately 20% of the total ebuild content in the tree. It is obvious
that an immediate ban upon these silly things, alongside a small script
to remove them from the tree, would provide a very large gain for our
users without having to remove any existing code. Adding in a repoman
check to error out if such lines were present would clearly be a good
start.

Next up are blank lines, which, as all the world knows, are of no use at
all to anyone. These account for a staggering 150KBytes of data in the
main tree, which, over a 9600 dialup line, would save us over two
minutes on an emerge sync. Again, removing these pointless wastes of
space via a bash script is trivial.

Staying with the blank spaces thing, leading whitespaces (which serve no
practical purpose and are only used to make the code "look pretty" --
although how a bash script could ever be considered "pretty" is beyond
my limited mind) account for nearly half a megabyte of data. Clearly
these should immediately be removed and any developer using them in the
future should have their cvs access suspended pending a review of their
status within the project -- as devrel and our managers will tell you,
being nice to the users is our number one priority.

There are other trivial ways to save space too. The commonly used helper
function "emake", for example, is a shocking five bytes in length.
Replacing this with a much more helpfully named "e", and likewise
replacing "econf" with "c", would gain something like 50KBytes. If we
also replace src_unpack, src_compile and src_install with more
appropriate alternatives we could shave off a further 300KBytes. I have
no doubt that the reader could extend this logic to the other portage
internals and common function names, bringing the total up to half a
megabyte or more.

This can be extended to other functions, of course. In particular I'd
like to draw your attention to the absurdly named "flag-o-matic.eclass".
Merely inheriting this eclass adds at least thirteen bytes (that's over
a hundred bits!) of bloat to an ebuild, and that's before we start on
the ridiculously verbose function names. What's all this "replace-flags"
nonsense, I ask you?  Any educated programmer can see that "rf" is a far
more useful name. Even those who are not convinced that space needs to
be saved must surely notice how much developer time would be saved
through reduced typing.

It remains a mystery to me how anyone could possibly have overlooked the
following suggestion. Currently, we install 'dependency information'
inside ebuilds. This is blatantly pointless -- as RedHat have so ably
demonstrated with their 'rpm' installer (and, albeit in a non-Linux
environment, I am assured that Microsoft are in the same boat), there is
no need for automatic dependency tracking and resolution. Our users are
more than capable of working this out for themselves. Similarly, the
HOMEPAGE variable is entirely pointless and has been superseded by Google
[1].

Oh, and then we come to metadata.xml. As all the world knows, xml is a
massive waste of space, and (as a data interchange format not a data
storage format) utterly unsuited for configuration files. A typical
metadata.xml file is 95%+ noise. By replacing these with flat text files
listing the maintainers, we could save somewhere in the region of one
and a half megabytes.

Also, no-one has yet considered all the useless fluff in the tree that
nobody actually uses. By removing all ebuilds and eclasses related to
emacs, kde, gnome, php, gaim or java from the tree, as well as
anything which is only supplied as a binary, we could save... Well, I'll
let you do the calculations yourselves. Although mathematics is not the
main focus of my degree, I believe I understand enough to know that the
result is a very big number.

Similarly, all those "compile fix" patches we supply are obviously
worthless.  If anyone has any doubt, I suggest they just look at how
many users are using broken CFLAGS and compilers -- clearly, working
code is not a major concern.  We should of course leave in security
patches, since security is our number one priority.

ChangeLogs are the next thing to fall under my scrutiny. Clearly these
are entirely worthless, since anyone who cares can just read the cvs
logs and use diff. Kiss goodbye to 14MBytes of junk. Hang on? Did I just
say 14MBytes? Yes.  Fourteen Megabytes. That's a one, then a four, then
six zeros. That's fourteen million bytes, or over one hundred and ten
million bits. When syncing my GPRS phone whilst sitting inside a large
metal cage in north Yorkshire, that could save me over TWELVE HOURS on
sync time.

I understand that my previous point may cause a small amount of disquiet
amongst a small proportion of our userbase. After all, how are they
supposed to decide whether to update if they do not know what an update
will change? To them, I must point out that whilst such an attitude is
appropriate for a small hobbyist distribution aimed at skilled users, it
is utterly at odds with what enterprise users require. For them, it is
important that they can perform updates without having to know what they
are doing -- remember that in a corporate environment, any information
is too much information, and time spent reading ChangeLogs is time not
spent doing useful work. Please do not forget that better enterprise
support is our number one priority.

Finally, I must draw KEYWORDS to your scrutiny, and in particular the
misguided choice of ~ to indicate unstable. In ASCII, the tilde
character is represented by the octet 0x7E (hexadecimal), or, in binary,
01111110. A cursory glance at this will show that it contains
significantly more 1 bits than 0 bits. As anyone who has had a basic
schooling in the field of compression can tell you, 1 bits do not
compress as well as 0 bits (they don't have as much empty space in the
middle), so clearly we would be better off picking something else. I
propose the ( character, which has only one 1 bit for every three 0 bits.
Also, I suggest we drop the amd64 keyword and just use x86 to save
space, since we all know fine well that amd64 is just like x86 with a
few extra bits stuck onto the end. Or rather, the start, since x86 gets
its bytes backwards...
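
For anyone wishing to check such figures, here is a minimal sketch in
POSIX shell that tallies the 1 bits in a character code. The helper name
count_one_bits is made up for illustration, and padding each code to
eight bits is an assumption (ASCII proper is a 7-bit code).

```shell
#!/bin/sh
# count_one_bits N: print how many 1 bits the low 8 bits of N contain.
# (Hypothetical helper, not part of portage.)
count_one_bits() {
    n=$(( $1 & 255 ))
    ones=0
    while [ "$n" -gt 0 ]; do
        ones=$(( ones + (n & 1) ))   # add the lowest bit
        n=$(( n >> 1 ))              # then shift it away
    done
    echo "$ones"
}

# 0x7E is the tilde, 0x28 is the opening parenthesis.
for code in 0x7E 0x28; do
    ones=$(count_one_bits "$code")
    echo "$code: $ones one bits, $(( 8 - ones )) zero bits"
done
```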

Gentlemen, ladies, jforman, I believe those remedies outlined herein are
a far more sensible solution than any other current proposal. I eagerly
await the implementation.

[1]: http://www.google.ca/

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:29     ` Ciaran McCreesh
@ 2004-10-12 21:41       ` Donnie Berkholz
  2004-10-12 22:52         ` Jason Rhinelander
  0 siblings, 1 reply; 41+ messages in thread
From: Donnie Berkholz @ 2004-10-12 21:41 UTC
  To: gentoo-dev

On Tue, 2004-10-12 at 15:29, Ciaran McCreesh wrote:
> On Tue, 12 Oct 2004 15:29:13 -0700 Jason Rhinelander
> <jason@gossamer-threads.com> wrote:
> | PLEASE be more scientific than "I think it's like 25%".  Let's get it 
> | straight just how much we're talking about, before throwing out things
> 
> Well, if you want to get the numbers that I did, do your measurements
> against the ebuilds, not the entire tree. It gives better numbers that
> way :)

Actually, that's a good idea. Instead of dealing with unknowns, subtract
the knowns. *.ebuild, metadata.xml, digest*, ChangeLog, Manifest,
profiles/, etc. You can figure out how much stuff is actually floating
around the tree rather than how much is nicely named *.patch or *.diff.
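
A rough sketch of that subtraction in POSIX shell follows. The function
name leftovers is invented here, the exclusion list is just the names
mentioned above (the "etc." is not filled in), and unlike the earlier
one-liners it counts regular files only. If distfiles/ lives inside the
tree, it would need excluding as well.

```shell
#!/bin/sh
# leftovers DIR: print "<count> files, <bytes> bytes total" for the regular
# files under DIR that match none of the well-known names.
# (Hypothetical helper; the exclusion list is an assumption.)
leftovers() {
    find "$1" -type f \
        ! -name '*.ebuild' ! -name 'metadata.xml' ! -name 'digest*' \
        ! -name 'ChangeLog' ! -name 'Manifest' ! -path '*/profiles/*' \
        -exec wc -c {} + |
    # wc prints "<bytes> <path>" per file plus a "total" line per batch;
    # skip the totals and add the per-file sizes up ourselves.
    awk '$2 != "total" { c++; s += $1 }
         END { printf "%d files, %d bytes total\n", c, s }'
}
```

Run from the top of the tree as "leftovers ." to get a figure comparable
with the other measurements in this thread.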
-- 
Donnie Berkholz
Gentoo Linux


--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
@ 2004-10-12 21:59 ` Roman Gaufman
  2004-10-12 22:29   ` Jason Rhinelander
  2004-10-12 22:11 ` Luke-Jr
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 41+ messages in thread
From: Roman Gaufman @ 2004-10-12 21:59 UTC
  To: Ciaran McCreesh; +Cc: gentoo-dev

I think it's wrong for a maintainer to troll like this on the dev
mailing list -- but then again it's ciaranm - he's a joker :)

But seriously, there were some nice ideas in the last few messages,
and it wasn't 50Kb improvements, but a major drop in file number and
size, like a 25% drop, that's pretty major -- also, it doesn't affect
developers much in terms of productivity.

Including my idea of syncing based on a changelog, that will reduce
time to literally less than a second without affecting ebuild
maintainers at all.

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
  2004-10-12 21:59 ` Roman Gaufman
@ 2004-10-12 22:11 ` Luke-Jr
  2004-10-12 22:28   ` Colin Kingsley
                     ` (2 more replies)
  2004-10-12 22:31 ` Robin H. Johnson
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 41+ messages in thread
From: Luke-Jr @ 2004-10-12 22:11 UTC
  To: gentoo-dev

On Tuesday 12 October 2004 9:37 pm, Ciaran McCreesh wrote:
> It has come to my attention that, during recent weeks, a small number of
> users have been complaining about the size of the rsync tree.
> My august colleagues have proposed many ingenious solutions, but
> misfortunately they are all complicated and involve a lot of manual
> work. I believe the following small changes (which can mostly be
> automated) would prove of much larger benefit to the community for a
> vastly reduced cost.

The tree size is only a variable for users because they are required to have 
the entire tree on their computer. If Portage fetched these on-demand, it 
wouldn't be a problem. But that's already been discussed...

>
> To begin with, I'd like to draw your attention to comments in ebuilds.
> It is an oft-forgotten fact that these items provide absolutely no
> benefit to the end user. "Surely", I hear you say, "it is not worth
> getting hung up over such an insignificant triviality! What harm do a
> few trifling little remarks do?".  Yet, when actually measured, these
> 'innocent minutiae' (as you might call them had you a penchant for
> obsolete vocabulary or a predilection for pomposity) account for
> approximately 20% of the total ebuild content in the tree. It is obvious
> that an immediate ban upon these silly things, alongside a small script
> to remove them from the tree, would provide a very large gain for our
> users without having to remove any existing code. Adding in a repoman
> check to error out if such lines were present would clearly be a good
> start.

Even if they do take up a lot of space, they are often important so other 
devs/users can know why something was done. Perhaps the copyright comments 
should be removed, though, as copyrights exist with or without declaration of 
them.

>
> Next up are blank lines, which, as all the world knows, are of no use at
> all to anyone. These account for a staggering 150KBytes of data in the
> main tree, which, over a 9600 dialup line, would save us over two
> minutes on an emerge sync. Again, removing these pointless wastes of
> space via a bash script is trivial.

Blank lines are often nice for readability. 150K isn't much; if you're on
dialup, two minutes is nothing, not to mention the fact that nobody should be
on 28.8, let alone 9600... Also, that 2 minutes is assuming you're just
downloading the tree. Rsync isn't a simple file transfer protocol.
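
For reference, the raw-transfer figure works out as follows. This is only
a sketch: it assumes 1 KByte = 1024 bytes, 8 bits per byte on the wire,
and no modem compression or rsync overhead. The name transfer_seconds is
invented here.

```shell
#!/bin/sh
# transfer_seconds BYTES BITS_PER_SECOND
# Print the whole number of seconds (rounded up) needed to move BYTES over
# a link of the given speed, assuming 8 bits per byte and no overhead.
transfer_seconds() {
    bytes=$1
    bps=$2
    echo $(( (bytes * 8 + bps - 1) / bps ))
}

# 150 KBytes of blank lines at 9600 and at 28800 bit/s
transfer_seconds $(( 150 * 1024 )) 9600     # prints 128
transfer_seconds $(( 150 * 1024 )) 28800    # prints 43
```

What an actual sync would save depends on how much of the tree rsync
decides to send, which this arithmetic does not model.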

>
> Staying with the blank spaces thing, leading whitespaces (which serve no
> practical purpose and are only used to make the code "look pretty" --
> although how a bash script could ever be considered "pretty" is beyond
> my limited mind) account for nearly half a megabyte of data. Clearly
> these should immediately be removed and any developer using them in the
> future should have their cvs access suspended pending a review of their
> status within the project -- as devrel and our managers will tell you,
> being nice to the users is our number one priority.

Yet another readability issue. Half a meg isn't much.

>
> There are other trivial ways to save space too. The commonly used helper
> function "emake", for example, is a shocking five bytes in length.
> Replacing this with a much more helpfully named "e", and likewise
> replacing "econf" with "c", would gain something like 50KBytes. If we
> also replace src_unpack, src_compile and src_install with more
> appropriate alternatives we could shave off a further 300KBytes. I have
> no doubt that the reader could extend this logic to the other portage
> internals and common function names, bringing the total up to half a
> megabyte or more.

Developers are volunteers... are you seriously suggesting killing all 
readability just to make syncs a bit shorter? I wonder how many current 
volunteers would tolerate this.

>
> This can be extended to other functions, of course. In particular I'd
> like to draw your attention to the absurdly named "flag-o-matic.eclass".
> Merely inheriting this eclass adds at least thirteen bytes (that's over
> a hundred bits!) of bloat to an ebuild, and that's before we start on
> the ridiculously verbose function names. What's all this "replace-flags"
> nonsense, I ask you?  Any educated programmer can see that "rf" is a far
> more useful name. Even those who are not convinced that space needs to
> be saved must surely notice how much developer time would be saved
> through reduced typing.

And annoyed by having to memorize what meaningless abbreviations mean...

>
> It remains a mystery to me how anyone could possibly have overlooked the
> following suggestion. Currently, we install 'dependency information'
> inside ebuilds. This is blatantly pointless -- as RedHat have so ably
> demonstrated with their 'rpm' installer (and, albeit in a non-Linux
> environment, I am assured that Microsoft are in the same boat), there is
> no need for automatic dependency tracking and resolution. Our users are
> more than capable of working this out for themselves. Similarly, the
> HOMEPAGE variable is entirely pointless and has been superseded by Google
> [1].

Ok, suggesting removal of dependency info has me convinced this is a bad 
joke...

>
> Oh, and then we come to metadata.xml. As all the world knows, xml is a
> massive waste of space, and (as a data interchange format not a data
> storage format) utterly unsuited for configuration files. A typical
> metadata.xml file is 95%+ noise. By replacing these with flat text files
> listing the maintainers, we could save somewhere in the region of one
> and a half megabytes.

And I'm sure rsync can probably filter out *.xml client-side...

>
> Also, no-one has yet considered all the useless fluff in the tree that
> nobody actually uses. By removing all ebuilds and eclasses related to
> emacs, kde, gnome, php, gaim or java from the tree, as well as
> <snip>

No more reading. It's a joke.
-- 
Luke-Jr
Developer, Utopios
http://utopios.org/

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:11 ` Luke-Jr
@ 2004-10-12 22:28   ` Colin Kingsley
  2004-10-13  1:14   ` Alan Frazier
  2004-10-13  3:17   ` Ed Grimm
  2 siblings, 0 replies; 41+ messages in thread
From: Colin Kingsley @ 2004-10-12 22:28 UTC
  To: gentoo-dev

A) Ciaran, that's hilarious.

B) @($* people, it was a joke. Chill out, have a beer or something.

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:29   ` Jason Rhinelander
@ 2004-10-12 22:29     ` Ciaran McCreesh
  2004-10-12 21:41       ` Donnie Berkholz
  2004-10-12 22:58     ` Daniel Goller
  2004-10-13  4:05     ` Nicholas Jones
  2 siblings, 1 reply; 41+ messages in thread
From: Ciaran McCreesh @ 2004-10-12 22:29 UTC
  To: gentoo-dev

On Tue, 12 Oct 2004 15:29:13 -0700 Jason Rhinelander
<jason@gossamer-threads.com> wrote:
| PLEASE be more scientific than "I think it's like 25%".  Let's get it 
| straight just how much we're talking about, before throwing out things

Well, if you want to get the numbers that I did, do your measurements
against the ebuilds, not the entire tree. It gives better numbers that
way :)

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:59 ` Roman Gaufman
@ 2004-10-12 22:29   ` Jason Rhinelander
  2004-10-12 22:29     ` Ciaran McCreesh
                       ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Jason Rhinelander @ 2004-10-12 22:29 UTC
  To: Roman Gaufman; +Cc: Ciaran McCreesh, gentoo-dev

I actually quite appreciated ciaranm's response - it shows he's not 
buying into the whole "Gentoo is rice" deal, and I think that's a very 
good thing for Gentoo and Gentoo's users.

PLEASE be more scientific than "I think it's like 25%".  Let's get it 
straight just how much we're talking about, before throwing out things 
like "pretty major" or "like 25%".

First, everything in portage:

find . \( -not -path './distfiles/*' \) \
  | perl -lne '$s+=-s; $c++; END{print "$c files, $s bytes total"}'
100120 files, 403235992 bytes total

Now, all the diffs/patches in the files/ directories:

find . -path '*/*/files/*.patch' -or -path "*/*/files/*.diff" \
  | perl -lne '$s+=-s; $c++; END{print "$c files, $s bytes total"}'
5082 files, 13142437 bytes total

So, by removing all of these, you end up removing 5% -- not 25% -- of 
the files from portage, comprising 3% of the space used by a portage 
tree.  Is this really "pretty major"?
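
The rounding can be reproduced with a small helper. This is a sketch: the
name percent is made up, and the figures are simply the totals printed by
the two find runs above.

```shell
#!/bin/sh
# percent PART WHOLE: print PART as a percentage of WHOLE, one decimal place.
# (Hypothetical helper for checking the figures quoted in this message.)
percent() {
    awk -v part="$1" -v whole="$2" \
        'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

echo "files: $(percent 5082 100120)%"           # files: 5.1%
echo "bytes: $(percent 13142437 403235992)%"    # bytes: 3.3%
```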

-- Jason Rhinelander
-- Gossamer Threads, Inc.


Roman Gaufman wrote:
> I think it's wrong for a maintainer to troll like this on the dev
> mailing list -- but then again it's ciaranm - he's a joker :)
> 
> But seriously, there were some nice ideas in the last few messages,
> and it wasn't 50Kb improvements, but a major drop in file number and
> size, like a 25% drop, that's pretty major -- also, it doesn't affect
> developers much in terms of productivity.
> 
> Including my idea of syncing based on a changelog, that will reduce
> time to literally less than a second without affecting ebuild
> maintainers at all.
>

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
  2004-10-12 21:59 ` Roman Gaufman
  2004-10-12 22:11 ` Luke-Jr
@ 2004-10-12 22:31 ` Robin H. Johnson
  2004-10-13  7:01   ` Spider
  2004-10-13  9:52 ` Paul de Vrieze
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 41+ messages in thread
From: Robin H. Johnson @ 2004-10-12 22:31 UTC
  To: Gentoo Developers

On Tue, Oct 12, 2004 at 10:37:25PM +0100, Ciaran McCreesh wrote:
> It has come to my attention that, during recent weeks, a small number of
> users have been complaining about the size of the rsync tree.
> My august colleagues have proposed many ingenious solutions, but
> misfortunately they are all complicated and involve a lot of manual
> work. I believe the following small changes (which can mostly be
> automated) would prove of much larger benefit to the community for a
> vastly reduced cost.
[snip to a wonderfully worded set of satirical suggestions]

For real benefits, reducing the number of files, or using a filesystem
that performs tail packing, reduces the amount of disk seeking that must
be done, which really increases performance given the number of small
files.
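
To get a feel for how many files are affected, here is a sketch that
counts the regular files smaller than one filesystem block. The function
name small_file_share is invented here, and the 4096-byte block size used
in the example call is an assumption; substitute the real block size of
the filesystem in question.

```shell
#!/bin/sh
# small_file_share DIR BLOCK: report how many regular files under DIR are
# smaller than BLOCK bytes. (Hypothetical helper; assumes no newlines in
# file names, since it counts lines of find output.)
small_file_share() {
    small=$(find "$1" -type f -size "-${2}c" | wc -l)
    total=$(find "$1" -type f | wc -l)
    # $(( ... )) strips any padding some wc implementations add
    echo "$(( small )) of $(( total )) files are smaller than $2 bytes"
}
```

Calling "small_file_share . 4096" from the top of the tree prints a
single summary line.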

-- 
Robin Hugh Johnson
E-Mail     : robbat2@orbis-terrarum.net
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ#       : 30269588 or 41961639
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:41       ` Donnie Berkholz
@ 2004-10-12 22:52         ` Jason Rhinelander
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Rhinelander @ 2004-10-12 22:52 UTC
  To: gentoo-dev

Donnie Berkholz wrote:
> On Tue, 2004-10-12 at 15:29, Ciaran McCreesh wrote:
> 
>>On Tue, 12 Oct 2004 15:29:13 -0700 Jason Rhinelander
>><jason@gossamer-threads.com> wrote:
>>| PLEASE be more scientific than "I think it's like 25%".  Let's get it 
>>| straight just how much we're talking about, before throwing out things
>>
>>Well, if you want to get the numbers that I did, do your measurements
>>against the ebuilds, not the entire tree. It gives better numbers that
>>way :)
> 
> 
> Actually, that's a good idea. Instead of dealing with unknowns, subtract
> the knowns. *.ebuild, metadata.xml, digest*, ChangeLog, Manifest,
> profiles/, etc. You can figure out how much stuff is actually floating
> around the tree rather than how much is nicely named *.patch or *.diff.

Well, let's see; this should be everything other than digests in files/ 
directories:

find . -path '*/*/files/*' -and -not -path "*/*/files/digest*" \
  | perl -lne '$s+=-s; $c++; END{print "$c files, $s bytes total"}'
8776 files, 19822736 bytes total

So now we're at ~ 9% files, ~ 5% space - but this is including all sorts 
of other, legitimate things from files/, such as conf files, init 
scripts, etc.  There may be various other cruft floating around, but the 
original suggestion and comments were specifically about patches.
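
Assuming the *.patch and *.diff files counted earlier in the thread
(5082 files, 13142437 bytes) are a subset of this listing, which holds
unless one of them is named digest*, the non-patch remainder follows by
subtraction. A sketch using only the totals already printed:

```shell
#!/bin/sh
# Totals copied from the find runs in this thread; nothing is re-measured.
all_files=8776;   all_bytes=19822736     # files/ minus digests
patch_files=5082; patch_bytes=13142437   # *.patch and *.diff only

echo "$(( all_files - patch_files )) other files," \
     "$(( all_bytes - patch_bytes )) bytes"
# prints: 3694 other files, 6680299 bytes
```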

-- Jason Rhinelander
-- Gossamer Threads, Inc.

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:29   ` Jason Rhinelander
  2004-10-12 22:29     ` Ciaran McCreesh
@ 2004-10-12 22:58     ` Daniel Goller
  2004-10-13  4:05     ` Nicholas Jones
  2 siblings, 0 replies; 41+ messages in thread
From: Daniel Goller @ 2004-10-12 22:58 UTC (permalink / raw
  To: Jason Rhinelander; +Cc: Roman Gaufman, Ciaran McCreesh, gentoo-dev

the point of the serious thread was to look for the .tar.bz2, .tar.gz, 
.tbz2, .tgz and possibly other binary files together with the files you 
summed up

Jason Rhinelander wrote:

> I actually quite appreciated ciaranm's response - it shows he's not 
> buying into the whole "Gentoo is rice" deal, and I think that's a very 
> good thing for Gentoo and Gentoo's users.
>
> PLEASE be more scientific than "I think it's like 25%".  Let's get it 
> straight just how much we're talking about, before throwing out things 
> like "pretty major" or "like 25%".
>
> First, everything in portage:
>
> find . \( -not -path './distfiles/*' \) | perl -lne '$s+=-s; $c++; 
> END{print "$c files, $s bytes total"}'
> 100120 files, 403235992 bytes total
>
> Now, all the diffs/patches in the files/ directories:
>
> find . -path '*/*/files/*.patch' -or -path "*/*/files/*.diff" | perl 
> -lne '$s+=-s; $c++; END{print "$c files, $s bytes total"}'
> 5082 files, 13142437 bytes total
>
> So, by removing all of these, you end up removing 5% -- not 25% -- of 
> the files from portage, comprising 3% of the space used by a portage 
> tree.  Is this really "pretty major"?
>
> -- Jason Rhinelander
> -- Gossamer Threads, Inc.
>
>
> Roman Gaufman wrote:
>
>> I think its wrong for a maintainer to troll like this on the dev
>> mailing list -- but then again its ciaranm - he's a joker :)
>>
>> But seriously, there were some nice ideas in the last few messages,
>> and it wasnt 50Kb improvements, but a major drop in file number and
>> size, like 25% drop, thats pretty major -- also, it doesnt affect
>> developers much in terms of productivity.
>>
>> Including my idea of syncing based on a changelog, that will reduce
>> time to literally less than a second without affecting ebuild
>> maintainers at all.
>>
>
> -- 
> gentoo-dev@gentoo.org mailing list
>
>


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:11 ` Luke-Jr
  2004-10-12 22:28   ` Colin Kingsley
@ 2004-10-13  1:14   ` Alan Frazier
  2004-10-17 21:45     ` Philippe Trottier
  2004-10-13  3:17   ` Ed Grimm
  2 siblings, 1 reply; 41+ messages in thread
From: Alan Frazier @ 2004-10-13  1:14 UTC (permalink / raw
  To: gentoo-dev

Luke-Jr <luke-jr@utopios.org> wrote:
> Ok, suggesting removal of dependency info has me convinced
> this is a bad joke...

The Force is not strong with this one.


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:11 ` Luke-Jr
  2004-10-12 22:28   ` Colin Kingsley
  2004-10-13  1:14   ` Alan Frazier
@ 2004-10-13  3:17   ` Ed Grimm
  2 siblings, 0 replies; 41+ messages in thread
From: Ed Grimm @ 2004-10-13  3:17 UTC (permalink / raw
  To: Luke-Jr; +Cc: gentoo-dev

On Tue, 12 Oct 2004, Luke-Jr wrote:

> On Tuesday 12 October 2004 9:37 pm, Ciaran McCreesh wrote:
>> It has come to my attention that, during recent weeks, a small number of
>> users have been complaining recently about the size of the rsync tree.
>> My august colleagues have proposed many ingenious solutions, but
>> misfortunately they are all complicated and involve a lot of manual
>> work. I believe the following small changes (which can mostly be
>> automated) would prove of much larger benefit to the community for a
>> vastly reduced cost.
>
> The tree size is only a variable for users because they are required to have
> the entire tree on their computer. If Portage fetched these on-demand, it
> wouldn't be a problem. But that's already been discussed...

I'm naive.  Where/when?  I see a bit of conversation last week.

My thoughts on this would be to have a RSYNC_INCLUDEFROM, which is used
except for once every RSYNC_EVERYTHING_INTERVAL syncs, when the whole
tree is synced.  This would only be used if configured.  If
RSYNC_AUTOINCLUDE is set, every new package installed, whether by
explicit merge or a dependency merge, is also added to
RSYNC_INCLUDEFROM, and any package removed by unmerge or depclean is
removed from RSYNC_INCLUDEFROM.

Note that I realize this is not full on-demand fetching; if an
application is not listed, it's not fetched, most of the time.  But it
does fetch only those applications demanded.  I also know that changing
dependencies can cause it major problems.  However, since my scheme does
have it doing occasional full syncs (for example, once a week, Saturday
evening), this problem should be reasonably mitigated - if ever a
program gains a dependency that has not yet existed for a week, you
don't want it running on a production server anyway.
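
To make the scheme concrete, here is a rough Python sketch of the two decisions involved. The RSYNC_* names are the ones proposed above; the function names and the idea of counting syncs from 1 are assumptions for illustration only and are not existing Portage code.

```python
def plan_sync(sync_number, everything_interval, include_list):
    """Decide what the next sync should fetch.

    sync_number is the 1-based index of the upcoming sync (hypothetical
    counter).  Returns None for a full-tree sync, otherwise the sorted
    package list that would be written out for RSYNC_INCLUDEFROM.
    """
    if everything_interval < 1:
        raise ValueError("RSYNC_EVERYTHING_INTERVAL must be at least 1")
    # Not configured, or this is the periodic full sync: fetch everything.
    if not include_list or sync_number % everything_interval == 0:
        return None
    return sorted(include_list)


def update_include_list(include_list, merged=(), unmerged=()):
    """RSYNC_AUTOINCLUDE behaviour: add merged packages, drop removed ones."""
    updated = set(include_list)
    updated.update(merged)
    updated.difference_update(unmerged)
    return updated
```

With an interval of 7 and a daily cron job, every seventh sync would be a full one and the rest would only touch the listed packages.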

This should satisfy those people inclined to run underpowered[1] production
systems with automated syncs - after all, on a production system, you
better know all the packages you care about.  It also doesn't kill the
sync server with N connections.  I personally would code this feature so
that it would enforce a minimum of 24 hours between syncs (by activating
a separate feature, so that others could use it also); a real enterprise
server, where stability trumps latest-and-greatest, wouldn't be updating
itself that often[2].


While I am not a Python programmer, I could try my hand at a
proof-of-concept if the rest of the underpowered coalition is
interested, but interested Python programmers don't exist.  Chances are
good anything I'd code would work, but would be rejected by any python
coders as looking like some mad monkey took bits and pieces of code from
elsewhere in the program, put them together in a manner that somehow
worked, and then tried to make it purty; this is not due to my not being
a decent programmer, but rather only knowing python uses indentation
instead of curlies, and it somehow lives without line end characters.
(Well, ok, I know a bit more, but everything else I know about it comes
from looking at portage code.)

Ed

[1] Note to anyone who feels annoyed that I'm calling their production
systems underpowered: If your system doesn't have enough spare resources
to the extent that an emerge sync during an idle time takes 5 times
longer than if the server tasks were turned off, it doesn't have enough
spare resources to handle a real peak either.  I understand you may not
be able to afford better.  But one should be realistic about what one is
running.  Of course, if your production system does an emerge sync in 15
seconds during peak load, well, I wonder why you're reading this, as
your system's apparently ludicrously overpowered.  And I want some of
your bandwidth.

[2] I generally find security by design greatly reduces the
vulnerability to threats.  Not to mention, my inclination would be to
load the update on a test system, regression test it and fix test it if
possible, and then force install the binary package created when loading
it on the test box to all production machines.  If you have more than a
handful of machines, I'd think some software aid for this would be in
order.  Shouldn't be too hard; I can think of several models that should
work for low or medium security environments.  If you have a high
security environment, don't even try to tell me why you're running
automatic updates without testing from the Internet.  (After all, you
may trust gentoo, but they don't own all of their mirrors.)


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:29   ` Jason Rhinelander
  2004-10-12 22:29     ` Ciaran McCreesh
  2004-10-12 22:58     ` Daniel Goller
@ 2004-10-13  4:05     ` Nicholas Jones
  2004-10-13  5:17       ` George Shapovalov
                         ` (2 more replies)
  2 siblings, 3 replies; 41+ messages in thread
From: Nicholas Jones @ 2004-10-13  4:05 UTC (permalink / raw
  To: gentoo-dev


> PLEASE be more scientific

I agree. Also... stop including more than you comment on!
Way too many people do this and it bothers me and all that
is wave and/or particle based.

> First, everything in portage:
> 
> find . \( -not -path './distfiles/*' \) | perl -lne '$s+=-s; $c++; 
> END{print "$c files, $s bytes total"}'
> 100120 files, 403235992 bytes total

Now... about this scientific thing... We appear to have a problem
in your method and/or collection process here... Or you have some
impressively large clustering on your box.

Here are some easily obtainable numbers... Rsync provides them.

Number of files: 100450
Total file size: 77658481 bytes
File list size: 2274372


Given this info and a fairly sane FS... like Reiser3 with tail
packing... You consume around 80 megs of disk space... About
one-fifth of what you mention... Even with 330 files less!

Spork.

--NJ



* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  4:05     ` Nicholas Jones
@ 2004-10-13  5:17       ` George Shapovalov
  2004-10-13  5:49       ` Georgi Georgiev
  2004-10-13  6:14       ` Jason Rhinelander
  2 siblings, 0 replies; 41+ messages in thread
From: George Shapovalov @ 2004-10-13  5:17 UTC (permalink / raw
  To: gentoo-dev

On Tuesday 12 October 2004 21:05, Nicholas Jones wrote:
> Given this info and a fairly sane FS... like Reiser3 with tail
> packing... You consume around 80 megs of disk space... About
> one-fifth of what you mention... Even with 330 files less!
You know, given a bunch of under-1k files and a typical block size of 4k on file 
systems nowadays, reiserfs might be the reason there is a 5x size difference in 
these two examples :). Oh, that and the 45 min sync time as well (although 
that is still highly unlikely even on ext2).
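
The block-size effect is easy to put numbers on. The sketch below is plain arithmetic and assumes, purely for illustration, that every file is rounded up to whole blocks with no tail packing; directory and inode overhead are ignored, and the uniform 776-byte file size in the usage note is invented to match the roughly 77.6 MB total.

```python
def allocated_bytes(file_sizes, block_size=4096):
    """Disk space used if every file occupies whole blocks (no tail packing)."""
    # -(-size // block_size) is ceiling division: blocks needed per file.
    return sum(-(-size // block_size) * block_size for size in file_sizes)
```

For 100,000 files of 776 bytes each, `allocated_bytes([776] * 100000)` gives 409,600,000 bytes with 4k blocks against 77,600,000 bytes of actual data, which is about the 5x gap being discussed; with 1k blocks the same call gives 102,400,000.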

George



* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  4:05     ` Nicholas Jones
  2004-10-13  5:17       ` George Shapovalov
@ 2004-10-13  5:49       ` Georgi Georgiev
  2004-10-13  6:55         ` Robin H. Johnson
  2004-10-13  6:14       ` Jason Rhinelander
  2 siblings, 1 reply; 41+ messages in thread
From: Georgi Georgiev @ 2004-10-13  5:49 UTC (permalink / raw
  To: gentoo-dev

maillog: 13/10/2004-00:05:20(-0400): Nicholas Jones types
> Given this info and a fairly sane FS... like Reiser3 with tail
> packing... You consume around 80 megs of disk space... About
> one-fifth of what you mention... Even with 330 files less!

What tools are there that can measure this size? "du" shows weird
numbers on reiserfs.

-- 
/    Georgi Georgiev   /  interlard - vt., to intersperse; diversify   /
\     chutz@gg3.net    \  -- Webster's New World Dictionary Of The     \
/   +81(90)6266-1163   /  American Language                            /


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  4:05     ` Nicholas Jones
  2004-10-13  5:17       ` George Shapovalov
  2004-10-13  5:49       ` Georgi Georgiev
@ 2004-10-13  6:14       ` Jason Rhinelander
  2 siblings, 0 replies; 41+ messages in thread
From: Jason Rhinelander @ 2004-10-13  6:14 UTC (permalink / raw
  To: gentoo-dev

Nicholas Jones wrote:
> > find . \( -not -path './distfiles/*' \) | perl -lne '$s+=-s; $c++; 
> > END{print "$c files, $s bytes total"}'
> > 100120 files, 403235992 bytes total
> 
> Now... about this scientific thing...

Oops, my mistake - it seems I had a few binary packages in packages/.  The corrected figure, from a
brand-new /usr/portage, for those interested:

find . -not -path './distfiles/*' -and -not -path './packages/*' | perl -lne '-f and $s+=-s; $c++;
END{print "$c files/dirs, $s bytes total"}'
100448 files/dirs, 77657813 bytes total

With the correction, the patches work out to about 17% of the tree.

-- Jason Rhinelander



* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  5:49       ` Georgi Georgiev
@ 2004-10-13  6:55         ` Robin H. Johnson
  0 siblings, 0 replies; 41+ messages in thread
From: Robin H. Johnson @ 2004-10-13  6:55 UTC (permalink / raw
  To: Gentoo Developers

On Wed, Oct 13, 2004 at 02:49:00PM +0900, Georgi Georgiev wrote:
> maillog: 13/10/2004-00:05:20(-0400): Nicholas Jones types
> > Given this info and a fairly sane FS... like Reiser3 with tail
> > packing... You consume around 80 megs of disk space... About
> > one-fifth of what you mention... Even with 330 files less!
> What tools are there that can measure this size? "du" shows weird
> numbers on reiserfs.
'du -b' presents correct byte sizes (not whole blocks)
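
The same comparison can be made from Python, as a sketch: `st_size` is the apparent size, and `st_blocks` counts 512-byte units actually allocated (on Linux). The function name `tree_sizes` is made up, and since only regular files are counted the result will not match `du` exactly, which also accounts for directories.

```python
import os
import stat


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for regular files under root."""
    apparent = allocated = 0
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, etc.
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```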

-- 
Robin Hugh Johnson
E-Mail     : robbat2@orbis-terrarum.net
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ#       : 30269588 or 41961639
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 22:31 ` Robin H. Johnson
@ 2004-10-13  7:01   ` Spider
  2004-10-13  7:31     ` Robin H. Johnson
  0 siblings, 1 reply; 41+ messages in thread
From: Spider @ 2004-10-13  7:01 UTC (permalink / raw
  To: gentoo-dev

begin  quote
On Tue, 12 Oct 2004 15:31:47 -0700
"Robin H. Johnson" <robbat2@gentoo.org> wrote:


First off,   Congratulations for your Satire, Ciaran.  Been reading much
of Mark Twain lately?


> For real benefits, reducing the number of files, or using a filesystem
> that performs tail packing reduces the amount of disk seek that must
> be done, really increases performance given the number of small files.


Well, here's another method ;)

/root/portage.img on /usr/portage type ext2 (rw,noatime,loop=/dev/loop0)
-rw-r--r--  1 root root 293M Oct 12 23:17 /root/portage.img
/root/portage.img     257M  195M   62M  77% /usr/portage 


some varied interesting things from tune2fs -l 
Filesystem features:      dir_index sparse_super
Inode count:              300144
Block count:              300000
Free blocks:              62825
Free inodes:              154512
Block size:               1024
Fragment size:            1024


//Spider


-- 
begin  .signature
Tortured users / Laughing in pain
See Microsoft KB Article Q265230 for more information.
end


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  7:01   ` Spider
@ 2004-10-13  7:31     ` Robin H. Johnson
  2004-10-13  9:21       ` Spider
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Robin H. Johnson @ 2004-10-13  7:31 UTC (permalink / raw
  To: Gentoo Developers; +Cc: gentoo-dev

On Wed, Oct 13, 2004 at 09:01:06AM +0200, Spider wrote:
> > For real benefits, reducing the number of files, or using a filesystem
> > that performs tail packing reduces the amount of disk seek that must
> > be done, really increases performance given the number of small files.
This is still applicable to your method as well.

The one thing that your (previously known) method does bring out is that
reducing the I/O required really helps.

> Well, here's another method ;)
> 
> /root/portage.img on /usr/portage type ext2 (rw,noatime,loop=/dev/loop0)
> -rw-r--r--  1 root root 293M Oct 12 23:17 /root/portage.img
> /root/portage.img     257M  195M   62M  77% /usr/portage 
> 
> 
> some varied interesting things from tune2fs -l 
> Filesystem features:      dir_index sparse_super
> Inode count:              300144
> Block count:              300000
> Free blocks:              62825
> Free inodes:              154512
> Block size:               1024
> Fragment size:            1024
Pack it into a loopback reiserfs instead, way better performance.  For
an even bigger boost, put the loop file into tmpfs or use some other
direct memory scheme.

See:
http://dev.gentoo.org/~robbat2/fastcvstest

I developed the above when I was working on super-fast CVS repositories,
as I needed my client to not be the bottleneck ;-). My record for a
complete CVS checkout of gentoo-x86 (over the network to a remote
client), stands at 65 seconds. This is quite a bit more work than an
rsync checkout as well.

Provided you can assure only a single client is using the loopback
system, here is a very good way of keeping it fast, but not needing the
network traffic of a full checkout:
portage loop file is usually on disk, when a sync is needed:
1. umount loop file
2. copy loop file to /dev/shm or other fast place
3. mount loop file again (from new location)
4. run updates to loop filesystem ('cvs up; emerge metadata' or 'emerge sync') 
5. umount loop file, copy back to disk
6. mount loop file again
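
The six steps above can be sketched as a small Python helper that only builds the command list and leaves running it to the administrator. The image and mount paths in the usage note are the ones from Spider's message; the function name and defaults are assumptions, and step 5 expands into two commands.

```python
import posixpath
import shlex


def loop_sync_commands(image, mountpoint, fast_dir="/dev/shm",
                       sync_cmd="emerge sync"):
    """Return the shell commands for one sync cycle of a loopback tree."""
    fast_image = posixpath.join(fast_dir, posixpath.basename(image))
    q = shlex.quote
    return [
        f"umount {q(mountpoint)}",                         # 1. umount loop file
        f"cp {q(image)} {q(fast_image)}",                  # 2. copy to fast place
        f"mount -o loop {q(fast_image)} {q(mountpoint)}",  # 3. mount from there
        sync_cmd,                                          # 4. run the update
        f"umount {q(mountpoint)}",                         # 5. umount ...
        f"cp {q(fast_image)} {q(image)}",                  #    ... and copy back
        f"mount -o loop {q(image)} {q(mountpoint)}",       # 6. mount again
    ]
```

Printing `loop_sync_commands("/root/portage.img", "/usr/portage")` gives a script that can be reviewed before anything is unmounted. If the machine goes down between steps 2 and 5, only the update in progress is lost, since the on-disk image is untouched until the copy back.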

The optimal reiserfs mount options are approximately:
noexec,nosuid,nodev,noatime,nodiratime,nolog

Your performance may vary with nolog, I use it for the workload of the
CVS server tmpdir, which is a very frequent creation of 50,000 tiny
files [for every checkout/update].

Solar has been doing work on putting the contents of the tree into a
read-only squashfs filesystem and distributing that.

-- 
Robin Hugh Johnson
E-Mail     : robbat2@orbis-terrarum.net
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ#       : 30269588 or 41961639
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  7:31     ` Robin H. Johnson
@ 2004-10-13  9:21       ` Spider
  2004-10-14 14:07       ` Ned Ludd
  2004-10-14 16:24       ` Luke-Jr
  2 siblings, 0 replies; 41+ messages in thread
From: Spider @ 2004-10-13  9:21 UTC (permalink / raw
  To: gentoo-dev

begin  quote
On Wed, 13 Oct 2004 00:31:34 -0700
"Robin H. Johnson" <robbat2@gentoo.org> wrote:

> The one thing that your (previously known) method does bring out is
> that reducing the I/O required really helps.

Agreed,  preferably we should be able to distribute a bin-ball of the
files that we can rsync out, since if a file doesn't move inside it (as
is the case with filesystems) it stays in the same place and gets good
replication in rsync.


> Pack it into a loopback reiserfs instead, way better performance. 
Actually not.  It kills performance to put a journalled filesystem
in a loopback system onto a data-journalling filesystem.

I care more about data and fragmentation than I care about performance,
that was the sole reason I first spiralled it off like this. The tree is
constrained.

As for the reiser protectionist idea,   *cough*     I don't have good
experiences with reiser, tailpacking, and performance.


>  For an even bigger boost, put the loop file into tmpfs or use some
>  other direct memory scheme.

Kills reliability. And performance isn't everything.  (could just as
well increase write times on the hosting fs, increase it even more on
the loopback, then simply "cat filesystem.img > /dev/null"  before
operations.  I don't. )


Overall, I'm not after complete performance.  I walk the middle road,
between performance and reliability.   


//Spider


-- 
begin  .signature
Tortured users / Laughing in pain
See Microsoft KB Article Q265230 for more information.
end


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
                   ` (2 preceding siblings ...)
  2004-10-12 22:31 ` Robin H. Johnson
@ 2004-10-13  9:52 ` Paul de Vrieze
  2004-10-14 14:43 ` Mark Dierolf
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Paul de Vrieze @ 2004-10-13  9:52 UTC (permalink / raw
  To: gentoo-dev

On Tuesday 12 October 2004 23:37, Ciaran McCreesh wrote:
> It has come to my attention that, during recent weeks, a small number
> of users have been complaining recently about the size of the rsync
> tree. My august colleagues have proposed many ingenious solutions, but
> misfortunately they are all complicated and involve a lot of manual
> work. I believe the following small changes (which can mostly be
> automated) would prove of much larger benefit to the community for a
> vastly reduced cost.

I understand your concerns, but urge you to consider the option of 
postprocessing ebuilds when they get transferred from CVS to the rsync 
master. To be able to read an ebuild efficiently we need spaces, tabs, 
AND COMMENTS. The thing needs to be maintainable. Sometimes comments are 
not so much for users as for developers.

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  7:31     ` Robin H. Johnson
  2004-10-13  9:21       ` Spider
@ 2004-10-14 14:07       ` Ned Ludd
  2004-10-14 16:24       ` Luke-Jr
  2 siblings, 0 replies; 41+ messages in thread
From: Ned Ludd @ 2004-10-14 14:07 UTC (permalink / raw
  To: Robin H. Johnson; +Cc: Gentoo Developers

On Wed, 2004-10-13 at 03:31, Robin H. Johnson wrote:
> On Wed, Oct 13, 2004 at 09:01:06AM +0200, Spider wrote:
> > > For real benefits, reducing the number of files, or using a filesystem
> > > that performs tail packing reduces the amount of disk seek that must
> > > be done, really increases performance given the number of small files.
> This is still applicable to your method as well.
> 
> The one thing that your (previously known) method does bring out is that
> reducing the I/O required really helps.
> 
> > Well, here's another method ;)
> > 
> > /root/portage.img on /usr/portage type ext2 (rw,noatime,loop=/dev/loop0)
> > -rw-r--r--  1 root root 293M Oct 12 23:17 /root/portage.img
> > /root/portage.img     257M  195M   62M  77% /usr/portage 
> > 
> > 
> > some varied interesting things from tune2fs -l 
> > Filesystem features:      dir_index sparse_super
> > Inode count:              300144
> > Block count:              300000
> > Free blocks:              62825
> > Free inodes:              154512
> > Block size:               1024
> > Fragment size:            1024
> Pack it into a loopback reiserfs instead, way better performance.  For
> an even bigger boost, put the loop file into tmpfs or use some other
> direct memory scheme.
> 
> See:
> http://dev.gentoo.org/~robbat2/fastcvstest
> 
> I developed the above when I was working on super-fast CVS repositories,
> as I needed my client to not be the bottleneck ;-). My record for a
> complete CVS checkout of gentoo-x86 (over the network to a remote
> client), stands at 65 seconds. This is quite a bit more work than an
> rsync checkout as well.
> 
> Provided you can assure only a single client is using the loopback
> system, here is a very good way of keeping it fast, but not needing the
> network traffic of a full checkout:
> portage loop file is usually on disk, when a sync is needed:
> 1. umount loop file
> 2. copy loop file to /dev/shm or other fast place
> 3. mount loop file again (from new location)
> 4. run updates to loop filesystem ('cvs up; emerge metadata' or 'emerge sync') 
> 5. umount loop file, copy back to disk
> 6. mount loop file again
> 
> The optimal reiserfs mount options are approximately:
> noexec,nosuid,nodev,noatime,nodiratime,nolog
> 
> Your performance may vary with nolog, I use it for the workload of the
> CVS server tmpdir, which is a very frequent creation of 50,000 tiny
> files [for every checkout/update].
> 
> Solar has been doing work on putting the contents of the tree into a
> read-only squashfs filesystem and distributing that.

New loopback size is 11M after reading this thread and dumping ChangeLog
& metadata.xml files which does seem like a perfectly feasible thing for
us to do. Removing leading/trailing whitespace and erroneous newlines
yielded no noticeable gains.

For fun I took it a step further to see what we could get if we moved
away from having locally stored digest/Manifest files, then re-compressed
and got the portage tree down to 8.5M. Yeah, that's 8.5M, down from Spider's
195M, for a savings of 186.5M. I don't think dumping the
digest/Manifest would be too feasible at this time, however.

-- 
Ned Ludd <solar@gentoo.org>
Gentoo (hardened,security,infrastructure,embedded,toolchain) Developer


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
                   ` (3 preceding siblings ...)
  2004-10-13  9:52 ` Paul de Vrieze
@ 2004-10-14 14:43 ` Mark Dierolf
  2004-10-14 14:49   ` Ciaran McCreesh
  2004-10-14 15:14   ` Patrick Lauer
  2004-10-15  0:57 ` Jason Huebel
  2004-10-18  9:16 ` Wolfram Schlich
  6 siblings, 2 replies; 41+ messages in thread
From: Mark Dierolf @ 2004-10-14 14:43 UTC (permalink / raw
  To: gentoo-dev

I've been watching this discussion as far as tree size, and i'm suprised 
nobody has brought the idea of on-demand downloading yet.

I don't see the point of having to run rsync every night. Portage should 
auto-download ebuilds when it needs them. I don't think I use even 5% of the 
ebuilds in portage. It's a total waste of space. It's a total waste of 
bandwidth. It makes me feel hurt inside, that my bandwidth is being 
squandered.

Though, half the fun of gentoo is watching pages after pages of confusing 
compile/rsync/etc. data scroll by nightly.

With all the bandwidth we would save, we could make a little gentoo anthem mp3 
download and play while the user is emerging.  Something like "Ride of the 
Valkyries" mixed with "C&C Music Factory".

Let's start the flame war :)

Mark Dierolf



* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 14:43 ` Mark Dierolf
@ 2004-10-14 14:49   ` Ciaran McCreesh
  2004-10-14 15:17     ` Georgi Georgiev
                       ` (2 more replies)
  2004-10-14 15:14   ` Patrick Lauer
  1 sibling, 3 replies; 41+ messages in thread
From: Ciaran McCreesh @ 2004-10-14 14:49 UTC (permalink / raw
  To: gentoo-dev

On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@3e0.com> wrote:
| I've been watching this discussion as far as tree size, and i'm
| suprised nobody has brought the idea of on-demand downloading yet.

Nobody has mentioned it because it has been discussed and dismissed as
unworkable several times before.

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm



* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 14:43 ` Mark Dierolf
  2004-10-14 14:49   ` Ciaran McCreesh
@ 2004-10-14 15:14   ` Patrick Lauer
  1 sibling, 0 replies; 41+ messages in thread
From: Patrick Lauer @ 2004-10-14 15:14 UTC (permalink / raw
  To: Mark Dierolf; +Cc: gentoo-dev

On Thu, 2004-10-14 at 16:43, Mark Dierolf wrote:
> I don't see the point of having to run rsync every night.
Then don't. On some machines I only run it every month or so and they
still work.

>  Portage should auto-download ebuilds when it needs them.
So you only have to rsync the dependency info. You save maybe 50%
traffic, but need some ebuild servers that will be hit by millions of
small requests for single ebuilds. No thanks.

>  I don't think I use even 5% of the ebuilds in portage. It's a total waste of space. It's a total waste of 
> bandwidth. It makes me feel hurt inside, that my bandwidth is being 
> squandered.
So maybe a x86-only, a ppc-only, ... rsync repository would be more useful?
I'm sure that it's not much better, but it would reduce the number of
files.

> Though, half the fun of gentoo is watching pages after pages of confusing 
> compile/rsync/etc. data scroll by nightly.
You do know about obsessive-compulsive disorder, right? ;-)

> With all the bandwidth we would save, we could make a little gentoo anthem mp3 
> download and play while the user is emerging.  Something like "Ride of the 
> Valkyries" mixed with "C&C Music Factory".
How about a fractal-based rsync visualizer? ;-)

> Let's start the flame war :)
I thought those were only on tuesdays?

Patrick


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 14:49   ` Ciaran McCreesh
@ 2004-10-14 15:17     ` Georgi Georgiev
  2004-10-14 16:30     ` Luke-Jr
  2004-10-14 17:05     ` Mark Dierolf
  2 siblings, 0 replies; 41+ messages in thread
From: Georgi Georgiev @ 2004-10-14 15:17 UTC (permalink / raw
  To: gentoo-dev

maillog: 14/10/2004-15:49:17(+0100): Ciaran McCreesh types
> On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@3e0.com> wrote:
> | I've been watching this discussion as far as tree size, and i'm
> | suprised nobody has brought the idea of on-demand downloading yet.
> 
> Nobody has mentioned it because it has been discussed and dismissed as
> unworkable several times before.

I am not really a fan of these on-demand downloads, but I am just
brainstorming here:

Why not sync the metadata directory only? Then use the information in
there to check what packages are needed to emerge something. The
metadata is only 9MB and it has everything that's needed to calculate
the dependencies, right? It also contains the SRC_URI, so ebuild
downloads and source file downloads can go in parallel or something.
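
As a rough illustration of the brainstorm, the set of ebuilds to fetch on demand is just the transitive closure over the dependency data. The sketch below assumes the synced metadata has already been parsed into a plain dict mapping each package to its direct dependencies; that dict, the package names in the usage note and the function name are hypothetical, and real dependency strings (USE conditionals, version operators, virtuals) would need far more work.

```python
def packages_to_fetch(targets, deps):
    """Return the targets plus everything they transitively depend on.

    deps maps a package name to an iterable of its direct dependencies.
    """
    needed = set()
    stack = list(targets)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue  # already handled; also guards against cycles
        needed.add(pkg)
        stack.extend(deps.get(pkg, ()))
    return needed
```

For example, with `deps = {"app/a": ["lib/b"], "lib/b": ["lib/c"]}`, asking for `["app/a"]` yields all three packages, and only their ebuilds would need downloading.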

-- 
\/   Georgi Georgiev   \/ Enemies strengthen you; allies weaken. --    \/
/\    chutz@gg3.net    /\ EMPEROR ELROOD IX, Deathbed Insights         /\
\/  +81(90)6266-1163   \/                                              \/


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  7:31     ` Robin H. Johnson
  2004-10-13  9:21       ` Spider
  2004-10-14 14:07       ` Ned Ludd
@ 2004-10-14 16:24       ` Luke-Jr
  2004-10-14 18:07         ` Ned Ludd
  2 siblings, 1 reply; 41+ messages in thread
From: Luke-Jr @ 2004-10-14 16:24 UTC (permalink / raw
  To: Gentoo Developers, gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]

On Wednesday 13 October 2004 7:31 am, Robin H. Johnson wrote:
> portage loop file is usually on disk, when a sync is needed:
> 1. umount loop file
> 2. copy loop file to /dev/shm or other fast place
> 3. mount loop file again (from new location)
> 4. run updates to loop filesystem ('cvs up; emerge metadata' or 'emerge
> sync')
> 5. umount loop file, copy back to disk
> 6. mount loop file again

Since (from what I've heard) Portage's speed issues are mostly I/O, why not 
keep the mounted copy on a tmpfs and simply make an on-disk backup after 
syncing (and copy it back on rebooting)?

On Thursday 14 October 2004 2:07 pm, Ned Ludd wrote:
> New loopback size is 11M after reading this thread and dumping ChangeLog
> & metadata.xml files which does seem like a perfectly feasible thing for
> us to do. Removing leading/trailing whitespace and erroneous newlines
> yielded no noticeable gains.

11 MB of RAM may or may not seem reasonable to users depending on how often 
they do things with Portage.
-- 
Luke-Jr
Developer, Utopios
http://utopios.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 14:49   ` Ciaran McCreesh
  2004-10-14 15:17     ` Georgi Georgiev
@ 2004-10-14 16:30     ` Luke-Jr
  2004-10-14 16:41       ` Georgi Georgiev
       [not found]       ` <921ad39e04101409351dd72779@mail.gmail.com>
  2004-10-14 17:05     ` Mark Dierolf
  2 siblings, 2 replies; 41+ messages in thread
From: Luke-Jr @ 2004-10-14 16:30 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

On Thursday 14 October 2004 2:49 pm, Ciaran McCreesh wrote:
> On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@3e0.com> wrote:
> | I've been watching this discussion as far as tree size, and i'm
> | suprised nobody has brought the idea of on-demand downloading yet.
>
> Nobody has mentioned it because it has been discussed and dismissed as
> unworkable several times before.

It's quite workable. Every binary distro does it. From what I can see, Portage
devs just don't see as much of a benefit, since the tree is much smaller than,
for example, an entire copy of all binary packages Debian provides.

On Thursday 14 October 2004 3:14 pm, Patrick Lauer wrote:
> So you only have to rsync the dependency info. You save maybe 50%
> traffic, but need some ebuild servers that will be hit by millions of
> small requests for single ebuilds. No thanks.

Actually, you don't even need to sync that. Simply download the primary 
ebuild, read the dep info, download the next one, etc. Most modern versions 
of file transfer protocols (HTTP and FTP, at least; don't know about rsync) 
support multiple transfers in a single connection.
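
The loop described above can be sketched in Python as follows. The
fetch_deps callable is a hypothetical stand-in for "download one ebuild
over the already-open connection and return the dependency names it
declares"; FAKE_TREE is invented test data, not a real tree.

```python
# Invented dependency data standing in for a remote tree.
FAKE_TREE = {
    "a/app": ["a/lib", "a/tool"],
    "a/lib": ["a/base"],
    "a/tool": ["a/base"],
    "a/base": [],
}


def fetch_on_demand(target, fetch_deps):
    """Fetch the target, read its deps, fetch those, and so on.

    fetch_deps(name) is a caller-supplied (hypothetical) function that
    downloads one ebuild and returns the dependency names it declares.
    Each package is fetched at most once.
    """
    fetched = []
    pending = [target]
    while pending:
        name = pending.pop(0)
        if name in fetched:
            continue
        fetched.append(name)
        pending.extend(dep for dep in fetch_deps(name) if dep not in fetched)
    return fetched


order = fetch_on_demand("a/app", FAKE_TREE.__getitem__)
```

Note that this makes one request per package, and a dependency cannot be
requested before the ebuild that names it has arrived.
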
-- 
Luke-Jr
Developer, Utopios
http://utopios.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 16:30     ` Luke-Jr
@ 2004-10-14 16:41       ` Georgi Georgiev
       [not found]       ` <921ad39e04101409351dd72779@mail.gmail.com>
  1 sibling, 0 replies; 41+ messages in thread
From: Georgi Georgiev @ 2004-10-14 16:41 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 924 bytes --]

maillog: 14/10/2004-16:30:29(+0000): Luke-Jr types
<snip>
> > Nobody has mentioned it because it has been discussed and dismissed as
> > unworkable several times before.
<snip>
> Actually, you don't even need to sync that. Simply download the primary 
> ebuild, read the dep info, download the next one, etc. Most modern versions 
> of file transfer protocols (HTTP and FTP, at least; don't know about rsync) 
> support multiple transfers in a single connection.

The part where the HTTP and FTP internals get handled by portage
internally, instead of being handed off to an external program like wget,
is the reason why the idea was dismissed as unworkable several times
before.

-- 
*>   Georgi Georgiev   *>  The dame was hysterical. Dames Usually      *>
<*    chutz@gg3.net    <* are. -- Calvin as Tracer Bullet              <*
*>  +81(90)6266-1163   *>                                              *>

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
       [not found]       ` <921ad39e04101409351dd72779@mail.gmail.com>
@ 2004-10-14 16:51         ` Luke-Jr
  2004-10-15  0:40           ` Ed Grimm
  0 siblings, 1 reply; 41+ messages in thread
From: Luke-Jr @ 2004-10-14 16:51 UTC (permalink / raw
  To: Roman Gaufman; +Cc: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2904 bytes --]

On Thursday 14 October 2004 4:35 pm, Roman Gaufman wrote:
> On Thu, 14 Oct 2004 16:30:29 +0000, Luke-Jr <luke-jr@utopios.org> wrote:
> > On Thursday 14 October 2004 2:49 pm, Ciaran McCreesh wrote:
> > > On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@3e0.com> wrote:
> > > | I've been watching this discussion as far as tree size, and i'm
> > > | suprised nobody has brought the idea of on-demand downloading yet.
> > >
> > > Nobody has mentioned it because it has been discussed and dismissed as
> > > unworkable several times before.
> >
> > It's quite workable. Every binary distro does it. From what I can see,
> > Portage devs just don't see as much a benefit since the tree is much
> > smaller than, for example, an entire copy of all binary packages Debian
> > provides.
>
> Huh? -- name 1 binary distribution that does that? -- all of the ones
> I tried fetch a list of available packages -- which is exactly what
> the portage tree provides.

Why would they need a list of available packages? Such a list is useful *only* 
to the user. apt-get, ipkg, and urpmi are going to know the package name 
beforehand. Figuring out the version might be an issue, but nothing that
can't be solved simply by including a PHP script (in the case of HTTP
fetching) to choose the latest version and include the name in a header.

>
> > On Thursday 14 October 2004 3:14 pm, Patrick Lauer wrote:
> > > So you only have to rsync the dependency info. You save maybe 50%
> > > traffic, but need some ebuild servers that will be hit by millions of
> > > small requests for single ebuilds. No thanks.
> >
> > Actually, you don't even need to sync that. Simply download the primary
> > ebuild, read the dep info, download the next one, etc. Most modern
> > versions of file transfer protocols (HTTP and FTP, at least; don't know
> > about rsync) support multiple transfers in a single connection.
>
> How would it know what ebuild to fetch exactly?  --- just think about
> that for a second.

ebuild doesn't deal with dependencies anyway, AFAIK. emerge would need the 
fetching functionality and could figure out the name based on (originally) 
the user's specification and (for deps) the DEPEND contents themselves. 
Portage *already* needs to know what the name of the package is anyway.

On Thursday 14 October 2004 4:41 pm, Georgi Georgiev wrote:
> The part where the HTTP and FTP internals get handled by portage
> internally, instead of being handed off to an external program like wget,
> is the reason why the idea was dismissed as unworkable several times
> before.

Not really a good excuse. HTTP isn't an overly complicated protocol. Including 
the fetching functionality also has other advantages, such as one less 
program to depend on (and thus one fewer that can be broken and screw up 
Portage).
-- 
Luke-Jr
Developer, Utopios
http://utopios.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 14:49   ` Ciaran McCreesh
  2004-10-14 15:17     ` Georgi Georgiev
  2004-10-14 16:30     ` Luke-Jr
@ 2004-10-14 17:05     ` Mark Dierolf
  2 siblings, 0 replies; 41+ messages in thread
From: Mark Dierolf @ 2004-10-14 17:05 UTC (permalink / raw
  To: gentoo-dev

What's unworkable about it? IMO, it may be difficult, but not unworkable.

It may be hard to modify portage now, but it should at least be thought about 
for portage-ng or wherever things are going. Do we realistically think that 
we should put installers for every single piece of software for gentoo on 
every single user's computer??? That road leads to insanity! What happens when 
portage doubles in size - again? and again? and again?

Mark Dierolf

On Thursday 14 October 2004 7:49 am, Ciaran McCreesh wrote:
> On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@3e0.com> wrote:
> | I've been watching this discussion as far as tree size, and i'm
> | suprised nobody has brought the idea of on-demand downloading yet.
>
> Nobody has mentioned it because it has been discussed and dismissed as
> unworkable several times before.

--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 16:24       ` Luke-Jr
@ 2004-10-14 18:07         ` Ned Ludd
  0 siblings, 0 replies; 41+ messages in thread
From: Ned Ludd @ 2004-10-14 18:07 UTC (permalink / raw
  To: Luke-Jr; +Cc: Gentoo Developers

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On Thu, 2004-10-14 at 12:24, Luke-Jr wrote:
> On Wednesday 13 October 2004 7:31 am, Robin H. Johnson wrote:
> > portage loop file is usually on disk, when a sync is needed:
> > 1. umount loop file
> > 2. copy loop file to /dev/shm or other fast place
> > 3. mount loop file again (from new location)
> > 4. run updates to loop filesystem ('cvs up; emerge metadata' or 'emerge
> > sync')
> > 5. umount loop file, copy back to disk
> > 6. mount loop file again
> 
> Since (from what I've heard) Portage's speed issues are mostly I/O, why not 
> keep the mounted copy on a tmpfs and simply make an on-disk backup after 
> syncing (and copy it back on rebooting)?
> 
> On Thursday 14 October 2004 2:07 pm, Ned Ludd wrote:
> > New loopback size is 11M after reading this thread and dumping ChangeLog
> > & metadata.xml files which does seem like a perfectly feasible thing for
> > us to do. Removing leading/trailing whitespace and erroneous newlines
> > yielded no noticeable gains.
> 
> 11 MB of RAM may or may not seem reasonable to users depending on how often 
> they do things with Portage.

Indeed, not all users are willing to leave something in resident memory
for extended periods of time. However, ebuild/emerge could be extended to
mount a portage tree instead of bailing when one does not exist.

I'm simply providing info, more for statistical reasons, on what could
possibly be done for those working in constrained or unique
environments.
On that note, I did some more experimenting here and got the tree down to
around 2.7M, just enough for an emerge system for my ARCH/KEYWORDS. I
still have a little bit of bloat left over: packages that are
not a direct part of my depgraph, and unneeded stuff from the PN/files/
directories.

-- 
Ned Ludd <solar@gentoo.org>
Gentoo (hardened,security,infrastructure,embedded,toolchain) Developer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-14 16:51         ` Luke-Jr
@ 2004-10-15  0:40           ` Ed Grimm
  0 siblings, 0 replies; 41+ messages in thread
From: Ed Grimm @ 2004-10-15  0:40 UTC (permalink / raw
  Cc: gentoo-dev

On Thu, 14 Oct 2004, Luke-Jr wrote:
> On Thursday 14 October 2004 4:35 pm, Roman Gaufman wrote:
>> On Thu, 14 Oct 2004 16:30:29 +0000, Luke-Jr <luke-jr@utopios.org> wrote:
>>> On Thursday 14 October 2004 2:49 pm, Ciaran McCreesh wrote:
>>>> On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@3e0.com> wrote:
>>>>| I've been watching this discussion as far as tree size, and i'm
>>>>| suprised nobody has brought the idea of on-demand downloading yet.

You should've watched closer.  I did not mention true on-demand
downloading, because I had seen in the archive the last time it was
discussed and dismissed.  I think that what I proposed would probably be
quicker than downloading the metadata; it doesn't deviate from the
concepts that have already made it into code so widely, and it hasn't
been rejected four times yet.

To reiterate, my idea was that you're probably most interested in the
packages you've already installed; so have an option to just sync
particular files, to complement the option of not syncing particular
files.
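
One way this could be sketched, in Python, is to turn the list of
installed packages into rsync filter rules. The function name and the
package list are hypothetical, and the "***" suffix (match a directory
and everything under it) may not be available in every rsync version, so
check the man page before relying on it. Shared directories such as
profiles and eclass would need their own rules as well.

```python
def rsync_filter_rules(installed):
    """Build rsync filter rules that sync only the given packages.

    installed is an iterable of "category/package" names. rsync applies
    the first matching rule, so the catch-all exclude goes last.
    """
    installed = sorted(set(installed))
    rules = []
    # Category directories must be included or rsync never descends.
    for category in sorted({name.split("/")[0] for name in installed}):
        rules.append("+ /%s/" % category)
    # Each installed package directory, with all of its contents.
    for name in installed:
        rules.append("+ /%s/***" % name)
    # Everything else is skipped.
    rules.append("- *")
    return rules


rules = rsync_filter_rules(["sys-libs/ncurses", "app-editors/vim"])
```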

>> Huh? -- name 1 binary distribution that does that? -- all of the ones
>> I tried fetch a list of available packages -- which is exactly what
>> the portage tree provides.
>
> Why would they need a list of available packages? Such a list is
> useful *only* to the user. apt-get, ipkg, and urpmi are going to know
> the package name beforehand.

How do these programs accomplish that?  They request a list of available
packages.

>>> On Thursday 14 October 2004 3:14 pm, Patrick Lauer wrote:
>>>> So you only have to rsync the dependency info. You save maybe 50%
>>>> traffic, but need some ebuild servers that will be hit by millions
>>>> of small requests for single ebuilds. No thanks.
>>>
>>> Actually, you don't even need to sync that. Simply download the
>>> primary ebuild, read the dep info, download the next one, etc. Most
>>> modern versions of file transfer protocols (HTTP and FTP, at least;
>>> don't know about rsync) support multiple transfers in a single
>>> connection.
>>
>> How would it know what ebuild to fetch exactly?  --- just think about
>> that for a second.

The metadata files list dependencies, keywords, a description.  It would
be technically feasible to do the dependency evaluation and ebuild
selection for the entire ebuild session just using metadata, and have a
single medium rsync connection per emerge run.  However, I couldn't code
it in Python, and I can't really explain it in English.

> ebuild doesn't deal with dependencies anyway, AFAIK. emerge would need
> the fetching functionality and could figure out the name based on
> (originally) the user's specification and (for deps) the DEPEND
> contents themselves.  Portage *already* needs to know what the name of
> the package is anyway.

ebuild files are the ultimate source of the dependency information.  The
point on your side is that they're not the sole repository of same;
someone saw fit to export that data into cache files, so one could use
those cache files for your goal.

> On Thursday 14 October 2004 4:41 pm, Georgi Georgiev wrote:
>> The part where the HTTP and FTP internals get handled by portage
>> internally, instead of being handed off to an external program like
>> wget, is the reason why the idea was dismissed as unworkable several
>> times before.
>
> Not really a good excuse. HTTP isn't an overly complicated protocol.
> Including the fetching functionality also has other advantages, such
> as one less program to depend on (and thus one fewer that can be
> broken and screw up Portage).

I think the part that probably intimidates them is where we're
processing a particular list of stuff, and then we decide we want to get
more stuff.  This basically requires explicit threading to pull it off
properly; it also requires a mindset that can deal with threading.  As
someone with such a mindset, I can confidently say, no one writes that
kind of code without good cause.  As an example, email servers could
definitely use this type of code, but most of them, including sendmail,
do not use it.


Luke, do you have the coding ability to write the changes that would be
required to get something like this to work?  I ask, because I think
what would be needed for you to convince anyone would be a proof of
concept, which made at most one connection to a mirror.  Until you have
such a thing, the prior ideas that have been discussed (which I did not
find, despite having found the previous discussion, as that was just
another "this has been discussed before" dismissal) are much firmer in
their minds than anything you are presenting, and I don't think you're
going to overcome that.

In any event, I think that you and I, and anyone else interested in
having this happen should get together off the list, outside of gentoo
discussion space.  The idea is only partially formed, and none of the
devs are going to be convinced by anything less than a full plan that
addresses all of their concerns, although I think a working prototype
would be better.  (You may think your idea is complete, but it could not
be coded simply on the ideas that have been discussed on this list over
the past couple of weeks.  What we need is something thorough enough to
both build the code and demonstrate to all that it won't make the
infrastructure hurt.  By the way, the only way to do that is to prove
that it will actually reduce infrastructure load.)

Ed

--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
                   ` (4 preceding siblings ...)
  2004-10-14 14:43 ` Mark Dierolf
@ 2004-10-15  0:57 ` Jason Huebel
  2004-10-15  7:09   ` George Shapovalov
  2004-10-15  7:20   ` Seemant Kulleen
  2004-10-18  9:16 ` Wolfram Schlich
  6 siblings, 2 replies; 41+ messages in thread
From: Jason Huebel @ 2004-10-15  0:57 UTC (permalink / raw
  To: gentoo-dev; +Cc: amd64, gentoo-amd64

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

On Tuesday 12 October 2004 4:37 pm, Ciaran McCreesh wrote:
> Also, I suggest we drop the amd64 keyword and just use x86 to save
> space, since we all know fine well that amd64 is just like x86 with a
> few extra bits stuck onto the end. Or rather, the start, since x86 gets
> its bytes backwards...

I'll be very succinct here... NO. Your assumption is incorrect. Period. Don't 
bring it up again.

-- 
Jason Huebel
Gentoo/amd64 Strategic Lead
Gentoo Developer Relations/Recruiter

GPG Public Key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x9BA9E230

"Do not weep; do not wax indignant. Understand."
Baruch Spinoza (1632 - 1677)

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-15  0:57 ` Jason Huebel
@ 2004-10-15  7:09   ` George Shapovalov
  2004-10-15  7:20   ` Seemant Kulleen
  1 sibling, 0 replies; 41+ messages in thread
From: George Shapovalov @ 2004-10-15  7:09 UTC (permalink / raw
  To: gentoo-dev

Come on Jason, this whole message was a prank, largely suggested by the rest 
of this thread :).

George

On Thursday 14 October 2004 17:57, Jason Huebel wrote:
> On Tuesday 12 October 2004 4:37 pm, Ciaran McCreesh wrote:
> > Also, I suggest we drop the amd64 keyword and just use x86 to save
> > space, since we all know fine well that amd64 is just like x86 with a
> > few extra bits stuck onto the end. Or rather, the start, since x86 gets
> > its bytes backwards...
>
> I'll be very succinct here... NO. Your assumption is incorrect. Period.
> Don't bring it up again.


--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-15  0:57 ` Jason Huebel
  2004-10-15  7:09   ` George Shapovalov
@ 2004-10-15  7:20   ` Seemant Kulleen
  2004-10-15  7:51     ` Dylan Carlson
  1 sibling, 1 reply; 41+ messages in thread
From: Seemant Kulleen @ 2004-10-15  7:20 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

On Thu, 2004-10-14 at 17:57, Jason Huebel wrote:
> On Tuesday 12 October 2004 4:37 pm, Ciaran McCreesh wrote:
> > Also, I suggest we drop the amd64 keyword and just use x86 to save
> > space, since we all know fine well that amd64 is just like x86 with a
> > few extra bits stuck onto the end. Or rather, the start, since x86 gets
> > its bytes backwards...
> 
> I'll be very succinct here... NO. Your assumption is incorrect. Period. Don't 
> bring it up again.

Actually, the trustees, in an overwhelming 12-3 vote late yesterday
afternoon (UTC), decided that his assumption IS correct, and that
measures should be taken immediately to rectify the cruftiness of an
"amd64" keyword.  Additionally, we then pre-empt AMD's lawyers from
issuing a cease-and-desist on their copywritten letters (those being
capital a, capital m, and capital d in a specific order).

Sorry, but that's just how it is...
-- 
Seemant Kulleen
http://dev.gentoo.org/~seemant

Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x3458780E
Key fingerprint = 23A9 7CB5 9BBB 4F8D 549B 6593 EDA2 65D8 3458 780E


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-15  7:20   ` Seemant Kulleen
@ 2004-10-15  7:51     ` Dylan Carlson
  2004-10-15  8:41       ` Seemant Kulleen
  0 siblings, 1 reply; 41+ messages in thread
From: Dylan Carlson @ 2004-10-15  7:51 UTC (permalink / raw
  To: gentoo-dev

On Fri October 15 2004 03:20, Seemant Kulleen wrote:
> Actually, the trustees, in an overwhelming 12-3 vote late yesterday
> afternoon (UTC), decided that his assumption IS correct, and that
> measures should be taken immediately to rectify the cruftiness of an
> "amd64" keyword.  Additionally, we then pre-empt AMD's lawyers from
> issuing a cease-and-desist on their copywritten letters (those being
> capital a, capital m, and capital d in a specific order).

ok.  So what will it be?  x86_64?  And when does this need to be changed 
by?

-- 
Dylan Carlson [absinthe@gentoo.org]
Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x708E165F


--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-15  7:51     ` Dylan Carlson
@ 2004-10-15  8:41       ` Seemant Kulleen
  2004-10-19 12:17         ` Paul de Vrieze
  0 siblings, 1 reply; 41+ messages in thread
From: Seemant Kulleen @ 2004-10-15  8:41 UTC (permalink / raw
  To: absinthe; +Cc: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2628 bytes --]

On Fri, 2004-10-15 at 00:51, Dylan Carlson wrote:
> On Fri October 15 2004 03:20, Seemant Kulleen wrote:
> > Actually, the trustees, in an overwhelming 12-3 vote late yesterday
> > afternoon (UTC), decided that his assumption IS correct, and that
> > measures should be taken immediately to rectify the cruftiness of an
> > "amd64" keyword.  Additionally, we then pre-empt AMD's lawyers from
> > issuing a cease-and-desist on their copywritten letters (those being
> > capital a, capital m, and capital d in a specific order).
> 
> ok.  So what will it be?  x86_64?  And when does this need to be changed 
> by?

Here, I'll just paste the relevant snippet of the trustees meeting on
irc (please note, I have obfuscated the names of the trustees to protect
their privacy and their vote):

<kliebe*>So, this amd64 thing is a pain up the backside.  I don't think
we should support a separate profile until and unless AMD gives each and
every one of us a box for free.  Who's with me?
<M*thod> Yar!
<Z*eN> Yar
<g3boojum>I could use a new computer, sure.
<schields>Can I get a "hell yeah"
<seem*nt>Well, ok, but what do we do about the users who already use
this amd64 profile thing?  And by the way, where *is* the profile?
<*feifer>it's in ${PORTDIR}/profiles
<seem*nt>cd PORTDIR/profiles
	-bash: cd: PORTDIR/profiles: No such file or directory
<seem*nt>anyway whatever, I vote yes
<*dmw*ters>ci*aranm has a good suggestion on the -dev mailing list.
<*lieber>I'm for it, let's force the amd64 devs to just obey the x86
keywords. After all, it's the same thing really, just with more
<M*th*d>bits on the end!
<csh*elds> Can I get another "hell yeah"
<p*uldv>There are some trustees who are absent.  What do we do about
their votes.
<seem*nt>Since there's 6 missing and 6 of us, we can all vote twice. 
Like, I'll vote for one, you for another, and so on.  We'll just imagine
what the other person voted for.
<M*thod> I vote yes and no.
<Zh*N>ditto
<g2b*ojum>I'll vote no and yes
<kli*ber>I'll vote yes and yes
<cshi*lds> I'll vote yes and abstain
<pa*ldv>no and abstain
<pf*ifer>yes and yes
<s*emant>yes and no

SNIP.

Note that at the end of the meeting we just voted the abstainers and the
nay-sayers OFF the board.  We've had it up to here with their dissent,
quite frankly, so no more trouble out of that lot. 

So, who's bidding on the bridge I'm selling?
         
-- 
Seemant Kulleen
http://dev.gentoo.org/~seemant

Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x3458780E
Key fingerprint = 23A9 7CB5 9BBB 4F8D 549B 6593 EDA2 65D8 3458 780E


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-13  1:14   ` Alan Frazier
@ 2004-10-17 21:45     ` Philippe Trottier
  0 siblings, 0 replies; 41+ messages in thread
From: Philippe Trottier @ 2004-10-17 21:45 UTC (permalink / raw
  To: gentoo-dev

Alan Frazier wrote:

>Luke-Jr <luke-jr@utopios.org> wrote:
>  
>
>>Ok, suggesting removal of dependency info has me convinced
>>this is a bad joke...
>>    
>>
>
>The Force is not strong with this one.
>  
>
For those with slow lines consider the snapshots ...

If I am not totally farbot...
16M @ 9.6k = 4.6 Hrs
16M @ 57.6K = 46 Min
16M @ 64K = 42Min
16M @ 128K = 21Min
16M @ 4M = 40 sec
16M @ 10M = 16 sec
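
The figures above can be reproduced with a short Python sketch, under
two assumptions that are inferred rather than stated: "16M" means
16,000,000 bytes, and each byte costs 10 bits on the wire (8 data bits
plus start and stop bits, as on a serial modem line).

```python
def transfer_seconds(size_bytes, line_bps, bits_per_byte=10):
    """Seconds needed to move size_bytes over a line of line_bps bits/s.

    bits_per_byte=10 is an assumption (8 data bits plus framing).
    """
    return size_bytes * bits_per_byte / line_bps


SNAPSHOT = 16 * 10**6  # assumed meaning of "16M", in bytes

hours_9k6 = transfer_seconds(SNAPSHOT, 9600) / 3600      # about 4.6 hours
minutes_57k6 = transfer_seconds(SNAPSHOT, 57600) / 60    # about 46 minutes
seconds_4m = transfer_seconds(SNAPSHOT, 4 * 10**6)       # 40 seconds
```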

I just did it in 4 seconds ... -burp-

tchiwam

--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
                   ` (5 preceding siblings ...)
  2004-10-15  0:57 ` Jason Huebel
@ 2004-10-18  9:16 ` Wolfram Schlich
  6 siblings, 0 replies; 41+ messages in thread
From: Wolfram Schlich @ 2004-10-18  9:16 UTC (permalink / raw
  To: gentoo-dev

* Ciaran McCreesh <ciaranm@gentoo.org> [2004-10-12 23:46]:
> It has come to my attention that, during recent weeks, a small number of
> users have been complaining recently about the size of the rsync tree.
> My august colleagues have proposed many ingenious solutions, but
> misfortunately they are all complicated and involve a lot of manual
> work. I believe the following small changes (which can mostly be
> automated) would prove of much larger benefit to the community for a
> vastly reduced cost.
> [jokes]

Hmm, is it April 1st yet?! :>
-- 
Wolfram Schlich

--
gentoo-dev@gentoo.org mailing list


* Re: [gentoo-dev] A few modest suggestions regarding tree size
  2004-10-15  8:41       ` Seemant Kulleen
@ 2004-10-19 12:17         ` Paul de Vrieze
  0 siblings, 0 replies; 41+ messages in thread
From: Paul de Vrieze @ 2004-10-19 12:17 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 369 bytes --]

On Friday 15 October 2004 10:41, Seemant Kulleen wrote:

> Here, I'll just paste the relevant snippet of the trustees meeting on
> irc (please note, I have obfuscated the names of the trustees to
> protect their privacy and their vote):

Good joke ;-)

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

end of thread, other threads:[~2004-10-19 12:17 UTC | newest]

Thread overview: 41+ messages
2004-10-12 21:37 [gentoo-dev] A few modest suggestions regarding tree size Ciaran McCreesh
2004-10-12 21:59 ` Roman Gaufman
2004-10-12 22:29   ` Jason Rhinelander
2004-10-12 22:29     ` Ciaran McCreesh
2004-10-12 21:41       ` Donnie Berkholz
2004-10-12 22:52         ` Jason Rhinelander
2004-10-12 22:58     ` Daniel Goller
2004-10-13  4:05     ` Nicholas Jones
2004-10-13  5:17       ` George Shapovalov
2004-10-13  5:49       ` Georgi Georgiev
2004-10-13  6:55         ` Robin H. Johnson
2004-10-13  6:14       ` Jason Rhinelander
2004-10-12 22:11 ` Luke-Jr
2004-10-12 22:28   ` Colin Kingsley
2004-10-13  1:14   ` Alan Frazier
2004-10-17 21:45     ` Philippe Trottier
2004-10-13  3:17   ` Ed Grimm
2004-10-12 22:31 ` Robin H. Johnson
2004-10-13  7:01   ` Spider
2004-10-13  7:31     ` Robin H. Johnson
2004-10-13  9:21       ` Spider
2004-10-14 14:07       ` Ned Ludd
2004-10-14 16:24       ` Luke-Jr
2004-10-14 18:07         ` Ned Ludd
2004-10-13  9:52 ` Paul de Vrieze
2004-10-14 14:43 ` Mark Dierolf
2004-10-14 14:49   ` Ciaran McCreesh
2004-10-14 15:17     ` Georgi Georgiev
2004-10-14 16:30     ` Luke-Jr
2004-10-14 16:41       ` Georgi Georgiev
     [not found]       ` <921ad39e04101409351dd72779@mail.gmail.com>
2004-10-14 16:51         ` Luke-Jr
2004-10-15  0:40           ` Ed Grimm
2004-10-14 17:05     ` Mark Dierolf
2004-10-14 15:14   ` Patrick Lauer
2004-10-15  0:57 ` Jason Huebel
2004-10-15  7:09   ` George Shapovalov
2004-10-15  7:20   ` Seemant Kulleen
2004-10-15  7:51     ` Dylan Carlson
2004-10-15  8:41       ` Seemant Kulleen
2004-10-19 12:17         ` Paul de Vrieze
2004-10-18  9:16 ` Wolfram Schlich
