* [gentoo-dev] Idea for the portage maintainers
From: Tom St Denis @ 2004-04-11 11:55 UTC
To: gentoo-dev

I think a cool function [which I didn't see in the latest portage release]
is a "snapshot" and restore set of functionality, e.g. you can snapshot
the current install set and later restore it (by adding/removing packages)
as required.

I know I could have used this functionality before. Like when I tried out
GNOME and decided on KDE... I still have GNOME code lying around. There are
probably a dozen other dependencies lying around from packages I tried
out...

Also, any plans to optimize the portage files? 80k small files amounts to
a huge waste of space.

Tom

--
gentoo-dev@gentoo.org mailing list
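
A minimal sketch of the snapshot/restore idea from the message above, assuming a snapshot is nothing more than a set of package names. The function name `plan_restore` and the package sets are hypothetical, and this is not how portage itself works; it only shows that "restore by adding/removing packages" comes down to two set differences.

```python
def plan_restore(snapshot, installed):
    """Return (to_install, to_remove) that would turn `installed` back into `snapshot`."""
    snapshot, installed = set(snapshot), set(installed)
    return sorted(snapshot - installed), sorted(installed - snapshot)


# Hypothetical package sets: KDE was installed when the snapshot was taken,
# GNOME was tried out afterwards.
snapshot = {"kde-base/kde", "app-editors/vim"}
installed = {"kde-base/kde", "app-editors/vim", "gnome-base/gnome"}
to_install, to_remove = plan_restore(snapshot, installed)
print(to_install, to_remove)
```

A real tool would also have to deal with versions, USE flags and dependencies pulled in along the way, which this sketch ignores.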
* Re: [gentoo-dev] Idea for the portage maintainers
From: Alexander Gretencord @ 2004-04-12 10:45 UTC
To: gentoo-dev

On Sunday 11 April 2004 13:55, Tom St Denis wrote:
> I think a cool function [which I didn't see in the latest portage release]
> is a "snapshot" and restore set of functionality. e.g. you can snapshot
> the current install set and later restore (by adding/removing packages) as
> required.

What exactly do you mean by that? What confuses me is the "by
adding/removing packages" part. If you just want a list of all installed
packages, you can already get it. What I'd understand by snapshotting is
really packaging a set of packages and their deps. Kind of a '-b', but
after merging.

> Also any plans to optimize the portage files? 80k small files amounts to
> huge waste of space.

Depends on your filesystem. With certain filesystems you could adjust the
blocksize for /usr/portage/ to a smaller value (with a larger value for
/usr/portage/distfiles, of course) or use reiserfs with tail packing on.

Alex
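
The effect of block size on small files can be estimated with a few lines of arithmetic. The sketch below assumes 80,000 files of exactly 1 KiB each and compares a 4 KiB block size with a 1 KiB one; both the file sizes and the block sizes are assumptions chosen for illustration, and inode and directory overhead are ignored.

```python
def slack_bytes(file_sizes, block_size):
    """Bytes lost to partially filled final blocks, ignoring inode and directory overhead."""
    return sum(-size % block_size for size in file_sizes)


# Assumed workload: 80,000 files of 1 KiB each.
sizes = [1024] * 80_000
for block in (4096, 1024):
    print(block, slack_bytes(sizes, block))
```

Under these assumptions the 4 KiB case loses 3 KiB per file, which is the same order of magnitude as the waste complained about in this thread.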
* Re: [gentoo-dev] Idea for the portage maintainers
From: Tom St Denis @ 2004-04-12 12:03 UTC
To: gentoo-dev

On April 12, 2004 06:45 am, Alexander Gretencord wrote:
> On Sunday 11 April 2004 13:55, Tom St Denis wrote:
> > I think a cool function [which I didn't see in the latest portage
> > release] is a "snapshot" and restore set of functionality. e.g. you can
> > snapshot the current install set and later restore (by adding/removing
> > packages) as required.
>
> What exactly do you mean by that? What irritates me is that "by
> adding/removing packages". If you just want a list of all installed
> packages you can already get it. What I'd understand by snapshotting is
> really packaging a set of packages and their deps. Kind of a '-b' but after
> merging.

Well, ideally a command that simply emits a list of installed packages is
what I'm talking about. But specifically, it emits the list [and say GZIPs
it at the same time] into a db of "restore points". So as another poster
said I could do, say,

emerge restore "11/04/04"

if I installed something today that I didn't like. The trick is to make
this painless for the user so the user won't scream in horror and go
install Windows.

> > Also any plans to optimize the portage files? 80k small files amounts to
> > huge waste of space.
>
> Depends on your filesystem. With certain filesystems you could adjust the
> blocksize for /usr/portage/ to a smaller value (with a larger value
> for /usr/portage/distfiles of course) or use reiserfs with tail packing on.

Oh, ok, so I'll just format my disk and reinstall Gentoo from scratch so
that I can not waste 200M of space on 80k small files.

That's not really user friendly. Could have done a JAR-like setup for each
dir of the tree, e.g. all of app-text be one huge ZIP file [with no
compression]. Such a setup might be a little slower to add/remove files but
would waste less space.

The idea would make a little sense though in practice. When I do "emerge
sync", instead of fetching 1000s of small files I just check the timestamp
on the directory zips and download them wholesale. [Ok so maybe compression
makes sense here]. That way as a user I don't have to worry about
inserting/deleting files from a zip [which would take a while]; only the
server has to do it. In fact the server could still use the "many files"
approach and just zip on the fly when a user syncs.

I know that would make things a bit more complicated since mirrors update
via "sync" as well. I guess you could just have two types of portage trees,
e.g. "packaged" and "straight" or something like that. End users would use
the "packaged" type [e.g. zip per category] and mirrors would use straight
[e.g. all 80k files].

Anyways, I'm just throwing out ideas here. Using the zip approach makes
sense for end users. First, it makes for faster syncing [fewer small files
means fewer metadata commands] and wastes much less space [and would be
faster since the kernel could cache it quicker!].

Tom
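
The "check the timestamp on the directory zips" step described above could look roughly like the sketch below. It assumes the client has a mapping from category name to archive modification time for both its local copy and the server; the function name `stale_categories` and the timestamps are made up for illustration.

```python
def stale_categories(local_mtimes, remote_mtimes):
    """Categories whose remote archive is newer than, or missing from, the local copy."""
    return sorted(
        name
        for name, remote_time in remote_mtimes.items()
        if local_mtimes.get(name, float("-inf")) < remote_time
    )


# Hypothetical timestamps (seconds since the epoch) for per-category archives.
local = {"app-text": 1081700000, "sys-apps": 1081700000}
remote = {"app-text": 1081770000, "sys-apps": 1081700000, "net-misc": 1081760000}
print(stale_categories(local, remote))
```

Only the categories returned here would be downloaded wholesale; categories removed on the server would need separate handling.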
* Re: [gentoo-dev] Idea for the portage maintainers
From: Georgi Georgiev @ 2004-04-12 12:23 UTC
To: gentoo-dev

maillog: 12/04/2004-08:03:13(-0400): Tom St Denis types
> Oh, ok so I'll just format my disk, reinstall Gentoo from scratch so that I
> can not waste 200M of space on 80k small files.

You can always use a loopback device with a filesystem of your choice.
http://forums.gentoo.org/viewtopic.php?t=68215

> That's not really user friendly. Could have done a JAR like setup for each
> dir of the tree. e.g. all of app-text be one huge ZIP file [with no
> compression]. Such a setup might be a little slower to add/remove files but
> would waste less space.
>
> The idea would make a little sense though in practice. When I do "emerge
> sync" instead of fetching 1000s of small files I just check the timestamp on
> the directory zips and download them wholesale. [Ok so maybe compression
> makes sense here].

How is this method faster or in any way better than rsync?

--
(   Georgi Georgiev   (  I don't need to compromise my principles,  (
 )   chutz@gg3.net     )  because they don't have the slightest      )
(  +81(90)6266-1163   (  bearing on what happens to me anyway. --   (
 ) ------------------- )  Calvin                                     )
* Re: [gentoo-dev] Idea for the portage maintainers
From: Tom St Denis @ 2004-04-12 12:36 UTC
To: gentoo-dev

On April 12, 2004 08:23 am, Georgi Georgiev wrote:
> maillog: 12/04/2004-08:03:13(-0400): Tom St Denis types
>
> > Oh, ok so I'll just format my disk, reinstall Gentoo from scratch so that
> > I can not waste 200M of space on 80k small files.
>
> You can always use a loopback device with a filesystem of your choice.
> http://forums.gentoo.org/viewtopic.php?t=68215

That's a potential solution but a bit out of the way for the user, don't
you think? Isn't the point of quality software to attract users simply by
having merit?

> > That's not really user friendly. Could have done a JAR like setup for
> > each dir of the tree. e.g. all of app-text be one huge ZIP file [with no
> > compression]. Such a setup might be a little slower to add/remove files
> > but would waste less space.
> >
> > The idea would make a little sense though in practice. When I do "emerge
> > sync" instead of fetching 1000s of small files I just check the timestamp
> > on the directory zips and download them wholesale. [Ok so maybe
> > compression makes sense here].
>
> How is this method faster or in any way better than rsync?

Less metadata? If I want to download 1000 files off your site I have to say

+GET /file1
-HTTP/1.1 200 OK
... data
+GET /file2
-HTTP/1.1 200 OK
... data
+GET /file3
-HTTP/1.1 200 OK
... data
ETC

The overhead translates on both sides too. E.g. my FS now has to find room
for and enter in 1000s of files. The server side has to locate 1000s of
files.

Also, the portage files are not compressed. If you take an entire directory
[again say app-text] and compress it you save bandwidth. A quick check of
app-text on my box [updated last night] gives an 850K zip file [290K
tar.bz2] compared to 2.7M of raw data. Obviously sending the entire zip
would be wasting more bandwidth when only small changes occur.

Specifically, my point isn't to use zip but to find a way to cluster files
[automatically]. A loopback on the client side would work but it shouldn't
be manual.

Anyways, I'm not trying to rock the boat here. portage for the most part
does work. It's just not very scalable [I recall a sync not taking so f'ing
long to update the cache, or a "world" update being lightning fast...]

Tom
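
One likely reason for the gap between the zip and tar.bz2 sizes quoted above is that a ZIP archive compresses every member on its own, while a compressed tarball compresses one continuous stream and can reuse redundancy across files. The sketch below illustrates that effect with zlib on synthetic, ebuild-like data; the file contents are invented, zlib is used for both cases to isolate the effect, and bzip2 is a different algorithm, so the absolute numbers will not match the ones in the message.

```python
import zlib

# Synthetic stand-ins for small, similar ebuild-like files (not real tree data).
files = [
    f'DESCRIPTION="example package {i}"\nLICENSE="GPL-2"\nSLOT="0"\nKEYWORDS="x86"\n'.encode()
    for i in range(200)
]

raw = sum(len(data) for data in files)
# ZIP-style: every member is compressed on its own.
per_file = sum(len(zlib.compress(data, 9)) for data in files)
# Compressed-tarball style: one compressed stream over the concatenated members.
solid = len(zlib.compress(b"".join(files), 9))

print(raw, per_file, solid)
```

The trade-off is random access: per-member compression lets a reader extract one file without touching the rest, which a single compressed stream does not.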
* Re: [gentoo-dev] Idea for the portage maintainers
From: N. Owen Gunden @ 2004-04-12 14:18 UTC
To: gentoo-dev

On Mon, Apr 12, 2004 at 08:36:15AM -0400, Tom St Denis wrote:
> > How is this method faster or in any way better than rsync?
>
> Less metadata? If I want to download 1000 files off your site I have to say
>
> +GET /file1
> -HTTP/1.1 200 OK
> ... data
> +GET /file2
> -HTTP/1.1 200 OK
> ... data
> +GET /file3
> -HTTP/1.1 200 OK
> ... data
> ETC

Hmm... it looks like you need to do a little research into how rsync
transfers files. It's much more efficient than what you're thinking. And it
certainly doesn't use HTTP.

- O
* Re: [gentoo-dev] Idea for the portage maintainers
From: Troy Dack @ 2004-04-12 15:12 UTC
To: gentoo-dev

On Mon, 2004-04-12 at 22:36, Tom St Denis wrote:
> On April 12, 2004 08:23 am, Georgi Georgiev wrote:
> > maillog: 12/04/2004-08:03:13(-0400): Tom St Denis types
> >
> > > Oh, ok so I'll just format my disk, reinstall Gentoo from scratch so that
> > > I can not waste 200M of space on 80k small files.
> >
> > You can always use a loopback device with a filesystem of your choice.
> > http://forums.gentoo.org/viewtopic.php?t=68215
>
> That's a potential solution but a bit out of the way for the user don't you
> think? Isn't the point of quality software to attract users simply by having
> merit?
>
> > > That's not really user friendly. Could have done a JAR like setup for
> > > each dir of the tree. e.g. all of app-text be one huge ZIP file [with no
> > > compression]. Such a setup might be a little slower to add/remove files
> > > but would waste less space.
> > >
> > > The idea would make a little sense though in practice. When I do "emerge
> > > sync" instead of fetching 1000s of small files I just check the timestamp
> > > on the directory zips and download them wholesale. [Ok so maybe
> > > compression makes sense here].
> >
> > How is this method faster or in any way better than rsync?
>
> Less metadata? If I want to download 1000 files off your site I have to say
>
> +GET /file1
> -HTTP/1.1 200 OK
> ... data
> +GET /file2
> -HTTP/1.1 200 OK
> ... data
> +GET /file3
> -HTTP/1.1 200 OK
> ... data
> ETC
>
> The overhead translates on both sides too. E.g. My FS now has to find room
> for and enter in 1000s of files. The server side has to locate 1000s of
> files.
>
> Also the portage files are not compressed. If you take an entire directory
> [again say app-text] and compress it you save bandwidth. A quick check of
> app-text on my box [updated last night] gives a 850K zip file [290K tar.bz2]
> compared to 2.7M of raw data. Obviously sending the entire zip would be
> wasting more bandwidth when only small changes occur.

Rsync compresses the information it sends. Additionally (and most
importantly), rsync only sends the differences, not the whole file, so it
is even more efficient. Visit rsync.samba.org and have a read about how it
all works.

Another point against a monolithic zip containing all the ebuilds (or even
per-directory zips) is the performance hit that slow machines would take.
Not everybody runs gentoo on a 2GHz-plus machine (e.g. my little PII-400 in
the corner).

--
Troy Dack
http://linux.tkdack.com
<troy@tkdack.com>
http://webportage.sf.net
Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x4D90BE3C
Key fingerprint = 1F3D 6C15 16AA 09D5 0C96 92E5 FD89 16F9 4D90 BE3C
* Re: [gentoo-dev] Idea for the portage maintainers
From: Jason Stubbs @ 2004-04-12 15:15 UTC
To: gentoo-dev

On Tuesday 13 April 2004 00:12, Troy Dack wrote:
> Rsync compresses the information it sends, additionally (and most
> importantly) rsync only sends the differences, not the whole file, so it
> is even more efficient. Visit rsync.samba.org and have a read about how
> it all works.

Rsync does send only differences by default, but emerge disables this
behaviour simply because most of the files are about 1kb in size. The CPU
overhead on both client and server is not worth the relatively small gain
of doing it.

Regards,
Jason Stubbs
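
For readers wondering where the CPU cost of "sending only differences" comes from: the rsync algorithm slides a window over the file and computes a cheap rolling checksum at every byte offset to find blocks the other side already has. The sketch below follows the weak checksum as described in the rsync technical report, to the best of my understanding; the real implementation differs in details and pairs it with a strong hash, and the names `weak_checksum` and `roll` are made up here.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block):
    """Return (a, b): a is the byte sum, b weights earlier bytes more heavily."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a, b, n, outgoing, incoming):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - outgoing + incoming) % M
    b = (b - n * outgoing + a) % M
    return a, b


data = b"emerge sync fetches the portage tree over rsync"
n = 8
a, b = weak_checksum(data[:n])
for start in range(1, len(data) - n + 1):
    a, b = roll(a, b, n, data[start - 1], data[start + n - 1])
    # The rolled value must match a checksum recomputed from scratch.
    assert (a, b) == weak_checksum(data[start:start + n])
```

For a file of about 1 KB there are few blocks to match in the first place, which is consistent with the point above that the bookkeeping can cost more than simply resending the file.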
* Re: [gentoo-dev] Idea for the portage maintainers
From: Andrew Gaffney @ 2004-04-12 16:22 UTC
To: Troy Dack; +Cc: gentoo-dev

Troy Dack wrote:
> Another point against a monolithic zip containing all the ebuilds (or
> even per directory zips) is the performance hit that slow machines would
> take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> PII-400 in the corner)

Or my little P233 Thinkpad...

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548
* Re: [gentoo-dev] Idea for the portage maintainers
From: Todd Berman @ 2004-04-12 16:23 UTC
To: Andrew Gaffney; +Cc: Troy Dack, gentoo-dev

On Mon, 2004-12-04 at 11:22 -0500, Andrew Gaffney wrote:
> Troy Dack wrote:
> > Another point against a monolithic zip containing all the ebuilds (or
> > even per directory zips) is the performance hit that slow machines would
> > take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> > PII-400 in the corner)
>
> Or my little P233 Thinkpad...

And with the current setup of writing thousands of 1K files that little
P233 Thinkpad really flies, I bet...

--Todd
* Re: [gentoo-dev] Idea for the portage maintainers
From: Andrew Gaffney @ 2004-04-12 16:59 UTC
To: gentoo-dev

Todd Berman wrote:
> On Mon, 2004-12-04 at 11:22 -0500, Andrew Gaffney wrote:
> > Troy Dack wrote:
> > > Another point against a monolithic zip containing all the ebuilds (or
> > > even per directory zips) is the performance hit that slow machines would
> > > take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> > > PII-400 in the corner)
> >
> > Or my little P233 Thinkpad...
>
> And with the current setup of writing thousands of 1K files that little
> p233 thinkpad really flies, I bet...

I'm not sure if that was supposed to be sarcastic, but yes, it does fly. It
only takes slightly longer to sync than my Athlon 1.3GHz desktop. The only
part that takes forever is updating the portage cache. That's why I just
use an NFS-shared portage tree from my desktop machine now.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548
* Re: [gentoo-dev] Idea for the portage maintainers
From: Todd Berman @ 2004-04-12 17:03 UTC
To: Andrew Gaffney; +Cc: gentoo-dev

On Mon, 2004-12-04 at 11:59 -0500, Andrew Gaffney wrote:
> Todd Berman wrote:
> > On Mon, 2004-12-04 at 11:22 -0500, Andrew Gaffney wrote:
> > > Troy Dack wrote:
> > > > Another point against a monolithic zip containing all the ebuilds (or
> > > > even per directory zips) is the performance hit that slow machines would
> > > > take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> > > > PII-400 in the corner)
> > >
> > > Or my little P233 Thinkpad...
> >
> > And with the current setup of writing thousands of 1K files that little
> > p233 thinkpad really flies, I bet...
>
> I'm not sure if that was supposed to be sarcastic, but yes, it does fly. It only takes
> slightly longer to sync than my Athlon 1.3GHz desktop. The only part that takes forever is
> updating the portage cache. That's why I just use a NFS shared portage tree from my
> desktop machine now.

In a way it was and in a way it wasn't. I honestly don't understand how a
compression-less zip file would be any slower than the current setup.

With compression I could understand, but without it I don't think any speed
difference would be noticeable. However, even with compression the
operation should be fairly fast. zip is not like a tar.gz or tar.bz2, i.e.
you can read a file out of it and search through it fairly fast, and you
don't have to uncompress the entire archive to get a single file.

--Todd
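
The random-access property mentioned above can be shown with Python's standard `zipfile` module. The sketch builds an uncompressed (stored) archive in memory and then reads a single member by name; the member names are invented and only loosely resemble a portage category layout.

```python
import io
import zipfile

# Build a small archive in memory; the member names are made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    for i in range(1000):
        archive.writestr(f"app-text/pkg{i}/pkg{i}-1.0.ebuild", f'DESCRIPTION="package {i}"\n')

# Reading one member: the central directory maps the name to an offset,
# so the data of the other members is not read.
with zipfile.ZipFile(buffer) as archive:
    text = archive.read("app-text/pkg42/pkg42-1.0.ebuild").decode()

print(text)
```

Opening the archive still parses the central directory, which lists every member, so the cost of a lookup is not zero; whether that matters on a slow machine is exactly what the thread is arguing about.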
* Re: [gentoo-dev] Idea for the portage maintainers
From: Andrew Gaffney @ 2004-04-12 17:17 UTC
To: Todd Berman; +Cc: gentoo-dev

Todd Berman wrote:
> On Mon, 2004-12-04 at 11:59 -0500, Andrew Gaffney wrote:
> > Todd Berman wrote:
> > > On Mon, 2004-12-04 at 11:22 -0500, Andrew Gaffney wrote:
> > > > Troy Dack wrote:
> > > > > Another point against a monolithic zip containing all the ebuilds (or
> > > > > even per directory zips) is the performance hit that slow machines would
> > > > > take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> > > > > PII-400 in the corner)
> > > >
> > > > Or my little P233 Thinkpad...
> > >
> > > And with the current setup of writing thousands of 1K files that little
> > > p233 thinkpad really flies, I bet...
> >
> > I'm not sure if that was supposed to be sarcastic, but yes, it does fly. It only takes
> > slightly longer to sync than my Athlon 1.3GHz desktop. The only part that takes forever is
> > updating the portage cache. That's why I just use a NFS shared portage tree from my
> > desktop machine now.
>
> In a way it was and in a way it wasn't. I honestly don't understand how
> you can explain to me that a compression-less zip file be any slower
> than the current setup.
>
> With compression I could understand, but without I don't think any speed
> difference would be noticeable. However, even with compression the
> operation should be fairly fast. zip is not like a tar.gz or tar.bz2,
> ie, you can read a file out of it and search through it fairly fast, and
> you don't have to uncompress the entire archive to get a single file.

Even without compression, there is still a little bit of overhead with
having ebuilds and such contained in ZIP files. The overhead comes into
play both during sync and emerge. It wouldn't be noticeable on a fast
machine (e.g. my Athlon 1.3GHz desktop) but it would make an operation like
'emerge -uDpv world' take quite a bit longer on my Thinkpad.

As someone else pointed out, the ZIP files wouldn't make syncs faster
either. Rsync transfers only the differences in files and not the entire
app-text directory or whatever category you're working with.

As another person pointed out, it would be more difficult to modify ebuilds
in the tree by hand when they're contained in ZIP files.

It was a good idea, but I don't think it is practical.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548
* Re: [gentoo-dev] Idea for the portage maintainers
From: Todd Berman @ 2004-04-12 17:39 UTC
To: Andrew Gaffney; +Cc: gentoo-dev

On Mon, 2004-12-04 at 12:17 -0500, Andrew Gaffney wrote:
> Todd Berman wrote:
> > On Mon, 2004-12-04 at 11:59 -0500, Andrew Gaffney wrote:
> > > Todd Berman wrote:
> > > > On Mon, 2004-12-04 at 11:22 -0500, Andrew Gaffney wrote:
> > > > > Troy Dack wrote:
> > > > > > Another point against a monolithic zip containing all the ebuilds (or
> > > > > > even per directory zips) is the performance hit that slow machines would
> > > > > > take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> > > > > > PII-400 in the corner)
> > > > >
> > > > > Or my little P233 Thinkpad...
> > > >
> > > > And with the current setup of writing thousands of 1K files that little
> > > > p233 thinkpad really flies, I bet...
> > >
> > > I'm not sure if that was supposed to be sarcastic, but yes, it does fly. It only takes
> > > slightly longer to sync than my Athlon 1.3GHz desktop. The only part that takes forever is
> > > updating the portage cache. That's why I just use a NFS shared portage tree from my
> > > desktop machine now.
> >
> > In a way it was and in a way it wasn't. I honestly don't understand how
> > you can explain to me that a compression-less zip file be any slower
> > than the current setup.
> >
> > With compression I could understand, but without I don't think any speed
> > difference would be noticeable. However, even with compression the
> > operation should be fairly fast. zip is not like a tar.gz or tar.bz2,
> > ie, you can read a file out of it and search through it fairly fast, and
> > you don't have to uncompress the entire archive to get a single file.
>
> Even without compression, there is still a little bit of overhead with having ebuilds and
> such contained in ZIP files. The overhead comes into play both during sync and emerge. It
> wouldn't be noticeable on a fast machine (e.g. my Athlon 1.3GHz desktop) but it would make
> an operation like 'emerge -uDpv world' take quite a bit longer on my Thinkpad.

This is going to be a slow operation on that Thinkpad regardless.

> As someone else pointed out, the ZIP files wouldn't make syncs faster either. Rsync
> transfers only the differences in files and not the entire app-text directory or whatever
> category you're working with.

Portage does not use this, as it takes more CPU on the server and client
side to figure out what changed in 80 thousand 1K files than it does to
just transfer the files across.

> As another person pointed out, it would be more difficult to modify ebuilds in the tree by
> hand when they're contained in ZIP files.

This could easily be dealt with. The cvs->rsync process could create zip
files, and portage would use the zip stuff by default for /usr/portage/ and
use the old directory structure code for your overlays.

> It was a good idea, but I don't think it is practical.

It is absolutely practical, and I think it would cut down on the time taken
to sync (because now you could turn compression back on, and actually
transfer just changes, because you wouldn't be syncing 80 thousand files).

It might potentially have small speed losses when running emerge -vuDp
world, but as that process takes a long time regardless, I don't think it
would be noticeable. The time it takes to read a single file out of an
uncompressed zip is not noticeably longer than the time it takes to read a
single file off the hard drive.

Note: as of writing this, the portage tree takes up 333MB of space on my
hard drive and a compression-less zipfile of the tree takes up 74MB.

To me, the potential savings are incredible.

--Todd
* Re: [gentoo-dev] Idea for the portage maintainers
From: Jason Stubbs @ 2004-04-13 1:04 UTC
To: gentoo-dev

On Tuesday 13 April 2004 02:39, Todd Berman wrote:
> I think would cut down on the time taken to sync (because now you could turn
> compression back on, and actually transfer just changes because you wouldn't
> be syncing 80 thousand files).

If this is referring to my other comment, I never said that compression is
disabled. It is not, and it is definitely a real benefit. I find that the
data is compressed at approximately 5:1 on average.

Regards,
Jason Stubbs
* Re: [gentoo-dev] Idea for the portage maintainers
From: Todd Berman @ 2004-04-13 3:35 UTC
To: Jason Stubbs; +Cc: gentoo-dev

On Tue, 2004-13-04 at 10:04 +0900, Jason Stubbs wrote:
> On Tuesday 13 April 2004 02:39, Todd Berman wrote:
> > I think would cut down on the time taken to sync (because now you could turn
> > compression back on, and actually transfer just changes because you wouldn't
> > be syncing 80 thousand files).
>
> If this is referring to my other comment, I never said that compression is
> disabled. It is not and is definitely a real benefit. I find that the data is
> compressed at approximately 5:1 on average.
>
> Regards,
> Jason Stubbs

Yeah, sorry, differences is what I meant.

--Todd
* Re: [gentoo-dev] Idea for the portage maintainers - personal experiences with a .zip-db
From: Karl Trygve Kalleberg @ 2004-04-13 15:39 UTC
To: Todd Berman; +Cc: gentoo-dev

On Mon, Apr 12, 2004 at 01:03:18PM -0400, Todd Berman wrote:
> With compression I could understand, but without I don't think any speed
> difference would be noticeable. However, even with compression the
> operation should be fairly fast. zip is not like a tar.gz or tar.bz2,
> ie, you can read a file out of it and search through it fairly fast, and
> you don't have to uncompress the entire archive to get a single file.

About 1.5 years ago I hacked portage to store /usr/portage (portdir) in one
.zip file.

At the time, du -h reported ~116MB for the portdir tree. Zip, compressed
with -9, resulted in a 16MB file.

I did two principal tests:

1) emerge --emptytree -up world with /var/cache/edb/dep, and
2) emerge --emptytree -up world without /var/cache/edb/dep

I also did a few emerges of selected packages.

All testing was done on a 100MHz K5 with 96MB of RAM.

There was no noticeable speed drop in case (1) compared to a non-zipped
version, most probably thanks to the cache. Case (2) was of course
noticeably slower.

In some cases, the >100MB space gain may very well be worth it, e.g. on
firewalls or older systems (where we want to use a distcc).

Actual emerges were not a problem either. Uncompressing the relevant ebuild
file and its auxiliary files (the contents of files/) is too quick to be
measurable, even on this very slow test box, since .zip files are indexed.

I didn't get so far as to implement an alternative sync mechanism;
obviously rsync isn't ideal for this sort of stuff. Perhaps xdeltas would
be better?

I did have a scheme in mind, however, based on regular incremental backups,
that would ensure downloading only the ebuilds that have changed since last
time (and the new ones), called a changeset.

Each changeset would be compressed in a .zip file and served statically
over http, thus allowing one server to serve thousands instead of tens of
users at any one time (rsync requires a lot of server CPU power compared to
static file serving over http).

However, as this work was geared towards gentoo-embedded, which is now a
bit dormant, and I haven't had any time to keep it current against portage,
it never fell into use :)

Kind regards,

Karl T
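
A rough sketch of what building such a changeset might look like, assuming "changed since last time" is decided by file modification time. The function `build_changeset`, the file names and the timestamps are hypothetical; this is not the code described in the message, which was never published in this thread.

```python
import io
import os
import tempfile
import zipfile


def build_changeset(tree_root, since):
    """Zip every file under tree_root whose mtime is later than `since`; return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for directory, _, names in os.walk(tree_root):
            for name in sorted(names):
                path = os.path.join(directory, name)
                if os.stat(path).st_mtime > since:
                    archive.write(path, os.path.relpath(path, tree_root))
    return buffer.getvalue()


# Demo on a throwaway tree with one old and one recently changed file.
with tempfile.TemporaryDirectory() as root:
    for name, mtime in (("old.ebuild", 1_000_000_000), ("new.ebuild", 1_100_000_000)):
        path = os.path.join(root, name)
        with open(path, "w") as handle:
            handle.write("SLOT=0\n")
        os.utime(path, (mtime, mtime))
    changeset = build_changeset(root, since=1_050_000_000)

with zipfile.ZipFile(io.BytesIO(changeset)) as archive:
    changed = archive.namelist()
print(changed)
```

The resulting bytes could be written out once and served as a static file. Deleted ebuilds are not represented here; a real changeset format would need a way to list removals as well.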
* Re: [gentoo-dev] Idea for the portage maintainers
  2004-04-12 16:59 ` Andrew Gaffney
  2004-04-12 17:03 ` Todd Berman
@ 2004-04-12 17:09 ` Tom St Denis
  2004-04-12 17:19 ` Norberto Bensa
  1 sibling, 1 reply; 23+ messages in thread
From: Tom St Denis @ 2004-04-12 17:09 UTC
To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On April 12, 2004 12:59 pm, Andrew Gaffney wrote:
> Todd Berman wrote:
> > On Mon, 2004-12-04 at 11:22 -0500, Andrew Gaffney wrote:
> >>Troy Dack wrote:
> >>>Another point against a monolithic zip containing all the ebuilds (or
> >>>even per directory zips) is the performance hit that slow machines would
> >>>take, not everybody runs gentoo on a 2GHz plus machine (eg: my little
> >>>PII-400 in the corner)
> >>
> >>Or my little P233 Thinkpad...
> >
> > And with the current setup of writing thousands of 1K files that little
> > p233 thinkpad really flys i bet...
>
> I'm not sure if that was supposed to be sarcastic, but yes, it does fly. It
> only takes slightly longer to sync than my Athlon 1.3GHz desktop. The only
> part that takes forever is updating the portage cache. That's why I just
> use a NFS shared portage tree from my desktop machine now.

That isn't a solution though.

"Gentoo, the distributed home computer operating system."
"Gentoo, takes 2379MIPS to sync within a week."

etc...

I mean if it's slow now what happens in [say] 2008 when portage has 160k
files or something... I don't know exactly how portage works but from
when I started using gentoo [~60k files in portage] to now [~80k files in
portage] it's definitely not just a linear amount slower.

While I'm at it what of security? I mean how do I know the files on the
mirror I get them from are from the CVS?

Tom
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAes1FsP+tEsHHY0ARApG8AKCQhFU2mkTIHo8LIB8pkZzINmRQaACcDuC9
c8XWfm8lMvKVXobONXSoWLE=
=D9Cy
-----END PGP SIGNATURE-----
* Re: [gentoo-dev] Idea for the portage maintainers
  2004-04-12 17:09 ` [gentoo-dev] Idea for the portage maintainers Tom St Denis
@ 2004-04-12 17:19 ` Norberto Bensa
  2004-04-12 17:21 ` Tom St Denis
  0 siblings, 1 reply; 23+ messages in thread
From: Norberto Bensa @ 2004-04-12 17:19 UTC
To: gentoo-dev

Tom St Denis wrote:
> I started using gentoo [~60k files in portage] to now [~80k files in
> portage] it's definitely not just a linear amount slower.

rsync isn't slow. "emerge --sync" is slow in the portage cache update
stage (I really want to know what the portage cache update does :-/ )

Regards,
Norberto
* Re: [gentoo-dev] Idea for the portage maintainers
  2004-04-12 17:19 ` Norberto Bensa
@ 2004-04-12 17:21 ` Tom St Denis
  0 siblings, 0 replies; 23+ messages in thread
From: Tom St Denis @ 2004-04-12 17:21 UTC
To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On April 12, 2004 01:19 pm, Norberto Bensa wrote:
> Tom St Denis wrote:
> > I started using gentoo [~60k files in portage] to now [~80k files in
> > portage] it's definitely not just a linear amount slower.
>
> rsync isn't slow. "emerge --sync" is slow in the portage cache updates
> stage (I really want to know what portage cache update does :-/ )

Syncs can be slow too. Sometimes they zoom but sometimes it will say
things like

200 files ...... and twirl..... and twirl......and twirl......

Point is portage is usable and works decently. Updating is a bitch and
occasionally a "world" update takes way too long [to get the package
list].

It's a matter of being scalable. Portage isn't.

Tom
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAetA0sP+tEsHHY0ARAuJIAJ9jfAozHPtz2o49wiNQdni1FYU5FgCZAefS
+qb63BEv19W73J6HN6v06WA=
=CRvw
-----END PGP SIGNATURE-----
* Re: [gentoo-dev] Idea for the portage maintainers
  2004-04-12 12:03 ` Tom St Denis
  2004-04-12 12:23 ` Georgi Georgiev
@ 2004-04-13 12:18 ` Chris Bainbridge
  2004-04-13 16:12 ` Chris Bainbridge
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Bainbridge @ 2004-04-13 12:18 UTC
To: gentoo-dev

On Monday 12 April 2004 13:03, Tom St Denis wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Oh, ok so I'll just format my disk, reinstall Gentoo from scratch so that I
> can not waste 200M of space on 80k small files.
>
> That's not really user friendly. Could have done a JAR like setup for each
> dir of the tree. e.g. all of app-text be one huge ZIP file [with no
> compression]. Such a setup might be a little slower to add/remove files
> but would waste less space.
>
> The idea would make a little sense though in practice. When I do "emerge
> sync" instead of fetching 1000s of small files I just check the timestamp
> on the directory zips and download them wholesale. [Ok so maybe
> compression makes sense here].

I once posted a test of using rsync with a single large file versus lots
of small ones on the gentoo forums
(http://forums.gentoo.org/viewtopic.php?t=10108). I was more concerned
with bandwidth than speed, but I guess they're connected. Using one big
file gave a 35% improvement over lots of small files. The problem then
was how to generate one big file for syncing. In the end I got bored and
decided it was rsync's problem for not being as efficient with the small
files ;-)

The test of rsyncing uncompressed distfile sources rather than
downloading them gave an 88% improvement in bandwidth. The problem here
is that we would need lots of disk space for the uncompressed files, so
it's a tradeoff: if your bandwidth is limited or expensive and
computation and disk space are cheap (eg. dialup), then it seems like a
good idea.
* Re: [gentoo-dev] Idea for the portage maintainers
  2004-04-13 12:18 ` Chris Bainbridge
@ 2004-04-13 16:12 ` Chris Bainbridge
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Bainbridge @ 2004-04-13 16:12 UTC
To: gentoo-dev

On Tuesday 13 April 2004 13:18, Chris Bainbridge wrote:
> The test of rsyncing uncompressed distfile sources rather than downloading
> them gave an 88% improvement in bandwidth.

I should add that rsyncing from an old distfiles tarball to the new one
used 323k downstream + 58k up, when downloading the new bzip2 file
would've used about 2.6MB downstream. So uncompressed tar+rsync is over
8x more bandwidth efficient.
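[Editorial sketch] The figures quoted in the message above can be checked
with a few lines of arithmetic. The numbers (323k down, 58k up, about
2.6MB for the full download) come from the message; treating 2.6MB as
2600k and the helper name savings are assumptions. Under that
assumption, the "88%" and "over 8x" figures match the downstream traffic
alone; counting the upstream 58k as well gives roughly 85% and 6.8x.

```python
def savings(baseline_kb, actual_kb):
    """Return (fractional reduction, efficiency ratio) relative to a baseline."""
    return 1 - actual_kb / baseline_kb, baseline_kb / actual_kb


FULL_DOWNLOAD_KB = 2600  # "about 2.6MB", assuming 1MB = 1000k
RSYNC_DOWN_KB = 323      # downstream traffic reported for the rsync run
RSYNC_UP_KB = 58         # upstream traffic reported for the rsync run

# Downstream traffic only, which appears to be what the message compares.
down_only = savings(FULL_DOWNLOAD_KB, RSYNC_DOWN_KB)

# Total traffic in both directions.
both_ways = savings(FULL_DOWNLOAD_KB, RSYNC_DOWN_KB + RSYNC_UP_KB)

print(f"downstream only: {down_only[0]:.1%} less, {down_only[1]:.2f}x")
print(f"down + up:       {both_ways[0]:.1%} less, {both_ways[1]:.2f}x")
```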
* Re: [gentoo-dev] Idea for the portage maintainers
  2004-04-11 11:55 [gentoo-dev] Idea for the portage maintainers Tom St Denis
  2004-04-12 10:45 ` Alexander Gretencord
@ 2004-04-12 11:57 ` Senor Rodgman
  1 sibling, 0 replies; 23+ messages in thread
From: Senor Rodgman @ 2004-04-12 11:57 UTC
To: gentoo-dev

Hi,

I implemented this some time ago. Have a look at bug #40127
(http://bugs.gentoo.org/show_bug.cgi?id=40127).

It lets you do stuff like:

emerge --restore 10/03/2003-18:00

Unfortunately what with the portage rewrite it doesn't look like this
will get merged into mainstream portage.

Have a look, it might be what you want.

dave

On Sun, 11 Apr 2004, Tom St Denis wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I think a cool function [which I didn't see in the latest portage release] is
> a "snapshot" and restore set of functionality. e.g. you can snapshot the
> current install set and later restore (by adding/removing packages) as
> required.
>
> I know I could have used this functionality before. Like when I tried out
> GNOME and decided on KDE.... I still have GNOME code lying around. There's
> probably a dozen other dependencies lying around from packages I tried out...
>
> Also any plans to optimize the portage files? 80k small files amounts to huge
> waste of space.
>
> Tom
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQFAeTIXsP+tEsHHY0ARAmLJAJsFGqB8vIEipmMdQC00DEmvp4iBbACeMQm/
> oTL7WjjZbYWWMaUTiwz8WiE=
> =b3Fw
> -----END PGP SIGNATURE-----
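[Editorial sketch] The "restore by adding/removing packages" idea from
the quoted proposal boils down to a set difference between a saved
package list and the current one. The snippet below only illustrates
that idea; it is not the patch attached to bug #40127, and the function
name restore_plan and the package names are made up for the example.

```python
def restore_plan(snapshot, current):
    """Return (to_install, to_remove) needed to turn `current` into `snapshot`."""
    snapshot, current = set(snapshot), set(current)
    to_install = sorted(snapshot - current)  # in the snapshot, missing now
    to_remove = sorted(current - snapshot)   # installed since the snapshot
    return to_install, to_remove


# Hypothetical package lists: a snapshot taken before trying out GNOME,
# and the state of the system afterwards.
before = ["app-editors/vim", "kde-base/kde"]
after = ["app-editors/vim", "gnome-base/gnome", "kde-base/kde"]

to_install, to_remove = restore_plan(before, after)
print("install:", to_install)
print("remove: ", to_remove)
```

A real implementation would also have to deal with versions, USE flags
and dependency ordering, which this sketch ignores.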
end of thread, other threads:[~2004-04-13 16:12 UTC | newest]

Thread overview: 23+ messages
2004-04-11 11:55 [gentoo-dev] Idea for the portage maintainers Tom St Denis
2004-04-12 10:45 ` Alexander Gretencord
2004-04-12 12:03 ` Tom St Denis
2004-04-12 12:23 ` Georgi Georgiev
2004-04-12 12:36 ` Tom St Denis
2004-04-12 14:18 ` N. Owen Gunden
2004-04-12 15:12 ` Troy Dack
2004-04-12 15:15 ` Jason Stubbs
2004-04-12 16:22 ` Andrew Gaffney
2004-04-12 16:23 ` Todd Berman
2004-04-12 16:59 ` Andrew Gaffney
2004-04-12 17:03 ` Todd Berman
2004-04-12 17:17 ` Andrew Gaffney
2004-04-12 17:39 ` Todd Berman
2004-04-13 1:04 ` Jason Stubbs
2004-04-13 3:35 ` Todd Berman
2004-04-13 15:39 ` [gentoo-dev] Idea for the portage maintainers - personal experiences with a .zip-db Karl Trygve Kalleberg
2004-04-12 17:09 ` [gentoo-dev] Idea for the portage maintainers Tom St Denis
2004-04-12 17:19 ` Norberto Bensa
2004-04-12 17:21 ` Tom St Denis
2004-04-13 12:18 ` Chris Bainbridge
2004-04-13 16:12 ` Chris Bainbridge
2004-04-12 11:57 ` Senor Rodgman