public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] (FS) Attributes for Ebuilds?
@ 2003-06-05  6:47 Michael Kohl
  2003-06-05  7:02 ` Joseph Hardin
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Michael Kohl @ 2003-06-05  6:47 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

Hi all!

Following all the recent discussion about categories in the Portage
tree (having packages in several categories at once, defining keywords
for packages to make similar packages easier to find), an idea came to
my mind.

Would it be possible to use filesystem attributes for Ebuilds (only if
the FS supports this, of course; maybe a local USE flag can do the
trick)? This would allow users to build categories "on the fly" using a
kind of live query mechanism.

People familiar with BeFS most probably know what I'm talking about, for
anyone else just a little info:

This would allow metadata to be stored in text form for each ebuild as a
filesystem attribute, so your filesystem kind of acts like a database.
Using this mechanism you could also add your own attributes (e.g.
"try_this" for ebuilds you're interested in testing sometime) and then
list all ebuilds having this attribute.

Also the setup part of an Ebuild could set an attribute like "installed"
in pkg_postinst, so it would be even easier to find all the packages
installed on your system. Using live queries (e.g. in a nice GUI) this
list would change immediately after you emerged a new package. Also
finding applications similar to each other would be quite easy, as you
can store quite a lot of metadata (e.g. mp3, ogg, media, player, etc.
for the xmms ebuild). Sure this could be done in various other ways, but
using FS attributes just sounds like a good way of doing it.
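
On Linux, the idea above roughly corresponds to extended attributes in
the "user." namespace. A minimal Python sketch, assuming a filesystem
that allows user xattrs; the attribute name user.portage.tags and all
helper names are made up for illustration and are not part of Portage.
Note that the search is a one-shot scan of the tree, not a BeFS-style
live query.

```python
import os
import tempfile

# Hypothetical attribute name; "user." is the Linux namespace for
# extended attributes that unprivileged users may set.
TAG_ATTR = "user.portage.tags"


def set_tags(path, tags):
    """Store a comma-separated tag list on `path`.

    Returns False when the platform or filesystem cannot store it.
    """
    if not hasattr(os, "setxattr"):  # os.setxattr is Linux-only
        return False
    try:
        os.setxattr(path, TAG_ATTR, ",".join(sorted(tags)).encode("utf-8"))
        return True
    except OSError:
        return False


def get_tags(path):
    """Return the set of tags stored on `path` (empty if none)."""
    if not hasattr(os, "getxattr"):
        return set()
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError:
        return set()
    return set(raw.decode("utf-8").split(",")) if raw else set()


def find_tagged(root, tag):
    """Walk `root` and list every file carrying `tag`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if tag in get_tags(full):
                hits.append(full)
    return sorted(hits)
```

Something like find_tagged("/usr/portage", "try_this") would then list
the ebuilds marked for later testing, provided the tree lives on a
filesystem where set_tags() succeeds.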

Comments (especially about the various filesystems and their usefulness
for this purpose), ideas, thoughts, anyone?

Michael 

P.S. Sorry, the thoughts in this mail aren't all that well organized or
explained, I'm not feeling too good today...

-- 
www.cargal.org 
GnuPG-key-ID: 0x90CA09E3
Jabber-ID: citizen428 [at] cargal [dot] org
Registered Linux User #278726

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  6:47 [gentoo-dev] (FS) Attributes for Ebuilds? Michael Kohl
@ 2003-06-05  7:02 ` Joseph Hardin
  2003-06-05  7:21   ` Michael Kohl
  2003-06-05  9:39 ` Georgi Georgiev
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Joseph Hardin @ 2003-06-05  7:02 UTC (permalink / raw
  To: Michael Kohl; +Cc: gentoo-dev

Why do this at a filesystem level? I may be missing the point, but why
not just incorporate this into a separate tool and keep a file to store
all the comments and data? Or just include it as a comment in the
.ebuilds and build a tool to search through these comment strings and
write its own as you add categories?

								Joe Hardin
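
The alternative described here, a separate tool with its own data file,
could look roughly like the following Python sketch. The JSON file
format, the function names and the package names are assumptions chosen
for illustration only.

```python
import json
import os


def load_index(path):
    """Read the tag index, returning {} if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return {pkg: set(tags) for pkg, tags in json.load(fh).items()}


def save_index(path, index):
    """Write the tag index as sorted, human-readable JSON."""
    serializable = {pkg: sorted(tags) for pkg, tags in index.items()}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(serializable, fh, indent=2, sort_keys=True)


def add_tag(index, package, tag):
    """Attach `tag` to `package`, creating the entry if needed."""
    index.setdefault(package, set()).add(tag)


def packages_with(index, tag):
    """List every package that carries `tag`."""
    return sorted(pkg for pkg, tags in index.items() if tag in tags)
```

Because the index is an ordinary file, it works on any filesystem and
is carried along unchanged by tar, scp and rsync, at the cost of having
to keep it in step with the tree by hand.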



--
gentoo-dev@gentoo.org mailing list


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  7:02 ` Joseph Hardin
@ 2003-06-05  7:21   ` Michael Kohl
  2003-06-05  7:52     ` George Shapovalov
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Michael Kohl @ 2003-06-05  7:21 UTC (permalink / raw
  To: Joseph Hardin; +Cc: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1743 bytes --]

On Thu, 5 Jun 2003 01:02:35 -0600
Joseph Hardin <jhlazer@charter.net> wrote:

> Why do this at a filesystem level? I may be missing the point, but why
> not just incorporate this into a separate tool and keep a file to
> store all the comments and data.

Because:

a. it should be faster (at least judging from my admittedly limited
experience with this matter)
b. you wouldn't have to build a new tool; filesystems capable of
handling this kind of attribute already have them
c. live queries are pretty nifty and maybe much harder to
incorporate with files (in a live query you show all files having
a specific attribute set; when a new file with this attribute is created
it immediately shows up in this selection). If you find a BeOS zealot
I'm sure he can explain all this much better to you, because
that's one of the things quite a lot of people seemed to like about
BeFS.
d. all these benefits come without having to force a database as a
dependency on Gentoo users.

Also note that I didn't propose or request this, I was just interested
in some feedback and discussion on if/why this is a good/bad approach to
handling this category issue (and others: if the name of a package
changes, for example, you could keep the old name as an attribute). I
just think that Portage is a hell of a package management system and
that discussion about how to further improve it couldn't hurt (even if
my suggestion may not be an improvement, but let the people who know
Portage much better than I do clarify this).

Michael

P.S. I'm CC'ing this to the list, I hope it's ok for you, but I don't
want to answer a similar question again.

-- 
www.cargal.org 
GnuPG-key-ID: 0x90CA09E3
Jabber-ID: citizen428 [at] cargal [dot] org
Registered Linux User #278726

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  7:21   ` Michael Kohl
@ 2003-06-05  7:52     ` George Shapovalov
  2003-06-05 10:17       ` Michael Kohl
  2003-06-05  8:00     ` Daniel Armyr
  2003-06-05  9:17     ` Evan Powers
  2 siblings, 1 reply; 18+ messages in thread
From: George Shapovalov @ 2003-06-05  7:52 UTC (permalink / raw
  To: gentoo-dev

A nice idea it is, however this will basically make portage *require* that
the tree reside on a filesystem that supports ACLs (I suppose you meant this
by fs attributes? Otherwise please be more specific). In some cases it would
even force allocation of a separate partition to keep the portage tree. This
makes it, um, problematic to say the least..

The following point that you mention might offset the "downside":
> d. all these benefits come without having to force a database as a
> dependency on Gentoo users.
however I am not so sure. ACLs provide one with the means to store this
"meta" information, however we also need a processing capability. Thus I am
not sure that the requirement for a db dependency is really eliminated - either
portage will depend on a db processing engine or it will reinvent the wheel
once again :).


> Also note that I didn't propose or request this, I was just interested
> in some feedback and discussion on if/why this is a good/bad approach to
> handling this category issue (and others: if the name of a package
> changes, for example, you could keep the old name as an attribute). I
> just think that Portage is a hell of a package management system and
> that discussion about how to further improve it couldn't hurt (even if
> my suggestion may not be an improvement, but let the people who know
> Portage much better than I do clarify this).
Yup, it's a nice try nonetheless, and might be worth it further down the
timeline, when say ACLs get universally accepted. However right now I am
afraid this might be a showstopper :(
(we need to think about the whole variety of platforms we already support or
plan to support).
Well, this is just my understanding of the situation anyway and if anybody
thinks otherwise (like the requirement isn't that gross and can be tolerated
for the most part...) you are certainly welcome to contribute to the discussion..

George






^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  7:21   ` Michael Kohl
  2003-06-05  7:52     ` George Shapovalov
@ 2003-06-05  8:00     ` Daniel Armyr
  2003-06-05  9:50       ` Marko Mikulicic
  2003-06-05  9:17     ` Evan Powers
  2 siblings, 1 reply; 18+ messages in thread
From: Daniel Armyr @ 2003-06-05  8:00 UTC (permalink / raw
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> b. you wouldn't have to build a new tool; filesystems capable of
> handling this kind of attribute already have them

Do Reiser/ext2/ext3 etc. support this? If not, I assume this would mean
one needs a separate partition for /usr/portage, no? Although using
already available tools is a good thing, I feel that needing yet another
partition complicates things, as well as leaving less HD space available.

My 2 CFLAGS
- --Daniel Armyr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE+3vikhxtTUWLs2lERAikKAKCeTaB7n5UBCMnUQWtEsAvBJGgf9gCfQsUc
lWILmNIHkbl3VkkuNKMQdnc=
=3eJw
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  7:21   ` Michael Kohl
  2003-06-05  7:52     ` George Shapovalov
  2003-06-05  8:00     ` Daniel Armyr
@ 2003-06-05  9:17     ` Evan Powers
  2003-06-05 12:23       ` Michael Kohl
  2 siblings, 1 reply; 18+ messages in thread
From: Evan Powers @ 2003-06-05  9:17 UTC (permalink / raw
  To: gentoo-dev

On Thursday 05 June 2003 03:21 am, Michael Kohl wrote:
> Joseph Hardin <jhlazer@charter.net> wrote:
> > Why do this at a filesystem level? I may be missing the point, but why
> > not just incorporate this into a seperate tool and keep a file to
> > store all the comments and data.
>
> Because:
...

If you haven't already, you should read Hans Reiser's Future Visions document, 
here:

http://www.namesys.com/whitepaper.html

The summary is, in brief, that ReiserFS 6 (current version is 4) is going to 
really kick ass. From what I understand it has a superset of BeFS's 
functionality.

You'll be able to do things like (syntax isn't actually decided yet):

bash$ ls '[subject/[illegal strike] to/elves from/santa ultimatum]'

Which would list any files you have which have a subject attribute containing 
sub-attributes illegal and strike, the attribute ultimatum, etc. (Your email 
program and the filesystem would each automatically generate pieces of these 
attribute trees.) Technically they aren't "attributes", but instead part of 
the filename itself--it's just that filenames would now have two operators, 
'/' (ordering) and '[ ... ]' (grouping), instead of only one, '/'.

Regardless, while I very much like your idea in principle, I do think it's an
idea whose time has not yet come. That doesn't mean we can't think about how
to do it when its time does come, however. ;-)

Evan




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  6:47 [gentoo-dev] (FS) Attributes for Ebuilds? Michael Kohl
  2003-06-05  7:02 ` Joseph Hardin
@ 2003-06-05  9:39 ` Georgi Georgiev
  2003-06-05 14:43 ` Marius Mauch
  2003-06-09 21:40 ` [gentoo-dev] " ross girshick
  3 siblings, 0 replies; 18+ messages in thread
From: Georgi Georgiev @ 2003-06-05  9:39 UTC (permalink / raw
  To: Michael Kohl; +Cc: gentoo-dev

On 05/06/2003 at 14:47:33(+0000), Michael Kohl used 2.1Kbytes, just to say:
> Hi all!
> 
> Following all the recent discussion about categories in the Portage
> tree, having packages in several categories at once, defining key words
> for packages to ease finding a similar package an idea came to my mind.
> 
> Would it be possible to use filesystem attributes for Ebuilds (of course
> only if the FS supports this, maybe a local useflag can do the trick)?
> This would allow users to build categories "on the fly" using a kind of
> live query mechanism. 

What happens when one rsyncs?

Do you define attributes for each .ebuild, or maybe on a per-package basis
(i.e. the directory)? Ebuilds change a little too often, after all.

-- 
 /^^^^^^^^^^^^^^^^^^^^^^^^^^^\/^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\
/ Georgi Georgiev    (-<     / Memory fault --                  \
\ chutz@chubaka.net  /\   .o)\ core...uh...um...core... Oh      /
/ +81(90)6266-1163  V_/_ |(/)/ dammit, I forget!                \
\___________________________/\__________________________________/



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  8:00     ` Daniel Armyr
@ 2003-06-05  9:50       ` Marko Mikulicic
  0 siblings, 0 replies; 18+ messages in thread
From: Marko Mikulicic @ 2003-06-05  9:50 UTC (permalink / raw
  To: Daniel Armyr; +Cc: gentoo-dev


On Thursday, June 5, 2003, at 10:00 AM, Daniel Armyr wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>> b. you wouldn't have to build a new tool; filesystems capable of
>> handling this kind of attribute already have them
>
> Do Reiser/ext2/ext3 etc. support this? If not, I assume this would mean
> one needs a separate partition for /usr/portage, no? Although using
> already available tools is a good thing, I feel that needing yet another
> partition complicates things, as well as leaving less HD space
> available.
>
1) there is support for ext2/ext3 extended attributes, but it is currently
available only as kernel patches.
2) you don't need to have another partition with (say) XFS; you could
create a loopback filesystem for /usr/portage (xfs_growfs can grow the
filesystem online).
3) it's complicated. that's guaranteed.

From a theoretical point of view I welcome any innovation in the
filesystem. Extended attributes and POSIX ACLs are just a small step
compared with the features the OpenVMS filesystem has. However, we live
in a UNIX world, and that means tar, ftp, nfs, scp ...: all these tools
would simply drop the extended attributes.
  (star can save ACLs but not generic extended attributes)
  The Apple approach to UNIX has made me think about it: they came from
an attributed filesystem which also allowed multiple data streams
(forks) and switched back to a standard UNIX flat fs.
  The structured files they once had are now built with directories, and
the highest level of the UI hides this trick and treats those "special"
directories as files. This approach is not very elegant, especially when
you use command-line tools that do not obey this abstraction, but it is
compatible with all file transfer and publishing methods.
  I don't know what is best. Today you have to cope with a long history,
and inertia has its own inertia.

Marko





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  7:52     ` George Shapovalov
@ 2003-06-05 10:17       ` Michael Kohl
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kohl @ 2003-06-05 10:17 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

On Thu, 5 Jun 2003 00:52:34 -0700
George Shapovalov <george@gentoo.org> wrote:

> A nice idea it is, however this will basically make portage *require*
> that the tree reside on a filesystem that supports ACLs

True, but IIRC gentoo-sources uses the patches for ext2/3, right? That's
what I meant by making this whole thing optional via a local USE flag, so
that only people who use a capable filesystem would use it.

> however I am not so sure. ACLs provide one with the means to store
> this "meta" information, however we also need a processing capability.
> Thus I am not sure that the requirement for a db dependency is really
> eliminated - either portage will depend on a db processing engine or it
> will reinvent the wheel once again :).

Ok, this one was just a guess ;-) I assumed that filesystems which
support ACLs/extended attributes/whatever have the tools to deal with
them included...

> Yup, it's a nice try nonetheless, and might be worth it further down the
> timeline, when say ACLs get universally accepted. However right now I
> am afraid this might be a showstopper :(

Thanks, that was actually the kind of answer I expected. As I already
said, I just wanted to bring up this idea and see how people like it. 

Michael

-- 
www.cargal.org 
GnuPG-key-ID: 0x90CA09E3
Jabber-ID: citizen428 [at] cargal [dot] org
Registered Linux User #278726

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  9:17     ` Evan Powers
@ 2003-06-05 12:23       ` Michael Kohl
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kohl @ 2003-06-05 12:23 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1253 bytes --]

On Thu, 5 Jun 2003 05:17:38 -0400
Evan Powers <powers.161@osu.edu> wrote:

> If you haven't already, you should read Hans Reiser's Future Visions
> document, here:
> 
> http://www.namesys.com/whitepaper.html

I bookmarked the page once, but actually haven't read it up until now.
But judging from your summary I'll catch up on that one soon.

> Regardless, while I very much like your idea in principle, I do think
> it's an idea whose time has not yet come. That doesn't mean we can't
> think about how to do it when its time does come, however. ;-)

That's actually the same answer George gave and which I expected.

And now for something off-topic: I started writing a little document
about Portage for myself, containing the most often heard proposals
(like a database backend, p2p/bittorrent support etc.) and concerns
(security, naming conflicts). If anyone else would find this useful
(although I'm pretty sure Nick has most of the stuff already somewhere
on his agenda), I could "polish" it and put it up on the new Wiki
(Zach?). I don't really know if this could be useful for anybody, but if
you think so let me know.

Michael

-- 
www.cargal.org 
GnuPG-key-ID: 0x90CA09E3
Jabber-ID: citizen428 [at] cargal [dot] org
Registered Linux User #278726

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] (FS) Attributes for Ebuilds?
  2003-06-05  6:47 [gentoo-dev] (FS) Attributes for Ebuilds? Michael Kohl
  2003-06-05  7:02 ` Joseph Hardin
  2003-06-05  9:39 ` Georgi Georgiev
@ 2003-06-05 14:43 ` Marius Mauch
  2003-06-09 21:40 ` [gentoo-dev] " ross girshick
  3 siblings, 0 replies; 18+ messages in thread
From: Marius Mauch @ 2003-06-05 14:43 UTC (permalink / raw
  To: gentoo-dev

On Thu, 5 Jun 2003 14:47:33 +0800 Michael Kohl wrote:

> Would it be possible to use filesystem attributes for Ebuilds (of
> course only if the FS supports this, maybe a local useflag can do the
> trick)? This would allow users to build categories "on the fly" using
> a kind of live query mechanism. 

What I'd like to see is multiple portage backends for the tree
(filesystem, sql-db, berkeley-db, ...), with the user choosing in
make.conf which one to use. People who already have an SQL db on
their machine could then use it without forcing these dependencies on
all users. Of course this is a lot of work, but it would give portage a
lot of flexibility. Backend conversion could be covered by some tools or
by simply changing the backend module and rsync'ing the tree.
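
A rough Python sketch of what such pluggable backends might look like.
Everything here is hypothetical: the class names, the backend keys
"dict" and "sqlite", and the idea that a single make.conf setting would
pick one of them. It only illustrates a common interface with
interchangeable storage behind it.

```python
import sqlite3
from abc import ABC, abstractmethod


class TreeBackend(ABC):
    """Minimal interface every storage backend would have to offer."""

    @abstractmethod
    def put(self, cpv, metadata):
        """Store a {key: value} metadata mapping for one package version."""

    @abstractmethod
    def get(self, cpv):
        """Return the metadata mapping stored for one package version."""


class DictBackend(TreeBackend):
    """In-memory stand-in for the plain filesystem backend."""

    def __init__(self):
        self._data = {}

    def put(self, cpv, metadata):
        self._data[cpv] = dict(metadata)

    def get(self, cpv):
        return dict(self._data[cpv])


class SqliteBackend(TreeBackend):
    """SQL-backed variant; uses an in-memory database by default."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS meta ("
            "cpv TEXT, key TEXT, value TEXT, PRIMARY KEY (cpv, key))"
        )

    def put(self, cpv, metadata):
        with self._db:  # commit on success, roll back on error
            self._db.execute("DELETE FROM meta WHERE cpv = ?", (cpv,))
            self._db.executemany(
                "INSERT INTO meta VALUES (?, ?, ?)",
                [(cpv, k, v) for k, v in metadata.items()],
            )

    def get(self, cpv):
        rows = self._db.execute(
            "SELECT key, value FROM meta WHERE cpv = ?", (cpv,)
        ).fetchall()
        if not rows:
            raise KeyError(cpv)
        return dict(rows)


# Hypothetical registry: the user's configuration would name one key.
BACKENDS = {"dict": DictBackend, "sqlite": SqliteBackend}


def open_backend(name):
    return BACKENDS[name]()


def convert(src, dst, cpvs):
    """Copy the listed entries from one backend into another."""
    for cpv in cpvs:
        dst.put(cpv, src.get(cpv))
```

The convert() helper corresponds to the "backend conversion" tools
mentioned above: since both sides speak the same interface, moving data
between them is a plain loop.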

Marius



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [gentoo-dev] Re: (FS) Attributes for Ebuilds?
  2003-06-05  6:47 [gentoo-dev] (FS) Attributes for Ebuilds? Michael Kohl
                   ` (2 preceding siblings ...)
  2003-06-05 14:43 ` Marius Mauch
@ 2003-06-09 21:40 ` ross girshick
  2003-06-10  2:46   ` Michael Kohl
  3 siblings, 1 reply; 18+ messages in thread
From: ross girshick @ 2003-06-09 21:40 UTC (permalink / raw
  To: gentoo-dev

Michael,

I've actually already implemented something like this for a school
project. I built a metadata "database" on top of the file system using
ext2 and extended attributes. I modified the ext2 kernel module and built
two userspace utils for getting and setting the metadata. I also patched
portage to set the metadata when installing packages. While designing and
building this system I ran into a number of practical issues, such as
extended attributes currently being limited to 1 block on disk. Because of
this I use a lot of hashing tricks and only store 32-bit integers on disk.
My userspace utils translate between ASCII names and integers. My basic
goal was to distribute the portage database through the file system so
that the database represents the actual state of the disk, rather
than the state at the time you emerged something. This allows for packages
to have files added and removed, while not accumulating cruft over time.
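
To illustrate the general trick of storing fixed-size integers instead
of names, here is a small Python sketch. It is not the scheme from the
paper: CRC32 is used purely as a stand-in hash, and the function names
and byte layout are assumptions.

```python
import struct
import zlib


def name_to_id(name):
    """Map an ASCII name to an unsigned 32-bit integer (CRC32 stand-in)."""
    return zlib.crc32(name.encode("ascii")) & 0xFFFFFFFF


def pack_ids(names):
    """Encode names as a compact blob of little-endian 32-bit integers."""
    ids = sorted({name_to_id(n) for n in names})
    return struct.pack("<%dI" % len(ids), *ids)


def unpack_ids(blob):
    """Decode a blob produced by pack_ids back into a list of integers."""
    count = len(blob) // 4
    return list(struct.unpack("<%dI" % count, blob))


def build_reverse_table(known_names):
    """Build the id -> name table a userspace tool would need for display.

    Raises ValueError if two different names hash to the same id.
    """
    table = {}
    for name in known_names:
        key = name_to_id(name)
        if key in table and table[key] != name:
            raise ValueError("hash collision: %r vs %r" % (table[key], name))
        table[key] = name
    return table
```

Each name costs exactly four bytes on disk regardless of its length,
which is what makes a one-block limit workable; the price is that the
reverse mapping has to live somewhere in userspace and collisions have
to be detected.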

Rather than going into too much detail, I suggest that you read the paper
that I wrote on it. I haven't had time to get organized, but I can make
the source code available soon too. If anyone is interested in working on
it please let me know.

The paper:
http://people.brandeis.edu/~rossgir/pkgman.ps

cheers,
ross

p.s. it should be noted that this project was my first adventure into
kernel programming, so it's quite amateurish and could use the guidance
of a wise sage :).




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] Re: (FS) Attributes for Ebuilds?
  2003-06-09 21:40 ` [gentoo-dev] " ross girshick
@ 2003-06-10  2:46   ` Michael Kohl
  2003-06-24 15:20     ` [gentoo-dev] "Updating Portage Cache" optimizations ross b girshick
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Kohl @ 2003-06-10  2:46 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 910 bytes --]

On Mon, 09 Jun 2003 17:40:54 -0400
"ross girshick" <rossgir@cs.brandeis.edu> wrote:

> I've actually already implemented something like this for a school
> project. I built a metadata "database" on top of the file system using
> ext2 and extended attributes.

Good to hear! Actually after reading the documents about ReiserFS 4 I'm
thinking I'll wait for this one before trying my luck (and maybe even
then I won't ;).

> Rather than going into too much detail, I suggest that you read the
> paper that I wrote on it. 

Thanks!

> I haven't had time to get organized, but I can make the source code
> available soon too. If anyone is interested in working on it please
> let me know.

I probably wouldn't start hacking away the moment I get it, but I
definitely would like to see it.

Michael

-- 
www.cargal.org 
GnuPG-key-ID: 0x90CA09E3
Jabber-ID: citizen428 [at] cargal [dot] org
Registered Linux User #278726

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [gentoo-dev] "Updating Portage Cache" optimizations
  2003-06-10  2:46   ` Michael Kohl
@ 2003-06-24 15:20     ` ross b girshick
  2003-06-24 15:41       ` Seemant Kulleen
  0 siblings, 1 reply; 18+ messages in thread
From: ross b girshick @ 2003-06-24 15:20 UTC (permalink / raw
  To: gentoo-dev

Hi,

Lately I've been bothered by how long it takes to update the portage cache
after doing an emerge [r]sync. So I decided to dive into the portage code
for the first time to do something about this. What I found seems a little
confusing and inefficient. So I'm asking people to clear up any
misconceptions I might have and to give some feedback on a _simple_
optimization.

The main time siphon during the cache updating process is the function 
portage.aux_get embedded in a double nested for loop. aux_get either 
copies the metadata file out of /usr/portage/metadata/cache/ into 
/var/cache/edb/dep/ or regenerates it using the ebuild if the cached 
version is old. My laptop's hard-drive is pretty slow (4200RPM, etc) so 
this process of copying ~ 36MB of small files takes about 4.5 minutes on 
average. In most cases the metadata files are copied directly. I did a diff 
on some categories in the /dep/ cache vs. the /metadata/ cache and found 
only a few files were regenerated.

So my first optimization, a whopping one-liner, reduces the cache update
time from 4.5 minutes to 2.25 minutes on my system (and saves about 35MB
of disk space). Based on the code, I think a lot of other optimizations
could be added too (such as symlinking whole category directories when
there are no regens in them).

So far I've had no problems after making this change. Can anyone think of 
how this would introduce a bug?

Thanks,
Ross Girshick

p.s. I've been using gentoo for quite a while now, but I've just started 
getting into the dev side of it. What's the proper channel for submitting 
patches?

Here's the patch:

--- portage.py.orig     2003-06-24 10:13:49.000000000 -0400
+++ portage.py  2003-06-24 10:15:04.000000000 -0400
@@ -3400,7 +3400,8 @@
                                                if not os.path.exists(mydir):
                                                        os.makedirs(mydir, 2775)
							os.chown(mydir,uid,portage_gid)
-                                               shutil.copy2(mymdkey, mydbkey)
+                                               #shutil.copy2(mymdkey, mydbkey)
+                                               os.symlink(mymdkey, mydbkey)
                                                usingmdcache=1
                                        except Exception,e:
                                                print "!!! Unable to copy '"+mymdkey+"' to '"+mydbkey+"'"
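
The trade-off in the patch above can be tried outside Portage with a
small, self-contained Python sketch. The function names are invented for
this illustration; only os.symlink and shutil.copy2 come from the patch.
One possible source of trouble it demonstrates: a symlinked cache entry
dangles as soon as the file it points to is removed, whereas a copy
survives.

```python
import os
import shutil
import tempfile


def populate_cache_entry(src, dst, use_symlink=True):
    """Make `dst` available either as a symlink to `src` or as a copy."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if os.path.lexists(dst):  # also true for a dangling symlink
        os.remove(dst)
    if use_symlink:
        os.symlink(os.path.abspath(src), dst)
    else:
        shutil.copy2(src, dst)


def is_stale(dst):
    """True if `dst` is a symlink whose target no longer exists."""
    return os.path.islink(dst) and not os.path.exists(dst)
```

A symlink costs one directory entry instead of a full file copy, which
is where the time and disk savings come from; is_stale() shows the kind
of check that would be needed if entries can disappear from the source
directory during a later sync.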





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] "Updating Portage Cache" optimizations
  2003-06-24 15:20     ` [gentoo-dev] "Updating Portage Cache" optimizations ross b girshick
@ 2003-06-24 15:41       ` Seemant Kulleen
  2003-06-24 15:56         ` ross b girshick
  0 siblings, 1 reply; 18+ messages in thread
From: Seemant Kulleen @ 2003-06-24 15:41 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]


> portage.aux_get embedded in a double nested for loop. aux_get either 
> copies the metadata file out of /usr/portage/metadata/cache/ into 
> /var/cache/edb/dep/ or regenerates it using the ebuild if the cached 
> version is old. My laptop's hard-drive is pretty slow (4200RPM, etc) so 
> this process of copying ~ 36MB of small files takes about 4.5 minutes on 
> average. In most cases the metadata files are copied directly. I did a diff 
> on some categories in the /dep/ cache vs. the /metadata/ cache and found 
> only a few files were regenerated.
> 

I haven't seen the current code at all, but if it is as you say, I should
think an rsync would be more efficient.  As for proper channels, please
point your browser to bugs.gentoo.org

-- 
Seemant Kulleen
Developer and Project Co-ordinator,
Gentoo Linux					http://www.gentoo.org/~seemant

Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x3458780E
Key fingerprint = 23A9 7CB5 9BBB 4F8D 549B 6593 EDA2 65D8 3458 780E

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] "Updating Portage Cache" optimizations
  2003-06-24 15:41       ` Seemant Kulleen
@ 2003-06-24 15:56         ` ross b girshick
  2003-06-24 16:08           ` Seemant Kulleen
  0 siblings, 1 reply; 18+ messages in thread
From: ross b girshick @ 2003-06-24 15:56 UTC (permalink / raw
  To: gentoo-dev

> Having not seen the current code at all, but if it is as you say, I should think an rsync would be more efficient to do.  
> As for proper channels, please point your browser to bugs.gentoo.org

The cache in /metadata/cache is updated through rsync. Though it seems 
that it is not guaranteed to be up-to-date (I haven't studied how to set 
up a gentoo rsync mirror, so I'm not sure how and when the cache is 
compiled). If this cache was guaranteed to be up-to-date, then 
/var/cache/edb/dep would be 100% redundant. Right now it's 98% redundant, 
and emerge spends a lot of time regenerating that missing 2%.

Ross




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] "Updating Portage Cache" optimizations
  2003-06-24 15:56         ` ross b girshick
@ 2003-06-24 16:08           ` Seemant Kulleen
  2003-06-24 21:46             ` ross b girshick
  0 siblings, 1 reply; 18+ messages in thread
From: Seemant Kulleen @ 2003-06-24 16:08 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 288 bytes --]

no I meant rsync on the local filesystem

-- 
Seemant Kulleen
Developer and Project Co-ordinator,
Gentoo Linux					http://www.gentoo.org/~seemant

Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x3458780E
Key fingerprint = 23A9 7CB5 9BBB 4F8D 549B 6593 EDA2 65D8 3458 780E

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [gentoo-dev] "Updating Portage Cache" optimizations
  2003-06-24 16:08           ` Seemant Kulleen
@ 2003-06-24 21:46             ` ross b girshick
  0 siblings, 0 replies; 18+ messages in thread
From: ross b girshick @ 2003-06-24 21:46 UTC (permalink / raw
  To: gentoo-dev

On Tue, 24 Jun 2003, Seemant Kulleen wrote:

> no I meant rsync on the local filesystem

Ah, I see what you mean. Here's a patch to do that. The first time it
runs, it takes my machine ~ 3 minutes (compared to 4.5 in the past) and
after that usually <1 minute (30 seconds if there aren't many updates to
the tree).

If people are interested in testing this out and/or providing feedback
please do. I've been using it all day without anything breaking (I wish
you luck too) ;).
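
Why later runs get so much faster can be sketched in Python. By default
rsync skips files whose size and modification time already match on both
sides; the function below imitates only that quick check. It is an
illustration with an invented name, not a replacement for rsync and not
what the patch itself executes.

```python
import os
import shutil


def sync_tree(src_root, dst_root):
    """Copy only files whose size or mtime differ; return the copy count."""
    copied = 0
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            s = os.stat(src)
            try:
                d = os.stat(dst)
                unchanged = (
                    d.st_size == s.st_size
                    and int(d.st_mtime) == int(s.st_mtime)
                )
            except FileNotFoundError:
                unchanged = False
            if not unchanged:
                # copy2 preserves the mtime, so the next run sees a match
                shutil.copy2(src, dst)
                copied += 1
    return copied
```

Like an rsync run without --delete, this sketch never removes files
that have vanished from the source, so stale entries can accumulate in
the destination.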

Ross Girshick

--- emerge.orig 2003-06-24 15:51:10.000000000 -0400
+++ emerge      2003-06-24 17:40:26.000000000 -0400
@@ -1656,31 +1656,55 @@
                sys.exit(1)
        if os.path.exists(myportdir+"/metadata/cache"):
                print "\n>>> Updating Portage cache...  ",
-               os.umask(0002)
-               if os.path.exists(portage.dbcachedir):
-                       portage.spawn("rm -Rf "+portage.dbcachedir,free=1)
-               try:
-                       os.mkdir(portage.dbcachedir)
-                       os.chown(portage.dbcachedir, os.getuid(), portage.portage_gid)
-                       os.chmod(portage.dbcachedir, 06775)
-                       os.umask(002)
-               except:
-                       pass
-               mynodes=portage.portdb.cp_all()
+               sys.stdout.flush()
+               # We shouldn't have to worry about this because when portage is imported dbcachedir is created if it's missing
+               if not os.path.exists(portage.dbcachedir):
+                       print "!!! Cache Directory " + portage.dbcachedir + " does not exist. Re-running emerge should fix this"
+                       sys.exit(1)
+               # XXX If we don't --delete, then we don't have to regenerate the cache files...what danger does this create?
+               # maybe it's sufficient to use --delete only every N syncs??? XXX
+               #update_cache_command = "/usr/bin/rsync -rlptD --delete --delete-after "...
+               update_cache_command = "/usr/bin/rsync -rlptD " + myportdir + "/metadata/cache/ " + portage.dbcachedir
+               exitcode = portage.spawn(update_cache_command, free=1)
+               # print update_cache_command
+               if (exitcode > 0):
+                       ## more error info might be good
+                       print "!!! local rsync error (cache update failed): " + str(exitcode) + "\n"
+                       sys.exit(1)
+               mynodes = portage.portdb.cp_all()
                for x in mynodes:
-                       myxsplit=x.split("/")
-                       if not os.path.exists(portage.dbcachedir+"/"+myxsplit[0]):
-                               os.mkdir(portage.dbcachedir+"/"+myxsplit[0])
-                               os.chown(portage.dbcachedir+"/"+myxsplit[0], os.getuid(), portage.portage_gid)
-                               os.chmod(portage.dbcachedir+"/"+myxsplit[0], 06775)
-                       mymatches=portage.portdb.xmatch("match-all",x)
+                       mymatches = portage.portdb.xmatch("match-all", x)
                        for y in mymatches:
                                update_spinner()
+                               mydbkey = portage.dbcachedir+y
+                               myebuild, in_overlay = portage.portdb.findname2(y)
+                               myebuild_mtime = os.stat(myebuild)[ST_MTIME]
                                try:
-                                       ignored=portage.portdb.aux_get(y,[],metacachedir=myportdir+"/metadata/cache")
-                               except:
-                                       pass
-               portage.spawn("chmod -R g+rw "+portage.dbcachedir, free=1)
+                                       mydbkeystat = os.stat(mydbkey)
+                                       if mydbkeystat[ST_SIZE] == 0 or myebuild_mtime != mydbkeystat[ST_MTIME]:
+                                               doregen = 1
+                                       else:
+                                               doregen = 0
+                               except OSError:
+                                       doregen = 1
+
+                               if doregen:
+                                       ##print "doregen " + mydbkey + "\n"
+                                       try:
+                                               os.unlink(mydbkey)
+                                       except:
+                                               pass
+                                       # regenerate the dep cache file using doebuild interface
+                                       if portage.doebuild(myebuild, "depend", "/"):
+                                               #depend returned non-zero exit code...
+                                               sys.stderr.write(str(red("\nemerge sync:")+" (0) Error in "+y+" ebuild.\n"
+                                               "               Check for syntax error or corruption in the ebuild. (--debug)\n\n"))
+                                       try:
+                                               os.utime(mydbkey, (myebuild_mtime, myebuild_mtime))
+                                       except (IOError, OSError):
+                                               sys.stderr.write(str(red("\nemerge sync:")+" (1) Error in "+y+" ebuild.\n"
+                                               "               Check for syntax error or corruption in the ebuild. (--debug)\n\n"))
+               ##portage.spawn("chmod -R g+rw "+portage.dbcachedir, free=1)
                sys.stdout.write("\b\b  ...done!\n\n")
                sys.stdout.flush()
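
The per-ebuild staleness test in the patch above can be looked at in 
isolation. The following is a minimal sketch in Python 3 (the patch 
itself targets the Python 2 emerge script); needs_regen and mark_fresh 
are hypothetical names, not Portage functions. A cache entry is treated 
as stale when it is missing, empty, or its mtime differs from the 
ebuild's mtime, and a regenerated entry is stamped with the ebuild's 
mtime so the next run skips it.

```python
import os
import stat


def needs_regen(ebuild_path, cache_path):
    """Return True if the cache entry must be regenerated.

    Mirrors the condition in the patch: missing file, zero size,
    or an mtime that does not match the ebuild's mtime.
    """
    ebuild_mtime = os.stat(ebuild_path)[stat.ST_MTIME]
    try:
        cache_stat = os.stat(cache_path)
    except OSError:
        # No cache entry at all.
        return True
    return (cache_stat[stat.ST_SIZE] == 0
            or cache_stat[stat.ST_MTIME] != ebuild_mtime)


def mark_fresh(ebuild_path, cache_path):
    """Stamp the cache entry with the ebuild's mtime after regenerating it."""
    ebuild_mtime = os.stat(ebuild_path)[stat.ST_MTIME]
    os.utime(cache_path, (ebuild_mtime, ebuild_mtime))
```

Indexing the stat result with ST_MTIME yields whole seconds, so the 
comparison is not thrown off by sub-second timestamps. Note that the 
check is for inequality rather than "older than": an ebuild whose mtime 
moved backwards (for example after a sync) still triggers regeneration.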







Thread overview: 18+ messages
2003-06-05  6:47 [gentoo-dev] (FS) Attributes for Ebuilds? Michael Kohl
2003-06-05  7:02 ` Joseph Hardin
2003-06-05  7:21   ` Michael Kohl
2003-06-05  7:52     ` George Shapovalov
2003-06-05 10:17       ` Michael Kohl
2003-06-05  8:00     ` Daniel Armyr
2003-06-05  9:50       ` Marko Mikulicic
2003-06-05  9:17     ` Evan Powers
2003-06-05 12:23       ` Michael Kohl
2003-06-05  9:39 ` Georgi Georgiev
2003-06-05 14:43 ` Marius Mauch
2003-06-09 21:40 ` [gentoo-dev] " ross girshick
2003-06-10  2:46   ` Michael Kohl
2003-06-24 15:20     ` [gentoo-dev] "Updating Portage Cache" optimizations ross b girshick
2003-06-24 15:41       ` Seemant Kulleen
2003-06-24 15:56         ` ross b girshick
2003-06-24 16:08           ` Seemant Kulleen
2003-06-24 21:46             ` ross b girshick
