* [gentoo-dev] Gentoo XML Database
@ 2003-02-07 17:34 Yannick Koehler
2003-02-07 18:28 ` Vano D
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Yannick Koehler @ 2003-02-07 17:34 UTC
To: gentoo-dev
[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]
For the fun of it, I created a small, very custom and untested tool that reads
the Gentoo cache files and writes a valid XML file to stdout.
The schema/DTD was created without much thought. This may or may not open
the door for people to experiment with a Gentoo-equivalent database.
What's interesting is that the database is generated from a Gentoo system
pretty easily because of the presence of the cache. One could easily imagine
software that goes directly from ebuilds to an XML db instead of passing
through the cache.
A discussion with carpaski suggested that using the database would not
actually speed up emerge, because emerge loads the cache into an internal
in-memory database and Python lets it keep that around between runs. This is
very fast and efficient, as only the required entries of the database get
loaded instead of the whole database.
One benefit I see from the XML db is for side tools. For example, searching
ebuild descriptions is faster with the XML db, as it is a single file and
software only has to look for strings that start with <description>. One can
use grep/regexps for such queries or build an XML-capable application.
I believe more work needs to be put into this to figure out a better DTD and a
separation of elements that would make more sense for applications such as
kportage and other GUI tools, which try to load everything at startup because
there is no persistent daemon keeping things in memory.
test.sh is the bash script that starts the XML output and then does a recursive
ls of /var/cache/edb/dep. For each file it calls xmltest, a libxml2 app that
only converts the text it reads from ISO-8859-1 to UTF-8 and then outputs it,
escaping special characters as defined in XML 1.0.
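For comparison, the same decode-and-escape step can be sketched in a few lines
of Python using only the standard library. This is an illustration of the
idea, not a port of xmltest; the function name `to_xml_text` is an assumption.

```python
from xml.sax.saxutils import escape


def to_xml_text(raw_bytes):
    """Decode ISO-8859-1 bytes and escape the five XML special characters."""
    text = raw_bytes.decode("iso-8859-1")
    # escape() handles &, < and > itself; quotes are passed as extra entities.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


if __name__ == "__main__":
    print(to_xml_text(b'A & B <"x"> it\'s'))
```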
I'm including only the source. I use the following compile line:
gcc -I /usr/include/libxml2 -o xmltest xmltest.c -lxml2
To run,
./test.sh > gentoo.xml
It generates a 9525071-byte file.
--
Yannick Koehler
[-- Attachment #2: test.sh --]
[-- Type: application/x-shellscript, Size: 1820 bytes --]
[-- Attachment #3: xmltest.c --]
[-- Type: text/x-csrc, Size: 3220 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

/* Convert `in` from `encoding` to UTF-8.
 * Returns a malloc()ed, NUL-terminated string, or NULL on failure. */
unsigned char *
convert (const unsigned char *in, const char *encoding)
{
    unsigned char *out, *shrunk;
    int ret, size, out_size, temp;
    xmlCharEncodingHandlerPtr handler;

    handler = xmlFindCharEncodingHandler(encoding);
    if (!handler) {
        fprintf(stderr, "no handler for encoding %s\n", encoding);
        return NULL;
    }
    size = (int)strlen((const char *)in) + 1;
    out_size = size * 2 - 1;
    out = malloc((size_t)out_size);
    if (!out) {
        fprintf(stderr, "no mem\n");
        return NULL;
    }
    temp = size - 1;
    ret = handler->input(out, &out_size, in, &temp);
    if (ret < 0 || temp - size + 1) {
        /* Diagnostics go to stderr so they never end up in the XML. */
        if (ret < 0)
            fprintf(stderr, "conversion wasn't successful.\n");
        else
            fprintf(stderr, "conversion wasn't successful. converted: %i octets.\n", temp);
        free(out);
        return NULL;
    }
    shrunk = realloc(out, (size_t)out_size + 1);
    if (!shrunk) {
        free(out);
        fprintf(stderr, "no mem\n");
        return NULL;
    }
    shrunk[out_size] = 0; /* null terminating out */
    return shrunk;
}

int
main(int argc, char **argv)
{
    FILE *f;
    /* One tag per line of a cache file, in order. */
    const char *tags[] = { "depend","runtime-depend","slot","sources","restrict","homepage","license","description","keywords","inherited","uses","cdepend","pdepend", NULL};
    int currentLineNumber = 0;
    char buffer[1000] = { 0 };

    if (argc <= 1) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "r");
    if (!f) {
        perror(argv[1]);
        return 1;
    }
    while (fgets(buffer, sizeof(buffer), f)) {
        char *p = strtok(buffer, "\n");
        if (p && p[0]) {
            unsigned char *out = convert((const unsigned char *)p, "ISO-8859-1");
            if (out) {
                size_t i, len = strlen((const char *)out);
                printf(" <%s>", tags[currentLineNumber]);
                /* Escape the converted text, not the raw input line. */
                for (i = 0; i < len; i++) {
                    switch (out[i]) {
                    case '&':  fputs("&amp;", stdout);  break;
                    case '\'': fputs("&apos;", stdout); break;
                    case '"':  fputs("&quot;", stdout); break;
                    case '<':  fputs("&lt;", stdout);   break;
                    case '>':  fputs("&gt;", stdout);   break;
                    default:   putchar(out[i]);         break;
                    }
                }
                printf("</%s>\n", tags[currentLineNumber]);
                free(out);
            }
        }
        /* Increment and test currentLineNumber */
        if (!tags[++currentLineNumber]) {
            break;
        }
    }
    fclose(f);
    return 0;
}
[-- Attachment #4: Type: text/plain, Size: 37 bytes --]
--
gentoo-dev@gentoo.org mailing list
* Re: [gentoo-dev] Gentoo XML Database
2003-02-07 17:34 [gentoo-dev] Gentoo XML Database Yannick Koehler
@ 2003-02-07 18:28 ` Vano D
2003-02-07 18:41 ` Vano D
2003-02-08 15:14 ` [gentoo-dev] " Denys Duchier
2003-02-11 14:56 ` [gentoo-dev] Gentoo XML Database: More Data Yannick Koehler
2 siblings, 1 reply; 7+ messages in thread
From: Vano D @ 2003-02-07 18:28 UTC
To: gentoo-dev
One interesting application I can think of for a database backend is
a "portage server" that serves portage ebuilds and records the cache
information (as in what is installed with what USE flags) for every single
client machine that has an "account" on the db. In effect, none of your
machines would need any portage-related files apart from the "emergedb"
command and accessory tools. It could be useful for anyone who needs to
deploy a lot of differently configured Gentoo machines really fast. It could
also be useful for administering production Gentoo machines, freeing them of
any "portage bloat" (bloat not in the bad sense, I love portage ;-); here you
would have the machines connect to your central portage server. Very
interesting, even though it is of limited use to many people.
On Fri, 2003-02-07 at 18:34, Yannick Koehler wrote:
> For the fun of it, I created a small, very custom and untested tool that reads
> the Gentoo cache files and writes a valid XML file to stdout.
>
> The schema/DTD was created without much thought. This may or may not open
> the door for people to experiment with a Gentoo-equivalent database.
...
> A discussion with carpaski suggested that using the database would not
> actually speed up emerge, because emerge loads the cache into an internal
> in-memory database and Python lets it keep that around between runs. This is
> very fast and efficient, as only the required entries of the database get
> loaded instead of the whole database.
>
> One benefit I see from the XML db is for side tools. For example, searching
> ebuild descriptions is faster with the XML db, as it is a single file and
> software only has to look for strings that start with <description>. One can
> use grep/regexps for such queries or build an XML-capable application.
--
Vano D <gentoo-dev@europeansoftware.com>
* Re: [gentoo-dev] Gentoo XML Database
2003-02-07 18:28 ` Vano D
@ 2003-02-07 18:41 ` Vano D
2003-02-07 18:46 ` Yannick Koehler
0 siblings, 1 reply; 7+ messages in thread
From: Vano D @ 2003-02-07 18:41 UTC
To: gentoo-dev
Sorry for double posting.
If that idea is extended, and assuming you have machines with different
specs in a big organisation that you want to deploy Gentoo clients to,
you can in effect have a
"configuration management center" server to configure and
manage the software on all of the Gentoo machines in that organisation.
Of course, if all machines have the same specs you can still use this system,
but without the need to compile software for each machine.
I think the idea is very interesting and can be useful.
--
Vano D <gentoo-dev@europeansoftware.com>
* Re: [gentoo-dev] Gentoo XML Database
2003-02-07 18:41 ` Vano D
@ 2003-02-07 18:46 ` Yannick Koehler
2003-02-07 19:10 ` Vano D
0 siblings, 1 reply; 7+ messages in thread
From: Yannick Koehler @ 2003-02-07 18:46 UTC
To: gentoo-dev
On February 7, 2003 01:41 pm, Vano D wrote:
> Sorry for double posting.
>
> If that idea is extended, and assuming you have machines with different
> specs in a big organisation that you want to deploy Gentoo clients to,
> you can in effect have a
> "configuration management center" server to configure and
> manage the software on all of the Gentoo machines in that organisation.
>
> Of course, if all machines have the same specs you can still use this system,
> but without the need to compile software for each machine.
>
> I think the idea is very interesting and can be useful.
Which brings up a very old idea that I posted again on gentoo last summer:
having a script export all config files into an XML database/tree, with
utilities developed to display/present/change this information and then
transform it back into the original /etc files.
One could then export the XML and re-import it into another system. Even
better, you could configure more than just Linux, because the notion of
"users" easily exists in other systems, and applying XSLT to the XML could
help convert it to a similar format for the target platform.
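As a rough illustration of that round trip, here is a sketch that turns
passwd-style lines into an XML tree and back. The element names, the sample
accounts and both function names are assumptions made for this example, not
part of any existing tool.

```python
import xml.etree.ElementTree as ET

# The seven colon-separated fields of a passwd entry.
FIELDS = ["name", "password", "uid", "gid", "gecos", "home", "shell"]

# Hypothetical sample accounts.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1000:100::/home/alice:/bin/bash\n"
)


def passwd_to_xml(passwd_text):
    """Export passwd-style text as a <users> element tree."""
    root = ET.Element("users")
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        user = ET.SubElement(root, "user")
        for field, value in zip(FIELDS, line.split(":")):
            ET.SubElement(user, field).text = value
    return root


def xml_to_passwd(root):
    """Transform a <users> tree back into passwd-style text."""
    lines = []
    for user in root.findall("user"):
        lines.append(":".join(user.findtext(f) or "" for f in FIELDS))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tree = passwd_to_xml(SAMPLE_PASSWD)
    print(ET.tostring(tree).decode("ascii"))
    print(xml_to_passwd(tree), end="")
```

The XML in the middle is the part that could be edited by a GUI or rewritten
with XSLT for another platform before being turned back into a config file.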
--
Yannick Koehler
* Re: [gentoo-dev] Gentoo XML Database
2003-02-07 18:46 ` Yannick Koehler
@ 2003-02-07 19:10 ` Vano D
0 siblings, 0 replies; 7+ messages in thread
From: Vano D @ 2003-02-07 19:10 UTC
To: gentoo-dev
On Fri, 2003-02-07 at 19:46, Yannick Koehler wrote:
> On February 7, 2003 01:41 pm, Vano D wrote:
> > Sorry for double posting.
> >
> > If that idea is extended, and assuming you have machines with different
> > specs in a big organisation that you want to deploy Gentoo clients to,
> > you can in effect have a
> > "configuration management center" server to configure and
> > manage the software on all of the Gentoo machines in that organisation.
> >
> > Of course, if all machines have the same specs you can still use this system,
> > but without the need to compile software for each machine.
> >
> > I think the idea is very interesting and can be useful.
>
> Which brings up a very old idea that I posted again on gentoo last summer:
> having a script export all config files into an XML database/tree, with
> utilities developed to display/present/change this information and then
> transform it back into the original /etc files.
>
> One could then export the XML and re-import it into another system. Even
> better, you could configure more than just Linux, because the notion of
> "users" easily exists in other systems, and applying XSLT to the XML could
> help convert it to a similar format for the target platform.
It is interesting that this issue came up, because I have a friend whose
end-of-year university project was the management and configuration of
software using tools that interact with XML templates. Each software
configuration file (such as proftpd's or samba's config files) is
configured via XML, with an XML schema defining the config file.
You then make "software modules" for each software package you want (in
other words, you write the XML schema for the configuration file(s),
default values, dependencies between directives and values, and a set of
default/secure rules).
He then developed GUI tools to modify the XML parameters locally and
remotely. The whole system also includes dependencies and security.
Originally the whole idea was about security: say, if you define an XYZ
directive in Samba, it won't compromise the system because you also have
an ABC directive somewhere else, etc. So in effect the whole system,
with its dependencies and default/set rules, takes care of security, and
easy configuration comes as a side effect.
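To make the "dependencies between directives" idea concrete, here is a small
sketch of such a rule check. It uses plain Python dictionaries rather than the
XML schema the project is described as using, and the rule itself (pairing
Samba's "guest ok" with "read only") is invented for illustration.

```python
# Hypothetical rule set: if the first directive has the given value,
# the second directive must have its given value too.
RULES = [
    {"if": ("guest ok", "yes"), "then": ("read only", "yes")},
]


def check_config(config, rules):
    """Return a list of human-readable rule violations for the settings."""
    problems = []
    for rule in rules:
        key, value = rule["if"]
        need_key, need_value = rule["then"]
        if config.get(key) == value and config.get(need_key) != need_value:
            problems.append(
                "%s=%s requires %s=%s" % (key, value, need_key, need_value)
            )
    return problems


if __name__ == "__main__":
    print(check_config({"guest ok": "yes", "read only": "no"}, RULES))
```

A GUI front end would run a check like this before writing the config file
back, so an unsafe combination never reaches the machine.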
Just thought I'd let you know about that project, since you seem to be
interested in the same topic. I think that if Gentoo were used with such a
system and with the ideas discussed in previous posts, you could have
one powerful (semi-)automatic configuration management system with all the
bells and whistles.
Check http://inseguro.org/; it's all in Spanish, unfortunately. It has
some screenshots of his GUIs for the configuration management, and also RPM
binaries for Red Hat. He intends to release the code for everything when
it reaches 1.0.
--
Vano D <gentoo-dev@europeansoftware.com>
* [gentoo-dev] Re: Gentoo XML Database
2003-02-07 17:34 [gentoo-dev] Gentoo XML Database Yannick Koehler
2003-02-07 18:28 ` Vano D
@ 2003-02-08 15:14 ` Denys Duchier
2003-02-11 14:56 ` [gentoo-dev] Gentoo XML Database: More Data Yannick Koehler
2 siblings, 0 replies; 7+ messages in thread
From: Denys Duchier @ 2003-02-08 15:14 UTC
To: gentoo-dev
[-- Attachment #1: Type: text/plain, Size: 9328 bytes --]
I quite liked the idea of reflecting the portage database in XML, but
not much the use of an auxiliary C program. Besides, there was much
more to be milked from portage. So here is my take on it. I wrote it
in Python, and I tried to properly parse the dependency specs (I hope
I got it right, but I have never been able to locate any realistic
documentation for that bizarre syntax which seems to have more in
common with vogon poetry than with a specification language :-)
Simply invoke:
python toxml.py > EDB.xml
I attach the file "toxml.py" below:
import string,re,os
######################################################################
class Disj:
    """A '|| ( ... )' group: any one of the alternatives will do."""
    def __init__(self,alts):
        self.alts = alts
    def __str__(self):
        return "<Disj %s>" % str(self.alts)
    def xml(self,indent):
        subindent = indent + ' '
        print("%s<choice>" % indent)
        for a in self.alts:
            a.xml(subindent)
        print("%s</choice>" % indent)

# Attribute values used for the comparison operators.
CMP_NAMES = {
    '>=' : 'ge',
    '<=' : 'le',
    '=<' : 'le',
    '>'  : 'gt',
    '<'  : 'lt',
    '='  : 'eq',
    '!'  : 'ne',
    '~'  : 'newest'
    }

class Pkg:
    """A single package atom with an optional comparison and '*' suffix."""
    def __init__(self,cmp,nam,star):
        self.cmp = cmp
        self.name = nam
        self.newest = star
    def __str__(self):
        cmp = self.cmp or ''
        nam = self.name
        star = ''
        if self.newest: star='*'
        return "<Package %s%s%s>" % (cmp,nam,star)
    def xml(self,indent):
        cmp = self.cmp
        if cmp:
            cmp = " cmp='%s'" % CMP_NAMES[cmp]
        else:
            cmp = ''
        if self.newest:
            newest = " newest='yes'"
        else:
            newest = ''
        print("%s<package name='%s'%s%s/>" % (indent,self.name,cmp,newest))

class Use:
    """A USE flag test; val is False for the negated form '!flag?'."""
    def __init__(self,var,val):
        self.var = var
        self.val = val
    def __str__(self):
        var = self.var
        val = '!'
        if self.val: val=''
        return "<Use %s%s>" % (val,var)

class Cond:
    """A conditional dependency: 'flag? yes-part : no-part'."""
    def __init__(self,use,yes,no):
        self.use = use
        self.yes = yes
        self.no = no
    def __str__(self):
        use = str(self.use)
        yes = str(self.yes)
        no = str(self.no)
        return "<Cond %s yes=%s no=%s>" % (use,yes,no)
    def xml(self,indent):
        var = self.use.var
        subindent = indent + ' '
        subsubindent = subindent + ' '
        yes = self.yes
        no = self.no
        # A negated flag simply swaps the two branches.
        if not self.use.val:
            yes,no = no,yes
        print("%s<test use='%s'>" % (indent,var))
        if yes:
            print("%s<when value='yes'>" % subindent)
            for d in yes:
                d.xml(subsubindent)
            print("%s</when>" % subindent)
        if no:
            print("%s<when value='no'>" % subindent)
            for d in no:
                d.xml(subsubindent)
            print("%s</when>" % subindent)
        print("%s</test>" % indent)
######################################################################
TOKEN_RE = re.compile("^([()?:~!*]|\\|\\||>=|<=|=<|<|>|=|[#a-zA-Z0-9/.+_-]+)(.*)$")

def tokenize(s):
    tokens=[]
    for x in s.split():
        while x:
            res = TOKEN_RE.match(x)
            if not res:
                raise ValueError("cannot tokenize %r in file %s" % (x,FILE))
            tokens.append(res.group(1))
            x = res.group(2)
    return tokens

FILE = None
TOKS = None

def parse_error():
    TOKS.reverse()
    raise ValueError("parse error file=%s tokens=%s" % (FILE,TOKS))

def parse(s):
    # TOKS is kept reversed so that the next token is always TOKS[-1].
    global TOKS
    TOKS = tokenize(s)
    TOKS.reverse()
    deps = []
    while TOKS:
        deps.extend(parse_dep())
    return deps

def parse_dep():
    if not TOKS: parse_error()
    elif TOKS[-1]=='(':
        TOKS.pop()
        deps = []
        while TOKS and TOKS[-1]!=')':
            deps.extend(parse_dep())
        if TOKS and TOKS[-1]==')':
            TOKS.pop()
            return deps
        else: parse_error()
    elif TOKS[-1]=='||':
        TOKS.pop()
        if TOKS and TOKS[-1]=='(':
            return [Disj(parse_dep())]
        else:
            parse_error()
    else:
        use = parse_use()
        if use:
            yes = parse_dep()
            if TOKS and TOKS[-1]==':':
                TOKS.pop()
                no = parse_dep()
            else:
                no = []
            return [Cond(use,yes,no)]
        else:
            return [parse_pkg()]

LETTER_RE = re.compile("[#a-zA-Z0-9]")

def is_name(s):
    return LETTER_RE.match(s)

def parse_use():
    n = len(TOKS)
    if n >= 3 and TOKS[-1]=='!' and is_name(TOKS[-2]) and TOKS[-3]=='?':
        TOKS.pop()
        var = TOKS.pop()
        val = False
        TOKS.pop()
        return Use(var,val)
    elif n >= 2 and is_name(TOKS[-1]) and TOKS[-2]=='?':
        var = TOKS.pop()
        val = True
        TOKS.pop()
        return Use(var,val)
    elif n >= 1 and TOKS[-1]=='?':
        # I don't understand what this is supposed to mean
        TOKS.pop()
        return Use("",True)
    else:
        return None

CMP = ['>=','<=','=<','<','>','=','~','!']
CMP_NEG = {
    '>=' : '<',
    '<=' : '>',
    '=<' : '>',
    '<'  : '>=',
    '>'  : '=<' }
CMP_NEG_KEYS = CMP_NEG.keys()

def parse_pkg():
    if TOKS and (TOKS[-1] in CMP):
        cmp = TOKS.pop()
        # '!' followed by a comparison negates that comparison.
        if cmp=='!' and TOKS and (TOKS[-1] in CMP_NEG_KEYS):
            cmp = TOKS.pop()
            cmp = CMP_NEG[cmp]
    else:
        cmp = None
    if TOKS and is_name(TOKS[-1]):
        nam = TOKS.pop()
    else:
        parse_error()
    if TOKS and TOKS[-1]=='*':
        star = True
        TOKS.pop()
    else:
        star = False
    return Pkg(cmp,nam,star)
######################################################################
# One tag per line of a cache file, in order.
TAGS = [ "depend"       ,
         "rdepend"      ,
         "slot"         ,
         "sources"      ,
         "restrict"     ,
         "homepage"     ,
         "license"      ,
         "description"  ,
         "keywords"     ,
         "inherited"    ,
         "uses"         ,
         "cdepend"      ,
         "pdepend"      ]

def do_file(filename):
    # FILE is only used to make parse errors easier to locate.
    global FILE
    #print "do_file(%s)" % filename
    f = open(filename)
    lines = f.readlines()
    f.close()
    table = {}
    for tag,line in zip(TAGS,lines):
        line = line.strip()
        if tag=="description" or tag=="homepage" or tag=="slot":
            table[tag] = line
        elif tag=="depend" or tag=="rdepend":
            FILE = filename
            table[tag] = parse(line)
        else:
            table[tag] = line.split()
    return table
DIR = "/var/cache/edb/dep"
PACKAGE_REGEX = re.compile("^(.+)-([0-9]+(\\.[0-9]+)*[a-zA-Z]?(_(alpha|beta|pre|rc|p)[0-9]*)?(-r[0-9]+)?)$")

class Database:
    def __init__(self):
        self.table = {}
    def xml(self,indent):
        print("%s<database>" % indent)
        subindent = indent + ' '
        for c in self.table.values():
            c.xml(subindent)
        print("%s</database>" % indent)

class Category:
    def __init__(self,cat):
        self.name = cat
        self.table = {}
    def xml(self,indent):
        print("%s<category name='%s'>" % (indent,self.name))
        subindent = indent + ' '
        for p in self.table.values():
            p.xml(subindent)
        print("%s</category>" % indent)

class Package:
    def __init__(self,pkg):
        self.name = pkg
        self.table = {}
    def xml(self,indent):
        print("%s<package name='%s'>" % (indent,self.name))
        subindent = indent + ' '
        for v in self.table.values():
            v.xml(subindent)
        print("%s</package>" % indent)

class Version:
    def __init__(self,ver):
        self.version = ver
        self.table = {}
    def xml(self,indent):
        print("%s<version number='%s'>" % (indent,self.version))
        subindent = indent + ' '
        subsubindent = subindent + ' '
        for tag in TAGS:
            val = self.table.get(tag,None)
            if not val:
                print("%s<%s/>" % (subindent,tag))
            elif tag=="description" or tag=="homepage" or tag=="slot":
                print("%s<%s>%s</%s>" % (subindent,tag,escape(val),tag))
            elif tag=="depend" or tag=="rdepend":
                print("%s<%s>" % (subindent,tag))
                for d in val:
                    d.xml(subsubindent)
                print("%s</%s>" % (subindent,tag))
            else:
                # The remaining tags hold word lists; without this branch
                # their non-empty values were silently dropped.
                print("%s<%s>%s</%s>" % (subindent,tag,escape(' '.join(val)),tag))
        print("%s</version>" % indent)

def escape(s):
    # '&' must be replaced first so the other entities are not re-escaped.
    s = s.replace("&","&amp;")
    s = s.replace("<","&lt;")
    s = s.replace(">","&gt;")
    s = s.replace('"',"&quot;")
    s = s.replace("'","&apos;")
    return s

def do_categories():
    DB = Database()
    full_table = DB.table
    for cat in os.listdir(DIR):
        curdir = DIR+"/"+cat
        files=os.listdir(curdir)
        files.sort()
        CAT = Category(cat)
        cat_table = CAT.table
        full_table[cat] = CAT
        for f in files:
            res = PACKAGE_REGEX.match(f)
            if not res:
                continue  # not a name-version cache entry
            pkg = res.group(1)
            ver = res.group(2)
            if pkg in cat_table:
                PKG = cat_table[pkg]
                pkg_table = PKG.table
            else:
                PKG = Package(pkg)
                pkg_table = PKG.table
                cat_table[pkg] = PKG
            VER = Version(ver)
            VER.table = do_file(curdir+'/'+f)
            pkg_table[ver] = VER
    return DB

DB = do_categories()
DB.xml('')
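To see what the tokenizer makes of that dependency syntax, here is a
standalone sketch that reuses the TOKEN_RE pattern from toxml.py on a made-up
dependency string. The sample string is hypothetical and only meant to show
how operators, parentheses and package names are split apart.

```python
import re

# Same token pattern as TOKEN_RE in toxml.py.
TOKEN_RE = re.compile(
    r"^([()?:~!*]|\|\||>=|<=|=<|<|>|=|[#a-zA-Z0-9/.+_-]+)(.*)$"
)


def tokenize(spec):
    """Split a dependency string into operator, bracket and name tokens."""
    tokens = []
    for word in spec.split():
        while word:
            match = TOKEN_RE.match(word)
            if not match:
                raise ValueError("cannot tokenize %r" % word)
            tokens.append(match.group(1))
            word = match.group(2)
    return tokens


# Hypothetical dependency string, made up for illustration.
SPEC = "gnome? ( >=gnome-base/libgnome-2.0 ) || ( app-editors/vim app-editors/nano )"

if __name__ == "__main__":
    print(tokenize(SPEC))
```

Note that "gnome?" becomes two tokens, the flag name followed by "?", which is
the order parse_use() expects when it looks at the end of the reversed list.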
[-- Attachment #2: Type: text/plain, Size: 73 bytes --]
Cheers,
--
Dr. Denys Duchier
Équipe Calligramme
LORIA, Nancy, FRANCE
* [gentoo-dev] Gentoo XML Database: More Data
2003-02-07 17:34 [gentoo-dev] Gentoo XML Database Yannick Koehler
2003-02-07 18:28 ` Vano D
2003-02-08 15:14 ` [gentoo-dev] " Denys Duchier
@ 2003-02-11 14:56 ` Yannick Koehler
2 siblings, 0 replies; 7+ messages in thread
From: Yannick Koehler @ 2003-02-11 14:56 UTC
To: gentoo-dev
On February 7, 2003 12:34 pm, Yannick Koehler wrote:
> A discussion with carpaski suggested that using the database would not
> actually speed up emerge, because emerge loads the cache into an internal
> in-memory database and Python lets it keep that around between runs. This is
> very fast and efficient, as only the required entries of the database get
> loaded instead of the whole database.
Just a note about that comment I made. When I said that it wouldn't speed up
emerge, I was referring to specific functions. For example, if you run this
for the first time:
emerge kde
emerge will fetch the kde ebuild, read its dependencies and fetch all of
them, cache this information in its internal persistent db and then execute
the operation.
In a DB mode, the database, or at least its index, needs to be loaded in some
way. It is hard to imagine that the number of I/O operations would actually be
lower than in the current approach described above.
But there are cases where a db would speed up portage, and that's why the XML
file is getting interesting. It can be imported into a db that knows about
XML, or tried out with XML/text-related tools.
> One benefit I see from the XML db is for side tools. For example, searching
> ebuild descriptions is faster with the XML db, as it is a single file and
> software only has to look for strings that start with <description>. One can
> use grep/regexps for such queries or build an XML-capable application.
I have done the following experiment. When I posted the original mail, I
generated the gentoo.xml file. The file was actually 4762563 bytes. I
reported 9525071 bytes, but that was wrong: my script had generated the
content twice. I found this by using
grep "version name=\"kdelibs-3.1" gentoo.xml
which output two instances. After correcting this, the file was smaller.
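The same duplicate check can be automated for every entry at once. A minimal
sketch, assuming entries look like `<version name="...">` as in the grep
above; the sample text and the function name are made up for illustration.

```python
import re
from collections import Counter

VERSION_RE = re.compile(r'version name="([^"]*)"')

# Hypothetical sample in which kdelibs-3.1 was written out twice.
SAMPLE_XML = """\
<version name="kdelibs-3.1">
</version>
<version name="vim-6.1">
</version>
<version name="kdelibs-3.1">
</version>
"""


def duplicated_versions(xml_text):
    """Return the sorted names that occur more than once."""
    counts = Counter(VERSION_RE.findall(xml_text))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicated_versions(SAMPLE_XML))
```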
Something else that caught my attention is that generating gentoo.xml today
gave me a 4852253-byte file. Now if you compare the dates and sizes:
2003-02-07 12:34 -> 4762563 bytes
2003-02-11 09:10 -> 4852253 bytes
This is an 89690-byte difference over the 4 days. Running emerge rsync daily
costs me more than 1.2 megs a shot. So, a quick calculation:
11 Feb - 7 Feb = 4 days.
4 * 1.2 megs = 4.8 megs
growth of gentoo.xml over those 4 days = ~90k
4.8 megs - 90k = ~4.71 megs
Which means that, if gentoo.xml contained all the info required for
calculating the dependencies and fetching only the required ebuilds, which I'm
pretty sure it does, I have wasted about 4.71 megs of bandwidth over the
past 4 days.
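The estimate can be checked with a few lines of arithmetic. This sketch treats
the growth in file size as the total change over the four days and takes
"1.2 megs" as 1,200,000 bytes; both are simplifying assumptions, since the
size difference only approximates what a real transfer of the changed parts
would cost.

```python
SIZE_FEB_07 = 4762563          # gentoo.xml on 2003-02-07, in bytes
SIZE_FEB_11 = 4852253          # gentoo.xml on 2003-02-11, in bytes
DAYS = 4
RSYNC_BYTES_PER_DAY = 1200000  # assumed value for "1.2 megs a shot"

xml_growth = SIZE_FEB_11 - SIZE_FEB_07    # total growth over the 4 days
rsync_total = DAYS * RSYNC_BYTES_PER_DAY  # what four daily rsyncs cost
saved = rsync_total - xml_growth          # rough upper bound on the saving

if __name__ == "__main__":
    print("xml growth:  %d bytes" % xml_growth)
    print("rsync total: %d bytes" % rsync_total)
    print("difference:  %d bytes" % saved)
```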
Now consider that I'm not alone, and that in my case I have a shared
repository both at home and at work; this is a huge waste of bandwidth for
only 4 days... And those are bytes, not bits...
Other information...
ykoehler@corneille ykoehler $ time grep "version name=" -c gentoo.xml gentoo2.xml
gentoo.xml:7262
gentoo2.xml:7373
real 0m0.107s
user 0m0.010s
sys 0m0.050s
It takes less than 1 second for grep to scan the files and count all
occurrences of version name=". While it is true that grep doesn't build a
data structure in memory or parse the inner part of the version tag, this
makes me rethink my original discussion with carpaski, where we came to
the conclusion that the speedup for emerge would be minimal. I now think
that a proper XML file, one whose generation is designed to minimize parsing
even further, has the potential for huge savings in bandwidth, hard disk
space and speed on a local PC.
It also takes less time to generate the XML file than to run emerge rsync
for 1 day of changes. For example, I ran emerge rsync this morning:
rsync[15612] (receiver) heap statistics:
arena: 5362104 (bytes from sbrk)
ordblks: 551 (chunks not in use)
smblks: 2
hblks: 1 (chunks from mmap)
hblkhd: 258048 (bytes from mmap)
usmblks: 0
fsmblks: 40
uordblks: 4667912 (bytes used)
fordblks: 694192 (bytes free)
keepcost: 53464 (bytes in releasable chunk)
Number of files: 36464
Number of files transferred: 14444
Total file size: 29101431 bytes
Total transferred file size: 14445832 bytes
Literal data: 69 bytes
Matched data: 14445763 bytes
File list size: 848703
Total bytes written: 408076
Total bytes read: 1349498
wrote 408076 bytes read 1349498 bytes 1858.88 bytes/sec
total size is 29101431 speedup is 16.56
>>> Updating Portage cache...
real 16m19.682s
user 0m21.990s
sys 0m40.790s
So 16 minutes and 1.3 megs later, I got the database update for today...
corneille dep # time /home/ykoehler/test.sh >/home/ykoehler/gentoo3.xml
real 1m17.284s
user 0m21.780s
sys 0m32.350s
This generated a gentoo.xml file of 4865557 bytes. Had I rsynced it from the
server, that would have translated into (4865557 - 4852253) = 13304 bytes.
I heard that there is already work on getting portage to use a real db such
as Berkeley DB or MySQL. In any case, I think the distribution format that
makes the most sense is XML. This format can easily be manipulated using XSLT
to fit the needs of many people and can be used with almost any existing text
tool. Using XSL it can also be nicely converted into HTML, and a dependency
tree is easily built from there. rsync could quickly figure out which parts
changed and update those in a fraction of the time it takes today to diff the
trees.
Hardware info:
P3 800 MHz
Slow IDE Hard Disk 5400 rpm
256 megs ram
i810 chipset
So, with those numbers in hand, I'm about to try moving emerge to this
system on my PC to get you more "real" numbers in that mode.
--
Yannick Koehler