From: Yannick Koehler <yannick.koehler@colubris.com>
To: gentoo-dev@gentoo.org
Subject: [gentoo-dev] Gentoo XML Database
Date: Fri, 7 Feb 2003 12:34:43 -0500
Message-ID: <200302071234.44766.yannick.koehler@colubris.com>
[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]
For the fun of it, I created a little tool, very custom and untested, that
reads the cache files of Gentoo and writes a valid XML file to stdout.
The schema/DTD was put together without much thought. This may or may not
open the door for people to experiment with an equivalent database for Gentoo.
What's interesting is that the database can be generated from a Gentoo system
fairly easily, thanks to the presence of the cache. One could easily imagine
software that converts ebuilds directly into the XML db instead of going
through the cache.
Discussion with carspaski revealed, though, that using the database would not
actually speed up emerge. emerge loads the cache into an internal in-memory
database, and Python lets it keep that in memory between runs, which makes it
fast and efficient: only the required entries of the database get loaded
instead of the whole database.
One benefit I see in the XML db is for side tools. For example, searching
ebuild descriptions is faster with the XML db, since it is a single file and
the software only needs to look for lines that start with <description>. One
can use grep/regexps for such queries or build an XML-capable application.
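As a rough illustration of such a side tool, here is a small C sketch that scans the generated file line by line and prints the descriptions containing a search string. It relies on xmltest writing one element per line; the helper names extract_description() and search_descriptions() are hypothetical. The match is a case-sensitive strstr() on the still-escaped text, so a search for "&" would have to be written as "&amp;".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` holds a <description> element (one element
 * per line, as xmltest emits), copy its still-escaped text into `out`,
 * truncated to `outlen - 1` bytes, and return 1. Otherwise return 0. */
static int extract_description(const char *line, char *out, size_t outlen)
{
    static const char open_tag[] = "<description>";
    static const char close_tag[] = "</description>";
    const char *start = strstr(line, open_tag);
    const char *end;
    size_t n;

    if (!start || outlen == 0)
        return 0;
    start += sizeof(open_tag) - 1;   /* skip past "<description>" */
    end = strstr(start, close_tag);
    if (!end)
        return 0;
    n = (size_t)(end - start);
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}

/* Hypothetical helper: print to `out` every description read from `in`
 * that contains `needle`. Returns the number of matching descriptions. */
static int search_descriptions(FILE *in, FILE *out, const char *needle)
{
    char line[8192];
    char desc[8192];
    int matches = 0;

    while (fgets(line, sizeof(line), in)) {
        if (extract_description(line, desc, sizeof(desc)) &&
            strstr(desc, needle)) {
            fprintf(out, "%s\n", desc);
            matches++;
        }
    }
    return matches;
}
```

For example, opening gentoo.xml with fopen() and calling search_descriptions(f, stdout, "editor") would list every description mentioning "editor", without any XML parser involved.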
I believe more work needs to be put into this to figure out a better DTD and a
separation of elements that would make more sense for applications such as
kportage and other GUI tools, which try to load everything at startup because
there is no persistent daemon keeping the data in memory.
test.sh is the bash script that starts the XML output and then does a
recursive ls of /var/cache/edb/dep. For each file it calls xmltest, a small
libxml2 program that only converts the text it reads from ISO-8859-1 to UTF-8
and then outputs it with the special characters escaped as defined in XML 1.0.
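For illustration, the two per-line steps xmltest performs can also be written without libxml2, assuming the input really is ISO-8859-1. Each Latin-1 byte is the Unicode code point of the same value, so bytes below 0x80 are copied unchanged and the others become exactly two UTF-8 bytes; that two-bytes-per-input-byte worst case is why convert() sizes its output buffer at about twice the input length. The function name latin1_to_escaped_utf8() is hypothetical, and this is a sketch rather than a replacement for libxml2's converters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated ISO-8859-1 string `in`
 * to UTF-8 and escape & ' " < > with the XML 1.0 predefined entities.
 * Returns a malloc()ed string the caller must free(), or NULL on failure. */
static char *latin1_to_escaped_utf8(const char *in)
{
    /* Worst case per input byte is "&quot;" or "&apos;": 6 output bytes. */
    size_t cap = strlen(in) * 6 + 1;
    unsigned char *out = malloc(cap);
    unsigned char *o = out;
    const unsigned char *p;

    if (!out)
        return NULL;
    for (p = (const unsigned char *)in; *p; p++) {
        const char *entity = NULL;

        switch (*p) {
        case '&':  entity = "&amp;";  break;
        case '\'': entity = "&apos;"; break;
        case '"':  entity = "&quot;"; break;
        case '<':  entity = "&lt;";   break;
        case '>':  entity = "&gt;";   break;
        default:   break;
        }
        if (entity) {
            size_t n = strlen(entity);
            memcpy(o, entity, n);
            o += n;
        } else if (*p < 0x80) {
            *o++ = *p;                                 /* ASCII: unchanged */
        } else {
            *o++ = (unsigned char)(0xC0 | (*p >> 6));   /* lead byte 0xC2/0xC3 */
            *o++ = (unsigned char)(0x80 | (*p & 0x3F)); /* continuation byte */
        }
    }
    *o = '\0';
    return (char *)out;
}
```

For instance, the Latin-1 byte 0xE9 ("é") becomes the UTF-8 pair 0xC3 0xA9, while "&" becomes "&amp;".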
I'm including only the source. I use the following compile line:
gcc -I /usr/include/libxml2 -o xmltest xmltest.c -lxml2
To run:
./test.sh > gentoo.xml
It generates a 9525071-byte file.
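The body of test.sh is not included in the archive. As a hypothetical C illustration of its recursive-listing step only, the sketch below uses POSIX nftw() to walk a directory such as /var/cache/edb/dep and report every regular file; in the real script each of those files is passed to xmltest. The names list_cache_files() and visit() are made up for this example, and the XML wrapper that test.sh prints around the entries is not reproduced.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() callbacks take no user-data pointer, so state lives in globals. */
static FILE *listing_out;   /* where file paths are printed; NULL = silent */
static int listing_count;   /* regular files seen during the current walk */

/* Called once per directory entry; counts and optionally prints
 * regular files, and returns 0 so the walk continues. */
static int visit(const char *path, const struct stat *sb, int typeflag,
                 struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_F && S_ISREG(sb->st_mode)) {
        if (listing_out)
            fprintf(listing_out, "%s\n", path);
        listing_count++;
    }
    return 0;
}

/* Hypothetical helper: list every regular file below `root` to `out`
 * (which may be NULL). Returns the number of files, or -1 on error. */
static int list_cache_files(const char *root, FILE *out)
{
    listing_out = out;
    listing_count = 0;
    /* FTW_PHYS: do not follow symbolic links; 16 = max open descriptors. */
    if (nftw(root, visit, 16, FTW_PHYS) != 0)
        return -1;
    return listing_count;
}
```

Calling list_cache_files("/var/cache/edb/dep", stdout) would print one path per cache entry; the globals make it non-reentrant, which is the usual trade-off with nftw().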
--
Yannick Koehler
[-- Attachment #2: test.sh --]
[-- Type: application/x-shellscript, Size: 1820 bytes --]
[-- Attachment #3: xmltest.c --]
[-- Type: text/x-csrc, Size: 3220 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/encoding.h>

/* Convert the NUL-terminated string `in` from `encoding` to UTF-8 with
 * libxml2's encoding handler. Returns a malloc()ed, NUL-terminated buffer
 * that the caller must free(), or NULL on failure. */
unsigned char *
convert(const unsigned char *in, const char *encoding)
{
    unsigned char *out;
    int ret, size, out_size, temp;
    xmlCharEncodingHandlerPtr handler;

    size = (int)strlen((const char *)in) + 1;
    /* Each ISO-8859-1 byte becomes at most 2 UTF-8 bytes. */
    out_size = size * 2 - 1;
    out = malloc((size_t)out_size + 1);   /* +1 for the terminating NUL */
    if (!out) {
        fprintf(stderr, "no mem\n");
        return NULL;
    }
    handler = xmlFindCharEncodingHandler(encoding);
    if (!handler || !handler->input) {
        free(out);
        return NULL;
    }
    temp = size - 1;                      /* number of input bytes */
    ret = handler->input(out, &out_size, in, &temp);
    /* Depending on the libxml2 version, success is reported as 0 or as the
     * number of bytes written; failures are negative. Diagnostics go to
     * stderr so they do not end up inside the XML on stdout. */
    if (ret < 0 || temp != size - 1) {
        if (ret < 0)
            fprintf(stderr, "conversion wasn't successful.\n");
        else
            fprintf(stderr, "conversion wasn't successful. converted: %i octets.\n", temp);
        free(out);
        return NULL;
    }
    out[out_size] = 0;                    /* NUL-terminate out */
    return out;
}

int
main(int argc, char **argv)
{
    FILE *f;
    /* One element name per line of a cache file, in file order. */
    const char *tags[] = { "depend", "runtime-depend", "slot", "sources",
                           "restrict", "homepage", "license", "description",
                           "keywords", "inherited", "uses", "cdepend",
                           "pdepend", NULL };
    char buffer[8192];  /* longer lines would be split across two tags */
    int currentLineNumber = 0;

    if (argc <= 1) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return 1;
    }
    if (!(f = fopen(argv[1], "r"))) {
        perror(argv[1]);
        return 1;
    }
    while (fgets(buffer, sizeof(buffer), f)) {
        char *p = strtok(buffer, "\n");   /* strip the trailing newline */
        if (p && p[0]) {
            unsigned char *out = convert((const unsigned char *)p, "ISO-8859-1");
            if (out) {
                size_t i, len = strlen((const char *)out);
                printf(" <%s>", tags[currentLineNumber]);
                /* Escape the converted UTF-8 text (out), not the raw input p. */
                for (i = 0; i < len; i++) {
                    switch (out[i]) {
                    case '&':  fputs("&amp;", stdout);  break;
                    case '\'': fputs("&apos;", stdout); break;
                    case '"':  fputs("&quot;", stdout); break;
                    case '<':  fputs("&lt;", stdout);   break;
                    case '>':  fputs("&gt;", stdout);   break;
                    default:   putchar(out[i]);         break;
                    }
                }
                printf("</%s>\n", tags[currentLineNumber]);
                free(out);
            }
        }
        /* Move to the next tag; stop once every known field was handled. */
        if (!tags[++currentLineNumber])
            break;
    }
    fclose(f);
    return 0;
}
[-- Attachment #4: Type: text/plain, Size: 37 bytes --]
--
gentoo-dev@gentoo.org mailing list