From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gentoo-dev+bounces-74626-garchives=archives.gentoo.org@lists.gentoo.org>
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	by finch.gentoo.org (Postfix) with ESMTP id C104F59CA3
	for <garchives@archives.gentoo.org>; Sun, 14 Feb 2016 11:38:36 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id A860721C01E;
	Sun, 14 Feb 2016 11:38:28 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id 9F2B6E0782
	for <gentoo-dev@lists.gentoo.org>; Sun, 14 Feb 2016 11:38:27 +0000 (UTC)
Received: from [IPv6:2a02:8109:a640:180c:5ee0:c5ff:fe8e:77db] (unknown [IPv6:2a02:8109:a640:180c:5ee0:c5ff:fe8e:77db])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	(Authenticated sender: patrick)
	by smtp.gentoo.org (Postfix) with ESMTPSA id BEC50340B55
	for <gentoo-dev@lists.gentoo.org>; Sun, 14 Feb 2016 11:38:25 +0000 (UTC)
Subject: Re: [gentoo-dev] Uncoordinated changes
To: gentoo-dev@lists.gentoo.org
References: <56BCEBE6.8090404@gentoo.org>
From: Patrick Lauer <patrick@gentoo.org>
X-Enigmail-Draft-Status: N1110
Message-ID: <56C066FD.9010106@gentoo.org>
Date: Sun, 14 Feb 2016 12:37:33 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.5.0
Precedence: bulk
List-Post: <mailto:gentoo-dev@lists.gentoo.org>
List-Help: <mailto:gentoo-dev+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-dev+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-dev+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-dev.gentoo.org>
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
In-Reply-To: <56BCEBE6.8090404@gentoo.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Archives-Salt: b3fc0148-7ed3-4932-95a4-b84c20bd414c
X-Archives-Hash: d8fc6f6ed66b5fbfca5ab755135f5a86

On 02/11/2016 09:15 PM, Patrick Lauer wrote:
> Now instead of looking up [metadata.xml] -> (herd name) -> [herds.xml]
> -> email it goes backwards:
> [metadata.xml] -> (maintainer type=project) -> email -> [projects.xml]
> -> Project name
>
> Since this involves XML and python's ElementTree library it's a
> nontrivial change that also removes a few now useless helpers
> (_get_herd_email has no reason to be, but we'd need a _get_herd_name
> helper instead. Err, get_proj ... ah well, whatever name works)
>
> And all that just so (1) gentoolkit output works and (2) euscan updates
> properly. Both of which I don't really care about much, but now that
> I've invested ~4h into debugging and trying to fix it I'm a tiny bit
> IRRITATED.
>
So this turns out to be more fun than expected.

Having spent a little bit of time staring at XML, DTDs and wondering why
we do things the most difficult way ...

Previously the herd tag was defined as:
<!ELEMENT herd (#PCDATA)>

So we end up with, for example:
<herd>kde</herd>

The new schema collapses herd (err, project!) into maintainers (err,
sustainers ... staff ... linchpin?)
And maintainer is defined as:
<!ELEMENT maintainer ( email, (description| name)* )>

Which means that only email is mandatory. So instead of searching by name
you are now required to search by email.
And it leads to inconsistent (partial) duplication: some metadata.xml
entries carry a Name, some a Description, and some are Email only.
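
To make the "only email is mandatory" point concrete, here is a minimal
sketch using Python's ElementTree. The sample metadata.xml content, the
example.org addresses and the helper name list_maintainers are made up
for illustration, and the type="person" value is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml content; addresses and names are invented.
METADATA = """\
<pkgmetadata>
  <maintainer type="project">
    <email>kde@example.org</email>
    <name>KDE</name>
  </maintainer>
  <maintainer type="person">
    <email>dev@example.org</email>
    <description>Primary maintainer</description>
  </maintainer>
  <maintainer>
    <email>other@example.org</email>
  </maintainer>
</pkgmetadata>
"""


def list_maintainers(xml_text):
    """Return one (email, name, description) tuple per maintainer.

    Only email is guaranteed by the DTD; findtext() returns None for
    the optional children when they are absent.
    """
    root = ET.fromstring(xml_text)
    return [
        (m.findtext("email"), m.findtext("name"), m.findtext("description"))
        for m in root.findall("maintainer")
    ]


maintainers = list_maintainers(METADATA)
for email, name, description in maintainers:
    print(email, name, description)
```

Any consumer has to cope with name and description being None, so email
is the only key it can rely on.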

For gentoolkit, for example, this means that what used to be a search by
name now has to be a search by email, and the previous search-by-name
functionality requires herds.xml, err, projects.xml to figure out the
name of a project. Which might not match the one in metadata.xml!
(And you may need to filter out maintainers-that-are-not-projects, and
what about maintainers that are undefined? So much extra code complexity!)
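
Roughly, the new lookup chain could be sketched like this. It is not the
actual gentoolkit code: the projects.xml layout (project elements with
email and name children), the sample data and the function name
project_names are all assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed projects.xml layout: <project> with <email> and <name> children.
PROJECTS = """\
<projects>
  <project>
    <email>kde@example.org</email>
    <name>Gentoo KDE</name>
  </project>
</projects>
"""

# Hypothetical metadata.xml content.
METADATA = """\
<pkgmetadata>
  <maintainer type="project">
    <email>kde@example.org</email>
  </maintainer>
  <maintainer type="person">
    <email>dev@example.org</email>
  </maintainer>
  <maintainer>
    <email>other@example.org</email>
  </maintainer>
</pkgmetadata>
"""


def project_names(metadata_text, projects_text):
    """Map project maintainers in metadata.xml to names from projects.xml."""
    by_email = {
        p.findtext("email"): p.findtext("name")
        for p in ET.fromstring(projects_text).findall("project")
    }
    names = []
    for m in ET.fromstring(metadata_text).findall("maintainer"):
        if m.get("type") != "project":
            continue  # skip people and maintainers without a type
        email = m.findtext("email")
        # Fall back to the email when projects.xml does not know it.
        names.append(by_email.get(email, email))
    return names


print(project_names(METADATA, PROJECTS))
```

The filtering and the fallback are exactly the extra branches that the
old herd lookup never needed.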

And this is why I avoided the topic and hoped that the 'migration' would
make sense:
(1) Using XML is mildly insane. Neither machine- nor human-readable
(2) The DTD is even more insane, and few people have the patience to
figure it out
(3) The recent changes to the DTD change the data model in subtle ways
so that there's even *more* denormalization possible
(4) The tooling is, due to XML, wonderfully horrible and requires things
like XPath to get the required data (because query by attribute is
harder than query by tag)
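
To illustrate point (4), a small sketch of the two query styles with
ElementTree's limited XPath support. The XML snippets and the
example.org addresses are invented.

```python
import xml.etree.ElementTree as ET

# Old style: the herd is its own tag.
OLD = "<pkgmetadata><herd>kde</herd></pkgmetadata>"

# New style: projects are maintainers distinguished only by an attribute.
NEW = """\
<pkgmetadata>
  <maintainer type="project"><email>kde@example.org</email></maintainer>
  <maintainer type="person"><email>dev@example.org</email></maintainer>
</pkgmetadata>
"""

# Query by tag: a plain tag name is enough.
herds = [h.text for h in ET.fromstring(OLD).findall("herd")]

# Query by attribute: needs an XPath predicate on the type attribute.
project_emails = [
    e.text
    for e in ET.fromstring(NEW).findall("maintainer[@type='project']/email")
]

print(herds)
print(project_emails)
```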

There are fundamental questions that should be handled before doing more
modifications - for example, should the data be more normalized (e.g.
name only in projects.xml / maintainers.xml and only email in
metadata.xml)? If we allow denormalization, do we have tools to check
and autocorrect (e.g. a maintainer changing name)?
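
A check for that kind of drift does not have to be big. The sketch below
is not an existing tool, and it assumes that projects.xml holds the
canonical name per email. It reports metadata.xml entries whose name
disagrees. The data and the function name name_mismatches are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed canonical source: projects.xml with <email> and <name> per project.
PROJECTS = """\
<projects>
  <project>
    <email>kde@example.org</email>
    <name>Gentoo KDE</name>
  </project>
</projects>
"""

# Hypothetical metadata.xml carrying a stale copy of the project name.
METADATA = """\
<pkgmetadata>
  <maintainer type="project">
    <email>kde@example.org</email>
    <name>KDE</name>
  </maintainer>
</pkgmetadata>
"""


def name_mismatches(metadata_text, projects_text):
    """Return (email, local_name, canonical_name) for every disagreement."""
    canonical = {
        p.findtext("email"): p.findtext("name")
        for p in ET.fromstring(projects_text).findall("project")
    }
    mismatches = []
    root = ET.fromstring(metadata_text)
    for m in root.findall("maintainer[@type='project']"):
        email = m.findtext("email")
        name = m.findtext("name")
        # Entries without a name cannot be stale; unknown emails are skipped.
        if name is not None and email in canonical and name != canonical[email]:
            mismatches.append((email, name, canonical[email]))
    return mismatches


print(name_mismatches(METADATA, PROJECTS))
```

If names lived only in projects.xml, this check would have nothing left
to do.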

Once we decide to abstract it away so that people should use tools and
not mangle it manually (have you looked at herds.xml ?! omg ...) there's
the question ... why XML? It's about the worst format for this job; INI
format is sufficient and easier to parse. Or JSON, or YAML, or whatever
is trendy now. Or do we autogenerate from templates?
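
For comparison, a purely hypothetical INI rendering of the same data,
read with Python's configparser. The section naming scheme
(maintainer:<email>) and the keys are invented here; nothing like this
exists today.

```python
import configparser

# Hypothetical INI layout: one section per maintainer, keyed by email.
INI = """\
[maintainer:kde@example.org]
type = project

[maintainer:dev@example.org]
type = person
"""

parser = configparser.ConfigParser()
parser.read_string(INI)

# Collect the emails of all sections marked as projects.
project_emails = [
    section.split(":", 1)[1]
    for section in parser.sections()
    if parser[section].get("type") == "project"
]

print(project_emails)
```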

Another funny thing: projects.xml is not in the same repository, so
synchronizing changes gets more tricky. And the metadata.dtd is in yet
another place. Wouldn't it make sense to have this organized in a less
confusing way?

You see where this is going - and why I didn't object loudly enough to the
changes: I want to not care about this whole cluster of topics and do
things that are more rewarding. But that choice got taken away when
things broke (oh, they didn't break, they Function Differently now) and
I had to spend some time investigating why things deviate.

Sigh.


Am I grumpy?