Subject: Re: [gentoo-dev] Uncoordinated changes
From: Patrick Lauer
To: gentoo-dev@lists.gentoo.org
Date: Sun, 14 Feb 2016 12:37:33 +0100
Message-ID: <56C066FD.9010106@gentoo.org>
In-Reply-To: <56BCEBE6.8090404@gentoo.org>

On 02/11/2016 09:15 PM, Patrick Lauer wrote:
> Now instead of looking up [metadata.xml] -> (herd name) -> [herds.xml]
> -> email it goes backwards:
> [metadata.xml] -> (maintainer type=project) -> email -> [projects.xml]
> -> Project name
>
> Since this
involves XML and python's ElementTree library it's a
> nontrivial change that also removes a few now useless helpers
> (_get_herd_email has no reason to be, but we'd need a _get_herd_name
> helper instead. Err, get_proj ... ah well, whatever name works)
>
> And all that just so (1) gentoolkit output works and (2) euscan updates
> properly. Both of which I don't really care about much, but now that
> I've invested ~4h into debugging and trying to fix it I'm a tiny bit
> IRRITATED.
>

So this turns out to be more fun than expected. Having spent a little
bit of time staring at XML, DTDs and wondering why we do things the most
difficult way ...

Previously the herd tag was defined as:

So we end up with, for example:
<herd>kde</herd>

The new schema collapses herd (err, project!) into maintainers (err,
sustainers ... staff ... linchpin?)

And maintainer is defined as:

Which means that only email is mandatory. So instead of searching by
name you are now required to search by email. And it leads to
inconsistent (partial) duplication: some metadata.xml entries carry a
name, some a description, and some are email only.

For gentoolkit, for example, this means that search by name has to
become search by email, and the previous search-by-name functionality
requires herds.xml, err, projects.xml to figure out the name of a
project. Which might not match the one in metadata.xml!
(And you may need to filter out maintainers-that-are-not-projects, and
what about maintainers that are undefined? So much extra code
complexity!)

And this is why I avoided the topic and hoped that the 'migration' would
make sense:

(1) Using XML is mildly insane.
Neither machine- nor human-readable
(2) The DTD is even more insane, and few people have the patience to
figure it out
(3) The recent changes to the DTD change the data model in subtle ways
so that there's even *more* denormalization possible
(4) The tooling is, due to XML, wonderfully horrible and requires things
like XPath to get the required data (because query by attribute is
harder than query by tag)

There are fundamental questions that should be handled before doing more
modifications - for example, should the data be more normalized (e.g.
name only in projects.xml / maintainers.xml and only email in
metadata.xml)?
If we allow denormalization, do we have tools to check and autocorrect
(e.g. when a maintainer changes name)?

Once we decide to abstract it away so that people should use tools and
not mangle it manually (have you looked at herds.xml ?! omg ...) there's
the question ... why XML? It's about the worst format for this job; INI
format is sufficient and easier to parse. Or JSON, or YAML, or whatever
is trendy now. Or do we autogenerate from templates?

Another funny thing: projects.xml is not in the same repository, so
synchronizing changes gets trickier. And the metadata.dtd is in yet
another place. Wouldn't it make sense to have this organized in a less
confusing way?

You see where this is going - and why I didn't object loudly enough to
the changes: I want to not care about this whole cluster of topics and
do things that are more rewarding. But that choice got taken away when
things broke (oh, they didn't break, they Function Differently now) and
I had to spend some time investigating why things deviate.

Sigh. Am I grumpy?
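
For illustration, the reversed lookup described above
([metadata.xml] -> maintainer type=project -> email -> [projects.xml]
-> project name) might look roughly like this with ElementTree. This is
only a sketch, not the gentoolkit code: the sample documents, the
assumed <project>/<email>/<name> layout of projects.xml, and the helper
names project_names_by_email, project_names_for_package and
name_mismatches are all made up for the example. The last helper shows
the kind of denormalization check asked about above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element layout is an assumption.
METADATA_XML = """
<pkgmetadata>
  <maintainer type="project">
    <email>kde@gentoo.org</email>
    <name>KDE</name>
  </maintainer>
  <maintainer type="person">
    <email>someone@example.org</email>
  </maintainer>
  <maintainer>
    <email>untyped@example.org</email>
  </maintainer>
</pkgmetadata>
"""

PROJECTS_XML = """
<projects>
  <project>
    <email>kde@gentoo.org</email>
    <name>Gentoo KDE</name>
  </project>
</projects>
"""


def project_names_by_email(projects_root):
    """Build an email -> project name index from a projects.xml-like tree."""
    index = {}
    for project in projects_root.iter("project"):
        email = project.findtext("email")
        name = project.findtext("name")
        if email and name:
            index[email.strip().lower()] = name.strip()
    return index


def project_names_for_package(metadata_root, index):
    """Resolve project maintainers of one package to project names."""
    names = []
    # Attribute predicate: skips persons and maintainers without a type.
    for maintainer in metadata_root.findall("maintainer[@type='project']"):
        email = (maintainer.findtext("email") or "").strip().lower()
        # Fall back to the bare email if projects.xml does not know it.
        names.append(index.get(email, email))
    return names


def name_mismatches(metadata_root, index):
    """List project maintainers whose inline name differs from the index."""
    mismatches = []
    for maintainer in metadata_root.findall("maintainer[@type='project']"):
        email = (maintainer.findtext("email") or "").strip().lower()
        local_name = maintainer.findtext("name")
        canonical = index.get(email)
        if local_name and canonical and local_name.strip() != canonical:
            mismatches.append((email, local_name.strip(), canonical))
    return mismatches


index = project_names_by_email(ET.fromstring(PROJECTS_XML))
metadata_root = ET.fromstring(METADATA_XML)
names = project_names_for_package(metadata_root, index)
mismatches = name_mismatches(metadata_root, index)
print(names)
print(mismatches)
```

Even this toy version needs an attribute query, a fallback for unknown
emails, and a separate pass to notice that the name in metadata.xml
disagrees with the one in projects.xml.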