* [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
@ 2023-09-15 9:31 Alexander Neuwirth
2023-09-15 10:15 ` Ulrich Mueller
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Neuwirth @ 2023-09-15 9:31 UTC
To: gentoo-dev
Dear Larry,
I am looking for a way to link scientific publications to
ebuilds/packages. The easiest, if hacky, way right now is to use
|<doc lang="doi">https://doi.org/...</doc>|. Integration with
|epkginfo|/|equery meta| works nicely out of the box. However, currently
|pkgcheck| and/or the XML format complain about repeated |lang| entries
and do not allow long |lang| attributes (e.g. |lang="inspirehep"|
fails, understandably).
You can inspect a detailed example here:
https://github.com/gentoo/sci/pull/1216
The more sophisticated way, instead of abusing the |lang| attribute,
would be another attribute, perhaps |reference|, or a new element in
|upstream| instead of |doc|. Before I move forward with this idea, I
would be curious to hear your thoughts on it.
Best,
APN
* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
2023-09-15 9:31 [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers Alexander Neuwirth
@ 2023-09-15 10:15 ` Ulrich Mueller
2023-09-17 12:18 ` Alexander Neuwirth
0 siblings, 1 reply; 7+ messages in thread
From: Ulrich Mueller @ 2023-09-15 10:15 UTC
To: Alexander Neuwirth; +Cc: gentoo-dev
>>>>> On Fri, 15 Sep 2023, Alexander Neuwirth wrote:
> I am looking for a way to link scientific publications to
> ebuilds/packages. The easiest, if hacky, way right now is to use
> |<doc lang="doi">https://doi.org/...</doc>|. Integration with
> |epkginfo|/|equery meta| works nicely out of the box. However,
> currently |pkgcheck| and/or the XML format complain about repeated
> |lang| entries and do not allow long |lang| attributes (e.g.
> |lang="inspirehep"| fails, understandably).
Please don't do this. The lang attribute is of type xs:language [1]
so it must be a valid BCP 47 language tag.
As a matter of fact, "doi" happens to be a valid tag for the Dogri
language [2], but this isn't helpful either.
[1] https://gitweb.gentoo.org/data/xml-schema.git/tree/metadata.xsd?id=db829cfdb40ae0a0034848cce38ee741a7c8d68c#n257
[2] https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?code_ID=117
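
A minimal Python sketch of why |lang="doi"| passes validation while
|lang="inspirehep"| does not. It assumes the validator applies the lexical
pattern that XML Schema 1.0 gives for xs:language; the helper name
is_xs_language is hypothetical. The pattern checks only the shape of the
tag, not whether it names a registered language.

```python
import re

# Lexical pattern for xs:language from XML Schema Part 2 (XSD 1.0).
# Assumption: the validator in use applies exactly this pattern. It limits
# each subtag to 8 characters and does not consult any language registry.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xs_language(value: str) -> bool:
    """Return True if value has the lexical form of an xs:language."""
    return XS_LANGUAGE.fullmatch(value) is not None


for tag in ("en", "doi", "inspirehep"):
    print(tag, is_xs_language(tag))
```

"inspirehep" is ten letters long, so it exceeds the eight-character limit
on the first subtag, whereas "doi" is well-formed and merely means
something else.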
* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
2023-09-15 10:15 ` Ulrich Mueller
@ 2023-09-17 12:18 ` Alexander Neuwirth
2023-09-17 15:18 ` Ulrich Mueller
2023-09-17 18:28 ` Florian Schmaus
0 siblings, 2 replies; 7+ messages in thread
From: Alexander Neuwirth @ 2023-09-17 12:18 UTC
To: gentoo-dev
Thanks. Instead of using the lang entry I can imagine these other
approaches:
1. doi/arxiv/... links could also easily be plugged in as custom upstream
remote ids, but that also feels a bit wrong since all other
[upstream remote ids](https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types)
are repos/source code providers.
2. Adding something specific to GLEP 68, like `<upstream><reference
type="doi"> https...`. However, that seems like a bit too much work for
adding something that only a small subset of users (science) cares
about. Also, integration of parsing with existing tools is extra overhead.
3. Put them also into `HOMEPAGE` of the ebuilds. Again a bit of a wrong
place, but with the (minor) advantage of possibly having different/new
references per version.
Is any of these three superior/preferable?
* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
2023-09-17 12:18 ` Alexander Neuwirth
@ 2023-09-17 15:18 ` Ulrich Mueller
2023-09-17 18:28 ` Florian Schmaus
1 sibling, 0 replies; 7+ messages in thread
From: Ulrich Mueller @ 2023-09-17 15:18 UTC
To: Alexander Neuwirth; +Cc: gentoo-dev
>>>>> On Sun, 17 Sep 2023, Alexander Neuwirth wrote:
> Thanks. Instead of using the lang entry I can imagine these other
> approaches:
> 1. doi/arxiv/... links could also easily be plugged in as custom upstream
> remote ids, but that also feels a bit wrong since all other
> [upstream remote ids](https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types)
> are repos/source code providers.
GLEP 68 rather abstractly says that the remote-id elements should point
to "package identification trackers", and its predecessor GLEP 46
explains that this means the upstream source. So this doesn't look like
a good fit.
> 2. Adding something specific to GLEP 68, like `<upstream><reference
> type="doi"> https...`. However that seems like a bit too much work for
> adding something that only a small subset of users (science) cares
> about. Also integration of parsing with existing tools is an extra
> overhead.
This would require maintenance of another list of types. It looks like the
semantics are implicit in the URL, so is a type really needed?
A simpler change would be to lift the uniqueness restriction for the
doc element, i.e. allow it multiple times for the same language.
> 3. Put them also into `HOMEPAGE` of the ebuilds. Again a bit of a wrong
> place, but with the (minor) advantage of having possibly different/new
> references per version.
This wouldn't require any changes.
> Is any of these three superior/preferable?
It depends on how many packages in the Gentoo repository are expected to
use the feature.
If the answer is fewer than ten, then IMHO using HOMEPAGE is a reasonable
choice. If it is at least an order of magnitude more, then we could
think about updating GLEP 68 (e.g. lift uniqueness of doc).
Ulrich
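
A minimal Python sketch of the uniqueness restriction discussed above: it
lists the languages that occur on more than one <doc> element, which is the
situation the current rules reject and that a relaxed GLEP 68 would permit.
The metadata fragment, the example.org URLs and the helper name
duplicated_doc_langs are hypothetical, and treating a missing lang
attribute as "en" is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical metadata.xml fragment with two <doc> elements for the same
# language, as would be needed to list a manual and a paper side by side.
METADATA = """
<pkgmetadata>
  <upstream>
    <doc lang="en">https://example.org/manual</doc>
    <doc lang="en">https://doi.org/10.17487/rfc6120</doc>
    <doc lang="de">https://example.org/handbuch</doc>
  </upstream>
</pkgmetadata>
"""


def duplicated_doc_langs(xml_text: str) -> list[str]:
    """Return the languages used by more than one <doc> element."""
    root = ET.fromstring(xml_text)
    # Assumption: a <doc> without a lang attribute counts as "en".
    langs = Counter(doc.get("lang", "en") for doc in root.iter("doc"))
    return sorted(lang for lang, count in langs.items() if count > 1)


print(duplicated_doc_langs(METADATA))
```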
* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
2023-09-17 12:18 ` Alexander Neuwirth
2023-09-17 15:18 ` Ulrich Mueller
@ 2023-09-17 18:28 ` Florian Schmaus
2023-09-17 19:06 ` Alexander Neuwirth
2023-09-17 20:55 ` Ulrich Mueller
1 sibling, 2 replies; 7+ messages in thread
From: Florian Schmaus @ 2023-09-17 18:28 UTC
To: gentoo-dev, Alexander Neuwirth
On 17/09/2023 14.18, Alexander Neuwirth wrote:
> Thanks. Instead of using the lang entry I can imagine these other
> approaches:
>
> 2. Adding something specific to GLEP 68, like `<upstream><reference
> type="doi"> https...`. However that seems like a bit too much work for
> adding something that only a small subset of users (science) cares
> about.
<upstream>
  <reference uri='doi:10.17487/rfc6120'/>
</upstream>
sounds perfectly fine.
It would require (minor) adjustments to the schema and DTD. And besides
that, packages that do not have a use for this information do not have
to pay a cost. Hence, I am not sure why you assume it's too much work.
> Also integration of parsing with existing tools is an extra
> overhead.
Most XML parsers are non-strict, which means that they ignore elements
that they do not know. Therefore, the same argument as above can be
made: tools that do not need to extract the information should not
require any adjustments.
- Flow
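
A minimal Python sketch of the argument above, using the standard library's
non-validating xml.etree.ElementTree parser. The metadata fragment, the
"example/project" remote-id value and the helper name remote_ids are
hypothetical; <reference> is the proposed element, not something GLEP 68
currently defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml fragment that mixes an existing element
# (<remote-id>) with the proposed, not yet standardized <reference>.
METADATA = """
<pkgmetadata>
  <upstream>
    <remote-id type="github">example/project</remote-id>
    <reference uri="doi:10.17487/rfc6120"/>
  </upstream>
</pkgmetadata>
"""


def remote_ids(xml_text: str) -> list[tuple[str, str]]:
    """Collect (type, value) pairs from <remote-id> elements only."""
    root = ET.fromstring(xml_text)
    # Elements the tool does not ask for, such as <reference>, are
    # simply never visited by this query.
    return [(e.get("type", ""), e.text or "") for e in root.iter("remote-id")]


print(remote_ids(METADATA))
```

This holds only for tools that do not validate against the schema or DTD;
validating checks would still need the schema and DTD adjustments mentioned
above.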
* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
2023-09-17 18:28 ` Florian Schmaus
@ 2023-09-17 19:06 ` Alexander Neuwirth
2023-09-17 20:55 ` Ulrich Mueller
1 sibling, 0 replies; 7+ messages in thread
From: Alexander Neuwirth @ 2023-09-17 19:06 UTC
To: gentoo-dev
On 9/17/23 20:28, Florian Schmaus wrote:
> <upstream>
> <reference uri='doi:10.17487/rfc6120'/>
> </upstream>
>
> sounds perfectly fine.
Ideally I'd not limit it to doi only, but also allow arxiv, zenodo and
inspirehep. They can all be referenced by https://... URLs. I agree a
specific type is kind of unnecessary. However, the same paper can be
referenced by all of them. If one wants to capture that redundancy, could
something like this work?
<upstream>
  <reference doi='https://doi.org/10.5281/zenodo.1169739'
             zenodo='https://zenodo.org/record/8256635'/>
  <reference inspirehep='https://inspirehep.net/literature/2598491'
             arxiv='https://arxiv.org/pdf/2211.15838'
             doi='https://doi.org/10.22323/1.414.0245'/>
</upstream>
I don't think this grouping is important though, just something useful
one could add if one already goes for the GLEP route.
>
> Hence, I am not sure why you assume it's too much work.
>
If a user wants to list the references, epkginfo/equery already show the
homepage and doc links, but not the new reference element, right?
Similarly, the references would need extra treatment to be directly
shown on packages.gentoo.org.
Cheers,
APN
* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
2023-09-17 18:28 ` Florian Schmaus
2023-09-17 19:06 ` Alexander Neuwirth
@ 2023-09-17 20:55 ` Ulrich Mueller
1 sibling, 0 replies; 7+ messages in thread
From: Ulrich Mueller @ 2023-09-17 20:55 UTC
To: Florian Schmaus; +Cc: gentoo-dev, Alexander Neuwirth
>>>>> On Sun, 17 Sep 2023, Florian Schmaus wrote:
> <upstream>
> <reference uri='doi:10.17487/rfc6120'/>
> </upstream>
> sounds perfectly fine.
Don't use an attribute if you can put the information in the (otherwise
empty) element, especially when other elements like <doc> already do it
that way.
> It would require (minor) adjustments to the schema and DTD.
Also an update of GLEP 68, in the first place.
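
A minimal Python sketch of how a tool might read references stored as
element content, in the same way <doc> stores its URL, and derive the kind
of reference from the URL host instead of a type attribute. The <reference>
element is only a proposal in this thread; the KNOWN_HOSTS table and the
helper name references are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical fragment: <reference> is not part of GLEP 68. The URLs are
# the ones quoted earlier in this thread.
METADATA = """
<pkgmetadata>
  <upstream>
    <reference>https://doi.org/10.22323/1.414.0245</reference>
    <reference>https://arxiv.org/pdf/2211.15838</reference>
    <reference>https://inspirehep.net/literature/2598491</reference>
  </upstream>
</pkgmetadata>
"""

# Assumed mapping from host name to a short label; purely illustrative.
KNOWN_HOSTS = {
    "doi.org": "doi",
    "arxiv.org": "arxiv",
    "zenodo.org": "zenodo",
    "inspirehep.net": "inspirehep",
}


def references(xml_text: str) -> list[tuple[str, str]]:
    """Return (kind, url) pairs, with kind inferred from the URL host."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.iter("reference"):
        url = (element.text or "").strip()
        host = urlsplit(url).hostname or ""
        result.append((KNOWN_HOSTS.get(host, "other"), url))
    return result


for kind, url in references(METADATA):
    print(kind, url)
```

Because the kind is derived from the URL, no separate list of types has to
be maintained in the schema; unknown hosts simply fall back to "other".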