* [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
@ 2023-09-15  9:31 Alexander Neuwirth
  2023-09-15 10:15 ` Ulrich Mueller
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Neuwirth @ 2023-09-15  9:31 UTC (permalink / raw)
  To: gentoo-dev


Dear Larry,

I am looking for a way to link scientific publications to
ebuilds/packages. The easiest, though hacky, way right now is to use
|<doc lang="doi">https://doi.org/...</doc>|. Integration with
|epkginfo|/|equery meta| works nicely out of the box. However, currently
|pkgcheck| and/or the XML format complains about repeated |lang| entries
and does not allow long |lang| attributes (e.g. |lang="inspirehep"|
fails, understandably).

You can inspect a detailed example here 
https://github.com/gentoo/sci/pull/1216.

The more sophisticated way, instead of abusing the |lang| attribute, 
would be another attribute, perhaps |reference|, or a new element in 
|upstream| instead of |doc|. Before I move forward with this idea, I 
would be curious to hear your thoughts on it.

Best,
APN



* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
  2023-09-15  9:31 [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers Alexander Neuwirth
@ 2023-09-15 10:15 ` Ulrich Mueller
  2023-09-17 12:18   ` Alexander Neuwirth
  0 siblings, 1 reply; 7+ messages in thread
From: Ulrich Mueller @ 2023-09-15 10:15 UTC (permalink / raw)
  To: Alexander Neuwirth; +Cc: gentoo-dev

>>>>> On Fri, 15 Sep 2023, Alexander Neuwirth wrote:

> I am looking for a way to link scientific publications to
> ebuilds/packages. The easiest, but hacky way right now is to use the 
> |<doc lang="doi">https://doi.org/...</doc>|. Integration with
> |epkginfo|/|equery meta| works nicely out of the box. However,
> currently |pkgcheck| and/or the XML format complains about repeated
> |lang| entries and does not allow long |lang| attributes (i.e.
> |lang="inspirehep"| fails understandably).

Please don't do this. The lang attribute is of type xs:language [1]
so it must be a valid BCP 47 language tag.

As a matter of fact, "doi" happens to be a valid tag for the Dogri
language [2], but this isn't helpful either.

[1] https://gitweb.gentoo.org/data/xml-schema.git/tree/metadata.xsd?id=db829cfdb40ae0a0034848cce38ee741a7c8d68c#n257
[2] https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?code_ID=117
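
To illustrate the constraint: XML Schema describes the lexical form of
xs:language with the pattern `[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*`. The
following Python sketch (the helper name `is_xs_language` is made up for
this example) shows why "doi" is accepted while "inspirehep" is rejected.
It only checks the lexical form, not whether the tag actually names a
language.

```python
import re

# Lexical pattern of xs:language as given in XML Schema Part 2 (Datatypes).
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xs_language(value: str) -> bool:
    """Return True if value matches the xs:language lexical pattern.

    Hypothetical helper for illustration; not part of any Gentoo tool.
    """
    return XS_LANGUAGE.fullmatch(value) is not None


for tag in ("en", "doi", "pt-BR", "inspirehep"):
    # "inspirehep" has ten letters, but each subtag may have at most eight.
    print(tag, is_xs_language(tag))
```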



* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
  2023-09-15 10:15 ` Ulrich Mueller
@ 2023-09-17 12:18   ` Alexander Neuwirth
  2023-09-17 15:18     ` Ulrich Mueller
  2023-09-17 18:28     ` Florian Schmaus
  0 siblings, 2 replies; 7+ messages in thread
From: Alexander Neuwirth @ 2023-09-17 12:18 UTC (permalink / raw)
  To: gentoo-dev


Thanks. Instead of using the lang entry I can imagine these other 
approaches:

1. doi/arxiv/... links could easily be plugged in as custom upstream
remote ids, but that feels a bit wrong since all other
[upstream remote ids](https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types)
are repos/source code providers.
2. Adding something specific to GLEP 68, like
`<upstream><reference type="doi"> https...`. However, that seems like a
bit too much work for something that only a small subset of users
(science) cares about. Integrating the parsing into existing tools is
also extra overhead.
3. Put them into `HOMEPAGE` of the ebuilds as well. Again, a bit of a
wrong place, but with the (minor) advantage of allowing different/new
references per version.

Is any of these three superior/preferable?



* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
  2023-09-17 12:18   ` Alexander Neuwirth
@ 2023-09-17 15:18     ` Ulrich Mueller
  2023-09-17 18:28     ` Florian Schmaus
  1 sibling, 0 replies; 7+ messages in thread
From: Ulrich Mueller @ 2023-09-17 15:18 UTC (permalink / raw)
  To: Alexander Neuwirth; +Cc: gentoo-dev

>>>>> On Sun, 17 Sep 2023, Alexander Neuwirth wrote:

> Thanks. Instead of using the lang entry I can imagine these other
> approaches:

> 1. doi/arxiv/... links could also easily be plugged in custom upstream
> remote ids, but that also feels a bit wrong since all other [upstream
> remote
> ids](https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types)
> are repos/source code providers.

GLEP 68 rather abstractly says that the remote-id elements should point
to "package identification trackers", and its predecessor GLEP 46
explains that this means the upstream source. So this doesn't look like
a good fit.

> 2. Adding something specific to GLEP 68, like `<upstream><reference
> type="doi"> https...`. However that seems like a bit too much work for
> adding something that only a small subset of users (science) cares
> about. Also integration of parsing with existing tools is an extra
> overhead.

This would require maintenance of another list of types. It looks like
the semantics are implicit in the URL, so is a type really needed?

A simpler change would be to lift the uniqueness restriction for the
doc element, i.e. allow it multiple times for the same language.

> 3. Put them also into `HOMEPAGE` of the ebuilds. Again bit of a wrong
> place, but with the (minor) advantage of having possibly different/new
> references per version.

This wouldn't require any changes.

> Is any of these three superior/preferable?

It depends on how many packages in the Gentoo repository are expected to
use the feature.

If the answer is fewer than ten, then IMHO using HOMEPAGE is a reasonable
choice. If it were at least an order of magnitude more, then we could
think about updating GLEP 68 (e.g. lift uniqueness of doc).
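
As a rough illustration of the uniqueness restriction on doc, the
following Python sketch reports languages that occur on more than one
<doc> element. The sample metadata, its URLs, and the helper name
`duplicate_doc_langs` are invented for this example, and treating a
missing lang attribute as "en" is an assumption, not something taken
from the schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical metadata.xml fragment; all URLs are placeholders.
SAMPLE = """\
<pkgmetadata>
  <upstream>
    <doc lang="en">https://example.org/manual</doc>
    <doc lang="en">https://doi.org/10.0000/example</doc>
    <doc lang="de">https://example.org/handbuch</doc>
  </upstream>
</pkgmetadata>
"""


def duplicate_doc_langs(xml_text):
    """Return the sorted list of languages used by more than one <doc>."""
    root = ET.fromstring(xml_text)
    # Assumption: a <doc> without a lang attribute counts as "en".
    counts = Counter(
        doc.get("lang", "en") for doc in root.iterfind("upstream/doc")
    )
    return sorted(lang for lang, n in counts.items() if n > 1)


print(duplicate_doc_langs(SAMPLE))
```

If the restriction were lifted, a check along these lines would simply
no longer be needed.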

Ulrich


* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
  2023-09-17 12:18   ` Alexander Neuwirth
  2023-09-17 15:18     ` Ulrich Mueller
@ 2023-09-17 18:28     ` Florian Schmaus
  2023-09-17 19:06       ` Alexander Neuwirth
  2023-09-17 20:55       ` Ulrich Mueller
  1 sibling, 2 replies; 7+ messages in thread
From: Florian Schmaus @ 2023-09-17 18:28 UTC (permalink / raw)
  To: gentoo-dev, Alexander Neuwirth


On 17/09/2023 14.18, Alexander Neuwirth wrote:
> Thanks. Instead of using the lang entry I can imagine these other 
> approaches:
>
> 2. Adding something specific to GLEP 68, like `<upstream><reference 
> type="doi"> https...`. However that seems like a bit too much work for 
> adding something that only a small subset of users (science) cares 
> about.

<upstream>
   <reference uri='doi:10.17487/rfc6120'/>
</upstream>

sounds perfectly fine.

It would require (minor) adjustments to the schema and DTD. Besides
that, packages that have no use for this information do not have to pay
a cost. Hence, I am not sure why you assume it's too much work.


> Also integration of parsing with existing tools is an extra 
> overhead.

Most XML parsers are non-strict, which means that they ignore elements
that they do not know. Therefore, the same argument as above can be
made: tools that do not need to extract the information should not
require any adjustments.
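
A minimal Python sketch of that argument, using the standard library's
non-validating ElementTree parser: a tool that only collects <doc>
links keeps working when an unknown <reference> element is present.
The sample document, the example.org URL, and the function name
`doc_links` are assumptions made for this illustration. Tools that
validate against the schema are a different matter until the schema
itself is adjusted.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml fragment containing the proposed element.
WITH_REFERENCE = """\
<pkgmetadata>
  <upstream>
    <doc>https://example.org/manual</doc>
    <reference uri="doi:10.17487/rfc6120"/>
  </upstream>
</pkgmetadata>
"""


def doc_links(xml_text):
    """Collect <doc> URLs the way a reference-unaware tool might."""
    root = ET.fromstring(xml_text)
    # Only <doc> children of <upstream> are looked at; anything else,
    # including <reference>, is never touched.
    return [
        doc.text.strip()
        for doc in root.iterfind("upstream/doc")
        if doc.text
    ]


print(doc_links(WITH_REFERENCE))
```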

- Flow



* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
  2023-09-17 18:28     ` Florian Schmaus
@ 2023-09-17 19:06       ` Alexander Neuwirth
  2023-09-17 20:55       ` Ulrich Mueller
  1 sibling, 0 replies; 7+ messages in thread
From: Alexander Neuwirth @ 2023-09-17 19:06 UTC (permalink / raw)
  To: gentoo-dev


On 9/17/23 20:28, Florian Schmaus wrote:
> <upstream>
>   <reference uri='doi:10.17487/rfc6120'/>
> </upstream>
>
> sounds perfectly fine.

Ideally I'd not limit it to only doi but also allow arxiv, zenodo and
inspirehep. They can all be referenced by https://... . I agree a
specific type is kind of unnecessary. However, the same paper can be
referenced by all of them. If one wants to capture that redundancy,
could something like this work?

<upstream>
     <reference doi='https://doi.org/10.5281/zenodo.1169739'
                zenodo='https://zenodo.org/record/8256635'/>
     <reference inspirehep='https://inspirehep.net/literature/2598491'
                arxiv='https://arxiv.org/pdf/2211.15838'
                doi='https://doi.org/10.22323/1.414.0245'/>
</upstream>

I don't think this grouping is important though, just something useful 
one could add if one already goes for the GLEP route.

>
> Hence, I am not sure why you assume its too much work.
>
If a user wants to list the references, epkginfo/equery already shows
the homepage or doc links, but not the new reference element, right?
Similarly, the references would need extra treatment to be shown
directly on packages.gentoo.org.

Cheers,
APN



* Re: [gentoo-dev] metadata.xml upstream docs as reference to scientific publications/papers
  2023-09-17 18:28     ` Florian Schmaus
  2023-09-17 19:06       ` Alexander Neuwirth
@ 2023-09-17 20:55       ` Ulrich Mueller
  1 sibling, 0 replies; 7+ messages in thread
From: Ulrich Mueller @ 2023-09-17 20:55 UTC (permalink / raw)
  To: Florian Schmaus; +Cc: gentoo-dev, Alexander Neuwirth

>>>>> On Sun, 17 Sep 2023, Florian Schmaus wrote:

> <upstream>
>   <reference uri='doi:10.17487/rfc6120'/>
> </upstream>

> sounds perfectly fine.

Don't use an attribute if you can put the information in the (otherwise
empty) element, especially when other elements like <doc> already do it
that way.
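
One practical upside, sketched in Python under the assumption that the
new element is called <reference> and carries its value as text: the
same extraction code then serves both <doc> and <reference>. The sample
document, the example.org URL, and the helper name `upstream_links` are
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml fragment; <reference> is only a proposal.
SAMPLE = """\
<pkgmetadata>
  <upstream>
    <doc>https://example.org/manual</doc>
    <reference>doi:10.17487/rfc6120</reference>
  </upstream>
</pkgmetadata>
"""


def upstream_links(xml_text, tag):
    """Return the text content of all <upstream>/<tag> elements."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iterfind(f"upstream/{tag}")
        if el.text
    ]


# Both element types are read by the same code path.
print(upstream_links(SAMPLE, "doc"))
print(upstream_links(SAMPLE, "reference"))
```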

> It would require (minor) adjustments to the schema and DTD.

Also an update of GLEP 68, in the first place.

