public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5]
@ 2015-11-10 23:53 Mike Frysinger
  2015-11-11  1:54 ` Mike Frysinger
  2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-10 23:53 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 224 bytes --]

i randomly stumbled across an ebuild that was using ^^ to make a variable
uppercase.  this is new to bash-4.0 and thus invalid for EAPI=[0-5].  only
the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
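Here is a minimal sketch of the difference, using a hypothetical `PN=apinger`. `${PN^^}` only works on bash 4.0 and later. Piping through tr(1) also works on the bash 3.2 that EAPIs 0-5 are specified against. Forcing `LC_ALL=C` on tr sidesteps the locale issues discussed further down the thread.

```shell
#!/usr/bin/env bash
# Hypothetical package name, for illustration only.
PN="apinger"

# bash >= 4.0 only: the ^^ case-modification expansion. Guarded so an
# older bash never tries to evaluate it ("bad substitution" on 3.2).
if (( BASH_VERSINFO[0] >= 4 )); then
	echo "bash-4 casemod: ${PN^^}"
fi

# Portable to bash 3.2: hand the conversion to tr(1), forcing the
# C locale for that one command.
MY_PN=$(printf '%s\n' "${PN}" | LC_ALL=C tr '[:lower:]' '[:upper:]')
echo "tr in C locale: ${MY_PN}"
```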

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5]
  2015-11-10 23:53 [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5] Mike Frysinger
@ 2015-11-11  1:54 ` Mike Frysinger
  2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  1:54 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]

On 10 Nov 2015 18:53, Mike Frysinger wrote:
> i randomly stumbled across an ebuild that was using ^^ to make a variable
> uppercase.  this is new to bash-4.0 and thus invalid for EAPI=[0-5].  only
> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.

fixed the ones `git grep` turned up, albeit lightly tested

 app-admin/yaala/yaala-0.7.3-r1.ebuild                | 21 ++++++++++++---------
 app-office/calligra/calligra-2.9.6.ebuild            |  5 ++++-
 app-office/calligra/calligra-2.9.7.ebuild            |  5 ++++-
 app-office/calligra/calligra-9999.ebuild             |  5 ++++-
 app-text/searchmonkey/searchmonkey-2.0.0.ebuild      |  5 +++--
 dev-db/SchemaSync/SchemaSync-0.9.2.ebuild            |  6 ++++--
 dev-java/groovy/groovy-2.4.5.ebuild                  |  4 +++-
 dev-python/SchemaObject/SchemaObject-0.5.3.ebuild    |  6 ++++--
 dev-python/cvxopt/cvxopt-1.1.6-r2.ebuild             |  8 ++++++--
 dev-scheme/gauche-gl/gauche-gl-0.5.1.ebuild          |  5 +++--
 dev-scheme/gauche-gl/gauche-gl-0.6.ebuild            |  5 +++--
 dev-scheme/gauche/gauche-0.9.3.3.ebuild              |  5 +++--
 dev-scheme/gauche/gauche-0.9.4-r1.ebuild             |  3 ++-
 dev-vcs/cssc/cssc-1.4.0.ebuild                       | 11 ++++++++---
 eclass/cvs.eclass                                    |  4 ++--
 games-board/gambit/gambit-1.0.1.ebuild               | 12 ++++++++----
 games-board/gambit/gambit-1.0.3.ebuild               | 12 ++++++++----
 games-fps/xonotic/xonotic-0.8.0.ebuild               |  3 ++-
 games-fps/xonotic/xonotic-0.8.1.ebuild               |  5 +++--
 media-sound/clementine/clementine-1.2.2.ebuild       |  3 ++-
 media-sound/clementine/clementine-1.2.3.ebuild       |  3 ++-
 media-sound/clementine/clementine-9999.ebuild        |  3 ++-
 net-analyzer/apinger/apinger-0.4.1.ebuild            |  9 +++++++--
 net-dns/dnsmasq/dnsmasq-2.66.ebuild                  |  6 ++++--
 net-dns/dnsmasq/dnsmasq-2.72-r2.ebuild               |  4 +++-
 net-dns/dnsmasq/dnsmasq-2.75.ebuild                  |  4 +++-
 sci-mathematics/sha1-polyml/sha1-polyml-5.5.0.ebuild | 10 ++++++----
 sys-devel/byfl/byfl-1.4.ebuild                       |  3 ++-
 sys-devel/byfl/byfl-9999.ebuild                      |  3 ++-

http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2347c6fe886c6b89d93ffac0faf375a613c372b1
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
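The exact `git grep` invocation is not given here. The sketch below uses one plausible pattern, which is an assumption and not the command used for the commit. It matches `${name^`, `${name,` and array forms like `${name[@],,}`. It can also flag hits inside comments, so matches still need a manual look. `find_casemods` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line for every use of the bash-4
# casemod operators ${var^} ${var^^} ${var,} ${var,,} (arrays too).
# grep runs in the C locale so [A-Za-z] means plain ASCII letters.
find_casemods() {
	LC_ALL=C grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*(\[[^]]*])?[,^]' "$@"
}

# In a gentoo.git checkout the same pattern could go to git grep:
#   LC_ALL=C git grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*(\[[^]]*])?[,^]' -- '*.ebuild' '*.eclass'
```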


* [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-10 23:53 [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5] Mike Frysinger
  2015-11-11  1:54 ` Mike Frysinger
@ 2015-11-11  2:51 ` Mike Frysinger
  2015-11-11  4:03   ` Mike Frysinger
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  2:51 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]

On 10 Nov 2015 18:53, Mike Frysinger wrote:
> i randomly stumbled across an ebuild that was using ^^ to make a variable
> uppercase.  this is new to bash-4.0 and thus invalid for EAPI=[0-5].  only
> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.

Arfrever highlights these are not even safe to use.  bash is locale aware,
so it'll apply LC_COLLATE rules when processing the ^/, casemods.  while
you can fix this with external programs ala:
	LC_COLLATE=C tr ...

you can't do it with inline code like:
	LC_COLLATE=C SRC_URI=".../${PN^^}/..."

you can if you do something like:
	SRC_URI=".../$(LC_COLLATE=C; echo "${PN^^}")/..."

but at this point, you lose most (all?) of the advantage of using these in the first
place: nice & tight code.  not running tr in global scope is nice too, but it's
better all around to just hardcode something like:
	MY_PN="APINGER"
and be done with it.

thoughts ?  we could add a repoman check to detect & reject usage of it, and
for the cases where the value isn't a constant, we could add a safe helper to
eutils like:
	tolower() { LC_COLLATE=C tr '[:upper:]' '[:lower:]'; }
	toupper() { LC_COLLATE=C tr '[:lower:]' '[:upper:]'; }

yes, i'm aware that this runs the risk of mojibake when given some UTF-8
strings, but we already have that problem, and i don't think the uses so
far will hit it (as people generally feed USE flags and PN values).  it
would require the C.UTF-8 locale to address.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
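Here is a sketch of both approaches, with the locale category fixed as in the follow-up correction. Case mapping is governed by LC_CTYPE, and LC_ALL overrides every LC_* category. Forcing `LC_ALL=C` is therefore the simplest way to stop a user's setting from winning. The `tolower`/`toupper` names come from the proposal above; they are not an existing eutils API. `PN=apinger` is only an example.

```shell
#!/usr/bin/env bash
# Proposed helpers (hypothetical, not an existing eutils API). Like
# tr(1) they filter stdin to stdout. LC_ALL=C overrides any LC_CTYPE
# or LC_ALL the user has set, for this one tr invocation only.
tolower() { LC_ALL=C tr '[:upper:]' '[:lower:]'; }
toupper() { LC_ALL=C tr '[:lower:]' '[:upper:]'; }

PN="apinger"   # hypothetical package name

# External-program route (also works with bash 3.2):
up_ext=$(printf '%s\n' "${PN}" | toupper)

# Inline bash-4 route: the assignment happens inside the command
# substitution's subshell, and bash switches its own locale as soon
# as LC_ALL is assigned, so ${PN^^} is evaluated with C rules.
up_inline=$(LC_ALL=C; printf '%s\n' "${PN^^}")

echo "${up_ext} ${up_inline}"
```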


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
@ 2015-11-11  4:03   ` Mike Frysinger
  2015-11-11  4:16   ` Ulrich Mueller
  2015-11-11  6:16   ` Patrick Lauer
  2 siblings, 0 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  4:03 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 120 bytes --]

sorry, i meant char classification here (LC_CTYPE), not collation.
i've been dealing with sorting bugs lately ;).
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
  2015-11-11  4:03   ` Mike Frysinger
@ 2015-11-11  4:16   ` Ulrich Mueller
  2015-11-11  7:16     ` René Neumann
  2015-11-11  7:42     ` Mike Frysinger
  2015-11-11  6:16   ` Patrick Lauer
  2 siblings, 2 replies; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11  4:16 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

>>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:

> Arfrever highlights these are not even safe to use.  bash is locale aware,
> so it'll apply LC_COLLATE rules when processing the ^/, casemods.  while
> you can fix this with external programs ala:
> 	LC_COLLATE=C tr ...

> you can't do it with inline code like:
> 	LC_COLLATE=C SRC_URI=".../${PN^^}/..."

>>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:

> sorry, i meant char classification here (LC_CTYPE), not collation.

Shouldn't these be safe to use if the string consists purely of ASCII
characters? I mean, A-Z and a-z should be uppercase and lowercase,
respectively, in any locale?

Ulrich

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
  2015-11-11  4:03   ` Mike Frysinger
  2015-11-11  4:16   ` Ulrich Mueller
@ 2015-11-11  6:16   ` Patrick Lauer
  2015-11-11  9:13     ` Michał Górny
  2015-11-11 12:39     ` Ciaran McCreesh
  2 siblings, 2 replies; 20+ messages in thread
From: Patrick Lauer @ 2015-11-11  6:16 UTC
  To: gentoo-dev



On 11/11/2015 03:51 AM, Mike Frysinger wrote:
> On 10 Nov 2015 18:53, Mike Frysinger wrote:
>> i randomly stumbled across an ebuild that was using ^^ to make a variable
>> uppercase.  this is new to bash-4.0 and thus invalid for EAPI=[0-5].  only
>> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.
> Arfrever highlights these are not even safe to use.  bash is locale aware,
> so it'll apply LC_COLLATE rules when processing the ^/, casemods.  while
> you can fix this with external programs ala:
> 	LC_COLLATE=C tr ...
>
> you can't do it with inline code like:
> 	LC_COLLATE=C SRC_URI=".../${PN^^}/..."
>
> you can if you do something like:
> 	SRC_URI=".../$(LC_COLLATE=C; echo "${PN^^}")/..."
>
This points out a class of problems we've hit in the past: locale-aware
things in ebuilds.

Wouldn't it be 'easier' (fsov easy) to have portage use sane-default
locale settings, so that estonian or turkish users don't get hit by
weirdness in the [a-z] character class etc.?

(And as a side-effect the build logs are always readable ;) )



* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  4:16   ` Ulrich Mueller
@ 2015-11-11  7:16     ` René Neumann
  2015-11-11  7:37       ` Ulrich Mueller
  2015-11-11  7:42     ` Mike Frysinger
  1 sibling, 1 reply; 20+ messages in thread
From: René Neumann @ 2015-11-11  7:16 UTC
  To: gentoo-dev



Am 11.11.2015 um 05:16 schrieb Ulrich Mueller:
>>>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
> 
>> Arfrever highlights these are not even safe to use.  bash is locale aware,
>> so it'll apply LC_COLLATE rules when processing the ^/, casemods.  while
>> you can fix this with external programs ala:
>> 	LC_COLLATE=C tr ...
> 
>> you can't do it with inline code like:
>> 	LC_COLLATE=C SRC_URI=".../${PN^^}/..."
> 
>>>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
> 
>> sorry, i meant char classification here (LC_CTYPE), not collation.
> 
> Shouldn't these be safe to use if the string consists purely of ASCII
> characters? I mean, A-Z and a-z should be uppercase and lowercase,
> respectively, in any locale?

Unfortunately, no (have been bitten by this issue already some years ago):

$ echo $LC_ALL
tr_TR
$ f=i; echo ${f^^}
İ
$ f=I; echo ${f,}
ı

- René




* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  7:16     ` René Neumann
@ 2015-11-11  7:37       ` Ulrich Mueller
  2015-11-11  7:47         ` Mike Frysinger
  0 siblings, 1 reply; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11  7:37 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

>>>>> On Wed, 11 Nov 2015, René Neumann wrote:

>> Shouldn't these be safe to use if the string consists purely of
>> ASCII characters? I mean, A-Z and a-z should be uppercase and
>> lowercase, respectively, in any locale?

> Unfortunately, no (have been bitten by this issue already some years
> ago):

> $ echo $LC_ALL
> tr_TR
> $ f=i; echo ${f^^}
> İ
> $ f=I; echo ${f,}
> ı

This is wrong on so many levels. :( It starts with the fact that the
dot over the lowercase latin i historically never was a diacritical
mark [1].

Maybe we should advise users in our documentation that they should
avoid such broken locales for ebuilds?

Ulrich


[1] https://commons.wikimedia.org/wiki/File:Evolution_of_minuscule.svg

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  4:16   ` Ulrich Mueller
  2015-11-11  7:16     ` René Neumann
@ 2015-11-11  7:42     ` Mike Frysinger
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  7:42 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

On 11 Nov 2015 05:16, Ulrich Mueller wrote:
> >>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
> 
> > Arfrever highlights these are not even safe to use.  bash is locale aware,
> > so it'll apply LC_COLLATE rules when processing the ^/, casemods.  while
> > you can fix this with external programs ala:
> > 	LC_COLLATE=C tr ...
> 
> > you can't do it with inline code like:
> > 	LC_COLLATE=C SRC_URI=".../${PN^^}/..."
> 
> >>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
> 
> > sorry, i meant char classification here (LC_CTYPE), not collation.
> 
> Shouldn't these be safe to use if the string consists purely of ASCII
> characters? I mean, A-Z and a-z should be uppercase and lowercase,
> respectively, in any locale?

nope.  ranges depend on the order of the chars in the locale: [a-z]
assumes the alphabet starts at a and ends at z, which not all locales do.
$ echo {a..z} | LC_ALL=et_EE.UTF-8 sed 's:[a-z]::g'
                   t u v w x y 

we could do something like the classic:
toupper() { tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' <<<"$*"; }

but that still would not help with the bash builtins.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
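Here are two ways around the range problem, sketched on the same `{a..z}` input. The first spells the set out, since listed characters do not depend on collation order. The second evaluates the range in the C locale. The et_EE.UTF-8 output is not reproduced here because it needs that locale installed. In the C locale, both variants should leave only the 25 separating spaces.

```shell
#!/usr/bin/env bash
input=$(echo {a..z})

# 1) Spell the letters out: a bracket expression that lists explicit
#    characters does not depend on the locale's collation order.
explicit=$(sed 's:[abcdefghijklmnopqrstuvwxyz]::g' <<<"${input}")

# 2) Keep the range, but evaluate it in the C locale, where a-z is
#    exactly the 26 ASCII lowercase letters.
c_range=$(LC_ALL=C sed 's:[a-z]::g' <<<"${input}")

# Either way only the separating spaces should remain.
printf '[%s]\n[%s]\n' "${explicit}" "${c_range}"
```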


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  7:37       ` Ulrich Mueller
@ 2015-11-11  7:47         ` Mike Frysinger
  2015-11-11  8:04           ` Ulrich Mueller
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  7:47 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

On 11 Nov 2015 08:37, Ulrich Mueller wrote:
> >>>>> On Wed, 11 Nov 2015, René Neumann wrote:
> 
> >> Shouldn't these be safe to use if the string consists purely of
> >> ASCII characters? I mean, A-Z and a-z should be uppercase and
> >> lowercase, respectively, in any locale?
> 
> > Unfortunately, no (have been bitten by this issue already some years
> > ago):
> 
> > $ echo $LC_ALL
> > tr_TR
> > $ f=i; echo ${f^^}
> > İ
> > $ f=I; echo ${f,}
> > ı
> 
> This is wrong on so many levels. :( It starts with the fact that the
> dot over the lowercase latin i historically never was a diacritical
> mark [1].
> 
> Maybe we should advise users in our documentation that they should
> avoid such broken locales for ebuilds?

i'm not sure telling people their native language is wrong is a smart
move.  it also would seem to cut against the purpose of the PMS.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  7:47         ` Mike Frysinger
@ 2015-11-11  8:04           ` Ulrich Mueller
  0 siblings, 0 replies; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11  8:04 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 939 bytes --]

>>>>> On Wed, 11 Nov 2015, Mike Frysinger wrote:

>> This is wrong on so many levels. :( It starts with the fact that the
>> dot over the lowercase latin i historically never was a diacritical
>> mark [1].
>> 
>> Maybe we should advise users in our documentation that they should
>> avoid such broken locales for ebuilds?

> i'm not sure telling people their native language is wrong is a smart
> move.  it also would seem to cut against the purpose of the PMS.

There is of course nothing wrong with the Turkish language or writing
system.

However, if a language attaches meaning to the dot and uses it as a
diacritical mark, then it is (IMHO) not the smartest move to encode it
in a way that the letters I and i (which historically in the Latin
alphabet are upper and lower case variants of each other) are reused.
The sane thing would have been to encode the two Turkish i variants as
"LATIN SMALL LETTER I WITH DOT ABOVE" etc.

Ulrich

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  6:16   ` Patrick Lauer
@ 2015-11-11  9:13     ` Michał Górny
  2015-11-11 12:39     ` Ciaran McCreesh
  1 sibling, 0 replies; 20+ messages in thread
From: Michał Górny @ 2015-11-11  9:13 UTC
  To: Patrick Lauer; +Cc: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1487 bytes --]

On Wed, 11 Nov 2015 07:16:42 +0100
Patrick Lauer <patrick@gentoo.org> wrote:

> On 11/11/2015 03:51 AM, Mike Frysinger wrote:
> > On 10 Nov 2015 18:53, Mike Frysinger wrote:  
> >> i randomly stumbled across an ebuild that was using ^^ to make a variable
> >> uppercase.  this is new to bash-4.0 and thus invalid for EAPI=[0-5].  only
> >> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.  
> > Arfrever highlights these are not even safe to use.  bash is locale aware,
> > so it'll apply LC_COLLATE rules when processing the ^/, casemods.  while
> > you can fix this with external programs ala:
> > 	LC_COLLATE=C tr ...
> >
> > you can't do it with inline code like:
> > 	LC_COLLATE=C SRC_URI=".../${PN^^}/..."
> >
> > you can if you do something like:
> > 	SRC_URI=".../$(LC_COLLATE=C; echo "${PN^^}")/..."
> >  
> This points out a class of problems we've hit in the past: locale-aware
> things in ebuilds.
> 
> Wouldn't it be 'easier' (fsov easy) to have portage use sane-default
> locale settings, so that estonian or turkish users don't get hit by
> weirdness in the [a-z] character class etc.?
> 
> (And as a side-effect the build logs are always readable ;) )

Pretty much +1 here. Not saying we need to force full locale, but
having sane LC_CTYPE and LC_COLLATE would make sense. PMS already
forces it in a few places... we may as well force it globally.

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 949 bytes --]


* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  6:16   ` Patrick Lauer
  2015-11-11  9:13     ` Michał Górny
@ 2015-11-11 12:39     ` Ciaran McCreesh
  2015-11-11 15:48       ` [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Ulrich Mueller
  1 sibling, 1 reply; 20+ messages in thread
From: Ciaran McCreesh @ 2015-11-11 12:39 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 408 bytes --]

On Wed, 11 Nov 2015 07:16:42 +0100
Patrick Lauer <patrick@gentoo.org> wrote:
> Wouldn't it be 'easier' (fsov easy) to have portage use sane-default
> locale settings, so that estonian or turkish users don't get hit by
> weirdness in the [a-z] character class etc.?

Paludis forces all the LC variables to sane values. A few vocal
annoying users hate this, and patch it out...

-- 
Ciaran McCreesh

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]


* [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability)
  2015-11-11 12:39     ` Ciaran McCreesh
@ 2015-11-11 15:48       ` Ulrich Mueller
  2015-11-11 21:52         ` Jason A. Donenfeld
  0 siblings, 1 reply; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11 15:48 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

>>>>> On Wed, 11 Nov 2015, Ciaran McCreesh wrote:

> On Wed, 11 Nov 2015 07:16:42 +0100
> Patrick Lauer <patrick@gentoo.org> wrote:
>> Wouldn't it be 'easier' (fsov easy) to have portage use sane-default
>> locale settings, so that estonian or turkish users don't get hit by
>> weirdness in the [a-z] character class etc.?

> Paludis forces all the LC variables to sane values. A few vocal
> annoying users hate this, and patch it out...

Unfortunately, that doesn't help us, since ebuilds cannot rely on it.

Should we revise EAPI 6? It hasn't been cleared for usage in the tree
yet, so it should still be possible. Losing such an important feature of
bash-4 seems to be reason enough. (And obviously, some people had been
aware of the problem. Why did nobody speak up before the spec was
approved?)

Paludis seems to do this:

    unset LANG ${!LC_*}
    export LC_ALL=C

We could just add this to the spec. Alternatively, something less
intrusive, like setting only LC_COLLATE and LC_CTYPE.

We already have LC_MESSAGES=C in the base profile, per 20130813
Council decision.

Ulrich

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
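`${!LC_*}` expands to the names of every set variable starting with `LC_`. The two lines above therefore clear LANG and all LC_* variables before forcing the C locale. Here is a sketch wrapping them in a function; the name `force_c_locale` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Paludis-style reset quoted above.
force_c_locale() {
	# ${!LC_*} lists the *names* of every set LC_* variable
	# (LC_CTYPE, LC_COLLATE, LC_ALL, ...), so all of them go away.
	unset LANG ${!LC_*}
	export LC_ALL=C
}

force_c_locale
# Only LC_ALL should be left among the LC_* names.
echo "${!LC_*}"
```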


* Re: [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability)
  2015-11-11 15:48       ` [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Ulrich Mueller
@ 2015-11-11 21:52         ` Jason A. Donenfeld
  2015-11-11 22:21           ` [gentoo-dev] Revise EAPI 6? Matthias Maier
  2015-11-12  6:52           ` [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Martin Vaeth
  0 siblings, 2 replies; 20+ messages in thread
From: Jason A. Donenfeld @ 2015-11-11 21:52 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

I'd be in favor of full-on LC_ALL=C. Ebuilds are meant to behave
deterministically. They're machine scripts. The operations they do
need to be consistent.

For user-facing parts, such as printing information or sorting user-shown
text, I can understand that ebuild authors might, in some special
circumstances, want to run a command with the user's language. For that
reason, what if we did this:

    USER_LANG="$LANG"
    unset LANG ${!LC_*}
    export LC_ALL=C

That way, ebuild writers could do:

einfo "Blah blah $(LC_ALL="$USER_LANG" sort <blah)"

While the rest of the actual programmatic part of the ebuild functions
deterministically with LC_ALL=C.

This seems like a decent compromise...
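Per-command overrides like this have one subtlety. A prefix assignment such as `LC_ALL=... cmd` only reaches `cmd` itself. Any `$(...)` in its arguments is expanded first, in the surrounding (here C) locale. The small sketch below makes the difference visible. `USER_LANG=POSIX` stands in for a saved user locale, and `show` is a throwaway helper.

```shell
#!/usr/bin/env bash
export LC_ALL=C
USER_LANG=POSIX   # stand-in for the user's saved locale

show() { printf '%s\n' "$*"; }

# The substitution runs before the prefix assignment takes effect,
# so printenv still sees the outer LC_ALL=C.
outer=$(LC_ALL="${USER_LANG}" show "$(printenv LC_ALL)")

# Moving the assignment into the substitution reaches the command
# that actually needs the user's locale (sort, in the mail above).
inner=$(show "$(LC_ALL="${USER_LANG}" printenv LC_ALL)")

echo "outer=${outer} inner=${inner}"
```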

[-- Attachment #2: Type: text/html, Size: 2532 bytes --]


* Re: [gentoo-dev] Revise EAPI 6?
  2015-11-11 21:52         ` Jason A. Donenfeld
@ 2015-11-11 22:21           ` Matthias Maier
  2015-11-11 23:18             ` Ulrich Mueller
  2015-11-12  6:52           ` [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Martin Vaeth
  1 sibling, 1 reply; 20+ messages in thread
From: Matthias Maier @ 2015-11-11 22:21 UTC
  To: gentoo-dev


On Wed, Nov 11, 2015, at 15:52 CST, "Jason A. Donenfeld" <zx2c4@gentoo.org> wrote:

> I'd be in favor of full-on LC_ALL=C.

++


I'm surprised that we do not have such a policy already.

Best,
Matthias



* Re: [gentoo-dev] Revise EAPI 6?
  2015-11-11 22:21           ` [gentoo-dev] Revise EAPI 6? Matthias Maier
@ 2015-11-11 23:18             ` Ulrich Mueller
  2015-11-12  0:34               ` Mike Gilbert
  0 siblings, 1 reply; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11 23:18 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]

>>>>> On Wed, 11 Nov 2015, Matthias Maier wrote:

> On Wed, Nov 11, 2015, at 15:52 CST, "Jason A. Donenfeld" <zx2c4@gentoo.org> wrote:

>> I'd be in favor of full-on LC_ALL=C.

> ++

> I'm surprised that we do not have such a policy already.

LC_ALL=C would disable UTF-8, and I am told that this would cause
problems for e.g. Python 3. What we would really want is C.UTF-8 [1]
but that's neither a standard nor is it ready.

In the meantime, we could go with the minimum changes necessary to
unbreak the bash 4.2 case conversion operators. Setting LC_COLLATE
to C and LC_CTYPE to some sane locale should be sufficient for that.

Ulrich


[1] https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]


* Re: [gentoo-dev] Revise EAPI 6?
  2015-11-11 23:18             ` Ulrich Mueller
@ 2015-11-12  0:34               ` Mike Gilbert
  2015-11-12  6:24                 ` Ulrich Mueller
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Gilbert @ 2015-11-12  0:34 UTC
  To: Gentoo Dev

On Wed, Nov 11, 2015 at 6:18 PM, Ulrich Mueller <ulm@gentoo.org> wrote:
>>>>>> On Wed, 11 Nov 2015, Matthias Maier wrote:
>
>> On Wed, Nov 11, 2015, at 15:52 CST, "Jason A. Donenfeld" <zx2c4@gentoo.org> wrote:
>
>>> I'd be in favor of full-on LC_ALL=C.
>
>> ++
>
>> I'm surprised that we do not have such a policy already.
>
> LC_ALL=C would disable UTF-8, and I am told that this would cause
> problems for e.g. Python 3. What we would really want is C.UTF-8 [1]
> but that's neither a standard nor is it ready.
>

I can work around it in the python eclasses by adjusting the
python_export_utf8_locale function, but would prefer not to do that.

> In the meantime, we could go with the minimum changes necessary to
> unbreak the bash 4.2 case conversion operators. Setting LC_COLLATE
> to C and LC_CTYPE to some sane locale should be sufficient for that.

If you want to force specific locale categories to C, I don't mind. I
would just prefer that you don't mess with LC_ALL and keep LC_CTYPE to
something with UTF-8 support.



* Re: [gentoo-dev] Revise EAPI 6?
  2015-11-12  0:34               ` Mike Gilbert
@ 2015-11-12  6:24                 ` Ulrich Mueller
  0 siblings, 0 replies; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-12  6:24 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 967 bytes --]

>>>>> On Wed, 11 Nov 2015, Mike Gilbert wrote:

> On Wed, Nov 11, 2015 at 6:18 PM, Ulrich Mueller <ulm@gentoo.org> wrote:
>> In the meantime, we could go with the minimum changes necessary to
>> unbreak the bash 4.2 case conversion operators. Setting LC_COLLATE
>> to C and LC_CTYPE to some sane locale should be sufficient for that.

> If you want to force specific locale categories to C, I don't mind. I
> would just prefer that you don't mess with LC_ALL and keep LC_CTYPE to
> something with UTF-8 support.

We are thinking about adding a sentence like this:

   The package manager must ensure that the LC_COLLATE and LC_CTYPE
   locale categories are equivalent to the C locale, as far as
   characters in the ASCII range (U+0000 to U+007F) are concerned.

Essentially this requires LC_COLLATE=C, but it permits almost anything
for LC_CTYPE, except for such locales that change character categories
or the upper/lowercase mapping for ASCII characters.

Ulrich

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
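The case-mapping half of the proposed wording can be checked mechanically. Here is a minimal sketch with a hypothetical helper name. It runs a child bash under the given locale and compares `^^` and `,,` on the 26 ASCII letters with the C results. It does not test LC_COLLATE. If the requested locale is not installed, bash typically warns on stderr and keeps the C locale, so a missing locale looks like a pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if, under locale $1, bash's ^^ and ,,
# map the 26 ASCII letters exactly as the C locale does.
# (Under tr_TR, for example, ${lower^^} would contain İ, not I.)
ascii_casemap_ok() {
	LC_ALL="$1" bash -c '
		lower=abcdefghijklmnopqrstuvwxyz
		upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
		[[ ${lower^^} == "${upper}" && ${upper,,} == "${lower}" ]]
	'
}

for loc in C POSIX; do
	if ascii_casemap_ok "${loc}"; then
		echo "${loc}: ASCII case mapping matches C"
	else
		echo "${loc}: ASCII case mapping differs from C"
	fi
done
```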


* [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability)
  2015-11-11 21:52         ` Jason A. Donenfeld
  2015-11-11 22:21           ` [gentoo-dev] Revise EAPI 6? Matthias Maier
@ 2015-11-12  6:52           ` Martin Vaeth
  1 sibling, 0 replies; 20+ messages in thread
From: Martin Vaeth @ 2015-11-12  6:52 UTC
  To: gentoo-dev

Jason A. Donenfeld <zx2c4@gentoo.org> wrote:
>
> I'd be in favor of full-on LC_ALL=C.

Setting LC_ALL seems wrong as it is meant as a quick hack
and should not be relied on by a "generic" tool like portage.

Better define to *unset* LC_ALL (remembering the previous value,
see below) and to set (all?) other LC_* to defined values.

While we are at it: maybe it is even sufficient to define only
LC_CTYPE=C
LC_NUMERIC=C
LC_COLLATE=C
LC_MESSAGES=C
LC_MONETARY=C

In any case, the old values should be kept (and for simplicity
defined to the previous LC_ALL if the latter was set),
so that the ebuild author is able to stick to the user's
choice for certain/all values if he needs to:

In particular, this might be necessary for LC_CTYPE,
to get correct UTF-8 support, as already mentioned
(the ebuild author cannot just say LC_CTYPE=*.UTF8).
It might also be necessary for e.g. LC_MONETARY, for some
strange local banking tools.

It is perhaps not necessary to (re)define LANG at all:
setting LC_MESSAGES should be sufficient to get readable logs
from most build-time stuff, and LANG=C might be the main reason
why some people dislike the change and decided, e.g.,
to patch it out of paludis, as mentioned in this thread.
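Here is a sketch of this variant, using the category list above and hypothetical `USER_LC_*` names for the saved values. Each saved value is the user's effective setting for that category: LC_ALL if set, else the category variable, else LANG. This matches the rule of defining the saved values to the previous LC_ALL when it was set. LANG itself is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remember the user's effective value for each
# category in USER_<category>, then force the category to C.
save_and_force_locale() {
	local category
	for category in LC_CTYPE LC_NUMERIC LC_COLLATE LC_MESSAGES LC_MONETARY; do
		# POSIX precedence: LC_ALL, then LC_<category>, then LANG.
		printf -v "USER_${category}" '%s' "${LC_ALL:-${!category:-${LANG}}}"
		export "${category}=C"
	done
	# LC_ALL would override every category set above, so drop it.
	unset LC_ALL
}

save_and_force_locale
# An ebuild could then opt back in for one (hypothetical) command:
#   LC_CTYPE="${USER_LC_CTYPE}" some_command
```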




end of thread

Thread overview: 20+ messages
2015-11-10 23:53 [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5] Mike Frysinger
2015-11-11  1:54 ` Mike Frysinger
2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
2015-11-11  4:03   ` Mike Frysinger
2015-11-11  4:16   ` Ulrich Mueller
2015-11-11  7:16     ` René Neumann
2015-11-11  7:37       ` Ulrich Mueller
2015-11-11  7:47         ` Mike Frysinger
2015-11-11  8:04           ` Ulrich Mueller
2015-11-11  7:42     ` Mike Frysinger
2015-11-11  6:16   ` Patrick Lauer
2015-11-11  9:13     ` Michał Górny
2015-11-11 12:39     ` Ciaran McCreesh
2015-11-11 15:48       ` [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Ulrich Mueller
2015-11-11 21:52         ` Jason A. Donenfeld
2015-11-11 22:21           ` [gentoo-dev] Revise EAPI 6? Matthias Maier
2015-11-11 23:18             ` Ulrich Mueller
2015-11-12  0:34               ` Mike Gilbert
2015-11-12  6:24                 ` Ulrich Mueller
2015-11-12  6:52           ` [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Martin Vaeth
