* [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5] @ 2015-11-10 23:53 Mike Frysinger 2015-11-11 1:54 ` Mike Frysinger 2015-11-11 2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger 0 siblings, 2 replies; 20+ messages in thread From: Mike Frysinger @ 2015-11-10 23:53 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 224 bytes --] i randomly stumbled across an ebuild that was using ^^ to make a variable uppercase. this is new to bash-4.0 and thus invalid for EAPI=[0-5]. only the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2. -mike [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5]
  2015-11-10 23:53 [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5] Mike Frysinger
@ 2015-11-11  1:54 ` Mike Frysinger
  2015-11-11  2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  1:54 UTC (permalink / raw)
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]

On 10 Nov 2015 18:53, Mike Frysinger wrote:
> i randomly stumbled across an ebuild that was using ^^ to make a variable
> uppercase. this is new to bash-4.0 and thus invalid for EAPI=[0-5]. only
> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.

fixed the ones `git grep` turned up, albeit lightly tested

 app-admin/yaala/yaala-0.7.3-r1.ebuild                | 21 ++++++++++++---------
 app-office/calligra/calligra-2.9.6.ebuild            |  5 ++++-
 app-office/calligra/calligra-2.9.7.ebuild            |  5 ++++-
 app-office/calligra/calligra-9999.ebuild             |  5 ++++-
 app-text/searchmonkey/searchmonkey-2.0.0.ebuild      |  5 +++--
 dev-db/SchemaSync/SchemaSync-0.9.2.ebuild            |  6 ++++--
 dev-java/groovy/groovy-2.4.5.ebuild                  |  4 +++-
 dev-python/SchemaObject/SchemaObject-0.5.3.ebuild    |  6 ++++--
 dev-python/cvxopt/cvxopt-1.1.6-r2.ebuild             |  8 ++++++--
 dev-scheme/gauche-gl/gauche-gl-0.5.1.ebuild          |  5 +++--
 dev-scheme/gauche-gl/gauche-gl-0.6.ebuild            |  5 +++--
 dev-scheme/gauche/gauche-0.9.3.3.ebuild              |  5 +++--
 dev-scheme/gauche/gauche-0.9.4-r1.ebuild             |  3 ++-
 dev-vcs/cssc/cssc-1.4.0.ebuild                       | 11 ++++++++---
 eclass/cvs.eclass                                    |  4 ++--
 games-board/gambit/gambit-1.0.1.ebuild               | 12 ++++++++----
 games-board/gambit/gambit-1.0.3.ebuild               | 12 ++++++++----
 games-fps/xonotic/xonotic-0.8.0.ebuild               |  3 ++-
 games-fps/xonotic/xonotic-0.8.1.ebuild               |  5 +++--
 media-sound/clementine/clementine-1.2.2.ebuild       |  3 ++-
 media-sound/clementine/clementine-1.2.3.ebuild       |  3 ++-
 media-sound/clementine/clementine-9999.ebuild        |  3 ++-
 net-analyzer/apinger/apinger-0.4.1.ebuild            |  9 +++++++--
 net-dns/dnsmasq/dnsmasq-2.66.ebuild                  |  6 ++++--
 net-dns/dnsmasq/dnsmasq-2.72-r2.ebuild               |  4 +++-
 net-dns/dnsmasq/dnsmasq-2.75.ebuild                  |  4 +++-
 sci-mathematics/sha1-polyml/sha1-polyml-5.5.0.ebuild | 10 ++++++----
 sys-devel/byfl/byfl-1.4.ebuild                       |  3 ++-
 sys-devel/byfl/byfl-9999.ebuild                      |  3 ++-

http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2347c6fe886c6b89d93ffac0faf375a613c372b1

-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-10 23:53 [gentoo-dev] reminder: you cannot use bash-4.x features (e.g. ${var^^}) in EAPI=[0-5] Mike Frysinger
  2015-11-11  1:54 ` Mike Frysinger
@ 2015-11-11  2:51 ` Mike Frysinger
  2015-11-11  4:03 ` Mike Frysinger
  ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  2:51 UTC (permalink / raw)
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]

On 10 Nov 2015 18:53, Mike Frysinger wrote:
> i randomly stumbled across an ebuild that was using ^^ to make a variable
> uppercase. this is new to bash-4.0 and thus invalid for EAPI=[0-5]. only
> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2.

Arfrever highlights these are not even safe to use. bash is locale aware,
so it'll apply LC_COLLATE rules when processing the ^/, casemods. while
you can fix this with external programs ala:
	LC_COLLATE=C tr ...

you can't do it with inline code like:
	LC_COLLATE=C SRC_URI=".../${PN^^}/..."

you can if you do something like:
	SRC_URI=".../$(LC_COLLATE=C; echo "${PN^^}")/..."

but at this point, you lose most (all?) advantage to using these in the
first place: nice & tight code. not running tr in global scope is nice
too, but it's better all around to just hardcode something like:
	MY_PN="APINGER"
and be done with it.

thoughts ? we could add a repoman check to detect & reject usage of it,
and for the cases where the value isn't a constant, we could add a safe
helper to eutils like:
	tolower() { LC_COLLATE=C tr '[:upper:]' '[:lower:]'; }
	toupper() { LC_COLLATE=C tr '[:lower:]' '[:upper:]'; }

yes, i'm aware that this runs the risk of mojibake when given some UTF-8
strings, but we already have that problem, and i don't think the uses so
far will hit it (as people generally feed USE flags and PN values). it
would require the C.UTF-8 locale to address.

-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
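[Editor's sketch] The helper idea from the message above can be tried outside an ebuild roughly as follows. This is only a sketch under stated assumptions: it uses LC_ALL=C instead of the LC_COLLATE=C written in the RFC (the follow-up message says character classification, LC_CTYPE, is what matters, and LC_ALL overrides both), and the PN assignment is a stand-in for the value the package manager would normally provide.

```shell
# Hypothetical helpers; names follow the RFC, LC_ALL=C is an editorial assumption.
# Both read stdin and write the converted text to stdout.
tolower() { LC_ALL=C tr '[:upper:]' '[:lower:]'; }
toupper() { LC_ALL=C tr '[:lower:]' '[:upper:]'; }

# Stand-in for the package name; in a real ebuild PN is set by the package manager.
PN="apinger"

# The environment assignment applies only to the external tr process,
# so the caller's own locale settings are left untouched.
MY_PN=$(printf '%s' "${PN}" | toupper)
echo "${MY_PN}"
```

Because tr runs as an external command here, this costs a fork for each call, which is the trade-off the message raises about running tr in global scope.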
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability 2015-11-11 2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger @ 2015-11-11 4:03 ` Mike Frysinger 2015-11-11 4:16 ` Ulrich Mueller 2015-11-11 6:16 ` Patrick Lauer 2 siblings, 0 replies; 20+ messages in thread From: Mike Frysinger @ 2015-11-11 4:03 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 120 bytes --] sorry, i meant char classification here (LC_CTYPE), not collation. i've been dealing with sorting bugs lately ;). -mike [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability 2015-11-11 2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger 2015-11-11 4:03 ` Mike Frysinger @ 2015-11-11 4:16 ` Ulrich Mueller 2015-11-11 7:16 ` René Neumann 2015-11-11 7:42 ` Mike Frysinger 2015-11-11 6:16 ` Patrick Lauer 2 siblings, 2 replies; 20+ messages in thread From: Ulrich Mueller @ 2015-11-11 4:16 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 651 bytes --] >>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote: > Arfrever highlights these are not even safe to use. bash is locale aware, > so it'll apply LC_COLLATE rules when processing the ^/, casemods. while > you can fix this with external programs ala: > LC_COLLATE=C tr ... > you can't do it with inline code like: > LC_COLLATE=C SRC_URI=".../${PN^^}/..." >>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote: > sorry, i meant char classification here (LC_CTYPE), not collation. Shouldn't these be safe to use if the string consists purely of ASCII characters? I mean, A-Z and a-z should be uppercase and lowercase, respectively, in any locale? Ulrich [-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  4:16 ` Ulrich Mueller
@ 2015-11-11  7:16 ` René Neumann
  2015-11-11  7:37 ` Ulrich Mueller
  2015-11-11  7:42 ` Mike Frysinger
  1 sibling, 1 reply; 20+ messages in thread
From: René Neumann @ 2015-11-11  7:16 UTC (permalink / raw)
  To: gentoo-dev

On 11.11.2015 at 05:16, Ulrich Mueller wrote:
>>>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
>
>> Arfrever highlights these are not even safe to use. bash is locale aware,
>> so it'll apply LC_COLLATE rules when processing the ^/, casemods. while
>> you can fix this with external programs ala:
>> LC_COLLATE=C tr ...
>
>> you can't do it with inline code like:
>> LC_COLLATE=C SRC_URI=".../${PN^^}/..."
>
>>>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
>
>> sorry, i meant char classification here (LC_CTYPE), not collation.
>
> Shouldn't these be safe to use if the string consists purely of ASCII
> characters? I mean, A-Z and a-z should be uppercase and lowercase,
> respectively, in any locale?

Unfortunately, no (have been bitten by this issue already some years
ago):

$ echo $LC_ALL
tr_TR
$ f=i; echo ${f^^}
İ
$ f=I; echo ${f,}
ı

- René

^ permalink raw reply [flat|nested] 20+ messages in thread
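[Editor's sketch] The session above can be reproduced side by side with the C locale along these lines. The function name show_upper is made up for illustration, bash 4.0 or newer is assumed to be on PATH, and the Turkish locale is only tried when `locale -a` lists it, since it is not installed everywhere.

```shell
# Hypothetical helper: upper-case a string with bash's ^^ operator under a given locale.
# A child bash is started so that the locale is in effect from startup.
show_upper() {
    # $1: locale name, $2: string to convert
    LC_ALL="$1" bash -c 'printf "%s\n" "${1^^}"' _ "$2"
}

show_upper C i

# Only try the Turkish locale when it is actually available on this system.
if locale -a 2>/dev/null | grep -Eqi '^tr_TR\.utf-?8$'; then
    # The thread reports a dotted capital I here rather than ASCII "I".
    show_upper tr_TR.UTF-8 i
else
    echo "tr_TR.UTF-8 not installed; skipping" >&2
fi
```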
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  7:16 ` René Neumann
@ 2015-11-11  7:37 ` Ulrich Mueller
  2015-11-11  7:47 ` Mike Frysinger
  0 siblings, 1 reply; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11  7:37 UTC (permalink / raw)
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

>>>>> On Wed, 11 Nov 2015, René Neumann wrote:

>> Shouldn't these be safe to use if the string consists purely of
>> ASCII characters? I mean, A-Z and a-z should be uppercase and
>> lowercase, respectively, in any locale?

> Unfortunately, no (have been bitten by this issue already some years
> ago):

> $ echo $LC_ALL
> tr_TR
> $ f=i; echo ${f^^}
> İ
> $ f=I; echo ${f,}
> ı

This is wrong on so many levels. :( It starts with the fact that the
dot over the lowercase latin i historically never was a diacritical
mark [1].

Maybe we should advise users in our documentation that they should
avoid such broken locales for ebuilds?

Ulrich

[1] https://commons.wikimedia.org/wiki/File:Evolution_of_minuscule.svg

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  7:37 ` Ulrich Mueller
@ 2015-11-11  7:47 ` Mike Frysinger
  2015-11-11  8:04 ` Ulrich Mueller
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  7:47 UTC (permalink / raw)
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

On 11 Nov 2015 08:37, Ulrich Mueller wrote:
> >>>>> On Wed, 11 Nov 2015, René Neumann wrote:
>
> >> Shouldn't these be safe to use if the string consists purely of
> >> ASCII characters? I mean, A-Z and a-z should be uppercase and
> >> lowercase, respectively, in any locale?
>
> > Unfortunately, no (have been bitten by this issue already some years
> > ago):
>
> > $ echo $LC_ALL
> > tr_TR
> > $ f=i; echo ${f^^}
> > İ
> > $ f=I; echo ${f,}
> > ı
>
> This is wrong on so many levels. :( It starts with the fact that the
> dot over the lowercase latin i historically never was a diacritical
> mark [1].
>
> Maybe we should advise users in our documentation that they should
> avoid such broken locales for ebuilds?

i'm not sure telling people their native language is wrong is a smart
move. it also would seem to cut against the purpose of the PMS.

-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  7:47 ` Mike Frysinger
@ 2015-11-11  8:04 ` Ulrich Mueller
  0 siblings, 0 replies; 20+ messages in thread
From: Ulrich Mueller @ 2015-11-11  8:04 UTC (permalink / raw)
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 939 bytes --]

>>>>> On Wed, 11 Nov 2015, Mike Frysinger wrote:

>> This is wrong on so many levels. :( It starts with the fact that the
>> dot over the lowercase latin i historically never was a diacritical
>> mark [1].
>>
>> Maybe we should advise users in our documentation that they should
>> avoid such broken locales for ebuilds?

> i'm not sure telling people their native language is wrong is a smart
> move. it also would seem to cut against the purpose of the PMS.

There is of course nothing wrong with the Turkish language or writing
system. However, if a language attaches meaning to the dot and uses it
as a diacritical mark, then it is (IMHO) not the smartest move to
encode it in a way that the letters I and i (which historically in the
Latin alphabet are upper and lower case variants of each other) are
reused. The sane thing would have been to encode the two Turkish i
variants as "LATIN SMALL LETTER I WITH DOT ABOVE" etc.

Ulrich

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability
  2015-11-11  4:16 ` Ulrich Mueller
  2015-11-11  7:16 ` René Neumann
@ 2015-11-11  7:42 ` Mike Frysinger
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Frysinger @ 2015-11-11  7:42 UTC (permalink / raw)
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

On 11 Nov 2015 05:16, Ulrich Mueller wrote:
> >>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
>
> > Arfrever highlights these are not even safe to use. bash is locale aware,
> > so it'll apply LC_COLLATE rules when processing the ^/, casemods. while
> > you can fix this with external programs ala:
> > LC_COLLATE=C tr ...
>
> > you can't do it with inline code like:
> > LC_COLLATE=C SRC_URI=".../${PN^^}/..."
>
> >>>>> On Tue, 10 Nov 2015, Mike Frysinger wrote:
>
> > sorry, i meant char classification here (LC_CTYPE), not collation.
>
> Shouldn't these be safe to use if the string consists purely of ASCII
> characters? I mean, A-Z and a-z should be uppercase and lowercase,
> respectively, in any locale?

nope. it depends on the order of the chars in the locale and assumes
the first is A and the last is Z. which not all do.

$ echo {a..z} | LC_ALL=et_EE.UTF-8 sed 's:[a-z]::g'
t u v w x y

we could do something like the classic:
	tolower() { tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' <<<"$*"; }
but that would still not help with the bash builtins.

-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
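[Editor's sketch] The range behaviour described above can be probed per locale with something like the following. The function name leftover_after_az is invented here, and what a non-C locale prints depends on the C library version and on whether that locale is installed, so only the C-locale result is stated.

```shell
# Hypothetical probe: print the ASCII lowercase letters that the range [a-z]
# fails to match under the given locale.
leftover_after_az() {
    # $1: locale name to test
    printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' | LC_ALL="$1" sed 's:[a-z]::g'
}

# In the C locale every letter falls inside the range, so only an empty line is printed.
leftover_after_az C

# Try the Estonian locale from the message above; if it is not installed,
# the C library typically falls back to C behaviour and nothing is left over either.
leftover_after_az et_EE.UTF-8
```

Any letters that survive the second call are the ones a pattern such as [a-z] would silently miss in an ebuild run under that locale.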
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability 2015-11-11 2:51 ` [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability Mike Frysinger 2015-11-11 4:03 ` Mike Frysinger 2015-11-11 4:16 ` Ulrich Mueller @ 2015-11-11 6:16 ` Patrick Lauer 2015-11-11 9:13 ` Michał Górny 2015-11-11 12:39 ` Ciaran McCreesh 2 siblings, 2 replies; 20+ messages in thread From: Patrick Lauer @ 2015-11-11 6:16 UTC (permalink / raw To: gentoo-dev On 11/11/2015 03:51 AM, Mike Frysinger wrote: > On 10 Nov 2015 18:53, Mike Frysinger wrote: >> i randomly stumbled across an ebuild that was using ^^ to make a variable >> uppercase. this is new to bash-4.0 and thus invalid for EAPI=[0-5]. only >> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2. > Arfrever highlights these are not even safe to use. bash is locale aware, > so it'll apply LC_COLLATE rules when processing the ^/, casemods. while > you can fix this with external programs ala: > LC_COLLATE=C tr ... > > you can't do it with inline code like: > LC_COLLATE=C SRC_URI=".../${PN^^}/..." > > you can if you do something like: > SRC_URI=".../$(LC_COLLATE=C; echo "${PN^^}")/..." > This points out a class of problems we've hit in the past: locale-aware things in ebuilds. Wouldn't it be 'easier' (fsov easy) to have portage use sane-default locale settings, so that estonian or turkish users don't get hit by weirdness in the [a-z] character class etc.? (And as a side-effect the build logs are always readable ;) ) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability 2015-11-11 6:16 ` Patrick Lauer @ 2015-11-11 9:13 ` Michał Górny 2015-11-11 12:39 ` Ciaran McCreesh 1 sibling, 0 replies; 20+ messages in thread From: Michał Górny @ 2015-11-11 9:13 UTC (permalink / raw To: Patrick Lauer; +Cc: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 1487 bytes --] On Wed, 11 Nov 2015 07:16:42 +0100 Patrick Lauer <patrick@gentoo.org> wrote: > On 11/11/2015 03:51 AM, Mike Frysinger wrote: > > On 10 Nov 2015 18:53, Mike Frysinger wrote: > >> i randomly stumbled across an ebuild that was using ^^ to make a variable > >> uppercase. this is new to bash-4.0 and thus invalid for EAPI=[0-5]. only > >> the fresh EAPI=6 permits it since we bumped the min ver to bash-4.2. > > Arfrever highlights these are not even safe to use. bash is locale aware, > > so it'll apply LC_COLLATE rules when processing the ^/, casemods. while > > you can fix this with external programs ala: > > LC_COLLATE=C tr ... > > > > you can't do it with inline code like: > > LC_COLLATE=C SRC_URI=".../${PN^^}/..." > > > > you can if you do something like: > > SRC_URI=".../$(LC_COLLATE=C; echo "${PN^^}")/..." > > > This points out a class of problems we've hit in the past: locale-aware > things in ebuilds. > > Wouldn't it be 'easier' (fsov easy) to have portage use sane-default > locale settings, so that estonian or turkish users don't get hit by > weirdness in the [a-z] character class etc.? > > (And as a side-effect the build logs are always readable ;) ) Pretty much +1 here. Not saying we need to force full locale, but having sane LC_CTYPE and LC_COLLATE would make sense. PMS already forces it in a few places... we may as well force it globally. -- Best regards, Michał Górny <http://dev.gentoo.org/~mgorny/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 949 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability 2015-11-11 6:16 ` Patrick Lauer 2015-11-11 9:13 ` Michał Górny @ 2015-11-11 12:39 ` Ciaran McCreesh 2015-11-11 15:48 ` [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Ulrich Mueller 1 sibling, 1 reply; 20+ messages in thread From: Ciaran McCreesh @ 2015-11-11 12:39 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 408 bytes --] On Wed, 11 Nov 2015 07:16:42 +0100 Patrick Lauer <patrick@gentoo.org> wrote: > Wouldn't it be 'easier' (fsov easy) to have portage use sane-default > locale settings, so that estonian or turkish users don't get hit by > weirdness in the [a-z] character class etc.? Paludis forces all the LC variables to sane values. A few vocal annoying users hate this, and patch it out... -- Ciaran McCreesh [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) 2015-11-11 12:39 ` Ciaran McCreesh @ 2015-11-11 15:48 ` Ulrich Mueller 2015-11-11 21:52 ` Jason A. Donenfeld 0 siblings, 1 reply; 20+ messages in thread From: Ulrich Mueller @ 2015-11-11 15:48 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 1082 bytes --] >>>>> On Wed, 11 Nov 2015, Ciaran McCreesh wrote: > On Wed, 11 Nov 2015 07:16:42 +0100 > Patrick Lauer <patrick@gentoo.org> wrote: >> Wouldn't it be 'easier' (fsov easy) to have portage use sane-default >> locale settings, so that estonian or turkish users don't get hit by >> weirdness in the [a-z] character class etc.? > Paludis forces all the LC variables to sane values. A few vocal > annoying users hate this, and patch it out... Unfortunately, that doesn't help us, since ebuilds cannot rely on it. Should we revise EAPI 6? It hasn't been cleared for usage in the tree yet, so should be still possible. Losing such an important feature of bash-4 seems to be reason enough. (And obviously, some people had been aware of the problem. Why did nobody speak up before the spec was approved?) Paludis seems to do this: unset LANG ${!LC_*} export LC_ALL=C We could just add this to the spec. Alternatively, something less intrusive, like setting only LC_COLLATE and LC_CTYPE. We already have LC_MESSAGES=C in the base profile, per 20130813 Council decision. Ulrich [-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) 2015-11-11 15:48 ` [gentoo-dev] Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Ulrich Mueller @ 2015-11-11 21:52 ` Jason A. Donenfeld 2015-11-11 22:21 ` [gentoo-dev] Revise EAPI 6? Matthias Maier 2015-11-12 6:52 ` [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Martin Vaeth 0 siblings, 2 replies; 20+ messages in thread From: Jason A. Donenfeld @ 2015-11-11 21:52 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 1904 bytes --] I'd be in favor of full-on LC_ALL=C. Ebuilds are meant for having a particular determinism. They're machine scripts. The operations they do need to be consistent. For user-facing parts, such as printing information, or sorting user-shown text, I can understand ebuild authors might want in some special circumstances to run a command with the user's language. For that reason, what if we did this: USER_LANG="$LANG" unset LANG ${!LC_*} export LC_ALL=C That way, ebuild writers could do: LC_ALL="$USER_LANG" einfo "Blah blah $(sort <blah)" While the rest of the actual programmatic part of the ebuild functions deterministically with LC_ALL=C. This seems like a decent compromise... On Nov 11, 2015 4:49 PM, "Ulrich Mueller" <ulm@gentoo.org> wrote: > >>>>> On Wed, 11 Nov 2015, Ciaran McCreesh wrote: > > > On Wed, 11 Nov 2015 07:16:42 +0100 > > Patrick Lauer <patrick@gentoo.org> wrote: > >> Wouldn't it be 'easier' (fsov easy) to have portage use sane-default > >> locale settings, so that estonian or turkish users don't get hit by > >> weirdness in the [a-z] character class etc.? > > > Paludis forces all the LC variables to sane values. A few vocal > > annoying users hate this, and patch it out... > > Unfortunately, that doesn't help us, since ebuilds cannot rely on it. 
> > Should we revise EAPI 6? It hasn't been cleared for usage in the tree > yet, so should be still possible. Losing such an important feature of > bash-4 seems to be reason enough. (And obviously, some people had been > aware of the problem. Why did nobody speak up before the spec was > approved?) > > Paludis seems to do this: > > unset LANG ${!LC_*} > export LC_ALL=C > > We could just add this to the spec. Alternatively, something less > intrusive, like setting only LC_COLLATE and LC_CTYPE. > > We already have LC_MESSAGES=C in the base profile, per 20130813 > Council decision. > > Ulrich > [-- Attachment #2: Type: text/html, Size: 2532 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
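[Editor's sketch] The compromise suggested at the top of this message could look roughly like this outside a package manager. USER_LANG is the name proposed in the message, not an existing variable; falling back from LC_ALL to LANG to C when saving it is an editorial assumption, since LC_ALL outranks LANG when both are set.

```shell
#!/usr/bin/env bash
# Remember what the user had before anything is reset (assumed fallback order).
USER_LANG="${LC_ALL:-${LANG:-C}}"

# Drop every locale variable, then pin the C locale, as quoted from Paludis in the thread.
unset LANG ${!LC_*}
export LC_ALL=C

# Machine-facing step: byte-order sort, the same on every system.
sorted_c=$(printf '%s\n' b A a B | sort | tr '\n' ' ')
echo "C locale order: ${sorted_c}"

# User-facing step: opt back in to the saved locale for one command only.
sorted_user=$(printf '%s\n' b A a B | LC_ALL="${USER_LANG}" sort | tr '\n' ' ')
echo "user locale order: ${sorted_user}"
```

One detail worth noting: a prefix assignment only reaches the command it is attached to. In a line such as `LC_ALL="$USER_LANG" einfo "... $(sort <blah)"`, the shell expands the command substitution before the assignment takes effect, so the prefix would have to sit on `sort` itself, as in the sketch.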
* Re: [gentoo-dev] Revise EAPI 6? 2015-11-11 21:52 ` Jason A. Donenfeld @ 2015-11-11 22:21 ` Matthias Maier 2015-11-11 23:18 ` Ulrich Mueller 2015-11-12 6:52 ` [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) Martin Vaeth 1 sibling, 1 reply; 20+ messages in thread From: Matthias Maier @ 2015-11-11 22:21 UTC (permalink / raw To: gentoo-dev On Wed, Nov 11, 2015, at 15:52 CST, "Jason A. Donenfeld" <zx2c4@gentoo.org> wrote: > I'd be in favor of full-on LC_ALL=C. ++ I'm surprised that we do not have such a policy already. Best, Matthias ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] Revise EAPI 6? 2015-11-11 22:21 ` [gentoo-dev] Revise EAPI 6? Matthias Maier @ 2015-11-11 23:18 ` Ulrich Mueller 2015-11-12 0:34 ` Mike Gilbert 0 siblings, 1 reply; 20+ messages in thread From: Ulrich Mueller @ 2015-11-11 23:18 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 695 bytes --] >>>>> On Wed, 11 Nov 2015, Matthias Maier wrote: > On Wed, Nov 11, 2015, at 15:52 CST, "Jason A. Donenfeld" <zx2c4@gentoo.org> wrote: >> I'd be in favor of full-on LC_ALL=C. > ++ > I'm surprised that we do not have such a policy already. LC_ALL=C would disable UTF-8, and I am told that this would cause problems for e.g. Python 3. What we would really want is C.UTF-8 [1] but that's neither a standard nor is it ready. In the meantime, we could go with the minimum changes necessary to unbreak the bash 4.2 case conversion operators. Setting LC_COLLATE to C and LC_CTYPE to some sane locale should be sufficient for that. Ulrich [1] https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 [-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] Revise EAPI 6? 2015-11-11 23:18 ` Ulrich Mueller @ 2015-11-12 0:34 ` Mike Gilbert 2015-11-12 6:24 ` Ulrich Mueller 0 siblings, 1 reply; 20+ messages in thread From: Mike Gilbert @ 2015-11-12 0:34 UTC (permalink / raw To: Gentoo Dev On Wed, Nov 11, 2015 at 6:18 PM, Ulrich Mueller <ulm@gentoo.org> wrote: >>>>>> On Wed, 11 Nov 2015, Matthias Maier wrote: > >> On Wed, Nov 11, 2015, at 15:52 CST, "Jason A. Donenfeld" <zx2c4@gentoo.org> wrote: > >>> I'd be in favor of full-on LC_ALL=C. > >> ++ > >> I'm surprised that we do not have such a policy already. > > LC_ALL=C would disable UTF-8, and I am told that this would cause > problems for e.g. Python 3. What we would really want is C.UTF-8 [1] > but that's neither a standard nor is it ready. > I can work around it in the python eclasses by adjusting the python_export_utf8_locale function, but would prefer not to do that. > In the meantime, we could go with the minimum changes necessary to > unbreak the bash 4.2 case conversion operators. Setting LC_COLLATE > to C and LC_CTYPE to some sane locale should be sufficient for that. If you want to force specific locale categories to C, I don't mind. I would just prefer that you don't mess with LC_ALL and keep LC_CTYPE to something with UTF-8 support. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [gentoo-dev] Revise EAPI 6? 2015-11-12 0:34 ` Mike Gilbert @ 2015-11-12 6:24 ` Ulrich Mueller 0 siblings, 0 replies; 20+ messages in thread From: Ulrich Mueller @ 2015-11-12 6:24 UTC (permalink / raw To: gentoo-dev [-- Attachment #1: Type: text/plain, Size: 967 bytes --] >>>>> On Wed, 11 Nov 2015, Mike Gilbert wrote: > On Wed, Nov 11, 2015 at 6:18 PM, Ulrich Mueller <ulm@gentoo.org> wrote: >> In the meantime, we could go with the minimum changes necessary to >> unbreak the bash 4.2 case conversion operators. Setting LC_COLLATE >> to C and LC_CTYPE to some sane locale should be sufficient for that. > If you want to force specific locale categories to C, I don't mind. I > would just prefer that you don't mess with LC_ALL and keep LC_CTYPE to > something with UTF-8 support. We are thinking about adding a sentence like this: The package manager must ensure that the LC_COLLATE and LC_CTYPE locale categories are equivalent to the C locale, as far as characters in the ASCII range (U+0000 to U+007F) are concerned. Essentially this requires LC_COLLATE=C, but it permits almost anything for LC_CTYPE, except for such locales that change character categories or the upper/lowercase mapping for ASCII characters. Ulrich [-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
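[Editor's sketch] A package manager or a user could test whether a locale meets the proposed wording for the case operators with a check along these lines. The function name ascii_case_is_sane is hypothetical, bash 4.0 or newer is assumed, and the check covers only the upper/lowercase mapping of A-Z and a-z, not collation or the other character categories.

```shell
# Hypothetical check: succeed only if bash's ^^ and ,, operators map the
# ASCII letters exactly as the C locale would under the given locale.
ascii_case_is_sane() {
    # $1: locale name to test; LC_ALL is used so that it overrides LC_CTYPE
    LC_ALL="$1" bash -c '
        lower=abcdefghijklmnopqrstuvwxyz
        upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
        [[ ${lower^^} == "${upper}" && ${upper,,} == "${lower}" ]]
    '
}

if ascii_case_is_sane C; then
    echo "C: ASCII case mapping is as expected"
fi
```

Under the proposal, a locale for which this check fails would be exactly the kind the package manager may not leave in effect for LC_CTYPE.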
* [gentoo-dev] Re: Revise EAPI 6? (was: [RFC] ban use of base-4 casemods in ebuilds due to locale collation instability) 2015-11-11 21:52 ` Jason A. Donenfeld 2015-11-11 22:21 ` [gentoo-dev] Revise EAPI 6? Matthias Maier @ 2015-11-12 6:52 ` Martin Vaeth 1 sibling, 0 replies; 20+ messages in thread From: Martin Vaeth @ 2015-11-12 6:52 UTC (permalink / raw To: gentoo-dev Jason A. Donenfeld <zx2c4@gentoo.org> wrote: > > I'd be in favor of full-on LC_ALL=C. Setting LC_ALL seems wrong as it is meant as a quick hack and should not be relied on by a "generic" tool like portage. Better define to *unset* LC_ALL (remembering the previous value, see below) and to set (all?) other LC_* to defined values. When we are at it: Maybe it is even sufficient to define only LC_CTYPE=C LC_NUMERIC=C LC_COLLATE=C LC_MESSAGES=C LC_MONETARY=C In any case, the old values should be kept (and for simplicity defined to the previous LC_ALL if the latter was set), so that the ebuild author is able to stick to the user's choice for certain/all values if he needs to: In particular, for LC_CTYPE, this might be necessary, because of correct UTF8-support, as already mentioned (the ebuild author cannot say LC_CTYPE=*.UTF8). But also e.g. for LC_MONETARY, this might be necessary for some strange local banking tools. It is perhaps not necessary to (re)define LANG at all: Setting LC_MESSAGES should be sufficient for most build-time stuff to get readable logs, and LANG=C might be the main reason, why some people might not like the change and decided e.g. to patch it out in paludis, as mentioned in this thread. ^ permalink raw reply [flat|nested] 20+ messages in thread
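[Editor's sketch] The variant described in this message (unset LC_ALL, pin individual categories, keep the old values reachable) might be sketched as follows. The USER_LC_* names are invented for illustration; the thread does not name the saved variables. LANG is deliberately left alone, as the message suggests.

```shell
#!/usr/bin/env bash
for category in LC_CTYPE LC_NUMERIC LC_COLLATE LC_MESSAGES LC_MONETARY; do
    # Effective old value: LC_ALL wins, then the category itself, then LANG.
    effective="${LC_ALL:-${!category:-${LANG:-C}}}"

    # Save it under a hypothetical USER_<category> name for ebuilds that need it.
    printf -v "USER_${category}" '%s' "${effective}"

    # Pin the category itself to the C locale.
    export "${category}=C"
done

# LC_ALL would override the per-category settings, so it has to go.
unset LC_ALL

echo "LC_CTYPE is now ${LC_CTYPE}; the user's value was ${USER_LC_CTYPE}"
```

An ebuild that needs the user's character type for one command could then run that command with `LC_CTYPE="${USER_LC_CTYPE}"` as a prefix.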