* [gentoo-user] export LC_CTYPE=en_US.UTF-8
@ 2013-08-05 18:25 Chris Stankevitz
2013-08-05 18:53 ` Mike Gilbert
0 siblings, 1 reply; 19+ messages in thread
From: Chris Stankevitz @ 2013-08-05 18:25 UTC
To: gentoo-user@lists.gentoo.org
Hello,
I am using svn to update a repository. Somebody added files to the
repository with weird characters in the filename. SVN refuses to
update the repository unless I first:
export LC_CTYPE=en_US.UTF-8
I don't know or really care what that mumbo jumbo means, but I would
like an answer to this question:
Is my gentoo system properly set up? If not, what step did I miss that
is causing svn to want me to export LC_CTYPE?
I suspect either my gentoo system is messed up or svn is messed up.
Thank you,
Chris
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-05 18:53 UTC
To: gentoo-user
On Mon, Aug 5, 2013 at 2:25 PM, Chris Stankevitz
<chrisstankevitz@gmail.com> wrote:
> Hello,
>
> I am using svn to update a repository. Somebody added files to the
> repository with weird characters in the filename. SVN refuses to
> update the repository unless I first:
>
> export LC_CTYPE=en_US.UTF-8
>
> I don't know or really care what that mumbo jumbo means, but I would
> like an answer to this question:
>
> Is my gentoo system properly set up? If not, what step did I miss that
> is causing svn to want me to export LC_CTYPE?
>
> I suspect either my gentoo system is messed up or svn is messed up.
>
Sparing you the details as requested: In general, you want to be using
a locale that ends with ".UTF-8" to avoid encoding issues with
software like python and subversion.
The handbook documents setting a system-wide default locale. You
generally do this by setting the LANG variable in
/etc/conf.d/02locale.
http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Bruce Hill @ 2013-08-05 18:57 UTC
To: gentoo-user
On Mon, Aug 05, 2013 at 02:53:11PM -0400, Mike Gilbert wrote:
> On Mon, Aug 5, 2013 at 2:25 PM, Chris Stankevitz
> <chrisstankevitz@gmail.com> wrote:
> > Hello,
> >
> > I am using svn to update a repository. Somebody added files to the
> > repository with weird characters in the filename. SVN refuses to
> > update the repository unless I first:
> >
> > export LC_CTYPE=en_US.UTF-8
> >
> > I don't know or really care what that mumbo jumbo means, but I would
> > like an answer to this question:
> >
> > Is my gentoo system properly set up? If not, what step did I miss that
> > is causing svn to want me to export LC_CTYPE?
> >
> > I suspect either my gentoo system is messed up or svn is messed up.
> >
>
> Sparing you the details as requested: In general, you want to be using
> a locale that ends with ".UTF-8" to avoid encoding issues with
> software like python and subversion.
>
> The handbook documents setting a system-wide default locale. You
> generally do this by setting the LANG variable in
> /etc/conf.d/02locale.
>
> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
Without looking, shouldn't that be /etc/env.d/02locale ?
--
Happy Penguin Computers >')
126 Fenco Drive ( \
Tupelo, MS 38801 ^^
support@happypenguincomputers.com
662-269-2706 662-205-6424
http://happypenguincomputers.com/
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-05 21:17 UTC
To: gentoo-user
On Mon, Aug 5, 2013 at 2:57 PM, Bruce Hill
<daddy@happypenguincomputers.com> wrote:
> On Mon, Aug 05, 2013 at 02:53:11PM -0400, Mike Gilbert wrote:
>> On Mon, Aug 5, 2013 at 2:25 PM, Chris Stankevitz
>> <chrisstankevitz@gmail.com> wrote:
>> > Hello,
>> >
>> > I am using svn to update a repository. Somebody added files to the
>> > repository with weird characters in the filename. SVN refuses to
>> > update the repository unless I first:
>> >
>> > export LC_CTYPE=en_US.UTF-8
>> >
>> > I don't know or really care what that mumbo jumbo means, but I would
>> > like an answer to this question:
>> >
>> > Is my gentoo system properly set up? If not, what step did I miss that
>> > is causing svn to want me to export LC_CTYPE?
>> >
>> > I suspect either my gentoo system is messed up or svn is messed up.
>> >
>>
>> Sparing you the details as requested: In general, you want to be using
>> a locale that ends with ".UTF-8" to avoid encoding issues with
>> software like python and subversion.
>>
>> The handbook documents setting a system-wide default locale. You
>> generally do this by setting the LANG variable in
>> /etc/conf.d/02locale.
>>
>> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
>
> Without looking, shouldn't that be /etc/env.d/02locale ?
Yes.
Or /etc/locale.conf if you're on systemd.
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-05 22:52 UTC
To: gentoo-user@lists.gentoo.org
On Mon, Aug 5, 2013 at 11:53 AM, Mike Gilbert <floppym@gentoo.org> wrote:
> The handbook documents setting a system-wide default locale. You
> generally do this by setting the LANG variable in
> /etc/conf.d/02locale.
>
> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
Mike,
Thank you for your help. I attempted to follow these instructions and
ran into three problems. Can you please confirm the fixes I employed
to deal with each of these issues:
1. The handbook suggests I should modify the file /etc/env.d/02locale,
but that file does not exist on my system. RESOLUTION: create the
file
2. The handbook suggests I should add this line to
/etc/env.d/02locale: 'LANG="de_DE.UTF-8"', but I do not speak the
language "DE". RESOLUTION: type instead 'LANG="en_US.UTF-8"' to match
/etc/locale.gen
3. The handbook suggests that I should add this line to
/etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
again talking about the language "DE". RESOLUTION: I assumed
LC_COLLATE=C refers to english and added the line without
modification.
Thank you again for your help,
Chris
* [gentoo-user] Re: export LC_CTYPE=en_US.UTF-8
From: Kai Krakow @ 2013-08-05 23:25 UTC
To: gentoo-user
Chris Stankevitz <chrisstankevitz@gmail.com> wrote:
> 3. The handbook suggests that I should add this line to
> /etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
> again talking about the language "DE". RESOLUTION: I assumed
> LC_COLLATE=C refers to english and added the line without
> modification.
C refers to "as in C code"... Or something like that. What's essential: It
tells the system to use the strings like they are recorded in the program
file, without translating. In most cases that's English.
HTH
Kai
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-06 13:04 UTC
To: gentoo-user
On 05/08/2013 23:52, Chris Stankevitz wrote:
> On Mon, Aug 5, 2013 at 11:53 AM, Mike Gilbert <floppym@gentoo.org> wrote:
>> The handbook documents setting a system-wide default locale. You
>> generally do this by setting the LANG variable in
>> /etc/conf.d/02locale.
>>
>> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
>
> Mike,
>
> Thank you for your help. I attempted to follow these instructions and
> ran into three problems. Can you please confirm the fixes I employed
> to deal with each of these issues:
>
> 1. The handbook suggests I should modify the file /etc/env.d/02locale,
> but that file does not exist on my system. RESOLUTION: create the
> file
Run "eselect locale", first with the "list" parameter and then the "set"
parameter as appropriate. It's easier.
>
> 2. The handbook suggests I should add this line to
> /etc/env.d/02locale: 'LANG="de_DE.UTF-8"', but I do not speak the
> language "DE". RESOLUTION: type instead 'LANG="en_US.UTF-8"' to match
> /etc/locale.gen
Legitimate locales are those installed with glibc. These can be shown
with either "eselect locale list" or "locale -a".
>
> 3. The handbook suggests that I should add this line to
> /etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
> again talking about the language "DE". RESOLUTION: I assumed
> LC_COLLATE=C refers to english and added the line without
> modification.
C refers to the POSIX locale [1].
Defining LC_COLLATE is a workaround for behaviour deemed surprising to
those otherwise unaware of the impact of collations. For example, files
beginning with a dot might no longer appear at the top of a directory
listing and ranges in regular expressions may be affected, depending on
the extent to which a given program abides by the locale. Poorly written
shell scripts that capture from ls (assuming a given order) might also
be affected.
If undefined, the value of LC_COLLATE is inherited from LANG. I'm not
sure that overriding it is particularly useful nowadays but it doesn't hurt.
--Kerin
[1]
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Bruce Hill @ 2013-08-06 13:24 UTC
To: gentoo-user
On Tue, Aug 06, 2013 at 02:04:00PM +0100, Kerin Millar wrote:
>
> Legitimate locales are those installed with glibc. These can be shown
> with either "eselect locale list" or "locale -a".
I had never used eselect with locales (AFAIR) before today.
Why does "locale -a" return utf8? I know UTF-8 is accepted as standard, utf8
is not but usually recognized, but want to understand why "locale -a" output
omits the standard, which is set on my systems, and differs from the others:
o@workstation ~ $ eselect locale list
Available targets for the LANG variable:
[1] C
[2] POSIX
[3] en_US.utf8
[4] en_US.UTF-8 *
[ ] (free form)
mingdao@workstation ~ $ locale -a
C
POSIX
en_US.utf8
mingdao@workstation ~ $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Cheers,
Bruce
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-06 13:40 UTC
To: gentoo-user
On 06/08/2013 14:24, Bruce Hill wrote:
> On Tue, Aug 06, 2013 at 02:04:00PM +0100, Kerin Millar wrote:
>>
>> Legitimate locales are those installed with glibc. These can be shown
>> with either "eselect locale list" or "locale -a".
>
> I had never used eselect with locales (AFAIR) before today.
>
> Why does "locale -a" return utf8? I know UTF-8 is accepted as standard, utf8
> is not but usually recognized, but want to understand why "locale -a" output
> omits the standard, which is set on my systems, and differs from the others:
>
> o@workstation ~ $ eselect locale list
> Available targets for the LANG variable:
> [1] C
> [2] POSIX
> [3] en_US.utf8
> [4] en_US.UTF-8 *
> [ ] (free form)
> mingdao@workstation ~ $ locale -a
> C
> POSIX
> en_US.utf8
> mingdao@workstation ~ $ locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE=C
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
Apparently, "utf8" is the canonical representation in glibc (which
provides the locale tool):
http://lists.debian.org/debian-glibc/2004/12/msg00028.html
That eselect enumerates the locale twice when the alternate form is
specified in /etc/env.d/02locale could be considered as a minor bug.
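To compare locale names regardless of spelling, something like the following sketch could be used. It assumes that folding the codeset part to lowercase and dropping hyphens is enough to match what "locale -a" prints, which holds for the UTF-8 case shown above but is not guaranteed for every codeset. The function names normalize_locale_name and locale_in_list are hypothetical.

```shell
# Hypothetical helper: fold the codeset part of a locale name into the
# spelling that "locale -a" prints (lowercase, no hyphen), so that
# en_US.UTF-8 becomes en_US.utf8. Names without a codeset pass through.
normalize_locale_name() {
    name=$1
    case $name in
        *.*)
            base=${name%%.*}
            codeset=${name#*.}
            codeset=$(printf '%s' "$codeset" | tr '[:upper:]' '[:lower:]' | tr -d '-')
            printf '%s.%s\n' "$base" "$codeset"
            ;;
        *)
            printf '%s\n' "$name"
            ;;
    esac
}

# Hypothetical helper: succeed if the locale named in $1 appears in the
# list read from stdin (one name per line), ignoring codeset spelling.
locale_in_list() {
    wanted=$(normalize_locale_name "$1")
    while IFS= read -r entry; do
        if [ "$(normalize_locale_name "$entry")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

Typical use would be "locale -a | locale_in_list en_US.UTF-8", which succeeds when either spelling is present in the list.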
--Kerin
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Bruce Hill @ 2013-08-06 14:26 UTC
To: gentoo-user
On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
>
> Apparently, "utf8" is the canonical representation in glibc (which
> provides the locale tool):
>
> http://lists.debian.org/debian-glibc/2004/12/msg00028.html
>
> That eselect enumerates the locale twice when the alternate form is
> specified in /etc/env.d/02locale could be considered as a minor bug.
>
> --Kerin
RFC 3629 does not mention utf8, but I did see this notation in Wikipedia, and
yes, I understand that's not official:
Other descriptions that omit the hyphen or replace it with a space, such as
"utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
Despite this, most agents such as browsers can understand them, and so
standards intended to describe existing practice (such as HTML5) may
effectively require their recognition.
[14] http://www.ietf.org/rfc/rfc3629.txt
I was only mildly curious seeing utf8 show up, because on numerous occasions
in #gentoo on FreeNode there have been different reports of incorrect
characters displayed with utf8, then fixed with UTF-8. Having read RFC 3629, I
just made it a habit to always use the standard (UTF-8).
Having read the remainder of the Debian ML thread you referenced, I have a
headache. Debian did that to me when I used it for ~3 months in 2003. :-)
Cheers,
Bruce
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-06 14:53 UTC
To: gentoo-user
On 06/08/2013 15:26, Bruce Hill wrote:
> On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
>>
>> Apparently, "utf8" is the canonical representation in glibc (which
>> provides the locale tool):
>>
>> http://lists.debian.org/debian-glibc/2004/12/msg00028.html
>>
>> That eselect enumerates the locale twice when the alternate form is
>> specified in /etc/env.d/02locale could be considered as a minor bug.
>>
>> --Kerin
>
> RFC 3629 does not mention utf8, but I did see this notation in Wikipedia, and
> yes, I understand that's not official:
>
> Other descriptions that omit the hyphen or replace it with a space, such as
> "utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
> Despite this, most agents such as browsers can understand them, and so
> standards intended to describe existing practice (such as HTML5) may
> effectively require their recognition.
>
> [14] http://www.ietf.org/rfc/rfc3629.txt
Internally, glibc may use whatever representation it pleases.
> I was only mildly curious seeing utf8 show up, because on numerous occasions
> in #gentoo on FreeNode there have been different reports of incorrect
> characters displayed with utf8, then fixed with UTF-8. Having read RFC 3629, I
> just made it a habit to always use the standard (UTF-8).
Probably due to buggy applications. According to a glibc maintainer,
they should be using the nl_langinfo() function but some try to read the
locale name itself. The response of both of these commands is the same:
# LC_ALL=en_US.UTF-8 locale -k LC_CTYPE | grep charmap
# LC_ALL=en_US.utf8 locale -k LC_CTYPE | grep charmap
Ergo, applications that use the correct interface will be informed that
the character encoding is "UTF-8", irrespective of the format of the
locale name.
Given the above, sticking to the "<lang>_<territory>.UTF-8" format seems
wise.
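A small sketch of the same check using the "charmap" keyword of the locale tool directly, which avoids the grep. The function name charmap_of is made up for illustration, and the locale name passed in is assumed to be installed; if it is not, the answer typically reflects a fallback rather than the requested locale.

```shell
# Hypothetical helper: print the character encoding name that the
# locale tool reports for the locale given as $1.
charmap_of() {
    LC_ALL=$1 locale charmap
}
```

On a system where the locale has been generated, "charmap_of en_US.UTF-8" and "charmap_of en_US.utf8" should print the same encoding name.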
>
> Having read the remainder of the Debian ML thread you referenced, I have a
> headache. Debian did that to me when I used it for ~3 months in 2003. :-)
>
> Cheers,
> Bruce
>
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-06 15:13 UTC
To: gentoo-user
On Mon, Aug 5, 2013 at 6:52 PM, Chris Stankevitz
<chrisstankevitz@gmail.com> wrote:
> On Mon, Aug 5, 2013 at 11:53 AM, Mike Gilbert <floppym@gentoo.org> wrote:
>> The handbook documents setting a system-wide default locale. You
>> generally do this by setting the LANG variable in
>> /etc/conf.d/02locale.
>>
>> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
>
> Mike,
>
> Thank you for your help. I attempted to follow these instructions and
> ran into three problems. Can you please confirm the fixes I employed
> to deal with each of these issues:
>
I think the other responses in the thread have this covered, but I
will respond anyway.
> 1. The handbook suggests I should modify the file /etc/env.d/02locale,
> but that file does not exist on my system. RESOLUTION: create the
> file
>
Correct. This file can also be created by using eselect locale.
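For reference, a minimal sketch of the file as described in this thread. The helper name write_locale_env is made up for illustration, and the path is passed in as an argument so that the sketch can be tried on a scratch file first; on Gentoo the real target would be /etc/env.d/02locale, written as root.

```shell
# Hypothetical helper: write the two settings discussed in this thread
# to the file named in $1 (on Gentoo: /etc/env.d/02locale).
write_locale_env() {
    target=$1
    cat > "$target" <<'EOF'
LANG="en_US.UTF-8"
LC_COLLATE="C"
EOF
}
```

After changing the real file, the usual handbook step is to run "env-update && source /etc/profile" (or log in again) so that new shells pick up the values.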
> 2. The handbook suggests I should add this line to
> /etc/env.d/02locale: 'LANG="de_DE.UTF-8"', but I do not speak the
> language "DE". RESOLUTION: type instead 'LANG="en_US.UTF-8"' to match
> /etc/locale.gen
>
Right, the de_DE is just an example. You should select a
language/country that matches your lingual ability. :-)
> 3. The handbook suggests that I should add this line to
> /etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
> again talking about the language "DE". RESOLUTION: I assumed
> LC_COLLATE=C refers to english and added the line without
> modification.
>
LC_COLLATE specifies how to sort text strings. Setting it to "C"
indicates that you want to sort strings based on the binary (ASCII)
value of their characters.
Leaving LC_COLLATE unset will cause strings to be sorted according to
the normal rules associated with your locale.
For example, given the following strings:
cat
Dog
With LC_COLLATE="C", they are sorted like this, since the binary value
of "D" (68) is less than the value of "c" (99).
Dog
cat
With LC_COLLATE="en_US.UTF-8", they are sorted like this, since "c"
comes before "D" in the alphabet.
cat
Dog
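The two orderings can be reproduced with sort, as a rough sketch. The helper name sort_words is hypothetical. LC_ALL is used rather than LC_COLLATE because LC_ALL, if it is already set in the environment, would override LC_COLLATE. The en_US.UTF-8 call only shows the second ordering if that locale is installed.

```shell
# Hypothetical helper: sort the two example words under the locale
# given as $1.
sort_words() {
    printf '%s\n' cat Dog | LC_ALL=$1 sort
}

sort_words C             # byte order: prints Dog, then cat
sort_words en_US.UTF-8   # locale order, provided the locale is installed
```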
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-06 15:51 UTC
To: gentoo-user@lists.gentoo.org
On Tue, Aug 6, 2013 at 6:04 AM, Kerin Millar <kerframil@fastmail.co.uk> wrote:
> Run "eselect locale", first with the "list" parameter and then the "set"
> parameter as appropriate. It's easier.
Kerin, all,
Thanks for your help. SVN (and I'm sure other apps) are happy now.
Chris
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-06 18:23 UTC
To: gentoo-user@lists.gentoo.org
On Tue, Aug 6, 2013 at 8:13 AM, Mike Gilbert <floppym@gentoo.org> wrote:
> Leaving LC_COLLATE unset will cause strings to be sorted according to
> the normal rules associated with your locale.
Mike (or anyone else),
For which applications does setting LC_COLLATE affect sorting:
a) Any C++ application that uses bool std::string::operator<(const std::string&)
b) Any C or C++ application that compares char values using the '<' operator
c) Any application that uses the system call "CompareStrings(const
char*, const char*)"
d) [your answer here]
I'm sure the answer is not a or b. I'm sure it's not c either since I
just made it up.
Thank you,
Chris
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Stroller @ 2013-08-06 22:42 UTC
To: gentoo-user
On 6 August 2013, at 14:04, Kerin Millar wrote:
> ...
> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.
I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.
Stroller.
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-07 0:58 UTC
To: gentoo-user
On Tue, Aug 6, 2013 at 2:23 PM, Chris Stankevitz
<chrisstankevitz@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 8:13 AM, Mike Gilbert <floppym@gentoo.org> wrote:
>> Leaving LC_COLLATE unset will cause strings to be sorted according to
>> the normal rules associated with your locale.
>
> Mike (or anyone else),
>
> For which applications does setting LC_COLLATE affect sorting:
>
> a) Any C++ application that uses bool std::string::operator<(const std::string&)
>
> b) Any C or C++ application that compares char values using the '<' operator
>
> c) Any application that uses the system call "CompareStrings(const
> char*, const char*)"
>
> d) [your answer here]
>
> I'm sure the answer is not a or b. I'm sure it's not c either since I
> just made it up.
>
From locale(7):
LC_COLLATE
    This is used to change the behavior of the functions strcoll(3)
    and strxfrm(3), which are used to compare strings in the local
    alphabet. For example, the German sharp s is sorted as "ss".
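As a rough illustration from the shell: tools whose comparisons go through the locale's collation (sort, and the string comparison of expr as specified by POSIX) follow LC_COLLATE, whereas a plain byte comparison such as strcmp(3) or '<' on char values does not. The helper name sorts_before is hypothetical, and the "X" prefix is only there to keep expr from treating an operand as a number or an operator.

```shell
# Hypothetical helper: print "yes" if string $1 collates before string
# $2 under the locale named in $3, otherwise print "no".
sorts_before() {
    if LC_ALL=$3 expr "X$1" '<' "X$2" >/dev/null; then
        echo yes
    else
        echo no
    fi
}

sorts_before Dog cat C   # prints "yes": "D" (68) is below "c" (99)
```

With an installed en_US.UTF-8 locale as the third argument, the answer for the same pair would be expected to flip.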
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-07 12:41 UTC
To: gentoo-user
On 06/08/2013 23:42, Stroller wrote:
>
> On 6 August 2013, at 14:04, Kerin Millar wrote:
>> ...
>> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
>
> It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.
As has been mentioned, there are valid reasons to want to override the
collation. Here is a concrete example:
https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html
Strictly speaking, grep is correct to behave that way but it can be
confounding. In an ideal world, everyone would be using named classes
instead of ranges in their regular expressions but it's not an ideal world.
These days, grep no longer exhibits this characteristic in Gentoo.
Nevertheless, it serves as a valid example of how collations for UTF-8
locales can be a liability.
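A sketch of the named-class approach, for comparison. The helper name only_lowercase is made up for illustration; [[:lower:]] is the POSIX character class, and its meaning does not depend on how the locale happens to order a range such as [a-z].

```shell
# Hypothetical helper: keep only input lines made up entirely of
# lowercase letters, using a named class instead of the range [a-z].
only_lowercase() {
    grep -E -x '[[:lower:]]+'
}

printf '%s\n' abc Abc ABC | only_lowercase   # prints abc
```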
Of the other distros, Arch Linux also defined LC_COLLATE=C although I
understand that they have just recently stopped doing that.
On a production system, I would still be inclined to use it for reasons
of safety. For that matter, some people refuse to use UTF-8 at all on
the grounds of security; the handling of variable-width encodings
continues to be an effective bug inducer.
> I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.
Presumably:
a) LANG was defined inappropriately
b) LANG was defined appropriately but LC_TIME was defined otherwise
c) LC_ALL was defined, trumping all
I would definitely not advise doing any of these things.
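For reference, the precedence behind the three cases can be sketched as below: LC_ALL, if set and non-empty, wins; otherwise the category's own LC_* variable applies; otherwise LANG. The helper name effective_locale is hypothetical, and the final fallback to "C" is a simplification, since POSIX leaves the default to the implementation.

```shell
# Hypothetical helper: print the locale that applies to the category
# named in $1 (e.g. LC_TIME), using the order LC_ALL, LC_<category>, LANG.
effective_locale() {
    category=$1
    # Indirect lookup of the variable named by $category.
    # Only pass trusted category names, since this uses eval.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}
```

With LANG=en_US.UTF-8, LC_COLLATE=C and an empty LC_ALL, "effective_locale LC_COLLATE" would give C and "effective_locale LC_TIME" would give en_US.UTF-8.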
--Kerin
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Stroller @ 2013-08-07 16:40 UTC
To: gentoo-user
On 7 August 2013, at 13:41, Kerin Millar wrote:
> On 06/08/2013 23:42, Stroller wrote:
>>
>> On 6 August 2013, at 14:04, Kerin Millar wrote:
>>> ...
>>> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
>>
>> It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.
>
> As has been mentioned, there are valid reasons to want to override the collation. Here is a concrete example:
>
> https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html
>
> Strictly speaking, grep is correct to behave that way but it can be confounding.
Linking also this answer, which you're aware of:
https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00600.html
This only goes to illustrate that you shouldn't go overriding these willy-nilly without full awareness of why you're doing so and what you're doing.
>> I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.
>
> Presumably:
>
> a) LANG was defined inappropriately
> b) LANG was defined appropriately but LC_TIME was defined otherwise
> c) LC_ALL was defined, trumping all
I'm having trouble parsing this reply, but perhaps you might find the full bug description helpful. I wrote about 1000 words on the subject there last year.
It is the top Google hit for "en_gb am pm bug": http://sourceware.org/bugzilla/show_bug.cgi?id=3768
Stroller.
* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-07 22:27 UTC
To: gentoo-user
On 07/08/2013 17:40, Stroller wrote:
>
> On 7 August 2013, at 13:41, Kerin Millar wrote:
>
>> On 06/08/2013 23:42, Stroller wrote:
>>>
>>> On 6 August 2013, at 14:04, Kerin Millar wrote:
>>>> ...
>>>> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
>>>
>>> It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.
>>
>> As has been mentioned, there are valid reasons to want to override the collation. Here is a concrete example:
>>
>> https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html
>>
>> Strictly speaking, grep is correct to behave that way but it can be confounding.
>
> Linking also this answer, which you're aware of:
> https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00600.html
Best practice will never be universally observed.
>
> This only goes to illustrate that you shouldn't go overriding these willy-nilly without full awareness of why you're doing so and what you're doing.
It also served to illustrate the overall point I was making - that
sticking to the C/POSIX collation is not without value as a safety
measure. Naturally, I would expect anyone else to exercise their own
judgement.
>
>
>>> I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.
>>
>> Presumably:
>>
>> a) LANG was defined inappropriately
>> b) LANG was defined appropriately but LC_TIME was defined otherwise
>> c) LC_ALL was defined, trumping all
>
>
> I'm having trouble parsing this reply, but perhaps you might find the full bug description helpful. I wrote about 1000 words on the subject there last year.
>
> It is the top Google hit for "en_gb am pm bug": http://sourceware.org/bugzilla/show_bug.cgi?id=3768
OK.
--Kerin