* [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-05 18:25 UTC
To: gentoo-user@lists.gentoo.org

Hello,

I am using svn to update a repository. Somebody added files to the
repository with weird characters in the filename. SVN refuses to
update the repository unless I first:

export LC_CTYPE=en_US.UTF-8

I don't know or really care what that mumbo jumbo means, but I would
like an answer to this question:

Is my gentoo system properly set up? If not, what step did I miss that
is causing svn to want me to export LC_CTYPE?

I suspect either my gentoo system is messed up or svn is messed up.

Thank you,

Chris

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-05 18:53 UTC
To: gentoo-user

On Mon, Aug 5, 2013 at 2:25 PM, Chris Stankevitz
<chrisstankevitz@gmail.com> wrote:
> Hello,
>
> I am using svn to update a repository. Somebody added files to the
> repository with weird characters in the filename. SVN refuses to
> update the repository unless I first:
>
> export LC_CTYPE=en_US.UTF-8
>
> I don't know or really care what that mumbo jumbo means, but I would
> like an answer to this question:
>
> Is my gentoo system properly set up? If not, what step did I miss that
> is causing svn to want me to export LC_CTYPE?
>
> I suspect either my gentoo system is messed up or svn is messed up.
>

Sparing you the details as requested: In general, you want to be using
a locale that ends with ".UTF-8" to avoid encoding issues with
software like python and subversion.

The handbook documents setting a system-wide default locale. You
generally do this by setting the LANG variable in
/etc/conf.d/02locale.

http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Bruce Hill @ 2013-08-05 18:57 UTC
To: gentoo-user

On Mon, Aug 05, 2013 at 02:53:11PM -0400, Mike Gilbert wrote:
> On Mon, Aug 5, 2013 at 2:25 PM, Chris Stankevitz
> <chrisstankevitz@gmail.com> wrote:
> > Hello,
> >
> > I am using svn to update a repository. Somebody added files to the
> > repository with weird characters in the filename. SVN refuses to
> > update the repository unless I first:
> >
> > export LC_CTYPE=en_US.UTF-8
> >
> > I don't know or really care what that mumbo jumbo means, but I would
> > like an answer to this question:
> >
> > Is my gentoo system properly set up? If not, what step did I miss that
> > is causing svn to want me to export LC_CTYPE?
> >
> > I suspect either my gentoo system is messed up or svn is messed up.
> >
>
> Sparing you the details as requested: In general, you want to be using
> a locale that ends with ".UTF-8" to avoid encoding issues with
> software like python and subversion.
>
> The handbook documents setting a system-wide default locale. You
> generally do this by setting the LANG variable in
> /etc/conf.d/02locale.
>
> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3

Without looking, shouldn't that be /etc/env.d/02locale ?
--
Happy Penguin Computers            >')
126 Fenco Drive                    ( \
Tupelo, MS 38801                    ^^
support@happypenguincomputers.com
662-269-2706 662-205-6424
http://happypenguincomputers.com/

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-05 21:17 UTC
To: gentoo-user

On Mon, Aug 5, 2013 at 2:57 PM, Bruce Hill
<daddy@happypenguincomputers.com> wrote:
> On Mon, Aug 05, 2013 at 02:53:11PM -0400, Mike Gilbert wrote:
>> On Mon, Aug 5, 2013 at 2:25 PM, Chris Stankevitz
>> <chrisstankevitz@gmail.com> wrote:
>> > Hello,
>> >
>> > I am using svn to update a repository. Somebody added files to the
>> > repository with weird characters in the filename. SVN refuses to
>> > update the repository unless I first:
>> >
>> > export LC_CTYPE=en_US.UTF-8
>> >
>> > I don't know or really care what that mumbo jumbo means, but I would
>> > like an answer to this question:
>> >
>> > Is my gentoo system properly set up? If not, what step did I miss that
>> > is causing svn to want me to export LC_CTYPE?
>> >
>> > I suspect either my gentoo system is messed up or svn is messed up.
>> >
>>
>> Sparing you the details as requested: In general, you want to be using
>> a locale that ends with ".UTF-8" to avoid encoding issues with
>> software like python and subversion.
>>
>> The handbook documents setting a system-wide default locale. You
>> generally do this by setting the LANG variable in
>> /etc/conf.d/02locale.
>>
>> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
>
> Without looking, shouldn't that be /etc/env.d/02locale ?

Yes. Or /etc/locale.conf if you're on systemd.
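
For reference, a minimal /etc/env.d/02locale of the kind the handbook describes might look like the fragment below, filled in with the values Chris settles on later in the thread. The locale name is only an example: it has to be one that is enabled in /etc/locale.gen and listed by "locale -a". The LC_COLLATE line is optional.

```shell
# /etc/env.d/02locale (OpenRC; on systemd the same assignments go in /etc/locale.conf)
LANG="en_US.UTF-8"
# Optional: keep byte-order (C/POSIX) collation for sorting and regex ranges
LC_COLLATE="C"
```

After editing the file, run "env-update && source /etc/profile" as the handbook does. Sessions that are already running keep their old environment until you log in again. On systemd, "localectl set-locale LANG=en_US.UTF-8" writes /etc/locale.conf for you.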

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-05 22:52 UTC
To: gentoo-user@lists.gentoo.org

On Mon, Aug 5, 2013 at 11:53 AM, Mike Gilbert <floppym@gentoo.org> wrote:
> The handbook documents setting a system-wide default locale. You
> generally do this by setting the LANG variable in
> /etc/conf.d/02locale.
>
> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3

Mike,

Thank you for your help. I attempted to follow these instructions and
ran into three problems. Can you please confirm the fixes I employed
to deal with each of these issues:

1. The handbook suggests I should modify the file /etc/env.d/02locale,
but that file does not exist on my system. RESOLUTION: create the
file

2. The handbook suggests I should add this line to
/etc/env.d/02locale: 'LANG="de_DE.UTF-8"', but I do not speak the
language "DE". RESOLUTION: type instead 'LANG="en_US.UTF-8"' to match
/etc/locale.gen

3. The handbook suggests that I should add this line to
/etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
again talking about the language "DE". RESOLUTION: I assumed
LC_COLLATE=C refers to English and added the line without
modification.

Thank you again for your help,

Chris

* [gentoo-user] Re: export LC_CTYPE=en_US.UTF-8
From: Kai Krakow @ 2013-08-05 23:25 UTC
To: gentoo-user

Chris Stankevitz <chrisstankevitz@gmail.com> wrote:

> 3. The handbook suggests that I should add this line to
> /etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
> again talking about the language "DE". RESOLUTION: I assumed
> LC_COLLATE=C refers to English and added the line without
> modification.

C refers to "as in C code"... Or something like that.

What's essential: It tells the system to use the strings as they are
recorded in the program file, without translating them. In most cases
that's English.

HTH
Kai

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-06 13:04 UTC
To: gentoo-user

On 05/08/2013 23:52, Chris Stankevitz wrote:
> On Mon, Aug 5, 2013 at 11:53 AM, Mike Gilbert <floppym@gentoo.org> wrote:
>> The handbook documents setting a system-wide default locale. You
>> generally do this by setting the LANG variable in
>> /etc/conf.d/02locale.
>>
>> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
>
> Mike,
>
> Thank you for your help. I attempted to follow these instructions and
> ran into three problems. Can you please confirm the fixes I employed
> to deal with each of these issues:
>
> 1. The handbook suggests I should modify the file /etc/env.d/02locale,
> but that file does not exist on my system. RESOLUTION: create the
> file

Run "eselect locale", first with the "list" parameter and then the "set" parameter as appropriate. It's easier.

>
> 2. The handbook suggests I should add this line to
> /etc/env.d/02locale: 'LANG="de_DE.UTF-8"', but I do not speak the
> language "DE". RESOLUTION: type instead 'LANG="en_US.UTF-8"' to match
> /etc/locale.gen

Legitimate locales are those installed with glibc. These can be shown with either "eselect locale list" or "locale -a".

>
> 3. The handbook suggests that I should add this line to
> /etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
> again talking about the language "DE". RESOLUTION: I assumed
> LC_COLLATE=C refers to English and added the line without
> modification.

C refers to the POSIX locale [1]. Defining LC_COLLATE is a workaround for behaviour deemed surprising to those otherwise unaware of the impact of collations. For example, files beginning with a dot might no longer appear at the top of a directory listing, and ranges in regular expressions may be affected, depending on the extent to which a given program abides by the locale. Poorly written shell scripts that capture the output of ls (assuming a given order) might also be affected.

If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.

--Kerin

[1] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_02
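
Kerin's point about dotfiles and regex ranges can be seen directly in a shell. The sketch below runs under the C/POSIX locale so that its output is the same on any system: a bracket range such as [a-z] is interpreted through the collation order, which in C is plain byte order, while a named class such as [[:lower:]] is defined by character type (LC_CTYPE) and means "lowercase letter" in any locale. Historically, under some UTF-8 locales a range like [a-z] could also match uppercase letters, which is the kind of surprise the LC_COLLATE=C workaround avoids. The sample words are arbitrary.

```shell
#!/bin/sh
# Run everything under the C/POSIX locale so the results are reproducible.
LC_ALL=C
export LC_ALL

sample='apple
Banana
cherry'

# Range: interpreted via collation order. In C that is byte order, 'a' (97)
# through 'z' (122), so "Banana" (starts with 'B', 66) is not matched.
printf '%s\n' "$sample" | grep '^[a-z]'

# Named class: defined by character type (LC_CTYPE), not by collation, so it
# means "lowercase letter" regardless of LC_COLLATE.
printf '%s\n' "$sample" | grep '^[[:lower:]]'

# Byte-order collation also puts dotfiles first: '.' is 46, 'A' is 65, 'b' is 98.
printf '%s\n' zeta .profile Alpha beta | sort
```

Both grep commands print apple and cherry, and sort prints .profile, Alpha, beta, zeta. Where a script must not depend on the collation, named classes (and globbing rather than parsing ls output) are the more robust choice.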

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Bruce Hill @ 2013-08-06 13:24 UTC
To: gentoo-user

On Tue, Aug 06, 2013 at 02:04:00PM +0100, Kerin Millar wrote:
>
> Legitimate locales are those installed with glibc. These can be shown
> with either "eselect locale list" or "locale -a".

I had never used eselect with locales (AFAIR) before today.

Why does "locale -a" return utf8? I know UTF-8 is accepted as the standard, and utf8
is not but is usually recognized, but I want to understand why the "locale -a" output
omits the standard, which is set on my systems, and differs from the others:

o@workstation ~ $ eselect locale list
Available targets for the LANG variable:
  [1]   C
  [2]   POSIX
  [3]   en_US.utf8
  [4]   en_US.UTF-8 *
  [ ]   (free form)
mingdao@workstation ~ $ locale -a
C
POSIX
en_US.utf8
mingdao@workstation ~ $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Cheers,
Bruce

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-06 13:40 UTC
To: gentoo-user

On 06/08/2013 14:24, Bruce Hill wrote:
> On Tue, Aug 06, 2013 at 02:04:00PM +0100, Kerin Millar wrote:
>>
>> Legitimate locales are those installed with glibc. These can be shown
>> with either "eselect locale list" or "locale -a".
>
> I had never used eselect with locales (AFAIR) before today.
>
> Why does "locale -a" return utf8? I know UTF-8 is accepted as the standard, and utf8
> is not but is usually recognized, but I want to understand why the "locale -a" output
> omits the standard, which is set on my systems, and differs from the others:
>
> o@workstation ~ $ eselect locale list
> Available targets for the LANG variable:
>   [1]   C
>   [2]   POSIX
>   [3]   en_US.utf8
>   [4]   en_US.UTF-8 *
>   [ ]   (free form)
> mingdao@workstation ~ $ locale -a
> C
> POSIX
> en_US.utf8
> mingdao@workstation ~ $ locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE=C
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=

Apparently, "utf8" is the canonical representation in glibc (which provides the locale tool):

http://lists.debian.org/debian-glibc/2004/12/msg00028.html

That eselect enumerates the locale twice when the alternate form is specified in /etc/env.d/02locale could be considered a minor bug.

--Kerin
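
Why "locale -a" prints en_US.utf8 while 02locale says en_US.UTF-8: to my understanding, glibc normalizes the codeset part of a locale name before looking it up (its internal helper is _nl_normalize_codeset). Letters are lowercased, anything that is not a letter or digit is dropped, and "iso" is prepended if only digits remain. The POSIX sh functions below approximate that rule for illustration only; they are not glibc's code, and normalize_codeset, normalize_locale and locale_is_installed are hypothetical helper names. Comparing names in normalized form is also one way a tool could avoid listing the same locale twice, as eselect does here.

```shell
#!/bin/sh
# Approximate glibc's codeset normalization: keep letters and digits only,
# lowercase the letters, and prefix "iso" if nothing but digits remains.
normalize_codeset() {
    norm=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
    case $norm in
        *[!0-9]*) ;;                # contains a letter: keep as is
        *) norm="iso$norm" ;;       # digits only, e.g. 8859-1 -> iso88591
    esac
    printf '%s\n' "$norm"
}

# Normalize only the codeset in language[_territory][.codeset][@modifier].
normalize_locale() {
    case $1 in
        *.*)
            lang_part=${1%%.*}
            rest=${1#*.}
            codeset=${rest%%@*}
            case $rest in
                *@*) modifier="@${rest#*@}" ;;
                *) modifier= ;;
            esac
            printf '%s.%s%s\n' "$lang_part" "$(normalize_codeset "$codeset")" "$modifier"
            ;;
        *) printf '%s\n' "$1" ;;    # C, POSIX, or no codeset given
    esac
}

# Succeeds if "locale -a" lists a locale equivalent to $1 (needs locale(1)).
locale_is_installed() {
    want=$(normalize_locale "$1")
    locale -a 2>/dev/null | while IFS= read -r have; do
        normalize_locale "$have"
    done | grep -qxF "$want"
}

normalize_locale en_US.UTF-8        # prints en_US.utf8
```

Under this rule UTF-8, utf8 and UTF8 all normalize to utf8, which is why either spelling selects the same installed locale; "locale_is_installed en_US.UTF-8" therefore succeeds on Bruce's system even though "locale -a" only shows en_US.utf8.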

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Bruce Hill @ 2013-08-06 14:26 UTC
To: gentoo-user

On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
>
> Apparently, "utf8" is the canonical representation in glibc (which
> provides the locale tool):
>
> http://lists.debian.org/debian-glibc/2004/12/msg00028.html
>
> That eselect enumerates the locale twice when the alternate form is
> specified in /etc/env.d/02locale could be considered a minor bug.
>
> --Kerin

RFC 3629 does not mention utf8, but I did see this notation in Wikipedia, and
yes, I understand that's not official:

Other descriptions that omit the hyphen or replace it with a space, such as
"utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
Despite this, most agents such as browsers can understand them, and so
standards intended to describe existing practice (such as HTML5) may
effectively require their recognition.

[14] http://www.ietf.org/rfc/rfc3629.txt

I was only mildly curious seeing utf8 show up, because on numerous occasions
in #gentoo on FreeNode there have been different reports of incorrect
characters displayed with utf8, then fixed with UTF-8. Having read RFC 3629, I
just made it a habit to always use the standard (UTF-8).

Having read the remainder of the Debian ML thread you referenced, I have a
headache. Debian did that to me when I used it for ~3 months in 2003. :-)

Cheers,
Bruce

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-06 14:53 UTC
To: gentoo-user

On 06/08/2013 15:26, Bruce Hill wrote:
> On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
>>
>> Apparently, "utf8" is the canonical representation in glibc (which
>> provides the locale tool):
>>
>> http://lists.debian.org/debian-glibc/2004/12/msg00028.html
>>
>> That eselect enumerates the locale twice when the alternate form is
>> specified in /etc/env.d/02locale could be considered a minor bug.
>>
>> --Kerin
>
> RFC 3629 does not mention utf8, but I did see this notation in Wikipedia, and
> yes, I understand that's not official:
>
> Other descriptions that omit the hyphen or replace it with a space, such as
> "utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
> Despite this, most agents such as browsers can understand them, and so
> standards intended to describe existing practice (such as HTML5) may
> effectively require their recognition.
>
> [14] http://www.ietf.org/rfc/rfc3629.txt

Internally, glibc may use whatever representation it pleases.

> I was only mildly curious seeing utf8 show up, because on numerous occasions
> in #gentoo on FreeNode there have been different reports of incorrect
> characters displayed with utf8, then fixed with UTF-8. Having read RFC 3629, I
> just made it a habit to always use the standard (UTF-8).

Probably due to buggy applications. According to a glibc maintainer, they should be using the nl_langinfo() function, but some try to parse the locale name itself. Both of these commands produce the same output:

# LC_ALL=en_US.UTF-8 locale -k LC_CTYPE | grep charmap
# LC_ALL=en_US.utf8 locale -k LC_CTYPE | grep charmap

Ergo, applications that use the correct interface will be informed that the character encoding is "UTF-8", irrespective of the format of the locale name. Given the above, sticking to the "<lang>_<territory>.UTF-8" format seems wise.

>
> Having read the remainder of the Debian ML thread you referenced, I have a
> headache. Debian did that to me when I used it for ~3 months in 2003. :-)
>
> Cheers,
> Bruce
>
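
Kerin's advice, asking the C library for the encoding instead of reading the locale name, can be followed from a shell script as well. In the sketch below, locale_is_utf8 relies on "locale charmap", which prints the codeset that nl_langinfo(CODESET) reports for the current locale (the same value as the charmap line from "locale -k LC_CTYPE"). name_says_utf8 is a deliberately naive, hypothetical counter-example of the name-parsing approach: it only recognises the "UTF-8" spelling. Both function names are made up for this illustration.

```shell
#!/bin/sh
# Preferred: ask for the codeset of the current LC_CTYPE.
locale_is_utf8() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# Fragile (what buggy applications effectively do): inspect the name.
# LANG=en_US.utf8 selects a UTF-8 locale, but this check says no.
name_says_utf8() {
    case ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} in
        *.UTF-8) return 0 ;;
        *) return 1 ;;
    esac
}

if locale_is_utf8; then
    echo "current codeset is UTF-8"
else
    echo "current codeset is not UTF-8"
fi
```

The first check gives the same answer for en_US.UTF-8 and en_US.utf8; the second gives different answers for the two spellings of one locale, which is exactly the class of bug behind the #gentoo reports.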

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-06 15:51 UTC
To: gentoo-user@lists.gentoo.org

On Tue, Aug 6, 2013 at 6:04 AM, Kerin Millar <kerframil@fastmail.co.uk> wrote:
> Run "eselect locale", first with the "list" parameter and then the "set"
> parameter as appropriate. It's easier.

Kerin, all,

Thanks for your help. SVN (and I'm sure other apps) are happy now.

Chris

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Stroller @ 2013-08-06 22:42 UTC
To: gentoo-user

On 6 August 2013, at 14:04, Kerin Millar wrote:
> ...
> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.

It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.

I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.

Stroller.

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-07 12:41 UTC
To: gentoo-user

On 06/08/2013 23:42, Stroller wrote:
>
> On 6 August 2013, at 14:04, Kerin Millar wrote:
>> ...
>> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
>
> It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.

As has been mentioned, there are valid reasons to want to override the collation. Here is a concrete example:

https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html

Strictly speaking, grep is correct to behave that way but it can be confounding. In an ideal world, everyone would be using named classes instead of ranges in their regular expressions but it's not an ideal world.

These days, grep no longer exhibits this characteristic in Gentoo. Nevertheless, it serves as a valid example of how collations for UTF-8 locales can be a liability. Of the other distros, Arch Linux also defined LC_COLLATE=C, although I understand that they have just recently stopped doing that.

On a production system, I would still be inclined to use it for reasons of safety. For that matter, some people refuse to use UTF-8 at all on the grounds of security; the handling of variable-width encodings continues to be an effective bug inducer.

> I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.

Presumably:

a) LANG was defined inappropriately
b) LANG was defined appropriately but LC_TIME was defined otherwise
c) LC_ALL was defined, trumping all

I would definitely not advise doing any of these things.

--Kerin
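
Kerin's three cases follow from the precedence POSIX defines for locale variables: a non-empty LC_ALL overrides everything, then the category's own variable (LC_TIME, LC_COLLATE, ...), then LANG, and finally the implementation default, which is the C/POSIX locale on glibc. The hypothetical effective_locale function below resolves one category that way. It is a sketch for working out which setting wins, not a replacement for setlocale(3), and it only accepts the six POSIX categories, although glibc adds more (LC_PAPER and others).

```shell
#!/bin/sh
# Print the locale a program would use for one category, following the
# POSIX order: LC_ALL, then the category variable, then LANG, then "C".
effective_locale() {
    case $1 in
        LC_CTYPE|LC_COLLATE|LC_TIME|LC_NUMERIC|LC_MONETARY|LC_MESSAGES) ;;
        *) echo "effective_locale: unsupported category: $1" >&2; return 2 ;;
    esac
    # Read the variable named by $1 (safe: $1 was checked against the list above).
    eval "category_value=\${$1-}"
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}

# Show the result for a few categories in the current environment.
for category in LC_CTYPE LC_COLLATE LC_TIME; do
    printf '%s -> %s\n' "$category" "$(effective_locale "$category")"
done
```

With LANG=en_GB.UTF-8 and nothing else set, LC_TIME resolves to en_GB.UTF-8 (Kerin's case a is a bad LANG); setting LC_TIME overrides only that category (case b); setting LC_ALL overrides every category (case c). The plain "locale" command reflects the same resolution: in Bruce's output earlier in the thread, LC_COLLATE=C is printed without quotes because it was set explicitly, while the quoted values were derived from LANG.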

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Stroller @ 2013-08-07 16:40 UTC
To: gentoo-user

On 7 August 2013, at 13:41, Kerin Millar wrote:

> On 06/08/2013 23:42, Stroller wrote:
>>
>> On 6 August 2013, at 14:04, Kerin Millar wrote:
>>> ...
>>> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
>>
>> It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.
>
> As has been mentioned, there are valid reasons to want to override the collation. Here is a concrete example:
>
> https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html
>
> Strictly speaking, grep is correct to behave that way but it can be confounding.

Also linking this answer, which you're aware of:
https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00600.html

This only goes to illustrate that you shouldn't go overriding these willy-nilly without full awareness of why you're doing so and what you're doing.

>> I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.
>
> Presumably:
>
> a) LANG was defined inappropriately
> b) LANG was defined appropriately but LC_TIME was defined otherwise
> c) LC_ALL was defined, trumping all

I'm having trouble parsing this reply, but perhaps you might find the full bug description helpful. I wrote about 1000 words on the subject there last year.

It is the top Google hit for "en_gb am pm bug": http://sourceware.org/bugzilla/show_bug.cgi?id=3768

Stroller.

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Kerin Millar @ 2013-08-07 22:27 UTC
To: gentoo-user

On 07/08/2013 17:40, Stroller wrote:
>
> On 7 August 2013, at 13:41, Kerin Millar wrote:
>
>> On 06/08/2013 23:42, Stroller wrote:
>>>
>>> On 6 August 2013, at 14:04, Kerin Millar wrote:
>>>> ...
>>>> If undefined, the value of LC_COLLATE is inherited from LANG. I'm not sure that overriding it is particularly useful nowadays but it doesn't hurt.
>>>
>>> It's been a couple of years since I looked into this, but I'm given to believe that LANG should set all LC_ variables correctly, and that overriding them is frowned upon.
>>
>> As has been mentioned, there are valid reasons to want to override the collation. Here is a concrete example:
>>
>> https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00537.html
>>
>> Strictly speaking, grep is correct to behave that way but it can be confounding.
>
> Also linking this answer, which you're aware of:
> https://lists.gnu.org/archive/html/bug-gnu-utils/2003-08/msg00600.html

Best practice will never be universally observed.

>
> This only goes to illustrate that you shouldn't go overriding these willy-nilly without full awareness of why you're doing so and what you're doing.

It also served to illustrate the overall point I was making: that sticking to the C/POSIX collation is not without value as a safety measure. Naturally, I would expect anyone else to exercise their own judgement.

>
>
>>> I had to do this myself because, due to a bug, the en_GB time formatting failed to display am or pm. I believe this should be fixed now.
>>
>> Presumably:
>>
>> a) LANG was defined inappropriately
>> b) LANG was defined appropriately but LC_TIME was defined otherwise
>> c) LC_ALL was defined, trumping all
>
>
> I'm having trouble parsing this reply, but perhaps you might find the full bug description helpful. I wrote about 1000 words on the subject there last year.
>
> It is the top Google hit for "en_gb am pm bug": http://sourceware.org/bugzilla/show_bug.cgi?id=3768

OK.

--Kerin

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-06 15:13 UTC
To: gentoo-user

On Mon, Aug 5, 2013 at 6:52 PM, Chris Stankevitz
<chrisstankevitz@gmail.com> wrote:
> On Mon, Aug 5, 2013 at 11:53 AM, Mike Gilbert <floppym@gentoo.org> wrote:
>> The handbook documents setting a system-wide default locale. You
>> generally do this by setting the LANG variable in
>> /etc/conf.d/02locale.
>>
>> http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=1&chap=8#doc_chap3_sect3
>
> Mike,
>
> Thank you for your help. I attempted to follow these instructions and
> ran into three problems. Can you please confirm the fixes I employed
> to deal with each of these issues:
>

I think the other responses in the thread have this covered, but I
will respond anyway.

> 1. The handbook suggests I should modify the file /etc/env.d/02locale,
> but that file does not exist on my system. RESOLUTION: create the
> file
>

Correct. This file can also be created by using eselect locale.

> 2. The handbook suggests I should add this line to
> /etc/env.d/02locale: 'LANG="de_DE.UTF-8"', but I do not speak the
> language "DE". RESOLUTION: type instead 'LANG="en_US.UTF-8"' to match
> /etc/locale.gen
>

Right, the de_DE is just an example. You should select a
language/country that matches your lingual ability. :-)

> 3. The handbook suggests that I should add this line to
> /etc/env.d/02locale: 'LC_COLLATE="C"', but I do not know if they are
> again talking about the language "DE". RESOLUTION: I assumed
> LC_COLLATE=C refers to English and added the line without
> modification.
>

LC_COLLATE specifies how to sort text strings. Setting it to "C"
indicates that you want to sort strings based on the binary (ASCII)
value of their characters.

Leaving LC_COLLATE unset will cause strings to be sorted according to
the normal rules associated with your locale.

For example, given the following strings:

cat
Dog

With LC_COLLATE="C", they are sorted like this, since the binary value
of "D" (68) is less than the value of "c" (99).

Dog
cat

With LC_COLLATE="en_US.UTF-8", they are sorted like this, since "c"
comes before "D" in the alphabet.

cat
Dog
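
Mike's example can be reproduced with sort(1), which orders lines according to LC_COLLATE. The sketch below also prints the byte values behind the C ordering using od. The last command only shows the locale-aware order if en_US.UTF-8 is actually installed (check "locale -a"); if it is not, the program stays in the C locale and prints the C order again.

```shell
#!/bin/sh
# C collation compares byte values, so "Dog" (D = 68) sorts before "cat" (c = 99).
printf 'cat\nDog\n' | LC_ALL=C sort

# The byte values themselves, printed as unsigned decimals.
printf 'D' | od -An -tu1
printf 'c' | od -An -tu1

# Locale-aware collation: "cat" before "Dog" if en_US.UTF-8 is installed.
printf 'cat\nDog\n' | LC_ALL=en_US.UTF-8 sort
```

The od lines print 68 and 99 (padded with leading spaces). LC_ALL is used rather than LC_COLLATE so that an LC_ALL already set in the caller's environment cannot override the demonstration.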

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Chris Stankevitz @ 2013-08-06 18:23 UTC
To: gentoo-user@lists.gentoo.org

On Tue, Aug 6, 2013 at 8:13 AM, Mike Gilbert <floppym@gentoo.org> wrote:
> Leaving LC_COLLATE unset will cause strings to be sorted according to
> the normal rules associated with your locale.

Mike (or anyone else),

For which applications does setting LC_COLLATE affect sorting:

a) Any C++ application that uses bool std::string::operator<(const std::string&)

b) Any C or C++ application that compares char values using the '<' operator

c) Any application that uses the system call "CompareStrings(const
char*, const char*)"

d) [your answer here]

I'm sure the answer is not a or b. I'm sure it's not c either since I
just made it up.

Thank you,

Chris

* Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
From: Mike Gilbert @ 2013-08-07 0:58 UTC
To: gentoo-user

On Tue, Aug 6, 2013 at 2:23 PM, Chris Stankevitz
<chrisstankevitz@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 8:13 AM, Mike Gilbert <floppym@gentoo.org> wrote:
>> Leaving LC_COLLATE unset will cause strings to be sorted according to
>> the normal rules associated with your locale.
>
> Mike (or anyone else),
>
> For which applications does setting LC_COLLATE affect sorting:
>
> a) Any C++ application that uses bool std::string::operator<(const std::string&)
>
> b) Any C or C++ application that compares char values using the '<' operator
>
> c) Any application that uses the system call "CompareStrings(const
> char*, const char*)"
>
> d) [your answer here]
>
> I'm sure the answer is not a or b. I'm sure it's not c either since I
> just made it up.
>

From locale(7):

       LC_COLLATE
              This is used to change the behavior of the functions
              strcoll(3) and strxfrm(3), which are used to compare
              strings in the local alphabet. For example, the German
              sharp s is sorted as "ss".