To: gentoo-dev@lists.gentoo.org
From: Kerin Millar
Subject: [gentoo-dev] Re: LANG=en_GB.UTF-8 by default
Date: Sun, 19 Feb 2012 19:14:16 +0000
References: <201202151258.52431.vivo75@gmail.com> <20120215122252.GA12319@atrus.grandmasfridge.org>

On 19/02/2012 01:00, James Cloos wrote:
>>>>>> "KM" == Kerin Millar writes:
>
> KM> Arch also used to define LC_COLLATE="C" by default, probably to
> KM> mitigate unpredictable behaviour in some applications, but have
> KM> since dropped this additional variable so they must have deemed it
> KM> no longer necessary.
>
> Without LC_COLLATE="C" things like [a-z]* gets a false=positive match
> on files like Makefile.

Indeed, character classes are a potential minefield. Incidentally, I just tested Ubuntu and Arch with only LANG set to a UTF-8 locale:-

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules ignored
M
$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

In neither case are the collation rules being obeyed.

In Gentoo, however:-

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules obeyed

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

Obeying the collation rules is ostensibly the correct thing to do but, until everyone starts using named character classes (which will never happen), it's not safe. The thing that worries me here is the inconsistency in Gentoo. LC_COLLATE="C" is sufficient to work around the issue but the above makes me wonder why we still need it.

--Kerin
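
For anyone wanting to reproduce the comparison, the difference between a range expression and a named character class can be checked with a short script. This is only a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do). The function names strip_range_c and strip_named_class are made up for illustration and appear nowhere in the thread. The first function pins the locale to C so that [a-z] is a plain byte range; the second uses [[:lower:]], which does not depend on collation order.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any package.

# Delete every character matched by the range [a-z], with the locale
# forced to C so the range covers only the bytes 'a' through 'z'.
strip_range_c() {
    printf '%s\n' "$1" | LC_ALL=C sed -E 's/[a-z]//g'
}

# Delete every lowercase letter using the POSIX named class, which is
# defined by the locale's character classification, not its collation.
strip_named_class() {
    printf '%s\n' "$1" | sed -E 's/[[:lower:]]//g'
}

strip_range_c Makefile      # prints: M
strip_named_class Makefile  # prints: M
```

Dropping the LC_ALL=C prefix from the first function and running it under a UTF-8 locale is the case discussed above: whether the capital M survives then depends on how the particular sed build treats the locale's collation order.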