From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-dev+bounces-42042-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1OeucS-00040G-Jo
	for garchives@archives.gentoo.org; Fri, 30 Jul 2010 18:47:44 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id BA204E08FB;
	Fri, 30 Jul 2010 18:47:40 +0000 (UTC)
Received: from mail-pz0-f53.google.com (mail-pz0-f53.google.com [209.85.210.53])
	by pigeon.gentoo.org (Postfix) with ESMTP id DE298E08D1
	for <gentoo-dev@lists.gentoo.org>; Fri, 30 Jul 2010 18:47:20 +0000 (UTC)
Received: by pzk9 with SMTP id 9so852127pzk.40
        for <gentoo-dev@lists.gentoo.org>; Fri, 30 Jul 2010 11:47:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:received:received:received:date:from:to:subject
         :message-id:references:mime-version:content-type:content-disposition
         :in-reply-to:user-agent;
        bh=VbH+wAWO2RQNDP1W+7SQC7y38RVYTFtjZlu+hVb3d4I=;
        b=ff7ydnA9TqCu0bLPz5CyuVhcSQ3EYo7xIyYNMScVGNQVEvxLClPebnQQ7N820tUGmb
         AnvfPOQBV2ps7hTOs0uKx5MKI3QORNCbc2jsv6EScS2u0YM8UQEYUUub8da9OFCsiHyS
         C5aeO6vURFOweGZcAsIwos4fhT9OZVbWCB8VM=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=date:from:to:subject:message-id:references:mime-version
         :content-type:content-disposition:in-reply-to:user-agent;
        b=usEjmx//2A4mSneWZ1Uf98SqtDtwkJREccNYRFkPrM7vb62/Q5c55pMPzXXV4LlsKR
         EY/M1iFBIDp0Inh9zIWocCOwTsUiWx9tWTraU1BvDs7NhqVzPCbsDYDl63ZN/KlFxl9X
         XzoIlwNzyEXVhvk9RdGEEetco8E0Ab3M64RZw=
Received: by 10.142.156.16 with SMTP id d16mr2096376wfe.324.1280515640309;
        Fri, 30 Jul 2010 11:47:20 -0700 (PDT)
Received: from smtp.gmail.com (c-67-171-128-62.hsd1.wa.comcast.net [67.171.128.62])
        by mx.google.com with ESMTPS id t11sm2892577wfc.16.2010.07.30.11.47.17
        (version=TLSv1/SSLv3 cipher=RC4-MD5);
        Fri, 30 Jul 2010 11:47:19 -0700 (PDT)
Received: by smtp.gmail.com (sSMTP sendmail emulation); Fri, 30 Jul 2010 11:45:18 -0700
Date: Fri, 30 Jul 2010 11:45:18 -0700
From: Brian Harring <ferringb@gmail.com>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] Locale check in python_pkg_setup()
Message-ID: <20100730184518.GA32513@hrair>
References: <201007300116.43653.Arfrever@gentoo.org>
 <4C5243C7.70709@gentoo.org>
 <20100730034827.GC15031@hrair>
 <4C530291.2010100@gentoo.org>
Precedence: bulk
List-Post: <mailto:gentoo-dev@lists.gentoo.org>
List-Help: <mailto:gentoo-dev+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-dev+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-dev+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-dev.gentoo.org>
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="PNTmBPCT7hxwcZjr"
Content-Disposition: inline
In-Reply-To: <4C530291.2010100@gentoo.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Archives-Salt: 50ca8f20-8347-44a6-8cd1-6d38685f90f1
X-Archives-Hash: 058a9c31e2769f213aecf9360335c6cb


--PNTmBPCT7hxwcZjr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jul 30, 2010 at 09:49:21AM -0700, "Pawel Hajdan, Jr." wrote:
> On 7/29/10 8:48 PM, Brian Harring wrote:
> > It's basically annoying people into changing to partially
> > sidestep a couple of bugs, instead of fixing the issue- and that's the
> > wrong course of action.
>
> I think that with python earlier than python-3 unicode handling is quite
> complicated, and I'm not surprised there are problems with that.

Encoding handling wasn't that bad under py2k.  Py3k just enforces the
bytes/text boundaries, meaning you can't just skid by.

> Arfrever, does python-3 have the same problem with non-UTF8 locales?

ascii is a subset of utf-8 and ascii is a subset of latin-1; latin-1=20
and utf-8 aren't compatible in encoded form however.

What this means is that the same set of bugs I ran down will still go
boom if you have a utf-8 locale and the code in question is dealing
w/ a latin-1 encoded file.
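
To make that concrete, here is a minimal sketch, assuming Python 3.
The helper names try_decode and show are made up for illustration and
appear nowhere in the code under discussion.  It decodes the same
bytes three ways to show where ascii, latin-1 and utf-8 agree and
where they part ways.

```python
def try_decode(raw, encoding):
    """Decode strictly; return None if raw is invalid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def show(label, raw):
    """Print how the same bytes read under each of the encodings."""
    for encoding in ("ascii", "latin-1", "utf-8"):
        print(label, encoding, repr(try_decode(raw, encoding)))


# Pure ascii bytes decode identically under all three.
show("ascii bytes  ", "cafe".encode("ascii"))
# latin-1 bytes for a non-ascii character are not valid utf-8.
show("latin-1 bytes", "caf\u00e9".encode("latin-1"))
# utf-8 bytes decode under latin-1 without error, but as mojibake.
show("utf-8 bytes  ", "caf\u00e9".encode("utf-8"))
```

The latin-1 bytes fail outright when decoded as utf-8, which is the
'boom' case.  The utf-8 bytes decode under latin-1 without any error
but come out as the wrong characters, which is the quieter failure.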


> Another thing we can consider is making UTF8 the default setup in
> Gentoo. I think most people (including me) don't care whether it's C or
> UTF8 as long as it works.

"as long as it works" in this case means "fix the code" as I've laid=20
out.  Forcing locales to sidestep it leaves the latin-1/utf8=20
incompatibility to go 'boom'.

Basically, forcing utf8 doesn't "make it work".  It reduces the cases
where breakage shows up while leaving the underlying issues in place.
Frankly this is worse: you can't fix those screwups unless they
visibly break (for better or worse, and preferably in a testcase).
We've got 4 bugs, and only one of them needs a semi-complex fix
(docutils needs to require that the html it's fed is utf8 compatible;
a valid enough requirement anyway, since html shouldn't be latin-1,
it should be ascii or utf8).
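
As a rough sketch of what requiring utf8-compatible input could look
like, assuming Python 3: read_text and read_utf8_or_fail below are
hypothetical helpers, not docutils code or its API, and not
necessarily the fix laid out earlier in this thread.  The idea is
simply that the code names the encoding it expects instead of
inheriting one from the locale.

```python
# Hypothetical helper, not part of docutils or any real library.
def read_text(path, encoding):
    """Read a file as text in an explicitly named encoding.

    The bytes are read raw and decoded by hand, so the locale's
    preferred encoding is never consulted.
    """
    with open(path, "rb") as handle:
        return handle.read().decode(encoding)


# Hypothetical helper, not part of docutils or any real library.
def read_utf8_or_fail(path):
    """Require utf-8 (and therefore ascii) input; reject the rest."""
    try:
        return read_text(path, "utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("%s is not valid utf-8" % path) from exc
```

With this shape, a latin-1 encoded file is rejected the same way
under a C, latin-1 or utf-8 locale, so the failure shows up in a
testcase rather than only on some users' systems.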

So.. get fixing, instead of dodging the work imo. ;)

~brian

--PNTmBPCT7hxwcZjr
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)

iEYEARECAAYFAkxTHb4ACgkQsiLx3HvNzgcU1ACgy50GXOOtllogbKd1ZEuHPjdM
vH4AoJA9aMVrTnsrIBsAuEOZzQ7xtF3N
=OruI
-----END PGP SIGNATURE-----

--PNTmBPCT7hxwcZjr--