From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from )
	id 1R27zd-0006mL-NG
	for garchives@archives.gentoo.org; Fri, 09 Sep 2011 20:48:10 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 85FA921C17B;
	Fri, 9 Sep 2011 20:48:01 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
	by pigeon.gentoo.org (Postfix) with ESMTP id 3E20621C17B
	for ; Fri, 9 Sep 2011 20:48:01 +0000 (UTC)
Received: from pelican.gentoo.org (unknown [66.219.59.40])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by smtp.gentoo.org (Postfix) with ESMTPS id 8CC681B401D
	for ; Fri, 9 Sep 2011 20:48:00 +0000 (UTC)
Received: from localhost.localdomain (localhost [127.0.0.1])
	by pelican.gentoo.org (Postfix) with ESMTP id AA5A880042
	for ; Fri, 9 Sep 2011 20:47:59 +0000 (UTC)
From: "Zac Medico"
To: gentoo-commits@lists.gentoo.org
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Zac Medico"
Message-ID:
Subject: [gentoo-commits] proj/portage:master commit in: pym/portage/
X-VCS-Repository: proj/portage
X-VCS-Files: pym/portage/__init__.py
X-VCS-Directories: pym/portage/
X-VCS-Committer: zmedico
X-VCS-Committer-Name: Zac Medico
X-VCS-Revision: db32c3e3ca1e3cc724acacc79a5be2343efc13d1
Date: Fri, 9 Sep 2011 20:47:59 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: quoted-printable
X-Archives-Salt:
X-Archives-Hash: 39375fcef7643b8cf2af01b83158d58f

commit:     db32c3e3ca1e3cc724acacc79a5be2343efc13d1
Author:     Zac Medico gentoo org>
AuthorDate: Fri Sep 9 20:47:30 2011 +0000
Commit:     Zac Medico gentoo org>
CommitDate: Fri Sep 9 20:47:30 2011 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=db32c3e3

Use utf_8 'merge' encoding for all locales.

Previously, we used sys.getfilesystemencoding() for the 'merge'
encoding, but that had various problems:

  1) If the locale is ever changed then it can cause orphan files due
     to changed character set translation.

  2) Ebuilds typically install files with utf_8 encoded file names,
     and then portage would be forced to rename those files to match
     sys.getfilesystemencoding(), possibly breaking things.

  3) Automatic translation between encodings can lead to nonsensical
     file names when the source encoding is unknown by portage.

  4) It's inconvenient for ebuilds to convert the encodings of file
     names themselves, and upstreams typically encode file names with
     utf_8 encoding.

So, instead of relying on sys.getfilesystemencoding(), we avoid the
above problems by using a constant utf_8 'merge' encoding for all
locales, as discussed in bug #382199 and bug #381509.

---
 pym/portage/__init__.py | 40 ++++++++++++++++++++++------------------
 1 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/pym/portage/__init__.py b/pym/portage/__init__.py
index 789d043..d3df6e3 100644
--- a/pym/portage/__init__.py
+++ b/pym/portage/__init__.py
@@ -148,31 +148,35 @@ if sys.hexversion >= 0x3000000:
 	basestring = str
 	long = int
 
-# Assume utf_8 fs encoding everywhere except in merge code, where the
-# user's locale is respected.
+# We use utf_8 encoding everywhere. Previously, we used
+# sys.getfilesystemencoding() for the 'merge' encoding, but that had
+# various problems:
+#
+#   1) If the locale is ever changed then it can cause orphan files due
+#      to changed character set translation.
+#
+#   2) Ebuilds typically install files with utf_8 encoded file names,
+#      and then portage would be forced to rename those files to match
+#      sys.getfilesystemencoding(), possibly breaking things.
+#
+#   3) Automatic translation between encodings can lead to nonsensical
+#      file names when the source encoding is unknown by portage.
+#
+#   4) It's inconvenient for ebuilds to convert the encodings of file
+#      names to match the current locale, and upstreams typically encode
+#      file names with utf_8 encoding.
+#
+# So, instead of relying on sys.getfilesystemencoding(), we avoid the above
+# problems by using a constant utf_8 'merge' encoding for all locales, as
+# discussed in bug #382199 and bug #381509.
 _encodings = {
 	'content' : 'utf_8',
 	'fs' : 'utf_8',
-	'merge' : sys.getfilesystemencoding(),
+	'merge' : 'utf_8',
 	'repo.content' : 'utf_8',
 	'stdio' : 'utf_8',
 }
 
-# sys.getfilesystemencoding() can return None if python is built with
-# USE=build (stage 1). If the filesystem encoding is undefined or is a
-# subset of utf_8, then we default to utf_8 encoding for merges, since
-# it probably won't hurt, and forced conversion to ascii encoding is
-# known to break some packages that install file names with utf_8
-# encoding (see bug #381509). The ascii aliases are borrowed from
-# python's encodings.aliases.aliases dict.
-if _encodings['merge'] is None or \
-	_encodings['merge'].lower().replace('-', '_') in \
-	('ascii', '646', 'ansi_x3.4_1968', 'ansi_x3_4_1968',
-	'ansi_x3.4_1986', 'cp367', 'csascii', 'ibm367', 'iso646_us',
-	'iso_646.irv_1991', 'iso_ir_6', 'us', 'us_ascii'):
-
-	_encodings['merge'] = 'utf_8'
-
 if sys.hexversion >= 0x3000000:
 	def _unicode_encode(s, encoding=_encodings['content'], errors='backslashreplace'):
 		if isinstance(s, str):
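
Problems 1) and 3) from the commit message can be illustrated with a short standalone sketch. This is not Portage code: the helper encode_name and the sample file name are hypothetical, and the sketch only assumes that a file name is turned into bytes with whatever 'merge' encoding is in effect at the time.

```python
# Illustrative sketch only; not taken from Portage.
# `encode_name` is a hypothetical helper, not a Portage function.

def encode_name(name, encoding):
    """Return the byte form a file name would take under `encoding`."""
    return name.encode(encoding)

name = "caf\u00e9.txt"  # hypothetical file name with one non-ASCII character

# Problem 1: bytes written at merge time under a latin_1 locale ...
merged_as = encode_name(name, "latin_1")
# ... versus bytes looked up later, after the locale changed to utf_8.
looked_up_as = encode_name(name, "utf_8")
# The byte strings differ, so the later lookup would not find the file.
locale_dependent_mismatch = merged_as != looked_up_as

# Problem 3: utf_8 bytes decoded with the wrong source encoding give a
# nonsensical name instead of raising an error.
garbled = looked_up_as.decode("latin_1")

# With a constant utf_8 'merge' encoding, both steps produce the same bytes.
constant_match = encode_name(name, "utf_8") == encode_name(name, "utf_8")

print(merged_as, looked_up_as, locale_dependent_mismatch, garbled, constant_match)
```

The point of the sketch is only that the byte form of a name depends on the encoding chosen at each step; pinning that encoding to utf_8 removes the dependency on the locale.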