public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Locale check in python_pkg_setup()
@ 2010-07-29 23:16 Arfrever Frehtes Taifersar Arahesis
  2010-07-29 23:20 ` "Paweł Hajdan, Jr."
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-07-29 23:16 UTC (permalink / raw
  To: Gentoo Development


[-- Attachment #1.1: Type: Text/Plain, Size: 244 bytes --]

We received too many invalid bugs caused by unsupported locales. python_pkg_setup() needs to check
locale and print error (using eerror(), without die()), when unsupported locale has been detected.

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #1.2: python.eclass.patch --]
[-- Type: text/x-patch, Size: 1016 bytes --]

--- python.eclass
+++ python.eclass
@@ -355,6 +355,8 @@
 	# Check if phase is pkg_setup().
 	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
 
+	local locale
+
 	if [[ "$#" -ne 0 ]]; then
 		die "${FUNCNAME}() does not accept arguments"
 	fi
@@ -407,6 +409,16 @@
 		unset -f python_pkg_setup_check_USE_flags
 	fi
 
+	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
+	if [[ "${locale}" != *.UTF-8 ]]; then
+		eerror
+		eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
+		eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
+		eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
+		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
+		eerror
+	fi
+
 	PYTHON_PKG_SETUP_EXECUTED="1"
 }
 

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
@ 2010-07-29 23:20 ` "Paweł Hajdan, Jr."
  2010-07-30  2:29   ` Arfrever Frehtes Taifersar Arahesis
  2010-07-30  0:13 ` [gentoo-dev] " Jonathan Callen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: "Paweł Hajdan, Jr." @ 2010-07-29 23:20 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> 
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
>  	# Check if phase is pkg_setup().
>  	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>  
> +	local locale
> +
>  	if [[ "$#" -ne 0 ]]; then
>  		die "${FUNCNAME}() does not accept arguments"
>  	fi
> @@ -407,6 +409,16 @@
>  		unset -f python_pkg_setup_check_USE_flags
>  	fi

nit: Why not declare "local locale" here, close to its usage?

> +	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> +	if [[ "${locale}" != *.UTF-8 ]]; then
> +		eerror
> +		eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> +		eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> +		eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> +		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> +		eerror
> +	fi
> +
>  	PYTHON_PKG_SETUP_EXECUTED="1"
>  }
>  



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]


* [gentoo-dev] Re: Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
  2010-07-29 23:20 ` "Paweł Hajdan, Jr."
@ 2010-07-30  0:13 ` Jonathan Callen
  2010-07-30  2:32   ` Arfrever Frehtes Taifersar Arahesis
  2010-07-30  2:36 ` [gentoo-dev] " Brian Harring
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Jonathan Callen @ 2010-07-30  0:13 UTC (permalink / raw
  To: gentoo-dev

On 07/29/2010 07:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> +	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> +	if [[ "${locale}" != *.UTF-8 ]]; then

Shouldn't you be checking the output of `locale charmap` instead of the
actual contents of the LC_ALL/LC_CTYPE/LANG variables?  You currently
are reporting an error if someone is using the "en_US.utf8" locale
(which *is* a legal UTF-8 locale, and should not be an error).

-- 
Jonathan Callen




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:20 ` "Paweł Hajdan, Jr."
@ 2010-07-30  2:29   ` Arfrever Frehtes Taifersar Arahesis
  2010-07-30  3:05     ` "Paweł Hajdan, Jr."
  0 siblings, 1 reply; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-07-30  2:29 UTC (permalink / raw
  To: Gentoo Development

[-- Attachment #1: Type: Text/Plain, Size: 1414 bytes --]

2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
> On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> > 
> > --- python.eclass
> > +++ python.eclass
> > @@ -355,6 +355,8 @@
> >  	# Check if phase is pkg_setup().
> >  	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
> >  
> > +	local locale
> > +
> >  	if [[ "$#" -ne 0 ]]; then
> >  		die "${FUNCNAME}() does not accept arguments"
> >  	fi
> > @@ -407,6 +409,16 @@
> >  		unset -f python_pkg_setup_check_USE_flags
> >  	fi
> 
> nit: Why not declare "local locale" here, close to its usage?

It's consistent with style used in python.eclass.

> > +	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> > +	if [[ "${locale}" != *.UTF-8 ]]; then
> > +		eerror
> > +		eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> > +		eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> > +		eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> > +		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> > +		eerror
> > +	fi
> > +
> >  	PYTHON_PKG_SETUP_EXECUTED="1"
> >  }
> >  

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Re: Locale check in python_pkg_setup()
  2010-07-30  0:13 ` [gentoo-dev] " Jonathan Callen
@ 2010-07-30  2:32   ` Arfrever Frehtes Taifersar Arahesis
  0 siblings, 0 replies; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-07-30  2:32 UTC (permalink / raw
  To: Gentoo Development


[-- Attachment #1.1: Type: Text/Plain, Size: 735 bytes --]

2010-07-30 02:13:20 Jonathan Callen wrote:
> On 07/29/2010 07:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> > +	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> > +	if [[ "${locale}" != *.UTF-8 ]]; then
> 
> Shouldn't you be checking the output of `locale charmap` instead of the
> actual contents of the LC_ALL/LC_CTYPE/LANG variables?  You currently
> are reporting an error if someone is using the "en_US.utf8" locale
> (which *is* a legal UTF-8 locale, and should not be an error).

OK. I will check the output of `locale charmap`, but the actual locale name is more useful in the error message.
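
To illustrate why the suffix match on `*.UTF-8` is too strict, here is a minimal Python sketch that treats every spelling of the codeset ("UTF-8", "utf8", ...) as equivalent. The helper names `codeset_of` and `is_utf8_codeset` are hypothetical and not part of python.eclass; the sketch only inspects the locale name and does not replace asking the C library via `locale charmap`.

```python
import codecs


def codeset_of(locale_name):
    """Return the codeset part of a locale name such as 'en_US.utf8@euro'."""
    # Assumed layout: language[_territory][.codeset][@modifier]
    return locale_name.partition(".")[2].partition("@")[0]


def is_utf8_codeset(codeset):
    """True if the codeset is any spelling of UTF-8 that Python recognises."""
    try:
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        # Empty or unknown codeset (for example the bare "C" locale).
        return False


for name in ("en_US.UTF-8", "en_US.utf8", "pl_PL.ISO-8859-2", "C"):
    print(name, is_utf8_codeset(codeset_of(name)))
```

Relying on the codec registry for normalisation avoids maintaining a hand-written list of accepted spellings.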

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #1.2: python.eclass.patch --]
[-- Type: text/x-patch, Size: 1025 bytes --]

--- python.eclass
+++ python.eclass
@@ -355,6 +355,8 @@
 	# Check if phase is pkg_setup().
 	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
 
+	local locale
+
 	if [[ "$#" -ne 0 ]]; then
 		die "${FUNCNAME}() does not accept arguments"
 	fi
@@ -407,6 +409,16 @@
 		unset -f python_pkg_setup_check_USE_flags
 	fi
 
+	if [[ "$(locale charmap)" != "UTF-8" ]]; then
+		locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
+		eerror
+		eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
+		eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
+		eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
+		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
+		eerror
+	fi
+
 	PYTHON_PKG_SETUP_EXECUTED="1"
 }
 

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
  2010-07-29 23:20 ` "Paweł Hajdan, Jr."
  2010-07-30  0:13 ` [gentoo-dev] " Jonathan Callen
@ 2010-07-30  2:36 ` Brian Harring
  2010-07-31 14:44   ` Arfrever Frehtes Taifersar Arahesis
  2010-07-30  3:15 ` Krzysztof Pawlik
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Brian Harring @ 2010-07-30  2:36 UTC (permalink / raw
  To: Arfrever Frehtes Taifersar Arahesis; +Cc: gentoo-dev, qa

[-- Attachment #1: Type: text/plain, Size: 5520 bytes --]

On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
>  	# Check if phase is pkg_setup().
>  	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>  
> +	local locale
> +
>  	if [[ "$#" -ne 0 ]]; then
>  		die "${FUNCNAME}() does not accept arguments"
>  	fi
> @@ -407,6 +409,16 @@
>  		unset -f python_pkg_setup_check_USE_flags
>  	fi
>  
> +	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"

You're using python to get the exported env.  Don't.  Use bash (you're 
invoking python from freaking bash after all)...

> +	if [[ "${locale}" != *.UTF-8 ]]; then
> +		eerror
> +		eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> +		eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> +		eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> +		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> +		eerror

For cases such as this, ewarn, not eerror.  It's not an actual error, 
it's a potential source of problems people may see.

The more I look into this issue, the more I'm convinced it's not user 
settings that are the problem- the problem is in the code, not in user 
env.  You've stated in a couple of places that "C/Posix locales are 
not supported", which frankly is very whacked- that's not really a 
proclamation you can make on your own for python, and you're actually 
ignoring that this problem would just as easily rear its head with a 
latin-1 encoded file.


Take a look at 302425; the traceback in that is a classic example of 
where they *should* be using bytes mode (they don't need to interpret 
the data, just write the script across, thus bytes).

bug 328047 is induced by a patch we add (it's not in upstream python).  
The code in question also is invoking fricking ldd a few steps prior 
which is questionable in multiple ways: either way, relevant chunk is
+            os.system("ldd %s > %s" % (do_readline, tmpfile))
+            fp = open(tmpfile)
+            for ln in fp:

So... roughly, it invokes os.system, which will pass the environment 
straight through to it, meaning locale gets passed down.

Then it opens the file.  Note it specifies *NO ENCODING*, nor is there 
actually an enforced locale as best I can tell, thus ascii being the 
default.  The screwup here is in our patches- said patches should be 
forcing posix locale for the ldd call (resulting in ascii).  If you 
think through this bug, we've seen this multiple times in grep/sed 
calls- this is literally no different.
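
A minimal sketch of what forcing the locale for a child process could look like, assuming Python 3's `subprocess` module. The helper name `run_in_c_locale` is hypothetical, and a small Python child stands in for the ldd call so the example runs anywhere; this is not the code from the patch in bug 328047.

```python
import os
import subprocess
import sys


def run_in_c_locale(argv):
    """Run argv with LC_ALL=C and return its stdout as bytes."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and every other LC_* variable
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True).stdout


# Stand-in for the ldd call: a child that reports the LC_ALL it was given.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_in_c_locale(child))
```

Capturing the output as bytes also removes the temporary file and the implicit text decoding that the original snippet relies on.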

bug 287439 is a screw up in the programs source... should've been 
using bytes (non arguable).  Matter of fact, while generally I think 
Tarek knows what the hell he's doing, the skip they added to the 
tests ignored an actual valid bug in setuptools/distribute- shebangs 
from the standpoint of the kernel need to be consistant.  Thus reading 
the shebang line itself should be done in bytes, than converted to 
ascii and interpretted- they tried opening the file (in whole) in 
bytes, meaning they tried enforcing ascii across the whole buffer- 
not just the first line.  Program bug.
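
The "read the shebang in bytes, decode only that line" approach can be sketched as follows. This is an illustration under the assumption of Python 3, not the setuptools/distribute code; `read_shebang` is a hypothetical helper name.

```python
import os
import tempfile


def read_shebang(path):
    """Return the interpreter line of a script as text, or None if absent."""
    with open(path, "rb") as handle:      # bytes mode: no locale involved
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Only this one line is decoded; the rest of the file is never touched.
    return first_line[2:].strip().decode("ascii")


# A script whose body contains a latin-1 byte that is not valid UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as script:
    script.write(b"#!/usr/bin/python\n# caf\xe9\n")
print(read_shebang(script.name))
os.unlink(script.name)
```

Because the body of the file is never decoded, its encoding cannot cause a UnicodeDecodeError here, whatever locale is active.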

These bugs I got via searching for 'ALL python locale', and 
identifying the ones that were actually locale related.  I've at this 
point looked into the source of 3 bugs- meaning literally, 3 bugs 
checked into, 3 instances where the code was wrong.

I'll leave it as an exercise for others to keep digging, but the point 
here is that the programs themselves screwup their locale handling- 
trying to force all systems to use a utf-8 locale for the env is just 
a hack instead of fixing the actual issue.  A pretty bad hack 
considering I've spent all of 30 minutes digging into this and rooting 
out the actual flaws in the src I might add.

For shits and giggles, let's add one more bug in- one that has the 
potential of rearing its head in random consuming pkgs, bug 322425 
(docutils's build_html being flawed); their encoding handling is 
intrinsically flawed.  The encoding of a file they're 
installing/parsing should be determined by the file itself- not 
attempting to arbitrarily force it to whatever locale the user happens 
to be running (which is exactly the first thing buildhtml.py attempts, 
literally `locale.setlocale(locale.LC_ALL, '')` at line 20).  The 
issue is not people using ascii locales, the issue is that these tools 
do not handle encoding correctly.
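
As a rough illustration of "the file decides, not the user's locale", the sketch below passes the encoding explicitly when opening the file. It assumes Python 3 and that the caller already knows the file's encoding (how that is discovered is out of scope); `read_text` is a hypothetical helper, not docutils API.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a text file using the encoding the file itself is known to use."""
    # An explicit encoding= makes the result independent of LANG / LC_ALL.
    with open(path, encoding=encoding) as handle:
        return handle.read()


with tempfile.NamedTemporaryFile("wb", delete=False) as page:
    page.write("<p>Pawe\u0142</p>".encode("utf-8"))
# ascii() keeps the demo printable even on an ASCII-only terminal.
print(ascii(read_text(page.name, "utf-8")))
os.unlink(page.name)
```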

Recall, one of the purposes of py3k going bytes vs text (aka unicode) 
was to make clear that textual data's encoding need be known.  All of 
this code isn't actually forcing/handling the encoding for the data 
they deal in- meaning these are literal bugs, exposed purely due to 
py3k actually enforcing encoding in normal file opens.

So... this is a big -1 on adding such a warning (especially 
considering it doesn't actually resolve the raw issues, it just 
sidesteps a couple of cases).

Fix the actual problem instead...

Finally, cc'ing QA since this is a class of bugs they should be aware 
of with py3k.  This is a bit of a sign that a lot of source isn't 
really py3k ready yet either imo, but so it goes...

~harring

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-30  2:29   ` Arfrever Frehtes Taifersar Arahesis
@ 2010-07-30  3:05     ` "Paweł Hajdan, Jr."
  0 siblings, 0 replies; 30+ messages in thread
From: "Paweł Hajdan, Jr." @ 2010-07-30  3:05 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 294 bytes --]

On 7/29/10 7:29 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> 2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
>> nit: Why not declare "local locale" here, close to its usage?
> It's consistent with style used in python.eclass.

Fine for me then. Thanks for explaining.

Paweł


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
                   ` (2 preceding siblings ...)
  2010-07-30  2:36 ` [gentoo-dev] " Brian Harring
@ 2010-07-30  3:15 ` Krzysztof Pawlik
  2010-07-30  3:48   ` Brian Harring
  2010-07-30 16:05 ` [gentoo-dev] " Harald van Dijk
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Krzysztof Pawlik @ 2010-07-30  3:15 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

On 07/30/10 01:16, Arfrever Frehtes Taifersar Arahesis wrote:
> We received too many invalid bugs caused by unsupported locales. python_pkg_setup() needs to check
> locale and print error (using eerror(), without die()), when unsupported locale has been detected.

ewarn then instead of eerror - both are nicely visible, and you're actually
*warning* against potential issues.

> +		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for
> information on how to fix locale."

I'm with Brian on this one - my locale (C/POSIX) is not broken, it's the code
that has bugs. Can you please change the wording here to read something along
the lines of "... for information on switching to a Unicode locale." instead
of suggesting that the user's locale is broken?

-- 
Krzysztof Pawlik  <nelchael at gentoo.org>  key id: 0xF6A80E46
desktop-misc, java, apache, ppc, vim, kernel, python...


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 554 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-30  3:15 ` Krzysztof Pawlik
@ 2010-07-30  3:48   ` Brian Harring
  2010-07-30 16:49     ` "Paweł Hajdan, Jr."
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Harring @ 2010-07-30  3:48 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Fri, Jul 30, 2010 at 05:15:19AM +0200, Krzysztof Pawlik wrote:
> On 07/30/10 01:16, Arfrever Frehtes Taifersar Arahesis wrote:
> > +		eerror "See http://www.gentoo.org/doc/en/utf-8.xml for
> > information on how to fix locale."
> 
> I'm with Brian on this one - my locale (C/POSIX) is not broken, it's the code
> that has bugs. Can you please change the wording here to read something along
> the lines of "... for information on switching to a Unicode locale." instead
> of suggesting that the user's locale is broken?

From where I'm sitting, the only ebuild that has any business telling 
me to change (or suggesting how) locale is glibc.  Especially when 
we're talking about a warning that will be in 7.6% of the versions
in the tree.

That's pretty freaking spammy... end result will be people switching 
(for better or worse) to stop seeing the complaints.

It's basically annoying people into changing to partially 
sidestep a couple of bugs, instead of fixing the issue- and that's the 
wrong course of action.

~brian

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
                   ` (3 preceding siblings ...)
  2010-07-30  3:15 ` Krzysztof Pawlik
@ 2010-07-30 16:05 ` Harald van Dijk
  2010-07-31  3:37 ` Mike Frysinger
  2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
  6 siblings, 0 replies; 30+ messages in thread
From: Harald van Dijk @ 2010-07-30 16:05 UTC (permalink / raw
  To: gentoo-dev

On Fri, Jul 30, 2010 at 01:16:18AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> We received too many invalid bugs caused by unsupported locales. python_pkg_setup() needs to check
> locale and print error (using eerror(), without die()), when unsupported locale has been detected.

I'm strongly with Brian on this. You receive too many valid bug reports
caused by a broken package.  python_pkg_setup needs to do nothing. You
need to fix the bugs, or if fixing them is too much of an issue, work
around them in the ebuild. Keep in mind that having no locale explicitly
selected is the default for a Gentoo installation, and that the docs do
not (and should not) say anywhere that non-UTF-8 locales are unsupported.
In fact, quoting from
  <http://www.gentoo.org/doc/en/guide-localization.xml>:

"It's also possible, and pretty common especially in a more traditional
 UNIX environment, to leave the global settings unchanged, i.e. in the
 "C" locale. Users can still specify their preferred locale in their own
 shell RC file:"




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-30  3:48   ` Brian Harring
@ 2010-07-30 16:49     ` "Paweł Hajdan, Jr."
  2010-07-30 18:45       ` Brian Harring
  2010-07-31 21:39       ` James Cloos
  0 siblings, 2 replies; 30+ messages in thread
From: "Paweł Hajdan, Jr." @ 2010-07-30 16:49 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

On 7/29/10 8:48 PM, Brian Harring wrote:
> It's basically annoying people into changing to partially 
> sidestep a couple of bugs, instead of fixing the issue- and that's the 
> wrong course of action.

I think that with python earlier than python-3 unicode handling is quite
complicated, and I'm not surprised there are problems with that.

Arfrever, does python-3 have the same problem with non-UTF8 locales?

Another thing we can consider is making UTF8 the default setup in
Gentoo. I think most people (including me) don't care whether it's C or
UTF8 as long as it works.

Paweł


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-30 16:49     ` "Paweł Hajdan, Jr."
@ 2010-07-30 18:45       ` Brian Harring
  2010-07-31 21:39       ` James Cloos
  1 sibling, 0 replies; 30+ messages in thread
From: Brian Harring @ 2010-07-30 18:45 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1830 bytes --]

On Fri, Jul 30, 2010 at 09:49:21AM -0700, "Paweł Hajdan, Jr." wrote:
> On 7/29/10 8:48 PM, Brian Harring wrote:
> > It's basically annoying people into changing to partially 
> > sidestep a couple of bugs, instead of fixing the issue- and that's the 
> > wrong course of action.
> 
> I think that with python earlier than python-3 unicode handling is quite
> complicated, and I'm not surprised there are problems with that.

encoding handling wasn't that bad under py2k.  Py3k just enforces the 
boundaries- meaning you can't just skid by.

> Arfrever, does python-3 have the same problem with non-UTF8 locales?

ascii is a subset of utf-8 and ascii is a subset of latin-1; latin-1 
and utf-8 aren't compatible in encoded form however.

What this means is that the same set of bugs I ran down still will go 
boom if you have a utf-8 locale and the code in question was dealing 
w/ a latin-1 encoded file.
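
The subset relationship can be checked directly. The following is a small Python 3 sketch with made-up sample strings, showing that ASCII text encodes identically everywhere while latin-1 bytes are rejected by a UTF-8 decoder:

```python
ascii_text = "python"
accented = "caf\u00e9"  # "cafe" with an acute accent on the last letter

# Pure-ASCII text has the same bytes in all three encodings.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8") == ascii_text.encode("latin-1")

# Non-ASCII text does not: the encoded forms differ...
latin1_bytes = accented.encode("latin-1")   # b'caf\xe9'
utf8_bytes = accented.encode("utf-8")       # b'caf\xc3\xa9'

# ...and latin-1 bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```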


> Another thing we can consider is making UTF8 the default setup in
> Gentoo. I think most people (including me) don't care whether it's C or
> UTF8 as long as it works.

"as long as it works" in this case means "fix the code" as I've laid 
out.  Forcing locales to sidestep it leaves the latin-1/utf8 
incompatibility to go 'boom'.

Basically, forcing utf8 doesn't "make it work".  It reduces the cases 
breakage will show up while leaving those issues still there- frankly 
this is worse, can't fix those screwups without them breaking (for 
better or worse, and preferably breaking in a testcase).  We've got 4 
bugs, and only one of them is a semi-complex fix (docutils needs to 
require that html it's fed is utf8 compatible- valid enough req 
anyways since html shouldn't be latin-1, it should be ascii or utf8).

So.. get fixing, instead of dodging the work imo. ;)

~brian

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
                   ` (4 preceding siblings ...)
  2010-07-30 16:05 ` [gentoo-dev] " Harald van Dijk
@ 2010-07-31  3:37 ` Mike Frysinger
  2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
  6 siblings, 0 replies; 30+ messages in thread
From: Mike Frysinger @ 2010-07-31  3:37 UTC (permalink / raw
  To: gentoo-dev; +Cc: Arfrever Frehtes Taifersar Arahesis

[-- Attachment #1: Type: Text/Plain, Size: 509 bytes --]

On Thursday, July 29, 2010 19:16:42 Arfrever Frehtes Taifersar Arahesis wrote:
> We received too many invalid bugs caused by unsupported locales.
> python_pkg_setup() needs to check locale and print error (using eerror(),
> without die()), when unsupported locale has been detected.

there is no such thing as an "unsupported locale".  only buggy code you should 
be fixing and not dumping onto users.  i wish i could mark all my glibc bugs 
as "invalid" because i didnt feel like fixing them.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-30  2:36 ` [gentoo-dev] " Brian Harring
@ 2010-07-31 14:44   ` Arfrever Frehtes Taifersar Arahesis
  2010-07-31 19:49     ` Alec Warner
  0 siblings, 1 reply; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-07-31 14:44 UTC (permalink / raw
  To: Gentoo Development

[-- Attachment #1: Type: Text/Plain, Size: 1045 bytes --]

2010-07-30 04:36:22 Brian Harring wrote:
> On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> > --- python.eclass
> > +++ python.eclass
> > @@ -355,6 +355,8 @@
> >  	# Check if phase is pkg_setup().
> >  	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
> >  
> > +	local locale
> > +
> >  	if [[ "$#" -ne 0 ]]; then
> >  		die "${FUNCNAME}() does not accept arguments"
> >  	fi
> > @@ -407,6 +409,16 @@
> >  		unset -f python_pkg_setup_check_USE_flags
> >  	fi
> >  
> > +	locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> 
> You're using python to get the exported env.  Don't.  Use bash (you're 
> invoking python from freaking bash after all)...

Given variable can be set, but not exported.

> bug 328047 is induced by a patch we add (it's not in upstream python).  

This patch comes from upstream.

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 14:44   ` Arfrever Frehtes Taifersar Arahesis
@ 2010-07-31 19:49     ` Alec Warner
  2010-07-31 20:10       ` Arfrever Frehtes Taifersar Arahesis
  0 siblings, 1 reply; 30+ messages in thread
From: Alec Warner @ 2010-07-31 19:49 UTC (permalink / raw
  To: gentoo-dev

On Sat, Jul 31, 2010 at 7:44 AM, Arfrever Frehtes Taifersar Arahesis
<Arfrever@gentoo.org> wrote:
> 2010-07-30 04:36:22 Brian Harring wrote:
>> On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
>> > --- python.eclass
>> > +++ python.eclass
>> > @@ -355,6 +355,8 @@
>> >     # Check if phase is pkg_setup().
>> >     [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>> >
>> > +   local locale
>> > +
>> >     if [[ "$#" -ne 0 ]]; then
>> >             die "${FUNCNAME}() does not accept arguments"
>> >     fi
>> > @@ -407,6 +409,16 @@
>> >             unset -f python_pkg_setup_check_USE_flags
>> >     fi
>> >
>> > +   locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
>>
>> You're using python to get the exported env.  Don't.  Use bash (you're
>> invoking python from freaking bash after all)...
>
> Given variable can be set, but not exported.

If the variable is set but not exported then it is local to the shell
env.  When bash goes to exec() python the local shell variables are
not in the env; so os.environ will not contain them.

antarus@kyoto ~ $ foo=BAR
antarus@kyoto ~ $ echo $foo
BAR
antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
None
antarus@kyoto ~ $ export foo
antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
BAR

so how is this any different than:

[[ -n $LANG ]] && locale=$LANG
[[ -n $LC_CTYPE ]] && locale=$LC_CTYPE
[[ -n $LC_ALL ]] && locale=$LC_ALL
locale=${locale:-POSIX}

if you want to keep it short; or the longer version with more ifs and
less shell magic.  Normally I'm not a big performance man myself; but
this is in an eclass used by lots of packages; not just one ebuild.
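
For comparison, the lookup order both snippets are aiming for (LC_ALL first, then LC_CTYPE, then LANG, with POSIX as the fallback) can be written as a small Python function. This is a sketch, not the eclass code; `effective_ctype_locale` is a hypothetical name, and it treats an empty value as unset, which POSIX specifies but the nested `os.environ.get()` chain in the patch does not do.

```python
def effective_ctype_locale(env):
    """Return the locale name that governs LC_CTYPE for the given environment."""
    # POSIX precedence: LC_ALL, then LC_CTYPE, then LANG; empty counts as unset.
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable)
        if value:
            return value
    return "POSIX"


print(effective_ctype_locale({"LANG": "pl_PL.UTF-8", "LC_ALL": "C"}))
```

Passing `os.environ` as the argument would evaluate the current process environment, which by definition contains only exported variables.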

>
>> bug 328047 is induced by a patch we add (it's not in upstream python).
>
> This patch comes from upstream.
>
> --
> Arfrever Frehtes Taifersar Arahesis
>




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 19:49     ` Alec Warner
@ 2010-07-31 20:10       ` Arfrever Frehtes Taifersar Arahesis
  2010-07-31 20:25         ` Petteri Räty
  0 siblings, 1 reply; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-07-31 20:10 UTC (permalink / raw
  To: Gentoo Development

[-- Attachment #1: Type: Text/Plain, Size: 2163 bytes --]

2010-07-31 21:49:50 Alec Warner wrote:
> On Sat, Jul 31, 2010 at 7:44 AM, Arfrever Frehtes Taifersar Arahesis
> <Arfrever@gentoo.org> wrote:
> > 2010-07-30 04:36:22 Brian Harring wrote:
> >> On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> >> > --- python.eclass
> >> > +++ python.eclass
> >> > @@ -355,6 +355,8 @@
> >> >     # Check if phase is pkg_setup().
> >> >     [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
> >> >
> >> > +   local locale
> >> > +
> >> >     if [[ "$#" -ne 0 ]]; then
> >> >             die "${FUNCNAME}() does not accept arguments"
> >> >     fi
> >> > @@ -407,6 +409,16 @@
> >> >             unset -f python_pkg_setup_check_USE_flags
> >> >     fi
> >> >
> >> > +   locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
> >>
> >> You're using python to get the exported env.  Don't.  Use bash (you're
> >> invoking python from freaking bash after all)...
> >
> > Given variable can be set, but not exported.
> 
> If the variable is set but not exported, then it is local to the shell.
> When bash exec()s python, the local shell variables are not passed in
> the environment, so os.environ will not contain them.
> 
> antarus@kyoto ~ $ foo=BAR
> antarus@kyoto ~ $ echo $foo
> BAR
> antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
> None
> antarus@kyoto ~ $ export foo
> antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
> BAR

I only want to look at variables that are exported to Python processes.

> so how is this any different than:
> 
> [[ -n $LC_CTYPE ]] && locale=$LC_CTYPE
> [[ -n $LC_ALL ]] && locale=$LC_ALL
> locale=${locale:-POSIX}

This code also picks up variables that are set but not exported.

> if you want to keep it short; or the longer version with more ifs and
> less shell magic.  Normally I'm not a big performance man myself; but
> this is in an eclass used by lots of packages; not just one ebuild.

python_pkg_setup() is a rarely called function.

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 20:10       ` Arfrever Frehtes Taifersar Arahesis
@ 2010-07-31 20:25         ` Petteri Räty
  2010-08-02 21:02           ` Arfrever Frehtes Taifersar Arahesis
  0 siblings, 1 reply; 30+ messages in thread
From: Petteri Räty @ 2010-07-31 20:25 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 636 bytes --]

On 07/31/2010 11:10 PM, Arfrever Frehtes Taifersar Arahesis wrote:
>>
>> If the variable is set but not exported, then it is local to the shell.
>> When bash exec()s python, the local shell variables are not passed in
>> the environment, so os.environ will not contain them.
>>
>> antarus@kyoto ~ $ foo=BAR
>> antarus@kyoto ~ $ echo $foo
>> BAR
>> antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
>> None
>> antarus@kyoto ~ $ export foo
>> antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
>> BAR
> 
> I want only variables exported to Python processes.
> 

export -p
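
Even without parsing the "export -p" output, the exported-or-not question
can be answered from the shell. A minimal sketch, assuming a POSIX shell
plus the standard env and grep utilities; is_exported and demo_var are
hypothetical names for illustration and are not part of python.eclass:

```shell
# Hypothetical helper: succeeds only if the named variable is exported.
# env is an external command, so it prints exactly the variables that a
# child process such as python would receive.
is_exported() {
    env | grep "^$1=" >/dev/null
}

demo_var=BAR
is_exported demo_var || echo "demo_var is set but not exported"
export demo_var
is_exported demo_var && echo "demo_var is exported"
```

Caveat: an exported value that contains a newline followed by "NAME="
would fool the grep, so this is only a sketch.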

Petteri


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-30 16:49     ` "Paweł Hajdan, Jr."
  2010-07-30 18:45       ` Brian Harring
@ 2010-07-31 21:39       ` James Cloos
  2010-07-31 22:04         ` Mike Frysinger
  2010-07-31 23:30         ` [gentoo-dev] " Jonathan Callen
  1 sibling, 2 replies; 30+ messages in thread
From: James Cloos @ 2010-07-31 21:39 UTC (permalink / raw
  To: gentoo-dev

>>>>> "PH" == Paweł Hajdan, <phajdan.jr@gentoo.org> writes:

PH> Another thing we can consider is making UTF8 the default setup in
PH> Gentoo. I think most people (including me) don't care whether it's
PH> C or UTF8 as long as it works.

Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or
a POSIX.UTF-8 locale.

That should be done upstream in glibc, but were they to refuse then
Gentoo should add it to the glibc ebuild.

The language_country locales are just wrong for root.  They are often
broken (locales like en_US force case-insensitive collation, meaning that
a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files
which one would not expect to match) and cause bugs.
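
To illustrate the range problem, here is a minimal sketch, assuming a
POSIX shell; starts_with_ascii_lowercase is a hypothetical helper name.
Pinning LC_ALL=C inside a subshell makes [a-z] mean the ASCII range only,
so 'Makefile' cannot match regardless of the caller's locale:

```shell
# Hypothetical helper: the function body is a subshell, so the LC_ALL
# override does not leak into the caller.
starts_with_ascii_lowercase() (
    LC_ALL=C
    export LC_ALL
    case $1 in
        [a-z]*) true ;;
        *) false ;;
    esac
)

for f in Makefile main.c; do
    if starts_with_ascii_lowercase "$f"; then
        echo "$f matches [a-z]*"
    else
        echo "$f does not match [a-z]*"
    fi
done
```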

In fact, glibc's insistence that C and POSIX are ASCII rather than raw
unspecified eight bit is itself a bug.

Utf8 is nice, but forcing the lang_country locales on root is not.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 21:39       ` James Cloos
@ 2010-07-31 22:04         ` Mike Frysinger
  2010-07-31 22:14           ` James Cloos
  2010-07-31 23:30         ` [gentoo-dev] " Jonathan Callen
  1 sibling, 1 reply; 30+ messages in thread
From: Mike Frysinger @ 2010-07-31 22:04 UTC (permalink / raw
  To: gentoo-dev; +Cc: James Cloos

[-- Attachment #1: Type: Text/Plain, Size: 1332 bytes --]

On Saturday, July 31, 2010 17:39:27 James Cloos wrote:
> Paweł Hajdan writes:
> > Another thing we can consider is making UTF8 the default setup in
> > Gentoo. I think most people (including me) don't care whether it's
> > C or UTF8 as long as it works.
> 
> Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or
> a POSIX.UTF-8 locale.
> 
> In fact, glibc's insistence that C and POSIX are ASCII rather than raw
> unspecified eight bit is itself a bug.

yeah, no.  take it up with the POSIX group where they're already working on 
defining a C.UTF-8/etc... locale.

> That should be done upstream in glibc, but were they to refuse then
> Gentoo should add it to the glibc ebuild.

this doesnt really make sense, upstream or down.  if you wanted to talk about 
setting default LANG in the baselayout, then that's about the only reasonable 
possibility (especially since we already do this to a degree).  screwing with 
default locale when no locale variables are set is madness.

> The language_country locales are just wrong for root.  They are often
> broken (locales like en_US force case-insensitive collation, meaning that
> a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files
> which one would not expect to match) and cause bugs.

this is pure opinion
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 22:04         ` Mike Frysinger
@ 2010-07-31 22:14           ` James Cloos
  2010-07-31 22:53             ` Mike Frysinger
  0 siblings, 1 reply; 30+ messages in thread
From: James Cloos @ 2010-07-31 22:14 UTC (permalink / raw
  To: Mike Frysinger; +Cc: gentoo-dev

>>>>> "MF" == Mike Frysinger <vapier@gentoo.org> writes:

MF> take it up with the POSIX group where they're already working on 
MF> defining a C.UTF-8/etc... locale.

Then they agree with me.  Good to know.  Thanks.

MF> screwing with default locale when no locale variables are set is madness.

I never said anything about changing C or POSIX.  Only about creating
C.UTF-8 and/or POSIX.UTF-8.

>> The language_country locales are just wrong for root.

MF> this is pure opinion

Expert opinion.  I've seen what can and will go wrong over the entire
existence of the concept of locales (on Linux-based systems, at least).

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 22:14           ` James Cloos
@ 2010-07-31 22:53             ` Mike Frysinger
  0 siblings, 0 replies; 30+ messages in thread
From: Mike Frysinger @ 2010-07-31 22:53 UTC (permalink / raw
  To: James Cloos; +Cc: gentoo-dev

[-- Attachment #1: Type: Text/Plain, Size: 701 bytes --]

On Saturday, July 31, 2010 18:14:50 James Cloos wrote:
> Mike Frysinger writes:
> > screwing with default locale when no locale variables are set is
> > madness.
> 
> I never said anything about changing C or POSIX.  Only about creating
> C.UTF-8 and/or POSIX.UTF-8.

sorry, i misread.  thought you were talking about changing default behavior 
and not just the creation of new locales.

> >> The language_country locales are just wrong for root.
> >
> > this is pure opinion
> 
> Expert opinion.

i'm sure you're of that opinion ;).  my point was that the default isn't
going to change in Gentoo without going through glibc, and glibc is most
likely not going to change either.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* [gentoo-dev] Re: Locale check in python_pkg_setup()
  2010-07-31 21:39       ` James Cloos
  2010-07-31 22:04         ` Mike Frysinger
@ 2010-07-31 23:30         ` Jonathan Callen
  2010-08-05 14:00           ` James Cloos
  1 sibling, 1 reply; 30+ messages in thread
From: Jonathan Callen @ 2010-07-31 23:30 UTC (permalink / raw
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 07/31/2010 05:39 PM, James Cloos wrote:
>>>>>> "PH" == Paweł Hajdan, <phajdan.jr@gentoo.org> writes:
> 
> PH> Another thing we can consider is making UTF8 the default setup in
> PH> Gentoo. I think most people (including me) don't care whether it's
> PH> C or UTF8 as long as it works.
> 
> Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or
> a POSIX.UTF-8 locale.
> 
> That should be done upstream in glibc, but were they to refuse then
> Gentoo should add it to the glibc ebuild.
> 
> The language_country locales are just wrong for root.  They are often
> broken (locales like en_US force case-insensitive colation, meaning that
> a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files
> which one would not expect to match) and cause bugs.
> 
> In fact, glibc's insistance that C and POSIX are ascii rather than raw
> unspecified eight bit is itself a bug.
> 
> Utf8 is nice, but forcing the lang_country locales on root is not.
> 
> -JimC

You can create a POSIX.UTF-8 locale right now, using the same
/etc/locale.gen mechanism that is used for generating other locales
(localedef will output a few warnings, but the generated locale works
just fine from what I can see).  If you want a C.UTF-8 locale, then you
just need to symlink /usr/share/i18n/locales/C to POSIX (or call
localedef directly as "localedef --add-to-archive -i POSIX -f UTF-8
C.UTF-8").

If there are any issues with those locales besides the warnings that
localedef outputs, I haven't seen them yet.

- -- 
Jonathan Callen
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJMVLH7AAoJELHSF2kinlg4+VYQAIB/Qc3Oq6lmK6tgiXADjk1Y
ICMCTbxyuCRNkfllwVqqIKEMUE/UmkqIjkY2/1m2uHp3kIm8tErSa1AohdSoJncc
7LIH17daM7T9XylA7DoqX7et3E3mtl8SerGHFMQ7ae0qYMUkbnNeyeUq4mVhH35G
IazjLFCIn0KlLmsim+8ILh8OQ4NWGK1JQlqXDluxHb3BVK37XDLWmvz5gG1+CTmS
KrmL3ek+BujiHOfAuvc86jFi9rWMP/yPh8OMIOsG41e/4hdNnhhhwiF0MHRs6bpO
Ql3FLsQjS5J7o6MC5690r/Ov/qHj/PAVITXft5cEQhq/gK17sg5TM5zs1JZxNMpH
T5z8LuSJenB6hF/+Gk0aew0XKig52539KZRnYShyl9z0QlLUlmwj0L3t8cFnm1in
2ttaeVttc4P2gwaF5Uf4ljEPFJ5w3lVIsXtRJklcPOjDUlCwnpYU0y5GS7RtAXJG
l/4Ax2/yW8P070dg7AoYh1WVTY1ChsyRNTECFYfge8ra5OnXT9HJPVBm7FFTof+L
IYXJ8zOGnDm32xsiov0LsrYC5KiD+FixkqTiPUHnbZm9KmI/HCyvnODm3cD+k8ts
Ht4JXxdVPEjv37bpDgdSbrI2vFb3sfpdH/wY1LMoAU00p9f/xwM2d9R1i+Q08CBV
74aYdVDpAQi5Kqevehw8
=7bB7
-----END PGP SIGNATURE-----




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-31 20:25         ` Petteri Räty
@ 2010-08-02 21:02           ` Arfrever Frehtes Taifersar Arahesis
  2010-08-02 21:40             ` Harald van Dijk
  0 siblings, 1 reply; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-08-02 21:02 UTC (permalink / raw
  To: Gentoo Development

[-- Attachment #1: Type: Text/Plain, Size: 1085 bytes --]

2010-07-31 22:25:26 Petteri Räty wrote:
> On 07/31/2010 11:10 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> >>
> >> If the variable is set but not exported, then it is local to the shell.
> >> When bash exec()s python, the local shell variables are not passed in
> >> the environment, so os.environ will not contain them.
> >>
> >> antarus@kyoto ~ $ foo=BAR
> >> antarus@kyoto ~ $ echo $foo
> >> BAR
> >> antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
> >> None
> >> antarus@kyoto ~ $ export foo
> >> antarus@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
> >> BAR
> > 
> > I want only variables exported to Python processes.
> > 
> 
> export -p

It would have to be parsed using e.g. grep and sed. It's easier to call Python in this case.
The call to Python is sufficiently fast:

$ time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))' > /dev/null

real    0m0.062s
user    0m0.051s
sys     0m0.011s

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
                   ` (5 preceding siblings ...)
  2010-07-31  3:37 ` Mike Frysinger
@ 2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
  2010-08-02 21:40   ` Mike Frysinger
                     ` (3 more replies)
  6 siblings, 4 replies; 30+ messages in thread
From: Arfrever Frehtes Taifersar Arahesis @ 2010-08-02 21:18 UTC (permalink / raw
  To: Gentoo Development


[-- Attachment #1.1: Type: Text/Plain, Size: 79 bytes --]

A milder warning will be printed.

-- 
Arfrever Frehtes Taifersar Arahesis

[-- Attachment #1.2: python.eclass.patch --]
[-- Type: text/x-patch, Size: 904 bytes --]

--- python.eclass
+++ python.eclass
@@ -355,6 +355,8 @@
 	# Check if phase is pkg_setup().
 	[[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
 
+	local locale
+
 	if [[ "$#" -ne 0 ]]; then
 		die "${FUNCNAME}() does not accept arguments"
 	fi
@@ -407,6 +409,15 @@
 		unset -f python_pkg_setup_check_USE_flags
 	fi
 
+	if [[ "$(locale charmap)" != "UTF-8" ]]; then
+		locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
+		ewarn
+		ewarn "Currently used locale '${locale}' can cause UnicodeDecodeError or UnicodeEncodeError"
+		ewarn "exceptions. It is recommended to use a UTF-8 locale to avoid problems."
+		ewarn "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to change locale."
+		ewarn
+	fi
+
 	PYTHON_PKG_SETUP_EXECUTED="1"
 }
 

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
@ 2010-08-02 21:40   ` Mike Frysinger
  2010-08-02 22:08   ` Jeroen Roovers
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Mike Frysinger @ 2010-08-02 21:40 UTC (permalink / raw
  To: gentoo-dev; +Cc: Arfrever Frehtes Taifersar Arahesis

[-- Attachment #1: Type: Text/Plain, Size: 549 bytes --]

the env lookup is still way more complicated than it needs to be:
	locale=${LC_ALL:-${LC_CTYPE:-${LANG}}}

it is not possible for these variables to be set but not exported unless an 
ebuild is doing it, and those are broken anyways.
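
for illustration, the whole check can stay in the shell.  a minimal sketch,
assuming a POSIX shell: locale_is_utf8 is a hypothetical helper name, the
POSIX fallback is taken from the original patch, and accepting the
lowercase ".utf8" spelling is an assumption on my part:

```shell
# Hypothetical helper: does a locale name carry a UTF-8 codeset suffix?
# The ".utf8" spelling and the "@modifier" forms are accepted as an
# assumption; the original patch only matched *.UTF-8.
locale_is_utf8() {
    case $1 in
        *.UTF-8 | *.utf8 | *.UTF-8@* | *.utf8@*) return 0 ;;
        *) return 1 ;;
    esac
}

locale=${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}}
if ! locale_is_utf8 "${locale}"; then
    echo "locale '${locale}' is not a UTF-8 locale" >&2
fi
```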

the message should also reference an open bug report for people to follow so 
that this issue is properly fixed.  you can't drop warnings like this into 
the tree without a referenced bug #; otherwise they easily become outdated 
and no one can easily determine "can this be punted yet".
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-08-02 21:02           ` Arfrever Frehtes Taifersar Arahesis
@ 2010-08-02 21:40             ` Harald van Dijk
  0 siblings, 0 replies; 30+ messages in thread
From: Harald van Dijk @ 2010-08-02 21:40 UTC (permalink / raw
  To: gentoo-dev

On Mon, Aug 02, 2010 at 11:02:20PM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> It would have to be parsed using e.g. grep and sed. It's easier to call Python in this case.

It's even easier not to.

> The call to Python is sufficiently fast:
> 
> $ time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))' > /dev/null
> 
> real    0m0.062s
> user    0m0.051s
> sys     0m0.011s

Let's compare. On my system:

time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))'
en_GB.UTF-8

real	0m0.020s
user	0m0.016s
sys	0m0.004s

time sh -c 'echo "${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}}"'
en_GB.UTF-8

real	0m0.001s
user	0m0.000s
sys	0m0.000s

And that's after several runs for both, so it's not caused by the
initial load of python, which wasn't in memory yet.

Yes, 0.019s is very little, but in this case I see absolutely no benefit
whatsoever in calling python. Plus sh has the advantage of actually
working when LC_ALL is exported as "" (which in LC_* means the same as
having it unset)...
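
The difference comes down to the colon in the expansion. A minimal
sketch, assuming a POSIX shell; both function names are hypothetical and
only serve to put the two behaviours side by side:

```shell
# ${VAR:-fallback} falls through when VAR is unset OR empty, which is how
# an empty LC_* variable is meant to be treated.
resolve_locale() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}}"
}

# ${VAR-fallback} (no colon) falls through only when VAR is unset, which
# mirrors what the nested os.environ.get() calls do with an empty LC_ALL.
resolve_locale_unset_only() {
    printf '%s\n' "${LC_ALL-${LC_CTYPE-${LANG-POSIX}}}"
}

(
    # Run in a subshell so the caller's locale settings stay untouched.
    LC_ALL= LC_CTYPE= LANG=C
    resolve_locale             # prints: C
    resolve_locale_unset_only  # prints an empty line
)
```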

But why exactly are you concerned about LC_* being defined but not
exported anyway? You're checking from an ebuild; locales are going to
get inherited from portage or profile.env anyway, so you can just
assume that if they _are_ set, they're exported. The only way they might
not be is if the user is messing with the locale from the bashrc, and if
the user's doing that, the user really needs to fix the bashrc and
export the vars anyway.

None of this changes the fact that the locale check warns about bugs in
packages, not bugs in the user's configuration.




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
  2010-08-02 21:40   ` Mike Frysinger
@ 2010-08-02 22:08   ` Jeroen Roovers
  2010-08-02 22:13   ` Jeroen Roovers
  2010-08-03  0:58   ` Brian Harring
  3 siblings, 0 replies; 30+ messages in thread
From: Jeroen Roovers @ 2010-08-02 22:08 UTC (permalink / raw
  To: gentoo-dev

On Mon, 2 Aug 2010 23:18:59 +0200
Arfrever Frehtes Taifersar Arahesis <Arfrever@gentoo.org> wrote:

> A milder warning will be printed.

I distinctly remember several voices being raised in this thread very
recently, suggesting if not demanding that you should not convey a
message like that at all, but fix the affected packages instead.


     jer




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
  2010-08-02 21:40   ` Mike Frysinger
  2010-08-02 22:08   ` Jeroen Roovers
@ 2010-08-02 22:13   ` Jeroen Roovers
  2010-08-03  0:58   ` Brian Harring
  3 siblings, 0 replies; 30+ messages in thread
From: Jeroen Roovers @ 2010-08-02 22:13 UTC (permalink / raw
  To: gentoo-dev

On Mon, 2 Aug 2010 23:18:59 +0200
Arfrever Frehtes Taifersar Arahesis <Arfrever@gentoo.org> wrote:

> +		ewarn "exceptions. It is recommended to use a UTF-8
> locale to avoid problems."
> +		ewarn "See http://www.gentoo.org/doc/en/utf-8.xml
> for information on how to change locale."

In fact the documentation you point to positively encourages
users/admins to set up locales and explains how to do it system-wide,
and in no place does it warn against any adverse effects of doing so.
So you can't even point to that documentation in defence of this "milder
warning".



     jer




* Re: [gentoo-dev] Locale check in python_pkg_setup()
  2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
                     ` (2 preceding siblings ...)
  2010-08-02 22:13   ` Jeroen Roovers
@ 2010-08-03  0:58   ` Brian Harring
  3 siblings, 0 replies; 30+ messages in thread
From: Brian Harring @ 2010-08-03  0:58 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 676 bytes --]

On Mon, Aug 02, 2010 at 11:18:59PM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> A milder warning will be printed.

Guessing you didn't get the part about "no warning should be put in" 
that everyone stated?  You're ignoring that this message also will 
make users think that switching their locale will magically fix 
programs that chuck encoding errors (validly so, if not particularly 
user friendly) when running into improperly encoded files (regardless 
of locale).
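
As a rough illustration of that last point, a minimal sketch, assuming a
POSIX shell with the iconv and mktemp utilities available; is_valid_utf8
is a hypothetical helper name. Whether a file decodes as UTF-8 is a
property of its bytes, not of the locale variables:

```shell
# Hypothetical helper: succeeds only if the file decodes as UTF-8.
# The source encoding is passed explicitly, so the result does not depend
# on LC_ALL, LC_CTYPE or LANG.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

tmp=$(mktemp)
printf 'caf\303\251\n' > "$tmp"   # "café" encoded as UTF-8
is_valid_utf8 "$tmp" && echo "valid UTF-8"
printf 'caf\351\n' > "$tmp"       # the same word encoded as ISO-8859-1
is_valid_utf8 "$tmp" || echo "not valid UTF-8, whatever the locale says"
rm -f "$tmp"
```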

This locale crap doesn't belong in the tree, mild warning or not- do 
not add it.  Take it up to the council if you really think everyone 
else is wrong and still want it.

~harring

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: [gentoo-dev] Re: Locale check in python_pkg_setup()
  2010-07-31 23:30         ` [gentoo-dev] " Jonathan Callen
@ 2010-08-05 14:00           ` James Cloos
  0 siblings, 0 replies; 30+ messages in thread
From: James Cloos @ 2010-08-05 14:00 UTC (permalink / raw
  To: gentoo-dev

>>>>> "JC" == Jonathan Callen <abcd@gentoo.org> writes:

JC> You can create a POSIX.UTF-8 locale right now, using the same
JC> /etc/locale.gen mechanism that is used for generating other locales
JC> (localedef will output a few warnings, but the generated locale
JC> works just fine from what I can see).

JC> If there are any issues with those locales besides the warnings that
JC> localedef outputs, I haven't seen them yet.

There will be more errors, such as:

,----
| Cannot open the message catalog "man" for locale "POSIX.UTF-8"
| (NLSPATH="<none>")
`----

But it is a useful tip.  Thanks.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




end of thread, other threads:[~2010-08-05 14:01 UTC | newest]

Thread overview: 30+ messages
2010-07-29 23:16 [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis
2010-07-29 23:20 ` "Paweł Hajdan, Jr."
2010-07-30  2:29   ` Arfrever Frehtes Taifersar Arahesis
2010-07-30  3:05     ` "Paweł Hajdan, Jr."
2010-07-30  0:13 ` [gentoo-dev] " Jonathan Callen
2010-07-30  2:32   ` Arfrever Frehtes Taifersar Arahesis
2010-07-30  2:36 ` [gentoo-dev] " Brian Harring
2010-07-31 14:44   ` Arfrever Frehtes Taifersar Arahesis
2010-07-31 19:49     ` Alec Warner
2010-07-31 20:10       ` Arfrever Frehtes Taifersar Arahesis
2010-07-31 20:25         ` Petteri Räty
2010-08-02 21:02           ` Arfrever Frehtes Taifersar Arahesis
2010-08-02 21:40             ` Harald van Dijk
2010-07-30  3:15 ` Krzysztof Pawlik
2010-07-30  3:48   ` Brian Harring
2010-07-30 16:49     ` "Paweł Hajdan, Jr."
2010-07-30 18:45       ` Brian Harring
2010-07-31 21:39       ` James Cloos
2010-07-31 22:04         ` Mike Frysinger
2010-07-31 22:14           ` James Cloos
2010-07-31 22:53             ` Mike Frysinger
2010-07-31 23:30         ` [gentoo-dev] " Jonathan Callen
2010-08-05 14:00           ` James Cloos
2010-07-30 16:05 ` [gentoo-dev] " Harald van Dijk
2010-07-31  3:37 ` Mike Frysinger
2010-08-02 21:18 ` Arfrever Frehtes Taifersar Arahesis
2010-08-02 21:40   ` Mike Frysinger
2010-08-02 22:08   ` Jeroen Roovers
2010-08-02 22:13   ` Jeroen Roovers
2010-08-03  0:58   ` Brian Harring
