public inbox for gentoo-portage-dev@lists.gentoo.org
 help / color / mirror / Atom feed
* [gentoo-portage-dev] [PATCH 1/2] setup.py: add stub for building custom modules in C/C++
@ 2016-05-22 17:04 Anthony G. Basile
  2016-05-22 17:04 ` [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale Anthony G. Basile
  0 siblings, 1 reply; 7+ messages in thread
From: Anthony G. Basile @ 2016-05-22 17:04 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Anthony G. Basile

From: "Anthony G. Basile" <blueness@gentoo.org>

Currently portage doesn't include any custom modules written in C/C++.
This commit introduces stub code for building such modules in setup.py.

Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
---
 setup.py | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/setup.py b/setup.py
index 75c4bcb..25429bc 100755
--- a/setup.py
+++ b/setup.py
@@ -4,7 +4,7 @@
 
 from __future__ import print_function
 
-from distutils.core import setup, Command
+from distutils.core import setup, Command, Extension
 from distutils.command.build import build
 from distutils.command.build_scripts import build_scripts
 from distutils.command.clean import clean
@@ -30,6 +30,9 @@ import sys
 # TODO:
 # - smarter rebuilds of docs w/ 'install_docbook' and 'install_epydoc'.
 
+# Dictionary of scripts.  The structure is
+#   key   = location in filesystem to install the scripts
+#   value = list of scripts, path relative to top source directory
 x_scripts = {
 	'bin': [
 		'bin/ebuild', 'bin/egencache', 'bin/emerge', 'bin/emerge-webrsync',
@@ -41,6 +44,10 @@ x_scripts = {
 	],
 }
 
+# Dictionary custom modules written in C/C++ here.  The structure is
+#   key   = module name
+#   value = list of C/C++ source code, path relative to top source directory
+x_c_helpers = {}
 
 class x_build(build):
 	""" Build command with extra build_man call. """
@@ -636,6 +643,8 @@ setup(
 		['$sysconfdir/portage/repo.postsync.d', ['cnf/repo.postsync.d/example']],
 	],
 
+	ext_modules = [Extension(name=n, sources=m) for n, m in x_c_helpers.items()],
+
 	cmdclass = {
 		'build': x_build,
 		'build_man': build_man,
-- 
2.7.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
  2016-05-22 17:04 [gentoo-portage-dev] [PATCH 1/2] setup.py: add stub for building custom modules in C/C++ Anthony G. Basile
@ 2016-05-22 17:04 ` Anthony G. Basile
  2016-05-23  6:44   ` Michał Górny
  0 siblings, 1 reply; 7+ messages in thread
From: Anthony G. Basile @ 2016-05-22 17:04 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Anthony G. Basile

From: "Anthony G. Basile" <blueness@gentoo.org>

The current method to check for a sane system locale is to use python's
ctypes.util.find_library() to construct a full library path to the
system libc.so and pass that path to ctypes.CDLL() so we can call
toupper() and tolower() directly.  However, this gets bogged down in
implementation details and fails with musl.

We work around this design flaw in ctypes with a small python module
written in C which provides thin wrappers to toupper() and tolower(),
and only fall back on the current ctypes-based check when this module
is not available.

This has been tested on glibc, uClibc and musl systems.

X-Gentoo-bug: 571444
X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444

Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
---
 pym/portage/util/locale.py   | 32 ++++++++++-----
 setup.py                     |  6 ++-
 src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 11 deletions(-)
 create mode 100644 src/portage_c_convert_case.c

diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
index 2a15ea1..85ddd2b 100644
--- a/pym/portage/util/locale.py
+++ b/pym/portage/util/locale.py
@@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
 import locale
 import logging
 import os
+import sys
 import textwrap
 import traceback
 
@@ -34,18 +35,26 @@ def _check_locale(silent):
 	"""
 	The inner locale check function.
 	"""
-
-	libc_fn = find_library("c")
-	if libc_fn is None:
-		return None
-	libc = LoadLibrary(libc_fn)
-	if libc is None:
-		return None
+	try:
+		from portage_c_convert_case import _c_toupper, _c_tolower
+		libc_tolower = _c_tolower
+		libc_toupper = _c_toupper
+	except ImportError:
+		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
+			level=logging.WARNING, noiselevel=-1)
+		libc_fn = find_library("c")
+		if libc_fn is None:
+			return None
+		libc = LoadLibrary(libc_fn)
+		if libc is None:
+			return None
+		libc_tolower = libc.tolower
+		libc_toupper = libc.toupper
 
 	lc = list(range(ord('a'), ord('z')+1))
 	uc = list(range(ord('A'), ord('Z')+1))
-	rlc = [libc.tolower(c) for c in uc]
-	ruc = [libc.toupper(c) for c in lc]
+	rlc = [libc_tolower(c) for c in uc]
+	ruc = [libc_toupper(c) for c in lc]
 
 	if lc != rlc or uc != ruc:
 		if silent:
@@ -62,7 +71,10 @@ def _check_locale(silent):
 			"as LC_CTYPE in make.conf.")
 		msg = [l for l in textwrap.wrap(msg, 70)]
 		msg.append("")
-		chars = lambda l: ''.join(chr(x) for x in l)
+		if sys.version_info.major >= 3:
+			chars = lambda l: ''.join(chr(x) for x in l)
+		else:
+			chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)
 		if uc != ruc:
 			msg.extend([
 				"  %s -> %s" % (chars(lc), chars(ruc)),
diff --git a/setup.py b/setup.py
index 25429bc..8b6b408 100755
--- a/setup.py
+++ b/setup.py
@@ -47,7 +47,11 @@ x_scripts = {
 # Dictionary custom modules written in C/C++ here.  The structure is
 #   key   = module name
 #   value = list of C/C++ source code, path relative to top source directory
-x_c_helpers = {}
+x_c_helpers = {
+	'portage_c_convert_case' : [
+		'src/portage_c_convert_case.c',
+	],
+}
 
 class x_build(build):
 	""" Build command with extra build_man call. """
diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
new file mode 100644
index 0000000..f60b0c2
--- /dev/null
+++ b/src/portage_c_convert_case.c
@@ -0,0 +1,94 @@
+/* Copyright 2005-2016 Gentoo Foundation
+ * Distributed under the terms of the GNU General Public License v2
+ */
+
+#include <Python.h>
+#include <ctype.h>
+
+static PyObject * portage_c_tolower(PyObject *, PyObject *);
+static PyObject * portage_c_toupper(PyObject *, PyObject *);
+
+static PyMethodDef ConvertCaseMethods[] = {
+	{"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
+	{"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
+	{NULL, NULL, 0, NULL}
+};
+
+#if PY_MAJOR_VERSION >= 3
+static struct PyModuleDef moduledef = {
+	PyModuleDef_HEAD_INIT,
+	"portage_c_convert_case",					/* m_name */
+	"Module for converting case using the system locale",		/* m_doc */
+	-1,								/* m_size */
+	ConvertCaseMethods,						/* m_methods */
+	NULL,								/* m_reload */
+	NULL,								/* m_traverse */
+	NULL,								/* m_clear */
+	NULL,								/* m_free */
+};
+#endif
+
+static PyObject *ConvertCaseError;
+
+PyMODINIT_FUNC
+#if PY_MAJOR_VERSION >= 3
+PyInit_portage_c_convert_case(void)
+#else
+initportage_c_convert_case(void)
+#endif
+{
+	PyObject *m;
+
+#if PY_MAJOR_VERSION >= 3
+	m = PyModule_Create(&moduledef);
+#else
+	m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
+#endif
+
+	if (m == NULL)
+#if PY_MAJOR_VERSION >= 3
+		return NULL;
+#else
+		return;
+#endif
+
+	ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
+	Py_INCREF(ConvertCaseError);
+	PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
+
+#if PY_MAJOR_VERSION >= 3
+	return m;
+#else
+	return;
+#endif
+}
+
+
+static PyObject *
+portage_c_tolower(PyObject *self, PyObject *args)
+{
+	int c;
+
+	if (!PyArg_ParseTuple(args, "i", &c))
+	{
+		PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");
+		return NULL;
+	}
+
+	return Py_BuildValue("i", tolower(c));
+}
+
+
+static PyObject *
+portage_c_toupper(PyObject *self, PyObject *args)
+{
+	int c;
+
+	if (!PyArg_ParseTuple(args, "i", &c))
+	{
+		PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
+		return NULL;
+	}
+
+	return Py_BuildValue("i", toupper(c));
+}
-- 
2.7.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
  2016-05-22 17:04 ` [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale Anthony G. Basile
@ 2016-05-23  6:44   ` Michał Górny
  2016-05-23 12:08     ` Anthony G. Basile
  0 siblings, 1 reply; 7+ messages in thread
From: Michał Górny @ 2016-05-23  6:44 UTC (permalink / raw
  To: Anthony G. Basile; +Cc: gentoo-portage-dev, Anthony G. Basile

[-- Attachment #1: Type: text/plain, Size: 7724 bytes --]

On Sun, 22 May 2016 13:04:40 -0400
"Anthony G. Basile" <basile@opensource.dyc.edu> wrote:

> From: "Anthony G. Basile" <blueness@gentoo.org>
> 
> The current method to check for a sane system locale is to use python's
> ctypes.util.find_library() to construct a full library path to the
> system libc.so and pass that path to ctypes.CDLL() so we can call
> toupper() and tolower() directly.  However, this gets bogged down in
> implementation details and fails with musl.
> 
> We work around this design flaw in ctypes with a small python module
> written in C which provides thin wrappers to toupper() and tolower(),
> and only fall back on the current ctypes-based check when this module
> is not available.
> 
> This has been tested on glibc, uClibc and musl systems.
> 
> X-Gentoo-bug: 571444
> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
> 
> Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
> ---
>  pym/portage/util/locale.py   | 32 ++++++++++-----
>  setup.py                     |  6 ++-
>  src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 121 insertions(+), 11 deletions(-)
>  create mode 100644 src/portage_c_convert_case.c
> 
> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
> index 2a15ea1..85ddd2b 100644
> --- a/pym/portage/util/locale.py
> +++ b/pym/portage/util/locale.py
> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>  import locale
>  import logging
>  import os
> +import sys
>  import textwrap
>  import traceback
>  
> @@ -34,18 +35,26 @@ def _check_locale(silent):
>  	"""
>  	The inner locale check function.
>  	"""
> -
> -	libc_fn = find_library("c")
> -	if libc_fn is None:
> -		return None
> -	libc = LoadLibrary(libc_fn)
> -	if libc is None:
> -		return None
> +	try:
> +		from portage_c_convert_case import _c_toupper, _c_tolower
> +		libc_tolower = _c_tolower
> +		libc_toupper = _c_toupper

Now I'm being picky... but if you named the functions toupper()
and tolower(), you could actually import the whole module as 'libc'
and have less code!

Also it would be nice to actually make the module more generic. There
are more places where we use CDLL, and all of them could eventually be
supported by the module (unshare() would be much better done in C, for
example).

> +	except ImportError:
> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
> +			level=logging.WARNING, noiselevel=-1)

Do we really want to warn verbosely about this? I think it'd be
a pretty common case for people running the git checkout.

> +		libc_fn = find_library("c")
> +		if libc_fn is None:
> +			return None
> +		libc = LoadLibrary(libc_fn)
> +		if libc is None:
> +			return None
> +		libc_tolower = libc.tolower
> +		libc_toupper = libc.toupper
>  
>  	lc = list(range(ord('a'), ord('z')+1))
>  	uc = list(range(ord('A'), ord('Z')+1))
> -	rlc = [libc.tolower(c) for c in uc]
> -	ruc = [libc.toupper(c) for c in lc]
> +	rlc = [libc_tolower(c) for c in uc]
> +	ruc = [libc_toupper(c) for c in lc]
>  
>  	if lc != rlc or uc != ruc:
>  		if silent:
> @@ -62,7 +71,10 @@ def _check_locale(silent):
>  			"as LC_CTYPE in make.conf.")
>  		msg = [l for l in textwrap.wrap(msg, 70)]
>  		msg.append("")
> -		chars = lambda l: ''.join(chr(x) for x in l)
> +		if sys.version_info.major >= 3:

Portage uses hexversion for comparisons. Please be consistent.

> +			chars = lambda l: ''.join(chr(x) for x in l)
> +		else:
> +			chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)

This looks like an unrelated change. Was the original code buggy? If
this is the case, then please fix it in a separate commit.

>  		if uc != ruc:
>  			msg.extend([
>  				"  %s -> %s" % (chars(lc), chars(ruc)),
> diff --git a/setup.py b/setup.py
> index 25429bc..8b6b408 100755
> --- a/setup.py
> +++ b/setup.py
> @@ -47,7 +47,11 @@ x_scripts = {
>  # Dictionary custom modules written in C/C++ here.  The structure is
>  #   key   = module name
>  #   value = list of C/C++ source code, path relative to top source directory
> -x_c_helpers = {}
> +x_c_helpers = {
> +	'portage_c_convert_case' : [
> +		'src/portage_c_convert_case.c',
> +	],
> +}
>  
>  class x_build(build):
>  	""" Build command with extra build_man call. """
> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
> new file mode 100644
> index 0000000..f60b0c2
> --- /dev/null
> +++ b/src/portage_c_convert_case.c
> @@ -0,0 +1,94 @@
> +/* Copyright 2005-2016 Gentoo Foundation
> + * Distributed under the terms of the GNU General Public License v2
> + */
> +
> +#include <Python.h>
> +#include <ctype.h>
> +
> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
> +
> +static PyMethodDef ConvertCaseMethods[] = {
> +	{"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
> +	{"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
> +	{NULL, NULL, 0, NULL}

You should include stdlib.h or stdio.h for NULL.

> +};
> +
> +#if PY_MAJOR_VERSION >= 3
> +static struct PyModuleDef moduledef = {
> +	PyModuleDef_HEAD_INIT,
> +	"portage_c_convert_case",					/* m_name */
> +	"Module for converting case using the system locale",		/* m_doc */
> +	-1,								/* m_size */
> +	ConvertCaseMethods,						/* m_methods */
> +	NULL,								/* m_reload */
> +	NULL,								/* m_traverse */
> +	NULL,								/* m_clear */
> +	NULL,								/* m_free */
> +};
> +#endif
> +
> +static PyObject *ConvertCaseError;
> +
> +PyMODINIT_FUNC
> +#if PY_MAJOR_VERSION >= 3
> +PyInit_portage_c_convert_case(void)
> +#else
> +initportage_c_convert_case(void)
> +#endif
> +{
> +	PyObject *m;
> +
> +#if PY_MAJOR_VERSION >= 3
> +	m = PyModule_Create(&moduledef);
> +#else
> +	m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
> +#endif
> +
> +	if (m == NULL)
> +#if PY_MAJOR_VERSION >= 3
> +		return NULL;
> +#else
> +		return;
> +#endif
> +
> +	ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
> +	Py_INCREF(ConvertCaseError);
> +	PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
> +
> +#if PY_MAJOR_VERSION >= 3
> +	return m;
> +#else
> +	return;
> +#endif
> +}

To be honest, I think we'd be happier having two big #ifdefs for init
funcs rather than one function with a lot of #ifdefs, and the common
ConvertCaseError part in a separate static function.

> +
> +
> +static PyObject *
> +portage_c_tolower(PyObject *self, PyObject *args)
> +{
> +	int c;
> +
> +	if (!PyArg_ParseTuple(args, "i", &c))
> +	{
> +		PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");

From PyArg_ParseTuple() [1]:

| on failure, it returns false and raises the appropriate exception.

So I don't think you need or should use a custom exception here.

[1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple

> +		return NULL;
> +	}
> +
> +	return Py_BuildValue("i", tolower(c));
> +}
> +
> +
> +static PyObject *
> +portage_c_toupper(PyObject *self, PyObject *args)
> +{
> +	int c;
> +
> +	if (!PyArg_ParseTuple(args, "i", &c))
> +	{
> +		PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
> +		return NULL;
> +	}
> +
> +	return Py_BuildValue("i", toupper(c));
> +}

Thanks a lot for your effort. This is really getting in shape.

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 949 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
  2016-05-23  6:44   ` Michał Górny
@ 2016-05-23 12:08     ` Anthony G. Basile
  2016-05-23 14:25       ` Michał Górny
  0 siblings, 1 reply; 7+ messages in thread
From: Anthony G. Basile @ 2016-05-23 12:08 UTC (permalink / raw
  To: gentoo-portage-dev

On 5/23/16 2:44 AM, Michał Górny wrote:
> On Sun, 22 May 2016 13:04:40 -0400
> "Anthony G. Basile" <basile@opensource.dyc.edu> wrote:
> 
>> From: "Anthony G. Basile" <blueness@gentoo.org>
>>
>> The current method to check for a sane system locale is to use python's
>> ctypes.util.find_library() to construct a full library path to the
>> system libc.so and pass that path to ctypes.CDLL() so we can call
>> toupper() and tolower() directly.  However, this gets bogged down in
>> implementation details and fails with musl.
>>
>> We work around this design flaw in ctypes with a small python module
>> written in C which provides thin wrappers to toupper() and tolower(),
>> and only fall back on the current ctypes-based check when this module
>> is not available.
>>
>> This has been tested on glibc, uClibc and musl systems.
>>
>> X-Gentoo-bug: 571444
>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>
>> Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
>> ---
>>  pym/portage/util/locale.py   | 32 ++++++++++-----
>>  setup.py                     |  6 ++-
>>  src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>  create mode 100644 src/portage_c_convert_case.c
>>
>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>> index 2a15ea1..85ddd2b 100644
>> --- a/pym/portage/util/locale.py
>> +++ b/pym/portage/util/locale.py
>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>  import locale
>>  import logging
>>  import os
>> +import sys
>>  import textwrap
>>  import traceback
>>  
>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>  	"""
>>  	The inner locale check function.
>>  	"""
>> -
>> -	libc_fn = find_library("c")
>> -	if libc_fn is None:
>> -		return None
>> -	libc = LoadLibrary(libc_fn)
>> -	if libc is None:
>> -		return None
>> +	try:
>> +		from portage_c_convert_case import _c_toupper, _c_tolower
>> +		libc_tolower = _c_tolower
>> +		libc_toupper = _c_toupper
> 
> Now I'm being picky... but if you named the functions toupper()
> and tolower(), you could actually import the whole module as 'libc'
> and have less code!

I see what you're saying, and its tempting because its elegant, but I'm
afraid of a clash of names.  I've got a bad feeling this will get us
into trouble later.

Let me play with this and see what happens.

> 
> Also it would be nice to actually make the module more generic. There
> are more places where we use CDLL, and all of them could eventually be
> supported by the module (unshare() would be much better done in C, for
> example).

Yeah I get your point here.  Let me convince myself first.

> 
>> +	except ImportError:
>> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
>> +			level=logging.WARNING, noiselevel=-1)
> 
> Do we really want to warn verbosely about this? I think it'd be
> a pretty common case for people running the git checkout.

This should stay.  Its good to know that the module is not being
imported and silently falling back on the ctypes stuff.

1) its only going to happen in the rare occasion that you're using
something like a turkish locale and can't import the module.

2) people who do a git checkout should add
PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
I can add something to testpath.  Users will have to be instructed to
run `./setup build` and then the script shoudl read something like this

unamem=$(uname -m)

pythonversion=$(python --version 2>&1 | cut -c8-)
pythonversion=${pythonversion%\.*}

portagedir=$(dirname ${BASH_SOURCE[0]})

export PATH="${portagedir}/bin:${PATH}"

export
PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"

export PYTHONWARNINGS=d,i::ImportWarning


BTW, the original code must have a bug in it.  It reads

export PYTHONPATH=PYTHONPATH="$(dirname
$BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"

The double PYTHONPATH=PYTHONPATH= can't be right.

> 
>> +		libc_fn = find_library("c")
>> +		if libc_fn is None:
>> +			return None
>> +		libc = LoadLibrary(libc_fn)
>> +		if libc is None:
>> +			return None
>> +		libc_tolower = libc.tolower
>> +		libc_toupper = libc.toupper
>>  
>>  	lc = list(range(ord('a'), ord('z')+1))
>>  	uc = list(range(ord('A'), ord('Z')+1))
>> -	rlc = [libc.tolower(c) for c in uc]
>> -	ruc = [libc.toupper(c) for c in lc]
>> +	rlc = [libc_tolower(c) for c in uc]
>> +	ruc = [libc_toupper(c) for c in lc]
>>  
>>  	if lc != rlc or uc != ruc:
>>  		if silent:
>> @@ -62,7 +71,10 @@ def _check_locale(silent):
>>  			"as LC_CTYPE in make.conf.")
>>  		msg = [l for l in textwrap.wrap(msg, 70)]
>>  		msg.append("")
>> -		chars = lambda l: ''.join(chr(x) for x in l)
>> +		if sys.version_info.major >= 3:
> 
> Portage uses hexversion for comparisons. Please be consistent.

I see that.  Yuck, why use sys.hexversion?  version_info is easier to
read.  Anyhow, I'll change it because consistency is important.

> 
>> +			chars = lambda l: ''.join(chr(x) for x in l)
>> +		else:
>> +			chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)
> 
> This looks like an unrelated change. Was the original code buggy? If
> this is the case, then please fix it in a separate commit.

Yes the original code is buggy.  Either via the module or without, if x
is outside the ascii range, chr(x) fails to convert it properly and
throws an error.  This code was not tested:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
124, in check_locale
    ret = _check_locale(silent)
  File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
81, in _check_locale
    "  %s -> %s" % (chars(lc), chars(ruc)),
  File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
74, in <lambda>
    chars = lambda l: ''.join(chr(x) for x in l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
ordinal not in range(128)

I'll separate out that commit.

> 
>>  		if uc != ruc:
>>  			msg.extend([
>>  				"  %s -> %s" % (chars(lc), chars(ruc)),
>> diff --git a/setup.py b/setup.py
>> index 25429bc..8b6b408 100755
>> --- a/setup.py
>> +++ b/setup.py
>> @@ -47,7 +47,11 @@ x_scripts = {
>>  # Dictionary custom modules written in C/C++ here.  The structure is
>>  #   key   = module name
>>  #   value = list of C/C++ source code, path relative to top source directory
>> -x_c_helpers = {}
>> +x_c_helpers = {
>> +	'portage_c_convert_case' : [
>> +		'src/portage_c_convert_case.c',
>> +	],
>> +}
>>  
>>  class x_build(build):
>>  	""" Build command with extra build_man call. """
>> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
>> new file mode 100644
>> index 0000000..f60b0c2
>> --- /dev/null
>> +++ b/src/portage_c_convert_case.c
>> @@ -0,0 +1,94 @@
>> +/* Copyright 2005-2016 Gentoo Foundation
>> + * Distributed under the terms of the GNU General Public License v2
>> + */
>> +
>> +#include <Python.h>
>> +#include <ctype.h>
>> +
>> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
>> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
>> +
>> +static PyMethodDef ConvertCaseMethods[] = {
>> +	{"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
>> +	{"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
>> +	{NULL, NULL, 0, NULL}
> 
> You should include stdlib.h or stdio.h for NULL.

yep.

> 
>> +};
>> +
>> +#if PY_MAJOR_VERSION >= 3
>> +static struct PyModuleDef moduledef = {
>> +	PyModuleDef_HEAD_INIT,
>> +	"portage_c_convert_case",					/* m_name */
>> +	"Module for converting case using the system locale",		/* m_doc */
>> +	-1,								/* m_size */
>> +	ConvertCaseMethods,						/* m_methods */
>> +	NULL,								/* m_reload */
>> +	NULL,								/* m_traverse */
>> +	NULL,								/* m_clear */
>> +	NULL,								/* m_free */
>> +};
>> +#endif
>> +
>> +static PyObject *ConvertCaseError;
>> +
>> +PyMODINIT_FUNC
>> +#if PY_MAJOR_VERSION >= 3
>> +PyInit_portage_c_convert_case(void)
>> +#else
>> +initportage_c_convert_case(void)
>> +#endif
>> +{
>> +	PyObject *m;
>> +
>> +#if PY_MAJOR_VERSION >= 3
>> +	m = PyModule_Create(&moduledef);
>> +#else
>> +	m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
>> +#endif
>> +
>> +	if (m == NULL)
>> +#if PY_MAJOR_VERSION >= 3
>> +		return NULL;
>> +#else
>> +		return;
>> +#endif
>> +
>> +	ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
>> +	Py_INCREF(ConvertCaseError);
>> +	PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
>> +
>> +#if PY_MAJOR_VERSION >= 3
>> +	return m;
>> +#else
>> +	return;
>> +#endif
>> +}
> 
> To be honest, I think we'd be happier having two big #ifdefs for init
> funcs rather than one function with a lot of #ifdefs, and the common
> ConvertCaseError part in a separate static function.

I can move things around and reduce the #ifdefs, especially if we drop
the custom exception.

> 
>> +
>> +
>> +static PyObject *
>> +portage_c_tolower(PyObject *self, PyObject *args)
>> +{
>> +	int c;
>> +
>> +	if (!PyArg_ParseTuple(args, "i", &c))
>> +	{
>> +		PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");
> 
> From PyArg_ParseTuple() [1]:
> 
> | on failure, it returns false and raises the appropriate exception.
> 
> So I don't think you need or should use a custom exception here.
> 
> [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple

I am aware, but then you loose the specificity of the error, whether it
occurred in _c_tolower or _c_toupper.  Its small, so I'll remove it
because it also cleans up the #ifdef spaghetti, but if we were to expand
this module for other libc functions then we should consider re-adding it.

> 
>> +		return NULL;
>> +	}
>> +
>> +	return Py_BuildValue("i", tolower(c));
>> +}
>> +
>> +
>> +static PyObject *
>> +portage_c_toupper(PyObject *self, PyObject *args)
>> +{
>> +	int c;
>> +
>> +	if (!PyArg_ParseTuple(args, "i", &c))
>> +	{
>> +		PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
>> +		return NULL;
>> +	}
>> +
>> +	return Py_BuildValue("i", toupper(c));
>> +}
> 
> Thanks a lot for your effort. This is really getting in shape.

I'm doing this because check_locale() broke portage on musl.  I'm not a
frequent contributor to portage so I'd ask the portage team to free free
style it up for me.  vapier does that to my code all the time :)


-- 
Anthony G. Basile, Ph. D.
Chair of Information Technology
D'Youville College
Buffalo, NY 14201
(716) 829-8197


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
  2016-05-23 12:08     ` Anthony G. Basile
@ 2016-05-23 14:25       ` Michał Górny
  2016-05-27 14:40         ` Anthony G. Basile
  0 siblings, 1 reply; 7+ messages in thread
From: Michał Górny @ 2016-05-23 14:25 UTC (permalink / raw
  To: Anthony G. Basile; +Cc: gentoo-portage-dev

[-- Attachment #1: Type: text/plain, Size: 13352 bytes --]

On Mon, 23 May 2016 08:08:18 -0400
"Anthony G. Basile" <basile@opensource.dyc.edu> wrote:

> On 5/23/16 2:44 AM, Michał Górny wrote:
> > On Sun, 22 May 2016 13:04:40 -0400
> > "Anthony G. Basile" <basile@opensource.dyc.edu> wrote:
> >   
> >> From: "Anthony G. Basile" <blueness@gentoo.org>
> >>
> >> The current method to check for a sane system locale is to use python's
> >> ctypes.util.find_library() to construct a full library path to the
> >> system libc.so and pass that path to ctypes.CDLL() so we can call
> >> toupper() and tolower() directly.  However, this gets bogged down in
> >> implementation details and fails with musl.
> >>
> >> We work around this design flaw in ctypes with a small python module
> >> written in C which provides thin wrappers to toupper() and tolower(),
> >> and only fall back on the current ctypes-based check when this module
> >> is not available.
> >>
> >> This has been tested on glibc, uClibc and musl systems.
> >>
> >> X-Gentoo-bug: 571444
> >> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
> >>
> >> Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
> >> ---
> >>  pym/portage/util/locale.py   | 32 ++++++++++-----
> >>  setup.py                     |  6 ++-
> >>  src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 121 insertions(+), 11 deletions(-)
> >>  create mode 100644 src/portage_c_convert_case.c
> >>
> >> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
> >> index 2a15ea1..85ddd2b 100644
> >> --- a/pym/portage/util/locale.py
> >> +++ b/pym/portage/util/locale.py
> >> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
> >>  import locale
> >>  import logging
> >>  import os
> >> +import sys
> >>  import textwrap
> >>  import traceback
> >>  
> >> @@ -34,18 +35,26 @@ def _check_locale(silent):
> >>  	"""
> >>  	The inner locale check function.
> >>  	"""
> >> -
> >> -	libc_fn = find_library("c")
> >> -	if libc_fn is None:
> >> -		return None
> >> -	libc = LoadLibrary(libc_fn)
> >> -	if libc is None:
> >> -		return None
> >> +	try:
> >> +		from portage_c_convert_case import _c_toupper, _c_tolower
> >> +		libc_tolower = _c_tolower
> >> +		libc_toupper = _c_toupper  
> > 
> > Now I'm being picky... but if you named the functions toupper()
> > and tolower(), you could actually import the whole module as 'libc'
> > and have less code!  
> 
> I see what you're saying, and its tempting because its elegant, but I'm
> afraid of a clash of names.  I've got a bad feeling this will get us
> into trouble later.
> 
> Let me play with this and see what happens.

I don't think this will be problematic since things like this happen
in Python all the time ;-). And after all, C function names can be
different than Python function names.

> > Also it would be nice to actually make the module more generic. There
> > are more places where we use CDLL, and all of them could eventually be
> > supported by the module (unshare() would be much better done in C, for
> > example).  
> 
> Yeah I get your point here.  Let me convince myself first.

I've got a killer argument: right now we hardcode constants from Linux
headers in the Python code!

Not that I'm asking you to actually add code for that as well. Just
rename the module to something more generic like portage.util.libc ;-).

> >> +	except ImportError:
> >> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
> >> +			level=logging.WARNING, noiselevel=-1)  
> > 
> > Do we really want to warn verbosely about this? I think it'd be
> > a pretty common case for people running the git checkout.  
> 
> This should stay.  Its good to know that the module is not being
> imported and silently falling back on the ctypes stuff.
> 
> 1) its only going to happen in the rare occasion that you're using
> something like a turkish locale and can't import the module.

Wrong. This happens before the check is done, so it will be output
every time Portage is started, also with good locale.

> 2) people who do a git checkout should add
> PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
> I can add something to testpath.  Users will have to be instructed to
> run `./setup build` and then the script shoudl read something like this
> 
> unamem=$(uname -m)
> 
> pythonversion=$(python --version 2>&1 | cut -c8-)
> pythonversion=${pythonversion%\.*}
> 
> portagedir=$(dirname ${BASH_SOURCE[0]})
> 
> export PATH="${portagedir}/bin:${PATH}"
> 
> export
> PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"
> 
> export PYTHONWARNINGS=d,i::ImportWarning
> 
> 
> BTW, the original code must have a bug in it.  It reads
> 
> export PYTHONPATH=PYTHONPATH="$(dirname
> $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"
> 
> The double PYTHONPATH=PYTHONPATH= can't be right.

You are probably right. However:

1. Since bin/ scripts are setting PYTHONPATH appropriately, you should
probably update all of them as well to work out-of-the-box.

2. Don't do bash hackery and guessing. Take correct paths directly out
of Python modules (distutils should provide necessary means to get
the defaults).

3. Remember that something.cfg can be used to override those paths.
Though if you use distutils well enough, it should be able to figure
that out for oyu.

> >> +		libc_fn = find_library("c")
> >> +		if libc_fn is None:
> >> +			return None
> >> +		libc = LoadLibrary(libc_fn)
> >> +		if libc is None:
> >> +			return None
> >> +		libc_tolower = libc.tolower
> >> +		libc_toupper = libc.toupper
> >>  
> >>  	lc = list(range(ord('a'), ord('z')+1))
> >>  	uc = list(range(ord('A'), ord('Z')+1))
> >> -	rlc = [libc.tolower(c) for c in uc]
> >> -	ruc = [libc.toupper(c) for c in lc]
> >> +	rlc = [libc_tolower(c) for c in uc]
> >> +	ruc = [libc_toupper(c) for c in lc]
> >>  
> >>  	if lc != rlc or uc != ruc:
> >>  		if silent:
> >> @@ -62,7 +71,10 @@ def _check_locale(silent):
> >>  			"as LC_CTYPE in make.conf.")
> >>  		msg = [l for l in textwrap.wrap(msg, 70)]
> >>  		msg.append("")
> >> -		chars = lambda l: ''.join(chr(x) for x in l)
> >> +		if sys.version_info.major >= 3:  
> > 
> > Portage uses hexversion for comparisons. Please be consistent.  
> 
> I see that.  Yuck, why use sys.hexversion?  version_info is easier to
> read.  Anyhow, I'll change it because consistency is important.

1. I think old versions of Python did not support named properties in
sys.version_info, back when Portage was written.

2. hexversion makes it easy to quickly check for specific versions like
3.2 (yep, Portage does that in some places because Python upstream
can't keep API sane).

> >> +			chars = lambda l: ''.join(chr(x) for x in l)
> >> +		else:
> >> +			chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)  
> > 
> > This looks like an unrelated change. Was the original code buggy? If
> > this is the case, then please fix it in a separate commit.  
> 
> Yes the original code is buggy.  Either via the module or without, if x
> is outside the ascii range, chr(x) fails to convert it properly and
> throws an error.  This code was not tested:
> 
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
> 124, in check_locale
>     ret = _check_locale(silent)
>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
> 81, in _check_locale
>     "  %s -> %s" % (chars(lc), chars(ruc)),
>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
> 74, in <lambda>
>     chars = lambda l: ''.join(chr(x) for x in l)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
> ordinal not in range(128)
> 
> I'll separate out that commit.

Thanks. Also please try if you can get a single code path for py2/py3.
You may want to take a look at portage.util for decoding/encoding
wrappers.

> >>  		if uc != ruc:
> >>  			msg.extend([
> >>  				"  %s -> %s" % (chars(lc), chars(ruc)),
> >> diff --git a/setup.py b/setup.py
> >> index 25429bc..8b6b408 100755
> >> --- a/setup.py
> >> +++ b/setup.py
> >> @@ -47,7 +47,11 @@ x_scripts = {
> >>  # Dictionary custom modules written in C/C++ here.  The structure is
> >>  #   key   = module name
> >>  #   value = list of C/C++ source code, path relative to top source directory
> >> -x_c_helpers = {}
> >> +x_c_helpers = {
> >> +	'portage_c_convert_case' : [
> >> +		'src/portage_c_convert_case.c',
> >> +	],
> >> +}
> >>  
> >>  class x_build(build):
> >>  	""" Build command with extra build_man call. """
> >> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
> >> new file mode 100644
> >> index 0000000..f60b0c2
> >> --- /dev/null
> >> +++ b/src/portage_c_convert_case.c
> >> @@ -0,0 +1,94 @@
> >> +/* Copyright 2005-2016 Gentoo Foundation
> >> + * Distributed under the terms of the GNU General Public License v2
> >> + */
> >> +
> >> +#include <Python.h>
> >> +#include <ctype.h>
> >> +
> >> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
> >> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
> >> +
> >> +static PyMethodDef ConvertCaseMethods[] = {
> >> +	{"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
> >> +	{"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
> >> +	{NULL, NULL, 0, NULL}  
> > 
> > You should include stdlib.h or stdio.h for NULL.  
> 
> yep.
> 
> >   
> >> +};
> >> +
> >> +#if PY_MAJOR_VERSION >= 3
> >> +static struct PyModuleDef moduledef = {
> >> +	PyModuleDef_HEAD_INIT,
> >> +	"portage_c_convert_case",					/* m_name */
> >> +	"Module for converting case using the system locale",		/* m_doc */
> >> +	-1,								/* m_size */
> >> +	ConvertCaseMethods,						/* m_methods */
> >> +	NULL,								/* m_reload */
> >> +	NULL,								/* m_traverse */
> >> +	NULL,								/* m_clear */
> >> +	NULL,								/* m_free */
> >> +};
> >> +#endif
> >> +
> >> +static PyObject *ConvertCaseError;
> >> +
> >> +PyMODINIT_FUNC
> >> +#if PY_MAJOR_VERSION >= 3
> >> +PyInit_portage_c_convert_case(void)
> >> +#else
> >> +initportage_c_convert_case(void)
> >> +#endif
> >> +{
> >> +	PyObject *m;
> >> +
> >> +#if PY_MAJOR_VERSION >= 3
> >> +	m = PyModule_Create(&moduledef);
> >> +#else
> >> +	m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
> >> +#endif
> >> +
> >> +	if (m == NULL)
> >> +#if PY_MAJOR_VERSION >= 3
> >> +		return NULL;
> >> +#else
> >> +		return;
> >> +#endif
> >> +
> >> +	ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
> >> +	Py_INCREF(ConvertCaseError);
> >> +	PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
> >> +
> >> +#if PY_MAJOR_VERSION >= 3
> >> +	return m;
> >> +#else
> >> +	return;
> >> +#endif
> >> +}  
> > 
> > To be honest, I think we'd be happier having two big #ifdefs for init
> > funcs rather than one function with a lot of #ifdefs, and the common
> > ConvertCaseError part in a separate static function.  
> 
> I can move things around and reduce the #ifdefs, especially if we drop
> the custom exception.

Thanks.

> >> +
> >> +
> >> +static PyObject *
> >> +portage_c_tolower(PyObject *self, PyObject *args)
> >> +{
> >> +	int c;
> >> +
> >> +	if (!PyArg_ParseTuple(args, "i", &c))
> >> +	{
> >> +		PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");  
> > 
> > From PyArg_ParseTuple() [1]:
> > 
> > | on failure, it returns false and raises the appropriate exception.
> > 
> > So I don't think you need or should use a custom exception here.
> > 
> > [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple  
> 
> I am aware, but then you loose the specificity of the error, whether it
> occurred in _c_tolower or _c_toupper.  Its small, so I'll remove it
> because it also cleans up the #ifdef spaghetti, but if we were to expand
> this module for other libc functions then we should consider re-adding it.

Are you really sure about that? I'd rather see what kind of wrong
argument has been passed rather than 'PyArg_ParseTuple failed' which
tells me nothing.

> >> +		return NULL;
> >> +	}
> >> +
> >> +	return Py_BuildValue("i", tolower(c));
> >> +}
> >> +
> >> +
> >> +static PyObject *
> >> +portage_c_toupper(PyObject *self, PyObject *args)
> >> +{
> >> +	int c;
> >> +
> >> +	if (!PyArg_ParseTuple(args, "i", &c))
> >> +	{
> >> +		PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
> >> +		return NULL;
> >> +	}
> >> +
> >> +	return Py_BuildValue("i", toupper(c));
> >> +}  
> > 
> > Thanks a lot for your effort. This is really getting in shape.  
> 
> I'm doing this because check_locale() broke portage on musl.  I'm not a
> frequent contributor to portage so I'd ask the portage team to free free
> style it up for me.  vapier does that to my code all the time :)

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 949 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
  2016-05-23 14:25       ` Michał Górny
@ 2016-05-27 14:40         ` Anthony G. Basile
  2016-05-27 15:15           ` Brian Dolbec
  0 siblings, 1 reply; 7+ messages in thread
From: Anthony G. Basile @ 2016-05-27 14:40 UTC (permalink / raw
  To: gentoo-portage-dev

On 5/23/16 10:25 AM, Michał Górny wrote:
> On Mon, 23 May 2016 08:08:18 -0400
> "Anthony G. Basile" <basile@opensource.dyc.edu> wrote:
> 
>> On 5/23/16 2:44 AM, Michał Górny wrote:
>>> On Sun, 22 May 2016 13:04:40 -0400
>>> "Anthony G. Basile" <basile@opensource.dyc.edu> wrote:
>>>   
>>>> From: "Anthony G. Basile" <blueness@gentoo.org>
>>>>
>>>> The current method to check for a sane system locale is to use python's
>>>> ctypes.util.find_library() to construct a full library path to the
>>>> system libc.so and pass that path to ctypes.CDLL() so we can call
>>>> toupper() and tolower() directly.  However, this gets bogged down in
>>>> implementation details and fails with musl.
>>>>
>>>> We work around this design flaw in ctypes with a small python module
>>>> written in C which provides thin wrappers to toupper() and tolower(),
>>>> and only fall back on the current ctypes-based check when this module
>>>> is not available.
>>>>
>>>> This has been tested on glibc, uClibc and musl systems.
>>>>
>>>> X-Gentoo-bug: 571444
>>>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>>>
>>>> Signed-off-by: Anthony G. Basile <blueness@gentoo.org>
>>>> ---
>>>>  pym/portage/util/locale.py   | 32 ++++++++++-----
>>>>  setup.py                     |  6 ++-
>>>>  src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>>>  create mode 100644 src/portage_c_convert_case.c
>>>>
>>>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>>>> index 2a15ea1..85ddd2b 100644
>>>> --- a/pym/portage/util/locale.py
>>>> +++ b/pym/portage/util/locale.py
>>>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>>>  import locale
>>>>  import logging
>>>>  import os
>>>> +import sys
>>>>  import textwrap
>>>>  import traceback
>>>>  
>>>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>>>  	"""
>>>>  	The inner locale check function.
>>>>  	"""
>>>> -
>>>> -	libc_fn = find_library("c")
>>>> -	if libc_fn is None:
>>>> -		return None
>>>> -	libc = LoadLibrary(libc_fn)
>>>> -	if libc is None:
>>>> -		return None
>>>> +	try:
>>>> +		from portage_c_convert_case import _c_toupper, _c_tolower
>>>> +		libc_tolower = _c_tolower
>>>> +		libc_toupper = _c_toupper  
>>>
>>> Now I'm being picky... but if you named the functions toupper()
>>> and tolower(), you could actually import the whole module as 'libc'
>>> and have less code!  
>>
>> I see what you're saying, and its tempting because its elegant, but I'm
>> afraid of a clash of names.  I've got a bad feeling this will get us
>> into trouble later.
>>
>> Let me play with this and see what happens.
> 
> I don't think this will be problematic since things like this happen
> in Python all the time ;-). And after all, C function names can be
> different than Python function names.

It works fine so my last set of patches adopts this approach.

> 
>>> Also it would be nice to actually make the module more generic. There
>>> are more places where we use CDLL, and all of them could eventually be
>>> supported by the module (unshare() would be much better done in C, for
>>> example).  
>>
>> Yeah I get your point here.  Let me convince myself first.
> 
> I've got a killer argument: right now we hardcode constants from Linux
> headers in the Python code!
> 
> Not that I'm asking you to actually add code for that as well. Just
> rename the module to something more generic like portage.util.libc ;-).

Well you might as well point me in this direction since I'm working on
this now.

> 
>>>> +	except ImportError:
>>>> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
>>>> +			level=logging.WARNING, noiselevel=-1)  
>>>
>>> Do we really want to warn verbosely about this? I think it'd be
>>> a pretty common case for people running the git checkout.  
>>
>> This should stay.  Its good to know that the module is not being
>> imported and silently falling back on the ctypes stuff.
>>
>> 1) its only going to happen in the rare occasion that you're using
>> something like a turkish locale and can't import the module.
> 
> Wrong. This happens before the check is done, so it will be output
> every time Portage is started, also with good locale.

Right I dropped it.

> 
>> 2) people who do a git checkout should add
>> PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
>> I can add something to testpath.  Users will have to be instructed to
>> run `./setup build` and then the script shoudl read something like this
>>
>> unamem=$(uname -m)
>>
>> pythonversion=$(python --version 2>&1 | cut -c8-)
>> pythonversion=${pythonversion%\.*}
>>
>> portagedir=$(dirname ${BASH_SOURCE[0]})
>>
>> export PATH="${portagedir}/bin:${PATH}"
>>
>> export
>> PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"
>>
>> export PYTHONWARNINGS=d,i::ImportWarning
>>
>>
>> BTW, the original code must have a bug in it.  It reads
>>
>> export PYTHONPATH=PYTHONPATH="$(dirname
>> $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"
>>
>> The double PYTHONPATH=PYTHONPATH= can't be right.
> 
> You are probably right. However:
> 
> 1. Since bin/ scripts are setting PYTHONPATH appropriately, you should
> probably update all of them as well to work out-of-the-box.
> 
> 2. Don't do bash hackery and guessing. Take correct paths directly out
> of Python modules (distutils should provide necessary means to get
> the defaults).
> 
> 3. Remember that something.cfg can be used to override those paths.
> Though if you use distutils well enough, it should be able to figure
> that out for oyu.

I didn't touched testpath because the only way I could get this working
right is to do a ./setup.py install --root= and I'm not sure we want to
totally install portage.  It kind of defeats the purpose of just running
portage from checkout.  If I find a better way I'll make another commit.

> 
>>>> +		libc_fn = find_library("c")
>>>> +		if libc_fn is None:
>>>> +			return None
>>>> +		libc = LoadLibrary(libc_fn)
>>>> +		if libc is None:
>>>> +			return None
>>>> +		libc_tolower = libc.tolower
>>>> +		libc_toupper = libc.toupper
>>>>  
>>>>  	lc = list(range(ord('a'), ord('z')+1))
>>>>  	uc = list(range(ord('A'), ord('Z')+1))
>>>> -	rlc = [libc.tolower(c) for c in uc]
>>>> -	ruc = [libc.toupper(c) for c in lc]
>>>> +	rlc = [libc_tolower(c) for c in uc]
>>>> +	ruc = [libc_toupper(c) for c in lc]
>>>>  
>>>>  	if lc != rlc or uc != ruc:
>>>>  		if silent:
>>>> @@ -62,7 +71,10 @@ def _check_locale(silent):
>>>>  			"as LC_CTYPE in make.conf.")
>>>>  		msg = [l for l in textwrap.wrap(msg, 70)]
>>>>  		msg.append("")
>>>> -		chars = lambda l: ''.join(chr(x) for x in l)
>>>> +		if sys.version_info.major >= 3:  
>>>
>>> Portage uses hexversion for comparisons. Please be consistent.  
>>
>> I see that.  Yuck, why use sys.hexversion?  version_info is easier to
>> read.  Anyhow, I'll change it because consistency is important.
> 
> 1. I think old versions of Python did not support named properties in
> sys.version_info, back when Portage was written.

I didn't need this because I found _unicode_decode() which does what I
want.  Thanks for the clue.  BTW, why are those functions/classes in
pym/portage/__init__.py?  All that code in there is just cluttering
__init__.py.  Shouldn't that stuff be pulled into a separate file and
imported cleanly?

> 
> 2. hexversion makes it easy to quickly check for specific versions like
> 3.2 (yep, Portage does that in some places because Python upstream
> can't keep API sane).
> 
>>>> +			chars = lambda l: ''.join(chr(x) for x in l)
>>>> +		else:
>>>> +			chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)  
>>>
>>> This looks like an unrelated change. Was the original code buggy? If
>>> this is the case, then please fix it in a separate commit.  
>>
>> Yes the original code is buggy.  Either via the module or without, if x
>> is outside the ascii range, chr(x) fails to convert it properly and
>> throws an error.  This code was not tested:
>>
>> Traceback (most recent call last):
>>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
>> 124, in check_locale
>>     ret = _check_locale(silent)
>>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
>> 81, in _check_locale
>>     "  %s -> %s" % (chars(lc), chars(ruc)),
>>   File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
>> 74, in <lambda>
>>     chars = lambda l: ''.join(chr(x) for x in l)
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
>> ordinal not in range(128)
>>
>> I'll separate out that commit.
> 
> Thanks. Also please try if you can get a single code path for py2/py3.
> You may want to take a look at portage.util for decoding/encoding
> wrappers.
> 
>>>>  		if uc != ruc:
>>>>  			msg.extend([
>>>>  				"  %s -> %s" % (chars(lc), chars(ruc)),
>>>> diff --git a/setup.py b/setup.py
>>>> index 25429bc..8b6b408 100755
>>>> --- a/setup.py
>>>> +++ b/setup.py
>>>> @@ -47,7 +47,11 @@ x_scripts = {
>>>>  # Dictionary custom modules written in C/C++ here.  The structure is
>>>>  #   key   = module name
>>>>  #   value = list of C/C++ source code, path relative to top source directory
>>>> -x_c_helpers = {}
>>>> +x_c_helpers = {
>>>> +	'portage_c_convert_case' : [
>>>> +		'src/portage_c_convert_case.c',
>>>> +	],
>>>> +}
>>>>  
>>>>  class x_build(build):
>>>>  	""" Build command with extra build_man call. """
>>>> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
>>>> new file mode 100644
>>>> index 0000000..f60b0c2
>>>> --- /dev/null
>>>> +++ b/src/portage_c_convert_case.c
>>>> @@ -0,0 +1,94 @@
>>>> +/* Copyright 2005-2016 Gentoo Foundation
>>>> + * Distributed under the terms of the GNU General Public License v2
>>>> + */
>>>> +
>>>> +#include <Python.h>
>>>> +#include <ctype.h>
>>>> +
>>>> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
>>>> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
>>>> +
>>>> +static PyMethodDef ConvertCaseMethods[] = {
>>>> +	{"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
>>>> +	{"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
>>>> +	{NULL, NULL, 0, NULL}  
>>>
>>> You should include stdlib.h or stdio.h for NULL.  
>>
>> yep.

yeah done.  Arferver pointed out that Python.h pulls in stdlib.h, but in
case Python.h changes structure in the future and drops it, we're covered.

>>
>>>   
>>>> +};
>>>> +
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +static struct PyModuleDef moduledef = {
>>>> +	PyModuleDef_HEAD_INIT,
>>>> +	"portage_c_convert_case",					/* m_name */
>>>> +	"Module for converting case using the system locale",		/* m_doc */
>>>> +	-1,								/* m_size */
>>>> +	ConvertCaseMethods,						/* m_methods */
>>>> +	NULL,								/* m_reload */
>>>> +	NULL,								/* m_traverse */
>>>> +	NULL,								/* m_clear */
>>>> +	NULL,								/* m_free */
>>>> +};
>>>> +#endif
>>>> +
>>>> +static PyObject *ConvertCaseError;
>>>> +
>>>> +PyMODINIT_FUNC
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +PyInit_portage_c_convert_case(void)
>>>> +#else
>>>> +initportage_c_convert_case(void)
>>>> +#endif
>>>> +{
>>>> +	PyObject *m;
>>>> +
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +	m = PyModule_Create(&moduledef);
>>>> +#else
>>>> +	m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
>>>> +#endif
>>>> +
>>>> +	if (m == NULL)
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +		return NULL;
>>>> +#else
>>>> +		return;
>>>> +#endif
>>>> +
>>>> +	ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
>>>> +	Py_INCREF(ConvertCaseError);
>>>> +	PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
>>>> +
>>>> +#if PY_MAJOR_VERSION >= 3
>>>> +	return m;
>>>> +#else
>>>> +	return;
>>>> +#endif
>>>> +}  
>>>
>>> To be honest, I think we'd be happier having two big #ifdefs for init
>>> funcs rather than one function with a lot of #ifdefs, and the common
>>> ConvertCaseError part in a separate static function.  
>>
>> I can move things around and reduce the #ifdefs, especially if we drop
>> the custom exception.
> 
> Thanks.

I think you'll agree its clean now.

> 
>>>> +
>>>> +
>>>> +static PyObject *
>>>> +portage_c_tolower(PyObject *self, PyObject *args)
>>>> +{
>>>> +	int c;
>>>> +
>>>> +	if (!PyArg_ParseTuple(args, "i", &c))
>>>> +	{
>>>> +		PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");  
>>>
>>> From PyArg_ParseTuple() [1]:
>>>
>>> | on failure, it returns false and raises the appropriate exception.
>>>
>>> So I don't think you need or should use a custom exception here.
>>>
>>> [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple  
>>
>> I am aware, but then you loose the specificity of the error, whether it
>> occurred in _c_tolower or _c_toupper.  Its small, so I'll remove it
>> because it also cleans up the #ifdef spaghetti, but if we were to expand
>> this module for other libc functions then we should consider re-adding it.
> 
> Are you really sure about that? I'd rather see what kind of wrong
> argument has been passed rather than 'PyArg_ParseTuple failed' which
> tells me nothing.

I dropped the custom exception.  But if we expand portage.util.libc, we
should reintroduce something like ErrnoException to pass back errno upon
failure.  Eg. with unshare() you'll want EINVAL or EPERM thrown up
through the call stack.

> 
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	return Py_BuildValue("i", tolower(c));
>>>> +}
>>>> +
>>>> +
>>>> +static PyObject *
>>>> +portage_c_toupper(PyObject *self, PyObject *args)
>>>> +{
>>>> +	int c;
>>>> +
>>>> +	if (!PyArg_ParseTuple(args, "i", &c))
>>>> +	{
>>>> +		PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	return Py_BuildValue("i", toupper(c));
>>>> +}  
>>>
>>> Thanks a lot for your effort. This is really getting in shape.  
>>
>> I'm doing this because check_locale() broke portage on musl.  I'm not a
>> frequent contributor to portage so I'd ask the portage team to free free
>> style it up for me.  vapier does that to my code all the time :)
> 


-- 
Anthony G. Basile, Ph. D.
Chair of Information Technology
D'Youville College
Buffalo, NY 14201
(716) 829-8197


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
  2016-05-27 14:40         ` Anthony G. Basile
@ 2016-05-27 15:15           ` Brian Dolbec
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Dolbec @ 2016-05-27 15:15 UTC (permalink / raw
  To: gentoo-portage-dev

On Fri, 27 May 2016 10:40:46 -0400
"Anthony G. Basile" <basile@opensource.dyc.edu> wrote:

> On 5/23/16 10:25 AM, Michał Górny wrote:
> > On Mon, 23 May 2016 08:08:18 -0400
> > "Anthony G. Basile" <basile@opensource.dyc.edu> wrote:
> >   
> >> On 5/23/16 2:44 AM, Michał Górny wrote:  
> >>> On Sun, 22 May 2016 13:04:40 -0400
> >>> "Anthony G. Basile" <basile@opensource.dyc.edu> wrote:
> >>>     

> > 
> > 1. I think old versions of Python did not support named properties
> > in sys.version_info, back when Portage was written.  
> 
> I didn't need this because I found _unicode_decode() which does what I
> want.  Thanks for the clue.  BTW, why are those functions/classes in
> pym/portage/__init__.py?  All that code in there is just cluttering
> __init__.py.  Shouldn't that stuff be pulled into a separate file and
> imported cleanly?
> 

Yes, there is generally far too much code in many of the __init__.py's.
There are many with over 1K LOC

-- 
Brian Dolbec <dolsen>



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-05-27 15:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-22 17:04 [gentoo-portage-dev] [PATCH 1/2] setup.py: add stub for building custom modules in C/C++ Anthony G. Basile
2016-05-22 17:04 ` [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale Anthony G. Basile
2016-05-23  6:44   ` Michał Górny
2016-05-23 12:08     ` Anthony G. Basile
2016-05-23 14:25       ` Michał Górny
2016-05-27 14:40         ` Anthony G. Basile
2016-05-27 15:15           ` Brian Dolbec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox