From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) by finch.gentoo.org (Postfix) with ESMTP id 58969138262 for ; Mon, 23 May 2016 12:08:28 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id EFCE31418A; Mon, 23 May 2016 12:08:25 +0000 (UTC) Received: from virtual.dyc.edu (mail.virtual.dyc.edu [67.222.116.22]) by pigeon.gentoo.org (Postfix) with ESMTP id 5844114088 for ; Mon, 23 May 2016 12:08:25 +0000 (UTC) Received: from greysprite.dite (cpe-74-77-145-97.buffalo.res.rr.com [74.77.145.97]) by virtual.dyc.edu (Postfix) with ESMTPSA id 95B1B7E001F for ; Mon, 23 May 2016 08:08:19 -0400 (EDT) Subject: Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale To: gentoo-portage-dev@lists.gentoo.org References: <1463936680-16072-1-git-send-email-basile@opensource.dyc.edu> <1463936680-16072-2-git-send-email-basile@opensource.dyc.edu> <20160523084409.2dce2618.mgorny@gentoo.org> From: "Anthony G. Basile" Message-ID: Date: Mon, 23 May 2016 08:08:18 -0400 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-portage-dev@lists.gentoo.org Reply-to: gentoo-portage-dev@lists.gentoo.org MIME-Version: 1.0 In-Reply-To: <20160523084409.2dce2618.mgorny@gentoo.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Archives-Salt: fd432a76-6a0f-4330-b8c8-18054e914993 X-Archives-Hash: e672dd7e3aeea71e98956d12969840a4 On 5/23/16 2:44 AM, Michał Górny wrote: > On Sun, 22 May 2016 13:04:40 -0400 > "Anthony G. Basile" wrote: > >> From: "Anthony G. Basile" >> >> The current method to check for a sane system locale is to use python's >> ctypes.util.find_library() to construct a full library path to the >> system libc.so and pass that path to ctypes.CDLL() so we can call >> toupper() and tolower() directly. However, this gets bogged down in >> implementation details and fails with musl. >> >> We work around this design flaw in ctypes with a small python module >> written in C which provides thin wrappers to toupper() and tolower(), >> and only fall back on the current ctypes-based check when this module >> is not available. >> >> This has been tested on glibc, uClibc and musl systems. >> >> X-Gentoo-bug: 571444 >> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444 >> >> Signed-off-by: Anthony G. Basile >> --- >> pym/portage/util/locale.py | 32 ++++++++++----- >> setup.py | 6 ++- >> src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++ >> 3 files changed, 121 insertions(+), 11 deletions(-) >> create mode 100644 src/portage_c_convert_case.c >> >> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py >> index 2a15ea1..85ddd2b 100644 >> --- a/pym/portage/util/locale.py >> +++ b/pym/portage/util/locale.py >> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals >> import locale >> import logging >> import os >> +import sys >> import textwrap >> import traceback >> >> @@ -34,18 +35,26 @@ def _check_locale(silent): >> """ >> The inner locale check function. >> """ >> - >> - libc_fn = find_library("c") >> - if libc_fn is None: >> - return None >> - libc = LoadLibrary(libc_fn) >> - if libc is None: >> - return None >> + try: >> + from portage_c_convert_case import _c_toupper, _c_tolower >> + libc_tolower = _c_tolower >> + libc_toupper = _c_toupper > > Now I'm being picky... but if you named the functions toupper() > and tolower(), you could actually import the whole module as 'libc' > and have less code! I see what you're saying, and its tempting because its elegant, but I'm afraid of a clash of names. I've got a bad feeling this will get us into trouble later. Let me play with this and see what happens. > > Also it would be nice to actually make the module more generic. There > are more places where we use CDLL, and all of them could eventually be > supported by the module (unshare() would be much better done in C, for > example). Yeah I get your point here. Let me convince myself first. > >> + except ImportError: >> + writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n", >> + level=logging.WARNING, noiselevel=-1) > > Do we really want to warn verbosely about this? I think it'd be > a pretty common case for people running the git checkout. This should stay. Its good to know that the module is not being imported and silently falling back on the ctypes stuff. 1) its only going to happen in the rare occasion that you're using something like a turkish locale and can't import the module. 2) people who do a git checkout should add PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module. I can add something to testpath. Users will have to be instructed to run `./setup build` and then the script shoudl read something like this unamem=$(uname -m) pythonversion=$(python --version 2>&1 | cut -c8-) pythonversion=${pythonversion%\.*} portagedir=$(dirname ${BASH_SOURCE[0]}) export PATH="${portagedir}/bin:${PATH}" export PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}" export PYTHONWARNINGS=d,i::ImportWarning BTW, the original code must have a bug in it. It reads export PYTHONPATH=PYTHONPATH="$(dirname $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}" The double PYTHONPATH=PYTHONPATH= can't be right. > >> + libc_fn = find_library("c") >> + if libc_fn is None: >> + return None >> + libc = LoadLibrary(libc_fn) >> + if libc is None: >> + return None >> + libc_tolower = libc.tolower >> + libc_toupper = libc.toupper >> >> lc = list(range(ord('a'), ord('z')+1)) >> uc = list(range(ord('A'), ord('Z')+1)) >> - rlc = [libc.tolower(c) for c in uc] >> - ruc = [libc.toupper(c) for c in lc] >> + rlc = [libc_tolower(c) for c in uc] >> + ruc = [libc_toupper(c) for c in lc] >> >> if lc != rlc or uc != ruc: >> if silent: >> @@ -62,7 +71,10 @@ def _check_locale(silent): >> "as LC_CTYPE in make.conf.") >> msg = [l for l in textwrap.wrap(msg, 70)] >> msg.append("") >> - chars = lambda l: ''.join(chr(x) for x in l) >> + if sys.version_info.major >= 3: > > Portage uses hexversion for comparisons. Please be consistent. I see that. Yuck, why use sys.hexversion? version_info is easier to read. Anyhow, I'll change it because consistency is important. > >> + chars = lambda l: ''.join(chr(x) for x in l) >> + else: >> + chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l) > > This looks like an unrelated change. Was the original code buggy? If > this is the case, then please fix it in a separate commit. Yes the original code is buggy. Either via the module or without, if x is outside the ascii range, chr(x) fails to convert it properly and throws an error. This code was not tested: Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line 124, in check_locale ret = _check_locale(silent) File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line 81, in _check_locale " %s -> %s" % (chars(lc), chars(ruc)), File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line 74, in chars = lambda l: ''.join(chr(x) for x in l) UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0: ordinal not in range(128) I'll separate out that commit. > >> if uc != ruc: >> msg.extend([ >> " %s -> %s" % (chars(lc), chars(ruc)), >> diff --git a/setup.py b/setup.py >> index 25429bc..8b6b408 100755 >> --- a/setup.py >> +++ b/setup.py >> @@ -47,7 +47,11 @@ x_scripts = { >> # Dictionary custom modules written in C/C++ here. The structure is >> # key = module name >> # value = list of C/C++ source code, path relative to top source directory >> -x_c_helpers = {} >> +x_c_helpers = { >> + 'portage_c_convert_case' : [ >> + 'src/portage_c_convert_case.c', >> + ], >> +} >> >> class x_build(build): >> """ Build command with extra build_man call. """ >> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c >> new file mode 100644 >> index 0000000..f60b0c2 >> --- /dev/null >> +++ b/src/portage_c_convert_case.c >> @@ -0,0 +1,94 @@ >> +/* Copyright 2005-2016 Gentoo Foundation >> + * Distributed under the terms of the GNU General Public License v2 >> + */ >> + >> +#include >> +#include >> + >> +static PyObject * portage_c_tolower(PyObject *, PyObject *); >> +static PyObject * portage_c_toupper(PyObject *, PyObject *); >> + >> +static PyMethodDef ConvertCaseMethods[] = { >> + {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."}, >> + {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."}, >> + {NULL, NULL, 0, NULL} > > You should include stdlib.h or stdio.h for NULL. yep. > >> +}; >> + >> +#if PY_MAJOR_VERSION >= 3 >> +static struct PyModuleDef moduledef = { >> + PyModuleDef_HEAD_INIT, >> + "portage_c_convert_case", /* m_name */ >> + "Module for converting case using the system locale", /* m_doc */ >> + -1, /* m_size */ >> + ConvertCaseMethods, /* m_methods */ >> + NULL, /* m_reload */ >> + NULL, /* m_traverse */ >> + NULL, /* m_clear */ >> + NULL, /* m_free */ >> +}; >> +#endif >> + >> +static PyObject *ConvertCaseError; >> + >> +PyMODINIT_FUNC >> +#if PY_MAJOR_VERSION >= 3 >> +PyInit_portage_c_convert_case(void) >> +#else >> +initportage_c_convert_case(void) >> +#endif >> +{ >> + PyObject *m; >> + >> +#if PY_MAJOR_VERSION >= 3 >> + m = PyModule_Create(&moduledef); >> +#else >> + m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods); >> +#endif >> + >> + if (m == NULL) >> +#if PY_MAJOR_VERSION >= 3 >> + return NULL; >> +#else >> + return; >> +#endif >> + >> + ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL); >> + Py_INCREF(ConvertCaseError); >> + PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError); >> + >> +#if PY_MAJOR_VERSION >= 3 >> + return m; >> +#else >> + return; >> +#endif >> +} > > To be honest, I think we'd be happier having two big #ifdefs for init > funcs rather than one function with a lot of #ifdefs, and the common > ConvertCaseError part in a separate static function. I can move things around and reduce the #ifdefs, especially if we drop the custom exception. > >> + >> + >> +static PyObject * >> +portage_c_tolower(PyObject *self, PyObject *args) >> +{ >> + int c; >> + >> + if (!PyArg_ParseTuple(args, "i", &c)) >> + { >> + PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed"); > > From PyArg_ParseTuple() [1]: > > | on failure, it returns false and raises the appropriate exception. > > So I don't think you need or should use a custom exception here. > > [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple I am aware, but then you loose the specificity of the error, whether it occurred in _c_tolower or _c_toupper. Its small, so I'll remove it because it also cleans up the #ifdef spaghetti, but if we were to expand this module for other libc functions then we should consider re-adding it. > >> + return NULL; >> + } >> + >> + return Py_BuildValue("i", tolower(c)); >> +} >> + >> + >> +static PyObject * >> +portage_c_toupper(PyObject *self, PyObject *args) >> +{ >> + int c; >> + >> + if (!PyArg_ParseTuple(args, "i", &c)) >> + { >> + PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed"); >> + return NULL; >> + } >> + >> + return Py_BuildValue("i", toupper(c)); >> +} > > Thanks a lot for your effort. This is really getting in shape. I'm doing this because check_locale() broke portage on musl. I'm not a frequent contributor to portage so I'd ask the portage team to free free style it up for me. vapier does that to my code all the time :) -- Anthony G. Basile, Ph. D. Chair of Information Technology D'Youville College Buffalo, NY 14201 (716) 829-8197