On Sat, 2003-11-22 at 21:08, Aron Griffis wrote:
> Could you give a quick run-down on the difference between UCS2 and UCS4
> and what this change buys us?  If it's been discussed in a thread which
> I missed, a pointer to the archived thread would be sufficient.
> 

This was the initial thread:

http://article.gmane.org/gmane.linux.gentoo.devel/13751

Anyway, just a quick rundown. Basically, Gentoo's python-2.2 used UCS2
as the default internal representation for Unicode. That means using a
16-bit word for each Unicode character, whereas UCS4 uses a 32-bit word.
There were bugs in 2.2's UCS4 implementation, though. They have been
fixed in 2.3, and UCS4 is now mature enough to be used as the standard.
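
If you want to see which variant a given interpreter was built with, a
minimal sketch along these lines should do. It relies on sys.maxunicode,
which PEP 261 introduced; the helper name unicode_width is just something
I made up for illustration.

```python
import sys

def unicode_width(maxunicode=None):
    """Return 'UCS2' or 'UCS4' for a given sys.maxunicode value.

    unicode_width is a hypothetical helper, not part of Python itself.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # A UCS2 build reports 0xFFFF (16-bit code units); a UCS4 build
    # reports 0x10FFFF (the whole Unicode range fits in one unit).
    if maxunicode == 0xFFFF:
        return "UCS2"
    return "UCS4"

print(unicode_width())
```

On a UCS2 build this should print UCS2, and on a UCS4 build UCS4.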

UCS4 is a pretty popular standard for implementing Unicode. For example,
glib has no UCS2 support and only supports UCS4. Another reason for
sticking with UCS4 is that it is recommended by the Python devs and is
being adopted by other distros like Red Hat (>=9) and Debian
(unstable)[3] for python-2.3. In fact, as far as I know, wchar_t on Linux
defaults to 4 bytes anyway.
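
The wchar_t size is easy to check on your own box. This is only a sketch
and assumes a Python that ships the ctypes module (the interpreters
discussed in this thread do not); wchar_size is a name I picked for the
example.

```python
import ctypes

def wchar_size():
    """Return the size in bytes of the platform's C wchar_t.

    wchar_size is a hypothetical helper used only for this example.
    """
    # ctypes.c_wchar maps to the C wchar_t type of the platform.
    return ctypes.sizeof(ctypes.c_wchar)

print("wchar_t is %d bytes" % wchar_size())
```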

I initially had doubts about UCS4, but from my tests[1], the memory
footprint doesn't grow unless an application uses Unicode extensively.
For example, emerge took pretty much the same memory (actually 160k
less).
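
To get a feel for where the extra memory would come from, here is a
rough sketch that estimates only the raw character storage of a string
at 2 or 4 bytes per code unit. It uses the utf-16-le and utf-32-le
codecs as a stand-in for the interpreter's internal buffer, which is an
assumption on my part, and it ignores all per-object overhead;
buffer_bytes is a made-up helper.

```python
def buffer_bytes(text, width):
    """Estimate raw character storage for text, in bytes.

    width is the size of one code unit: 2 (UCS2-like) or 4 (UCS4).
    buffer_bytes is a hypothetical helper; it ignores object overhead.
    """
    # The codecs only approximate the internal representation: with
    # 2-byte units a character outside the 16-bit range needs two units.
    codec = {2: "utf-16-le", 4: "utf-32-le"}[width]
    return len(text.encode(codec))

name = "emerge"
print(buffer_bytes(name, 2), buffer_bytes(name, 4))
```

So the per-character cost doubles, but it only matters for the Unicode
strings a program actually holds, which fits with what I saw in the
tests.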

So in the long run, I think aligning ourselves with UCS4 support in
Python will save us hassle in the future. For a more professional
(and detailed) treatment of the subject, you might like to read PEP 261
[2].

Hope that answers your questions.

Cheers,

Alastair


[1] http://article.gmane.org/gmane.linux.gentoo.devel/13842/match=python
[2] http://www.python.org/peps/pep-0261.html
[3] http://mail.python.org/pipermail/python-dev/2003-June/036458.html
> Thanks,
> Aron
-- 
Alastair 'liquidx' Tse
 >> Gentoo Developer
 >> http://www.liquidx.net/ | http://dev.gentoo.org/~liquidx/