From: Alastair Tse <liquidx@gentoo.org>
To: gentoo-dev@gentoo.org
Subject: Re: [gentoo-dev] python-2.3.2 testing required
Date: Mon, 17 Nov 2003 00:29:32 +0000
Message-ID: <1069028971.19556.39.camel@huggins.eng.cam.ac.uk>
In-Reply-To: <1068662803.18867.134.camel@huggins.eng.cam.ac.uk>
On Wed, 2003-11-12 at 18:46, Alastair Tse wrote:
> The reason why I'm not making this default is because UCS4 python uses
> more memory. An example is supybot (Python IRC bot) that uses 8M for
> UCS2 and 13M for UCS4. But note that this example is not scientific
> because the machines were different in kernel version, compiler and
> compiler optimisations.
I found a little spare time this weekend to do some memory benchmarking
to prove/disprove my point about UCS4 using more memory than UCS2.
I wrote and conducted 2 simple tests that I thought were relevant to
Python on Gentoo. The two tests I conducted were:
1. Generating a large number of Python Unicode Strings and recording the
memory usage.
2. Running "emerge" with various options and recording the memory
usage.
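For illustration, Test 1 can be approximated with a sketch like the one
below. This is not the script that was actually used for the results;
the names `build_strings` and `rss_kb` and the use of /proc/self/statm
are assumptions, and it is written for a current Python 3 on Linux.
Python 3.3 and later choose the storage width per string (PEP 393), so
the build-time UCS2/UCS4 split no longer exists there and the sketch
only shows the shape of the measurement.

```python
import os

def build_strings(count, length=256, char="\u4e2d"):
    """Return a list of `count` distinct strings, each `length` characters long."""
    # A numeric prefix keeps every string distinct, so none can be shared.
    return [(str(i) + char * length)[:length] for i in range(count)]

def rss_kb():
    """Resident set size of this process in KB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            # Second field of statm is the resident set size in pages.
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024

if __name__ == "__main__":
    for count in (1, 10, 100, 1000, 10000):
        before = rss_kb()
        strings = build_strings(count)  # keep a reference so nothing is freed
        after = rss_kb()
        print(count, before, after)
```

Holding the list in a variable while sampling matters: if the strings
were dropped first, the interpreter could release them before the
reading is taken.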
The results demonstrate that UCS4 is more memory hungry _only_ if a
script/module/application uses unicode strings. This covers any bindings
that use PyUnicode_* objects (for example, pygtk) as well as any script
that uses unicode strings itself. If a script/module/application does
not use unicode objects, it suffers no noticeable memory impact.
The numbers reported are averages from 3 or more runs. In nearly all
cases, the memory usage was constant.
Results:
========
1: Generating Unicode Multi-Byte Strings (1 to 10000 strings)
   (String size of 256 mbchars, stored in a regular Python list)
   (Sizes in KB; %+ is the increase in Mem from UCS2 to UCS4)
-------------------------------------------------------------------
         -------- UCS2 --------    -------- UCS4 --------
Strings     Mem     RSS  Shared       Mem     RSS  Shared       %+
      1    1839     710    1535      1839     711    1535        0
     10    1871     712    1535      1871     717    1535        0
    100    1904     765    1535      1971     830    1535      3.5
   1000    2465    1336    1535      3102    1960    1535    25.84
  10000    8213    7052    1535     14445   13309    1535    75.80
2: Generating Unicode ASCII Strings (1 to 10000 strings)
   (String size of 256 chars, stored in a regular Python list)
   (Sizes in KB; %+ is the increase in Mem from UCS2 to UCS4)
-------------------------------------------------------------------
         -------- UCS2 --------    -------- UCS4 --------
Strings     Mem     RSS  Shared       Mem     RSS  Shared       %+
      1    1839     710    1535      1839     711    1535        0
     10    1871     712    1535      1871     717    1535        0
    100    1904     765    1535      1971     830    1535      3.5
   1000    2465    1336    1535      3102    1960    1535    25.84
  10000    8213    7053    1535     14445   13309    1535    75.80
3: Max Memory Usage under "emerge -p kde"
-------------------------------------------------------------------
          Mem     RSS  Shared
UCS2:    3222    1893    1955
UCS4:    3123    1769    1955
4: Max Memory Usage under "emerge search kde"
-------------------------------------------------------------------
          Mem     RSS  Shared
UCS2:    3221    1898    1955
UCS4:    3160    1803    1955
Discussion
==========
There are two immediate observations. One is that UCS4 does use more
memory than UCS2 when unicode strings are involved. In Tests 1 and 2,
the VM has a baseline overhead of about 1.8M, and as more strings are
created the gap widens steadily, reaching roughly 75% at 10000 strings.
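The percentages can be checked directly against the Test 1 table. The
short calculation below is only a sanity check on those figures; the
helper name `pct_increase` is made up for this example, and it assumes
the table values are in KB and that a UCS2 build stores 2 bytes per
character and a UCS4 build 4.

```python
# (strings, UCS2 Mem, UCS4 Mem), copied from the Test 1 table
ROWS = [
    (1, 1839, 1839),
    (10, 1871, 1871),
    (100, 1904, 1971),
    (1000, 2465, 3102),
    (10000, 8213, 14445),
]

def pct_increase(ucs2, ucs4):
    """Increase of the UCS4 figure over the UCS2 figure, in percent."""
    return 100.0 * (ucs4 - ucs2) / ucs2

def extra_payload_kb(count, length=256):
    """Extra raw character data in KB when each char takes 4 bytes instead of 2."""
    return count * length * (4 - 2) / 1024

for count, mem2, mem4 in ROWS:
    print(count, round(pct_increase(mem2, mem4), 2),
          mem4 - mem2, extra_payload_kb(count))
```

At 10000 strings the raw character data alone accounts for 5000 KB of
the 6232 KB difference between the two builds; the remainder is
presumably allocator and per-object overhead.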
The other observation is that if an application makes no use of unicode,
like "emerge", there is virtually no impact. In fact, in this case UCS4
uses about 60K to 100K less memory than UCS2. I don't have an
explanation for that behaviour.
A further observation, unrelated to the UCS2/UCS4 comparison itself, is
that it doesn't matter whether you are primarily dealing with ASCII or
Multi-Byte (eg, CJK characters) strings. As soon as they are cast as
unicode objects, they use more memory. Note that the two runs have
essentially identical memory usage; that is not a mistake.
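ASCII and CJK strings cost the same because a unicode object stores
every character at a fixed width. The internal buffers of a UCS2 or
UCS4 build cannot be inspected from a current interpreter, so the sketch
below uses the fixed-width "utf-16-le" and "utf-32-le" codecs as a
stand-in. The sample strings and the name `widths` are assumptions made
for illustration, and the equivalence only holds for characters in the
Basic Multilingual Plane.

```python
ascii_text = "a" * 256
cjk_text = "\u4e2d" * 256  # a CJK character from the Basic Multilingual Plane

def widths(text):
    """Bytes needed at 2 and at 4 bytes per character."""
    return len(text.encode("utf-16-le")), len(text.encode("utf-32-le"))

print(widths(ascii_text))  # (512, 1024)
print(widths(cjk_text))    # (512, 1024)
```

Both strings need 512 bytes at 2 bytes per character and 1024 bytes at
4, which matches the doubling of the raw string data between the two
builds.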
Another is that 'emerge' uses the same amount of memory regardless of
what is being run. In an informal test, running just "emerge info" used
approximately the same memory as more complicated operations like
merging packages or searching the package database.
Other Details
=============
The above results were run with dev-lang/python-2.3.2-r1 with:
Kernel 2.6.0-test9-mm1
Glibc-2.3.2-r8 (w/ nptl)
GCC-3.3.2
Portage 2.0.49-r16
The raw logs for the tests and the scripts used can be found at:
http://dev.gentoo.org/~liquidx/python-test/
Remarks
=======
After running these tests, I am still divided about whether UCS4 should
be enabled by default. I'm not seeing enough added benefit from UCS4 to
outweigh the memory usage increase it brings. Yet, it also seems like
the "right" thing to do for m17n support.
Cheers,
--
Alastair 'liquidx' Tse
>> Gentoo Developer
>> http://www.liquidx.net/ | http://dev.gentoo.org/~liquidx/