On Wed, 2003-11-12 at 18:46, Alastair Tse wrote:
> The reason why I'm not making this default is because UCS4 python uses
> more memory. An example is supybot (Python IRC bot) that uses 8M for
> UCS2 and 13M for UCS4. But note that this example is not scientific
> because the machines were different in kernel version, compiler and
> compiler optimisations.

I found a little spare time this weekend to do some memory benchmarking
to prove or disprove my point about UCS4 using more memory than UCS2. I
wrote and ran two simple tests that I thought were relevant to Python
on Gentoo:

1. Generating a large number of Python unicode strings and recording
   the memory usage.
2. Running "emerge" with various options and recording the memory
   usage.

The results demonstrate that UCS4 is more memory hungry _only_ if a
script/module/application uses unicode strings. This means any bindings
that use PyUnicode_* objects (for example, pygtk) or any script that
uses unicode strings. If a script/module/application does not use
unicode objects, it suffers no noticeable memory impact.

The numbers reported are averages from 3 or more runs. In nearly all
cases, the memory usage was constant from run to run.
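
The first test can be sketched roughly as follows. This is only an
illustration of the method, not the script used for these results. The
helper names (unicode_build, make_strings, parse_statm, memory_kb,
measure) are hypothetical, and reading /proc/self/statm is an assumed
way of obtaining the memory figures; the post does not say how they
were collected. The sketch is written for a current Python 3 on Linux.
On Python 3.3 and later the UCS2/UCS4 build distinction no longer
exists (PEP 393), so it shows the measurement approach rather than
reproducing the difference.

```python
import os
import sys


def unicode_build():
    """Report whether this interpreter is a narrow (UCS2) or wide (UCS4) build."""
    # On Python 2.x, sys.maxunicode is 0xFFFF for a UCS2 build and
    # 0x10FFFF for a UCS4 build. Python 3.3+ always reports 0x10FFFF.
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


def make_strings(count, length=256):
    """Build `count` distinct unicode strings of `length` characters each."""
    # U+4E2D is a CJK character, used here as the multi-byte filler.
    # The 8-digit suffix keeps every string distinct (assumed detail).
    filler = u"\u4e2d" * (length - 8)
    return [filler + u"%08d" % i for i in range(count)]


def parse_statm(text, page_kb):
    """Convert the first three fields of /proc/<pid>/statm from pages to KB."""
    size, resident, shared = [int(field) for field in text.split()[:3]]
    return {
        "mem": size * page_kb,
        "rss": resident * page_kb,
        "shared": shared * page_kb,
    }


def memory_kb():
    """Read this process's memory usage in KB (Linux only, assumed method)."""
    page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    with open("/proc/self/statm") as statm:
        return parse_statm(statm.read(), page_kb)


def measure(count):
    """Create `count` strings, then sample memory while they are still alive."""
    strings = make_strings(count)  # hold a reference so nothing is freed
    usage = memory_kb()
    usage["strings"] = len(strings)
    usage["build"] = unicode_build()
    return usage
```

Calling measure(n) once per fresh interpreter for n in 1, 10, 100, 1000
and 10000, and averaging a few runs, would give a table of the same
shape as the ones below. A fresh process per count avoids memory from a
previous run inflating the next sample.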

Results:
========

1: Generating Unicode Multi-Byte Strings (1 to 10000 strings)
   (String size of 256 mbchars, stored in a regular python list)
-------------------------------------------------------------------
            (UCS2)                 (UCS4)
Strings:    Mem    RSS  Shared     Mem    RSS  Shared      %+
      1    1839    710    1535    1839    711    1535       0
     10    1871    712    1535    1871    717    1535       0
    100    1904    765    1535    1971    830    1535     3.5
   1000    2465   1336    1535    3102   1960    1535   25.84
  10000    8213   7052    1535   14445  13309    1535   75.80

2: Generating Unicode ASCII Strings (1 to 10000 strings)
   (String size of 256 chars, stored in a regular python list)
-------------------------------------------------------------------
            (UCS2)                 (UCS4)
Strings:    Mem    RSS  Shared     Mem    RSS  Shared      %+
      1    1839    710    1535    1839    711    1535       0
     10    1871    712    1535    1871    717    1535       0
    100    1904    765    1535    1971    830    1535     3.5
   1000    2465   1336    1535    3102   1960    1535   25.84
  10000    8213   7053    1535   14445  13309    1535   75.80

3: Max Memory Usage under "emerge -p kde"
-------------------------------------------------------------------
          Mem    RSS  Shared
UCS2:    3222   1893    1955
UCS4:    3123   1769    1955

4: Max Memory Usage under "emerge search kde"
-------------------------------------------------------------------
          Mem    RSS  Shared
UCS2:    3221   1898    1955
UCS4:    3160   1803    1955

Discussion
==========

There are two immediate observations. One is that UCS4 does use more
memory than UCS2 when unicode strings are involved. From Tests 1 and 2,
the VM has a base overhead of about 1.8M, and as more strings are
created the difference in memory usage steadily increases, reaching
about 75% at 10000 strings.

The other observation is that if an application makes no use of
unicode, like "emerge", there is virtually no impact. Actually, in this
case, you'll find that UCS4 uses about 60K to 100K less memory than
UCS2. I don't have an explanation for that behaviour.

A further observation, not specific to the UCS2/UCS4 comparison, is
that it doesn't matter whether you are primarily dealing with ASCII or
multi-byte (e.g. CJK) strings.
As soon as they are converted to unicode objects, they use more memory.
Note that Tests 1 and 2 show near-identical memory usage; that is not a
mistake.

Another observation is that 'emerge' uses the same amount of memory
regardless of what is being run. I ran an informal test with just
"emerge info" and it still used approximately the same memory as more
complicated operations like merging packages or searching the package
database.

Other Details
=============

The above results were obtained with dev-lang/python-2.3.2-r1 on:

Kernel 2.6.0-test9-mm1
Glibc-2.3.2-r8 (w/ nptl)
GCC-3.3.2
Portage 2.0.49-r16

The raw logs for the tests and the scripts used can be found at:

http://dev.gentoo.org/~liquidx/python-test/

Remarks
=======

After running these tests, I am still divided about whether UCS4 should
be enabled by default. I'm not seeing enough added benefit from UCS4 to
outweigh the increase in memory usage it brings. Yet, it also seems
like the "right" thing to do for m17n support.

Cheers,
--
Alastair 'liquidx' Tse
>> Gentoo Developer
>> http://www.liquidx.net/ | http://dev.gentoo.org/~liquidx/