From: Alastair Tse
To: gentoo-dev@gentoo.org
Date: Thu, 13 Nov 2003 09:51:28 +0000
Subject: Re: [gentoo-dev] python-2.3.2 testing required

On Thu, 2003-11-13 at 09:10, Toby Dickenson wrote:
> I've not used ucs4 python yet, but it is one of the things I was looking
> forward to in version 2.3. It would be much nicer to leave ucs2 behind.

I would like to move away from UCS2 as well, but I'd like some arguments
for why this is a good thing, apart from "it's more compatible."

> If ucs4 strings were the only cause of that difference, supybot would need to
> be storing 2.5 million unicode characters.
> I guess that isn't likely.
> Excluding bugs, I don't see any reason why a program that doesn't use any
> unicode objects would use more memory when running on a ucs4 python
> interpreter.

All unicode string objects would be stored in UCS4 instead of UCS2. Things
like XML parsers all use unicode string objects for their internal
representations because UTF-8 is the default encoding for XML, so those
sorts of applications may see a more significant growth in memory footprint.

> > But note that this example is not scientific
> > because the machines were different in kernel version, compiler and
> > compiler optimisations.
>
> Those reasons sound much more plausible to me. Does anyone have a more
> scientific comparison of the effect of the ucs4 option on python?

I'd like to do that some time. Otherwise, someone with a faster machine
than mine may want to try it. It would be interesting to see what the
real impact is. If the memory footprint doesn't grow as much as I claim
it does, then that is a powerful argument for moving to UCS4 as the default.

The reason UCS2 is still the default in the masked python-2.3.2 is that
(a) not many people currently use anything that requires characters
beyond UCS2, and (b) UCS4 does take up more memory than UCS2. How much
more, I'm not certain. For instance, how much more memory would portage
take if it doesn't use unicode strings at all?

Cheers,
-- 
Alastair 'liquidx' Tse
>> Gentoo Developer
>> http://www.liquidx.net/ | http://dev.gentoo.org/~liquidx/
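Toby's 2.5 million figure rests on simple arithmetic: a narrow (UCS2) build stores each character in 2 bytes and a wide (UCS4) build in 4, so every character costs 2 extra bytes. The following is a minimal Python 3 sketch of that estimate, assuming only the raw character buffers grow (object headers and allocator overhead are ignored); the function names are hypothetical. It also shows the usual way to tell the two builds apart, `sys.maxunicode`, which is 0xFFFF on a narrow build and 0x10FFFF on a wide one.

```python
import sys

# Bytes per character in each Python 2.x Unicode build:
# narrow (UCS2) = 2 bytes, wide (UCS4) = 4 bytes.
UCS2_BYTES = 2
UCS4_BYTES = 4


def extra_bytes_for_ucs4(n_chars):
    """Hypothetical helper: extra buffer space n_chars need in UCS4 vs UCS2.

    Assumption: only the character buffers grow; per-object headers and
    allocator overhead are deliberately ignored.
    """
    return n_chars * (UCS4_BYTES - UCS2_BYTES)


def chars_implied_by(extra_bytes):
    """Hypothetical helper: how many characters would explain a given growth."""
    return extra_bytes // (UCS4_BYTES - UCS2_BYTES)


def is_wide_build():
    """True on a UCS4 ("wide") build; narrow builds report 0xFFFF.

    On Python 3.3+ (PEP 393) sys.maxunicode is always 0x10FFFF,
    so this check is only meaningful on older interpreters.
    """
    return sys.maxunicode > 0xFFFF


if __name__ == "__main__":
    print(extra_bytes_for_ucs4(2500000))  # extra bytes for 2.5 million chars
    print(chars_implied_by(5000000))      # chars implied by 5,000,000 bytes
    print("wide build" if is_wide_build() else "narrow build")
```

A program that creates no unicode objects gives `extra_bytes_for_ucs4(0) == 0`, which is Toby's point: under this model, any growth such a program shows would have to come from somewhere else, such as the kernel and compiler differences mentioned in the thread.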