From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gentoo-dev-return-8243-arch-gentoo-dev=gentoo.org@gentoo.org>
Received: (qmail 20285 invoked by uid 1002); 13 Nov 2003 09:51:50 -0000
Mailing-List: contact gentoo-dev-help@gentoo.org; run by ezmlm
Precedence: bulk
List-Post: <mailto:gentoo-dev@gentoo.org>
List-Help: <mailto:gentoo-dev-help@gentoo.org>
List-Unsubscribe: <mailto:gentoo-dev-unsubscribe@gentoo.org>
List-Subscribe: <mailto:gentoo-dev-subscribe@gentoo.org>
List-Id: Gentoo Linux mail <gentoo-dev.gentoo.org>
X-BeenThere: gentoo-dev@gentoo.org
Received: (qmail 7967 invoked from network); 13 Nov 2003 09:51:49 -0000
From: Alastair Tse <liquidx@gentoo.org>
To: gentoo-dev@gentoo.org
In-Reply-To: <200311130910.16348.tdickenson@devmail.geminidataloggers.co.uk>
References: <1068662803.18867.134.camel@huggins.eng.cam.ac.uk>
	 <200311130910.16348.tdickenson@devmail.geminidataloggers.co.uk>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-D7XBoxufHb3wJgNsJy31"
Message-Id: <1068717088.25166.47.camel@huggins.eng.cam.ac.uk>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.4 
Date: Thu, 13 Nov 2003 09:51:28 +0000
X-Cam-ScannerInfo: http://www.cam.ac.uk/cs/email/scanner/
X-Cam-AntiVirus: No virus found
X-Cam-SpamDetails: scanned, SpamAssassin (score=-7.4,
	EMAIL_ATTRIBUTION -0.50, IN_REP_TO -0.50, PGP_SIGNATURE_2 -2.45,
	QUOTED_EMAIL_TEXT -0.48, REFERENCES -0.50, REPLY_WITH_QUOTES -0.50,
	USER_AGENT_XIMIAN -2.35)
Subject: Re: [gentoo-dev] python-2.3.2 testing required
X-Archives-Salt: da271d29-7cf0-4a0d-b8d9-143ac1035495
X-Archives-Hash: 9b89fa6f0e5d12398b39d449fd1b842e

--=-D7XBoxufHb3wJgNsJy31
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Thu, 2003-11-13 at 09:10, Toby Dickenson wrote:
> I've not used ucs4 python yet, but it is one of the things I was looking
> forward to in version 2.3. It would be much nicer to leave ucs2 behind.

I would like to move away from UCS2 as well, but I'd like some arguments
for why this is a good thing beyond "it's more compatible".

> If ucs4 strings were the only cause of that difference, supybot would
> need to be storing 2.5 million unicode characters. I guess that isn't
> likely. Excluding bugs, I don't see any reason why a program that
> doesn't use any unicode objects would use more memory when running on
> a ucs4 python interpreter.

On the UCS4 build, all unicode string objects are stored as UCS4 instead
of UCS2, so each character takes four bytes rather than two. XML parsers,
for example, keep their data in unicode string objects because UTF-8 is
the default encoding for XML. Applications like that may see a more
significant growth in memory footprint.
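
To put rough numbers on this, here is a minimal sketch of the arithmetic.
It assumes 2 bytes per character on a UCS2 build and 4 on a UCS4 build,
and it ignores per-object overhead. The function names are made up for
illustration and are not part of Python or portage.

```python
import sys


def bytes_per_char(wide):
    """Assumed storage per character in a unicode object."""
    # Assumption: 2 bytes on a UCS2 (narrow) build,
    # 4 bytes on a UCS4 (wide) build.
    if wide:
        return 4
    return 2


def extra_bytes(n_chars):
    """Extra payload bytes for n_chars when moving UCS2 to UCS4."""
    return n_chars * (bytes_per_char(True) - bytes_per_char(False))


def chars_needed(extra):
    """Characters needed to account for `extra` additional bytes."""
    return extra // (bytes_per_char(True) - bytes_per_char(False))


def is_wide_build():
    """True when the interpreter handles code points above U+FFFF."""
    return sys.maxunicode > 0xFFFF
```

Under those assumptions, extra_bytes(2500000) gives 5000000, i.e. about
5 MB of extra payload for the 2.5 million characters Toby mentions.
sys.maxunicode is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build, so
is_wide_build() tells you which one you are running.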

> > But note that this example is not scientific
> > because the machines were different in kernel version, compiler and
> > compiler optimisations.
>
> Those reasons sound much more plausible to me. Does anyone have a more
> scientific comparison of the effect of the ucs4 option on python?

I'd like to do that some time. Otherwise, someone with a faster machine
than mine may want to try it. It would be interesting to see what the
real impact is. If the memory footprint doesn't grow as much as I
claimed, then that is a powerful argument for moving to UCS4 as the
default.

The reason why UCS2 is still the default in the masked python-2.3.2 is
that (a) not many people currently use anything that requires characters
beyond what UCS2 covers and (b) UCS4 does take up more memory than UCS2.
How much more, I'm not certain.

For instance, how much more memory would portage take if it doesn't use
unicode strings at all?

Cheers,
--=20
Alastair 'liquidx' Tse
 >> Gentoo Developer
 >> http://www.liquidx.net/ | http://dev.gentoo.org/~liquidx/


--=-D7XBoxufHb3wJgNsJy31
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQA/s1QgOM4cezkHFPYRArRXAKCCeEFal/pS6G/XN6Z9qGA0zZjWhQCeO1n8
L0Igw54GLaz2uu5MokEnHLE=
=R3RR
-----END PGP SIGNATURE-----

--=-D7XBoxufHb3wJgNsJy31--