From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org)
    by finch.gentoo.org with esmtp (Exim 4.60)
    (envelope-from <gentoo-user+bounces-104753-garchives=archives.gentoo.org@lists.gentoo.org>)
    id 1NGM5l-0008WE-63
    for garchives@archives.gentoo.org; Fri, 04 Dec 2009 00:32:13 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
    by pigeon.gentoo.org (Postfix) with SMTP id 2D6DBE0C44;
    Fri, 4 Dec 2009 00:31:41 +0000 (UTC)
Received: from crowfix.com (li35-165.members.linode.com [72.14.176.165])
    by pigeon.gentoo.org (Postfix) with ESMTP id E025CE0C44
    for <gentoo-user@lists.gentoo.org>; Fri, 4 Dec 2009 00:31:40 +0000 (UTC)
Received: (qmail 1889 invoked from network); 4 Dec 2009 00:31:32 -0000
Received: from unknown (HELO df.crowfix.com) (10.130.13.2)
    by 10.130.13.1 with SMTP; 4 Dec 2009 00:31:32 -0000
Received: (qmail 19825 invoked by uid 1000); 4 Dec 2009 00:31:11 -0000
Date: Thu, 3 Dec 2009 16:31:11 -0800
From: felix@crowfix.com
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] Need advice from people who use non-ascii all day long
Message-ID: <20091204003111.GA19515@crowfix.com>
References: <20091203192003.GA1702@crowfix.com>
    <20091203205008.5584fa37@gmx.net>
    <20091203200726.GA6956@crowfix.com>
    <200912040103.23107.volkerarmin@googlemail.com>
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200912040103.23107.volkerarmin@googlemail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Archives-Salt: cf8b02d3-e1c1-46a6-88f6-2b9b8e84ac23
X-Archives-Hash: d9241287dfe466568536f4b69a78daaa

On Fri, Dec 04, 2009 at 01:03:23AM +0100, Volker Armin Hemmann wrote:

> look at my name, ok?
>
> Just dropping the Umlaut is wrong. No if, but, maybe. It is wrong. Error.
> Mistake. Fail. If you can not enter ä, ö or ü, you must transform them to ae,
> oe or ue.

I'd like to find a program which would do that! Seriously.

But anyway, the purpose of this is not to transform names so our
antique ASCII-7 computers can store them, but to eliminate redundant
records. For instance, we get data from vendors for all cities and
states, geolocation data, which has its own redundancies, such as both
FORT WORTH and FT WORTH, or SAINT LOUIS and ST LOUIS. But we have to
convert to upper case, get rid of punctuation, get rid of extra white
space, etc., and all that is independent of the locale.

I want to do the same for Unicode. If enough Europeans are in the
habit of taking shortcuts and skipping umlauts and accents and
cedillas and tildes, then I'd like to standardize the data for lookup.

This has nothing to do with converting people's names for storage. We
don't even store the transformed place name.

-- 
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
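
For what it's worth, the lookup normalization described above (upper case, no punctuation, no extra white space, plus folding away umlauts and accents) could be sketched roughly like this in Python. This is only a sketch under assumptions: the function names and the small German table are made up for illustration, the "ss" entry for sharp s is an addition not mentioned in the thread, and replacing punctuation with a space (rather than deleting it) is one arbitrary choice. It does nothing about abbreviations like FT vs FORT, which would need a separate synonym table.

```python
import re
import unicodedata

# Hypothetical table for the German convention quoted above (ae, oe, ue).
# The sharp s -> "ss" entry is an assumption added for illustration.
GERMAN = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",   # a-, o-, u-umlaut
    "\u00c4": "AE", "\u00d6": "OE", "\u00dc": "UE",   # capital forms
    "\u00df": "ss",                                   # sharp s
})


def transliterate_german(text):
    """Replace German umlauts with their two-letter spellings."""
    # Compose first so "u" + combining diaeresis is also caught.
    return unicodedata.normalize("NFC", text).translate(GERMAN)


def strip_marks(text):
    """Drop accents, cedillas, tildes, etc. by removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup_key(name, german=False):
    """Build a comparison key for a place name; not meant for storage."""
    if german:
        name = transliterate_german(name)
    name = strip_marks(name).upper()
    # Turn punctuation (and underscores) into spaces, then collapse runs.
    name = re.sub(r"[^\w\s]", " ", name).replace("_", " ")
    return " ".join(name.split())
```

With this, `lookup_key("St. Louis")` and `lookup_key("ST  LOUIS")` produce the same key, and `lookup_key("Zürich")` matches a record typed as "Zurich", while `lookup_key("Zürich", german=True)` matches one typed as "Zuerich". Which of the two foldings to apply depends on the shortcuts people actually take, so it may make sense to index both keys.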