From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-user+bounces-104753-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1NGM5l-0008WE-63
	for garchives@archives.gentoo.org; Fri, 04 Dec 2009 00:32:13 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 2D6DBE0C44;
	Fri,  4 Dec 2009 00:31:41 +0000 (UTC)
Received: from crowfix.com (li35-165.members.linode.com [72.14.176.165])
	by pigeon.gentoo.org (Postfix) with ESMTP id E025CE0C44
	for <gentoo-user@lists.gentoo.org>; Fri,  4 Dec 2009 00:31:40 +0000 (UTC)
Received: (qmail 1889 invoked from network); 4 Dec 2009 00:31:32 -0000
Received: from unknown (HELO df.crowfix.com) (10.130.13.2)
  by 10.130.13.1 with SMTP; 4 Dec 2009 00:31:32 -0000
Received: (qmail 19825 invoked by uid 1000); 4 Dec 2009 00:31:11 -0000
Date: Thu, 3 Dec 2009 16:31:11 -0800
From: felix@crowfix.com
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] Need advice from people who use non-ascii
 all day long
Message-ID: <20091204003111.GA19515@crowfix.com>
References: <20091203192003.GA1702@crowfix.com>
 <20091203205008.5584fa37@gmx.net>
 <20091203200726.GA6956@crowfix.com>
 <200912040103.23107.volkerarmin@googlemail.com>
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200912040103.23107.volkerarmin@googlemail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)

On Fri, Dec 04, 2009 at 01:03:23AM +0100, Volker Armin Hemmann wrote:
> look at my name, ok?
> 
> Just dropping the Umlaut is wrong. No if, but, maybe. It is wrong. Error. 
> Mistake. Fail. If you can not enter ä, ö or ü, you must transform them to ae, 
> oe or ue.

I'd like to find a program which would do that!  Seriously.  But
anyway, the purpose of this is not to transform names so our antique
ASCII-7 computers can store them, but to eliminate redundant records.
For instance, we get geolocation data from vendors for all cities and
states, and it has its own redundancies, such as both FORT WORTH and
FT WORTH, or SAINT LOUIS and ST LOUIS.  To match those we have to
convert to upper case, strip punctuation, squeeze out extra white
space, etc., and none of that depends on the locale.  I want to do the
same for Unicode.  If enough Europeans are in the habit of taking
shortcuts and skipping umlauts, accents, cedillas and tildes, then I'd
like to standardize the data for lookup.  This has nothing to do with
converting people's names for storage.  We don't even store the
transformed place name.
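A minimal sketch of that kind of lookup-key normalization, assuming Python 3 and only the standard library (unicodedata, re).  The function names and the two-entry ABBREVIATIONS table are hypothetical, for illustration only; a real abbreviation list would come from the vendor data.  Accents are dropped by decomposing to NFKD and discarding combining marks, and the German ae/oe/ue transliteration is kept as a separate, optional step.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real one would be built from vendor data.
ABBREVIATIONS = {
    "FT": "FORT",
    "ST": "SAINT",
}

# Optional German-style transliteration (ä -> ae, ö -> oe, ü -> ue, ß -> ss).
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate_german(text: str) -> str:
    """Replace umlauts and sharp s with their two-letter ASCII forms."""
    return text.translate(GERMAN_MAP)


def strip_marks(text: str) -> str:
    """Decompose to NFKD and drop combining marks, so 'é' -> 'e', 'ç' -> 'c'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup_key(name: str) -> str:
    """Build a locale-independent matching key for a place name."""
    key = strip_marks(name).upper()  # str.upper() also maps 'ß' to 'SS'
    # Treat anything that is not a letter, digit or whitespace as a separator.
    key = re.sub(r"[^\w\s]|_", " ", key)
    words = key.split()  # split() also collapses runs of whitespace
    words = [ABBREVIATIONS.get(w, w) for w in words]
    return " ".join(words)


if __name__ == "__main__":
    for raw in ["Ft. Worth", "St. Louis", "Zürich", "Besançon", "São Paulo"]:
        print(raw, "->", lookup_key(raw))
```

With this, "Ft. Worth" and "fort   worth" both map to FORT WORTH, and "Zürich" and "Zurich" both map to ZURICH.  Note that the two approaches disagree: lookup_key("Düsseldorf") gives DUSSELDORF, while transliterating first gives DUESSELDORF, so matching both spellings would probably mean generating a key for each variant.  Letters with no Unicode decomposition (ø, ł) pass through strip_marks unchanged, and a blanket ST -> SAINT rule would also rewrite street names, so the table likely needs to be scoped to city fields.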

-- 
            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
     Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o