public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] [OT] Need advice from people who use non-ascii all day long
From: felix @ 2009-12-03 19:20 UTC
  To: gentoo-user

I have a project which requires normalizing names, and by that, I mean
converting to lower case etc, whatever eliminates redundancies.  I
know Unicode has a different "normalize" meaning, but for my purposes,
that has already been done.  Maybe I should call it standardization or
make up a new cromulent word.

By which I really mean I am confused by a lot of advice I have gotten
from USAians who get by with the good old 7 bit ASCII character set on
a daily basis, whether it be written in Unicode or not.

One of the puzzles to me is all the accented chars.  Umlauts, etc.  I
am not trying to convert names for permanent purposes but for internal
comparison.  In Germany there is a district "Busingen", with an umlauted
'u'.  Is it reasonable to consider it the same word whether with or
without the umlauted u?  French has the cedilla and acute and grave
accents.  Spanish has the tilde n.  Scandinavian languages (all?
some?) have the o with a slash.
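
For illustration only, here is a minimal sketch of how these characters
look to Unicode, assuming Python 3 and its standard unicodedata module.
The helper name split_marks is made up for this example.  It decomposes
a character (NFD) and reports the base letter and any combining marks:

```python
import unicodedata

def split_marks(ch):
    """Return (base characters, names of combining marks) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", ch)
    # unicodedata.combining() is non-zero for combining marks such as accents.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]
    return base, marks

if __name__ == "__main__":
    # Umlaut, cedilla, acute, grave, tilde n, and o with a slash.
    for ch in "üçéèñø":
        print(ch, split_marks(ch))
```

The umlaut, cedilla, acute, grave, and tilde cases all split into a
plain letter plus a mark.  The o with a slash comes back unchanged with
no marks, because Unicode treats it as a letter in its own right, so
any accent-stripping scheme has to handle it separately.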

Or put another way, I don't know much about German, French, Spanish,
etc keyboards.  Do your keyboards have any of the extra keys, all of
them?  Are German keyboards and French and Spanish keyboards as
restricted to their own languages as US keyboards are?  If you have to
hit two or three keys to keep the umlauts, accents, and tildes, do you
get lazy sometimes and type the base character by itself?  Is it even
considered the base character, or is it considered lazy and sloppy,
much as I get complaints about typing "thru" because "through" is too
much trouble?

I need something the equivalent of the C function strcasecmp() which
not only ignores case, but all other differences without distinction,
whatever they may be.  If leaving off umlauts horrifies academics and
purists but is what people do in the real world, I want to take that
into consideration, so that if one person uses the umlaut and another
doesn't, it won't generate two separate entries.  But if leaving off
the umlaut or accent yields a distinct place name, then I can't do
that -- but if real world people do that and live with the confusion,
then I guess I have to make a different choice.
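
One possible shape for such a comparison, as a sketch rather than a
recommendation: build a "loose key" per name and compare keys.  This
assumes Python 3; the names loose_key, loose_equal, group_names, and
the EXTRA_FOLDS table are hypothetical, and the table is deliberately
incomplete.

```python
import unicodedata

# Hypothetical, incomplete table for letters that Unicode decomposition
# leaves intact (they carry no separable combining mark).
EXTRA_FOLDS = str.maketrans({"ø": "o", "đ": "d", "ł": "l"})

def loose_key(name):
    """Build a comparison key that ignores case and most diacritics."""
    folded = name.casefold()  # stronger than lower(); maps the German sharp s to "ss"
    decomposed = unicodedata.normalize("NFKD", folded)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.translate(EXTRA_FOLDS)

def loose_equal(a, b):
    """strcasecmp()-like equality test on the loose keys."""
    return loose_key(a) == loose_key(b)

def group_names(names):
    """Collect spellings that share a loose key, so they form one entry."""
    groups = {}
    for name in names:
        groups.setdefault(loose_key(name), []).append(name)
    return groups

if __name__ == "__main__":
    print(loose_equal("Büsingen", "BUSINGEN"))
    print(group_names(["Büsingen", "Busingen", "Tromsø", "Tromso"]))
```

This only removes marks.  It does not know the German habit of writing
"ue" for an umlauted u, and it cannot tell whether two names that
collapse to the same key are really the same place, which is exactly
the open question above.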

Yes, I am something of an ignorant American.  I know some Japanese,
French, and Spanish, but not the details of everyday usage.  I'd like
to learn.

-- 
            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
     Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o




Thread overview: 24+ messages
2009-12-03 19:20 [gentoo-user] [OT] Need advice from people who use non-ascii all day long felix
2009-12-03 19:50 ` Renat Golubchyk
2009-12-03 20:07   ` felix
2009-12-03 20:29     ` Renat Golubchyk
2009-12-03 22:32       ` Francisco Ares
2009-12-03 22:54         ` felix
2009-12-04  0:03     ` Volker Armin Hemmann
2009-12-04  0:18       ` Alan McKinnon
2009-12-04  0:31       ` felix
2009-12-04 13:42         ` Volker Armin Hemmann
2009-12-04 20:50           ` Alan McKinnon
2009-12-05  0:01             ` Neil Bothwick
2009-12-04  9:17     ` Patrick Holthaus
2009-12-04  9:55       ` felix
2009-12-03 22:07   ` Volker Armin Hemmann
2009-12-03 22:14     ` Alan McKinnon
2009-12-03 22:38 ` Arttu V.
2009-12-03 22:57   ` felix
2009-12-06  1:58 ` daid kahl
2009-12-06  2:35   ` felix
2009-12-06  2:45     ` daid kahl
2009-12-06  2:47       ` daid kahl
2009-12-06  3:19       ` felix
2009-12-15 16:05 ` J. Roeleveld
