>   I don't know if this has improved over the years, but my initial
> experience with unicode was rather negative.  The fact that text
> files were twice as large wasn't a major problem in itself.  The
> real showstopper was that importing text files into spreadsheets
> and text-editors and word processors failed miserably.
> 
>   I looked at a unicode text file with a binary viewer.  It turns out
> that a simple text string like "1234" was actually...
> "1" binary-zero "2" binary-zero "3" binary-zero "4" binary-zero, etc.

That's UTF-16 (as someone has already pointed out), which is the default for 
some Windows tools (but understood on Linux too). (There is even UTF-32, where 
every character is 4 bytes wide, but I've never seen it in the wild.)

UTF-8 is normally used on Linux (and ASCII chars are encoded exactly as in 
plain ASCII, one byte each); even with "long characters" outside the ASCII 
range, spreadsheets and word processors should not have a problem anymore.
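
To make the difference concrete, here is a small Python sketch (my own 
illustration, not taken from any particular tool) that encodes the same 
"1234" string in the three encodings. The variable names are made up, and the 
"-le" codec variants are an assumption: they pick little-endian byte order 
without a byte order mark, which matches the byte pattern quoted above.

```python
# Encode the same short string in UTF-16, UTF-32 and UTF-8.
text = "1234"

# Little-endian, no byte order mark (an assumption for this sketch).
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")
utf8 = text.encode("utf-8")

print(utf16)       # b'1\x002\x003\x004\x00' ("1" zero "2" zero ...)
print(len(utf16))  # 8 bytes: two per character
print(len(utf32))  # 16 bytes: four per character
print(len(utf8))   # 4 bytes: one per ASCII character

# For pure ASCII text, UTF-8 and ASCII are byte-for-byte identical.
same_as_ascii = utf8 == text.encode("ascii")
print(same_as_ascii)  # True

# A character outside the ASCII range takes more than one byte in UTF-8.
u_umlaut = "ü".encode("utf-8")
print(u_umlaut)    # b'\xc3\xbc'
```

A file written as plain "utf-16" by a real tool would normally also start 
with a two-byte byte order mark, which is left out here.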

-- 
Andreas K. Hüttel
dilfridge@gentoo.org
Gentoo Linux developer 
(council, qa, toolchain, base-system, perl, libreoffice)