* [gentoo-dev] enable UTF8 per default?
@ 2006-02-28 10:58 Patrick Lauer
2006-02-28 11:32 ` Diego 'Flameeyes' Pettenò
` (7 more replies)
0 siblings, 8 replies; 21+ messages in thread
From: Patrick Lauer @ 2006-02-28 10:58 UTC (permalink / raw
To: gentoo-dev
Hi all,
at FOSDEM we had a nice discussion about languages, translations etc.
Having people from the US (wolf31o2) who never have problems and people
from Japan (usata) who always have problems with encodings /
charsets / ... was quite interesting.
During that discussion we realized that having utf-8 not enabled by
default and no utf8 fonts available by default causes lots of
recompilation and reconfiguration.
Enabling the unicode useflag in the profiles should help our
international users and should not cause any problems. Are there any
known bugs / problems this would trigger? Any reasons against that?
If there are no objections this should be a small but helpful change.
On a tangent I wonder if pulling in extra fonts as a dependency of X
makes sense (useflag controlled, enabled by default) - that way the
unicode capabilities are available without any configuration.
Patrick
--
Stand still, and let the rest of the universe move
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
@ 2006-02-28 11:32 ` Diego 'Flameeyes' Pettenò
2006-02-28 11:47 ` Patrick Lauer
2006-02-28 12:50 ` Lars Weiler
` (6 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Diego 'Flameeyes' Pettenò @ 2006-02-28 11:32 UTC (permalink / raw
To: gentoo-dev
On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
At the same time, you'll probably hear people bitching about UTF-8 being
enabled because "it's not needed for me, should be the rest of the world to
change"....
I'd be the first to be interested in having it enabled by default, tho.
--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 11:32 ` Diego 'Flameeyes' Pettenò
@ 2006-02-28 11:47 ` Patrick Lauer
2006-02-28 12:11 ` Diego 'Flameeyes' Pettenò
2006-02-28 14:27 ` Mike Frysinger
0 siblings, 2 replies; 21+ messages in thread
From: Patrick Lauer @ 2006-02-28 11:47 UTC (permalink / raw
To: gentoo-dev
On Tue, 2006-02-28 at 12:32 +0100, Diego 'Flameeyes' Pettenò wrote:
> On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
> > During that discussion we realized that having utf-8 not enabled by
> > default and no utf8 fonts available by default causes lots of
> > recompilation and reconfiguration.
> At the same time, you'll probably hear people bitching about UTF-8 being
> enabled because "it's not needed for me, should be the rest of the world to
> change"....
It is still optional, just enabled by default :-)
All the people with non-ASCII charsets will have less work; we just shift
the load from, say, 75% of users fixing their environment to 25% of users
having to switch it off.
And who doesn't want UTF-8? Just being able to see a Japanese website as
it was intended (even if I can't read it) is a nice feature.
So - apart from some users maybe not wanting it, any technical reasons
against?
> I'd be the first to be interested in having it enabled by default, tho.
Yes, otherwise spelling your name is almost impossible :-)
Patrick
--
Stand still, and let the rest of the universe move
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 11:47 ` Patrick Lauer
@ 2006-02-28 12:11 ` Diego 'Flameeyes' Pettenò
2006-02-28 14:27 ` Mike Frysinger
1 sibling, 0 replies; 21+ messages in thread
From: Diego 'Flameeyes' Pettenò @ 2006-02-28 12:11 UTC (permalink / raw
To: gentoo-dev
On Tuesday 28 February 2006 12:47, Patrick Lauer wrote:
> It is still optional, just enabled by default :-)
That would probably be enough to draw criticism, mainly from English-speaking
users who don't care about extended characters.
That said, it would also follow the direction of both Apple and Microsoft: the
former already provides, and the latter says it will provide, an always-Unicode
system.
It is probably also the way to make Gentoo more accessible to non-English
speakers.
> So - apart from some users maybe not wanting it, any technical reasons
> against?
I'll wait for the "clutter" comments from users and maybe devs.
I was criticized for forcibly enabling unicode on vlc because of a
source-code bug that prevented non-unicode wxGTK from being used to build it;
since then I always expect some sort of problem :P
> > I'd be the first to be interested in having it enabled by default, tho.
> Yes, otherwise spelling your name is almost impossible :-)
That, and I'm actually trying to find time to learn Japanese :P
But time is something I don't have in abundance :|
--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
2006-02-28 11:32 ` Diego 'Flameeyes' Pettenò
@ 2006-02-28 12:50 ` Lars Weiler
2006-02-28 13:50 ` Patrick Lauer
2006-02-28 16:24 ` Kalin KOZHUHAROV
2006-02-28 16:51 ` Josh
` (5 subsequent siblings)
7 siblings, 2 replies; 21+ messages in thread
From: Lars Weiler @ 2006-02-28 12:50 UTC (permalink / raw
To: gentoo-dev
* Patrick Lauer <patrick@gentoo.org> [06/02/28 11:58 +0100]:
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
It is enabled by default. At least on ppc. And has been since,
uhm, summer 2004?
I can't say if there are any problems, as I haven't received
a bug for a long time. The only thing that's nasty: we
don't have any good utf8-fonts for the console.
Regards, Lars
--
Lars Weiler <pylon@gentoo.org> +49-171-1963258
Gentoo Linux PowerPC : Developer and Release Engineer
Gentoo Infrastructure : CVS Administrator
Gentoo Foundation : Trustee
--
gentoo-dev@gentoo.org mailing list
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 12:50 ` Lars Weiler
@ 2006-02-28 13:50 ` Patrick Lauer
2006-02-28 14:46 ` Joseph Jezak
2006-02-28 16:24 ` Kalin KOZHUHAROV
1 sibling, 1 reply; 21+ messages in thread
From: Patrick Lauer @ 2006-02-28 13:50 UTC (permalink / raw
To: gentoo-dev
On Tue, 2006-02-28 at 13:50 +0100, Lars Weiler wrote:
> * Patrick Lauer <patrick@gentoo.org> [06/02/28 11:58 +0100]:
> > Enabling the unicode useflag in the profiles should help our
> > international users and should not cause any problems. Are there any
> > known bugs / problems this would trigger? Any reasons against that?
>
> It is enabled by default. At least on ppc.
As far as I can tell that's not the case on x86
> And that since,
> uhm, summer 2004?
Ok, so it should be quite well-tested.
> I can't say if there are any problems, as I haven't received
> a bug for a long time. The only thing that's nasty: we
> don't have any good utf8-fonts for the console.
I think that's acceptable.
--
Stand still, and let the rest of the universe move
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 11:47 ` Patrick Lauer
2006-02-28 12:11 ` Diego 'Flameeyes' Pettenò
@ 2006-02-28 14:27 ` Mike Frysinger
1 sibling, 0 replies; 21+ messages in thread
From: Mike Frysinger @ 2006-02-28 14:27 UTC (permalink / raw
To: gentoo-dev
On Tuesday 28 February 2006 06:47, Patrick Lauer wrote:
> On Tue, 2006-02-28 at 12:32 +0100, Diego 'Flameeyes' Pettenò wrote:
> > On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
> > > During that discussion we realized that having utf-8 not enabled by
> > > default and no utf8 fonts available by default causes lots of
> > > recompilation and reconfiguration.
> >
> > At the same time, you'll probably hear people bitching about UTF-8 being
> > enabled because "it's not needed for me, should be the rest of the world
> > to change"....
>
> It is still optional, just enabled by default :-)
> All the people with non-ASCII charsets will have less work, only that we
> switch the load from, say, 75% of the users fixing their environment to
> 25% of users having to switch.
Hopefully people will fix their packages so that USE=unicode determines whether
they link against libncurses or libncursesw.
-mike
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 13:50 ` Patrick Lauer
@ 2006-02-28 14:46 ` Joseph Jezak
0 siblings, 0 replies; 21+ messages in thread
From: Joseph Jezak @ 2006-02-28 14:46 UTC (permalink / raw
To: gentoo-dev
>>I can't say if there are any problems, as I haven't received
>>a bug for a long time. The only thing that's nasty: we
>>don't have any good utf8-fonts for the console.
>
> I think that's acceptable.
The only related issue we really have is this bug, which is
annoying but not fatal:
http://bugs.gentoo.org/show_bug.cgi?id=107235
-Joe
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 12:50 ` Lars Weiler
2006-02-28 13:50 ` Patrick Lauer
@ 2006-02-28 16:24 ` Kalin KOZHUHAROV
2006-03-04 12:46 ` Alexander Simonov
1 sibling, 1 reply; 21+ messages in thread
From: Kalin KOZHUHAROV @ 2006-02-28 16:24 UTC (permalink / raw
To: gentoo-dev
Lars Weiler wrote:
> * Patrick Lauer <patrick@gentoo.org> [06/02/28 11:58 +0100]:
>> Enabling the unicode useflag in the profiles should help our
>> international users and should not cause any problems. Are there any
>> known bugs / problems this would trigger? Any reasons against that?
>
> It is enabled by default. At least on ppc. And that since,
> uhm, summer 2004?
>
> I can't say if there are any problems, as I haven't received
> a bug for a long time.
Well, there are a few problems, but no, I cannot name them right now.
Using Japanese, Cyrillic and English in a few encodings each is a big nightmare.
Nowadays I try to move everything to UTF-8, but there are those windoze users
and webdevs that write all Japanese in Shift_JIS ... So support for a wide range of
encodings is a must, but UTF-8 is the truth.
> The only thing that's nasty: we don't have any good utf8-fonts for the console.
And not only the console.
Even for xterm there are not many good fonts (known to me) that display both Japanese
and Cyrillic in regular and bold. Currently there is only one combination that works for me.
So fonts, font config and related stuff are what has to be fixed first.
Kalin.
P.S. And before it can be fixed, it has to be filed... I promise to take notes (again) when I see something.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
2006-02-28 11:32 ` Diego 'Flameeyes' Pettenò
2006-02-28 12:50 ` Lars Weiler
@ 2006-02-28 16:51 ` Josh
2006-02-28 17:47 ` solar
` (4 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Josh @ 2006-02-28 16:51 UTC (permalink / raw
To: gentoo-dev
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ooh, I'm very much in favor of unicode being enabled by default. It's not like
users would be limited *only* to UTF-8 on their new installs, anyway. I'd love
to see this implemented.
++ for the suggestion. :)
Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
>
> If there are no objections this should be a small but helpful change.
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.
>
> Patrick
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
iD8DBQFEBH+YrsJQqN81j74RAr1+AJ44WIZB6nSljue+RC//KWAvAFyFUwCdG5cB
khBaPU69f8gAhn1MFN+grLs=
=0DAj
-----END PGP SIGNATURE-----
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
` (2 preceding siblings ...)
2006-02-28 16:51 ` Josh
@ 2006-02-28 17:47 ` solar
2006-02-28 17:53 ` Ciaran McCreesh
` (2 more replies)
2006-02-28 23:51 ` Bjarke Istrup Pedersen
` (3 subsequent siblings)
7 siblings, 3 replies; 21+ messages in thread
From: solar @ 2006-02-28 17:47 UTC (permalink / raw
To: gentoo-dev
On Tue, 2006-02-28 at 11:58 +0100, Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
>
> If there are no objections this should be a small but helpful change.
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.
I forget where I read it, but I thought that unicode led to overflows
and was considered a general security risk. I wish I knew where I read
that, but I'm unable to find it.
Any list readers know anything relating to that?
--
solar <solar@gentoo.org>
Gentoo Linux
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 17:47 ` solar
@ 2006-02-28 17:53 ` Ciaran McCreesh
2006-02-28 18:25 ` Bryan Østergaard
2006-02-28 19:18 ` Kevin F. Quinn (Gentoo)
2 siblings, 0 replies; 21+ messages in thread
From: Ciaran McCreesh @ 2006-02-28 17:53 UTC (permalink / raw
To: gentoo-dev
On Tue, 28 Feb 2006 12:47:33 -0500 solar <solar@gentoo.org> wrote:
| I forget where I read it but I thought that unicode led to overflows
| and was considered a general security risk. I wish I knew where I read
| that but I'm unable to find it.
|
| Any list readers know anything relating to that?
Eh, not really. With non-utf-8 you could argue that it's an increased
risk, since you get non-string-terminating nulls, but with utf-8 those
aren't an issue.
It's not really a very well substantiated claim. It's like saying "GUI
programming leads to bugs" or "internationalisation leads to program
crashes". Yes, it's possible (in C, anyway) to screw up your buffer
routines when converting code to handle utf-8, but then it's always
possible to screw up buffer routines.
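To make the null-byte point concrete, here is a minimal Python sketch (not from the thread; the sample string and the function names are illustrative assumptions). It shows that UTF-8 only emits a zero byte for U+0000 itself, while a 16-bit encoding embeds zero bytes in ordinary text, and that byte length and character count differ, which is where buffer arithmetic tends to go wrong.

```python
def has_embedded_nul(text: str, encoding: str) -> bool:
    """Return True if encoding `text` yields at least one zero byte."""
    return b"\x00" in text.encode(encoding)


def sizes(text: str) -> tuple:
    """Return (character count, UTF-8 byte count) for `text`."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "Petten\u00f2"  # hypothetical sample with one non-ASCII character
    print(has_embedded_nul(sample, "utf-8"))      # False
    print(has_embedded_nul(sample, "utf-16-le"))  # True
    print(sizes(sample))                          # (7, 8)
```

A C routine that sizes its buffer by character count would come up one byte short for this string; that is an ordinary buffer-handling bug rather than anything specific to UTF-8.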
--
Ciaran McCreesh : Gentoo Developer (Wearer of the shiny hat)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 17:47 ` solar
2006-02-28 17:53 ` Ciaran McCreesh
@ 2006-02-28 18:25 ` Bryan Østergaard
2006-02-28 19:18 ` Kevin F. Quinn (Gentoo)
2 siblings, 0 replies; 21+ messages in thread
From: Bryan Østergaard @ 2006-02-28 18:25 UTC (permalink / raw
To: gentoo-dev
On Tue, Feb 28, 2006 at 12:47:33PM -0500, solar wrote:
> I forget where I read it but I thought that unicode led to overflows
> and was considered a general security risk. I wish I knew where I read
> that but I'm unable to find it.
>
> Any list readers know anything relating to that?
>
It's true that many overflows have been found in unicode aware
applications, like the zillion unicode overflows in Internet Explorer
for example. But that shouldn't lead to considering unicode a general
security risk in my mind even though the apache team uses ascii in the
default configuration to protect against bugs in poorly written
applications.
Regards,
Bryan Østergaard
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 17:47 ` solar
2006-02-28 17:53 ` Ciaran McCreesh
2006-02-28 18:25 ` Bryan Østergaard
@ 2006-02-28 19:18 ` Kevin F. Quinn (Gentoo)
2006-02-28 20:23 ` solar
2 siblings, 1 reply; 21+ messages in thread
From: Kevin F. Quinn (Gentoo) @ 2006-02-28 19:18 UTC (permalink / raw
To: gentoo-dev
On Tue, 28 Feb 2006 12:47:33 -0500
solar <solar@gentoo.org> wrote:
> I forget where I read it but I thought that unicode led to overflows
> and was considered a general security risk. I wish I knew where I read
> that but I'm unable to find it.
Well, stuff I could find includes:
http://www.kde.org/info/security/advisory-20060119-1.txt
buggy UTF-8 decoder in KDE - this is an overflow error, which as
ciaranm says is a risk applicable to anything. It's a bug in KDE, not
in UTF-8 as such. Perhaps this is what was at the back of your mind.
http://www.izerv.net/idwg-public/archive/0181.html
risks of using UTF-8; in particular the use of separate validators
which won't process things exactly the same way the application does.
Also homograph risks associated with allowing more than one encoding for
a character.
http://www.eeye.com/html/Research/Advisories/AD20010705.html
example of UTF-8(ish) used to fool IDSs by using alternative
non-standard encodings that IDSs aren't aware of.
This actually is another example of issues with secondary validators
described in the link above - they're not guaranteed to parse things
exactly the same way the application does.
http://www.microsoft.com/mspress/books/sampchap/5612b.asp
describes a number of risks of accepting UTF-8, including the above.
So far I haven't found anything that could be considered a general
security risk, but that doesn't prove much :)
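The "secondary validator" problem in the links above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from any of the advisories: `lax_decode_2byte` is a hypothetical, deliberately buggy decoder that accepts non-shortest ("overlong") forms, and the byte-level filter stands in for an IDS that only looks for the literal byte `/`.

```python
def lax_decode_2byte(data: bytes) -> str:
    """Decode a 2-byte UTF-8-style sequence WITHOUT rejecting overlong forms.

    Deliberately mimics a buggy decoder; not suitable for real input.
    """
    lead, cont = data
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


def strict_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    overlong_slash = b"\xc0\xaf"             # non-shortest encoding of "/"
    print(b"/" in overlong_slash)            # False: a byte-level filter sees no slash
    print(lax_decode_2byte(overlong_slash))  # "/": the lax decoder produces one anyway
    print(strict_accepts(overlong_slash))    # False: a strict decoder rejects the input
```

The filter and the lax decoder disagree about what the input means, which is the gap such attacks rely on; a decoder that rejects ill-formed input closes it.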
--
Kevin F. Quinn
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 19:18 ` Kevin F. Quinn (Gentoo)
@ 2006-02-28 20:23 ` solar
0 siblings, 0 replies; 21+ messages in thread
From: solar @ 2006-02-28 20:23 UTC (permalink / raw
To: gentoo-dev
On Tue, 2006-02-28 at 20:18 +0100, Kevin F. Quinn (Gentoo) wrote:
> On Tue, 28 Feb 2006 12:47:33 -0500
> solar <solar@gentoo.org> wrote:
>
> > I forget where I read it but I thought that unicode led to overflows
> > and was considered a general security risk. I wish I knew where I read
> > that but I'm unable to find it.
>
> Well, stuff I could find includes:
>
> http://www.kde.org/info/security/advisory-20060119-1.txt
> buggy UTF-8 decoder in KDE - this is an overflow error, which as
> ciaranm says is a risk applicable to anything. It's a bug in KDE, not
> in UTF-8 as such. Perhaps this is what was at the back of your mind.
>
>
> http://www.izerv.net/idwg-public/archive/0181.html
> risks of using UTF-8; in particular the use of separate validators
> which won't process things exactly the same way the application does.
> Also homograph risks associated with allowing more than one encoding for
> a character.
>
> http://www.eeye.com/html/Research/Advisories/AD20010705.html
> example of UTF-8(ish) used to fool IDSs by using alternative
> non-standard encodings that IDSs aren't aware of.
> This actually is another example of issues with secondary validators
> described in the link above - they're not guaranteed to parse things
> exactly the same way the application does.
>
> http://www.microsoft.com/mspress/books/sampchap/5612b.asp
> describes a number of risks of accepting UTF-8, including the above.
>
>
> So far I haven't found anything that could be considered a general
> security risk, but that doesn't prove much :)
Thanks Kevin. I think whatever I was thinking of had to do with widechar
support. Maybe on phrack, vuln-dev, or DD; I forget.
But the second link was a pretty good read and perhaps can give us some
sort of reasonable checks that we can use before we opt to allow the use
flag to be enabled in our hardened profiles.
Think we can automate any checks using the UTF-8-test.txt?
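One possible shape for such an automated check, sketched in Python. Everything here is an assumption for illustration: the malformed samples are hand-picked and not copied from UTF-8-test.txt, and `failures` is a hypothetical helper that reports which malformed inputs a given decoder wrongly accepts.

```python
# Hand-picked malformed inputs (assumptions, not taken from UTF-8-test.txt).
MALFORMED = {
    "lone continuation byte": b"\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
    "overlong encoding of '/'": b"\xc0\xaf",
    "UTF-16 surrogate U+D800": b"\xed\xa0\x80",
    "code point above U+10FFFF": b"\xf4\x90\x80\x80",
}


def python_decoder(data: bytes) -> str:
    """Decoder under test: Python's built-in strict UTF-8 codec."""
    return data.decode("utf-8")


def failures(decode) -> list:
    """Return the names of malformed samples that `decode` wrongly accepts."""
    accepted = []
    for name, data in MALFORMED.items():
        try:
            decode(data)
        except UnicodeError:
            continue  # rejected, as a conforming decoder should
        accepted.append(name)
    return accepted


if __name__ == "__main__":
    print(failures(python_decoder))  # an empty list means every sample was rejected
```

A package-specific check would swap `python_decoder` for a wrapper around the program being tested; how to drive each package is left open here.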
--
solar <solar@gentoo.org>
Gentoo Linux
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
` (3 preceding siblings ...)
2006-02-28 17:47 ` solar
@ 2006-02-28 23:51 ` Bjarke Istrup Pedersen
2006-03-08 7:43 ` [gentoo-dev] " Mathieu Bonnet
` (2 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Bjarke Istrup Pedersen @ 2006-02-28 23:51 UTC (permalink / raw
To: gentoo-dev
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Patrick Lauer skrev:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
>
> If there are no objections this should be a small but helpful change.
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.
>
> Patrick
I think it would be nice to have it enabled too :-)
You got my vote.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEBOHkO+Ewtpi9rLERAralAJoD2y5E9U6rVKV5WMKyjg/3u6baOACeKXba
dOAfrKDeV4ci9W9ykNwtKCQ=
=4Qkm
-----END PGP SIGNATURE-----
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 16:24 ` Kalin KOZHUHAROV
@ 2006-03-04 12:46 ` Alexander Simonov
2006-03-04 20:13 ` Kalin KOZHUHAROV
0 siblings, 1 reply; 21+ messages in thread
From: Alexander Simonov @ 2006-03-04 12:46 UTC (permalink / raw
To: gentoo-dev
On Wed, Mar 01, 2006 at 01:24:26AM +0900, Kalin KOZHUHAROV wrote:
>Well there are a few problems, but yes I cannot name them now.
>Using Japanese, Cyrillic and English in a few encodings each is a big nightmare.
>
It's true! We in the ex-USSR use KOI8-R, KOI8-U, CP1251 (aka Windows-1251),
and CP866.
>Nowadays I try to move everything to UTF-8, but there are those windoze users
>and webdevs that make all Japanese in Shift_JIS ... So support of wide range of
>encodings is a must, but UTF-8 is the truth.
>
>> The only thing that's nasty: we don't have any good utf8-fonts for the console.
>And not only the console.
>Even for xterm there are not many good fonts (known to me) that display both Japanese
>and Cyrillic in regular and bold. Currently there is only one combination that works for me.
>
What about Terminus and UniCyr (the Unicode font from console-tools-cyrillic)?
I use these fonts, and most Russian-speaking people say they are
the best fonts for Cyrillic charsets.
I don't see any font issues myself.
>So fonts, font config and related stuff is what has to be fixed first.
>
--
WBR, Alexander Simonov (DEVL-UANIC)
Ukrainian Gentoo Community Coordinator
* Re: [gentoo-dev] enable UTF8 per default?
2006-03-04 12:46 ` Alexander Simonov
@ 2006-03-04 20:13 ` Kalin KOZHUHAROV
0 siblings, 0 replies; 21+ messages in thread
From: Kalin KOZHUHAROV @ 2006-03-04 20:13 UTC (permalink / raw
To: gentoo-dev
Alexander Simonov wrote:
> On Wed, Mar 01, 2006 at 01:24:26AM +0900, Kalin KOZHUHAROV wrote:
>> Well there are a few problems, but yes I cannot name them now.
>> Using Japanese, Cyrillic and English in a few encodings each is a big
>> nightmare.
>>
>
> It's true! We in the ex-USSR use KOI8-R, KOI8-U, CP1251 (aka Windows-1251),
> and CP866.
>
>> Nowadays I try to move everything to UTF-8, but there are those
>> windoze users
>> and webdevs that make all Japanese in Shift_JIS ... So support of wide
>> range of
>> encodings is a must, but UTF-8 is the truth.
>>
>>> The only thing that's nasty: we don't have any good utf8-fonts for
>>> the console.
>> And not only the console.
>> Even for xterm there are not many good fonts (known to me) that
>> display both Japanese
>> and Cyrillic in regular and bold. Currently there is only one
>> combination that works for me.
>>
> What about Terminus and UniCyr (the Unicode font from console-tools-cyrillic)?
> I use these fonts, and most Russian-speaking people say they are
> the best fonts for Cyrillic charsets.
> I don't see any font issues myself.
Yes, then the problem is with Japanese...
>> So fonts, font config and related stuff is what has to be fixed first.
Kalin.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
* [gentoo-dev] Re: enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
` (4 preceding siblings ...)
2006-02-28 23:51 ` Bjarke Istrup Pedersen
@ 2006-03-08 7:43 ` Mathieu Bonnet
2006-03-09 20:25 ` [gentoo-dev] " Kevin F. Quinn (Gentoo)
2006-03-11 20:29 ` Eldad Zack
7 siblings, 0 replies; 21+ messages in thread
From: Mathieu Bonnet @ 2006-03-08 7:43 UTC (permalink / raw
To: gentoo-dev
Hi,
Patrick Lauer <patrick <at> gentoo.org> writes:
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. [...]
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.
>
Don't forget about the kernel configuration, the locales configuration, and
individual apps configuration (run-time configuration, I mean).
See: http://www.gentoo.org/doc/en/utf-8.xml
Having all this done automatically would indeed be nice.
>
> Are there any known bugs / problems this would trigger? Any
> reasons against that?
>
The classic problem is that non-ASCII characters will not be readable by
people who are still using their old ISO/other charset instead of UTF-8...
If someone is using an old ISO/other charset and someone else is using UTF-8,
and they speak the same language, they won't be able to read each other's
non-ASCII characters... (and most users won't understand why, and most users
will blame it on the configuration of the user using UTF-8 :/)
Classic case of incompatibility hell...
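A small Python sketch of that mismatch, with a hypothetical helper `read_as` and an arbitrary sample string (both are assumptions for illustration). It encodes text in the sender's charset and decodes it in the receiver's, substituting a replacement character where the bytes cannot be decoded.

```python
def read_as(text: str, sender: str, receiver: str) -> str:
    """Encode with the sender's charset, decode with the receiver's.

    Undecodable bytes become U+FFFD, roughly what a mail client or
    terminal would display instead of failing outright.
    """
    return text.encode(sender).decode(receiver, errors="replace")


if __name__ == "__main__":
    name = "Petten\u00f2"
    # UTF-8 sender, ISO-8859-1 receiver: the two UTF-8 bytes show up as two characters.
    print(read_as(name, "utf-8", "iso-8859-1"))
    # ISO-8859-1 sender, UTF-8 receiver: the single byte is not valid UTF-8.
    print(read_as(name, "iso-8859-1", "utf-8"))
    # Same charset on both sides: the text survives unchanged.
    print(read_as(name, "utf-8", "utf-8"))
```

Neither side's configuration is "wrong" in isolation; the garbling only appears when the two charsets differ and nothing labels or converts the bytes in between.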
UTF-8 is here to solve this, but it only works if everybody changes to UTF-8...
but as long as not all software supports UTF-8 (sometimes it doesn't only
because the devs don't feel it is necessary :/), and as long as not every
user uses UTF-8, it will be perfectly respectable to stick to ISO/other
charsets, to avoid problems with family members, friends and colleagues who
still use ISO/other charsets and don't want to change or just can't change...
(Windows users, mostly). (This is pretty similar to the situation with
instant messaging... you want to move to Jabber, but everyone else is using MSN,
AIM, ICQ and Yahoo...).
This is why we should not yet fully activate UTF-8 by default...
What we should do is ease the migration to UTF-8 for everyone who feels like
using it... (and for everyone whose family members, friends and colleagues
already use UTF-8, though that should be quite rare). If it's easy, more and more
people will migrate, and once Windows activates UTF-8 by default (let's pray
MS will feel like implementing this correctly) and most Windows users have
upgraded (or gone to Linux/BSD/other ^_^), we should switch to all UTF-8 by
default.
Until then, as said, the migration should be easier for the user... That means we
should also take care of run-time configuration, maybe with a new USE flag that
preconfigures every program which supports UTF-8 to use it by default (instead
of just supporting UTF-8, as with the "unicode" USE flag). Of course, the kernel
and locales should also be configured automatically.
Another great thing would be the ability to easily backpedal and
go back to the ISO/old charset if too many incompatibility problems arise... (if
it's hard to go back and they have problems, people will hate UTF-8 for years
:/ and they'll tell others).
Instead of a USE flag for default preconfiguration, maybe a script could take
care of modifying the user's program configuration, to easily activate and
deactivate UTF-8 by default in every supported program... (the script
would also allow programs to be configured one at a time).
We should also thoroughly document the migration process and the incompatibility
problems which may arise when you communicate with people using ISO/old
charsets... (and thoroughly explain the use of `iconv` and similar programs).
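For documentation purposes, the kind of conversion `iconv` performs (for instance `iconv -f KOI8-R -t UTF-8`) can be sketched in Python. The helper name `recode` and the sample text are assumptions for illustration; the sketch only shows the decode-then-encode step, not a full migration tool.

```python
def recode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode `data` from the `source` charset into `target`.

    Raises UnicodeDecodeError if `data` is not valid in `source`, and
    UnicodeEncodeError if `target` cannot represent a character.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    greeting = "\u041f\u0440\u0438\u0432\u0435\u0442"  # a six-letter Cyrillic word
    legacy = greeting.encode("koi8_r")   # one byte per letter in KOI8-R
    converted = recode(legacy, "koi8_r")
    print(len(legacy), len(converted))   # 6 12
    print(converted.decode("utf-8") == greeting)  # True
```

The source charset has to be known in advance: the same bytes decode without error as CP1251 or CP866 too, just to different (wrong) letters, so a conversion with the wrong source charset fails silently.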
Another thing is to list every package which does not support UTF-8... (editors
like NEdit, for example), and maybe make `emerge` warn users of UTF-8
that the package they are trying to emerge does not support UTF-8... (an
ewarn at the end of the emerge process would be enough, but being warned
before emerging the package would be nice).
Anyway, we should separate "activating support for Unicode/UTF-8" and
"activating use of UTF-8, by default, in every program", and we should support
both...
If they can be cleanly separated, we should probably activate support for
Unicode/UTF-8 by default, now... but as said, activating use of UTF-8 by default
for all programs should not be done for quite some time...
Cya.
--
gentoo-dev@gentoo.org mailing list
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
` (5 preceding siblings ...)
2006-03-08 7:43 ` [gentoo-dev] " Mathieu Bonnet
@ 2006-03-09 20:25 ` Kevin F. Quinn (Gentoo)
2006-03-11 20:29 ` Eldad Zack
7 siblings, 0 replies; 21+ messages in thread
From: Kevin F. Quinn (Gentoo) @ 2006-03-09 20:25 UTC
To: gentoo-dev
On Tue, 28 Feb 2006 11:58:03 +0100
Patrick Lauer <patrick@gentoo.org> wrote:
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
Enabling support for utf-8 should be fine, but I'd like to sound a note
of caution about using a utf-8 locale as a system-wide setting. Since
UTF-8 contains "holes" in the representation (i.e. some sequences of
8-bit values are invalid), when something is asked to parse such
invalid data unexpected results can ensue.
For an example, see bug #125375 - it turns out that invalid sequences
do not match '.' in sed regular expressions (sed-4.1.4). The other GNU
tools probably behave similarly. Up to a point this is in line with the
Unicode standard, which says, "When a process interprets a code unit sequence
which purports to be in a Unicode character encoding form, it shall
treat ill-formed code unit sequences as an error condition, and shall
not interpret such sequences as characters." (chapter 3 para 2 rule
C12a). This clearly means that the invalid bytes cannot match "." (or
anything else for that matter). However sed should either generate an
error, filter the illegal bytes out of its input, or replace them with
a marker (replacement character) - instead it leaves the non-conformant
bytes alone.
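
The three conformant options can be sketched in Python (this illustrates the
behaviours only, and says nothing about how sed is implemented; the sample
bytes and the helper name `strict_decode` are made up for the example):

```python
# 0xFF can never occur in well-formed UTF-8, so this input is ill-formed.
data = b"abc\xffdef"


def strict_decode(raw: bytes):
    """Option 1: treat ill-formed input as an error (None signals the failure here)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


strict = strict_decode(data)

# Option 2: filter the illegal bytes out of the input.
filtered = data.decode("utf-8", errors="ignore")

# Option 3: replace them with a marker, U+FFFD REPLACEMENT CHARACTER.
replaced = data.decode("utf-8", errors="replace")
```

With this input the strict decode fails, the filtered result is "abcdef", and
the replaced result carries a single U+FFFD where the stray byte was.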
--
Kevin F. Quinn
* Re: [gentoo-dev] enable UTF8 per default?
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
` (6 preceding siblings ...)
2006-03-09 20:25 ` [gentoo-dev] " Kevin F. Quinn (Gentoo)
@ 2006-03-11 20:29 ` Eldad Zack
7 siblings, 0 replies; 21+ messages in thread
From: Eldad Zack @ 2006-03-11 20:29 UTC
To: gentoo-dev
On Tuesday 28 February 2006 12:58, Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
I've been hit by a bug in egroupware that's related to Unicode.
Unicode-enabled MySQL reserves 3 bytes per character for string keys, and
egroupware assumes (wrongly) that a key won't cross the 1000-byte key length
boundary...
But that's really not a big deal.
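
The arithmetic behind that failure, as a small Python sketch. The 1000-byte
limit and the factor of 3 come from the description above; the column lengths
are hypothetical and not taken from egroupware's schema:

```python
MAX_KEY_BYTES = 1000  # key length limit mentioned above


def key_bytes(column_lengths, bytes_per_char):
    """Bytes reserved for an index over string columns of the given character lengths."""
    return sum(length * bytes_per_char for length in column_lengths)


# Hypothetical two-column key: 255 + 100 characters.
columns = [255, 100]

single_byte_size = key_bytes(columns, 1)  # one byte per character in an 8-bit charset
unicode_size = key_bytes(columns, 3)      # three bytes reserved per character

fits_before = single_byte_size <= MAX_KEY_BYTES
fits_after = unicode_size <= MAX_KEY_BYTES
```

A key that fits comfortably with one byte per character (355 bytes here)
exceeds the limit once three bytes are reserved per character (1065 bytes).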
--
Eldad Zack <eldad@gentoo.org>
Key/Fingerprint at pgp.mit.edu, ID 0x96EA0A93
Thread overview: 21+ messages
2006-02-28 10:58 [gentoo-dev] enable UTF8 per default? Patrick Lauer
2006-02-28 11:32 ` Diego 'Flameeyes' Pettenò
2006-02-28 11:47 ` Patrick Lauer
2006-02-28 12:11 ` Diego 'Flameeyes' Pettenò
2006-02-28 14:27 ` Mike Frysinger
2006-02-28 12:50 ` Lars Weiler
2006-02-28 13:50 ` Patrick Lauer
2006-02-28 14:46 ` Joseph Jezak
2006-02-28 16:24 ` Kalin KOZHUHAROV
2006-03-04 12:46 ` Alexander Simonov
2006-03-04 20:13 ` Kalin KOZHUHAROV
2006-02-28 16:51 ` Josh
2006-02-28 17:47 ` solar
2006-02-28 17:53 ` Ciaran McCreesh
2006-02-28 18:25 ` Bryan Østergaard
2006-02-28 19:18 ` Kevin F. Quinn (Gentoo)
2006-02-28 20:23 ` solar
2006-02-28 23:51 ` Bjarke Istrup Pedersen
2006-03-08 7:43 ` [gentoo-dev] " Mathieu Bonnet
2006-03-09 20:25 ` [gentoo-dev] " Kevin F. Quinn (Gentoo)
2006-03-11 20:29 ` Eldad Zack