public inbox for gentoo-dev@lists.gentoo.org
From: "Kevin F. Quinn (Gentoo)" <kevquinn@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] enable UTF8 per default?
Date: Thu, 9 Mar 2006 21:25:11 +0100
Message-ID: <20060309212511.2b92a73d@c1358217.kevquinn.com>
In-Reply-To: <1141124283.7962.74.camel@localhost>

On Tue, 28 Feb 2006 11:58:03 +0100
Patrick Lauer <patrick@gentoo.org> wrote:

> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration. 
> 
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?

Enabling support for UTF-8 should be fine, but I'd like to sound a note
of caution about using a UTF-8 locale as a system-wide setting.  Since
UTF-8 contains "holes" in the representation (i.e. some sequences of
8-bit values are invalid), a program that is asked to parse such
invalid data can produce unexpected results.
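
To make the "holes" concrete, here is a minimal sketch in Python.  It
assumes only the standard library's strict UTF-8 decoder; the sample
byte strings and the helper name is_valid_utf8 are mine, chosen for
illustration, and are not taken from the bug report.

```python
# A few byte sequences that fall into the "holes" of UTF-8.
# These samples are illustrative, not taken from bug #125375.
ILL_FORMED = {
    "lone continuation byte": b"\x80",
    "truncated two-byte sequence": b"\xc3",
    "overlong encoding of '/'": b"\xc0\xaf",
    "byte that never occurs in UTF-8": b"\xff",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for label, raw in ILL_FORMED.items():
        print(f"{label}: {raw!r} valid={is_valid_utf8(raw)}")
```

Any tool that reads arbitrary files under a UTF-8 locale has to decide
what to do when it meets input like this.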

For an example, see bug #125375 - it turns out that invalid sequences
do not match '.' in sed regular expressions (sed-4.1.4).  The other GNU
tools probably behave similarly.  Up to a point this is in line with the
Unicode standard, which says, "When a process interprets a code unit
sequence which purports to be in a Unicode character encoding form, it
shall treat ill-formed code unit sequences as an error condition, and
shall not interpret such sequences as characters." (chapter 3 para 2
rule C12a).  This clearly means that the invalid bytes cannot match "."
(or anything else for that matter).  However, sed should either
generate an error, filter the illegal bytes out of its input, or
replace them with a marker (the replacement character, U+FFFD) -
instead it passes the ill-formed bytes through to its output unchanged.
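
The three acceptable behaviours, and the pass-through behaviour I am
objecting to, can be sketched with Python's standard decoder error
handlers.  This is only an analogy under my own assumptions: the
function names and the sample input are hypothetical, and nothing here
reflects how sed is implemented.

```python
# Sample input: ASCII text with one byte that is never valid in UTF-8.
DATA = b"ab\xffcd"


def reject(data: bytes) -> str:
    """Option 1: treat ill-formed input as an error (raises UnicodeDecodeError)."""
    return data.decode("utf-8", errors="strict")


def drop_invalid(data: bytes) -> str:
    """Option 2: filter the ill-formed bytes out of the input."""
    return data.decode("utf-8", errors="ignore")


def mark_invalid(data: bytes) -> str:
    """Option 3: substitute U+FFFD for each ill-formed byte."""
    return data.decode("utf-8", errors="replace")


def pass_through(data: bytes) -> bytes:
    """Roughly the observed behaviour: carry the bad bytes along untouched."""
    text = data.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    try:
        reject(DATA)
    except UnicodeDecodeError as exc:
        print("error:", exc.reason)
    print(repr(drop_invalid(DATA)))
    print(repr(mark_invalid(DATA)))
    print(repr(pass_through(DATA)))
```

With the first three options the caller can at least tell that the
input was damaged or knows it has been cleaned; with pass-through the
ill-formed bytes silently reach the output.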

-- 
Kevin F. Quinn
