public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] The tree is now utf-8 clean
@ 2005-09-17  1:42 Ciaran McCreesh
  2005-09-17 10:56 ` Fernando J. Pereda
  2005-09-28 18:25 ` Ciaran McCreesh
  0 siblings, 2 replies; 9+ messages in thread
From: Ciaran McCreesh @ 2005-09-17  1:42 UTC
  To: gentoo-dev

The tree is now utf-8 clean. Or it is to the extent that a computer can
reasonably determine... If the relevant people are prepared to smack
anyone who refuses to play nice then now would be a good time to
unwithdraw GLEP 31, make compliance mandatory and add glep31check [1] to
repoman or server-side.

There are still a few instances of munged character sequences that
happen to also be valid UTF-8. If you come across one, feel free to fix
it.

If you have weird characters in your name, please make especially sure
that you're getting your ChangeLog name right. These are far more common
than occasional user credit ChangeLog entries. Also, if your name on the
devlist [2] isn't accented, pester someone to update it.

Something strange I noticed... Some people are using funny quotes and
non breaking spaces in ebuilds. Some people are using weird characters
as substitution delimiters for sed. Don't! It will break on many
systems. I'm going to go and purge all of those, UTF-8 or not, whenever
my brain recovers.
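
As a sketch of the safe pattern (the variable name and the path being
rewritten are made up for illustration, not taken from any ebuild), an
ASCII delimiter such as | sidesteps the problem:

```shell
#!/bin/sh
# Hypothetical example: rewrite an install prefix in one line of text.
# The '|' delimiter is a single ASCII byte, so sed parses the expression
# the same way in every locale. A multi-byte delimiter offers no such
# guarantee.
line='PREFIX = /usr/local'
fixed=$(printf '%s\n' "$line" | sed -e 's|/usr/local|/usr|')
printf '%s\n' "$fixed"    # prints: PREFIX = /usr
```

Any other ASCII punctuation that does not occur in the pattern or the
replacement would do just as well.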

As far as editor support... On those really rare occasions when you need
to enter UTF-8 text in ebuilds, vim, emacs and nano should all more or
less work. For ChangeLogs, echangelog is utf-8 transparent, meaning if
you run it from a UTF-8 terminal it should be ok. We have a guide [3] if
you want to know more...

[1]: http://dev.gentoo.org/~ciaranm/toys/glep31check-0.3.3.tar.bz2
[2]: http://www.gentoo.org/proj/en/devrel/roll-call/userinfo.xml
[3]: http://www.gentoo.org/doc/en/utf-8.xml

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm



* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-17  1:42 [gentoo-dev] The tree is now utf-8 clean Ciaran McCreesh
@ 2005-09-17 10:56 ` Fernando J. Pereda
  2005-09-17 12:47   ` [gentoo-dev] " Dan Meltzer
  2005-09-17 17:15   ` [gentoo-dev] " Ciaran McCreesh
  2005-09-28 18:25 ` Ciaran McCreesh
  1 sibling, 2 replies; 9+ messages in thread
From: Fernando J. Pereda @ 2005-09-17 10:56 UTC
  To: gentoo-dev

On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
| Something strange I noticed... Some people are using funny quotes and
| non breaking spaces in ebuilds. Some people are using weird characters
| as substitution delimiters for sed. Don't! It will break on many
| systems. I'm going to go and purge all of those, UTF-8 or not, whenever
| my brain recovers.

I hope ~ is not considered a weird character... if it is, tell me and
I'll fix all my ebuilds.

Cheers,
Ferdy

-- 
Fernando J. Pereda Garcimartín
Gentoo Developer (Alpha,net-mail)
20BB BDC3 761A 4781 E6ED  ED0B 0A48 5B0C 60BD 28D4


* [gentoo-dev] Re: The tree is now utf-8 clean
  2005-09-17 10:56 ` Fernando J. Pereda
@ 2005-09-17 12:47   ` Dan Meltzer
  2005-09-17 17:15   ` [gentoo-dev] " Ciaran McCreesh
  1 sibling, 0 replies; 9+ messages in thread
From: Dan Meltzer @ 2005-09-17 12:47 UTC
  To: gentoo-dev

Assuming, as I do, that ~arch is utf-8 clean, ~ can't be that weird a
character, and is therefore probably acceptable for sed as well.

On 9/17/05, Fernando J. Pereda <ferdy@gentoo.org> wrote:
> On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> | Something strange I noticed... Some people are using funny quotes and
> | non breaking spaces in ebuilds. Some people are using weird characters
> | as substitution delimiters for sed. Don't! It will break on many
> | systems. I'm going to go and purge all of those, UTF-8 or not, whenever
> | my brain recovers.
> 
> I hope ~ is not considered a weird character... if it is, tell me and
> I'll fix all my ebuilds.
> 
> Cheers,
> Ferdy
> 
> -- 
> Fernando J. Pereda Garcimartín
> Gentoo Developer (Alpha,net-mail)
> 20BB BDC3 761A 4781 E6ED  ED0B 0A48 5B0C 60BD 28D4
> 
>

-- 
gentoo-dev@gentoo.org mailing list




* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-17 10:56 ` Fernando J. Pereda
  2005-09-17 12:47   ` [gentoo-dev] " Dan Meltzer
@ 2005-09-17 17:15   ` Ciaran McCreesh
  2005-09-17 17:24     ` Ciaran McCreesh
  2005-09-17 20:06     ` Mike Frysinger
  1 sibling, 2 replies; 9+ messages in thread
From: Ciaran McCreesh @ 2005-09-17 17:15 UTC
  To: gentoo-dev

On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
<ferdy@gentoo.org> wrote:
| On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
| | Something strange I noticed... Some people are using funny quotes
| | and non breaking spaces in ebuilds. Some people are using weird
| | characters as substitution delimiters for sed. Don't! It will break
| | on many systems. I'm going to go and purge all of those, UTF-8 or
| | not, whenever my brain recovers.
| 
| I hope ~ is not considered a weird character... if it is, tell me and
| I'll fix all my ebuilds.

No, ~ is fine. Anything with a value below 127 (don't use 127, it's
weird) that sed accepts is ok. There are some ebuilds that use that
curly paragraph marker character (§) and weird curly quotes. Those're
the ones that cause problems.
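
A rough way to spot such characters, assuming only POSIX tr and wc, is
to count the bytes that fall outside ASCII (values 0 to 127). The helper
name count_non_ascii and the sample sed expressions are invented for
this example, and this is not how glep31check works:

```shell
#!/bin/sh
# count_non_ascii FILE: print the number of bytes in FILE whose value is
# 128 or more. LC_ALL=C makes tr treat its input as plain bytes.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp)

# A sed expression using ~ as the delimiter: ASCII only.
printf 's~foo~bar~\n' > "$tmp"
ascii_count=$(count_non_ascii "$tmp")

# The same expression using the section sign, written out as its two
# UTF-8 bytes (0xC2 0xA7) in octal so this script itself stays ASCII.
printf 's\302\247foo\302\247bar\302\247\n' > "$tmp"
bad_count=$(count_non_ascii "$tmp")

rm -f "$tmp"
echo "$ascii_count $bad_count"    # prints: 0 6
```

A non-zero count only says that something non-ASCII is present; it
does not say whether it sits in a sed delimiter or in a comment.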

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm



* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-17 17:15   ` [gentoo-dev] " Ciaran McCreesh
@ 2005-09-17 17:24     ` Ciaran McCreesh
  2005-09-17 20:06     ` Mike Frysinger
  1 sibling, 0 replies; 9+ messages in thread
From: Ciaran McCreesh @ 2005-09-17 17:24 UTC
  To: gentoo-dev

On Sat, 17 Sep 2005 18:15:31 +0100 Ciaran McCreesh <ciaranm@gentoo.org>
wrote:
| No, ~ is fine. Anything with a value below 127 (don't use 127, it's
| weird) that sed accepts is ok. There are some ebuilds that use that
| curly paragraph marker character (§) and weird curly quotes. Those're
| the ones that cause problems.

Uhm, where by 127 I of course mean 128...

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm



* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-17 17:15   ` [gentoo-dev] " Ciaran McCreesh
  2005-09-17 17:24     ` Ciaran McCreesh
@ 2005-09-17 20:06     ` Mike Frysinger
  2005-09-19  9:52       ` Paul de Vrieze
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Frysinger @ 2005-09-17 20:06 UTC
  To: gentoo-dev

On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
>
> <ferdy@gentoo.org> wrote:
> | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> | | Something strange I noticed... Some people are using funny quotes
> | | and non breaking spaces in ebuilds. Some people are using weird
> | | characters as substitution delimiters for sed. Don't! It will break
> | | on many systems. I'm going to go and purge all of those, UTF-8 or
> | | not, whenever my brain recovers.
> |
> | I hope ~ is not considered a weird character... if it is, tell me and
> | I'll fix all my ebuilds.
>
> No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> weird) that sed accepts is ok.

in other words, ASCII characters are OK.  if in doubt, just run `man ascii` 
and see if your character is in the table
-mike

* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-17 20:06     ` Mike Frysinger
@ 2005-09-19  9:52       ` Paul de Vrieze
  2005-09-19 10:43         ` Georgi Georgiev
  0 siblings, 1 reply; 9+ messages in thread
From: Paul de Vrieze @ 2005-09-19  9:52 UTC
  To: gentoo-dev

On Saturday 17 September 2005 22:06, Mike Frysinger wrote:
> On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> > On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
> >
> > <ferdy@gentoo.org> wrote:
> > | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> > | | Something strange I noticed... Some people are using funny quotes
> > | | and non breaking spaces in ebuilds. Some people are using weird
> > | | characters as substitution delimiters for sed. Don't! It will
> > | | break on many systems. I'm going to go and purge all of those,
> > | | UTF-8 or not, whenever my brain recovers.
> > |
> > | I hope ~ is not considered a weird character... if it is, tell me
> > | and I'll fix all my ebuilds.
> >
> > No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> > weird) that sed accepts is ok.
>
> in other words, ASCII characters are OK.  if in doubt, just run `man
> ascii` and see if your character is in the table

You probably don't want to use the ASCII control characters either
(anything below 32). Although they should not break anything outright,
they could cause havoc for terminals or annoy people (for example, using
the BELL character as a sed separator).
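
The advice so far can be folded into one check, sketched here under
assumptions: is_safe_delim is a hypothetical helper name, and backslash
is rejected because POSIX sed does not allow it (or newline) as a
delimiter.

```shell
#!/bin/sh
# is_safe_delim CHAR: succeed if CHAR is exactly one byte in the printable,
# non-space ASCII range (33 to 126) and is not a backslash.
is_safe_delim() {
    # Exactly one byte long (wc -c counts bytes, so multi-byte
    # characters fail here).
    [ "$(printf '%s' "$1" | wc -c | tr -d ' ')" = 1 ] || return 1
    # Backslash is not a valid sed delimiter.
    [ "$1" != '\' ] || return 1
    # Nothing may remain after deleting bytes 33..126.
    [ "$(printf '%s' "$1" | LC_ALL=C tr -d '\041-\176' | wc -c | tr -d ' ')" = 0 ]
}

sect=$(printf '\302\247')   # section sign as UTF-8 bytes
bell=$(printf '\007')       # the BELL control character

verdicts=
for d in '~' '|' "$sect" "$bell" '\'; do
    if is_safe_delim "$d"; then
        verdicts="$verdicts ok"
    else
        verdicts="$verdicts avoid"
    fi
done
echo "$verdicts"
```

Alphanumerics pass this check too, since sed accepts them; whether
they are a good idea is a matter of taste.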

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net


* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-19  9:52       ` Paul de Vrieze
@ 2005-09-19 10:43         ` Georgi Georgiev
  0 siblings, 0 replies; 9+ messages in thread
From: Georgi Georgiev @ 2005-09-19 10:43 UTC
  To: gentoo-dev

maillog: 19/09/2005-11:52:26(+0200): Paul de Vrieze types
> On Saturday 17 September 2005 22:06, Mike Frysinger wrote:
> > On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> > > On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
> > >
> > > <ferdy@gentoo.org> wrote:
> > > | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> > > | | Something strange I noticed... Some people are using funny quotes
> > > | | and non breaking spaces in ebuilds. Some people are using weird
> > > | | characters as substitution delimiters for sed. Don't! It will
> > > | | break on many systems. I'm going to go and purge all of those,
> > > | | UTF-8 or not, whenever my brain recovers.
> > > |
> > > | I hope ~ is not considered a weird character... if it is, tell me
> > > | and I'll fix all my ebuilds.
> > >
> > > No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> > > weird) that sed accepts is ok.
> >
> > in other words, ASCII characters are OK.  if in doubt, just run `man
> > ascii` and see if your character is in the table
> 
> You probably don't want to use the ASCII control characters either
> (anything below 32). Although they should not break anything outright,
> they could cause havoc for terminals or annoy people (for example, using
> the BELL character as a sed separator).

Um, I guess everybody got the point. In fact, you probably shouldn't use
alphanumerics either -- they work, but are as ugly as...
echo herr | sed -e sorolog

-- 
(*   Georgi Georgiev   (* They can always run stderr through uniq. :-) (*
*)    chutz@gg3.net    *) -- Larry Wall in                             *)
(*  +81(90)2877-8845   (* <199704012331.PAA16535@wall.org>             (*


* Re: [gentoo-dev] The tree is now utf-8 clean
  2005-09-17  1:42 [gentoo-dev] The tree is now utf-8 clean Ciaran McCreesh
  2005-09-17 10:56 ` Fernando J. Pereda
@ 2005-09-28 18:25 ` Ciaran McCreesh
  1 sibling, 0 replies; 9+ messages in thread
From: Ciaran McCreesh @ 2005-09-28 18:25 UTC
  To: gentoo-dev

On Sat, 17 Sep 2005 02:42:09 +0100 Ciaran McCreesh <ciaranm@gentoo.org>
wrote:
| The tree is now utf-8 clean.

...and now it isn't.

app-benchmarks/ltp/ChangeLog
  Bad character 0x0a inside UTF-8 sequence (2/4) at line 8 offset 9
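
A failure of this kind can be reproduced with iconv as a stand-in
validator (this is not glep31check, and is_valid_utf8 is a made-up
helper name). The second sample below puts a newline straight after the
lead byte of a four-byte sequence, which is the situation the message
above describes:

```shell
#!/bin/sh
# is_valid_utf8 FILE: succeed if iconv can decode FILE as UTF-8.
# iconv exits non-zero when it meets an invalid or truncated sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

tmp=$(mktemp)

# A complete two-byte sequence: i-acute is 0xC3 0xAD (octal 303 255).
printf 'Garcimart\303\255n\n' > "$tmp"
if is_valid_utf8 "$tmp"; then good=yes; else good=no; fi

# 0xF0 (octal 360) opens a four-byte sequence, but the next byte is 0x0a.
printf 'broken \360\nname\n' > "$tmp"
if is_valid_utf8 "$tmp"; then bad=yes; else bad=no; fi

rm -f "$tmp"
echo "complete sequence valid: $good, truncated sequence valid: $bad"
```

This only reports pass or fail; unlike the output quoted above, it does
not give the line and offset of the bad byte.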

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm



Thread overview: 9+ messages
2005-09-17  1:42 [gentoo-dev] The tree is now utf-8 clean Ciaran McCreesh
2005-09-17 10:56 ` Fernando J. Pereda
2005-09-17 12:47   ` [gentoo-dev] " Dan Meltzer
2005-09-17 17:15   ` [gentoo-dev] " Ciaran McCreesh
2005-09-17 17:24     ` Ciaran McCreesh
2005-09-17 20:06     ` Mike Frysinger
2005-09-19  9:52       ` Paul de Vrieze
2005-09-19 10:43         ` Georgi Georgiev
2005-09-28 18:25 ` Ciaran McCreesh
