* Fw: [gentoo-portage-dev] changelog encoding
@ 2004-10-08 15:29 Marius Mauch
2004-10-08 16:26 ` Luke-Jr
0 siblings, 1 reply; 11+ messages in thread
From: Marius Mauch @ 2004-10-08 15:29 UTC
To: gentoo-portage-dev
Begin forwarded message:
Date: Fri, 8 Oct 2004 13:56:17 +0100
From: Ciaran McCreesh <ciaranm@gentoo.org>
To: Marius Mauch <genone@gentoo.org>
Cc: gentoo-portage-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] changelog encoding
[ not sure if I can post to gentoo-portage-dev, please forward on if
not... ]
On Fri, 8 Oct 2004 14:31:52 +0200 Marius Mauch <genone@gentoo.org>
wrote:
| On 10/07/04 Brian wrote:
| > What is the official encoding method(s) for the changelogs? It has
| > been reported that porthole often fails getting the changelogs due
| > to the encoding. Currently it is assuming ascii. Many are
| > reported to be iso-8859-1.
|
| I don't think we have an official encoding, but I think ciaranm knows
| a bit more about that issue.
Yup. We *need* to have an official encoding. Reason being, at least one
developer has a non-(ASCII as in characters 0..126 only) character in
their name. Said encoding should also apply to ebuilds, but not to
files/ entries (I could give the lengthy explanation if anyone really
wants to know, but basically certain things would break).
I've been whinging about this on and off for about a year now, and every
time it's been dismissed as irrelevant :)
If we're going to standardise on an encoding, it's got to be UTF-8.
iso-8859-1 is not sufficient to represent every developer (and potential
patch contributor)'s name correctly. UTF-16 and plain old four byte
unicode aren't compatible with our existing files (in UTF-8, characters
1 to 126 are the same as in regular ASCII). Yes, UTF-8 kinda sucks in
terms of space when encoding Japanese or Russian characters, but since
these will be a rare occurrence it's not really a problem.
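A minimal Python sketch of the compatibility argument above. It is an illustration only, not code from the thread: the sample strings and the helper names `utf8_len` and `fits_latin1` are assumptions chosen for the example.

```python
# Illustrative only: compares how the encodings discussed above treat
# plain ASCII text and non-ASCII characters.
ascii_text = "Gentoo ChangeLog"

# ASCII-only text encodes to identical bytes in ASCII and UTF-8, so
# existing ASCII files are already valid UTF-8.
utf8_matches_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# UTF-16 and UTF-32 (the "four byte unicode") use wider code units, so
# the same text produces different bytes.
utf16_matches_ascii = ascii_text.encode("utf-16-le") == ascii_text.encode("ascii")
utf32_matches_ascii = ascii_text.encode("utf-32-le") == ascii_text.encode("ascii")


def utf8_len(text):
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


def fits_latin1(text):
    """Return True if every character in text exists in iso-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False
```

The space cost mentioned above shows up in `utf8_len`: a Cyrillic letter takes two bytes and a Japanese kanji three, while `fits_latin1` returns False for both, which is the reason iso-8859-1 cannot cover every name.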
--
Ciaran McCreesh : Gentoo Developer (Sparc, MIPS, Vim, Fluxbox)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
* Re: Fw: [gentoo-portage-dev] changelog encoding
2004-10-08 15:29 Fw: [gentoo-portage-dev] changelog encoding Marius Mauch
@ 2004-10-08 16:26 ` Luke-Jr
2004-10-08 16:40 ` Ciaran McCreesh
0 siblings, 1 reply; 11+ messages in thread
From: Luke-Jr @ 2004-10-08 16:26 UTC
To: gentoo-portage-dev; +Cc: Ciaran McCreesh
On Friday 08 October 2004 3:29 pm, Marius Mauch wrote:
> On Fri, 8 Oct 2004 14:31:52 +0200 Marius Mauch <genone@gentoo.org>
>
> wrote:
> | On 10/07/04 Brian wrote:
> | > What is the official encoding method(s) for the changelogs? It has
> | > been reported that porthole often fails getting the changelogs due
> | > to the encoding. Currently it is assuming ascii. Many are
> | > reported to be iso-8859-1.
> |
> | I don't think we have an official encoding, but I think ciaranm knows
> | a bit more about that issue.
>
> Yup. We *need* to have an official encoding. Reason being, at least one
> developer has a non-(ASCII as in characters 0..126 only)
ASCII defines 128 characters: 0-127
Why cut the last off?
> character in their name. Said encoding should also apply to ebuilds, but not
> to files/ entries (I could give the lengthy explanation if anyone really
> wants to know, but basically certain things would break).
>
> I've been whinging about this on and off for about a year now, and every
> time it's been dismissed as irrelevant :)
>
> If we're going to standardise on an encoding, it's got to be UTF-8.
> iso-8859-1 is not sufficient to represent every developer (and potential
> patch contributor)'s name correctly. UTF-16 and plain old four byte
> unicode aren't compatible with our existing files (in UTF-8, characters
> 1 to 126 are the same as in regular ASCII). Yes, UTF-8 kinda sucks in
> terms of space when encoding Japanese or Russian characters, but since
> these will be a rare occurrence it's not really a problem.
Yay for UTF-8!
--
Luke-Jr
Developer, Utopios
http://utopios.org/
* Re: [gentoo-portage-dev] changelog encoding
2004-10-08 16:26 ` Luke-Jr
@ 2004-10-08 16:40 ` Ciaran McCreesh
2004-10-09 1:05 ` Ed Grimm
0 siblings, 1 reply; 11+ messages in thread
From: Ciaran McCreesh @ 2004-10-08 16:40 UTC
To: Luke-Jr; +Cc: gentoo-portage-dev
[ hopefully on the list and able to post to it now, if not please
forward ]
On Fri, 8 Oct 2004 16:26:33 +0000 Luke-Jr <luke-jr@utopios.org> wrote:
| > Yup. We *need* to have an official encoding. Reason being, at least
| > one developer has a non-(ASCII as in characters 0..126 only)
|
| ASCII defines 128 characters: 0-127
| Why cut the last off?
As I recall, 127 is flaky. Come to think of it, so is 0-31ish as well.
So maybe I should've said [a-zA-Z0-9\-_,.<>?/\\;:'@#~\]{}\+="$%^&* ] or
something... Basically, anything even the slightest bit flaky, plus
newlines, is prone to explode.
--
Ciaran McCreesh : Gentoo Developer (Sparc, MIPS, Vim, Fluxbox)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
* Re: [gentoo-portage-dev] changelog encoding
2004-10-08 16:40 ` Ciaran McCreesh
@ 2004-10-09 1:05 ` Ed Grimm
2004-10-09 13:53 ` Ciaran McCreesh
0 siblings, 1 reply; 11+ messages in thread
From: Ed Grimm @ 2004-10-09 1:05 UTC
To: gentoo-portage-dev
On Fri, 8 Oct 2004, Ciaran McCreesh wrote:
> On Fri, 8 Oct 2004 16:26:33 +0000 Luke-Jr <luke-jr@utopios.org> wrote:
>|> Yup. We *need* to have an official encoding. Reason being, at least
>|> one developer has a non-(ASCII as in characters 0..126 only)
>|
>| ASCII defines 128 characters: 0-127
>| Why cut the last off?
>
> As I recall, 127 is flaky. Come to think of it, so is 0-31ish as well.
> So maybe I should've said [a-zA-Z0-9\-_,.<>?/\\;:'@#~\]{}\+="$%^&* ] or
> something... Basically, anything even the slightest bit flaky, plus
> newlines, is prone to explode.
I hope you like long lines.
Might I propose [\n\t\r -~]?
However, I suggest that any programs that cannot deal with characters
0-127 are broken and should be fixed. Admittedly, outputting 27 to the
terminal raw could trigger a security hole in many terminals, and
several other terminals have problems with another character (I'm
vaguely thinking 17, but not sure.)
(The preceding paragraph should not be taken as an alternate solution,
but one to combine with restricting characters to the above recommended
9, 10, 13, 32-126.)
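A short Python sketch of the proposed class [\n\t\r -~], for illustration only. The names `SAFE_CHARS`, `is_safe` and `first_unsafe` are hypothetical and not taken from portage or any other tool mentioned in this thread.

```python
import re

# The proposed class: tab (9), newline (10), carriage return (13), and
# the printable range from space (32) through tilde (126).
SAFE_CHARS = re.compile(r"[\n\t\r -~]*")


def is_safe(text):
    """Return True if text contains only characters from the class."""
    return SAFE_CHARS.fullmatch(text) is not None


def first_unsafe(text):
    """Return (index, code point) of the first disallowed character,
    or None if the whole text is within the class."""
    for index, char in enumerate(text):
        if not is_safe(char):
            return index, ord(char)
    return None
```

Character 27 (ESC) and character 127 (DEL) both fall outside the class, so a string containing either is rejected, and `first_unsafe` reports where the offending character sits.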
Ed
--
gentoo-portage-dev@gentoo.org mailing list
* Re: [gentoo-portage-dev] changelog encoding
2004-10-09 1:05 ` Ed Grimm
@ 2004-10-09 13:53 ` Ciaran McCreesh
2004-10-09 15:24 ` Brian
0 siblings, 1 reply; 11+ messages in thread
From: Ciaran McCreesh @ 2004-10-09 13:53 UTC
To: gentoo-portage-dev
On Fri, 8 Oct 2004 20:05:16 -0500 (EST) Ed Grimm
<paranoid@gentoo.evolution.tgape.org> wrote:
| On Fri, 8 Oct 2004, Ciaran McCreesh wrote:
| > As I recall, 127 is flaky. Come to think of it, so is 0-31ish as
| > well. So maybe I should've said
| > [a-zA-Z0-9\-_,.<>?/\\;:'@#~\]{}\+="$%^&* ] or something...
| > Basically, anything even the slightest bit flaky, plus newlines, is
| > prone to explode.
|
| I hope you like long lines.
Well, remember that there's no standard way of doing newlines. Or that
there're at least three standards, depending upon how you look at it.
Eh, not that this is really relevant anyway. All that matters is that
ASCII is insufficient and that UTF-8 is most likely the best
alternative.
--
Ciaran McCreesh : Gentoo Developer (Sparc, MIPS, Vim, Fluxbox)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
* Re: [gentoo-portage-dev] changelog encoding
2004-10-09 13:53 ` Ciaran McCreesh
@ 2004-10-09 15:24 ` Brian
2004-10-09 17:10 ` Ciaran McCreesh
0 siblings, 1 reply; 11+ messages in thread
From: Brian @ 2004-10-09 15:24 UTC
To: gentoo-portage-dev
On Sat, 2004-10-09 at 06:53, Ciaran McCreesh wrote:
> On Fri, 8 Oct 2004 20:05:16 -0500 (EST) Ed Grimm
> <paranoid@gentoo.evolution.tgape.org> wrote:
> | On Fri, 8 Oct 2004, Ciaran McCreesh wrote:
> | > As I recall, 127 is flaky. Come to think of it, so is 0-31ish as
> | > well. So maybe I should've said
> | > [a-zA-Z0-9\-_,.<>?/\\;:'@#~\]{}\+="$%^&* ] or something...
> | > Basically, anything even the slightest bit flaky, plus newlines, is
> | > prone to explode.
> |
> | I hope you like long lines.
>
> Well, remember that there's no standard way of doing newlines. Or that
> there're at least three standards, depending upon how you look at it.
>
> Eh, not that this is really relevant anyway. All that matters is that
> ASCII is insufficient and that UTF-8 is most likely the best
> alternative.
Thank you.
I have changed porthole's code to try decode(utf_8) first, then
decode(iso-8859-1). If both fail, Porthole will display an unknown
encoding error and ask the user to report it to bugs.gentoo.org as
well as porthole's bug tracker.
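A minimal sketch of such a fallback in Python, assuming the ChangeLog has been read as raw bytes. The function name `decode_changelog` and the error message are hypothetical; this is not Porthole's actual code.

```python
def decode_changelog(raw_bytes):
    """Return (text, encoding_used) for raw ChangeLog bytes.

    UTF-8 is tried first because it is strict: bytes that are not valid
    UTF-8 raise an error, which lets the fallback take over.
    """
    for encoding in ("utf-8", "iso-8859-1"):
        try:
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if every candidate encoding rejects the input.
    raise ValueError("unknown encoding, please report this ChangeLog")
```

One caveat with this candidate list: Python's iso-8859-1 codec maps every possible byte value to a character, so the second attempt always succeeds and the final error is never raised. The error path only becomes meaningful if the fallback list contains strict encodings alone.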
--
Brian <dol-sen@telus.net>
* Re: [gentoo-portage-dev] changelog encoding
2004-10-09 15:24 ` Brian
@ 2004-10-09 17:10 ` Ciaran McCreesh
2004-10-09 17:38 ` Marius Mauch
0 siblings, 1 reply; 11+ messages in thread
From: Ciaran McCreesh @ 2004-10-09 17:10 UTC
To: gentoo-portage-dev
On Sat, 09 Oct 2004 08:24:38 -0700 Brian <dol-sen@telus.net> wrote:
| I have changed porthole's code to try decode(utf_8), then try
| decode(iso-8859-1). Failing either of those Porthole will display an
| unknown encoding error and to please report it to bugs.gentoo.org as
| well as porthole's bug tracker.
Since there's no documented standard... Could one of you portage people
please post a message to -dev saying "as of now, use UTF-8 for ebuilds
and changelogs if at all possible"? I can update app-vim/gentoo-syntax
and the default vimrc to encourage fileencoding to be utf-8 for these
things...
--
Ciaran McCreesh : Gentoo Developer (Sparc, MIPS, Vim, Fluxbox)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
* Re: [gentoo-portage-dev] changelog encoding
2004-10-09 17:10 ` Ciaran McCreesh
@ 2004-10-09 17:38 ` Marius Mauch
0 siblings, 0 replies; 11+ messages in thread
From: Marius Mauch @ 2004-10-09 17:38 UTC
To: gentoo-portage-dev
On 10/09/04 Ciaran McCreesh wrote:
> On Sat, 09 Oct 2004 08:24:38 -0700 Brian <dol-sen@telus.net> wrote:
> | I have changed porthole's code to try decode(utf_8), then try
> | decode(iso-8859-1). Failing either of those Porthole will display
> | an unknown encoding error and to please report it to bugs.gentoo.org
> | as well as porthole's bug tracker.
>
> Since there's no documented standard... Could one of you portage
> people please post a message to -dev saying "as of now, use UTF-8 for
> ebuilds and changelogs if at all possible"? I can update
> app-vim/gentoo-syntax and the default vimrc to encourage fileencoding
> to be utf-8 for these things...
No problem for ChangeLogs, but for ebuilds we probably need to
distinguish between the portage-visible parts and the normal bash stuff,
as portage itself can only handle ASCII internally.
Marius
* [gentoo-portage-dev] changelog encoding
@ 2004-10-08 4:19 Brian
2004-10-08 12:31 ` Marius Mauch
2004-10-08 17:34 ` George Shapovalov
0 siblings, 2 replies; 11+ messages in thread
From: Brian @ 2004-10-08 4:19 UTC
To: gentoo-portage-dev
What is the official encoding method(s) for the changelogs? It has
been reported that porthole often fails getting the changelogs due to
the encoding. Currently it is assuming ascii. Many are reported to be
iso-8859-1.
--
Brian <dol-sen@telus.net>
* Re: [gentoo-portage-dev] changelog encoding
2004-10-08 4:19 Brian
@ 2004-10-08 12:31 ` Marius Mauch
2004-10-08 17:34 ` George Shapovalov
1 sibling, 0 replies; 11+ messages in thread
From: Marius Mauch @ 2004-10-08 12:31 UTC
To: gentoo-portage-dev; +Cc: ciaranm
On 10/07/04 Brian wrote:
> What is the official encoding method(s) for the changelogs? It has
> been reported that porthole often fails getting the changelogs due to
> the encoding. Currently it is assuming ascii. Many are reported to
> be iso-8859-1.
I don't think we have an official encoding, but I think ciaranm knows a
bit more about that issue.
Marius
* Re: [gentoo-portage-dev] changelog encoding
2004-10-08 4:19 Brian
2004-10-08 12:31 ` Marius Mauch
@ 2004-10-08 17:34 ` George Shapovalov
1 sibling, 0 replies; 11+ messages in thread
From: George Shapovalov @ 2004-10-08 17:34 UTC
To: gentoo-portage-dev
Well, all the docs have got to be UTF-8, and that has been official since that
big doc reorganization 2+ years ago :). This would be a good precedent for
applying the same policy to the rest of the doc-like and even generic stuff
(why not even ebuilds). Besides, while I don't think this was "officially
stamped" yet, I seem to remember UTF-8 being mentioned in extended
metadata/ChangeLog discussions...
George
On Thursday 07 October 2004 21:19, Brian wrote:
> What is the official encoding method(s) for the changelogs? It has
> been reported that porthole often fails getting the changelogs due to
> the encoding. Currently it is assuming ascii. Many are reported to be
> iso-8859-1.