[gentoo-user] Kernel update messed up console encoding

public inbox for gentoo-user@lists.gentoo.org

* [gentoo-user] Kernel update messed up console encoding
@ 2009-02-27 17:29 Florian v. Savigny
  2009-02-27 21:05 ` Sebastian Günther
  0 siblings, 1 reply; 14+ messages in thread
From: Florian v. Savigny @ 2009-02-27 17:29 UTC
  To: gentoo-user

Dear listmates,

(I did try to use a more specific mailing list, and tried
gentoo-admin, but it seems there's nobody around.)

I recently updated my kernel from 2.6.17 to 2.6.27, and it seems that
the new kernel causes the encoding of the console to behave weirdly:

I used to use the default Unix encoding, i.e. iso-8859-1, because this
was fine for German (now I want to stick to it because I have so much
legacy material in that encoding).  Now, when I type a string with
non-ASCII characters on the command line, it looks normal, but when I
redirect this to a file, the file command identifies the contents of
that file (correctly, it seems to me) as UTF-8. When I boot the old
kernel (which I kept), the same procedure results in a file identified
as iso-8859-1 (and with accordingly fewer bytes). Here are the
contents (the same sentence):

Kernel 2.6.17:

"Ich kann es außerdem nicht ändern"

Kernel 2.6.27:

"Ich kann es auÃŸerdem nicht Ã¤ndern"

I grepped the .config files for any options that might have a bearing
on this. The only difference I found was in the first of these four
lines:

linux-2.6.17:

# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

linux-2.6.27

CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

So I set $CONFIG_NLS_ASCII differently for the new kernel. But as far
as I understand, these refer to the handling of file names (it's in
the section "file systems"), and only specify what is supported, so I
don't see how this could have an effect on console encoding.

The only thing I am dead sure about is that the kernel itself must be
the culprit, because when I boot the old kernel, this behaviour goes
away. There is absolutely no change in the system otherwise. (The
$UNICODE variable in /etc/rc.conf is set to "no".)

Can anyone give me a hint where to look what I have messed up? Emacs,
which I sometimes like to use on the console, is particularly
uncomfortable with this, and I seem to write confusing e-mails.

Many thanks in advance for any hint,

Florian

* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-27 17:29 [gentoo-user] Kernel update messed up console encoding Florian v. Savigny
@ 2009-02-27 21:05 ` Sebastian Günther
  2009-02-28 10:34   ` Florian v. Savigny
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Günther @ 2009-02-27 21:05 UTC
  To: gentoo-user

* Florian v. Savigny (lorian@fsavigny.de) [27.02.09 18:30]:
> [...]

Generally speaking: switch to UTF-8! There are many tools that can
convert your files automatically.

To your issue:

Well, there is still /etc/conf.d/consolefont, which could mess things up.
Or the locales...

But the different behavior of the two kernels is strange...
Is CONFIG_NLS_DEFAULT different between the two kernels? Maybe it's also
related to the kernel's built-in keymap...

Maybe you should try the gentoo-user-de list; maybe there is someone
who ran into the same problem...

HTH
Sebastian

-- 
 " Religion ist das Opium des Volkes. "      Karl Marx

 SEB@STI@N GÜNTHER         mailto:samson@guenther-roetgen.de

* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-27 21:05 ` Sebastian Günther
@ 2009-02-28 10:34   ` Florian v. Savigny
  2009-02-28 11:34     ` Eray Aslan
  2009-02-28 14:26     ` Sebastian Günther
  0 siblings, 2 replies; 14+ messages in thread
From: Florian v. Savigny @ 2009-02-28 10:34 UTC
  To: gentoo-user

Dear Sebastian,

thank you for your thoughts. I am afraid switching to UTF-8 for
everything, although I see that this is the sound thing to do
eventually, is not currently an option for me - there are far too many
things which depend on that.  (Also, it would tend to obscure or
complicate the problem rather than fix it, since Emacs obviously gets
confused by the console behaviour).

> there still is /etc/conf.d/consolefont that could mess up things

The only variable that's set there is CONSOLEFONT="cp1250". I would
not understand how the font could have an influence on the characters
*produced* by the console, and it seems also difficult to explain why
the shell and Emacs, which of course use the same console font, behave
differently. (Under the shell, it looks fine while you type it,
i.e. you cannot tell that your u umlaut actually consists of two
bytes. But Emacs displays the lower-case umlauts followed by a space
(i.e. two characters, but not those that most of us are probably quite
familiar with, i.e. which you see when UTF-8 is displayed as if it
were ASCII), while for upper-case umlauts and the eszett complains
that e.g. "\204 is undefined".)

It definitely looks to me as if the core of the problem is what the
console produces, not what it shows, i.e. what a keypress
produces. The variable CONSOLETRANSLATION is commented out, meaning I
am using the "default one", whichever that is.

As to the locale, where can I look that up ... ? I seem to remember I
purposely use no locale (or "C", I think), but I don't remember where
I set that.

CONFIG_NLS_DEFAULT is indeed different for the two kernels, but not in
a way that seems to explain anything, as those two encodings differ
only on a few positions (not umlauts or eszett):

linux-2.6.17-gentoo-r7:	"iso8859-15"
linux-2.6.27-gentoo-r8:	"iso8859-1"

Also, I think what I said last time holds: that only applies to
filenames in the filesystem, doesn't it?

I'll follow your suggestion and re-post the problem on gentoo-user-de,
although I think running into that sort of problem might happen to
anybody who uses a European language other than English (one of those
covered by iso-8859-1, more precisely), so comments here are still
welcome! But who still sometimes uses the console, except me?

I think I'll also write a small script that compares the settings in
the two kernel .configs systematically. Could also be of use for later
kernel updates ...
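
Such a comparison script could look roughly like this. It is a minimal
sketch, assuming a POSIX shell with sed, sort, diff and mktemp; the
function names normalize_config and config_diff are made up for
illustration, and the paths in the usage comment are only an assumption
about where the kernel sources live.

```shell
# normalize_config FILE
# Print one "NAME=value" line per kernel option, sorted by name.
# "# CONFIG_FOO is not set" is rewritten to "CONFIG_FOO=n" so that
# switched-off options take part in the comparison.
normalize_config() {
    sed -n \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/p' \
        -e '/^CONFIG_[A-Za-z0-9_]*=/p' "$1" | sort
}

# config_diff OLD NEW
# Show every option whose value differs or that exists in only one file.
# Lines starting with "<" come from OLD, lines starting with ">" from NEW.
config_diff() {
    old_tmp=$(mktemp) && new_tmp=$(mktemp) || return 1
    normalize_config "$1" > "$old_tmp"
    normalize_config "$2" > "$new_tmp"
    diff "$old_tmp" "$new_tmp" || true   # diff exits 1 when the files differ
    rm -f "$old_tmp" "$new_tmp"
}

# Usage (hypothetical paths):
#   config_diff /usr/src/linux-2.6.17-gentoo-r7/.config \
#               /usr/src/linux-2.6.27-gentoo-r8/.config
```

Because both files are normalized and sorted first, options that merely
moved to a different place in .config do not show up as differences.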

Thanks very much!

Florian

* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-28 10:34   ` Florian v. Savigny
@ 2009-02-28 11:34     ` Eray Aslan
  2009-02-28 14:26     ` Sebastian Günther
  1 sibling, 0 replies; 14+ messages in thread
From: Eray Aslan @ 2009-02-28 11:34 UTC
  To: gentoo-user

On 28.02.2009 12:34, Florian v. Savigny wrote:
[...]
> I'll follow your suggestion and re-post the problem on gentoo-user-de,
> although I think running into that sort of problem might happen to
> anybody who uses a European language other than English (one of those
> covered by iso-8859-1, more precisely), so comments here are still
> welcome! But who still sometimes uses the console, except me?

A lot of people use the console.  I certainly do.  But I, and I would
assume the majority of console users, switched to UTF-8 quite some time ago,
as was suggested earlier in the thread.  Hence the lack of useful advice.

Good luck.
-- 
Eray



* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-28 10:34   ` Florian v. Savigny
  2009-02-28 11:34     ` Eray Aslan
@ 2009-02-28 14:26     ` Sebastian Günther
  2009-02-28 17:38       ` Florian v. Savigny
  1 sibling, 1 reply; 14+ messages in thread
From: Sebastian Günther @ 2009-02-28 14:26 UTC
  To: gentoo-user

* Florian v. Savigny (lorian@fsavigny.de) [28.02.09 11:35]:
> 
> 
> Dear Sebastian,
> 
> 
> > there still is /etc/conf.d/consolefont that could mess up things
> 
> The only variable that's set there is CONSOLEFONT="cp1250". I would
> not understand how the font could have an influence on the characters
> *produced* by the console, and it seems also difficult to explain why
> the shell and Emacs, which of course use the same console font, behave
> differently. (Under the shell, it looks fine while you type it,
> i.e. you cannot tell that your u umlaut actually consists of two
> bytes. But Emacs displays the lower-case umlauts followed by a space
> (i.e. two characters, but not those that most of us are probably quite
> familiar with, i.e. which you see when UTF-8 is displayed as if it
> were ASCII), while for upper-case umlauts and the eszett complains
> that e.g. "\204 is undefined".)
> 
What does file say about the offending files?
Emacs always uses the encoding of the file, whereas a redirect uses
the locale, iirc.

I assume you know the options->mule menu in emacs, there is a lot to 
help with encoding issues...

> As to the locale, where can I look that up ... ? I seem to remember I
> purposely use no locale (or "C", I think), but I don't remember where
> I set that.
> 
.bashrc

> 
> Thanks very much!
> 
> Florian
> 
> 

Sebastian

-- 
 " Religion ist das Opium des Volkes. "      Karl Marx

 SEB@STI@N GÜNTHER         mailto:samson@guenther-roetgen.de

* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-28 14:26     ` Sebastian Günther
@ 2009-02-28 17:38       ` Florian v. Savigny
  2009-02-28 18:48         ` Sebastian Günther
  0 siblings, 1 reply; 14+ messages in thread
From: Florian v. Savigny @ 2009-02-28 17:38 UTC
  To: gentoo-user

Hi Sebastian,

  > > But Emacs displays the lower-case umlauts followed by a space
  > > etc. etc. ...

  > what does file say about the offending files?

I was not actually talking about files when I mentioned Emacs, but
what I see when I *type* into Emacs (such as in this mail
message). But in case you mean what that produces when I save the
result of what I typed into a file, I ran a few tests, and the results
were mixed:

For the 3 lower-case umlauts, file reports UTF-8, consistent with the
number of bytes (i.e. the file length): 3 characters, 6 bytes. The hex
representation of the 6 bytes is: c3 a4 c3 b6 c3 bc.

For the three upper-case umlauts and for the eszett, file reports
iso-8859, also consistent with the number of bytes: 3 characters, 3
bytes. The code position is, however, definitely wrong: it is always
hex c3 (which would be the upper-case A tilde in iso-8859-1, and four
different letters can hardly have the same code position.)

To me this looks as if Emacs puts the first half of the byte sequences
(always the hex c3) into the buffer, while trying to interpret the
other half (see list below) as a command: it will say something like
"\204 is undefined". I am quite certain \nnn is an octal number.

eszett: \237 (hex 9f, dec 159)
A uml: \204 (hex 84, dec 132)
O uml: \226 (hex 96, dec 150)
U uml: \234 (hex 9c, dec 156)

If I am right, the keys thus send:

eszett: c3 9f
A uml: c3 84
O uml: c3 96
U uml: c3 9c
a uml: c3 a4
o uml: c3 b6
u uml: c3 bc

I would assume that these sequences are the UTF-8 representation of
the respective characters (but I don't have a table to figure that
out).
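
The assumption can be checked without a table, because the two-byte
UTF-8 form follows directly from the code point: the first byte is 0xC0
plus the upper bits, the second is 0x80 plus the lower six bits. Here is
a small sketch in POSIX shell arithmetic. The function name
utf8_two_bytes is only illustrative, and it handles code points U+0080
to U+07FF only.

```shell
# utf8_two_bytes CODEPOINT
# Print, in hex, the two UTF-8 bytes for a code point in the range
# U+0080..U+07FF (bit layout: 110xxxxx 10xxxxxx).
utf8_two_bytes() {
    cp=$(( $1 ))
    printf '%02x %02x\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
}

# The ISO-8859-1 positions of the German letters are also their
# Unicode code points, so they can be fed in directly.
for entry in eszett:0xDF A_uml:0xC4 O_uml:0xD6 U_uml:0xDC \
             a_uml:0xE4 o_uml:0xF6 u_uml:0xFC; do
    printf '%-7s %s\n' "${entry%%:*}" "$(utf8_two_bytes "${entry##*:}")"
done
```

For the upper-case umlauts and the eszett this gives c3 followed by 84,
96, 9c and 9f, which are the octal values \204, \226, \234 and \237
reported by Emacs. The lower-case umlauts come out as c3 a4, c3 b6 and
c3 bc.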

Sorry if the whole thing was difficult to follow. I should perhaps have
mentioned that for the upper-case umlauts and the eszett, Emacs not
only complains, but also inputs an "unknown" character into the
buffer, represented by a '?' in reverse video. That's apparently the
hex c3 byte.

  > Emacs always uses the encoding of the file, whereas a redirect
  > uses the locale, iirc.

I know; normally it can figure it out - I think this ability is not
compromised in any way (I can e.g. open an XML file encoded in utf-8,
and will see "11u" in the mode line). Also, please note that under X,
Emacs behaves completely as before.

By "redirect", you mean shell redirection?  Does that do any character
conversion?

  > I assume you know the options->mule menu in emacs, there is a lot to
  > help with encoding issues...

Yes, I know, but I don't see how set-input-method would fix this. Do you?

  > > As to the locale, where can I look that up ... ?
  > .bashrc

Neither ~/.bashrc nor /etc/bash/bashrc contain any locale setting
... hmm.

But very frankly, would the solution not focus on the kernel, at least
partly? As I said, I can reverse the phenomenon by simply booting the
old kernel!

Does nobody know where the kernel controls what the keys of the
console keyboard send when pressed?

(BTW, KEYMAP="de-latin1-nodeadkeys", in /etc/conf.d/keymaps.)

Regards, Florian

* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-28 17:38       ` Florian v. Savigny
@ 2009-02-28 18:48         ` Sebastian Günther
  2009-03-01  9:36           ` Florian v. Savigny
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Günther @ 2009-02-28 18:48 UTC
  To: gentoo-user

* Florian v. Savigny (lorian@fsavigny.de) [28.02.09 18:39]:
> 
> Hi Sebastian,
> 
>   > > But Emacs displays the lower-case umlauts followed by a space
>   > > etc. etc. ...
> 
>   > what does file say about the offending files?
> 
> I was not actually talking about files when I mentioned Emacs, but
> what I see when I *type* into Emacs (such as in this mail
> message). But in case you mean what that produces when I save the
> result of what I typed into a file, I ran a few tests, and the results
> were mixed:
> 
> For the 3 lower-case umlauts, file reports UTF-8, consistent with the
> number of bytes (i.e. the file length): 3 characters, 6 bytes. The hex
> representation of the 6 bytes is: c3 a4 c3 b6 c3 bc.
> 
> For the three upper-case umlauts and for the eszett, file reports
> iso-8859, also consistent with the number of bytes: 3 characters, 3
> bytes. The code position is, however, definitely wrong: it is always
> hex c3 (which would be the upper-case A tilde in iso-8859-1, and four
> different letters can hardly have the same code position.)
> 
> To me this looks as if Emacs puts the first half of the byte sequences
> (always the hex c3) into the buffer, while trying to interpret the
> other half (see list below) as a command: it will say something like
> "\204 is undefined". I am quite certain \nnn is an octal number.
> 
> eszett: \237 (hex 9f, dec 159)
> A uml: \204 (hex 84, dec 132)
> O uml: \226 (hex 96, dec 150)
> U uml: \234 (hex 9c, dec 156)
> 
> If I am right, the keys thus send:
> 
> eszett: c3 9f
> A uml: c3 84
> O uml: c3 96
> U uml: c3 9c
> a uml: c3 a4
> o uml: c3 b6
> u uml: c3 bc
> 
> I would assume that these sequences are the UTF-8 representation of
> the respective characters (but I don't have a table to figure that
> out).
> 
> Sorry if the whole thing was difficult to follow. I should perhaps have
> mentioned that for the upper-case umlauts and the eszett, Emacs not
> only complains, but also inputs an "unknown" character into the
> buffer, represented by a '?' in reverse video. That's apparently the
> hex c3 byte.
> 
That is a problem of the consolefont, since the console can't display it 
with cp1250...


>   > Emacs always uses the encoding of the file, whereas a redirect
>   > uses the locale, iirc.
> 
> I know; normally it can figure it out - I think this ability is not
> compromised in any way (I can e.g. open an XML file encoded in utf-8,
> and will see "11u" in the mode line). Also, please note that under X,
> Emacs behaves completely as before.
> 
> By "redirect", you mean shell redirection?  Does that do any character
> conversion?

yes.

echo "äöüÄÖÜß" > console.test
then write the same in emacs and save as emacs.test.

And then compare the output of

file console.test
and
file emacs.test

If there are differences, somewhere here lies the problem.
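
When comparing the two files, it can help to look at the raw bytes as
well as at what file reports. Note that the shell's > redirection only
passes bytes through and does not convert between encodings, so
console.test contains exactly what the terminal delivered. This is a
minimal sketch, assuming od is available. The helper name hex_bytes is
made up, and the demo files are created in a temporary directory.

```shell
# hex_bytes FILE
# Print the file's bytes as space-separated hex pairs on one line.
hex_bytes() {
    # Unquoted on purpose: word splitting collapses od's padding and
    # line breaks. -v keeps od from abbreviating repeated lines as "*".
    echo $(od -An -v -tx1 "$1")
}

# Demo with the two possible encodings of a lower-case a-umlaut.
demo_dir=$(mktemp -d)
printf '\344\n'     > "$demo_dir/latin1.test"   # ISO-8859-1: one byte
printf '\303\244\n' > "$demo_dir/utf8.test"     # UTF-8: two bytes
hex_bytes "$demo_dir/latin1.test"   # prints: e4 0a
hex_bytes "$demo_dir/utf8.test"     # prints: c3 a4 0a
rm -r "$demo_dir"
```

Running hex_bytes on console.test and emacs.test shows directly whether
each umlaut was stored as one byte or as a c3-prefixed pair.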

> 
>   > I assume you know the options->mule menu in emacs, there is a lot to
>   > help with encoding issues...
> 
> Yes, I know, but I don't see how set-input-method would fix this. Do you?
> 
No, but set-coding-system for saving the file might help to achieve the
right encoding.

>   > > As to the locale, where can I look that up ... ?
>   > .bashrc
> 
> Neither ~/.bashrc nor /etc/bash/bashrc contain any locale setting
> ... hmm.

locale
should show it to you

> 
> But very frankly, would the solution not focus on the kernel, at least
> partly? As I said, I can reverse the phenomenon by simply booting the
> old kernel!
> 
> Does nobody know where the kernel controls what the keys of the
> console keyboard send when pressed?
> 
> (BTW, KEYMAP="de-latin1-nodeadkeys", in /etc/conf.d/keymaps.)

Exactly there.

> 
> Regards, Florian
> 
> 
> 

Sebastian

-- 
 " Religion ist das Opium des Volkes. "      Karl Marx

 SEB@STI@N GÜNTHER         mailto:samson@guenther-roetgen.de

* Re: [gentoo-user] Kernel update messed up console encoding
  2009-02-28 18:48         ` Sebastian Günther
@ 2009-03-01  9:36           ` Florian v. Savigny
  2009-03-01 10:30             ` [gentoo-user] " Nikos Chantziaras
  0 siblings, 1 reply; 14+ messages in thread
From: Florian v. Savigny @ 2009-03-01  9:36 UTC
  To: gentoo-user

Hi Sebastian,

  > That is a problem of the consolefont, since the console can't display it 
  > with cp1250...

Maybe - if this font has codepage 1250, as one would assume, it should
normally display a capital A with a short accent (I think that's a
slavonic letter) in position hex c3. True, that is different from the
capital A tilde it should have in iso-8859-1. But this is hardly the
heart of the matter: the c3 shouldn't be there in the first place.

  > echo "äöüÄÖÜß" > console.test
  > then write the same in emacs and save as emacs.test.
  > 
  > And then compare the output of
  > 
  > file console.test
  > and
  > file emacs.test
  > 
  > If there are differences, somewhere here lies the problem

But I have already described the result of the first procedure in my
first posting (UTF-8 when echoed under the new kernel, iso-8859-1 when
echoed under the old kernel) and the result of the second one - IN
DETAIL - in my last posting (too long to repeat; see there), which I
assume you have read. Have I missed something?

  > locale 
  > should show it to you

Thanks. $LANG and $LC_ALL are not set (i.e. locale simply shows
"LANG=" and "LC_ALL=" with no values). All other LC_... variables are
set to "POSIX".

  > > Does nobody know where the kernel controls what the keys of the
  > > console keyboard send when pressed?
  > > 
  > > (BTW, KEYMAP="de-latin1-nodeadkeys", in /etc/conf.d/keymaps.)
  > 
  > Exactly there.

Could you explain that, please (do you perhaps mean "this is where the
kernel's behaviour IS CONTROLLED")? As I have repeatedly said, all
variable settings are of course the same under both kernels, so both
definitely behave differently with the same settings.

Regards,
Florian

PS: Just one thing: do you think you could cite only those portions of
postings that you are replying to? Having to wade through tons of
cited material to find any replies is quite hard on the eyes,
especially when understanding one another seems to be difficult.

* [gentoo-user]  Re: Kernel update messed up console encoding
  2009-03-01  9:36           ` Florian v. Savigny
@ 2009-03-01 10:30             ` Nikos Chantziaras
  2009-03-01 12:25               ` Florian v. Savigny
  0 siblings, 1 reply; 14+ messages in thread
From: Nikos Chantziaras @ 2009-03-01 10:30 UTC
  To: gentoo-user

Florian v. Savigny wrote:
>   > locale 
>   > should show it to you
> 
> Thanks. $LANG and $LC_ALL are not set (i.e. locale simply shows
> "LANG=" and "LC_ALL=" with no values). All other LC_... variables are
> set to "POSIX".

I don't think that will work.  Here, locale says:

   LANG=en_US.UTF-8
   LC_CTYPE="en_US.UTF-8"
   LC_NUMERIC="en_US.UTF-8"
   LC_TIME="en_US.UTF-8"
   LC_COLLATE="en_US.UTF-8"
   LC_MONETARY="en_US.UTF-8"
   LC_MESSAGES="en_US.UTF-8"
   LC_PAPER="en_US.UTF-8"
   LC_NAME="en_US.UTF-8"
   LC_ADDRESS="en_US.UTF-8"
   LC_TELEPHONE="en_US.UTF-8"
   LC_MEASUREMENT="en_US.UTF-8"
   LC_IDENTIFICATION="en_US.UTF-8"
   LC_ALL=en_US.UTF-8

So I suppose you need something like "de_DE.ISO-8859-15@euro".  You need 
only set LANG and LC_ALL.  The rest is derived automatically from
those two.

To do this, edit the file /etc/env.d/02locale.  There should be only two 
lines in it:

   LC_ALL="de_DE.ISO-8859-15@euro"
   LANG="de_DE.ISO-8859-15@euro"

Substitute "ISO-8859-15" with whatever you're using.  After editing, run 
"env-update" as root.  Reboot (just to make sure) and try again.

I really recommend UTF-8 though:

   LC_ALL="en_US.UTF-8"
   LANG="en_US.UTF-8"
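
To see which of these variables actually decides the character set for
a given user, the POSIX precedence can be spelled out: LC_ALL overrides
everything, then comes the specific LC_* variable (LC_CTYPE for
character handling), then LANG. This is a minimal sketch, and
effective_ctype is a made-up helper name.

```shell
# effective_ctype
# Print the locale name that decides character handling, following the
# POSIX precedence LC_ALL > LC_CTYPE > LANG. If none of them is set,
# the default locale is "POSIX".
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-POSIX}"
    fi
}

effective_ctype
```

Running it as each user shows which locale that user's shell ends up
with.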




* Re: [gentoo-user]  Re: Kernel update messed up console encoding
  2009-03-01 10:30             ` [gentoo-user] " Nikos Chantziaras
@ 2009-03-01 12:25               ` Florian v. Savigny
  2009-03-01 12:48                 ` Nikos Chantziaras
  0 siblings, 1 reply; 14+ messages in thread
From: Florian v. Savigny @ 2009-03-01 12:25 UTC
  To: gentoo-user

Hi Nikos,

  > > $LANG and $LC_ALL are not set (i.e. locale simply shows
  > > "LANG=" and "LC_ALL=" with no values). All other LC_... variables are
  > > set to "POSIX".
  > 
  > I don't think that will work.

Interestingly, I just discovered the locales are different for one
user (who has "de_DE.iso-8859-1" for all variables (including LANG)
except LC_ALL, which is empty). For the other users, the locales are
as above, and it is this way no matter which kernel is running. But
the console does not behave differently for the two users, but
differently for the two kernels (i.e. identically for both users).

So, the bottom line is: that is apparently not the heart of the
problem either, as the setting cited above DOES work under my kernel
2.6.17. But thanks for having me discover the user-specific locale
settings! I wasn't aware of that.

A user who said he was too lazy to subscribe to the list (which is a
loss for the list, I think) gave me the tip that passing the kernel
the parameter "default_utf8=0" should reverse that behaviour. While
the kernel does know the parameter, it did not change the
behaviour. But he also said that the command kbd_mode can change the
behaviour of the keyboard, and indeed it can:

kbd_mode -a

sets the behaviour to single bytes, i.e. the keys send single bytes,
while

kbd_mode -u

sets it to sending one, two, or three bytes, depending on what UTF-8
requires. 
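
To make the difference concrete, here is a minimal sketch, assuming a
POSIX shell with printf and wc, that counts the bytes the test sentence
from the first message occupies in each mode. The octal escapes spell
out the bytes explicitly, so the result does not depend on the current
console mode. The variable names are only illustrative.

```shell
# Single-byte (ISO-8859-1) form: eszett = \337 (0xdf), a-umlaut = \344 (0xe4)
latin1_len=$(printf 'Ich kann es au\337erdem nicht \344ndern' | wc -c | tr -d ' ')

# UTF-8 form: eszett = \303\237 (c3 9f), a-umlaut = \303\244 (c3 a4)
utf8_len=$(printf 'Ich kann es au\303\237erdem nicht \303\244ndern' | wc -c | tr -d ' ')

echo "single-byte: ${latin1_len} bytes, UTF-8: ${utf8_len} bytes"
```

The two non-ASCII letters account for the difference: each is one byte
in single-byte mode and two bytes in UTF-8.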

kbd_mode without a parameter outputs the current status, and this is
indeed different after booting the two kernels: as expected, for the
old kernel, it is "The keyboard is in ASCII mode", for the new one
"The keyboard is in Unicode (UTF-8) mode" (the documentation explains
that "ASCII" is misleading; it is indeed "single-byte", and fine for
all iso-8859- encodings).

After saying "kbd_mode -a" under the new kernel, I can now produce
ISO-8859-1-encoded files with Emacs or the shell. I haven't worked out
how to get the screen to display them correctly, however (as it does
under the old kernel). The unsubscribed user told me two magic escape
sequences, but I have yet to see how to type them correctly. ('ESC (
K' to switch to single-byte mode, 'ESC % G' to switch to utf-8
mode). "console" covers both keyboard and screen, he
explained. (Sounds familiar, but I thought it would not hurt to repeat
it here.)

But still, I am wondering how to get the new kernel to behave as I
want out of the box. My best guess is now that this console behaviour
has become the default at some point between kernels 2.6.17 and
2.6.27, and that you now have to switch it off explicitly. But how?

Regards,
Florian

* [gentoo-user]  Re: Kernel update messed up console encoding
  2009-03-01 12:25               ` Florian v. Savigny
@ 2009-03-01 12:48                 ` Nikos Chantziaras
  2009-03-02  1:01                   ` Florian v. Savigny
  0 siblings, 1 reply; 14+ messages in thread
From: Nikos Chantziaras @ 2009-03-01 12:48 UTC
  To: gentoo-user

Florian v. Savigny wrote:
> [...]
> But still, I am wondering how to get the new kernel to behave as I
> want out of the box. My best guess is now that this console behaviour
> has become the default at some point between kernels 2.6.17 and
> 2.6.27, and that you now have to switch it off explicitly. But how?

Maybe the commands "unicode_start" and "unicode_stop" might help.




* Re: [gentoo-user]  Re: Kernel update messed up console encoding
  2009-03-01 12:48                 ` Nikos Chantziaras
@ 2009-03-02  1:01                   ` Florian v. Savigny
  2009-03-02 11:29                     ` Nikos Chantziaras
  0 siblings, 1 reply; 14+ messages in thread
From: Florian v. Savigny @ 2009-03-02  1:01 UTC
  To: gentoo-user

Hi Nikos,

  > Maybe the commands "unicode_start" and "unicode_stop" might help.

Bull's eye! "unicode_stop" reverses the behavior completely to what
the old kernel did.

I looked inside; both are actually shell scripts; unicode_stop is very
simple: 

  kbd_mode -a
  if test -t ; then
	echo -n -e '\033%@'
  fi

unicode_start does a little more (also change the keyboard mapping and
choose a unicode font), but it also contains

  kbd_mode -u

and 

  if test -t 1 -a -t 2 ; then
	  echo -n -e '\033%G'
  fi

So the escape sequences are 'ESC % @' and 'ESC % G'. Thanks very much
for this collaborative effort!
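
For reference, the two sequences can be written out byte by byte, and
the status line of kbd_mode can be tested before switching anything.
This is a minimal sketch under stated assumptions: the status texts are
the ones quoted earlier in this thread, keyboard_is_utf8 is a made-up
helper name, and the last comment is meant for a real virtual console
and is not executed here.

```shell
# The two sequences, dumped as hex to make the individual bytes visible.
# (%% is how printf spells a literal percent sign.)
leave_utf8=$(printf '\033%%@' | od -An -tx1 | tr -d ' \n')   # ESC % @
enter_utf8=$(printf '\033%%G' | od -An -tx1 | tr -d ' \n')   # ESC % G
echo "ESC % @ -> ${leave_utf8}   ESC % G -> ${enter_utf8}"

# keyboard_is_utf8 STATUS
# Succeed if STATUS, the line printed by kbd_mode without arguments,
# reports Unicode mode.
keyboard_is_utf8() {
    case $1 in
        *"Unicode (UTF-8)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a real virtual console the check could be used like this:
#   if keyboard_is_utf8 "$(kbd_mode)"; then unicode_stop; fi
```

Testing first avoids sending escape sequences to a console that is
already in single-byte mode.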

Simultaneously, the unnamed user (sorry, I just forgot to ask whether
he minds being named or not) told me to try the kernel parameter
"vt.default_utf8=0", and that does the trick as well. So the smoothest
workaround will now be putting that into lilo.conf (yes, I know, I'm
hopelessly old-fashioned - old encodings, old bootloaders ... ;-)).
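
Whether the parameter actually reached the kernel can be checked after
booting. This is a minimal sketch, assuming a Linux system where the
booted command line is readable from /proc/cmdline. The function name
default_utf8_from_cmdline and the sample command line are made up for
illustration.

```shell
# default_utf8_from_cmdline CMDLINE
# Print the value given to vt.default_utf8 in a kernel command line,
# or print nothing if the parameter is absent.
default_utf8_from_cmdline() {
    for word in $1; do      # unquoted on purpose: split on whitespace
        case $word in
            vt.default_utf8=*) printf '%s\n' "${word#vt.default_utf8=}" ;;
        esac
    done
}

# On a running system:
#   default_utf8_from_cmdline "$(cat /proc/cmdline)"
# Demo with a made-up command line:
default_utf8_from_cmdline 'root=/dev/hda3 ro vt.default_utf8=0'   # prints 0
```

An empty result means the boot loader did not pass the parameter, in
which case the kernel's built-in default applies.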

I think I'll continue on a kernel list to figure out what kernel
2.6.27 does differently from 2.6.17, and why (and whether that
behaviour cannot be changed with a compile-time option). I think that
part is really not a gentoo-specific question. But I'll report here
when I get the result!

Best regards!
Florian

* [gentoo-user]  Re: Kernel update messed up console encoding
  2009-03-02  1:01                   ` Florian v. Savigny
@ 2009-03-02 11:29                     ` Nikos Chantziaras
  2009-03-02 12:51                       ` Florian v. Savigny
  0 siblings, 1 reply; 14+ messages in thread
From: Nikos Chantziaras @ 2009-03-02 11:29 UTC
  To: gentoo-user

Florian v. Savigny wrote:
> [...]
> I think I'll continue on a kernel list to figure out what kernel
> 2.6.27 does differently from 2.6.17, and why (and whether that
> behaviour cannot be changed with a compile-time option). I think that
> part is really not a gentoo-specific question. But I'll report here
> when I get the result!

On my /etc/rc.conf, there's this:

   # Set unicode to YES to turn on unicode support for keyboards
   # and screens.
   unicode="YES"

So I suppose maybe simply changing this to "NO" will do the job.  I'm
on OpenRC now though so maybe it looks different on older baselayout.

Try "grep -ri unicode /etc" and see what you find.




* Re: [gentoo-user]  Re: Kernel update messed up console encoding
  2009-03-02 11:29                     ` Nikos Chantziaras
@ 2009-03-02 12:51                       ` Florian v. Savigny
  0 siblings, 0 replies; 14+ messages in thread
From: Florian v. Savigny @ 2009-03-02 12:51 UTC
  To: gentoo-user

Hi Nikos,

  > On my /etc/rc.conf, there's this:
  > 
  >    # Set unicode to YES to turn on unicode support for keyboards
  >    # and screens.
  >    unicode="YES"

It's set to "no" on my machine (I already posted this; this was the
first thing outside the kernel that I considered, I think). (I haven't
yet posted that I use sys-apps/baselayout-1.12.11.1, though - not sure
how this relates to the OpenRC you are mentioning.)

  > So I suppose maybe simply changing this to "NO" will do the job.

Curiously, it does not, even though it seems supposed to do it, using
the very mechanisms we already discussed (kbd_mode and console escape
sequences). It's a little strange:

  > Try "grep -ri unicode /etc" and see what you find.

Doing this, I found out that /etc/runlevels/boot/keymaps and
/etc/init.d/keymaps do use this variable, but do so for setting the
keyboard encoding only if it's set to "yes". In other words, if the
kernel starts up with 8-bit encoding for the console, these scripts (I
don't know which one, perhaps both - they seem to do the same thing in
this respect) will switch to unicode for the keyboard, but not the
other way round (i.e. the if statement "if [[ ${UNICODE} == 'yes' ]]"
has no else part, so if $UNICODE has a different value, such as 'no',
it is simply ignored, and nothing happens). For the terminal encoding,
however, the scripts seem to act both ways (the if statement does have
an else part). Strange, to me (or am I overlooking something?).

(I'm not sure, BTW, whether the double '=' is a gentoo peculiarity,
nor whether this kind of string comparison is case-insensitive. But in
any case, the scripts only test for "yes", in lower case, so anything
else should effectively mean "no".)
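
On the two open points: [[ ... ]] is a bash construct (also found in
ksh and zsh), not a Gentoo peculiarity. Inside it, == means the same as
=, and the match is case-sensitive unless the nocasematch shell option
is set. A portable, case-insensitive check could be sketched like this,
assuming tr is available. The name is_yes is made up and is not part of
the Gentoo init scripts.

```shell
# is_yes VALUE
# Succeed if VALUE is "yes" in any mix of upper and lower case.
is_yes() {
    lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    [ "$lowered" = "yes" ]
}

for value in yes YES no ""; do
    if is_yes "$value"; then
        echo "'$value' counts as yes"
    else
        echo "'$value' counts as no"
    fi
done
```

With a helper like this, a setting such as unicode="YES" and a script
that tests for "yes" would agree regardless of capitalisation.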

Best regards,

Florian
