Improved UTF-8 support for the Linux kernel
Note: this site currently focuses mainly on Linux kernel 2.6.x.
I have backported only one of the patches to 2.4 so far.
Introduction
The Linux kernel has some support for UTF-8 keyboard input, but it has
some gaping holes that make it awkward to use, especially when using the
console. I have been working on some patches to improve the situation.
My current goal is to have no regressions. That is: I want to be able to
do everything in UTF-8 mode that is possible to do in non-Unicode mode. This
in itself doesn't give you any incentive to switch to using Unicode. It simply
removes the disincentives.
After that, my next goal will be to write a new input method, since the
existing input method for extended Latin characters is very 8-bit centric,
and is not easily extended to cover the wide range of Unicode characters
available.
Patches
- p1_conv_8bit_to_uni.patch: This patch makes the keyboard
convert all 8-bit characters into their UTF-8 equivalents when
the keyboard is in Unicode mode. It uses the VT's encoding table,
so make sure you load the right one first with "setfont -m 8859-2"
or whatever your favorite 8-bit encoding is.
- p2_selection.patch: This patch allows you to copy and paste
UTF-8 when the keyboard is in Unicode mode. Previously, copy and paste
would always use an 8-bit encoding.
- p3_extended_utf8.patch: This patch allows you to enter any
Unicode value using the Alt-nnnn mechanism. Previously, it was
restricted to the BMP (Basic Multilingual Plane), i.e. the first
65536 Unicode code points.
- fbcon.patch: This patch fixes a small problem with the
framebuffer console. Previously, incorrect characters were displayed
under the selection cursor when the screen was using a 512-character font.
(This patch has been submitted to fbdev, and will eventually make its
way into the mainstream kernel.)
- ttyutf8.patch: This patch adds a new flag to the TTY to
tell its line editor to use UTF-8. This makes the delete key delete
a whole character instead of just one byte. (The TTY line editor
is what you get when you type "cat" at a shell prompt.) This feature is
part of the mainstream kernel as of 2.6.4, so I have removed it from my
collection of patches.
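
To make the byte-versus-character distinction concrete, here is a minimal Python sketch of the two ideas behind p3_extended_utf8.patch and ttyutf8.patch: encoding a code point beyond the BMP as UTF-8, and erasing a whole character rather than a single byte. It is only an illustration, not the kernel code. The names encode_utf8 and erase_char are hypothetical, and the sketch assumes the modern Unicode range (up to U+10FFFF), which may differ from what the patches accept.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # ASCII: one byte, unchanged
        return bytes([cp])
    if cp < 0x800:
        # two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # three bytes: covers the rest of the BMP
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # four bytes: code points beyond the BMP
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def erase_char(line: bytes, utf8: bool) -> bytes:
    """Remove the last character from a line buffer.

    With utf8=False this drops exactly one byte. With utf8=True it
    also steps back over continuation bytes (10xxxxxx) so that the
    lead byte of the sequence is removed as well.
    """
    if not line:
        return line
    end = len(line) - 1
    if utf8:
        while end > 0 and (line[end] & 0xC0) == 0x80:
            end -= 1
    return line[:end]
```

With the buffer holding "a" followed by the euro sign (bytes 61 E2 82 AC), erase_char(buf, utf8=False) leaves the dangling bytes 61 E2 82, while erase_char(buf, utf8=True) leaves just 61.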
Downloads
The patches for the 2.6 kernel can be downloaded in two forms,
combined or individual. Take your pick.
- Change Log
- Combined patch for Linux 2.6.4:
All the above patches rolled into one.
- Individual patches for Linux 2.6.4:
All the above patches provided separately so that you can pick and choose
which ones to apply. Note that some of them apply on top of others.
For a clean patching experience, apply the patches in alphabetical order.
- sttyutf8 utility:
A small Perl script for manipulating the new UTF8 flag in the TTY.
Type "sttyutf8 on" to enable UTF-8 mode, "sttyutf8 off" to disable it, or
"sttyutf8" by itself to find out the current setting without changing it.
- Patch for Linux 2.4.21:
Currently the only patch that I've backported to Linux 2.4 is
p1_conv_8bit_to_uni.patch.
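
As a rough illustration of what the conversion in p1_conv_8bit_to_uni.patch amounts to, the Python sketch below maps one 8-bit keyboard byte through a character set table and emits the UTF-8 bytes for the resulting character. This is not the patch itself: Python's built-in codec tables stand in for the VT's encoding table, and the function name keyboard_byte_to_utf8 and its default charset are assumptions made for the example.

```python
def keyboard_byte_to_utf8(byte: int, charset: str = "iso8859_2") -> bytes:
    """Convert one 8-bit character to its UTF-8 equivalent.

    `charset` names the 8-bit encoding the byte is interpreted in,
    playing the role of the map loaded with "setfont -m 8859-2".
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single 8-bit value")
    # Step 1: look the byte up in the 8-bit table to get a Unicode character.
    char = bytes([byte]).decode(charset)
    # Step 2: emit that character as a UTF-8 byte sequence.
    return char.encode("utf-8")
```

The same byte gives different results under different tables, which is why the right map has to be loaded: 0xB1 is U+0105 (a with ogonek) in ISO 8859-2 and becomes the bytes C4 85, whereas in ISO 8859-1 it is U+00B1 (plus-minus sign) and becomes C2 B1.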
Obsolete Downloads
All of the older downloads that were available from this site are
archived here.