===== What? =====
Some notes on Linux, i18n and file encodings. [[https://resources.oreilly.com/examples/9781565922242/tree/master/doc|Ken Lunde's cjk.inf file]] is very helpful.
===== verifying utf8 output =====
In a UTF-8 xterm, verify the output of "date" and of a UTF-8 encoded file:
[chris@hive ~]$ locale -a|grep ^ja
ja_JP
ja_JP.eucjp
ja_JP.ujis
ja_JP.utf8
japanese
japanese.euc
[chris@hive ~]$ echo $LC_ALL
ja_JP.utf8
[chris@hive ~]$ cat test_utf8
日本語
[chris@hive ~]$ date
2013年 12月 23日 月曜日 22:20:21 CET
===== looking at eucjp output =====
The same should also work in an EUC-JP environment. The (incorrect) output below is from Fedora 19; on Debian I get proper output:
[chris@hive ~]$ LC_ALL=ja_JP.eucjp luit
[chris@hive ~]$ locale charmap
EUC-JP
[chris@hive ~]$ cat test_eucjp
F|K\8l
[chris@hive ~]$ date
2013G/ 127n 23F| 7nMKF| 22:21:45 CET
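This appears to be the EUC-JP encoding of "日本語" with the high bit of each byte stripped, leaving 7-bit ASCII: 日 is 0xC6 0xFC in EUC-JP, and 0x46 0x7C is "F|".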
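The 7-bit theory can be checked without a Japanese locale installed. The following is a minimal sketch, assuming a POSIX shell with "printf" and "tr"; the function name strip_high_bit is made up for illustration. It feeds the EUC-JP bytes of "日本語" (c6 fc cb dc b8 ec) through "tr" to clear the high bit of every byte:

```shell
# Hypothetical helper: clear the high bit of every byte read from stdin.
# LC_ALL=C makes tr operate on raw bytes instead of multibyte characters.
strip_high_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# EUC-JP bytes of "日本語": c6 fc  cb dc  b8 ec (written as octal escapes)
printf '\306\374\313\334\270\354\n' | strip_high_bit
```

This prints F|K\8l, the same string that "cat test_eucjp" shows on the affected system.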
===== fedora19/20 and rhel7 issue =====
* https://bugzilla.redhat.com/show_bug.cgi?id=1046341 https://bugzilla.redhat.com/show_bug.cgi?id=1093788
Description of problem:
xterm does not display EUCJP encoding
Version-Release number of selected component (if applicable):
xterm-293-1.fc19.x86_64
(not sure if this is an issue in xterm, glibc or something else)
How reproducible:
always
Steps to Reproduce:
1. # verify eucjp and utf8 locales exist
locale -a|grep ja_JP
2. # ensure this is a terminal capable of displaying the chars we request later
LC_ALL="ja_JP.utf8" echo 日本語
LC_ALL="ja_JP.utf8" date
3. LC_ALL="ja_JP.eucjp" luit
4. date
Actual results:
2013G/ 127n 24F| 2PMKF| 17:49:59 CET
Expected results:
2013年 12月 24日 火曜日 17:49:04 CET
Additional info:
- the above works on debian stable
- gnome-terminal and xterm both show this
- tried this in "xterm -en eucjp" terminal, as well as gnome-terminal
- in the eucjp output, the high bits seem stripped off by something
- setting "stty raw" does not lead to the expected output
- also creating a utf8 textfile, converting to eucjp with "iconv"
and outputting this gives same result
- the output of "date +%A| xxd|md5sum" in a "LC_ALL=ja_JP.eucjp luit"
environment is identical on the debian and the fedora system
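The iconv test mentioned in the additional info can be reproduced roughly as follows. This is a sketch, assuming "iconv" accepts the encoding names UTF-8 and EUC-JP and that "od" and "mktemp" are available; the function name make_eucjp_sample is hypothetical, and the file names follow the test_utf8 / test_eucjp naming used earlier on this page.

```shell
# Hypothetical helper: create a UTF-8 sample file in directory $1
# and convert it to EUC-JP with iconv.
make_eucjp_sample() {
    printf '日本語\n' > "$1/test_utf8"
    iconv -f UTF-8 -t EUC-JP "$1/test_utf8" > "$1/test_eucjp"
}

dir=$(mktemp -d)
make_eucjp_sample "$dir"
# Dump the converted file byte by byte; every byte of a kanji
# should have its high bit set (value 0x80 or above).
od -An -tx1 "$dir/test_eucjp"
```

The dump should read c6 fc cb dc b8 ec 0a. If it does, the file itself is fine and the corruption happens later, on the way to the screen.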
===== debugging =====
* As an alternative to running "luit" manually, "xterm -en eucjp" can be used.
* "stty raw" does not change the output
* https://bugzilla.redhat.com/show_bug.cgi?id=1046341
[chris@hive ~]$ echo $LC_ALL
ja_JP.UTF-8
[chris@hive ~]$ date
2013年 12月 25日 水曜日 21:51:51 CET
[chris@hive ~]$ LC_ALL=ja_JP.eucjp date|iconv -f eucjp
2013年 12月 25日 水曜日 21:51:59 CET
[chris@hive ~]$ LC_ALL=ja_JP.eucjp date|xxd
0000000: 3230 3133 c7af 2031 32b7 ee20 3235 c6fc 2013.. 12.. 25..
0000010: 20bf e5cd cbc6 fc20 3231 3a35 323a 3036 ...... 21:52:06
0000020: 2043 4554 0a CET.
[chris@hive ~]$ for l in utf8 eucjp; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd; echo; done
utf8 水曜日
0000000: e6b0 b4e6 9b9c e697 a50a ..........
eucjp 水曜日
0000000: bfe5 cdcb c6fc 0a .......
Trying each installed font in turn, presumably to rule out a font problem:
for i in $(xlsfonts); do
echo "current font: $i";
xterm -fn "$i" -e 'LC_ALL=ja_JP.eucjp luit cat test_eucjp; sleep 10;';
done
[chris@hive ~]$ for l in utf8 eucjp; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd -g1; echo; done
utf8 水曜日
0000000: e6 b0 b4 e6 9b 9c e6 97 a5 0a ..........
eucjp 水曜日
0000000: bf e5 cd cb c6 fc 0a .......
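The following produces the same output on Debian and Fedora, so both systems emit identical bytes. The checksum covers the weekday name, so it presumably only matches on the weekday it was recorded on (a Wednesday, going by the output above):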
for l in utf8 eucjp ; do
echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)";
LC_ALL=ja_JP.$l date +%A|xxd; echo;
done | md5sum -c <(echo "31f0f83e7fe3dacb7b288c101cd1debd -")