===== What? =====

Some notes regarding Linux, i18n and the encoding of files. [[https://resources.oreilly.com/examples/9781565922242/tree/master/doc|Ken Lunde's cjk.inf file is very helpful.]]

===== verifying utf8 output =====

In a utf8 xterm, verifying the output of date and of a utf8 file:

  [chris@hive ~]$ locale -a|grep ^ja
  ja_JP
  ja_JP.eucjp
  ja_JP.ujis
  ja_JP.utf8
  japanese
  japanese.euc
  [chris@hive ~]$ echo $LC_ALL
  ja_JP.utf8
  [chris@hive ~]$ cat test_utf8
  日本語
  [chris@hive ~]$ date
  2013年 12月 23日 月曜日 22:20:21 CET

===== looking at eucjp output =====

The same should also work in an eucjp encoding environment. The (incorrect) output below is from Fedora 19; on Debian I get proper output:

  [chris@hive ~]$ LC_ALL=ja_JP.eucjp luit
  [chris@hive ~]$ locale charmap
  EUC-JP
  [chris@hive ~]$ cat test_eucjp
  F|K\8l
  [chris@hive ~]$ date
  2013G/ 127n 23F| 7nMKF| 22:21:45 CET

This seems to be the 7-bit equivalent of EUC-JP "日本語", i.e. the same bytes with the high bit of each byte cleared.

===== fedora19/20 and rhel7 issue =====

  * https://bugzilla.redhat.com/show_bug.cgi?id=1046341
  * https://bugzilla.redhat.com/show_bug.cgi?id=1093788

Description of problem: xterm does not display EUCJP encoding

Version-Release number of selected component (if applicable): xterm-293-1.fc19.x86_64 (not sure if this is an issue in xterm, glibc or something else)

How reproducible: always

Steps to Reproduce:

  1. # verify eucjp and utf8 locales exist
     locale -a|grep ja_JP
  2. # ensure this is a terminal capable of displaying the chars we request later
     LC_ALL="ja_JP.utf8" echo 日本語
     LC_ALL="ja_JP.utf8" date
  3. LC_ALL="ja_JP.eucjp" luit
  4. date

Actual results:

  2013G/ 127n 24F| 2PMKF| 17:49:59 CET

Expected results:

  2013年 12月 24日 火曜日 17:49:04 CET

Additional info:

  * the above works on Debian stable
  * gnome-terminal and xterm both show this
  * tried this in an "xterm -en eucjp" terminal, as well as in gnome-terminal
  * in the eucjp output, the high bits seem to be stripped off by something
  * setting "stty raw" does not lead to the expected output
  * creating a utf8 text file, converting it to eucjp with "iconv" and outputting it gives the same result
  * the output of "date +%A| xxd|md5sum" in a "LC_ALL=ja_JP.eucjp luit" environment is identical on the Debian and the Fedora system

===== debugging =====

  * As an alternative to running "luit" manually, "xterm -en eucjp" can be used.
  * "stty raw" does not change the output.
  * https://bugzilla.redhat.com/show_bug.cgi?id=1046341

  [chris@hive ~]$ echo $LC_ALL
  ja_JP.UTF-8
  [chris@hive ~]$ date
  2013年 12月 25日 水曜日 21:51:51 CET
  [chris@hive ~]$ LC_ALL=ja_JP.eucjp date|iconv -f eucjp
  2013年 12月 25日 水曜日 21:51:59 CET
  [chris@hive ~]$ LC_ALL=ja_JP.eucjp date|xxd
  0000000: 3230 3133 c7af 2031 32b7 ee20 3235 c6fc  2013.. 12.. 25..
  0000010: 20bf e5cd cbc6 fc20 3231 3a35 323a 3036   ...... 21:52:06
  0000020: 2043 4554 0a                              CET.

  [chris@hive ~]$ for l in utf8 eucjp; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd; echo; done
  utf8     水曜日
  0000000: e6b0 b4e6 9b9c e697 a50a                 ..........
  
  eucjp    水曜日
  0000000: bfe5 cdcb c6fc 0a                        .......

Trying every available X font, in case the problem is font related:

  for i in $(xlsfonts); do echo "current font: $i"; xterm -fn $i -e 'LC_ALL=ja_JP.eucjp luit cat test_eucjp; sleep 10;'; done

The same byte comparison with one byte per group:

  [chris@hive ~]$ for l in utf8 eucjp; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd -g1; echo; done
  utf8     水曜日
  0000000: e6 b0 b4 e6 9b 9c e6 97 a5 0a                    ..........
  
  eucjp    水曜日
  0000000: bf e5 cd cb c6 fc 0a                             .......
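
The two dumps above can be cross-checked against each other: converting the EUC-JP bytes (bf e5 cd cb c6 fc) to UTF-8 should give exactly the bytes of the utf8 dump (e6 b0 b4 e6 9b 9c e6 97 a5). A minimal sketch, assuming "iconv" and "od" are installed; "hex_of" is a helper name made up for this note, and the input bytes are written as octal escapes so the result does not depend on the terminal's encoding:

```shell
# Hypothetical helper: print stdin as one continuous lowercase hex string.
hex_of() {
    od -An -tx1 | tr -d ' \n'
}

# EUC-JP bytes of 水曜日 as shown by "date +%A | xxd" (trailing newline omitted).
printf '\277\345\315\313\306\374' | iconv -f EUC-JP -t UTF-8 | hex_of
echo
```

If the conversion behaves as in the dumps above, this prints e6b0b4e69b9ce697a5.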
The following produces the same on Debian and Fedora:

  for l in utf8 eucjp ; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd; echo; done | md5sum -c <(echo "31f0f83e7fe3dacb7b288c101cd1debd -")
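
As the bytes coming out of "date" are the same on both systems, the garbling presumably happens somewhere on the way to the screen. The "high bits stripped" suspicion can be checked by hand: clearing the top bit of every EUC-JP byte of "日本語" should reproduce the "F|K\8l" seen on Fedora. A minimal sketch; "strip_high_bits" is a made-up helper name, and the input is the EUC-JP encoding of 日本語 (c6 fc cb dc b8 ec) written as octal escapes:

```shell
# Hypothetical helper: clear the high bit of every byte read from stdin.
# LC_ALL=C makes tr treat its input strictly as single bytes.
strip_high_bits() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# EUC-JP bytes of 日本語, independent of the current terminal encoding.
printf '\306\374\313\334\270\354' | strip_high_bits
echo
```

This prints F|K\8l, matching the broken "cat test_eucjp" output. The same mapping turns 年 (c7 af) into "G/" and 月 (b7 ee) into "7n", as in the broken "date" output.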