===== What? =====

Notes on getting Japanese input and fonts working on Linux, and on learning Japanese. My presentation on the topic is here: https://fluxcoil.net/files/speeches/latex_japlinux/japlinux.pdf .

===== input hiragana/katakana/kanji =====

  * https://iamacat.wordpress.com/2008/07/07/more-japanese-on-linux-anthy-uim-and-ubuntu-among-others/

===== tex =====

  * cjk-latex, ptex (non-utf8), XeTeX (Japanese typesetting, also left to right writing)
  * xelatex, xecjk package
  * [[https://aminophen.github.io/slide/hytexconf18.pdf|日本語のLATEXで幸せになる...かもしれない方法]]
  * options:
    * pdflatex + CJK package
    * xelatex + xeCJK package
    * lualatex + luatex-ja package
    * uplatex or platex: TeX implementations designed specifically for Japanese typography; in some aspects they work differently from the more general-purpose implementations above
  * https://tex.stackexchange.com/questions/15516/how-to-write-japanese-with-latex

===== tex cjk installation on Fedora19 =====

  * with this, kanji/hiragana/katakana in UTF-8 tex files are typeset properly

  # cjk
  $ yum -y install texlive-cjk.noarch texlive-collection-langcjk.noarch
  $ cat minimal_japanese_example.tex
  \documentclass{beamer}
  \usepackage[encapsulated]{CJK}
  \usepackage{ucs}
  \usepackage[utf8x]{inputenc}
  \newcommand{\jptext}[1]{\begin{CJK}{UTF8}{min}#1\end{CJK}}
  \begin{document}
  \jptext{日本語}
  \end{document}
  $ pdflatex minimal_japanese_example.tex && evince minimal_japanese_example.pdf

===== terminals =====

I use this on Fedora:

  xterm -en UTF-8 -fg white -bg black \
    -fn -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 -e bash

I also tried different fonts with xterm; running "xlsfonts" lists the available fonts. To pick out the ones containing 'ja' and try each of them:

  for i in $(xlsfonts | grep ja); do echo "current font: $i"; xterm -fn "$i" -e 'echo ちち; sleep 10'; done

===== japanese input on emacs =====

  * make sure your terminal supports UTF-8, i.e.
it can properly display UTF-8 files (xterm is used here)

  emacs ~/.emacs
  # and add this:
  ;;;;;;;;;;;;;;;;;;;;
  ;; unicode-setup
  (prefer-coding-system 'utf-8)
  (set-default-coding-systems 'utf-8)
  (set-terminal-coding-system 'utf-8)
  (set-keyboard-coding-system 'utf-8)
  (setq default-buffer-file-coding-system 'utf-8)
  (setq x-select-request-type '(UTF8_STRING COMPOUND_TEXT TEXT STRING))
  ;; make japanese default choice for input system
  (set-input-method 'japanese)

  emacs -nw test.tex
  # now you can use C-x C-m C-\ and are asked to enter an input method.
  # typing 'jap' shows a selection; here
  #   'japanese' works for hiragana directly, and for katakana after hiragana input
  #   'japanese-katakana' works for katakana input
  # then you can input Japanese as with uim.
  # use C-\ to switch between English <-> Japanese input

===== converting Kanji to Hiragana/Furigana =====

  * tlug++ for so many hints on that
  * https://www.edrdg.org/~jwb/mecabdemo.html has an online demo of conversion with MeCab/Unidic
  * **kakasi:** https://kakasi.namazu.org/ . Easy to install and use, but it has an old dictionary and handles complex sentences badly. Simple example:

  $ echo '私は馬鹿です' | kakasi -JK -i utf8 -o utf8
  ワタシはバカです

  * **MeCab**, e.g. with the mecab-ipadic-neologd dictionary ( https://github.com/neologd/mecab-ipadic-neologd ), is more modern.

  $ echo '例文文章です。' | mecab --node-format='%pS%m[%f[7]]' --eos-format='\n'
  例文[レイブン]文章[ブンショウ]です[デス]。[。]

  * Fedora: dnf -y install mecab mecab-ipadic

===== irc codepage =====

  * iso-2022-jp

===== fonts =====

  * Electroharmonix, a font for Latin (romaji) characters that makes them look like kanji/katakana: [[https://www.dafont.com/electroharmonix.font|dafont.com]]

===== Libreoffice =====

  * By default, LibreOffice started to use Chinese variants of some kanji for me. Use Tools -> Language -> For all text -> More, then "Default languages for documents" -> Asian: Japanese. For example 石炭, second 字, has Chinese and Japanese variants. Also 捨てる.
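
A possible way to see why the language setting matters: as far as I understand, such Chinese and Japanese forms are not different characters in the file. They share one Unicode code point, and the application picks the glyph from the language/font setting. The sketch below prints the code points of a string so this can be checked on a document's text. It assumes ''iconv'' and ''od'' are installed; the helper name ''codepoints'' is made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the Unicode code point of every character in a UTF-8 string.
# 'codepoints' is a hypothetical helper name, not an existing tool.
codepoints() {
    # UTF-32BE gives exactly 4 bytes per character, without a byte order mark.
    printf '%s' "$1" \
        | iconv -f UTF-8 -t UTF-32BE \
        | od -An -v -tx1 \
        | tr -d ' \n' \
        | fold -w 8 \
        | sed 's/^0*/U+/' \
        | tr 'a-f' 'A-F'
}

# Example call: one code point per line for the two kanji of 石炭
codepoints '石炭'
```

If the same code points come out no matter which variant is displayed, the difference lies in the font/language selection only, which is what the "Asian: Japanese" default changes.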
===== Translation =====

  * cut'n'paste to Google Translate
  * EBView can read EB dictionaries; dictionaries: 広辞苑, 大辞林, Readers+
  * Kenkyusha "KOD" online dictionary (not free)
  * Eijiro dictionary
  * translating single Japanese words on the commandline, offline:
    * install jmdict https://jmdict.sourceforge.net/
    * and the JMdict dictionary file: https://www.edrdg.org/wiki/index.php/JMdict-EDICT_Dictionary_Project

===== Links =====

  * [[https://github.com/PaddlePaddle/PaddleOCR#PP-OCRv2|PaddleOCR]] Japanese OCR, [[https://gigazine.net/news/20210919-paddleocr/|Gigazine article]]
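
Regarding the offline commandline lookup mentioned under Translation: the following is only a rough sketch of the idea, and it is not the interface of the jmdict tool itself. It assumes a UTF-8 text dictionary in EDICT style, with one entry per line in the form ''headword [reading] /gloss/''. The function name ''jlookup'' and the file name ''edict.utf8'' are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: exact-match lookup of one word in an EDICT-style UTF-8 text file.
# Assumed line format: headword [reading] /gloss1/gloss2/
# 'jlookup' is a hypothetical helper, not part of jmdict.
jlookup() {
    local word="$1" dict="$2"
    # Pass the word through the environment so awk does not interpret
    # backslashes in it; print lines whose first field equals the word.
    W="$word" awk '$1 == ENVIRON["W"]' "$dict"
}

# Example call (the dictionary file name is an assumption):
#   jlookup 日本語 edict.utf8
```

The lookup matches the first field only, so entries that list several spellings in one headword field would need extra handling.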