languages:japanese:linux:0verview

languages:japanese:linux:0verview [2022/01/10 09:50] – [fonts] chrislanguages:japanese:linux:0verview [2024/03/03 07:57] – [converting Kanji to Hiragana/Furigana] chris
 +===== What? =====
 +Notes on getting Japanese input and fonts working on Linux, and on learning Japanese. My presentation on the topic is here: https://fluxcoil.net/files/speeches/latex_japlinux/japlinux.pdf .
 +===== input hiragana/katakana/kanji =====
 +  * https://iamacat.wordpress.com/2008/07/07/more-japanese-on-linux-anthy-uim-and-ubuntu-among-others/
 +===== tex =====
 +  * cjk-latex, ptex (non-utf8), XeTeX (japanese typesetting, also left to right writing)
 +  * xelatex, xecjk package
 +  * [[https://aminophen.github.io/slide/hytexconf18.pdf|日本語のLATEXで幸せになる...かもしれない方法]]
 +  * options:
 +    * pdflatex + CJK package
 +    * xelatex + xeCJK package
 +    * lualatex + luatex-ja package
 +    * uplatex or platex: TeX implementations designed specifically for Japanese typography; they work differently in some aspects compared to the more general-purpose implementations above
 +    * https://tex.stackexchange.com/questions/15516/how-to-write-japanese-with-latex
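The lualatex + luatex-ja option from the list above could look roughly like the following sketch. The helper name write_luatexja_example and the output file name are made up for illustration, and it assumes the luatexja package and a Japanese font it can find are installed.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal luatex-ja test document to the given path.
write_luatexja_example() {
    cat > "$1" <<'EOF'
\documentclass{article}
% luatexja enables Japanese typesetting under lualatex
\usepackage{luatexja}
\begin{document}
日本語
\end{document}
EOF
}
```

Usage would be: write_luatexja_example luatexja_example.tex && lualatex luatexja_example.tex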
 +===== tex cjk installation on Fedora19 =====
 +  * with this, kanji/hiragana/katakana in UTF-8 encoded tex files are typeset properly
 +<code>
 +# cjk
 +$ yum -y install texlive-cjk.noarch texlive-collection-langcjk.noarch
  
 +$ cat minimal_japanese_example.tex
 +\documentclass{beamer}
 +\usepackage[encapsulated]{CJK}
 +\usepackage{ucs}
 +\usepackage[utf8x]{inputenc}
 +\newcommand{\jptext}[1]{\begin{CJK}{UTF8}{min}#1\end{CJK}}
 +\begin{document}
 +\jptext{日本語}
 +\end{document}
 +
 +$ pdflatex minimal_japanese_example.tex && evince minimal_japanese_example.pdf
 +</code>
 +
 +
 +===== terminals =====
 +I use this on Fedora:
 +<code>
 +xterm -en UTF-8 -fg white -bg black \
 +  -fn -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 -e bash
 +</code>
 +I also tried different fonts with xterm; running "xlsfonts" lists the available fonts. To pick the ones containing 'ja' and try them out:
 +<code>
 +# try each font whose name contains 'ja'; every xterm stays open for 10 seconds
 +for i in $(xlsfonts | grep ja); do
 +    echo "current font: $i"
 +    xterm -fn "$i" -e sh -c 'echo ちち; sleep 10'
 +done
 +</code>
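Before comparing fonts, it can help to confirm that the shell's locale is actually UTF-8. A minimal sketch, assuming the standard "locale" command is available; the function names are made up for illustration:

```shell
#!/bin/sh
# Return success if the current locale's character map is UTF-8.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# Print a short report plus a test string to inspect visually in the terminal.
report_utf8_support() {
    if is_utf8_locale; then
        echo "locale charmap is UTF-8"
    else
        echo "locale charmap is not UTF-8 (LANG=${LANG:-unset})"
    fi
    echo "test string: ちち 日本語 カタカナ"
}
```

Running report_utf8_support inside the xterm shows both the locale result and whether the chosen font renders the test string.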
 +
 +===== japanese input on emacs =====
 +  * make sure your terminal supports UTF-8, i.e. it can properly display UTF-8 files; I am using xterm here
 +<code>
 +emacs ~/.emacs # and add this:
 +;;;;;;;;;;;;;;;;;;;;                                                                                                                        
 +;; unicode-setup                                                                                                                            
 +(prefer-coding-system       'utf-8)
 +(set-default-coding-systems 'utf-8)
 +(set-terminal-coding-system 'utf-8)
 +(set-keyboard-coding-system 'utf-8)
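 +;; default-buffer-file-coding-system is obsolete, set the buffer-local default instead
 +(setq-default buffer-file-coding-system 'utf-8)
 +(setq x-select-request-type '(UTF8_STRING COMPOUND_TEXT TEXT STRING))
 +;; make japanese the default choice for the input method (toggled with C-\)
 +(setq default-input-method "japanese")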
 +
 +emacs -nw test.tex
 +# now you can use C-x C-m C-\ and are asked to enter an input method.
 +# jap<tab><tab> shows a selection, here 
 +#   'japanese' works for hiragana directly/katakana with <blank> after hiragana-inputs
 +#   'japanese-katakana' works for katakana-input
 +# then you can input japanese as with uim.
 +# use C-\ to switch between english<->japanese input
 +</code>
 +
 +===== converting Kanji to Hiragana/Furigana =====
 +  * tlug++ for so many hints on that
 +  * http://www.edrdg.org/~jwb/mecabdemo.html has an online demo of conversion with MeCab/Unidic
 +  * **kakasi:** http://kakasi.namazu.org/ . Easy to install and use, but has an old dictionary and is bad for complex sentences. Simple example:
 +<code>
 +$ echo '私は馬鹿です'| kakasi -JK -i utf8 -o utf8
 +ワタシはバカです
 +</code>
 +  * **MeCab**, e.g. with the mecab-ipadic-neologd dictionary ( https://github.com/neologd/mecab-ipadic-neologd ), is more modern.
 +<code>
 +$ echo 例文文章です。|mecab --node-format='%pS%m[%f[7]]' --eos-format='\n'
 +例文[レイブン]文章[ブンショウ]です[デス]。[。]
 +</code>
 +  * Fedora: dnf -y install mecab mecab-ipadic
 +
 +
 +===== irc codepage =====
 +  * iso-2022-jp
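Assuming this note refers to channels or logs that use ISO-2022-JP, text can be converted to and from UTF-8 with iconv. A minimal sketch; the function names are made up for illustration:

```shell
#!/bin/sh
# Convert UTF-8 text on stdin to ISO-2022-JP.
utf8_to_jis() {
    iconv -f UTF-8 -t ISO-2022-JP
}

# Convert ISO-2022-JP text on stdin back to UTF-8, e.g. for reading old logs.
jis_to_utf8() {
    iconv -f ISO-2022-JP -t UTF-8
}
```

For example, "jis_to_utf8 < old_channel.log" would print a log in UTF-8, where old_channel.log is a placeholder file name.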
 +
 +===== fonts =====
 +  * The Electroharmonix font covers Romaji characters but makes them look like Kanji/Katakana: [[http://www.dafont.com/electroharmonix.font|dafont.com]]
 +
 +===== LibreOffice =====
 +  * By default, LibreOffice started to use Chinese variants of some Kanji for me. To fix this, use Tools -> Language -> For all text -> More, then set "Default languages for documents" -> Asian to Japanese. For example, the second 字 of 石炭 has different Chinese and Japanese variants. The same goes for 捨てる.
 +
 +===== Translation =====
 +  * cut'n'paste to google translate
 +  * EBView can read EB dictionaries, dictionaries: 広辞苑, 大辞林, Readers+
 +  * Kenkyusha "KOD" online dictionary (not for free)
 +  * Eijiro dictionary
 +  * translating single Japanese words on the commandline, offline:
 +    * install jmdict http://jmdict.sourceforge.net/
 +    * and the JMdict dictionary file: http://www.edrdg.org/jmdict/edict_doc.html
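Independent of the jmdict tool itself, an EDICT-format file can also be searched directly. The sketch below assumes a local copy already converted to UTF-8 (the classic EDICT file is EUC-JP, so something like iconv -f EUC-JP -t UTF-8 would be needed first) and lines of the form "headword [reading] /gloss/". The function name edict_lookup is made up for illustration.

```shell
#!/bin/sh
# edict_lookup WORD FILE
# Print every entry whose headword or bracketed reading is exactly WORD.
edict_lookup() {
    awk -v w="$1" '{
        reading = $2
        sub(/^\[/, "", reading)   # strip the opening bracket of the reading
        sub(/\]$/, "", reading)   # strip the closing bracket of the reading
        if ($1 == w || reading == w) print
    }' "$2"
}
```

For example, "edict_lookup ちち edict.utf8" would print all entries read as ちち, where edict.utf8 is a placeholder for the converted file.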
 +
 +===== Links =====
 +  * [[https://github.com/PaddlePaddle/PaddleOCR#PP-OCRv2|PaddleOCR]] Japanese OCR, [[https://gigazine.net/news/20210919-paddleocr/|Gigazine article]]
languages/japanese/linux/0verview.txt · Last modified: 2024/03/04 23:42 by chris