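Someone wrote "please translate this into English" on that page, so I translated it. Creating a new page apparently requires registering as a user, so I'm posting it here instead. I tried to imitate PukiWiki syntax, but I couldn't preview it, so I don't know whether it's formatted correctly.
Points that bothered me, or that I simply didn't understand, while translating:
- What's the difference between \CJKfamily and \CJKencfamily?
- What is NFSS? Is it right to call it a "convention"?
- Is the passage about using other languages inside the CJK environment, or about effects that reach outside the CJK environment?
- Is 「ソースファイルのエンコーディングは、フォントエンコーディングと同じものが、 使用されます」 ("as the source file's encoding, the same one as the font encoding is used") a mistake for 「フォントエンコーディングのエンコーディングは、ソースファイルと同じものが、 使用されます」 ("as the font encoding, the same one as the source file is used")?
- I didn't really understand the explanation of inputenc as a whole…
- The part of the "Installation" section about disguising TrueType fonts as OpenType and so on was unclear to me even in the Japanese, so that part is translated fairly literally.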
- I tried to compile the first sample without installing it, but couldn't figure out how. I expected that simply extracting the archive would let it find the various files by searching from the current directory, but that doesn't seem to be the case. I flattened the directory hierarchy completely and dumped all the files into a single directory, but dvipdfm then fails with the following (latex runs fine):
dvipdfm UTF8-noembed-CJK
UTF8-noembed-CJK.dvi -> UTF8-noembed-CJK.pdf
[1
kpathsea: Running mktexpk --mfmode / --bdpi 600 --mag 1+0/600 --dpi 600 sungu5b
mktexpk: don't know how to create bitmap font for sungu5b.
mktexpk: perhaps sungu5b is missing from the map file.
kpathsea: Appending font creation commands to missfont.log.
** WARNING ** Could not locate a virtual/physical font for TFM "sungu5b".
** WARNING ** >> There are no valid font mapping entry for this font.
** WARNING ** >> Font file name "sungu5b" was assumed but failed to locate that font.
** ERROR ** Cannot proceed without .vf or "physical" font for PDF output...
Output file removed.
Honestly, I don't have the energy to take this any further, so I'll leave it to someone else.
* [[The CJK package for LaTeX:http://cjk.ffii.org/]]
(This note was translated from an incomplete version as of 2013-05-26.)

ASCII Inc.'s pTeX is a TeX distribution for processing Japanese. However, it extends both the TeX typesetter and the DVI format, and this makes it incompatible with certain tools such as dvisvgm and dvipng. This note explains how to use CJK LaTeX, which lets you process Chinese, Japanese, and Korean (CJK) text purely with macros, without modifying TeX itself. Unfortunately, CJK characters are rendered with bitmap fonts. As long as you can live with that, this arrangement lets you keep using standard TeX and its related tools as you always have. There are other caveats: unlike pTeX, the CJK package doesn't handle vertical text or line-breaking rules very well. On the upside, it can process Chinese and Korean text encoded in UTF-8. It appears to be widely used outside Japan for mixing short snippets of Asian text into English documents.

The CJK package, like inputenc, changes the category code of bytes that have their 8th bit set. As a result, TeX sources containing multibyte characters can be compiled by standard (8-bit capable) LaTeX. The package is not compatible with pLaTeX, which interprets multibyte characters as characters rather than as macros.

The basic usage is
>\usepackage{CJK}~
...~
\begin{CJK}{encoding}{family}~
...~
\end{CJK}
<
The "encoding" part can be UTF-8, EUC-JP, Shift_JIS, GB2312, Big5, EUC-KR, x-EUC-TW (CNS 11643), or one of various other encodings. The following table lists the major supported encodings in more detail.
|CENTER:Encoding|CENTER:TeX Name|CENTER:TFM Encoding|h
|Big5|Bg5|c00|
|GB2312|GB|c10|
|EUC-JP|JIS|c40|
|Shift_JIS|SJIS|c40|
|JIS X 0212 (EUC-JP)|JIS2|c50|
|EUC-KR|KS|c60|
|UTF-8|UTF8|c70|
The TeX Name column shows the name to pass as the first argument to \begin{CJK} in the TeX source. The TFM encoding is a parameter needed when writing an fd file.
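The following sketch combines the basic usage with the table into a minimal complete UTF-8 document. It assumes the CJK package is installed, along with c70song.fd (the fd file for the song family under the UTF-8 TFM encoding) and its subfont TFMs. It also assumes your DVI driver can map those TFMs to real fonts. The Chinese sample sentence is only illustrative.

```latex
\documentclass{article}
\usepackage{CJK}
\begin{document}
% Outside the CJK environment, text is processed as usual.
Some English text.

% First argument: the TeX name of the encoding (UTF8, TFM encoding c70).
% Second argument: the font family; with UTF8 and song, NFSS looks up
% the fonts in c70song.fd.
\begin{CJK}{UTF8}{song}
这是一个例子。% "This is an example." in Simplified Chinese
\end{CJK}
\end{document}
```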
As you may have guessed from the explanation so far, TeX can handle source code that mixes different encodings in a single file. In practice, though, editing is probably easier if you put text in different encodings into separate files and assemble them with \input as needed. Separate files are essential for Big5 and Shift_JIS. Some characters in these encodings have trailing bytes that coincide with TeX's special characters, such as "\", "{", and "}". Those bytes must be preprocessed away so they don't confuse TeX, for example:
 \begin{CJK}{JIS}{}
 \input{euc-jp-text1}%
 \CJKenc{Bg5}%
 % \write18 runs bg5conv only when shell escape is enabled.
 \ifx\VTeXversion\undefined%
   \immediate\write18{bg5conv < big5-text.raw > big5-text.tex}%
 \fi\input{big5-text}%
 \CJKenc{JIS}% switch back to EUC-JP for the next file
 \input{euc-jp-text2}
 \end{CJK}
The "family" part specifies the font family. If you leave it blank, TeX selects the "song" family by default. You can change this default with \CJKfamily or \CJKencfamily. Note that the family does not fully specify the font. Following the NFSS convention, the font TeX actually uses during typesetting is determined by the "(TFM encoding)(family).fd" file. For example, if the TeX file specifies the song family, the font depends on the source encoding:
-UTF-8 source: TeX selects cyberbXX.tfm, as specified in c70song.fd.
-EUC-JP source: TeX selects jsso12XX.tfm, as specified in c40song.fd.
Other encodings work the same way.

**Extensions
The CJK package distribution contains several extension packages and examples. Here we explain the CJKutf8 package, which is probably the most important one.

Chinese and Japanese (and to some extent Korean) text have the unusual property that lines can be broken almost anywhere. The CJK package implements this liberal line-breaking rule. As a result, text in other languages mixed into the CJK environment can be broken at inappropriate places. This is a bit like the way pTeX mishyphenates English text written in full-width alphanumeric characters.
The font encoding determines hyphenation, kerning, and ligatures. To process non-CJ languages correctly inside the CJK environment, we therefore have to arrange for the right font encoding to be loaded outside (prior to?) the CJK environment. CJKutf8 does exactly that. To explain how it works, I first need to tell you about the inputenc package.

***The inputenc Package
The big change from LaTeX 2.09 to LaTeX2e was the adoption of NFSS2. Under this scheme, the user specifies the font encoding as an attribute separate from every other aspect of the font. As a result, a minimal complete LaTeX2e source file is
>\documentclass{...}~
\usepackage[...]{fontenc}~
\usepackage[...]{inputenc}~
\begin{document}~
...~
\end{document}
<
For backward compatibility with LaTeX 2.09, two defaults apply when these packages are missing:
-If fontenc is not loaded, OT1 is used.
-If inputenc is not loaded, the source file is assumed to be in the same encoding as the font encoding. In other words, its bytes are passed to the font unchanged.
A source file that doesn't load these packages cannot be called fully compliant with LaTeX2e's conventions, even if it does what you need.

inputenc.sty by itself only does two things: it sets the category code of characters with the 8th bit set to active, and it makes them raise an error whenever the source uses them. To use these active characters in the TeX source, you must redefine them as macros that produce the right character in the right encoding. The option passed to inputenc selects the file that performs this redefinition. The utf8 option is available starting with the Feb 9, 2004 version of LaTeX2e.
 \usepackage[utf8]{inputenc}
Writing this in the preamble, however, does not enable every UTF-8 character. Given the utf8 option, inputenc goes through all font encodings loaded in the preamble, scanning up to the last line before the document body. For each encoding XXX, it reads XXXenc.dfu and enables the character definitions in that file. Characters not defined in any of those files remain undefined, and using them results in an error.
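The sketch below makes the fontenc/inputenc split concrete by loading the T1 font encoding and the utf8 input encoding. Because T1 is loaded in the preamble, the utf8 option reads t1enc.dfu. The accented letters below therefore become single glyphs from the T1-encoded font, instead of being built with an accent command as they would be under OT1. A character that no loaded dfu file defines, such as a kanji, would instead stop compilation with an inputenc error. The sketch assumes a LaTeX2e recent enough to ship the utf8 option, with T1 fonts installed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}    % font encoding: utf8 will read t1enc.dfu
\usepackage[utf8]{inputenc} % input encoding of this source file: UTF-8
\begin{document}
% ü, ö, and ß are defined in t1enc.dfu, so each one is typeset
% as a single glyph from the T1-encoded font.
Grüße aus Köln.
% A character not covered by any loaded .dfu file (e.g. a kanji)
% would raise an inputenc error here instead of being typeset.
\end{document}
```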
The standard distribution currently contains the following dfu files.
>lcyenc.dfu~
ly1enc.dfu~
omsenc.dfu~
ot1enc.dfu~
ot2enc.dfu~
t1enc.dfu~
t2aenc.dfu~
t2benc.dfu~
t2cenc.dfu~
ts1enc.dfu~
x2enc.dfu
<
utf8enc.dfu combines all of the files above. Languages that can be written in these font encodings are typeset from UTF-8 with exactly the same hyphenation, kerning, and ligatures as when they are typeset from other encodings.

Unicode support in standard LaTeX, with no additional packages, therefore currently works as follows.
-LaTeX fully supports Unicode source files encoded in UTF-8 (not limited to the BMP).
-In theory, any language that meets the following conditions can be typeset properly from a UTF-8 source file, given a suitable font encoding and dfu file.
--A line consists of graphemes (i.e., characters) laid out horizontally, and lines are stacked from top to bottom.
--Each line has enough "spaces" (white space or a similar entity) of flexible width at which the line can be broken. (For Chinese and Japanese, this means there is an implicit space between (almost) every pair of characters.)

***The CJKutf8 Package
This package does a lot under the hood, but its interface is simple. It loads inputenc and tries to hand everything inside the CJK environment to inputenc. Whenever inputenc fails, it falls back to the CJK environment's own processing.
 \documentclass{article}
 \usepackage[T1]{CJKutf8} % The font encoding can be specified in the option.
 \begin{document}
 \begin{CJK}{UTF8}{min}
 % Write something in UTF-8. Hyphenation is handled properly if you specify
 % the right language with babel or the like.
 UTF-8で何か文章を書く。babel等でハイフネーションの言語を指定すれば、正しく組版される。
 \end{CJK}
 \end{document}

*Installation
**TeX
You first need a working LaTeX installation.
You also need two other things:
-The macro files from [[CTAN:languages/japanese/CJK/]], found under the directory named cjk-4.x.x/. They may be distributed as a zip file or a tarball.
-The font metric (TFM) files.

The default font settings that come with the CJK package are made for dvips and pdflatex, which also means they are suboptimal for dvipdfmx. This section explains how to write a custom font definition.

The TFM format used by standard TeX can describe at most 256 glyphs per TFM file. (Standard TeX here means TeX without nonstandard extensions such as those in pTeX or Omega.) That is not enough for Chinese characters or other large character sets, so in CJK a single font is split across multiple files. That may sound scary, but ttf2tfm can easily generate those TFM files from any TTF font.
> ttf2tfm [TTF] [TFM stem]@[SFD name]@
A TTC file combines multiple TrueType faces in one file. If you have one, use the -f option to choose the face you want.

If you're generating cyberbXX.tfm, the [TFM stem] is cyberb; for jsso12XX.tfm, it's jsso12. [SFD name] determines how the font is split into subfonts. Which SFD you need depends on the font's CMap encoding and the encoding of the TeX source. For recent TrueType fonts, use one whose name starts with "U".

If you only plan to use full-width characters, you can also simply copy an existing TFM file under a different name and use the copy. (The TFM files in the samples below were made this way, so typesetting half-width alphanumeric characters with them gives pretty ugly results.)

Here is an example. Suppose you have a document written in EUC-JP/Shift_JIS, you want TFM files with the stem foo, and you want to refer to the font as the "bar" family. You would run
 ttf2tfm baz foo@UJIS@
where baz is the TrueType font file. This creates foo01.tfm–foo35.tfm.
Then write c40bar.fd, which should contain at least
 \DeclareFontFamily{C40}{bar}{}
 \DeclareFontShape{C40}{bar}{m}{n}{<-> CJK * foo}{}
If you put these files somewhere LaTeX can find them, your LaTeX source should compile as expected.
 \documentclass{article}
 \usepackage{CJK}
 \begin{document}
 \begin{CJK}{JIS}{bar}
 % Write your Japanese text here in EUC-JP.
 ここにEUC-JPで日本語の文章を書きます。
 \end{CJK}
 \begin{CJK}{SJIS}{bar}
 % Write your Japanese text here in Shift_JIS.
 % You may have to preprocess this block if you use certain characters.
 ここにShift_JISで日本語の文章を書きます。%
 しかし、もしかすると、このブロックだけ%
 プリプロセッサーを通さないと%
 \LaTeX のコンパイルが通らないかも知れません。
 \end{CJK}
 \end{document}
To process Shift_JIS or Big5, you also need to install the preprocessors sjisconv and bg5conv.

**DVI Driver
pdflatex is slated to support the CJK package officially in the near future. For now, though, the only ways to produce decent PDFs with CJK are VTeX (commercial) and dvipdfmx. Here we focus on dvipdfmx.
>By "decent" we mean PDFs whose fonts are not split up to match the TFM files.

The only thing you need to set up is the mapping from the TFMs referenced in the DVI file to the fonts in the PDF file.
>DVI files contain no information about glyph shapes. They only record the size and position of each character and which TFM that information came from. A DVI driver's job is to attach glyph shapes taken from the fonts, so it needs a mapping between TFMs and fonts. Without this mapping, most DVI drivers try to generate a bitmap font on their own. Currently, pTeX reports an error and stops at this point, which tells the user that something is wrong with the installation. With a fully installed CJK package, however, the driver often succeeds in generating bitmaps for the default fonts. Many people therefore never notice the problem and keep using a half-broken installation.

The samples below use newly defined TFMs and show how to map them to real fonts.
dvipdfmx uses many files (called map files) that map TFMs to fonts inside PDF files. Most of them are shared with dvipdfm, so they can only handle the 8-bit fonts that dvipdfm understands. The mapping for CJK fonts therefore has to go into the dvipdfmx-specific map file called cid-x.map. (Details will be added later.)

***When Using OpenType (CFF, CID-keyed) Fonts Not Available Locally
dvipdfmx knows about the following fonts.
|CENTER:|||c
|~Language|CENTER:Character Set|CENTER:Font Name|h
|~Japanese|Adobe-Japan1|Ryumin-Light|
|~|~|GothicBBB-Medium|
|~|~|HeiseiMin-W3|
|~|~|HeiseiKakuGo-W5|
|~|Adobe-Japan1-2|HeiseiMin-W3-Acro|
|~|~|HeiseiKakuGo-W5-Acro|
|~|Adobe-Japan1-4|KozMinPro-Regular-Acro|
|~|~|KozGoPro-Medium-Acro|
|~Simplified Chinese|Adobe-GB1|STSong-Light|
|~|Adobe-GB1-2|STSong-Light-Acro|
|~|Adobe-GB1-4|AdobeSongStd-Light-Acro|
|~Traditional Chinese|Adobe-CNS1|MSung-Light|
|~|~|MHei-Medium|
|~|Adobe-CNS1-0|MSung-Light-Acro|
|~|~|MHei-Medium-Acro|
|~|Adobe-CNS1-4|AdobeMingStd-Light-Acro|
|~Korean|Adobe-Korea1|HYSMyeongJo-Medium|
|~|~|HYGoThic-Medium|
|~|Adobe-Korea1-0|HYSMyeongJo-Medium-Acro|
|~|~|HYGoThic-Medium-Acro|
|~|Adobe-Korea1-2|AdobeMyungjoStd-Medium-Acro|
A cid-x.map entry looks like this:
> [TFM stem]@[SFD name]@ [CMap name] [Font file name]
If you give one of these font names in such an entry, you get a PDF that doesn't embed the font, even if the font isn't on dvipdfmx's search path. [CMap name] maps the encoding that results from applying the SFD to the CIDs of the font's ordering (translator note: I have no idea what this is talking about; this goes for much of the subsequent discussion). [SFD name] is usually the same SFD name you passed to ttf2tfm when you created the TFM files. You can also use something like the following instead.
 jsso12@UJIS@ UniJIS-UCS2-H HeiseiMin-W3-Acro
The entry above collects the characters in jsso12XX.tfm, decodes them to Unicode, and maps them to glyphs in Adobe-Japan1.
 jsso12@SJIS@ RKSJ-H HeiseiMin-W3-Acro
This entry collects the characters in jsso12XX.tfm, decodes them into Shift_JIS, and then maps them to glyphs in Adobe-Japan1.
 jsso12@SJIS@ 90ms-RKSJ-H HeiseiMin-W3-Acro
This entry also goes through Shift_JIS, but it uses the mapping for Windows-31J (the Microsoft Windows standard Japanese character set). Some characters will get different glyph variants than with the previous entry.
 jsso12@SJIS@ 78-RKSJ-H HeiseiMin-W3-Acro
This entry uses glyphs that follow the example glyph shapes in JIS C 6226-1978 (JIS X 0208:1978).

Note that PDFs containing non-embedded fonts may be rendered with different substitute fonts on different systems.

***When Using Existing OpenType (CFF, CID-keyed) Fonts
This works almost the same way as in the previous section, but [Font file name] must name a font that exists on dvipdfmx's search path. By default the font is embedded. Two things suppress embedding:
-A "!" placed before [Font file name].
-",Bold", ",Italic", or ",BoldItalic" placed after [Font file name].

***Using TrueType Fonts as OpenType (CFF, CID-keyed) Fonts
TrueType fonts do not store their glyphs in any fixed order, so glyphs must be accessed through the font file's cmap table. Suppose the character set of the CMap file in [CMap name] is one of Adobe's standard character sets. In that case, the TrueType font can be embedded as if it were a CID font, using the standard mapping from Unicode. However, glyphs that have no Unicode code point are usually not included in TrueType fonts, and even when they are, they cannot be accessed this way. If the character set of the CMap file in [CMap name] is not one of Adobe's standard character sets, you can emulate one, for example the Adobe-Japan1 supplement 4 set, by appending "/AJ14" to the font name. As a general rule, use this technique whenever you use a TrueType font without embedding it.
***Using TrueType Fonts
To access a TrueType font through its cmap table, use a cid-x.map entry of this form:
> [TFM stem]@[SFD name]@ unicode [Font file name] [options]
The SFD name should start with "U", which indicates that the TFM encoding is mapped to Unicode.
> -w option
Specify this when the TrueType font will be used for vertical text.
:-w 0|Horizontal text (default)
:-w 1|Vertical text
> -p option
Use this to access characters outside Unicode's BMP (Basic Multilingual Plane).
:-p 0|Access the BMP (default). The code points 0x0000–0xFFFF map to the characters with exactly those code points.
:-p 1|Access the SMP (Supplementary Multilingual Plane). The characters from this plane needed in TeX are usually ancient scripts. The code points 0x0000–0xFFFF are shifted by 0x10000 and map to characters in the range 0x10000–0x1FFFF.
:-p 2|Access the SIP (Supplementary Ideographic Plane), which contains Chinese characters that did not fit in the BMP. The code points 0x0000–0xFFFF are shifted by 0x20000 and map to characters in the range 0x20000–0x2FFFF.
Sometimes you may need to access a TrueType font's glyphs in their stored order, without going through a cmap. You can do so by specifying a CMap whose character collection is Adobe-Identity. However, that CMap file must not be named "Identity-H" or "Identity-V".

*Examples
The following examples require the files from [[CTAN:languages/japanese/CJK/]]. You should be able to install and use them as provided. To try them out without installing:
+Create an empty temporary directory (folder) and copy everything into it.
+Rename each dvipdfm/config/cid-x.map-add.* file in the example to cid-x.map.
To install them instead, append the contents of those files to your system's cid-x.map file.
+ &ref(http://oku.edu.mie-u.ac.jp/~okumura/texfaq/archive/CJK-LaTeX-UTF8-noembed.tar.bz2,Render different variants of the same kanji from a TeX file written in UTF-8)
--Known problems:
---On each page, there are some parts whose intent is unclear.
+ &ref(http://oku.edu.mie-u.ac.jp/~okumura/texfaq/archive/CJK-LaTeX-localEncoding-vertical.tar.bz2,Vertical text); also contains the settings needed to use JIS X 0213 with Shift_JIS
--Known problems:
---The TFM files included in this archive can only handle full-width characters.
---The archive has no fdx files for rendering vertical text with horizontal-text fonts, so punctuation appears incorrectly in the vertical text mode of CJKvert.sty. (This shouldn't be a problem if you have a genuine vertical-text font.)
+ &ref(http://oku.edu.mie-u.ac.jp/~okumura/texfaq/archive/CJK-LaTeX-SIP.tar.bz2,Using the Supplementary Ideographic Plane); building a PDF from this example requires dvipdfmx 20070409 or newer. The example also uses proprietary fonts. If you don't have them, &ref(http://oku.edu.mie-u.ac.jp/~okumura/texfaq/archive/CJK-LaTeX-SIP.pdf,here's a pre-built PDF) for reference.
--Known problems:
---Old versions of dvipdfmx have a bug that keeps them from handling this example properly (fixed in dvipdfmx-20070409).
---Two fonts are defined in the c70usong.fd file. They should be split into c70usong.fd and c70usong2.fd; otherwise SIP characters can't be used at the beginning of the document.
---It uses proprietary fonts. cid-x.map should be rewritten to use [[HAN NOM FONTs:http://www.viethoc.org/article.php?sid=98&mode=threaded&order=0&thold=0]]. (But boy, does Han Nom's design look like SimSun!)
---There are many other errors, which you can find by searching the Web for how this example uses CJK's features.
---In CJK 4.6.0, using non-BMP characters requires code like the one in this example's preamble. On the other hand, the development version (translator note: as of when?) of CJK clashes with this code.
Let us know if you'd like to see other examples! New examples and corrections to this wiki page are of course welcome too.