This is the LaTeX2e style package CJK Version 4.2.0 (13-Dec-1998)
=================================================================

It is freely distributable under the GNU General Public License.


         **************************************************
         *                                                *
         * You need LaTeX 2e version 1996/12/01 or newer! *
         *                                                *
         **************************************************


Use
---

Use CJK.sty as a package, e.g.

    \documentclass{article}
    \usepackage{CJK}

Two new environments,

    \begin{CJK}[<fontencoding>]{<encoding>}{<family>}
    ...
    \end{CJK}

and

    \begin{CJK*}[<fontencoding>]{<encoding>}{<family>}
    ...
    \end{CJK*}

are defined.

The parameters have the following meaning:

    <encoding>

        These character sets and encodings are currently implemented in
        CJK.enc:

            Bg5         (For traditional Chinese. Mainly used in Taiwan.
                         Character set: Big 5. Encoding: Big 5.)
            GB          (For simplified Chinese. Mainly used in PR China.
                         Character set: GB 2312-80. Encoding: EUC.)
            GBt         (For traditional Chinese. Rarely used in PR China.
                         Character set: GB 12345-90. Encoding: EUC.)
            JIS         (For Japanese.
                         Character set: JIS X 0208-1990. Encoding: EUC.)
            JIS2        (Japanese supplementary character set.
                         Character set: JIS X 0212-1990. Encoding: EUC.)
            SJIS        (For Japanese. Used mainly on PCs. Also known as
                         MS Kanji.
                         Character sets: 1-byte characters from
                         JIS X 0201-1976, 2-byte characters from
                         JIS X 0208-1990. Encoding: SJIS.)
            KS          (For Korean.
                         Character set: KSC 5601-1987. Encoding: EUC.)
            UTF8        (Unicode Transformation Format 8, also called UTF 2
                         or FSS-UTF.
                         Character set: Unicode. Encoding: UTF 8.)
            CNS1        (Chinese National Standard Plane 1.
                         Character set: CNS 11643-1992 plane 1.
                         Encoding: EUC.)
            CNS2 ... CNS7
                        (Character set: CNS 11643-1992 plane 2 - 7.
                         Encoding: EUC.)
            CEFX        (reserved CEF character set for IRIZ.
                         Encoding: EUC.)
            CEFY        (private CEF character set. Encoding: EUC.)

        The encodings (except Big 5, SJIS, and UTF 8) are simplified EUC
        (Extended UNIX Code) character sets without single shifts.
        The character set slot used, G1, stands for two-byte encodings with
        byte values taken from the GR (Graphic Right) character range
        0xA1-0xFE (as defined in ISO 2022).

        Note that CNS1 and CNS2 contain almost the same characters in the
        same order as Big 5 (but in EUC). For CEF and CNS character sets
        see CEF.doc also.

        Using this parameter has the same effect as using \CJKenc: writing
        e.g.

            \begin{CJK}{Bg5}{...}
            ...

        is identical to

            \begin{CJK}{}{...}
            \CJKenc{Bg5}
            ...

        Note: A `character set' is a collection of glyphs. The order of the
              glyphs is just for defining purposes and for reference.
              An `encoding' is an ordering scheme to access a character
              set.

              A character set can have many encodings
                (cf. JIS X 0208 -> EUC or SJIS).
              An encoding can be used for many character sets
                (cf. EUC -> KS 5601, GB 2312, ...).
              Sometimes, the character set has the same name as the
              encoding (Big 5).

              For more details I suggest reading the document cjk.inf by
              Ken Lunde; it is available from

                ftp://ftp.ora.com/pub/examples/nutshell/ujip/docs/cjk.inf

              Throughout the CJK documentation, `encoding' refers to the
              valid encoding/character set combinations defined just above.

    <fontencoding>

        These font encodings are currently defined: `' (empty; the
        default), `pmC' (available for Bg5, GB, GBt, JIS, and KS), `dnp'
        and `wn' (for JIS), `HL' (for KS).

        Font encoding means the order of characters in the subfonts
        themselves. A change of the font encoding will neither alter the
        meaning of a CJK character nor change the character code in the
        selected encoding.

        The font encoding `pmC' is defined for compatibility with the pmC
        package. Using this font encoding is discouraged because it wastes
        subfonts. If possible, convert your original CJK bitmap fonts with
        hbf2gf (see hbf2gf.doc) or other tools to CJK encodings.

        `dnp' implements the character order of the Dai Nippon Printing
        fonts and is only available for JIS encoding.

        `wn' is the font encoding for Watanabe jfonts.
        There exists a linking package which maps the Watanabe jfonts onto
        the dnp naming scheme (thus you can use the real dnp fonts for
        printing and the mapped jfonts for previewing). See the
        documentation files in the `japanese' subdirectory for further
        details.

        `HL' allows the use of the new HLaTeX fonts (starting with version
        0.97); note that the definition of fonts is rather different
        compared to HLaTeX. See the section `Korean input' below for a
        detailed description.

        You can change the font encoding per encoding with the command
        \CJKfontenc; the first parameter is the encoding, the second the
        font encoding.

    <family>

        It is impossible to know in advance what fonts are available at
        your site; look at the example FD (font definition) files to see
        how to create or modify appropriate FD files suiting your needs.
        See fonts.doc also for further hints.

        If this parameter is empty, the default value given in CJK.enc is
        selected: `song' for all encodings except KS (which defaults to
        `mj').

        Using this parameter has the same effect as using \CJKfamily; all
        encodings will then use this family:

            \begin{CJK}{...}{song}
            ...

        is identical to

            \begin{CJK}{...}{}
            \CJKfamily{song}
            ...

        You can change the families per encoding (and font encoding) with
        the command \CJKencfamily; the first parameter is the encoding, the
        second the family, the optional argument is the font encoding. This
        will override the default value. Note that \CJKfamily or a
        non-empty `family' parameter of the CJK environment will override
        any \CJKencfamily commands. Say `\CJKfamily{}' to enable
        \CJKencfamily again.

The CJK* environment will swallow unprotected spaces and newlines after a
CJK character (the usual habit for Chinese and Japanese text), whereas CJK
will not (for European and Korean text). You can change between these two
`modes' with \CJKspace (CJK* -> CJK) and \CJKnospace (CJK -> CJK*).

If you use cjk-enc.el, you don't need to specify a CJK environment. This
will be done automatically. See cjk-enc.doc for details.
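The family and spacing commands described above can be combined as in the
following sketch. It is only an illustration under assumptions: it assumes
that FD files for the families `song' and `kai' are installed for the GB
encoding at your site, and the comment lines stand in for real GB-encoded
text.

```latex
\documentclass{article}
\usepackage{CJK}

\begin{document}

% Empty family parameter: the default from CJK.enc (`song') applies
% until \CJKencfamily or \CJKfamily says otherwise.
\begin{CJK*}{GB}{}
  % ... GB-encoded text, set in the default family ...

  \CJKencfamily{GB}{kai}% GB text now uses `kai' (assumed to be installed)
  % ... GB-encoded text ...

  \CJKfamily{song}% overrides any \CJKencfamily setting
  % ... GB-encoded text ...

  \CJKfamily{}% enable the \CJKencfamily settings again

  \CJKspace% keep spaces after CJK characters, as in the CJK environment
  % ... GB-encoded text mixed with European words ...

  \CJKnospace% swallow them again, as in the CJK* environment
\end{CJK*}

\end{document}
```

Leaving the family parameter of the environment empty is deliberate here:
a non-empty value would override the \CJKencfamily setting from the start.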
This is a typical example:

    \begin{CJK*}{GB}{kai}
    ... Text in GuoBiao encoding ...
    \end{CJK*}


How it works
------------

Asian logographs can't be represented completely with one byte per
character. (At least) two bytes are needed, and the most common encoding
schemes (GB, Big 5, JIS, KS, etc.) have a certain range for the first byte
(usually 0xA1-0xFE or a part of it) which signals that this and the next
byte represent an Asian logograph. This means that plain ASCII characters
(i.e., characters between 0x00 and 0x7F) will be left undisturbed, and the
remaining character codes (0x80-0xFF) will be assigned to a CJK encoding,
creating a multiple-byte encoding with 1-byte and 2-byte characters (and
even 3-byte characters for UTF 8).

CJK.sty makes the character codes 0x81-0x9F and 0xA1-0xFE active inside of
the CJK environment and assigns macros to the active characters which will
then select the proper font and character. The real mechanism is a bit more
complex to ensure robustness (it was borrowed and modified from LaTeX 2e's
inputenc.sty) and correct handling of punctuation characters.

The remaining character codes 0x80 and 0xA0 are made active also and will
be used with cjk-enc.el and cefconv. The last remaining character code 0xFF
will be used as a delimiter.

* emTeX users: you must activate 8bit input and output while creating the
* LaTeX2e format file! Do this by using the switches -o and -8 (in addition
* to the iniTeX switch -i).
*
* Example:
*
*       tex386 -i -o -8 latex.ltx


Some internals
--------------

Internally three levels (bindings, encodings, character macro sets) are
defined:

    active characters
        |
        +--------------> bindings (standard, SJIS, UTF8)
        |
    active character macros
        |
        +--------------> encodings (GB, Big 5, ...) +
        |                font encodings (none, dnp, wn, pmC, HL)
        |
    subfont selecting macros
        |
        +--------------> character macro sets (standard, Big 5, ...)
        |
    character selecting macros

Only the encoding and the font encoding are user selectable (as explained
above); the other levels will be selected by the CJK package.

These levels correspond to the following internal macros:

    \CJK@xxxxBinding (`.bdg' files):

        possible values for `xxxx' are: standard, SJIS, and UTF8.

    \CJK@xxxxEncoding (`.enc' files):

        possible values for `xxxx' are: standard, Bg5, SJIS, KS, UTF8,
        pmCsmall, pmCbig, JISdnp, and KSHL.

    \CJK@xxxxChr (`.chr' files):

        possible values for `xxxx' are: standard, Bg5, KS, SJIS, UTF8, pmC,
        and HLaTeX.

And now a more detailed description of the various encodings:

    \CJK@standardEncoding

        will be used for EUC encodings with the second byte in the range
        0xA1-0xFE (GB, GBt, JIS, JIS2, CNS, CEF).

    \CJK@Bg5Encoding

        will be used for Big 5 encoding with the second byte in the range
        0x40-0xFE.

    \CJK@SJISEncoding

        will be used for SJIS encoding; one-byte characters are in the
        range 0xA1-0xDF, two-byte characters have the first byte in the
        ranges 0x81-0x9F and 0xE0-0xEF, the second byte runs from 0x40 to
        0xFC except 0x7F. Since SJIS only squeezes the JIS encoding into a
        new scheme without changing the ordering, fonts produced by hbf2gf
        will look the same for JIS and SJIS except for one-byte SJIS
        characters. For more details see below.

    \CJK@KSEncoding

        will be used for KS encoding. Two sets of subfonts are defined, one
        for Hangul syllables and elements, and a second for Hanja. For more
        details see below.

    \CJK@UTF8Encoding

        will be used for Unicode in UTF 8 encoding. The first byte is in
        the range 0xC0-0xDF for two-byte values, and in the range 0xE0-0xEF
        for three-byte values. The other byte(s) are in the range
        0x80-0xBF.

        Note that CJK expects two hexadecimal digits as a running number in
        the font name (as defined in UTF8.enc) instead of two decimal
        digits. Select the option `unicode yes' in the hbf2gf config file
        if you use hbf2gf to transform bitmap fonts in HBF format to PK
        fonts as used by CJK.sty.
        Three commands (\CJKCJKchar, \CJKhangulchar, and \CJKlatinchar)
        control the handling of intercharacter glue: \CJKCJKchar (the
        default) selects CJK style (using \CJKglue), \CJKhangulchar selects
        hangul style (using \CJKtolerance), and \CJKlatinchar selects none
        of them.

        This is the only encoding which will not work in preprocessed mode.

    \CJK@pmCsmallEncoding and \CJK@pmCbigEncoding

        can be activated with \pmCsmall (this is the default) and \pmCbig
        inside the CJK environment. Note that the original pmC fonts have
        two character sizes per font (the bigger ones with an offset of
        -128); Bg5pmC encoded fonts cannot contain big characters.

        The names of the fonts in the FD files reflect the modifications
        added by Marc Leisher to the original poor man's Chinese (pmC)
        package written by Thomas Ridgeway.

    \CJK@JISdnpEncoding

        is JIS encoding with dnp fonts. The main difference (besides the
        offsets) is the composition of real font names; a dnp font name
        consists of name stem + subfont name + designsize: an example is
        dmjkata10.

    \CJK@JISwnEncoding

        is similar to JISdnp encoding but uses Watanabe jfonts.

    \CJK@KSHLEncoding

        finally uses the new fonts of the HLaTeX package; three internal
        encodings are necessary to represent it. See the next section for
        details.


Korean input
------------

There is already a package which handles Hangul and Hanja: HLaTeX 0.98
(this version merges hlatex and jhtex). The main difference between HLaTeX
and the standard processing of KS in CJK is the use of 11 real fonts
containing Hangul syllables (HLaTeX), whereas the CJK package uses 11
virtual fonts to define Hangul syllables which map to two fonts containing
Hangul `elements'. Additionally, HLaTeX uses TeX's ligature mechanism to
map the two-byte character code to the subfont-specific glyph.

To use KS encoding, say

    \begin{CJK}{KS}{}
    ...
    \end{CJK}
These font switches are available inside the environment:

    hangul fonts from former hlatex (in the han font packages):

      * \mj     MyoungJo (default)
        \gt     Gothic
        \gs     BootGulssi
        \gr     Graphic
        \dr     Dinaru

    hangul fonts from former jhtex (in the han1 font packages):

      * \hgt    Hangul Gothic
      * \hmj    Hangul MyoungJo (MunHwaBu fonts)
      * \hpg    Hangul Pilgi
        \hol    Hangul Outline (MyoungJo)

If a font is marked with a star, real bold series are available. All other
bold fonts are defined using poor-man's boldface (see below). See the file
INSTALL for how to get these fonts.

Note that the font switches are abbreviations for \CJKencfamily and not for
\CJKfamily.

For characters with the first byte in the ranges 0xA1-0xAF (except 0xA4)
and 0xC9-0xFD (graphic characters, hanja, archaic hangul etc.) fonts with
the encoding C60 are used. C61 is assigned to hangul fonts (for hangul
elements with the first byte 0xA4 and hangul characters in the range
0xB0-0xC8). This enables the use of many hangul fonts and perhaps only one
or two different hanja fonts.

If you want to use C60 encoding for hangul characters also, say \CJKhanja.
The opposite command is \CJKhangul (of course this works only if you have
hangul characters in the C60 font). Archaic hangul elements (KS
0xA4D5-0xA4FE) and the character KS 0xA4D4 are only accessible if \CJKhanja
is active.

You should convert your KS hanja fonts using hbf2gf (or ttf2pk) as
described above.

To use HLaTeX fonts, say

    \begin{CJK}[HL]{KS}{}
    ...
    \end{CJK}

These font switches are available inside the environment:

            \bm     Bom
          * \gr     Graphic
        +   \gs     Gungseo
        + * \gt     Gothic
            \mg     Mokgak
            \mgt    Jamo Gothic
          * \mj     Myoungjo
            \mmj    Jamo Myoungjo
            \mnv    Jamo Novel
            \msr    Jamo Sora
            \ol     Outline
            \pg     Pilgi
            \pn     Pen
        + * \sm     SaeMyoungjo
        +   \tz     (Jamo) Taza
            \yt     Yetgul

If a font is marked with an asterisk, real bold series are available. All
other fonts are defined using poor-man's boldface (see below). Only fonts
marked with a plus sign are available for hanja (in both normal and bold
series).
The other font families are mapped to these four hanja families.

Un Koaung-Hi, the author of HLaTeX, defines three groups of fonts: hangul,
hanja, and symbols. The CJK package needs three internal encodings (C63 for
hanja, C64 for symbols, and C65 for hangul) to represent the font encoding
scheme of HLaTeX.

HLaTeX options:

    If you want to use HLaTeX's PS fonts instead of MF based fonts, say
    `\usepackage{pshan}' (e.g. the font switch \mj then maps to the family
    `pmj' instead of `mj'). All bold series of the PS fonts are defined as
    poor-man's boldface. This package must be called after CJK.sty has been
    loaded.

    The option `hardbold' has been integrated into the FD files. I consider
    whether bold series are available or not a fundamental local font setup
    decision which should be coded into the FD files and not into the
    document. As a consequence you have to change your FD files to emulate
    the `softbold' option with CJK's poor-man's boldface. Example:

        \DeclareFontShape{C63}{gt}{bx}{n}{<5-25> CJK * wsgtb}{}

    should be changed to

        \DeclareFontShape{C63}{gt}{bx}{n}{<5-25> CJKb * wsgtm}{\CJKbold}

    and similar font definitions too.

    [Well, it's not really necessary to modify the FD files to emulate the
    `softbold' option: just insert the appropriate \DeclareFontShape and/or
    \DeclareFontFamily commands in the preamble of your text.]

Finally a warning: please bear in mind that CJK does not emulate the
behaviour of HLaTeX, it only supports its fonts.


Big 5 encoding
--------------

See below for the preferred input method using bg5conv.

The characters `\', `{', and `}' are used as second bytes in the Big 5
encoding. This collides with TeX. If you write Big 5 text mixed with other
encodings (and you don't want to or can't use Mule or bg5conv), you should
use the Bg5text environment which changes the category codes of these
characters. The command prefix is now the forward slash `/', and the
grouping characters are `(' and `)' respectively.
An example:

    \begin{CJK}{Bg5}{song}
    \begin{Bg5text}
    ...
    /begin(center)
    ...
    /end(center)
    ...
    /end(Bg5text)
    \end{CJK}

To get the `/', `(', and `)' characters, write `//', `/(', and `/)' inside
the Bg5text environment.

This environment is ugly, and some commands like \newcommand will not work
in it. Starting with CJK version 3.0 it's also possible to use different
encodings in preprocessed mode, thus this environment is almost obsolete.
It only makes sense in the unlikely case that you want to mix Big 5 with
SJIS or UTF8 or that you want to include a short Big 5 fragment in a
LaTeX 2e document without using the preprocessor.

Instead of using the Bg5text environment you can protect the offending
second bytes with a backslash, i.e. \{, \}, \\ (using a non-Chinese
editor). This will not increase the readability of the Chinese text, but
for short texts it's perhaps more comfortable. Alas, it doesn't work in
page header commands because the macros \{ etc. will not be expanded.

Be careful not to use any commands inside the Bg5text environment which
write something into an external file (commands like \chapter etc.). If
it's not possible to avoid Big 5 character codes with \, {, or } outside of
the Bg5text environment (e.g. having Big 5 text in a \chapter or \section
command), you can replace them with the \CJKchar macro manually:

    \section{This is a problematic Big 5 character: \CJKchar{169}{92}}

The parameters are the first and second byte of the Big 5 character code.
You can also use hexadecimal or octal notation. See commands.doc for a full
description.


SJIS encoding
-------------

See below for the preferred input method using sjisconv.

Shift-JIS encoding is widely used on PCs for Japanese. A special feature is
the simultaneous use of one-byte and two-byte encoded characters, which
arose for reasons of backward compatibility. The two-byte encoded character
set is completely identical to the JIS character set, even the ordering is
the same.
Thus there is no need for special two-byte SJIS FD files; the font
definition files for JIS will be used.

The situation is different for one-byte SJIS characters, the so-called
`half-width' Katakana (encoding C49). Usually you will use full-width
Katakana fonts too to get a typographically correct output. The exception
is a typewriter font which should really have only half the width of normal
Kanji or Katakana to represent screen snapshots or similar things.

Fonts defined in the C49 encoding scheme must have the character glyphs at
the code points 0xA1-0xDF.

An environment `SJIStext' similar to `Bg5text' is defined as well; the same
restrictions as explained in the previous section hold.


CJK captions
------------

To use the supplied caption files you will need the koma-script package.
The main reason why I chose these style files instead of the standard
classes is the fact that the author of koma-script is willing to support
CJK. On the other hand, the philosophy of the LaTeX 2e maintainers is not
to add new features to the standard classes.

The koma-script style files are maintained by Markus Kohm
(Markus_Kohm@hd.maus.de) and available at the CTAN hosts.

If you say \CJKcaption{<caption>} inside of a CJK environment, the file
<caption>.cap will be loaded. In preprocessed mode, <caption>.cpx will be
loaded instead.

Example:

    \documentclass{scrreprt}% this is a KOMA-script class with \chapter
    \usepackage{CJK}

    \begin{document}

    \begin{CJK*}{GB}{kai}
    \CJKcaption{GB}% loading GB.cap

    \chapter{blablabla}% will be formatted in Chinese
    ...
    \end{CJK*}
    \end{document}

Note that for Korean three caption files are available: hanja.cap for
captions using hanja (this corresponds to HLaTeX's `hanja' option) and two
caption files (hangul.cap and hangul2.cap) using hangul.

In case you want to edit a CAP file, you must create its corresponding CPX
file too.
After editing, preprocess the file with

    bg5conv < xxx.cap > xxx.cpx

(for caption files in SJIS encoding use sjisconv instead), then change the
file name identification strings in the CPX file accordingly.


Poor-man's boldface
-------------------

Most CJK fonts available in the public domain do not have bold series.
Boldface is emulated by printing the character three times with slight
horizontal offsets; some special features are used for this:

CJK uses \CJKsymbol internally instead of \symbol to access CJK characters
(after the correct font has been selected). This macro honours the
\ifCJK@bold@ flag; if set it will emulate boldface. The default value of
the horizontal offset is 0.015em; to change it you should redefine
\CJKboldshift, the macro which holds this shift.

\ifCJK@bold@ can be set and unset globally with the commands \CJKbold and
\CJKnormal. These commands are intended to be used with \DeclareFontShape
as follows:

    \DeclareFontShape{C00}{CNS}{m}{n}{<-> CJK * csso12}{}
    \DeclareFontShape{C00}{CNS}{bx}{n}{<-> CJKb * csso12}{\CJKbold}

It should never be necessary to use \CJKnormal since \selectfont has been
modified to always reset \ifCJK@bold@ and to call the loading-settings
(i.e., the sixth parameter) of \DeclareFontShape.

Additionally new size functions (CJKb, sCJKb, CJKfixedb, sCJKfixedb, and
others; see fonts.doc for details) have been introduced which are
completely identical to their counterparts without the final `b'. The only
reason to use them is, as shown in the above example, to make the fifth
parameter of \DeclareFontShape for bold series different from the one for
medium series (LaTeX 2e uses this parameter as a macro name to execute
loading-settings, thus they must not be equal).


Embedding non-CJK words into CJK text
-------------------------------------

To enable line breaking you should separate non-CJK words and CJK
characters with horizontal space.
But the ordinary space dimensions inserted by TeX based on the current
non-CJK font often look bad because the surrounding CJK characters are
printed almost side by side (the non-stretched value of \CJKglue is 0pt).
Especially in extreme cases, such as underfull \hbox lines, the default
space distorts the CJK text too much.

If you say \CJKtilde, the active `~' character will not produce an
unbreakable space; instead the following definition will be used:

    \def~{\hspace{0.25em plus 0.125em minus 0.08em}}

This defines a space with a natural width of a quarter of an em. See the
file japanese/shibuaki.doc for some further details. Here is an example:

    ThisIsChineseText~test~ThisIsChineseText
                     ^^^^^^

Simply use tilde characters instead of spaces.

The original definition of `~' is available as \nbs (non-breakable space, a
shorthand for the LaTeX command \nobreakspace). To return to the standard
`~' macro definition say \standardtilde.

Note that the opposite is not true: to embed CJK words into non-CJK text an
ordinary space is optimal.

If you use Mule please consider the use of cjktilde.el in utils/lisp. This
small package defines a minor mode (cjk-tilde-mode) which exchanges the
space key with the tilde key. It's convenient to bind this mode to a key,
e.g. C-insert. For AUCTeX you can also use cjkspace.el which is similar
(but not identical) to cjktilde.el.


bg5conv and sjisconv
--------------------

Using the Bg5text or SJIStext environment is a mess. Thus two
preprocessors, bg5conv and sjisconv, are provided for Big 5 and SJIS
characters to overcome the restrictions of the Bg5text and SJIStext
environments. Compile them with

    gcc -O -s -o bg5conv bg5conv.c
    gcc -O -s -o sjisconv sjisconv.c

The CJK bin package already contains precompiled binaries for use with DOS
and OS/2; see the batch files bg5latex[.bat] etc. for an example of how to
use them.
Each Big 5 character (each two-byte encoded SJIS character) `XY' will be
converted into the form `XZZZ^^FF': the first byte X is kept, ZZZ is the
decimal equivalent of the second byte Y, and a character with the hex value
0xFF follows as a delimiter.

The use of bg5conv/sjisconv is completely transparent; no changes to your
documents are necessary.

It is possible to mix Big 5 encoding with other encodings (except SJIS and
UTF8) if bg5conv is used. To mix SJIS with other encodings (except Big 5
and UTF8) use sjisconv.

If you use traditional Chinese characters within Mule, it's not necessary
to call bg5conv after the use of *cjk-coding* output encoding (but it is
necessary if you write out the file in Big 5 encoding). The same is true
for emacs 20.3 and above.

Note: The OS/2 script files bg5latex.cmd etc. need REXX which you probably
      have to install first.


Caveats
-------

o You can of course use CJK environments inside of a CJK environment, but
  you may have to increase the so-called save size (with emTeX you can
  adjust this with -ms=...). The CJK package has optional arguments which
  control the scope of CJK environments:

    lowercase

      Use this option if you want to use \lowercase with encodings inside
      CJK environments. You need less save size using the `encapsulated'
      option if `lowercase' is not set. You must use bg5conv (sjisconv) or
      cjk-enc.el to use Big 5 (SJIS) characters with this option.

      Use this with caution! All \lccode values in the range 0x80-0xFF are
      set to zero, thus disabling TeX's hyphenation mechanism for words
      which contain characters of this range in the *input encoding* (e.g.
      Latin-1 encoded words with accents). This is due to an unfortunate
      mangling of the input and output encoding mechanism in TeX itself.

    global

      \lccode (if `lowercase' set), \uccode, \catcode and the activation of
      the characters 0x81-0xFE will be globally modified (\lccode and
      \uccode reset to 0).
      This is the most economical mode concerning save size, but you can't
      have CJK environments inside of CJK environments or other
      environments which manipulate the character range 0x81-0xFE. All CJK
      font selection commands are global too! Packages which change some of
      the above values only once (e.g. in the preamble) will also not work
      after the first use of a CJK environment.

      cjk-enc.el will automatically select this option.

    local

      \lccode (if `lowercase' set) and \uccode together with bindings will
      be modified globally. This is the default. You can stack CJK
      environments.

    active

      If activated, bindings will additionally be local. You will need this
      option if you want to mix preprocessed text with non-preprocessed
      text in nested CJK environments. This can happen if you merge texts
      in various encodings.

    encapsulated

      If you want to access e.g. T1 fonts directly (i.e., without the
      macros defined in t1enc.def) or if you want to use a LaTeX 2e input
      encoding (outside of the CJK environment) with \uppercase and
      \lowercase (or \MakeUppercase and \MakeLowercase) working correctly,
      you must use this option. All values mentioned above will be local,
      so you can stack environments. This option probably causes an
      overflow of the save size.

      Note: all macro packages which access T1 fonts with the macros
            defined in t1enc.def will work in CJK environments! E.g. the
            command `"s' of german.sty works with \MakeUppercase too.

  Say \usepackage[