ICONV_UNICODE(7) Standards, Environments, and Macros ICONV_UNICODE(7)

NAME


iconv_unicode - code set conversion tables for Unicode

DESCRIPTION


The following code set conversions are supported:

CODE SET CONVERSIONS SUPPORTED
------------------------------
FROM Code Set TO Code Set
Code FROM Target Code TO
Filename Filename
Element Element

ISO 8859-1 (Latin 1) 8859-1 UTF-8 UTF-8
ISO 8859-2 (Latin 2) 8859-2 UTF-8 UTF-8
ISO 8859-3 (Latin 3) 8859-3 UTF-8 UTF-8
ISO 8859-4 (Latin 4) 8859-4 UTF-8 UTF-8
ISO 8859-5 (Cyrillic) 8859-5 UTF-8 UTF-8
ISO 8859-6 (Arabic) 8859-6 UTF-8 UTF-8
ISO 8859-7 (Greek) 8859-7 UTF-8 UTF-8
ISO 8859-8 (Hebrew) 8859-8 UTF-8 UTF-8
ISO 8859-9 (Latin 5) 8859-9 UTF-8 UTF-8
ISO 8859-10 (Latin 6) 8859-10 UTF-8 UTF-8
Japanese EUC eucJP UTF-8 UTF-8
Chinese/PRC EUC
(GB 2312-1980) gb2312 UTF-8 UTF-8
ISO-2022 iso2022 UTF-8 UTF-8
Korean EUC ko_KR-euc Korean UTF-8 ko_KR-UTF-8
ISO-2022-KR ko_KR-iso2022-7 Korean UTF-8 ko_KR_UTF-8
Korean Johap
(KS C 5601-1987) ko_KR-johap Korean UTF-8 ko_KR-UTF-8
Korean Johap
(KS C 5601-1992) ko_KR-johap92 Korean UTF-8 ko_KR-UTF-8
Korean UTF-8 ko_KR-UTF-8 Korean EUC ko_KR-euc
Korean UTF-8 ko_KR-UTF-8 Korean Johap ko_KR-johap
(KS C 5601-1987)
Korean UTF-8 ko_KR-UTF-8 Korean Johap ko_KR-johap92
(KS C 5601-1992)
KOI8-R (Cyrillic) KOI8-R UCS-2 UCS-2
KOI8-R (Cyrillic) KOI8-R UTF-8 UTF-8
PC Kanji (SJIS) PCK UTF-8 UTF-8
PC Kanji (SJIS) SJIS UTF-8 UTF-8
UCS-2 UCS-2 KOI8-R (Cyrillic) KOI8-R
UCS-2 UCS-2 UCS-4 UCS-4


CODE SET CONVERSIONS SUPPORTED
------------------------------
FROM Code Set TO Code Set
Code FROM Target Code TO
Filename Filename
Element Element

UCS-2 UCS-2 UTF-7 UTF-7
UCS-2 UCS-2 UTF-8 UTF-8
UCS-4 UCS-4 UCS-2 UCS-2
UCS-4 UCS-4 UTF-16 UTF-16
UCS-4 UCS-4 UTF-7 UTF-7
UCS-4 UCS-4 UTF-8 UTF-8
UTF-16 UTF-16 UCS-4 UCS-4
UTF-16 UTF-16 UTF-8 UTF-8
UTF-7 UTF-7 UCS-2 UCS-2
UTF-7 UTF-7 UCS-4 UCS-4
UTF-7 UTF-7 UTF-8 UTF-8
UTF-8 UTF-8 ISO 8859-1 (Latin 1) 8859-1
UTF-8 UTF-8 ISO 8859-2 (Latin 2) 8859-2
UTF-8 UTF-8 ISO 8859-3 (Latin 3) 8859-3
UTF-8 UTF-8 ISO 8859-4 (Latin 4) 8859-4
UTF-8 UTF-8 ISO 8859-5 (Cyrillic) 8859-5
UTF-8 UTF-8 ISO 8859-6 (Arabic) 8859-6
UTF-8 UTF-8 ISO 8859-7 (Greek) 8859-7
UTF-8 UTF-8 ISO 8859-8 (Hebrew) 8859-8
UTF-8 UTF-8 ISO 8859-9 (Latin 5) 8859-9
UTF-8 UTF-8 ISO 8859-10 (Latin 6) 8859-10
UTF-8 UTF-8 Japanese EUC eucJP
UTF-8 UTF-8 Chinese/PRC EUC gb2312
(GB 2312-1980)
UTF-8 UTF-8 ISO-2022 iso2022
UTF-8 UTF-8 KOI8-R (Cyrillic) KOI8-R
UTF-8 UTF-8 PC Kanji (SJIS) PCK
UTF-8 UTF-8 PC Kanji (SJIS) SJIS
UTF-8 UTF-8 UCS-2 UCS-2
UTF-8 UTF-8 UCS-4 UCS-4
UTF-8 UTF-8 UTF-16 UTF-16
UTF-8 UTF-8 UTF-7 UTF-7
UTF-8 UTF-8 Chinese/PRC EUC zh_CN.euc
(GB 2312-1980)


CODE SET CONVERSIONS SUPPORTED
------------------------------
FROM Code Set TO Code Set
Code FROM Target Code TO
Filename Filename
Element Element

UTF-8 UTF-8 ISO 2022-CN zh_CN.iso2022-7
UTF-8 UTF-8 Chinese/Taiwan Big5 zh_TW-big5
UTF-8 UTF-8 Chinese/Taiwan EUC zh_TW-euc
(CNS 11643-1992)
UTF-8 UTF-8 ISO 2022-TW zh_TW-iso2022-7
Chinese/PRC EUC zh_CN.euc UTF-8 UTF-8
(GB 2312-1980)
ISO 2022-CN zh_CN.iso2022-7 UTF-8 UTF-8
Chinese/Taiwan Big5 zh_TW-big5 UTF-8 UTF-8
Chinese/Taiwan EUC zh_TW-euc UTF-8 UTF-8
(CNS 11643-1992)
ISO 2022-TW zh_TW-iso2022-7 UTF-8 UTF-8


EXAMPLES


Example 1: The library module filename




In the conversion library, /usr/lib/iconv (see iconv(3C)), the library
module filename is composed of two symbolic elements separated by the
percent sign (%). The first symbol specifies the code set that is being
converted; the second symbol specifies the target code, that is, the code
set to which the first one is being converted.


In the conversion table above, the first symbol is termed the "FROM
Filename Element". The second symbol, representing the target code set,
is the "TO Filename Element".


For example, the library module filename to convert from the Korean EUC
code set to the Korean UTF-8 code set is


ko_KR-euc%ko_KR-UTF-8


FILES


/usr/lib/iconv/*.so
conversion modules


SEE ALSO


iconv(1), iconv(3C), iconv(7)


Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM
Development Team, July 1993.


Chon, K., H. Je Park, and U. Choi, Korean Character Encoding for Internet
Messages, RFC 1557, Solvit Chosun Media, December 1993.


Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format of
Unicode, RFC 1642, Taligent, Inc., July 1994.


Lee, F., HZ - A Data Format for Exchanging Files of Arbitrarily Mixed
Chinese and ASCII characters, RFC 1843, Stanford University, August 1995.


Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding
for Internet Messages, RFC 1468, Keio University, Panda Programming, June
1993.


Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet
Messages, RFC 1555, Israeli Inter-University, Hebrew University, December
1993.


Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo
Institute of Technology, July 1995.


Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of
ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.


Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of
Southern California/Information Sciences Institute, October 1994.


Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel
Almen Planlaegning, June 1992.


Spinellis, D., Greek Character Encoding for Electronic Mail Messages, RFC
1947, SENA S.A., May 1996.


The Unicode Consortium, The Unicode Standard, Version 2.0, Addison Wesley
Developers Press, July 1996.


Wei, Y., Y. Zhang, J. Li, J. Ding, and Y. Jiang, ASCII Printable
Characters-Based Chinese Character Encoding for Internet Messages, RFC
1842, AsiaInfo Services Inc., Harvard University, Rice University,
University of Maryland, August 1995.


Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646, RFC
2044, Alis Technologies, October 1996.


Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin, Chinese
Character Encoding for Internet Messages, RFC 1922, Tsinghua University,
China Information Technology Standardization Technical Committee (CITS),
Institute for Information Industry (III), University of Washington, March
1996.

NOTES


ISO 8859 character sets using Latin alphabetic characters are
distinguished as follows:

ISO 8859-1 (Latin 1)
For most West European languages, including:


Albanian Finnish Italian
Catalan French Norwegian
Danish German Portuguese
Dutch Galician Spanish
English Irish Swedish
Faeroese Icelandic


ISO 8859-2 (Latin 2)
For most Latin-written Slavic and Central
European languages:


Czech Polish Slovak
German Rumanian Slovene
Hungarian Croatian


ISO 8859-3 (Latin 3)
Popularly used for Esperanto, Galician, Maltese,
and Turkish.


ISO 8859-4 (Latin 4)
Introduces letters for Estonian, Latvian, and
Lithuanian. It is an incomplete predecessor of
ISO 8859-10 (Latin 6).


ISO 8859-9 (Latin 5)
Replaces the rarely needed Icelandic letters in
ISO 8859-1 (Latin 1) with the Turkish ones.


ISO 8859-10 (Latin 6)
Adds the last Inuit (Greenlandic) and Sami
(Lappish) letters that were not included in ISO
8859-4 (Latin 4) to complete coverage of the
Nordic area.


illumos April 18, 1997 ICONV_UNICODE(7)