ISO 8859, more formally ISO/IEC 8859, is a joint ISO and IEC standard for 8-bit character encodings for use by computers. The standard is divided into numbered, separately published parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc., each of which may be informally referred to as a standard in and of itself. There are currently 15 parts.
While the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use the Roman alphabet need additional symbols not covered by ASCII, such as ß (German) and å (Swedish and other Nordic languages). ISO 8859 sought to remedy this problem by using the eighth bit, which was unused in ASCII, to make room for another 128 characters. However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least 10 just to cover the Latin script.
The ISO 8859-n standard is not the same as the well-known ISO-8859-n character encodings approved by the IANA for use on the Internet. Besides the extra hyphen in the IANA-approved name, the encodings differ in that each part of the ISO standard assigns, at most, 191 characters to the byte ranges 32 to 126 and 160 to 255, whereas the corresponding IANA-approved character encoding merges these mappings with the C0 control set (control characters mapped to bytes 0 to 31) and the C1 control set (control characters mapped to bytes 128 to 159), resulting in a full 8-bit character map with most, if not all, bytes assigned.
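The split between graphic and control positions can be checked with a short Python sketch. It assumes that Python's built-in `iso-8859-1` codec follows the full 8-bit IANA-style map, and it uses the Unicode general category `Cc` as a stand-in for "control character"; both are illustrative choices rather than anything the standard itself prescribes.

```python
import unicodedata

# Decode every possible byte value with the built-in codec, which
# covers all 256 positions (control characters included).
text = bytes(range(256)).decode("iso-8859-1")

# Assumption: Unicode general category "Cc" marks a control character.
controls = [i for i, ch in enumerate(text) if unicodedata.category(ch) == "Cc"]
graphic = [i for i, ch in enumerate(text) if unicodedata.category(ch) != "Cc"]

print(len(text))               # number of bytes the codec can decode
print(len(graphic))            # positions left for graphic characters
print(graphic[0], graphic[-1]) # first and last graphic position
```

The graphic positions found this way are exactly the byte ranges 32 to 126 and 160 to 255, that is, 95 + 96 = 191 positions.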
The ISO 8859 standard is designed for reliable information exchange, not typography; the standard omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO 8859 standards, or use Unicode instead.
As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it didn't get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French didn't get its œ and Œ ligatures because French speakers had not previously needed them enough to demand them on their keyboards; nor did it get Ÿ, because this character is only used in French in all caps text. These characters were, however, included later with ISO 8859-15, which also introduced the new Euro character €. Likewise Dutch did not get the 'ij' and 'IJ' letters, because Dutch speakers had gotten used to typing these as two letters instead. Romanian did not initially get its 'Ș/ș' and 'Ț/ț' letters, because these letters were initially unified with 'Ş/ş' and 'Ţ/ţ' by the Unicode Consortium, considering the shapes with comma beneath to be glyph variants of the shapes with cedilla. However, the letters with explicit comma below were later added to the Unicode standard and are also in ISO 8859-16.
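Which of these characters made it into which part can be probed with Python's built-in codecs. The sketch below is only an illustration: the helper name `can_encode` is hypothetical, and the codec names `iso8859_1`, `iso8859_15` and `iso8859_16` are Python's names for the corresponding encodings, not identifiers from the standard.

```python
def can_encode(ch: str, encoding: str) -> bool:
    """Return True if the single character `ch` has a code in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Characters discussed above: guillemets, curly quotes, French ligatures,
# the euro sign, and the Romanian letters with comma below.
for ch in "«»“”œŒŸ€ȘșȚț":
    parts = [n for n in (1, 15, 16) if can_encode(ch, f"iso8859_{n}")]
    print(f"U+{ord(ch):04X} {ch}: available in parts {parts}")
```

For example, `can_encode("œ", "iso8859_1")` is false, while the same call with `iso8859_15` succeeds.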
Most of the ISO 8859 encodings provide diacritic marks required for various European languages. Others provide non-Roman alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions either; the Japanese syllabic Kana scripts, on the other hand, might fit, but, like several other alphabets of the world, they are not encoded.
ISO 8859 is divided into the following parts:
¹: only the IJ/ij (Dutch Y) is missing, which can be represented as IJ
²: missing characters are in ISO 8859-15
Each part of ISO 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, some characters and language combinations cannot be accommodated without transcription. Efforts were made to make conversions as smooth as possible. For example, German has all seven of its special characters at the same positions in all Latin variants (1-4, 9-10, 13-16), and at many positions the characters in different sets differ only in their diacritics. In particular, variants 1-4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
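The claim about the German characters can be verified mechanically. The following sketch assumes Python's built-in `iso8859_N` codecs as stand-ins for the Latin parts and takes "ÄÖÜäöüß" as the seven special characters.

```python
# Latin variants of ISO 8859, as listed in the text above.
LATIN_PARTS = (1, 2, 3, 4, 9, 10, 13, 14, 15, 16)
GERMAN = "ÄÖÜäöüß"

# Encode the same seven characters with each part's codec.
codes = {n: GERMAN.encode(f"iso8859_{n}") for n in LATIN_PARTS}

for n, raw in codes.items():
    print(f"ISO 8859-{n}: {raw.hex(' ')}")

# All ten parts produce the identical byte sequence.
same_everywhere = len(set(codes.values())) == 1
print(same_everywhere)
```

Every part yields the bytes `c4 d6 dc e4 f6 fc df`, so German text moves between the Latin variants without conversion.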
| Binary | Oct | Dec | Hex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10100000 | 240 | 160 | A0 | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP | NBSP |
| 10100001 | 241 | 161 | A1 | ¡ | Ą | Ħ | Ą | Ё |  | ʽ |  | ¡ | Ą | ก | ” | Ḃ | ¡ | Ą |
| 10100010 | 242 | 162 | A2 | ¢ | ˘ | ˘ | ĸ | Ђ |  | ʼ | ¢ | ¢ | Ē | ข | ¢ | ḃ | ¢ | ą |
| 10100011 | 243 | 163 | A3 | £ | Ł | £ | Ŗ | Ѓ |  | £ | £ | £ | Ģ | ฃ | £ | £ | £ | Ł |
| 10100100 | 244 | 164 | A4 | ¤ | ¤ | ¤ | ¤ | Є | ¤ | € | ¤ | ¤ | Ī | ค | ¤ | Ċ | € | € |
| 10100101 | 245 | 165 | A5 | ¥ | Ľ |  | Ĩ | Ѕ |  | ₯ | ¥ | ¥ | Ĩ | ฅ | „ | ċ | ¥ | „ |
| 10100110 | 246 | 166 | A6 | ¦ | Ś | Ĥ | Ļ | І |  | ¦ | ¦ | ¦ | Ķ | ฆ | ¦ | Ḋ | Š | Š |
| 10100111 | 247 | 167 | A7 | § | § | § | § | Ї |  | § | § | § | § | ง | § | § | § | § |
| 10101000 | 250 | 168 | A8 | ¨ | ¨ | ¨ | ¨ | Ј |  | ¨ | ¨ | ¨ | Ļ | จ | Ø | Ẁ | š | š |
| 10101001 | 251 | 169 | A9 | © | Š | İ | Š | Љ |  | © | © | © | Đ | ฉ | © | © | © | © |
| 10101010 | 252 | 170 | AA | ª | Ş | Ş | Ē | Њ |  |  | × | ª | Š | ช | Ŗ | Ẃ | ª | Ș |
| 10101011 | 253 | 171 | AB | « | Ť | Ğ | Ģ | Ћ |  | « | « | « | Ŧ | ซ | « | ḋ | « | « |
| 10101100 | 254 | 172 | AC | ¬ | Ź | Ĵ | Ŧ | Ќ | ، | ¬ | ¬ | ¬ | Ž | ฌ | ¬ | Ỳ | ¬ | Ź |
| 10101101 | 255 | 173 | AD | SHY | SHY | SHY | SHY | SHY | SHY | SHY | SHY | SHY | SHY | ญ | SHY | SHY | SHY | SHY |
| 10101110 | 256 | 174 | AE | ® | Ž |  | Ž | Ў |  |  | ® | ® | Ū | ฎ | ® | ® | ® | ź |
| 10101111 | 257 | 175 | AF | ¯ | Ż | Ż | ¯ | Џ |  | ― | ‾ | ¯ | Ŋ | ฏ | Æ | Ÿ | ¯ | Ż |
| 10110000 | 260 | 176 | B0 | ° | ° | ° | ° | А |  | ° | ° | ° | ° | ฐ | ° | Ḟ | ° | ° |
| 10110001 | 261 | 177 | B1 | ± | ą | ħ | ą | Б |  | ± | ± | ± | ą | ฑ | ± | ḟ | ± | ± |
| 10110010 | 262 | 178 | B2 | ² | ˛ | ² | ˛ | В |  | ² | ² | ² | ē | ฒ | ² | Ġ | ² | Č |
| 10110011 | 263 | 179 | B3 | ³ | ł | ³ | ŗ | Г |  | ³ | ³ | ³ | ģ | ณ | ³ | ġ | ³ | ł |
| 10110100 | 264 | 180 | B4 | ´ | ´ | ´ | ´ | Д |  | ΄ | ´ | ´ | ī | ด | “ | Ṁ | Ž | Ž |
| 10110101 | 265 | 181 | B5 | µ | ľ | µ | ĩ | Е |  | ΅ | µ | µ | ĩ | ต | µ | ṁ | µ | ” |
| 10110110 | 266 | 182 | B6 | ¶ | ś | ĥ | ļ | Ж |  | Ά | ¶ | ¶ | ķ | ถ | ¶ | ¶ | ¶ | ¶ |
| 10110111 | 267 | 183 | B7 | · | ˇ | · | ˇ | З |  | · | · | · | · | ท | · | Ṗ | · | · |
| 10111000 | 270 | 184 | B8 | ¸ | ¸ | ¸ | ¸ | И |  | Έ | ¸ | ¸ | ļ | ธ | ø | ẁ | ž | ž |
| 10111001 | 271 | 185 | B9 | ¹ | š | ı | š | Й |  | Ή | ¹ | ¹ | đ | น | ¹ | ṗ | ¹ | č |
| 10111010 | 272 | 186 | BA | º | ş | ş | ē | К |  | Ί | ÷ | º | š | บ | ŗ | ẃ | º | ș |
| 10111011 | 273 | 187 | BB | » | ť | ğ | ģ | Л | ؛ | » | » | » | ŧ | ป | » | Ṡ | » | » |
| 10111100 | 274 | 188 | BC | ¼ | ź | ĵ | ŧ | М |  | Ό | ¼ | ¼ | ž | ผ | ¼ | ỳ | Œ | Œ |
| 10111101 | 275 | 189 | BD | ½ | ˝ | ½ | Ŋ | Н |  | ½ | ½ | ½ | ― | ฝ | ½ | Ẅ | œ | œ |
| 10111110 | 276 | 190 | BE | ¾ | ž |  | ž | О |  | Ύ | ¾ | ¾ | ū | พ | ¾ | ẅ | Ÿ | Ÿ |
| 10111111 | 277 | 191 | BF | ¿ | ż | ż | ŋ | П | ؟ | Ώ |  | ¿ | ŋ | ฟ | æ | ṡ | ¿ | ż |
| 11000000 | 300 | 192 | C0 | À | Ŕ | À | Ā | Р |  | ΐ |  | À | Ā | ภ | Ą | À | À | À |
| 11000001 | 301 | 193 | C1 | Á | Á | Á | Á | С | ء | Α |  | Á | Á | ม | Į | Á | Á | Á |
| 11000010 | 302 | 194 | C2 | Â | Â | Â | Â | Т | آ | Β |  | Â | Â | ย | Ā | Â | Â | Â |
| 11000011 | 303 | 195 | C3 | Ã | Ă |  | Ã | У | أ | Γ |  | Ã | Ã | ร | Ć | Ã | Ã | Ă |
| 11000100 | 304 | 196 | C4 | Ä | Ä | Ä | Ä | Ф | ؤ | Δ |  | Ä | Ä | ฤ | Ä | Ä | Ä | Ä |
| 11000101 | 305 | 197 | C5 | Å | Ĺ | Ċ | Å | Х | إ | Ε |  | Å | Å | ล | Å | Å | Å | Ć |
| 11000110 | 306 | 198 | C6 | Æ | Ć | Ĉ | Æ | Ц | ئ | Ζ |  | Æ | Æ | ฦ | Ę | Æ | Æ | Æ |
| 11000111 | 307 | 199 | C7 | Ç | Ç | Ç | Į | Ч | ا | Η |  | Ç | Į | ว | Ē | Ç | Ç | Ç |
| 11001000 | 310 | 200 | C8 | È | Č | È | Č | Ш | ب | Θ |  | È | Č | ศ | Č | È | È | È |
| 11001001 | 311 | 201 | C9 | É | É | É | É | Щ | ة | Ι |  | É | É | ษ | É | É | É | É |
| 11001010 | 312 | 202 | CA | Ê | Ę | Ê | Ę | Ъ | ت | Κ |  | Ê | Ę | ส | Ź | Ê | Ê | Ê |
| 11001011 | 313 | 203 | CB | Ë | Ë | Ë | Ë | Ы | ث | Λ |  | Ë | Ë | ห | Ė | Ë | Ë | Ë |
| 11001100 | 314 | 204 | CC | Ì | Ě | Ì | Ė | Ь | ج | Μ |  | Ì | Ė | ฬ | Ģ | Ì | Ì | Ì |
| 11001101 | 315 | 205 | CD | Í | Í | Í | Í | Э | ح | Ν |  | Í | Í | อ | Ķ | Í | Í | Í |
| 11001110 | 316 | 206 | CE | Î | Î | Î | Î | Ю | خ | Ξ |  | Î | Î | ฮ | Ī | Î | Î | Î |
| 11001111 | 317 | 207 | CF | Ï | Ď | Ï | Ī | Я | د | Ο |  | Ï | Ï | ฯ | Ļ | Ï | Ï | Ï |
| 11010000 | 320 | 208 | D0 | Ð | Đ |  | Đ | а | ذ | Π |  | Ğ | Ð | ะ | Š | Ŵ | Ð | Đ |
| 11010001 | 321 | 209 | D1 | Ñ | Ń | Ñ | Ņ | б | ر | Ρ |  | Ñ | Ņ | ั | Ń | Ñ | Ñ | Ń |
| 11010010 | 322 | 210 | D2 | Ò | Ň | Ò | Ō | в | ز |  |  | Ò | Ō | า | Ņ | Ò | Ò | Ò |
| 11010011 | 323 | 211 | D3 | Ó | Ó | Ó | Ķ | г | س | Σ |  | Ó | Ó | ำ | Ó | Ó | Ó | Ó |
| 11010100 | 324 | 212 | D4 | Ô | Ô | Ô | Ô | д | ش | Τ |  | Ô | Ô | ิ | Ō | Ô | Ô | Ô |
| 11010101 | 325 | 213 | D5 | Õ | Ő | Ġ | Õ | е | ص | Υ |  | Õ | Õ | ี | Õ | Õ | Õ | Ő |
| 11010110 | 326 | 214 | D6 | Ö | Ö | Ö | Ö | ж | ض | Φ |  | Ö | Ö | ึ | Ö | Ö | Ö | Ö |
| 11010111 | 327 | 215 | D7 | × | × | × | × | з | ط | Χ |  | × | Ũ | ื | × | Ṫ | × | Ś |
| 11011000 | 330 | 216 | D8 | Ø | Ř | Ĝ | Ø | и | ظ | Ψ |  | Ø | Ø | ุ | Ų | Ø | Ø | Ű |
| 11011001 | 331 | 217 | D9 | Ù | Ů | Ù | Ų | й | ع | Ω |  | Ù | Ų | ู | Ł | Ù | Ù | Ù |
| 11011010 | 332 | 218 | DA | Ú | Ú | Ú | Ú | к | غ | Ϊ |  | Ú | Ú | ฺ | Ś | Ú | Ú | Ú |
| 11011011 | 333 | 219 | DB | Û | Ű | Û | Û | л |  | Ϋ |  | Û | Û |  | Ū | Û | Û | Û |
| 11011100 | 334 | 220 | DC | Ü | Ü | Ü | Ü | м |  | ά |  | Ü | Ü |  | Ü | Ü | Ü | Ü |
| 11011101 | 335 | 221 | DD | Ý | Ý | Ŭ | Ũ | н |  | έ |  | İ | Ý |  | Ż | Ý | Ý | Ę |
| 11011110 | 336 | 222 | DE | Þ | Ţ | Ŝ | Ū | о |  | ή |  | Ş | Þ |  | Ž | Ŷ | Þ | Ț |
| 11011111 | 337 | 223 | DF | ß | ß | ß | ß | п |  | ί | ‗ | ß | ß | ฿ | ß | ß | ß | ß |
| 11100000 | 340 | 224 | E0 | à | ŕ | à | ā | р | ـ | ΰ | א | à | ā | เ | ą | à | à | à |
| 11100001 | 341 | 225 | E1 | á | á | á | á | с | ف | α | ב | á | á | แ | į | á | á | á |
| 11100010 | 342 | 226 | E2 | â | â | â | â | т | ق | β | ג | â | â | โ | ā | â | â | â |
| 11100011 | 343 | 227 | E3 | ã | ă |  | ã | у | ك | γ | ד | ã | ã | ใ | ć | ã | ã | ă |
| 11100100 | 344 | 228 | E4 | ä | ä | ä | ä | ф | ل | δ | ה | ä | ä | ไ | ä | ä | ä | ä |
| 11100101 | 345 | 229 | E5 | å | ĺ | ċ | å | х | م | ε | ו | å | å | ๅ | å | å | å | ć |
| 11100110 | 346 | 230 | E6 | æ | ć | ĉ | æ | ц | ن | ζ | ז | æ | æ | ๆ | ę | æ | æ | æ |
| 11100111 | 347 | 231 | E7 | ç | ç | ç | į | ч | ه | η | ח | ç | į | ็ | ē | ç | ç | ç |
| 11101000 | 350 | 232 | E8 | è | č | è | č | ш | و | θ | ט | è | č | ่ | č | è | è | è |
| 11101001 | 351 | 233 | E9 | é | é | é | é | щ | ى | ι | י | é | é | ้ | é | é | é | é |
| 11101010 | 352 | 234 | EA | ê | ę | ê | ę | ъ | ي | κ | ך | ê | ę | ๊ | ź | ê | ê | ê |
| 11101011 | 353 | 235 | EB | ë | ë | ë | ë | ы | ً | λ | כ | ë | ë | ๋ | ė | ë | ë | ë |
| 11101100 | 354 | 236 | EC | ì | ě | ì | ė | ь | ٌ | μ | ל | ì | ė | ์ | ģ | ì | ì | ì |
| 11101101 | 355 | 237 | ED | í | í | í | í | э | ٍ | ν | ם | í | í | ํ | ķ | í | í | í |
| 11101110 | 356 | 238 | EE | î | î | î | î | ю | َ | ξ | מ | î | î | ๎ | ī | î | î | î |
| 11101111 | 357 | 239 | EF | ï | ď | ï | ī | я | ُ | ο | ן | ï | ï | ๏ | ļ | ï | ï | ï |
| 11110000 | 360 | 240 | F0 | ð | đ |  | đ | № | ِ | π | נ | ğ | ð | ๐ | š | ŵ | ð | đ |
| 11110001 | 361 | 241 | F1 | ñ | ń | ñ | ņ | ё | ّ | ρ | ס | ñ | ņ | ๑ | ń | ñ | ñ | ń |
| 11110010 | 362 | 242 | F2 | ò | ň | ò | ō | ђ | ْ | ς | ע | ò | ō | ๒ | ņ | ò | ò | ò |
| 11110011 | 363 | 243 | F3 | ó | ó | ó | ķ | ѓ |  | σ | ף | ó | ó | ๓ | ó | ó | ó | ó |
| 11110100 | 364 | 244 | F4 | ô | ô | ô | ô | є |  | τ | פ | ô | ô | ๔ | ō | ô | ô | ô |
| 11110101 | 365 | 245 | F5 | õ | ő | ġ | õ | ѕ |  | υ | ץ | õ | õ | ๕ | õ | õ | õ | ő |
| 11110110 | 366 | 246 | F6 | ö | ö | ö | ö | і |  | φ | צ | ö | ö | ๖ | ö | ö | ö | ö |
| 11110111 | 367 | 247 | F7 | ÷ | ÷ | ÷ | ÷ | ї |  | χ | ק | ÷ | ũ | ๗ | ÷ | ṫ | ÷ | ś |
| 11111000 | 370 | 248 | F8 | ø | ř | ĝ | ø | ј |  | ψ | ר | ø | ø | ๘ | ų | ø | ø | ű |
| 11111001 | 371 | 249 | F9 | ù | ů | ù | ų | љ |  | ω | ש | ù | ų | ๙ | ł | ù | ù | ù |
| 11111010 | 372 | 250 | FA | ú | ú | ú | ú | њ |  | ϊ | ת | ú | ú | ๚ | ś | ú | ú | ú |
| 11111011 | 373 | 251 | FB | û | ű | û | û | ћ |  | ϋ |  | û | û | ๛ | ū | û | û | û |
| 11111100 | 374 | 252 | FC | ü | ü | ü | ü | ќ |  | ό |  | ü | ü |  | ü | ü | ü | ü |
| 11111101 | 375 | 253 | FD | ý | ý | ŭ | ũ | § |  | ύ |  | ı | ý |  | ż | ý | ý | ę |
| 11111110 | 376 | 254 | FE | þ | ţ | ŝ | ū | ў |  | ώ |  | ş | þ |  | ž | ŷ | þ | ț |
| 11111111 | 377 | 255 | FF | ÿ | ˙ | ˙ | ˙ | џ |  |  |  | ÿ | ĸ |  | ’ | ÿ | ÿ | ÿ |
Position 0xA0 is always the non-breaking space, and 0xAD is the soft hyphen (which is displayed only at line breaks) in every part except ISO 8859-11. Other empty cells are either unassigned or hold characters that the system used cannot display.
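The two fixed positions can be inspected with a few lines of Python. This is a sketch that relies on Python's built-in `iso8859_N` codecs, which may reflect later editions of some parts than the table does; the part list simply mirrors the table's columns.

```python
import unicodedata

# Parts shown in the table above (there is no column for part 12).
PARTS = [n for n in range(1, 17) if n != 12]

exceptions = []  # parts where 0xAD is something other than SOFT HYPHEN
for n in PARTS:
    codec = f"iso8859_{n}"
    a0 = b"\xa0".decode(codec)
    ad = b"\xad".decode(codec)
    # 0xA0 is expected to be the no-break space in every part.
    assert unicodedata.name(a0) == "NO-BREAK SPACE"
    if unicodedata.name(ad) != "SOFT HYPHEN":
        exceptions.append(n)
    print(f"ISO 8859-{n}: 0xAD = U+{ord(ad):04X} {unicodedata.name(ad)}")

print(exceptions)
```

Under these assumptions the only exception is part 11, where 0xAD holds a Thai letter.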
Since 1991, the Unicode Consortium has been working with ISO to develop the Unicode Standard and ISO/IEC 10646: the Universal Character Set (UCS) in tandem. This pair of standards was created to unify the ISO 8859 character repertoire, among others, by assigning each character, initially, to a 16-bit code value, with some code values left unassigned. Over time, their models adapted to map characters to abstract numeric code points rather than fixed bit-width values, so that more code points and encoding methods could be supported.
Unicode and ISO/IEC 10646 currently assign more than 100,000 characters to a code space consisting of over a million code points, and they define several standard encodings that are capable of representing every available code point. The standard encodings of Unicode and the UCS use sequences of one to four 8-bit code values (UTF-8), sequences of one or two 16-bit code values (UTF-16), or one 32-bit code value (UTF-32 or UCS-4). There is also an older encoding that uses one 16-bit code value (UCS-2), capable of representing one-seventeenth of the available code points. Of these encoding forms, only UTF-8's byte sequences are in a fixed order; the others are subject to platform-dependent byte ordering issues that may be addressed via special codes or indicated via out-of-band means.
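The code-unit counts and the byte-order issue described above can be made concrete with Python's standard codecs. The four sample characters are arbitrary choices for illustration, one for each UTF-8 length.

```python
# Sample characters: U+0041, U+00DF, U+20AC, U+1D11E.
samples = ["A", "\u00df", "\u20ac", "\U0001d11e"]

units = {}
for ch in samples:
    n8 = len(ch.encode("utf-8"))            # 8-bit code values
    n16 = len(ch.encode("utf-16-be")) // 2  # 16-bit code values
    n32 = len(ch.encode("utf-32-be")) // 4  # 32-bit code values
    units[ch] = (n8, n16, n32)
    print(f"U+{ord(ch):04X}: UTF-8 x{n8}, UTF-16 x{n16}, UTF-32 x{n32}")

# Byte order matters for UTF-16: the same code value, two byte layouts.
euro_be = "\u20ac".encode("utf-16-be")
euro_le = "\u20ac".encode("utf-16-le")
print(euro_be.hex(), euro_le.hex())  # 20ac ac20

# UCS-2 reaches 0x10000 of the 0x110000 code points: one-seventeenth.
ratio = 0x110000 // 0x10000
print(ratio)  # 17
```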
Newer editions of ISO 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation, effectively causing each part of ISO 8859 to be a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO-8859-1.
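This relationship is easy to observe in code. The sketch below assumes Python's built-in codecs; the three sample bytes from other parts are arbitrary picks from the table above.

```python
# Decoding ISO-8859-1 is the identity map: byte value == code point.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("iso8859_1")
identical = latin1_text == "".join(chr(i) for i in range(256))
print(identical)

# Other parts map the same byte values to code points beyond U+00FF.
mapped = {}
for part, byte in [(2, 0xA1), (7, 0xE1), (15, 0xA4)]:
    ch = bytes([byte]).decode(f"iso8859_{part}")
    mapped[(part, byte)] = ch
    print(f"ISO 8859-{part} byte 0x{byte:02X} -> U+{ord(ch):04X} {ch}")
```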
ISO 8859 was favored throughout the 1990s, having the advantages of being well-established and more easily implemented in software: the equation of one byte to one character is simple and adequate for most single-language applications, and there are no combining characters or variant forms.
As the relative cost, in computing resources, of using more than one byte per character began to diminish, programming languages and operating systems added native support for Unicode alongside their system of code pages. As Unicode-enabled operating systems became more widespread, ISO 8859 and other legacy encodings became less popular. While remnants of ISO 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware, and end-user application software, most modern computing applications use Unicode internally, and rely on conversion tables to map to and from the simpler encodings, when necessary.
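In practice, such a conversion is a table lookup in each direction. A minimal sketch follows, using Python's built-in codecs as the conversion tables; the helper name `transcode` and the sample word are illustrative assumptions, not part of any standard API.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert bytes from one encoding to another via Unicode text."""
    return data.decode(source).encode(target)


# A Polish place name stored as ISO 8859-2: one byte per character.
legacy = "Łódź".encode("iso8859_2")
print(legacy.hex(" "))  # a3 f3 64 bc

# Converted to UTF-8, the non-ASCII letters take two bytes each.
utf8 = transcode(legacy, "iso8859_2")
print(utf8.hex(" "))

# Decoding with the wrong table does not fail; it yields wrong text.
wrong = legacy.decode("iso8859_1")
print(wrong)
```

The last step illustrates why single-byte data needs out-of-band encoding information: every byte sequence is valid in ISO-8859-1, so a mislabelled file decodes silently into the wrong characters.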
The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining Working Group, WG 2, is concentrating on development of ISO/IEC 10646.