27.14 Charsets

Emacs groups all supported characters into disjoint charsets. Each character code belongs to one and only one charset. For historical reasons, Emacs typically divides an 8-bit character code for an extended version of ASCII into two charsets: ASCII, which covers the codes 0 through 127, plus another charset which covers the “right-hand part” (the codes 128 through 255). For instance, the characters of Latin-1 are made up of the Emacs charset ascii plus the Emacs charset latin-iso8859-1.

Emacs characters belonging to different charsets may look the same, but they are still different characters. For example, the letter ‘o’ with acute accent in charset latin-iso8859-1, used for Latin-1, is different from the letter ‘o’ with acute accent in charset latin-iso8859-2, used for Latin-2.
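
The two points above (the split of an 8-bit code into ascii plus a right-hand charset, and the distinctness of look-alike characters) can be illustrated with a small Python sketch. This is only a toy model and not how Emacs is implemented: the names EmacsChar and classify are hypothetical, and the model stores the original 8-bit code, whereas the internal representation Emacs actually uses is different.

```python
from typing import NamedTuple


class EmacsChar(NamedTuple):
    """Toy model of a character: a charset name plus a code (hypothetical)."""
    charset: str
    code: int


def classify(byte: int, right_hand_charset: str) -> EmacsChar:
    """Assign an 8-bit code to 'ascii' or to the given right-hand charset."""
    if not 0 <= byte <= 255:
        raise ValueError("expected an 8-bit code")
    if byte < 128:
        # Codes 0 through 127 always belong to the ascii charset.
        return EmacsChar("ascii", byte)
    # Codes 128 and up belong to the right-hand charset of the encoding.
    # Assumption: the model keeps the 8-bit code unchanged for simplicity.
    return EmacsChar(right_hand_charset, byte)


# The letter o with acute accent has the 8-bit code 0xF3 in both
# Latin-1 and Latin-2.
O_ACUTE = 0xF3
latin1_o = classify(O_ACUTE, "latin-iso8859-1")
latin2_o = classify(O_ACUTE, "latin-iso8859-2")

# Same code, different charsets: the two characters compare as different.
print(latin1_o == latin2_o)

# An ASCII letter lands in the same charset whichever encoding it came from.
print(classify(0x41, "latin-iso8859-1") == classify(0x41, "latin-iso8859-2"))
```

In this model a character's identity is the pair of charset and code, which is why the two accented letters differ even though they would be displayed alike, while the letter ‘A’ is the same character in both cases.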

There are two commands for obtaining information about Emacs charsets. The command M-x list-charset-chars prompts for the name of a charset, and displays all the characters in that charset. The command M-x describe-character-set prompts for a charset name and displays information about that charset, including its internal representation within Emacs.

To find out which charset a character in the buffer belongs to, put point before it and type C-u C-x =.