F.2 International Character Set Support on Mac

Mac OS uses non-standard encodings for the upper 128 single-byte characters. These encodings also deviate from the ISO 2022 standard by using character codes in the range 128-159. The coding systems mac-roman, mac-centraleurroman, and mac-cyrillic are used to represent these Mac encodings.
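As a rough illustration of how these coding systems are used, the following sketch decodes a single Mac Roman byte with decode-coding-string. The byte value 142 (octal 216) is chosen here only as an example and does not come from this manual; it falls in the 128-159 range mentioned above.

```elisp
;; Decode one raw byte using the mac-roman coding system.
;; "\216" is a unibyte string holding the single byte 142 (hex 8E),
;; which Mac Roman assigns to LATIN SMALL LETTER E WITH ACUTE.
(decode-coding-string "\216" 'mac-roman)

;; The reverse direction: encode the decoded text back into
;; Mac Roman bytes.
(encode-coding-string (decode-coding-string "\216" 'mac-roman)
                      'mac-roman)
```

You can evaluate each form with M-: or in the *scratch* buffer to inspect the result.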

The fontset fontset-mac is created automatically when Emacs is run on Mac, and is used by default. It displays as many kinds of characters as possible, using 12-point Monaco as a base font. If a character appears as a hollow box with this fontset, customizing font settings alone (see Mac Font Specs) will almost certainly not make it displayable.

You can use input methods provided either by LEIM (see Input Methods) or by Mac OS to enter international characters. For the LEIM input methods, see the International Character Set Support section of the manual (see International).
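For the LEIM route, a minimal init-file sketch might look like the following. The input method name "latin-1-prefix" is only an illustrative choice and assumes that LEIM is installed; substitute whichever input method suits your language.

```elisp
;; Assumption: LEIM is installed and provides "latin-1-prefix".
;; Make it the input method that C-\ switches on.
(setq default-input-method "latin-1-prefix")

;; Alternatively, activate an input method in the current buffer:
;; (set-input-method "latin-1-prefix")
```

After this, typing C-\ (toggle-input-method) turns the input method on and off in the current buffer.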

Emacs on Mac OS automatically changes the value of keyboard-coding-system according to the current keyboard layout. You therefore do not need to set it manually; if you do, your value is overwritten the next time a keyboard layout change is detected.

The Mac clipboard and the Emacs kill ring (see Killing) are synchronized by default: you can kill or copy a piece of text in Emacs and paste it into another Mac application, or cut or copy text in another Mac application and yank it into an Emacs buffer. This feature can be disabled by setting x-select-enable-clipboard to nil. Even then, you can still copy and paste between Emacs and other applications using the Edit menu.
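A minimal sketch of disabling the synchronization, assuming you keep your customizations in an init file such as ~/.emacs:

```elisp
;; Stop kill and yank commands from using the Mac clipboard.
;; The Edit menu items can still exchange text with other
;; applications.
(setq x-select-enable-clipboard nil)
```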

On Mac, the role of the coding system for selection that is set by set-selection-coding-system (see Specify Coding) is two-fold. First, it is used as a preferred coding system for the traditional text flavor, which does not specify any particular encoding and is mainly used by applications on Mac OS Classic. Second, it specifies the intermediate encoding for the UTF-16 text flavor, which is mainly used by applications on Mac OS X.

When pasting UTF-16 text data from the clipboard, it is first converted to the encoding specified by the selection coding system using the converter in the Mac OS system, and then decoded into the Emacs internal encoding using the converter in Emacs. If the first conversion fails, the UTF-16 data is converted similarly but via UTF-8. Copying UTF-16 text to the clipboard goes through the inverse path.

The reason for this two-pass decoding is to avoid subtle differences in Unicode mappings between the Mac OS system and Emacs (such as various kinds of hyphens), to deal with UTF-16 data in native byte order with no byte order mark, and to minimize users' customization. For example, users who mainly use Latin characters would prefer Greek characters to be decoded into the mule-unicode-0100-24ff charset, but Japanese users would prefer them to be decoded into the japanese-jisx0208 charset. Since the coding system for selection is automatically set according to the system locale setting, users usually don't have to set it manually.
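Should you nevertheless want to override the automatic choice, a sketch along these lines may serve as a starting point. The use of mac-roman here is purely illustrative; pick the coding system that matches the text you exchange most often.

```elisp
;; Illustrative choice only: use mac-roman both as the preferred
;; coding system for the traditional text flavor and as the
;; intermediate encoding for UTF-16 clipboard data.
(set-selection-coding-system 'mac-roman)

;; Evaluate this variable to inspect the value currently in effect.
selection-coding-system
```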

The default language environment (see Language Environments) is set according to the locale setting at startup. On Mac OS, the locale setting is consulted in the following order:

  1. Environment variables LC_ALL, LC_CTYPE and LANG as in other systems.
  2. Preference AppleLocale that is set by default on Mac OS X 10.3 and later.
  3. Preference AppleLanguages that is set by default on Mac OS X 10.1 and later.
  4. Variable mac-system-locale that is derived from the system language and region codes. This variable is available on all supported Mac OS versions including Mac OS Classic.
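The lookup order above can be partly sketched in Emacs Lisp. The helper name my-first-locale-setting is hypothetical, and the sketch covers only steps 1 and 4: it does not read the AppleLocale or AppleLanguages preferences, and it is not the code Emacs itself runs at startup.

```elisp
(defun my-first-locale-setting ()
  "Return the first non-empty locale setting found, or nil.
Hypothetical helper covering only steps 1 and 4 of the list above;
the AppleLocale and AppleLanguages preferences are not consulted."
  (let ((vars '("LC_ALL" "LC_CTYPE" "LANG"))
        (locale nil))
    ;; Step 1: environment variables, in priority order.
    ;; An unset or empty variable is skipped.
    (while (and vars (= 0 (length locale)))
      (setq locale (getenv (pop vars))))
    ;; Step 4: fall back to mac-system-locale where it is defined.
    (when (and (= 0 (length locale))
               (boundp 'mac-system-locale))
      (setq locale mac-system-locale))
    (if (= 0 (length locale)) nil locale)))

;; To override the automatic choice, select a language environment
;; explicitly, for example:
;; (set-language-environment "Japanese")
```

Evaluating (my-first-locale-setting) shows which of these two sources would supply a locale on your system; the variable current-language-environment holds the environment Emacs actually selected.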

The default values of almost all variables related to coding systems are also set according to the language environment, so you usually don't have to customize these variables manually.