Specify Coding - GNU Emacs Manual

27.9 Specifying a Coding System

In cases where Emacs does not automatically choose the right coding system, you can use these commands to specify one:

C-x <RET> f CODING <RET>: Use coding system CODING for saving or revisiting the visited file in the current buffer.
C-x <RET> c CODING <RET>: Specify coding system CODING for the immediately following command.
C-x <RET> r CODING <RET>: Revisit the current file using the coding system CODING.
C-x <RET> k CODING <RET>: Use coding system CODING for keyboard input.
C-x <RET> t CODING <RET>: Use coding system CODING for terminal output.
C-x <RET> p INPUT-CODING <RET> OUTPUT-CODING <RET>: Use coding systems INPUT-CODING and OUTPUT-CODING for subprocess input and output in the current buffer.
C-x <RET> x CODING <RET>: Use coding system CODING for transferring selections to and from other programs through the window system.
C-x <RET> F CODING <RET>: Use coding system CODING for encoding and decoding file names. This affects the use of non-ASCII characters in file names. It has no effect on reading and writing the contents of files.
C-x <RET> X CODING <RET>: Use coding system CODING for transferring one selection (the next one) to or from the window system.
M-x recode-region: Convert the region from a previous coding system to a new one.

The command C-x <RET> f (set-buffer-file-coding-system) sets the file coding system for the current buffer—in other words, it says which coding system to use when saving or reverting the visited file. You specify which coding system using the minibuffer. If you specify a coding system that cannot handle all of the characters in the buffer, Emacs warns you about the troublesome characters when you actually save the buffer.
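The same function can be called from Lisp. The following is a minimal sketch, assuming you want a command that switches the current buffer to UTF-8 with Unix line endings and then saves it; the command name my-save-as-utf-8-unix is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper; only `set-buffer-file-coding-system' and
;; `save-buffer' are standard Emacs functions.
(defun my-save-as-utf-8-unix ()
  "Set the file coding system of the current buffer to `utf-8-unix' and save."
  (interactive)
  ;; Equivalent to typing C-x <RET> f utf-8-unix <RET>.
  (set-buffer-file-coding-system 'utf-8-unix)
  ;; Any warning about characters the coding system cannot handle
  ;; appears here, when the buffer is actually saved.
  (save-buffer))
```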

Another way to specify the coding system for a file is when you visit the file. First use the command C-x <RET> c (universal-coding-system-argument); this command uses the minibuffer to read a coding system name. After you exit the minibuffer, the specified coding system is used for the immediately following command.

So if the immediately following command is C-x C-f, for example, it reads the file using that coding system (and records the coding system for when you later save the file). Or if the immediately following command is C-x C-w, it writes the file using that coding system. When you specify the coding system for saving in this way, instead of with C-x <RET> f, there is no warning if the buffer contains characters that the coding system cannot handle.

Other file commands affected by a specified coding system include C-x C-i and C-x C-v, as well as the other-window variants of C-x C-f. C-x <RET> c also affects commands that start subprocesses, including M-x shell (see Shell).

If the immediately following command does not use the coding system, then C-x <RET> c ultimately has no effect.
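In Lisp code, a comparable effect can be obtained by binding the variable coding-system-for-read (or coding-system-for-write) around a single file operation. The sketch below assumes you often visit Latin-1 files; the command name my-find-file-as-latin-1 is hypothetical.

```elisp
;; Hypothetical command; `coding-system-for-read' and `find-file'
;; are standard.
(defun my-find-file-as-latin-1 (file)
  "Visit FILE, decoding its contents as ISO Latin-1."
  (interactive "fFind file (as Latin-1): ")
  ;; The binding lasts only for the duration of the `let', which
  ;; mirrors the "one following command" scope of C-x <RET> c.
  (let ((coding-system-for-read 'iso-latin-1))
    (find-file file)))
```

Because the binding disappears when the let form exits, later file commands go back to automatic detection.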

An easy way to visit a file with no conversion is with the M-x find-file-literally command. See Visiting.

The variable default-buffer-file-coding-system specifies the choice of coding system to use when you create a new file. It applies when you find a new file, and when you create a buffer and then save it in a file. Selecting a language environment typically sets this variable to a good choice of default coding system for that language environment.
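As an illustration, an init file could set this default explicitly. This is a sketch that assumes an Emacs version in which default-buffer-file-coding-system is still available; later Emacs releases mark the variable obsolete in favor of the default value of buffer-file-coding-system, shown here as a commented alternative.

```elisp
;; New files are saved as UTF-8 with Unix line endings unless
;; something else (such as the language environment) overrides it.
(setq default-buffer-file-coding-system 'utf-8-unix)

;; Alternative for Emacs versions where the variable above is obsolete:
;; (setq-default buffer-file-coding-system 'utf-8-unix)
```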

If you visit a file with a wrong coding system, you can correct this with C-x <RET> r (revert-buffer-with-coding-system). This visits the current file again, using a coding system you specify.
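If you make the same correction often, it can be wrapped in a command. The following sketch assumes the correct coding system is UTF-8; the name my-revert-as-utf-8 is hypothetical.

```elisp
;; Hypothetical command built on the standard
;; `revert-buffer-with-coding-system'.
(defun my-revert-as-utf-8 ()
  "Re-read the visited file, decoding it as UTF-8."
  (interactive)
  ;; Same effect as C-x <RET> r utf-8 <RET>; Emacs may still ask for
  ;; confirmation before discarding the current buffer contents.
  (revert-buffer-with-coding-system 'utf-8))
```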

The command C-x <RET> t (set-terminal-coding-system) specifies the coding system for terminal output. If you specify a coding system for terminal output, all characters output to the terminal are encoded using that coding system.

This feature is useful for certain character-only terminals built to support specific languages or character sets—for example, European terminals that support one of the ISO Latin character sets. You need to specify the terminal coding system when using multibyte text, so that Emacs knows which characters the terminal can actually handle.

By default, output to the terminal is not translated at all, unless Emacs can deduce the proper coding system from your terminal type or your locale specification (see Language Environments).
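When the automatic choice is wrong, the terminal coding system can be fixed in the init file. This is a minimal sketch, assuming a terminal that displays ISO Latin-1; substitute the coding system your terminal actually supports.

```elisp
;; Encode everything sent to a text terminal as ISO Latin-1.
;; The choice of `iso-latin-1' is an assumption for illustration.
(set-terminal-coding-system 'iso-latin-1)
```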

The command C-x <RET> k (set-keyboard-coding-system) or the variable keyboard-coding-system specifies the coding system for keyboard input. Character-code translation of keyboard input is useful for terminals with keys that send non-ASCII graphic characters—for example, some terminals designed for ISO Latin-1 or subsets of it.

By default, keyboard input is translated based on your system locale setting. If your terminal does not really support the encoding implied by your locale (for example, if you find that typing M-i inserts a non-ASCII character), you will need to set keyboard-coding-system to nil to turn off encoding. You can do this by putting

     (set-keyboard-coding-system nil)

in your ~/.emacs file.

There is a similarity between using a coding system translation for keyboard input, and using an input method: both define sequences of keyboard input that translate into single characters. However, input methods are designed to be convenient for interactive use by humans, and the sequences that are translated are typically sequences of ASCII printing characters. Coding systems typically translate sequences of non-graphic characters.

The command C-x <RET> x (set-selection-coding-system) specifies the coding system for sending selected text to the window system, and for receiving the text of selections made in other applications. This command applies to all subsequent selections, until you override it by using the command again. The command C-x <RET> X (set-next-selection-coding-system) specifies the coding system for the next selection made in Emacs or read by Emacs.
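Both commands are ordinary functions and can be called from Lisp. The sketch below assumes that the other applications on your system exchange selections as UTF-8; the coding system names are illustrative choices, not recommendations from this manual.

```elisp
;; Use UTF-8 for all subsequent selections, as with C-x <RET> x.
(set-selection-coding-system 'utf-8)

;; Use a different coding system for the next selection only,
;; as with C-x <RET> X (left commented out as an optional variant):
;; (set-next-selection-coding-system 'compound-text)
```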

The command C-x <RET> p (set-buffer-process-coding-system) specifies the coding system for input and output to a subprocess. This command applies to the current buffer; normally, each subprocess has its own buffer, and thus you can use this command to specify translation to and from a particular subprocess by giving the command in the corresponding buffer.

The default for translation of process input and output depends on the current language environment.
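To override that default for one kind of subprocess buffer, the function can be called from a mode hook. This sketch assumes shell buffers created by M-x shell should use UTF-8 in both directions; the guard is there because the function needs a live process in the current buffer.

```elisp
(add-hook 'shell-mode-hook
          (lambda ()
            ;; Only act if this buffer already has a subprocess.
            (when (get-buffer-process (current-buffer))
              ;; First argument: decoding of what the process prints;
              ;; second argument: encoding of what Emacs sends to it.
              (set-buffer-process-coding-system 'utf-8 'utf-8))))
```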

If a piece of text has already been inserted into a buffer using the wrong coding system, you can decode it again using M-x recode-region. This prompts you for the old coding system and the desired coding system, and acts on the text in the region.
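From Lisp, recode-region takes the region boundaries, the coding system the text should have been decoded with, and the coding system that was mistakenly used. The following sketch assumes UTF-8 text that was wrongly decoded as Latin-1; the command name is hypothetical.

```elisp
;; Hypothetical command; `recode-region' is the standard function.
(defun my-redecode-latin-1-as-utf-8 (start end)
  "Re-decode the text between START and END as UTF-8.
Assumes the text was previously decoded, wrongly, as ISO Latin-1."
  (interactive "r")
  ;; Argument order: START END NEW-CODING OLD-CODING.
  (recode-region start end 'utf-8 'iso-latin-1))
```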

The variable file-name-coding-system specifies a coding system to use for encoding file names. If you set the variable to a coding system name (as a Lisp symbol or a string), Emacs encodes file names using that coding system for all file operations. This makes it possible to use non-ASCII characters in file names—or, at least, those non-ASCII characters which the specified coding system can encode. Use C-x <RET> F (set-file-name-coding-system) to specify this interactively.

If file-name-coding-system is nil, Emacs uses a default coding system determined by the selected language environment. In the default language environment, any non-ASCII characters in file names are not encoded specially; they appear in the file system using the internal Emacs representation.
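A fixed choice can be made in the init file. This is a minimal sketch, assuming the file system stores names as UTF-8.

```elisp
;; Encode and decode all file names as UTF-8.
;; This affects only file names, not file contents.
(setq file-name-coding-system 'utf-8)
```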

Warning: if you change file-name-coding-system (or the language environment) in the middle of an Emacs session, problems can result if you have already visited files whose names were encoded using the earlier coding system and cannot be encoded (or are encoded differently) under the new coding system. If you try to save one of these buffers under the visited file name, saving may use the wrong file name, or it may signal an error. If such a problem happens, use C-x C-w to specify a new file name for that buffer.

If a mistake occurs when encoding a file name, use the command M-x recode-file-name to change the file name's coding system. This prompts for an existing file name, its old coding system, and the coding system to which you wish to convert.
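The same operation is available from Lisp. The sketch below uses a hypothetical file name and assumes the name was written in Latin-1 and should become UTF-8; note that evaluating it renames the file on disk.

```elisp
;; "~/notes/resume.txt" is a placeholder path for illustration only.
;; Argument order: FILE OLD-CODING NEW-CODING.
(recode-file-name (expand-file-name "~/notes/resume.txt")
                  'iso-latin-1
                  'utf-8)
```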

The variable locale-coding-system specifies a coding system to use when encoding and decoding system strings such as system error messages and format-time-string formats and time stamps. That coding system is also used for decoding non-ASCII keyboard input on X Window systems. You should choose a coding system that is compatible with the underlying system's text representation, which is normally specified by one of the environment variables LC_ALL, LC_CTYPE, and LANG. (The first one, in the order specified above, whose value is nonempty is the one that determines the text representation.)
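The precedence rule for the environment variables can be sketched in Lisp. The function name my-locale-from-environment is hypothetical, and the final setq assumes a UTF-8 locale; neither is taken from this manual.

```elisp
;; Hypothetical helper illustrating the LC_ALL, LC_CTYPE, LANG order.
(defun my-locale-from-environment ()
  "Return the first nonempty value among LC_ALL, LC_CTYPE and LANG, or nil."
  (let ((result nil))
    (dolist (var '("LC_ALL" "LC_CTYPE" "LANG") result)
      (let ((value (getenv var)))
        ;; Keep only the first variable that is set and nonempty.
        (when (and (null result) value (not (string= value "")))
          (setq result value))))))

;; Example of setting the variable by hand for a UTF-8 locale:
;; (setq locale-coding-system 'utf-8)
```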