16 Jul 2008

Working with Diacritics in Emacs

One of the things that has frustrated me for a while in Emacs is working with diacritics (accented characters) and other international text. Although, as a basically monolingual English speaker, I do most of my writing well within the low-ASCII range, every once in a while I need to reproduce an accented word or a string of international text.

Although typing accented characters (and other Latin-1 symbols) is very easy on a Mac in a native editor like TextMate, I’d never spent the time to figure out how to do it in Emacs. Since Emacs is something of a least-common-denominator editor, I decided it would be worth learning: unlike OS-specific dead-key methods, the Emacs way should work anywhere Emacs is installed. (And I use Emacs regularly on Mac OS X, Windows, Linux, and NetBSD, although the latter two usually only through SSH sessions.)

Anyway, actually entering accented characters and other basic non-ASCII characters is the easy part. The simplest approach is to turn on iso-accents-mode (M-x iso-accents-mode) and let it convert two-character sequences, such as "a for ä, into their Latin-1 equivalents.
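
For reference, here is a minimal .emacs sketch of two ways to get this prefix-style input. The first is the minor mode described above; newer Emacs releases treat iso-accents-mode as obsolete, so the second uses the built-in "latin-1-prefix" input method, which (as far as I know) follows the same conventions. Which one is available depends on your Emacs version, so treat this as a starting point rather than a recipe.

```elisp
;; Option A: the older minor mode described above.
;; Uncomment only if your Emacs still ships iso-acc.el.
;; (iso-accents-mode 1)

;; Option B: the Quail input method with the same prefix conventions,
;; e.g. "a -> ä, 'e -> é, `a -> à.  Setting the default means C-\
;; toggles it on and off in any buffer.
(setq default-input-method "latin-1-prefix")
```

With Option B, pressing C-\ once turns the input method on for the current buffer and pressing it again turns it off, so ordinary quote characters stay easy to type.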

The trickier part was getting them to display correctly. The first time I tried iso-accents-mode, the non-ASCII characters were displayed as question marks (?). I quickly traced this to Emacs rather than my terminal (by saving the file and displaying it with cat, which showed the characters properly), and then, with a little more research, to Emacs’s “terminal encoding” setting.

Basically, Emacs’s terminal encoding (its terminal coding system) controls which character set Emacs uses when displaying text, that is, when sending it to the terminal device you’re using to interact with it. It’s distinct from the coding system used to read and write the file itself, and it can also differ from the one used to interpret keyboard input.
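
To make the three-way split concrete, here is a sketch of the separate settings as .emacs lines, plus an expression you can evaluate with M-: to see what a running session is currently using. It assumes a UTF-8 terminal throughout; on a different setup you would pick the coding systems individually.

```elisp
;; 1. Output: how characters are encoded when sent to the terminal.
(set-terminal-coding-system 'utf-8)

;; 2. Input: how bytes typed at the keyboard are decoded.
(set-keyboard-coding-system 'utf-8)

;; 3. Files: put UTF-8 at the top of the priority list Emacs uses
;;    when guessing a file's encoding.
(prefer-coding-system 'utf-8)

;; Inspect the current values (evaluate with M-: in a live session).
;; buffer-file-coding-system is per-buffer, so it reflects the
;; buffer you are in when you evaluate this.
(message "terminal: %S  keyboard: %S  file: %S"
         (terminal-coding-system)
         (keyboard-coding-system)
         buffer-file-coding-system)
```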

Since I have a UTF-8 terminal (set using the “Window Settings” window, under the Terminal menu, in OS X’s Terminal.app), I set Emacs to use UTF-8 as its terminal encoding by adding the following to my .emacs file:

(set-terminal-coding-system 'utf-8)

With this done (both locally and on the remote systems I SSH into), I was able to see all the non-ASCII characters properly. In fact, not only were Latin-1 characters displayed correctly, but Unicode smart quotes and symbols were also displayed correctly for the first time.

The only issue I anticipate with this is that, when I do connect from a non-UTF-8 terminal (like Cygwin’s Win32 version of rxvt), I’m probably going to get garbage instead of Unicode. However, that’s not really the fault of Emacs, and it’s always possible to temporarily change the terminal encoding back to ASCII if necessary. I just want UTF-8 to be the default.
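
One way to keep UTF-8 as the default while falling back gracefully is to check the locale before choosing. The sketch below assumes the terminal (or the SSH session) exports LC_ALL, LC_CTYPE, or LANG with "UTF-8" in the value when it really is a UTF-8 terminal; that assumption doesn't hold everywhere. The helper name my-locale-string is hypothetical. For a one-off change, C-x RET t (set-terminal-coding-system) switches the encoding interactively.

```elisp
(defun my-locale-string ()
  "Return the first non-empty of LC_ALL, LC_CTYPE, LANG, or \"\".
Hypothetical helper; mirrors the usual POSIX locale precedence."
  (or (catch 'found
        (dolist (var '("LC_ALL" "LC_CTYPE" "LANG"))
          (let ((val (getenv var)))
            (when (and val (not (string= val "")))
              (throw 'found val)))))
      ""))

;; Use UTF-8 when the locale advertises it, otherwise plain ASCII.
(let ((case-fold-search t))          ; so "UTF-8" and "utf8" both match
  (if (string-match-p "utf-?8" (my-locale-string))
      (set-terminal-coding-system 'utf-8)
    (set-terminal-coding-system 'us-ascii)))
```

The fallback is deliberately conservative: with us-ascii, non-ASCII characters should show up as substitutes rather than as multi-byte garbage on a Latin-1 terminal.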
