Accents and diacriticals in Xorg
--------------------------------

Revision: 2

This note deals specifically with current xorg (7.7) and UTF-8 locales. Much of it is relevant to previous versions of xorg-7, but a few file locations have changed slightly. It also applies at least as far back as a late version of xorg-7.5 with xkeyboard-config-1.7 and libX11-1.3.3 from July 2010.

In some distros, and for some languages, dead keys are provided in the keymaps - both for the console and for xorg. Those are an aggravation - e.g. on my ubuntu installation, the double-quote is a dead diaeresis, and the single quote (') is a dead acute. That makes typing english-like text (e.g. when coding) _awkward_ because you have to type an extra space after the dead key, or else you will end up with things like acute accents on letters instead of single quotes in front of the letter.

For recent versions of xorg, I discovered that in my setup (with a UTF-8 locale) I can get better dead keys (using AltGr) *free*. To be clear - this does not interfere with, or slow down, my use of the normal key symbols, so it doesn't impact coding. But it does let me key in most of the accents currently used in European languages. Actually, now that I've fixed the comma below with xmodmap and an environment variable (see below, for those who wish to type in romanian) I do believe I can type all the accents for current European languages.

This is particularly useful if you want to type in multiple languages - in my own case, it allows me to spell place names correctly when I'm making notes about photos.

Throughout this document I assume you have installed xorg in /usr. Those who put it somewhere else will have to remember that their own files are in different places.
When I say this is free, my 11-keyboard.conf only contains

    Section "InputClass"
        Identifier "keyboard-all"
        Driver "evdev"
        Option "XkbLayout" "gb"
        Option "XkbModel" "evdev"
        Option "XkbOptions" "terminate:ctrl_alt_bksp"
        MatchIsKeyboard "on"
    EndSection

which AFAICS is a fairly basic way of specifying a British keyboard and allowing startx to be killed by Ctrl-Alt-Bksp.

Files to care about
-------------------

The predefined lists of what combinations you can type are defined in Compose files in /usr/share/X11/locale. These are NOT just for the Compose key: they also define what each dead key produces. Those files are listed in Compose.dir. For UTF-8, almost all locales use en_US.UTF-8/Compose - the exceptions are finnish (fi_FI), which has its own Compose file, the two greek locales for greece and cyprus, which use the greek Compose file, and pt_BR. It is possible that those allow different combinations.

The available key symbols that xorg recognises are defined in /usr/include/X11/keysymdef.h (all are prefixed XK_).

The xkeyboard-config files are in /usr/share/X11/xkb. I shall refer to this directory as $XKB. Keyboard definitions (keymaps in my terminology) are defined in $XKB/symbols. These are made up from a layout and a variant. The list of available layouts ("countries") and variants is in $XKB/rules/xorg.lst [ in current xorg, this is a symlink ].

My own keyboards are British ('gb') and seem to get their dead-key combinations from the en_US.UTF-8/Compose file in /usr/share/X11/locale. At first I assumed these keys were (mostly) common, so that AltGr with ';' would always be a dead acute. But when I looked a bit deeper, I discovered there is _no_ consistency about how these dead keys using AltGr are mapped, and after looking at the keyboard definitions I was even more confused. Now that I've spent more time on this, I can point to where things come from, and how to see what options you have.
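If you are not sure which Compose file your locale ends up with, you can read Compose.dir yourself. The function below is a minimal sketch, assuming the usual Compose.dir layout of "file: locale" pairs and an exact match on the locale name. libX11 does its own lookup (including locale aliases), so treat the answer as a guide; compose_file_for is my own hypothetical name, not part of xorg.

```shell
#!/bin/sh
# compose_file_for LOCALE [DIR]
# Hypothetical helper: print the Compose file that Compose.dir in DIR
# (default /usr/share/X11/locale) lists for LOCALE, e.g. en_GB.UTF-8.
# Assumes lines of the form "en_US.UTF-8/Compose:   en_GB.UTF-8" and an
# exact match on the locale name (no alias handling, unlike libX11).
compose_file_for() {
    cf_locale=$1
    cf_dir=${2:-/usr/share/X11/locale}
    awk -v loc="$cf_locale" -v dir="$cf_dir" '
        /^#/ { next }                # skip comment lines
        $2 == loc {
            sub(/:$/, "", $1)        # drop the colon after the file name
            print dir "/" $1
            exit                     # report the first matching entry only
        }
    ' "$cf_dir/Compose.dir"
}
```

With xorg in /usr, "compose_file_for en_GB.UTF-8" should print /usr/share/X11/locale/en_US.UTF-8/Compose, in line with what I said above about most UTF-8 locales. If it prints nothing, your locale is probably spelt differently in Compose.dir.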
What confused me was that I didn't, at first, realise that symbol files can include other files from the same directory. For example, the 'latin' file contains various definitions - see the norwegian (no) file for examples of how these are loaded. Also note that anything using the _basic_ symbols from 'latin' gets a lot of extra symbols such as paragraph, registered, and extra letters such as ae, eng, eth, and my favourite, kra. Of course, some maps which load this may then override some of these key definitions.

The definitive list of what keymaps are available is in xorg.lst in $XKB/rules/ [ in my case, that is a symlink to base.lst ]. After the model and layout lists comes the 'variant' list, ordered by "country" name, with the available variants. The order starts with 'us', and continues with the english equivalent of the name (e.g. ch is ordered as switzerland, gb is ordered as the uk). References to dead keys in the variant names usually mean real dead keys, NOT AltGr dead keys. This file also lists the available options - some of these might be useful to you, e.g. if you need to type non-breaking spaces, or want to use the defined methods of switching keymaps.

To temporarily load a new set of descriptions you can use setxkbmap:

    $ setxkbmap [ args ] [ layout [ variant [ option ... ] ] ]

e.g.

    $ setxkbmap us altgr-intl

You can then examine what dead keys are available by using

    $ xmodmap -pke | grep dead_

If a dead key is shown in the first four columns, it really is a dead key (hit it twice for the symbol on its own). What you are looking for is dead_ prefixes in the 5th and 6th columns. The 1st and 2nd columns are plain and with Shift; I've no idea about the next two; the 5th and 6th are plain-with-AltGr and Shift-with-AltGr; and again I have no idea about the last two columns.

Depending on what languages you want to use, that might be all you wish to know. OTOH, you might wish to use extra diacriticals or other symbols just because you can!
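The two steps above (pick a variant from xorg.lst, then look for AltGr dead keys in the xmodmap -pke output) can be scripted. This is a minimal sketch, assuming the base.lst line layout shown in the comments and the column meanings I gave above; variants_for and altgr_dead_keys are my own hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for trying out keymaps; the names are my own.

# variants_for LAYOUT [LSTFILE]
# List the variant names that xorg.lst (or base.lst) offers for LAYOUT.
# Assumes the "! variant" section uses lines like
#   "  altgr-intl      us: English (intl., with AltGr dead keys)"
variants_for() {
    vf_layout=$1
    vf_lst=${2:-/usr/share/X11/xkb/rules/xorg.lst}
    awk -v lay="$vf_layout:" '
        /^!/ { in_variants = ($2 == "variant"); next }   # section headers
        in_variants && $2 == lay { print $1 }
    ' "$vf_lst"
}

# altgr_dead_keys
# Read "xmodmap -pke" output on stdin and print the keycode plus columns 5
# and 6 (AltGr and Shift+AltGr) whenever either of them is a dead_ keysym.
# A missing column 6 is shown as "-".
altgr_dead_keys() {
    awk '
        $1 == "keycode" && $3 == "=" {
            # keysym columns start at field 4, so column N is field N+3
            c5 = $8; c6 = $9
            if (c5 ~ /^dead_/ || c6 ~ /^dead_/) {
                if (c6 == "") c6 = "-"
                print $2, c5, c6
            }
        }
    '
}
```

For example, "variants_for us" shows what there is to try, then "setxkbmap us altgr-intl" followed by "xmodmap -pke | altgr_dead_keys" lists only the AltGr dead keys; "setxkbmap gb" puts a British map back.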
People who need to set up their own keyboard definitions, e.g. for other languages, should look at the references in 'See also:'. Mostly, the extra changes are small.

For example, I have a script which sets up multiple keyboards (gb, phonetic russian, monotonic greek) for icewm. (I got confused by the caps-lock LED being lit for both russian and greek, and I believe at one time it did not light at all when switching keyboards, so I repaint the screen and activate different taskbar icons.) This already enables a few things using [xmodmap] -e "keycode ..." lines, e.g. for low double quote I add the following to all three maps:

    -e "keycode 11 = 2 quotedbl 2 quotedbl twosuperior U201e" \

and for the russian map I add a few more cyrillic glyphs such as

    -e "keycode 26 = Cyrillic_ie Cyrillic_IE Cyrillic_ie Cyrillic_IE Ukrainian_ie Ukrainian_IE" \

But you can also add things to ~/.Xmodmap, as long as you then load it with xmodmap ~/.Xmodmap. While I was testing this I had two lines in mine:

    keycode 16 = 7 ampersand 7 ampersand dead_hook dead_horn dead_hook dead_horn

That adds the dead hook and dead horn to AltGr 7: the us altgr-intl map already has those, as do some others, but not the gb map. I don't particularly use them, but I wanted to check that Xmodmap was working (at first it wasn't - fubar in my script ;) because I was having a problem with the next entry:

    keycode 59 = comma less comma less dead_belowcomma dead_belowcomma dead_belowcomma dead_belowcomma

That puts a dead comma below on AltGr+comma (and Shift+AltGr+comma). It was added to xorg fairly recently, and is referenced in the (en_US) Compose file, and probably also in fi_FI, for use with s and t when writing romanian. It does work, although I need to wear my glasses to see that it is there in rxvt-unicode. What gave me the problem was that my (gtk+-2) applications just ignored it and gave me plain s and t.
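If, like me, you add lines to ~/.Xmodmap from a setup script, it helps not to append the same keycode line every time the script runs. This is a minimal sketch, assuming a POSIX shell and grep; add_xmodmap_line is a hypothetical name, not part of xorg. It only skips identical lines, so a different line for the same keycode is still appended (and xmodmap applies them in order). Keycodes 16 and 59 are the ones on my evdev keyboards; check yours with xmodmap -pke before copying them.

```shell
#!/bin/sh
# add_xmodmap_line FILE LINE
# Hypothetical helper: append LINE to FILE (e.g. ~/.Xmodmap) unless an
# identical line is already present, so a setup script can be re-run
# without piling up duplicate keycode entries.
add_xmodmap_line() {
    ax_file=$1
    ax_line=$2
    touch "$ax_file"
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF -- "$ax_line" "$ax_file" || printf '%s\n' "$ax_line" >> "$ax_file"
}

# Example (inside X, then load the result):
#   add_xmodmap_line ~/.Xmodmap "keycode 16 = 7 ampersand 7 ampersand dead_hook dead_horn dead_hook dead_horn"
#   add_xmodmap_line ~/.Xmodmap "keycode 59 = comma less comma less dead_belowcomma dead_belowcomma dead_belowcomma dead_belowcomma"
#   xmodmap ~/.Xmodmap
```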
It turns out that the "known" valid sequences are in one of the gdk header files, and this addition [ XK_dead_belowcomma ] to /usr/include/X11/keysymdef.h has not been reflected in gtk+-2. Fortunately, there is a workaround: in .xinitrc add

    export GTK_IM_MODULE=xim

On my LFS-7.1 system, _with_that_workaround_ the dead comma below is working in epiphany, but I've no idea if the workaround is necessary for gtk+-3 apps. Judging by arora, Qt seems to be fine for this.

Unfortunately, later testing revealed that there is a downside to setting GTK_IM_MODULE=xim: gtk+-2 apps will not let me key-by-number. I hadn't realised that was available [ hold down ctrl-shift, type u and then the hex digits, release ] until I saw references to using it in libreoffice. On balance, the compose tables in gtk2 are _old_, and for me it is better to use my "enhanced" keymap and compose definitions and, when I need to key by number, just paste, e.g. from rxvt-unicode [ ctrl-shift-hex-digits ]. Vim also permits this [ ctrl-v u hex-digits enter ]. But some people will undoubtedly prefer to use the default gtk settings.

I saw there were three other dead keys added at the same time, of which one was double grave (used perhaps for croatian poetry, or alternatively for comparative linguistics in the former Yugoslavia, depending on which source was accurate!) and I didn't make a note of the others.

You can also add combining diacriticals in .Xmodmap, e.g. if you wish to use languages for which precomposed glyphs are not available, such as bosnian (in the cyrillic alphabet) and many african languages.

Software
--------

Obviously you need fonts which can display the characters you are keying. BLFS still references the traditional fonts (ugh! :) and when I _last_ used these the full set seemed to cover everything. But unicode moves on, and it is possible that some of the latest additions might not be covered in those fonts. For modern fonts using freetype I recommend that you use current DejaVu and FreeFont as a starting point.
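For keying by number (ctrl-shift-u in gtk+-2, ctrl-shift in rxvt-unicode, ctrl-v u in vim), or for writing Uxxxx keysyms such as U201e in .Xmodmap, you need the hex value of the character. This is a minimal sketch that prints it for each character of its argument, assuming the argument is UTF-8 (as it is in a UTF-8 locale) and using only od and awk; codepoints is a hypothetical name, and it does no checking for malformed UTF-8.

```shell
#!/bin/sh
# codepoints STRING
# Hypothetical helper: print the Unicode code point (hex, at least four
# digits) of each character in STRING, one per line. STRING must be UTF-8;
# od dumps the raw bytes as decimal numbers and awk decodes them.
codepoints() {
    printf '%s' "$1" | od -An -v -tu1 | awk '
        { for (i = 1; i <= NF; i++) b[n++] = $i + 0 }   # collect all bytes
        END {
            i = 0
            while (i < n) {
                c = b[i]
                # the lead byte gives the sequence length and its own bits
                if (c < 128)      { cp = c;       len = 1 }
                else if (c < 224) { cp = c - 192; len = 2 }
                else if (c < 240) { cp = c - 224; len = 3 }
                else              { cp = c - 240; len = 4 }
                # each continuation byte adds 6 more bits
                for (j = 1; j < len; j++) cp = cp * 64 + (b[i + j] - 128)
                printf "%04X\n", cp
                i += len
            }
        }'
}
```

For example, codepoints "$(printf '\342\200\236')" prints 201E, the low double quote from my keycode 11 line; pasting a character between the quotes works the same way.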
For terminals, I haven't used xterm since I found rxvt-unicode, so I have no idea how well xterm will handle these things. I guess it is pretty good, provided you set the locale to a UTF-8 form. Gnome has a reputation for doing things its own way (this might be the gtk+-2 issue I found with dead_belowcomma, in which case mate and cinnamon are probably similar). For KDE I have no idea how much of this works.

Compose Key
-----------

The last time I tried to add a Compose key, ISTR it no longer worked. But since the dead keys provide the accents I normally use, I didn't worry about this. However, while trying to increase the number of glyphs I can type, I noticed that some things are defined for the Compose key, particularly ligatures such as the French oe combination [ I'm not showing it here because the rest of this file is in plain ASCII ].

Google found https://help.ubuntu.com/community/ComposeKey where I learned that holding down Shift and AltGr (but only in that order!), and then releasing them, works as a Compose key. This doesn't offer quite as many weird and obscure letters as the dead keys (e.g. dead_stroke works for a,c,e,j,r,u,y as well as b,d,g,h,i,l,o,t) but it is useful.

For the sake of completeness, the following compose sequences give diacriticals:

    ~        tilde
    '        acute
    `        grave
    b        breve
    _ or -   macron
    ;        ogonek
    ^        circumflex
    /        stroke
    .        dot above
    c        caron
    ,        cedilla
    =        double acute
    *        ring above
    "        diaeresis
    !        dot below
    ?        hook
    +        horn

These can often be combined, e.g. for o with acute and horn.
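To find other Compose-key sequences, such as the ligatures and other glyphs listed next, it is easiest to search the Compose file for the keysym you want. This is a minimal sketch, assuming the en_US.UTF-8/Compose line layout shown in the comments (the keysym name follows the quoted result); compose_seqs is a hypothetical name. It matches keysym names exactly, so use the names from keysymdef.h without the XK_ prefix; characters without their own keysym name may appear as Uxxxx instead.

```shell
#!/bin/sh
# compose_seqs KEYSYM [FILE]
# Hypothetical helper: print the <Multi_key> sequences in FILE (default:
# the en_US.UTF-8 Compose file) whose result keysym is KEYSYM, e.g. oe,
# nobreakspace, plusminus, section. Assumes lines of the form
#   <Multi_key> <o> <e>    : "..."   oe   # comment
compose_seqs() {
    cs_sym=$1
    cs_file=${2:-/usr/share/X11/locale/en_US.UTF-8/Compose}
    # keep Compose-key lines, match the keysym that follows the closing
    # quote of the result, then strip everything from the ':' onwards
    grep '^<Multi_key>' "$cs_file" |
        grep -E "\"[[:space:]]+$cs_sym([[:space:]]|\$)" |
        sed 's/[[:space:]]*:.*$//'
}
```

For example, "compose_seqs oe" should list the sequence I mention above for the French oe, and "compose_seqs plusminus" the one for plus-or-minus.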
The ligatures on Compose include:

    a e   for ae
    o e   for oe
    f f   for ff
    f l   for fl

(and many others). There are also many other Compose glyphs, among which are:

    (two spaces)   non-break-space
    +-             plus-or-minus
    so             section
    PP             pilcrow
    No             number symbol

See also:

    http://felixhanley.info/articles/custom-keyboard-layout-in-xorg/
    http://people.uleth.ca/~daniel.odonnell/Blog/custom-keyboard-in-linuxx11
    https://help.ubuntu.com/community/Howto%3A%20Custom%20keyboard%20layout%20definitions

Created 2013-01-18.
Revision 1 2013-01-18 - confirmed this works back to at least xorg-7.5, which is my oldest system currently available [ LFS-6.6 ].
Revision 2 2013-02-04 - added details of Shift+AltGr = Compose.
Revision 3 2013-05-03 - note that xim prevents gtk2 ctrl-shift-u input, and minor text changes.