UNC Library Catalog
Diacritics and Special Characters

General Information

UNC-Chapel Hill's catalog allows the display of some diacritics and special characters.

For the purposes of this explanatory material, a diacritic is a mark appearing above or below a letter that is not part of the character itself. Special characters can take several forms: combined letters (e.g. "AE"), letters with modifications that are an integral part of the character (e.g. "D" with crossbar or "O" with hook), letters from other scripts (e.g. thorn), and modifying symbols that occupy their own space (e.g. alif or miagkii znak). Examples are given with capital letters, but the information applies to both upper and lower case.

The following diacritics and special characters display:

DIACRITICS: acute Á ; circumflex Â ; grave À ; tilde Ã ; umlaut Ä

SPECIAL CHARACTERS: thorn, lowercase þ ; thorn, uppercase Þ

These diacritics generally display only over vowels, not over consonants (although there are exceptions, e.g. the tilde appears over the letter "N"). However, the diacritics do not display over vowels that are represented by special characters, such as the "O" with hook, the "U" with hook and the Turkish "I" without dot.

A number of other diacritics and special characters do not display properly. Diacritics that do not display are simply dropped. Special characters that do not display properly are presented as the closest roman script letter (for example, a "D" with a crossbar will display and be searchable as "D").

Two special characters display incorrectly in a way that may affect searching: the lowercase "eth" and the uppercase "O" with slash. Some other symbols, such as the British pound sign and the plus/minus sign, also display incorrectly. We are working to resolve these problems.
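
As a rough illustration of the display rules above, the following Python sketch approximates how a word might appear in the catalog. It is not the catalog's own software. The function name `approximate_display` and the small `FALLBACK` table are hypothetical, and the sketch assumes that the five listed diacritics survive only over plain vowels and, for the tilde, over "N". It does not model the characters that display incorrectly.

```python
import unicodedata

# Combining marks listed on this page as displaying:
# grave, acute, circumflex, tilde, umlaut.
DISPLAYED_MARKS = {"\u0300", "\u0301", "\u0302", "\u0303", "\u0308"}
VOWELS = set("aeiouAEIOU")
HORN = "\u031b"  # the "hook" on the Vietnamese O and U

# Hypothetical fallback table covering a few special characters named here.
FALLBACK = {"Đ": "D", "đ": "d", "Ł": "L", "ł": "l", "ı": "i"}


def approximate_display(text):
    """Return an approximation of how `text` would display in the catalog."""
    out = []
    base = ""  # plain letter the current marks attach to ("" means: drop marks)
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":  # a combining mark
            if ch == HORN:
                base = ""  # hooked letters are special: later marks are dropped
                continue
            tilde_on_n = base in {"n", "N"} and ch == "\u0303"
            if ch in DISPLAYED_MARKS and (base in VOWELS or tilde_on_n):
                out.append(ch)
        else:
            out.append(FALLBACK.get(ch, ch))
            # Marks over a special character are not displayed.
            base = "" if ch in FALLBACK else ch
    return unicodedata.normalize("NFC", "".join(out))
```

Under these assumptions, a Czech name containing "r" with a hacek keeps its acute accents but loses the hacek, and a Swedish name keeps its umlaut but loses the circle above the "A".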

Alphabetical order in the catalog is based on English and is not affected by diacritic marks. Special characters file as the closest roman script letter.
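
The filing rule can be sketched as a sort key in Python. This is only an illustration under stated assumptions, not the catalog's actual sorting routine: `filing_key` and its `CLOSEST_ROMAN` table are hypothetical names, the table covers just a few special characters, and ignoring case is an assumption.

```python
import unicodedata

# Hypothetical table: special characters file as the closest roman letter.
CLOSEST_ROMAN = {"Đ": "D", "đ": "d", "Ł": "L", "ł": "l", "Ø": "O", "ø": "o"}


def filing_key(heading):
    """Build a sort key that ignores diacritics, as described above."""
    letters = []
    for ch in unicodedata.normalize("NFD", heading):
        if unicodedata.category(ch) == "Mn":
            continue  # diacritics do not affect alphabetical order
        letters.append(CLOSEST_ROMAN.get(ch, ch))
    return "".join(letters).casefold()  # assumption: case is ignored


headings = ["Zola", "Émile", "Eco", "Łódź", "Ångström", "Adams"]
ordered = sorted(headings, key=filing_key)
```

With this key, "Ångström" files among the entries beginning with "A" and "Émile" files after "Eco", as it would in an English alphabetical list.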

How to Enter Searches

When typing words to be searched in the catalog, omit diacritics; copying and pasting a word that contains diacritics or special characters will cause the search to fail. Upper and lower case do not matter. Special characters should be entered as the unmodified roman script letter (e.g. "th" for þ, thorn). The lowercase "eth" should be omitted from words where it occurs. This problem also occurs via a telnet connection to the catalog. The uppercase "O" with slash (Ø) is searchable as an "O" even though it displays as "í", a lowercase "i" with an acute.
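
For anyone who needs to paste text from another source, the entry rules above can be approximated with a short Python helper that prepares a search string first. This is a minimal sketch, not a tool provided by the catalog. The name `to_search_form` and the `SEARCH_FORMS` table are hypothetical, and the table covers only characters mentioned on this page. The German "ess-zet" is deliberately left alone, because its search form depends on the source document.

```python
import unicodedata

# Hypothetical table based on the entry rules described on this page.
SEARCH_FORMS = {
    "þ": "th", "Þ": "Th",  # thorn is entered as "th"
    "ð": "",               # lowercase eth is omitted
    "Ø": "O", "ø": "o",    # "O" with slash is searchable as "O"
    "Đ": "D", "đ": "d",    # "D" with crossbar
    "Ł": "L", "ł": "l",    # Polish "L" with slash
    "ı": "i",              # Turkish "I" without dot
}


def to_search_form(text):
    """Strip diacritics and replace special characters before searching."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            continue  # diacritics should be omitted from searches
        out.append(SEARCH_FORMS.get(ch, ch))
    return "".join(out)
```

For example, passing a name that begins with an uppercase thorn followed by "o" with an acute yields a string beginning "Tho", which can then be typed or pasted into the search box. Case is left unchanged because the catalog does not distinguish it.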

Information by Language Group

The following summaries by language area will explain how diacritics and special characters are treated in specific cases.

Western European Languages

Most diacritics in modern Western European languages display, and special characters are displayed as the closest roman script letter. The one diacritic that does not display is the cedilla (ç) in Catalan, French and Portuguese. The German "ess-zet" (ß) is displayed and searched as "ss" or "sz", depending on whether it is written as a ligature or as two separate letters in the source document.

East Asian Languages

For romanized Chinese (Wade-Giles), Japanese and Korean, diacritics and special characters are dropped, with the following exception: the umlaut in Chinese.

Near East Languages

In general, diacritics and special characters are dropped, with the following exceptions: the acute in Amharic, Arabic, and Persian; and the circumflex and the umlaut in Turkish. The Turkish "I" without dot displays as a roman "I"; however, the circumflex used in Ottoman Turkish does not display over it.

Scandinavian and Baltic Languages

Scandinavian Languages

The umlaut displays but the circle above a letter does not. Special characters are displayed as the closest roman script letter. The uppercase "O" with slash (Ø) displays incorrectly.

Baltic Languages

The diacritics in Estonian display, but those in Latvian and Lithuanian do not.

Icelandic, Faroese, Anglo-Saxon

The acute and umlaut display, but other diacritics do not. Special characters are displayed as the closest roman script letter. However, the uppercase "O" with slash (Ø) displays incorrectly and the lowercase "eth" (ð) is dropped.

Slavic and East European Languages

This group includes languages of the former Soviet Union written in Cyrillic script.

In general, diacritics and special characters are dropped, with the following exceptions: the acute in Albanian, Czech, Polish, Serbo-Croatian, Slovak and Slovene; the umlaut in Albanian and Hungarian; the Polish "L" with slash, which displays as a roman "L" in both upper and lower case; and the Serbo-Croatian "D" with crossbar (Đ), which displays as a roman "D" in both upper and lower case.

South Asian Languages

In general, diacritics are dropped for these languages, with the exception of the circumflex in Gujarati, Hindi, Marathi, Sinhalese, and Telugu; the tilde in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Panjabi, Prakrit, Sanskrit, Sindhi, Sinhalese, Tamil, Telugu and Tibetan; and the umlaut in Sindhi.

Please note that the acute (´) used to mark the palatal sibilant is also dropped in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Prakrit, Pushto, Sanskrit, Sinhalese, Tamil, Telugu, and Tibetan.

Southeast Asian Languages

In general, diacritics are dropped and special characters are displayed as the closest roman script letter for these languages, with the exception of the acute, the circumflex and the grave in Tagalog and Vietnamese, and the tilde in Vietnamese.

Please note that diacritics that would otherwise display are dropped when marking a special character such as the "O" with hook or "U" with hook in Vietnamese.