Locale naming
The babel multilingual framework for LaTeX employs specific
conventions for naming and referencing locale files (both the
traditional ldf mechanism and the modern ini one), prioritizing
compatibility with the established standards (Unicode CLDR,
IANA/BCP 47 registry) used across the web and international data
repositories. Babel adheres to those standards for locale
classification, which are based (according to ISO 639) not only on
linguistic similarity, but also on the existence of a common
literature (which sometimes also means a specific typographical
tradition) or of a common ethnolinguistic identity.
To ensure consistency, clarity, and compatibility across systems, the following conventions are used for naming locales at the user level. These conventions apply to both language names and the corresponding file names.
Basic rules
Base language names are written in lowercase English, with no spaces. Diacritics and non-letter characters (ampersands, hyphens, apostrophes) are removed to simplify parsing and avoid encoding issues:
- northernkurdish (not “Northern Kurdish”)
- kinaraya (from “Kinaray-a”)
- lu (from “Lü”)
- bosniaherzegovina (from “Bosnia & Herzegovina”)
Scripts and regions are appended using hyphens:
- serbian-latin (Serbian written in Latin script)
- spanish-mexico (Spanish as spoken in Mexico)
Region codes may be used, for convenience, as a shorthand for long region names:
- ba is accepted as a shorthand for bosniaherzegovina
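The basic rules above can be approximated in a few lines of Python. This is only an illustrative sketch, not part of babel or its tooling: the function names base_name and locale_name are hypothetical, and the sketch assumes that decomposing characters with Unicode NFKD and keeping only ASCII letters is an adequate stand-in for "remove diacritics and non-letter characters".

```python
import unicodedata


def base_name(english_name: str) -> str:
    """Lowercase an English name and keep only its ASCII letters (hypothetical helper)."""
    # NFKD splits an accented letter into base letter + combining mark,
    # e.g. "ü" becomes "u" followed by U+0308.
    decomposed = unicodedata.normalize("NFKD", english_name)
    # Keeping ASCII letters drops combining marks, spaces, hyphens,
    # ampersands and apostrophes in one pass.
    letters = [ch for ch in decomposed if ch.isascii() and ch.isalpha()]
    return "".join(letters).lower()


def locale_name(language: str, *qualifiers: str) -> str:
    """Append script and/or region qualifiers to the base name with hyphens."""
    return "-".join(base_name(part) for part in (language, *qualifiers))


print(base_name("Northern Kurdish"))      # northernkurdish
print(base_name("Kinaray-a"))             # kinaraya
print(base_name("Lü"))                    # lu
print(base_name("Bosnia & Herzegovina"))  # bosniaherzegovina
print(locale_name("Serbian", "Latin"))    # serbian-latin
print(locale_name("Spanish", "Mexico"))   # spanish-mexico
```

The sketch does not model region-code shorthands such as ba, nor the special cases and shortened names described later; those depend on curated data rather than on a mechanical rule.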
Locale names are primarily derived from the CLDR (Common Locale Data Repository). When CLDR does not provide a name (e.g., for historical or constructed languages), the following patterns apply; they are the same ones followed in practice for other names:
- Use the natural English form, lowercased and concatenated:
  medievallatin (for “Medieval Latin”)
- Prefer names based on the description field in the IANA Language Subtag Registry:
  polytonicgreek (for Greek written in polytonic orthography)
- When necessary, simplify overly descriptive names to maintain usability and brevity.
- Variants in the tags are uppercased in the CLDR, but in the BCP 47 registry they are lowercase (e.g., el-polyton). We stick to the BCP 47 format.
- As secondary references:
Special cases
Some locales in CLDR carry a more precise, region-specific name that differs from the name of the base language:
- ro-MD is labeled “Moldavian” in CLDR.
Compound names like “American English” or “Norwegian Nynorsk” may be shortened by removing the language when unambiguous:
- american instead of americanenglish
- nynorsk instead of norwegiannynorsk
- brazilian instead of brazilianportuguese
- scottish instead of scottishgaelic
When choosing short demonyms, prefer ‘plain’ forms (even if vernacular) over composed or acronym-based names:
- british is preferred over UKenglish
- american is preferred over USenglish
This reflects the evolution of the english style: the names
american and british predate USenglish and UKenglish. Further,
the main names in the CLDR for en-US and en-GB are American
English and British English. Note that the names ukenglish and
usenglish (all lowercase) are not supported by babel as ldf
files, even if they work in some operating systems.
Deprecated names
The following locale names are deprecated and should be replaced with their standardized equivalents:
| Deprecated | Use instead |
|---|---|
| brazil | brazilian |
| bahasai | indonesian |
| bahasam, meyalu | malay |
| classiclatin | classicallatin |
| ecclesiasticlatin | ecclesiasticallatin |
| frenchb, francais, canadien | french |
| germanb | german, ngerman |
| lsorbian | lowersorbian |
| portuges | portuguese |
| samin | northernsami |
| ukraineb | ukrainian |
| usorbian | uppersorbian |
| vietnam | vietnamese |
| russianb | russian |
These names are not included in name.babel and should be avoided in
new documents. Some legacy or anomalous names (especially related to
German and Serbian) are under review and may be revised in future
updates.
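When migrating older documents, the deprecated names listed in this section can be looked up mechanically. The following is a minimal sketch, assuming the table is complete as given here; the DEPRECATED dictionary and the suggest function are hypothetical names introduced only for illustration and are not provided by babel.

```python
from typing import Tuple

# Deprecated locale names mapped to their replacement(s), transcribed from
# the "Deprecated names" table in this section. germanb has two candidates.
DEPRECATED = {
    "brazil": ("brazilian",),
    "bahasai": ("indonesian",),
    "bahasam": ("malay",),
    "meyalu": ("malay",),
    "classiclatin": ("classicallatin",),
    "ecclesiasticlatin": ("ecclesiasticallatin",),
    "frenchb": ("french",),
    "francais": ("french",),
    "canadien": ("french",),
    "germanb": ("german", "ngerman"),
    "lsorbian": ("lowersorbian",),
    "portuges": ("portuguese",),
    "samin": ("northernsami",),
    "ukraineb": ("ukrainian",),
    "usorbian": ("uppersorbian",),
    "vietnam": ("vietnamese",),
    "russianb": ("russian",),
}


def suggest(name: str) -> Tuple[str, ...]:
    """Return the replacement(s) for a deprecated name, or the name itself otherwise."""
    return DEPRECATED.get(name, (name,))


print(suggest("frenchb"))  # ('french',)
print(suggest("germanb"))  # ('german', 'ngerman')
print(suggest("spanish"))  # ('spanish',)
```

Returning a tuple keeps the germanb case explicit: the choice between german and ngerman has to be made by the document author, not by the lookup.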