What’s new in babel 3.75
2022-05-22
Ensuring locale info
As the LaTeX kernel extends its localization capabilities (such as upper
and lower casing), BCP 47 tags are becoming essential. The required data
are stored in the ini locale files, which until now were loaded only in
some cases or when explicitly requested with \EnsureBabelInfo. Now they
are always loaded, which adds a (minimal) overhead. This should work for
most languages, but see the following section.
Because of the way this feature works, problems are very unlikely, but
there is a switch, available as a package option, to turn the new
behavior off (ensureinfo=off).
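As a minimal sketch, the switch can be passed along with the language options when loading the package. The document class and the language (spanish) are arbitrary choices for illustration:

```latex
\documentclass{article}
% ensureinfo=off restores the previous behavior: the ini locale data
% are no longer loaded unconditionally.
\usepackage[spanish, ensureinfo=off]{babel}
\begin{document}
Hola.
\end{document}
```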
Language naming with babel and the Unicode CLDR
In the coming few weeks (or months), babel will attempt to align the
language names with those in the CLDR, although well established babel
names that don’t conflict with the latter will be retained (like british,
norsk or austrian). There have already been some moves in this direction
in recent years (for example, portuguese instead of portuges, indonesian
instead of bahasai, malay instead of bahasam, french instead of frenchb),
but the goal is to recognize correctly any CLDR-based name.
The number of languages with conflicting names is actually very small:
(n)german, swissgerman and serbian are among them. How to deal with these
cases without breaking old documents is under study.
See also Locale naming.
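For illustration, a preamble using the newer names mentioned above might look as follows. This is only a sketch, and it assumes the corresponding language styles are installed:

```latex
\documentclass{article}
% portuguese rather than portuges, indonesian rather than bahasai;
% the last language listed (french, formerly frenchb) is the main one.
\usepackage[portuguese, indonesian, french]{babel}
\begin{document}
Bonjour. \foreignlanguage{portuguese}{Olá.}
\end{document}
```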
\localeinfo*
Sometimes it comes in handy to be able to use \localeinfo in an
expandable way even if something went wrong (for example, the locale
currently active is undefined). For these cases, \localeinfo* just
returns an empty string instead of raising an error. Bear in mind that
babel, following the CLDR, may leave the region unset, which means
\getlanguageproperty* (see the manual) is the preferred command, so that
the existence of a field can be checked beforehand. This also means that
building a string with the language and the region with
\localeinfo*{language.tag.bcp47}-\localeinfo*{region.tag.bcp47}
is not usually a good idea, because the hyphen is left dangling when the
region is empty.
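The following sketch shows one way to build such a string safely. It rests on two assumptions: that the property-checking command is the one the babel manual documents as \getlocaleproperty*, which sets the macro to \relax when the property does not exist, and that the region is stored under the key identification/region.tag.bcp47. The macro name \myregion is a hypothetical helper.

```latex
\documentclass{article}
\usepackage[british]{babel}
\begin{document}
% Expandable; yields nothing instead of an error if the field is missing.
Language: \localeinfo*{language.tag.bcp47}

% Assumption: \getlocaleproperty* as described in the babel manual.
% \myregion is a hypothetical macro name chosen for this example.
\getlocaleproperty*\myregion{british}{identification/region.tag.bcp47}
Tag:
\ifx\myregion\relax
  % No region defined: print the language alone, without a hyphen.
  \localeinfo*{language.tag.bcp47}%
\else
  \localeinfo*{language.tag.bcp47}-\myregion
\fi
\end{document}
```

Checking the region first means the hyphen is typeset only when there is something to put after it.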
Fixes
- Misplaced line in diagbox with RTL and luatex (#43).
- Caption names of ‘chapter’ and ‘part’ swapped in the ini files for Traditional Chinese (#180).
- Captions are now placed after numbers in the ini file for Lithuanian.
- The hyphenrules in en-AU and en-NZ were set to english instead of ukenglish.
Script ranges
The Unicode character ranges were hard-coded in the babel kernel, but
now they can be set or modified in the ini file with the key ranges in
the characters section (as well as in \babelprovide). A few Southeast
Asian languages defined this key for the line breaking algorithm, but
this new feature can be useful with onchar and with locales declared
from scratch.
Note that you cannot define different blocks for different languages sharing the same script. If there are multiple declarations, the last one wins.
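To show where the script ranges come into play, here is a minimal sketch of onchar, which switches the locale automatically when a character belonging to the script is found. It does not set the ranges key itself, and the font name is an assumption (any installed font covering Cyrillic would do):

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage[english]{babel}
% onchar=ids fonts: switch language id and font when a character in
% the ranges of the locale's script is found.
\babelprovide[import, onchar=ids fonts]{russian}
% Assumption: DejaVu Serif is installed; it is only an example choice.
\babelfont{rm}{DejaVu Serif}
\begin{document}
English text, затем русский текст, then English again.
\end{document}
```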