MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C2BD59.40669980"
In-Reply-To:  <15899.14827.804209.458595@istrati.mittelbach-online.de>
References: <200212031601.gB3G11cQ009558@sun.dante.de>            <15899.14827.804209.458595@istrati.mittelbach-online.de>
User-Agent: Mutt/1.3.28i
Content-class: urn:content-classes:message
Subject:      Re: latex/3480: Support for UTF-8 missing in inputenc.sty
Date: Thu, 16 Jan 2003 12:46:37 +0100
Message-ID: A<20030116114637.GA9844@g113.hadiko.de>
Thread-Topic:      Re: latex/3480: Support for UTF-8 missing in inputenc.sty
Thread-Index: AcK9WUEQCv/fkASoRByliuer0Iycfg==
From: "Dominique Unruh" <dominique@UNRUH.DE>
To: <LATEX-L@listserv.uni-heidelberg.de>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@listserv.uni-heidelberg.de>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C2BD59.40669980
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I would like to add several comments on Frank and Chris's utf8.def:


=== 1. The definition of the .dfu files.

In the present model we have the problem that the same Unicode
character is defined in several .dfu files. If all the definitions
are identical this is harmless, but that has to be ensured. For
example, the font encoding LGR has the command \euro, to be assigned
to U+20AC, while TS1 has \texteuro for the same Unicode character.
I therefore propose the following policy:

- Unicode-to-TeX mappings are kept in a single file that is
independent of the font encoding, e.g. ucs.map:
[...]
0x20AC   \texteuro
[...]

- Font-encoding-specific files contain a list of supported code
positions, e.g. lgr.ucr and ts1.ucr (UCR = Unicode Range) both contain
the number 0x20AC (but no further information).

- A script then generates the .dfu files. The example above leads to
the inclusion of

\DeclareUnicodeCharacter{20AC}{\texteuro}

in ts1.dfu and lgr.dfu (LGR then has to be updated to provide the
macro \texteuro in addition to \euro). Note that only the final .dfu
files are seen by the latex executable, so this system does not
require any changes to utf8.def.

- The ucs.map file is managed by the LaTeX team. The .ucr files can be
created by the developers of the font encodings, which allows font
encodings to be developed without any interaction with the LaTeX
team. Adding new characters to the ucs.map file should not be subject
to a restrictive selection process, since no resources are wasted
unless some font encoding requests these characters.

- Characters in the private use area should be assigned names that
can be generated algorithmically, e.g. U+F8D0 (Klingon A according to
http://www.evertype.com/standards/csur/klingon.html) should map to
something like \unicodefBdO (some thought has to be given to the fact
that the names may not contain digits) and not to e.g. \klingona.
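
To make the policy above concrete, here is a rough sketch of what such
a generator script could look like. It is only an illustration under
assumptions: the ucs.map line format is taken from the example above,
the .ucr format (whitespace-separated hex numbers) is my guess, and all
function names are hypothetical. The digit-to-letter table agrees with
\unicodefBdO for 0 and 8; the other replacements are placeholders.

```python
# Sketch of a .dfu generator; file formats and names are assumptions.

# Digits are not allowed in TeX control words, so replace each digit by an
# uppercase letter. Hex letters stay lowercase, so the names stay unambiguous.
# Only 0 -> O and 8 -> B follow the \unicodefBdO example; the rest is made up.
DIGIT_TO_LETTER = str.maketrans("0123456789", "OIZEHSGTBN")

PRIVATE_USE = range(0xE000, 0xF900)  # BMP private use area U+E000..U+F8FF


def parse_map(text):
    """Parse ucs.map-style lines such as '0x20AC   \\texteuro'."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        code, macro = line.split(None, 1)
        mapping[int(code, 16)] = macro.strip()
    return mapping


def parse_ucr(text):
    """Parse a .ucr-style list of supported code positions (hex numbers)."""
    return [int(token, 16) for token in text.split()]


def private_use_name(codepoint):
    """Build a digit-free macro name for a private-use code point."""
    return "\\unicode" + format(codepoint, "04x").translate(DIGIT_TO_LETTER)


def generate_dfu(mapping, supported):
    """Return the text of a .dfu file for the given code positions."""
    lines = []
    for cp in sorted(supported):
        if cp in PRIVATE_USE:
            macro = private_use_name(cp)
        elif cp in mapping:
            macro = mapping[cp]
        else:
            raise KeyError("U+%04X is not listed in the map file" % cp)
        lines.append("\\DeclareUnicodeCharacter{%04X}{%s}" % (cp, macro))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ucs_map = parse_map("0x20AC   \\texteuro\n")
    print(generate_dfu(ucs_map, parse_ucr("0x20AC 0xF8D0")), end="")
```

Running the same function once per .ucr file would give ts1.dfu and
lgr.dfu identical lines for U+20AC, which is the point of keeping the
mapping in one place.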


=== 2. \IeC

Most characters must be enclosed in a call to \IeC, as is already
done by \DeclareInputText. Otherwise the following fragment

\tableofcontents
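\section{Laß nach}  % La\ss  nach

will give the TOC entry "Laßnach" (i.e. the space is lost). The .toc
file then contains La\ss  nach, and the spaces after the control word
\ss are skipped when the file is read back.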


=== 3. Unicode to LaTeX mappings.

There are already extensive lists of character mappings available at:
http://www.unruh.de/DniQ/latex/unicode/content/config/


=== 4. The loading of the .dfu files.

It has been mentioned that the late loading of the .dfu files (lines
113--124) causes problems with save boxes. For completeness I would
like to add that \xdef's etc. cause similar problems when used in the
preamble.


=== 5. Interoperability with ucs.sty

There are some name clashes with my Unicode package.

- utf8.def: I accept that this is the canonical name for that file
and will rename my input encoding in favour of the kernel's
encoding.

- \DeclareUnicodeCharacter: This command has the same name in my
system. I would appreciate it if another name could be chosen at this
early stage to avoid confusion. Some possible names would be

\DeclareUnicodeGlyph (according to the nomenclature of the Unicode
standard)
\DeclareUnicodeCommand (analogous to \DeclareTextCommand)


DniQ.

------_=_NextPart_001_01C2BD59.40669980--