MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0FC8C.F147A800"
Content-class: urn:content-classes:message
Subject:      Font encoding specifications
Date: Fri, 22 Jun 2001 21:54:31 +0100
Message-ID:  <l03102800b75950d24097@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0FC8C.F147A800
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Lately I've spent quite some time thinking about font encodings and
related matters; in particular I have considered how they should be
specified. The discussions on this list over the last few months about
problems related to multilinguality have been quite inspiring (even
though I probably wouldn't have done much about it had not some other
projects I've been working on also presented a need for clarification
in this area), and I have now compiled my thoughts in a paper on the
matter, which can be found at

  http://abel.math.umu.se/~lars/encodings/

Comments on its contents are welcome; I'd like to see a discussion about
it here on this list so that there could be an "official" acceptance or
rejection of the ideas expressed therein. (In the former case, this paper
might perhaps evolve into an "encguide".)

One matter in particular which I believe is of interest on this list is
the following passage about the output of LaTeX and the corresponding
attempt at defining what a LaTeX font encoding really is:
%%%%%%%%%%%%%%%%%%%%
On its way out of \LaTeX\ towards the printed text, a character passes
through a number of stages. The following five seem to cover what is
relevant for the present discussion:
\begin{enumerate}
  \item \emph{\LaTeX\ Internal Character Representation} (LICR)~%
    \cite{LICR}. At this point the character is a character token
    (e.g.~|a|), a text command (e.g.~|\ss|), or a combination
    (e.g.~|\H{o}|).
  \item \emph{Horizontal material;} this is what the character is
    en route from \TeX's mouth to its stomach. For most characters
    this is equivalent to a single |\char| command (e.g.\ |a| is
    equivalent to |\char|\,|97|), but some require more than one, some
    are combined using the |\accent| and |\char| commands, some
    involve rules and\slash or kerns, and some are built using boxes
    that arbitrarily combine the above elements.
  \item \emph{DVI commands;} these are the DVI file commands that
    produce the printed representation of the character.
  \item \emph{Printed text;} this is the graphical representation of
    the character, e.g.\ as ink on paper or as a pattern on a computer
    screen. Here the text consists of glyphs.
  \item \emph{Interpreted text;} this is essentially printed text
    modulo equivalence of interpretation, hence the text doesn't really
    reach this stage until someone reads it. Here the text consists of
    characters.
\end{enumerate}

In theory there is a universal mapping from LICR to interpreted text,
but various technical restrictions make it impossible to simultaneously
support the entire mapping. A \LaTeX\ encoding selects a restriction
of this mapping to a limited set which will be ``well supported''
(meaning kerning and such between characters in the set works), whereas
elements outside this set can at best be supported through temporary
encoding changes. The encoding also specifies a decomposition of the
mapping into one part which maps LICR to horizontal material and one
part which maps horizontal material to interpreted text. The first
part is realized by the text command definitions usually found in the
\meta{enc}\texttt{enc.def} file for the encoding. The second part is
the font encoding, the specification of which is the topic of this
paper. It is also worth noting that an actual font is a mapping of
horizontal material to printed text.

An alternative decomposition of the mapping from LICR to interpreted
text would be at the DVI command level, but even though this
decomposition is realized in most \TeX\ implementations, it has very
little relevance for the discussion of encodings. The main reason for
this is that it depends not only on the encoding of a font, but
also on its metrics. Furthermore, it is worth noting that in pdf\TeX\
there need not be a DVI command level.
%%%%%%%%%%%%%%%%%%%%
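
To make the first part of the decomposition concrete, here is a minimal
sketch of how the two LICR objects used as examples in the passage (\ss
and \H{o}) might be mapped to horizontal material in two encodings. The
declarations are modelled on those in the standard ot1enc.def and
t1enc.def files; the slot numbers are my reading of those files and are
worth checking against them before being relied upon.

```latex
% OT1: \ss is a single \char; \H{o} has no composite slot, so it is
% built with the \accent primitive (accent glyph assumed in slot 125).
\DeclareTextSymbol{\ss}{OT1}{25}
\DeclareTextAccent{\H}{OT1}{125}

% T1: both end up as single \char commands, since the font is assumed
% to contain a ready-made o-with-double-acute glyph in slot 174.
\DeclareTextSymbol{\ss}{T1}{255}
\DeclareTextAccent{\H}{T1}{5}
\DeclareTextComposite{\H}{T1}{o}{174}
```

So the same LICR \H{o} becomes roughly "\accent125 o" under OT1 but
"\char174" under T1, while the interpreted text is meant to be the same
character in both cases.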

Lars Hellström

------_=_NextPart_001_01C0FC8C.F147A800--