Subject: Font encoding specifications
From: Lars Hellström
Date: Fri, 22 Jun 2001 21:54:31 +0100
To: LATEX-L (Mailing list for the LaTeX3 project)

Lately I've spent quite some time thinking about font encodings and
related matters; in particular I have considered how they
should be specified. The discussions on this list these last months about
problems related to multilinguality have been quite inspiring (even though
I probably wouldn't have done much about it had not also some other
projects I've been working on presented a need for clarifications in this
area) and I have now compiled my thoughts in a paper on the matter, which
can be found at

  http://abel.math.umu.se/~lars/encodings/

Comments on its contents are welcome; I'd like to see a discussion about it
here on this list so that there could be an "official" acceptance or
rejection of the ideas expressed therein. (In the former case, this paper
might perhaps evolve into an "encguide".)

One matter in particular which I believe is of interest on this list is the
following passage about the output of LaTeX and the corresponding attempt
at defining what a LaTeX font encoding really is:
%%%%%%%%%%%%%%%%%%%%
On its way out of \LaTeX\ towards the printed text, a character passes
through a number of stages. The following five seem to cover what is
relevant for the present discussion:
\begin{enumerate}
  \item \emph{\LaTeX\ Internal Character Representation} (LICR)~%
    \cite{LICR}. At this point the character is a character token
    (e.g.~|a|), a text command (e.g.~|\ss|), or a combination
    (e.g.~|\H{o}|).
  \item \emph{Horizontal material;} this is what the character is
    en route from \TeX's mouth to its stomach. For most characters
    this is equivalent to a single |\char| command (e.g.\ |a| is
    equivalent to |\char|\,|97|), but some require more than one, some
    are combined using the |\accent| and |\char| commands, some
    involve rules and\slash or kerns, and some are built using boxes
    that arbitrarily combine the above elements.
  \item \emph{DVI commands;} these are the DVI file commands that
    produce the printed representation of the character.
  \item \emph{Printed text;} this is the graphical representation of
    the character, e.g.\ as ink on paper or as a pattern on a computer
    screen. Here the text consists of glyphs.
  \item \emph{Interpreted text;} this is essentially printed text
    modulo equivalence of interpretation, hence the text doesn't really
    reach this stage until someone reads it. Here the text consists of
    characters.
\end{enumerate}

In theory there is a universal mapping from LICR to interpreted text,
but various technical restrictions make it impossible to simultaneously
support the entire mapping. A \LaTeX\ encoding selects a restriction
of this mapping to a limited set which will be ``well supported''
(meaning kerning and such between characters in the set works), whereas
elements outside this set at best can be supported through temporary
encoding changes. The encoding also specifies a decomposition of the
mapping into one part which maps LICR to horizontal material and one
part which maps horizontal material to interpreted text. The first
part is realized by the text command definitions usually found in the
\meta{enc}\texttt{enc.def} file for the encoding. The second part is
the font encoding, the specification of which is the topic of this
paper. It is also worth noticing that an actual font is a mapping of
horizontal material to printed text.

An alternative decomposition of the mapping from LICR to interpreted
text would be at the DVI command level, but even though this
decomposition is realized in most \TeX\ implementations, it has very
little relevance for the discussion of encodings. The main reason for
this is that it depends not only on the encoding of a font, but
also on its metrics. Furthermore it is worth noticing that in pdf\TeX\
there need not be a DVI command level.
%%%%%%%%%%%%%%%%%%%%
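
To make the first part of the decomposition concrete, here is a minimal
sketch of how the same LICR objects might be mapped to horizontal
material under two encodings. The slot numbers are quoted from memory of
the standard ot1enc.def and t1enc.def files and should be checked
against those files; the comments describe the rough effect only and
are not part of the paper.
%%%%%%%%%%%%%%%%%%%%
% OT1: \ss is a single slot, but \H{o} needs the \accent primitive.
\DeclareTextSymbol{\ss}{OT1}{25}        % \ss   -> roughly \char25
\DeclareTextAccent{\H}{OT1}{125}        % \H{o} -> roughly \accent125 o
% T1: both are single slots, so kerning and hyphenation keep working.
\DeclareTextSymbol{\ss}{T1}{255}        % \ss   -> roughly \char255
\DeclareTextAccent{\H}{T1}{5}           % fallback for other base letters
\DeclareTextComposite{\H}{T1}{o}{174}   % \H{o} -> roughly \char174
%%%%%%%%%%%%%%%%%%%%
The point of the comparison is that the LICR side (\ss, \H{o}) stays
the same while the horizontal material differs between encodings.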

Lars Hellström
