MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0D94F.CFE66300"
In-Reply-To:  <15096.33272.631022.67872@gargle.gargle.HOWL>
Content-class: urn:content-classes:message
Subject:      Re: Multilingual Encodings Summary 2.2
Date: Thu, 10 May 2001 13:50:29 +0100
Message-ID:  <l03130300b7202a7ae067@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0D94F.CFE66300
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 01.32 +0200 2001-05-09, Marcel Oliver wrote:
>Apostolos Syropoulos has expressed interest (some time ago) to publish
>a version of this document in Eutupon (the Greek TeX Friends
>newsletter).  Therefore, I would like to make sure that the document
>is as accurate as possible, that everybody is happy with the way I
>presented his contributions, and that the external references are
>useful and complete.  So if I don't hear complaints, I assume that
>everything is cool.

I find the name of Section 2 (LaTeX Internal Character Representation)
rather strange, as there is very little in that section that concerns the
LICR. The main topic of that section seems rather to be the shortcomings of
TeX (as a typesetting engine).

The comparison in Section 3.2.1 of how characters are processed in TeX and
Omega respectively also seems strange. In Omega case (b), column C, we see
that the LICR character \'e is converted to an 8-bit character "82 before
some OTP converts it to the Unicode character "00E9 in column D. Surely
this can't be right---whenever LICR is converted to anything it should be
to full Unicode, since we will otherwise end up in an encoding morass much
worse than that in current LaTeX.
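
To make the morass concrete, here is a minimal Python sketch. It is my
own illustration, not something taken from the summary or from Omega, and
the function names are hypothetical. It assumes that the intermediate
8-bit value "82 carries no record of its code page, and shows that the
Unicode character recovered from it then depends on which code page the
later stage happens to assume, whereas a direct mapping from LICR to
Unicode has no such dependency.

```python
# Hypothetical helpers for illustration only.

def licr_to_unicode(command):
    """Map a LICR command directly to a Unicode character."""
    return {"\\'e": "\u00e9"}[command]


def byte_as_read_under(codec):
    """Decode the single byte 0x82 under the given 8-bit codec."""
    return bytes([0x82]).decode(codec)


def code_points_via_8bit():
    """Code point obtained from byte 0x82 under three code pages."""
    return {
        codec: hex(ord(byte_as_read_under(codec)))
        for codec in ("cp437", "latin-1", "cp1252")
    }
```

Only the cp437 reading gives "00E9; latin-1 gives the control character
"0082 and cp1252 gives "201A, a low quotation mark.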

It also seems to me that there is some confusion---in the debate as well
as in the summary---about where the boundary between "input" and
"output" is located. Since LaTeX is a TeX format it lives between the
"eye" and the "stomach", and thus to LaTeX everything which happens to
text from evaluation (character tokens enter the stomach to be typeset)
onwards is part of the output process. Much of what has been written
about Omega seems instead to draw the line between input and output at a
much later position. Hence some of the things which have been described
as Omega extensions that act on the input are, from LaTeX's point of
view, rather yet more things that act on the output.

As I understand the Omega draft documentation, there can be no more than
one OTP (the \InputTranslation) acting on the input of LaTeX at any time,
and that OTP is only meant to handle the basic conversion from the
external encoding (ASCII, latin-1, UTF-8, or whatever) to the internal
32-bit Unicode. All this happens well before the input gets tokenized, so
there is, by the way, no point in worrying about what the OTP should do
with control sequences.
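
The ordering can be illustrated with a toy Python model. This is only a
sketch under my own assumptions: the function names are hypothetical, and
the tokenizer is a drastic simplification of TeX's (no category codes, no
space skipping). The point is merely that the input translation sees a
stream of characters in which a backslash is just another character;
control sequences do not exist until the following stage.

```python
import re


def input_translation(raw, encoding):
    """Stage 1: external bytes to Unicode text."""
    return raw.decode(encoding)


def tokenize(text):
    """Stage 2: simplified stand-in for TeX's tokenizer.

    A backslash followed by letters, or by one other character,
    becomes one control-sequence token; every other character
    becomes a character token.
    """
    return re.findall(r"\\[A-Za-z]+|\\.|.", text, re.S)


def read_line(raw, encoding):
    """Run both stages in the order described above."""
    return tokenize(input_translation(raw, encoding))
```

For instance, read_line(b"caf\xe9 \\LaTeX", "latin-1") yields four letter
tokens, a space token, and the single control-sequence token \LaTeX; the
UTF-8 bytes b"caf\xc3\xa9 \\LaTeX" read with "utf-8" give the same list.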

The next time any OTP gets to act on the characters is when they are
being put onto a horizontal list---this is where the OTPs can be stacked
and one OTP can act on the output of another---i.e., in the first stage
of _output_ from LaTeX. Yet these are what are described as "Input: set
of input conventions" (maybe because the Omega draft documentation calls
them "Input filters") in the itemize-list on page 8!! (Note: I am not
questioning whether this is a correct summary of the debate---if I am
questioning anything it is rather the idea expressed in the original
contribution.) Certainly there is a need for some OTPs to act on the text
at this stage, but some of the processing should rather be done on the
input side of LaTeX (for which the current Omega seems to provide very
little). I note that the last paragraph of Section 3 mentions the problem
that Omega does not provide any OTP processing of text when it is between
the eye and the stomach.

Lars Hellstr=F6m

------_=_NextPart_001_01C0D94F.CFE66300--