Subject: Re: Multilingual Encodings Summary 2.2
From: Lars Hellström
Date: Thu, 10 May 2001 13:50:29 +0100
To: LATEX-L (Mailing list for the LaTeX3 project)

At 01.32 +0200 2001-05-09, Marcel Oliver wrote:
>Apostolos Syropoulos has expressed interest (some time ago) to publish
>a version of this document in Eutupon (the Greek TeX Friends
>newsletter).  Therefore, I would like to make sure that the document
>is as accurate as possible, that everybody is happy with the way I
>presented his contributions, and that the external references are
>useful and complete.  So if I don't hear complaints, I assume that
>everything is cool.

I find the name of Section 2 (LaTeX Internal Character Representation)
rather strange, as there is very little in that section that concerns the
LICR. The main topic of that section seems rather to be the shortcomings of
TeX (as a typesetting engine).

The comparison in Section 3.2.1 of how characters are processed in TeX and
Omega respectively also seems strange. In Omega case (b), column C, we see
that the LICR character \'e is converted to an 8-bit character "82 before
some OTP converts it to the Unicode character "00E9 in column D. Surely
this can't be right: whenever LICR is converted to anything, it should be
to full Unicode, since we will otherwise end up in an encoding morass much
worse than that in current LaTeX.
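
To make the point concrete, here is a minimal Python sketch (not Omega or LaTeX code) contrasting a direct LICR-to-Unicode table with an 8-bit intermediate step. The table `LICR_TO_UNICODE` and the helper names are hypothetical; the 8-bit meanings come from Python's standard codecs. Which 8-bit encoding the summary intends for "82 is not stated here; IBM code page 437 happens to put é at that position, but the same code means something quite different in other encodings.

```python
# Hypothetical excerpt of a direct LICR -> Unicode table (illustration only).
LICR_TO_UNICODE = {
    r"\'e": 0x00E9,  # LATIN SMALL LETTER E WITH ACUTE
    r"\`a": 0x00E0,  # LATIN SMALL LETTER A WITH GRAVE
    r"\ss": 0x00DF,  # LATIN SMALL LETTER SHARP S
}


def licr_to_unicode(licr: str) -> str:
    """Convert one LICR command straight to its Unicode character."""
    return chr(LICR_TO_UNICODE[licr])


def byte_meaning(code: int, encoding: str) -> str:
    """Interpret a single 8-bit code under the given 8-bit encoding."""
    return bytes([code]).decode(encoding)


if __name__ == "__main__":
    print("U+%04X" % ord(licr_to_unicode(r"\'e")))  # U+00E9
    # The meaning of the 8-bit code "82 depends entirely on the assumed encoding:
    # cp437 gives U+00E9, cp1252 gives U+201A, latin-1 gives U+0082.
    for enc in ("cp437", "cp1252", "latin-1"):
        print(enc, "U+%04X" % ord(byte_meaning(0x82, enc)))
```

Only the direct lookup is independent of any choice of 8-bit encoding, which is the argument for converting LICR straight to full Unicode.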

It also seems to me that there is some confusion, in the debate as well as
in the summary, about where the boundary between "input" and "output" is
located. Since LaTeX is a TeX format, it lives between the "eye" and the
"stomach", and thus to LaTeX everything that happens to text from
evaluation onward (when character tokens enter the stomach to be typeset)
is part of the output process. Much of what has been written about Omega
seems instead to draw the line between input and output at a much later
position. Hence some of the things that have been described as Omega
extensions acting on the input are, from LaTeX's point of view, rather yet
another thing that acts on the output.
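
One way to make the boundary explicit is a small model that orders the processing stages and classifies each one as input or output from LaTeX's point of view. This is a simplified sketch under assumptions: the stage names and the `Stage` and `side_for_latex` identifiers are invented for illustration, and real TeX interleaves these stages rather than running them strictly one after another.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Simplified, hypothetical ordering of the stages text passes through."""

    INPUT_TRANSLATION = 1  # Omega: external bytes become Unicode characters
    EYE = 2                # characters are read line by line
    MOUTH = 3              # characters become tokens; macros are expanded
    STOMACH = 4            # tokens are executed; characters are typeset
    HLIST_OTPS = 5         # Omega: OTPs act as characters go onto an hlist
    SHIPOUT = 6            # finished pages are written to the DVI file


def side_for_latex(stage: Stage) -> str:
    """Classify a stage as "input" or "output" from LaTeX's point of view.

    LaTeX, as a format, lives between the eye and the stomach, so everything
    from the stomach onward counts as output to it.
    """
    return "input" if stage < Stage.STOMACH else "output"


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.name:<17} {side_for_latex(stage)}")
```

In this model the horizontal-list OTPs land on the output side, which is exactly where the classification above places them.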

As I understand the Omega draft documentation, there can be no more than
one OTP (the \InputTranslation) acting on the input of LaTeX at any time,
and that OTP is only meant to handle the basic conversion from the external
encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
Unicode. All this happens well before the input gets tokenized, so, by the
way, there is no point in worrying about what the OTP should do with
control sequences.
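
The ordering argument can be illustrated with a minimal Python sketch (not Omega's implementation). The hypothetical `input_translation` turns external bytes into code points, and only afterwards does a deliberately simplified `tokenize` (which ignores category codes) form control sequences. Because translation runs first, it sees a backslash only as the character U+005C and never needs to know about control sequences.

```python
from __future__ import annotations

import re


def input_translation(raw: bytes, external_encoding: str) -> list[int]:
    """Hypothetical model of an input translation: bytes -> code points.

    It runs before tokenization, so it never sees a control sequence;
    a backslash is just the character U+005C at this point.
    """
    return [ord(ch) for ch in raw.decode(external_encoding)]


# Simplified tokenizer (category codes ignored): a backslash plus letters is
# a control word, a backslash plus one other character is a control symbol,
# and anything else is a single character token.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|.", re.DOTALL)


def tokenize(code_points: list[int]) -> list[str]:
    """Turn translated code points into a list of token strings."""
    text = "".join(chr(cp) for cp in code_points)
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    source = "caf\u00e9 \\'e"
    from_latin1 = input_translation(source.encode("latin-1"), "latin-1")
    from_utf8 = input_translation(source.encode("utf-8"), "utf-8")
    print(from_latin1 == from_utf8)
    print(tokenize(from_utf8))
```

With this model, `café \'e` saved as Latin-1 and as UTF-8 yields the same code points after translation, and the tokenizer then produces the character tokens `c`, `a`, `f`, `é`, a space, the control symbol `\'`, and `e`.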

The next time any OTP gets to act on the characters is when they are being
put onto a horizontal list; this is where the OTPs can be stacked and one
OTP can act on the output of another, i.e., in the first stage of _output_
from LaTeX. Yet these are what is described as "Input: set of input
conventions" (maybe because the Omega draft documentation calls them "Input
filters") in the itemize list on page 8!! (Note: I am not questioning
whether this is a correct summary of the debate; if I am questioning
anything, it is rather the idea expressed in the original contribution.)
Certainly there is a need for some OTPs to act on the text at this stage,
but some of the processing should rather be done on the input side of LaTeX
(for which the current Omega seems to provide very little). I note that the
last paragraph of Section 3 mentions the problem that Omega does not
provide any OTP processing of text while it is between the eye and the
stomach.
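
Stacking on the horizontal list can be pictured as function composition. The sketch below is only a model under stated assumptions: OTPs are represented as Python functions on lists of code points (real OTPs are written in Omega's own translation language), and `dashes_otp`, `font_otp`, `stack`, and the two-entry `T1_SLOTS` table are hypothetical, with the slot numbers taken from the T1 (Cork) font encoding.

```python
from functools import reduce
from typing import Callable, List

# Model: an OTP is a function from a list of codes to a list of codes.
Otp = Callable[[List[int]], List[int]]


def dashes_otp(cps: List[int]) -> List[int]:
    """Hypothetical OTP: replace each pair of hyphen-minus by an en dash."""
    out: List[int] = []
    i = 0
    while i < len(cps):
        if cps[i] == 0x2D and i + 1 < len(cps) and cps[i + 1] == 0x2D:
            out.append(0x2013)  # EN DASH
            i += 2
        else:
            out.append(cps[i])
            i += 1
    return out


# Assumed slot table: Unicode code point -> position in a T1-encoded font.
T1_SLOTS = {0x2013: 0x15, 0x00E9: 0xE9}


def font_otp(cps: List[int]) -> List[int]:
    """Hypothetical OTP: map Unicode to font slots; other codes pass through."""
    return [T1_SLOTS.get(cp, cp) for cp in cps]


def stack(*otps: Otp) -> Otp:
    """Apply OTPs in order, each one acting on the output of the previous."""
    return lambda cps: reduce(lambda acc, otp: otp(acc), otps, cps)


if __name__ == "__main__":
    text = [ord(c) for c in "1--2"]
    print([hex(c) for c in stack(dashes_otp, font_otp)(text)])
    print([hex(c) for c in stack(font_otp, dashes_otp)(text)])
```

Running the two OTPs in the other order shows why the stacking order matters: the font mapping then sees two plain hyphens, and the en dash produced afterwards never reaches a font slot.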

Lars Hellström
