Subject: Re: Multilingual Encodings Summary 2.2
From: Lars Hellström
To: Multiple recipients of list LATEX-L (Mailing list for the LaTeX3 project)
Date: Fri, 11 May 2001 15:41:01 +0100

At 22.16 +0200 2001-05-10, Javier Bezos wrote:
>Lars said:
>
>> As I understand the Omega draft documentation, there can be no more than
>> one OTP (the \InputTranslation) acting on the input of LaTeX at any time
>> and that OTP is only meant to handle the basic conversion from the external
>> encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
>> Unicode. All this happens way before the input gets tokenized, so there is
>> by the way no point in worrying about what the OTP should do with control
>> sequences.
>>
>> The next time any OTP gets to act on the characters is when they are being
>> put onto a horizontal list---this is where the OTPs can be stacked and one
>> OTP can act on the output of another---i.e., in the first stage of _output_
>> from LaTeX. Yet these are what is described as "Input: set of input
>> conventions" (maybe because the Omega draft documentation calls them "Input
>> filters") in the itemize-list on page 8!! (Note: I am not questioning
>> whether this is a correct summary of the debate---if I am questioning
>> anything it is rather the idea expressed in the original contribution.)
>> Certainly there is a need for some OTPs to act on the text at this stage,
>> but some of the processing should rather be done on the input side of LaTeX
>> (for which the current Omega seems to provide very little). I note that the
>
>I don't see the point of doing that.

E.g. normalization of Unicode is something which should happen on the input
side, since LaTeX occasionally needs to determine whether two pieces of
text are equal (cf. the xinitials package).
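The following is a minimal Python sketch of why this must happen on the input side. It assumes Python's `unicodedata` module as a stand-in for whatever Omega would do, and `read_input` is a hypothetical helper. The same visible word can reach the engine as different code-point sequences. Decoding from the external encoding alone does not make them compare equal; decoding followed by canonical (NFC) normalization does.

```python
import unicodedata

def read_input(raw: bytes, external_encoding: str) -> str:
    """Hypothetical input side: external encoding -> Unicode, then NFC normalization."""
    return unicodedata.normalize("NFC", raw.decode(external_encoding))

# The same visible word "café", arriving through two different files.
from_latin1 = b"caf\xe9"                  # é as a single latin-1 byte
from_utf8 = "cafe\u0301".encode("utf-8")  # e + COMBINING ACUTE ACCENT, in UTF-8

# Decoding alone is not enough: the code-point sequences still differ.
print(from_latin1.decode("latin-1") == from_utf8.decode("utf-8"))            # False
# Normalizing on the input side makes the comparison meaningful.
print(read_input(from_latin1, "latin-1") == read_input(from_utf8, "utf-8"))  # True
```

If normalization were instead postponed to an output filter, any equality test made before that point would still see two different strings.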

>Processing information after full
>expansion is essentially LaTeX without inputenc and fontenc, and very little
>code will be broken. Processing the source when it's read could break a lot
>of things. This means that auxiliary files will have different coding
>conventions and therefore different processes should be applied depending
>on the file to be read. I think that is an unnecessary complication.

It seems to me that what you are trying to do is to use a modified LaTeX
kernel which still does 8-bit input and output (in particular: it encodes
every character it puts onto an hlist as an 8-bit quantity) on top of the
Omega 16-bit (or whatever it is right now) typesetting engine. Although this
is more powerful than the current LaTeX in that it can e.g. do
language-specific ligature processing without resorting to
language-specific fonts, it is no better at handling the problems related
to _multilinguality_ because it still cannot handle character sets that
span more than one (8-bit) encoding. How would, for example, the proposed
code deal with the (nonsensical but legal) input
   a\'{e}\k{e}\cyrya\cyrdje\cyrsacrs\cyrphk\textmu?
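The difficulty with that input can be checked mechanically. Map its characters to Unicode and ask whether any single 8-bit encoding contains all of them. This is a sketch under several assumptions. The command-to-character mapping below is a hypothetical reading. \cyrsacrs and \cyrphk are left out rather than guessed. The codec list is only a sample of 8-bit encodings from Python's codec registry, standing in for LaTeX encodings.

```python
# Hypothetical mapping of part of the example input to Unicode.
# \cyrsacrs and \cyrphk are omitted: their Unicode equivalents are not assumed here.
SAMPLE = {
    "a": "a",
    "\\'{e}": "\u00e9",    # LATIN SMALL LETTER E WITH ACUTE
    "\\k{e}": "\u0119",    # LATIN SMALL LETTER E WITH OGONEK
    "\\cyrya": "\u044f",   # CYRILLIC SMALL LETTER YA
    "\\cyrdje": "\u0452",  # CYRILLIC SMALL LETTER DJE
    "\\textmu": "\u00b5",  # MICRO SIGN
}

# A sample of common 8-bit encodings (Python codec names).
EIGHT_BIT = ["latin_1", "iso8859_2", "iso8859_5", "cp1250", "cp1251", "cp1252", "koi8_r"]

def encodable(text: str, codec: str) -> bool:
    """True if every character of text has a code in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def codecs_covering(text: str) -> list:
    """The sampled 8-bit codecs that can represent all of text."""
    return [c for c in EIGHT_BIT if encodable(text, c)]

text = "".join(SAMPLE.values())
print(codecs_covering(text))      # []
print(encodable(text, "utf_16"))  # True
for cmd, ch in SAMPLE.items():
    print(cmd, codecs_covering(ch))  # each character alone fits some 8-bit codec
```

Each character fits some 8-bit encoding on its own, but the mixture fits none of them. Only a Unicode form represents the whole string.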

>One of the problems here is whether the code to be moved around should be
>processed first and then moved (like floats) or moved first and then
>processed (like marks). imo, the answer is definitely the second --
>have you tried placing a caption of a figure in the outer margin
>of the page? (impossible without modifying the output routine because
>figures and captions are first boxed and then moved). As I said,
>preserving the original code when moving it around is essential
>to avoid a mess, and in fact that is the very reason things are
>\protect'ed. This way, decisions could be taken depending on the
>final placement of the material (for example, should a Japanese
>caption be typeset vertically or horizontally?).

There are many different kinds of processing. Those that have to do with
interpreting the input have to be carried out before the material is moved,
as moving material may change its interpretation. With text being processed
as in your example it is far from certain that the caption can even be
recognized as Japanese when it is about to be typeset, since everything
seems to be reencoded in some 8-bit input encoding before it is typeset anyway!
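The claim that moving uninterpreted material can change its interpretation can be shown with a toy model. Everything in it is hypothetical. The `Typesetter` class is invented, byte strings stand in for raw input, and Python codecs stand in for LaTeX input encodings. One caption is interpreted when it is captured. An identical copy is kept raw and interpreted only at its final placement, after the input encoding has changed.

```python
from dataclasses import dataclass, field

@dataclass
class Typesetter:
    encoding: str = "latin_1"                     # input encoding currently in force
    deferred: list = field(default_factory=list)  # material waiting to be placed

    def defer_raw(self, raw: bytes) -> None:
        # "Move first, process later": keep the uninterpreted input.
        self.deferred.append(raw)

    def defer_interpreted(self, raw: bytes) -> None:
        # "Interpret first, then move": decode with the encoding in force now.
        self.deferred.append(raw.decode(self.encoding))

    def flush(self) -> list:
        # At the final placement, raw items are decoded with whatever
        # encoding is in force *there*; interpreted items pass through.
        out = [x if isinstance(x, str) else x.decode(self.encoding) for x in self.deferred]
        self.deferred.clear()
        return out

ts = Typesetter()
ts.defer_interpreted(b"caf\xe9")  # interpreted under latin-1
ts.defer_raw(b"caf\xe9")          # same bytes, kept raw
ts.encoding = "koi8_r"            # a Russian passage switches the input encoding
early, late = ts.flush()
print(early)          # café
print(early == late)  # False
```

Only the copy interpreted at capture time keeps its original meaning. The raw copy is read under the encoding that happens to be current where it lands.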

Lars Hellström
