MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0DA28.6B4B0700"
In-Reply-To:  <200105101920.f4AJKk729706@smtp.wanadoo.es>
Content-class: urn:content-classes:message
Subject:      Re: Multilingual Encodings Summary 2.2
Date: Fri, 11 May 2001 15:41:01 +0100
Message-ID:  <l03130300b72186a62db6@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0DA28.6B4B0700
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 22.16 +0200 2001-05-10, Javier Bezos wrote:
>Lars said:
>
>> As I understand the Omega draft documentation, there can be no more than
>> one OTP (the \InputTranslation) acting on the input of LaTeX at any time,
>> and that OTP is only meant to handle the basic conversion from the external
>> encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
>> Unicode. All this happens way before the input gets tokenized, so there is,
>> by the way, no point in worrying about what the OTP should do with control
>> sequences.
>>
>> The next time any OTP gets to act on the characters is when they are being
>> put onto a horizontal list---this is where the OTPs can be stacked and one
>> OTP can act on the output of another---i.e., in the first stage of _output_
>> from LaTeX. Yet these are what is described as "Input: set of input
>> conventions" (maybe because the Omega draft documentation calls them "Input
>> filters") in the itemize-list on page 8!! (Note: I am not questioning
>> whether this is a correct summary of the debate---if I am questioning
>> anything it is rather the idea expressed in the original contribution.)
>> Certainly there is a need for some OTPs to act on the text at this stage,
>> but some of the processing should rather be done on the input side of LaTeX
>> (for which the current Omega seems to provide very little). I note that the
>
>I don't see the point of doing that.

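For example, Unicode normalization is something which should happen on the
input side, since LaTeX occasionally needs to determine whether two pieces of
text are equal (cf. the xinitials package), and the same text may arrive in
different forms: é can be either the single character U+00E9 or an e
followed by the combining acute accent U+0301.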
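A minimal Python sketch of why this belongs on the input side, under stated assumptions: the function name `input_side` is hypothetical and only models "decode, then normalize"; it is not how Omega's \InputTranslation works. Two byte sequences for "é" give unequal strings after decoding, and compare equal only once both are normalized (here to NFC with Python's standard `unicodedata` module), which is the kind of equality test a package like xinitials depends on.

```python
import unicodedata


def input_side(raw: bytes, encoding: str) -> str:
    """Hypothetical input-side step: decode the external encoding to
    Unicode, then normalize to NFC so that equal text compares equal."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


# "é" as a single Latin-1 byte (U+00E9 after decoding).
latin1_bytes = b"\xe9"
# "é" as "e" + COMBINING ACUTE ACCENT (U+0301), encoded in UTF-8.
decomposed_utf8 = b"e\xcc\x81"

# Decoding alone is not enough: the code point sequences differ.
plain_a = latin1_bytes.decode("latin_1")
plain_b = decomposed_utf8.decode("utf-8")
print(plain_a == plain_b)  # False

# After normalization both are the single code point U+00E9.
norm_a = input_side(latin1_bytes, "latin_1")
norm_b = input_side(decomposed_utf8, "utf-8")
print(norm_a == norm_b, norm_a == "\u00e9")  # True True
```

NFC is just one choice here; NFD would serve equally well, as long as every piece of text being compared goes through the same form.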

>Processing information after full
>expansion is essentially LaTeX without inputenc and fontenc, and very little
>code will be broken. Processing the source when it's read could break a lot
>of things. This means that auxiliary files will have different coding
>conventions and therefore different processes should be applied depending
>on the file to be read. I think that is an unnecessary complication.

It seems to me that what you are trying to do is to use a modified LaTeX
kernel which still does 8-bit input and output (in particular: it encodes
every character it puts onto an hlist as an 8-bit quantity) on top of the
Omega 16-bit (or whatever it is right now) typesetting engine. Although this
is more powerful than the current LaTeX in that it can, e.g., do
language-specific ligature processing without resorting to
language-specific fonts, it is no better at handling the problems related
to _multilinguality_, because it still cannot handle character sets that
span more than one (8-bit) encoding. How would, for example, the proposed
code deal with the following (nonsensical but legal) input?
   a\'{e}\k{e}\cyrya\cyrdje\cyrsacrs\cyrphk\textmu
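To make the question concrete, here is a small Python sketch (not the proposed LaTeX code, and independent of Omega) that checks which of a few common 8-bit encodings can represent the characters the sample input should produce. The encoding list is an arbitrary selection, and \cyrsacrs and \cyrphk are left out because their Unicode equivalents are not spelled out in this message.

```python
# Characters the sample input is meant to produce; \cyrsacrs and \cyrphk
# are deliberately left out (no Unicode mapping is assumed for them).
SAMPLE = "a\u00e9\u0119\u044f\u0452\u00b5"  # a, é, ę, я, ђ, µ

# An arbitrary, non-exhaustive selection of common 8-bit encodings.
CANDIDATES = ["latin_1", "iso8859_2", "iso8859_5",
              "cp1250", "cp1251", "cp1252", "koi8_r"]


def encodings_covering(text: str, candidates: list[str]) -> list[str]:
    """Return those candidate encodings that can represent all of text."""
    covering = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        covering.append(name)
    return covering


# Every single character fits in at least one of the 8-bit encodings...
for ch in SAMPLE:
    print(ch, encodings_covering(ch, CANDIDATES))

# ...but none of them holds the whole string.
print(encodings_covering(SAMPLE, CANDIDATES))  # []
```

Each character on its own is covered by some 8-bit encoding, but no single 8-bit encoding holds the whole string. That is exactly the case a kernel that encodes every character as an 8-bit quantity has to handle somehow.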

>One of the problems here is whether the code to be moved around should be
>processed first and then moved (like floats) or moved first and then
>processed (like marks). IMO, the answer is definitely the second --
>have you tried placing a caption of a figure in the outer margin
>of the page? (It is impossible without modifying the output routine, because
>figures and captions are first boxed and then moved.) As I said,
>preserving the original code when moving it around is essential
>to avoid a mess, and in fact that is the very reason things are
>\protect'ed. This way, decisions could be taken depending on the
>final placement of the material (for example, should a Japanese
>caption be typeset vertically or horizontally?).

There are many different kinds of processing. Those that have to do with
interpreting the input have to be carried out before the material is moved,
as moving material may change its interpretation. With text processed as in
your example, it is far from certain that the caption can even be recognized
as Japanese when it is about to be typeset, since everything seems to be
re-encoded in some 8-bit input encoding before it is typeset anyway!

Lars Hellström

------_=_NextPart_001_01C0DA28.6B4B0700--