MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0992C.942D9580"
In-Reply-To:  <14990.30852.571842.571065@istrati.zdv.uni-mainz.de>
References: <200102122049.f1CKnvi13875@smtp.wanadoo.es>            <14990.30852.571842.571065@istrati.zdv.uni-mainz.de>
Content-class: urn:content-classes:message
Subject:      Embarrassingly wrong
Date: Sat, 17 Feb 2001 22:54:05 +0100
Message-ID:  <14990.62205.711349.925864@istrati.zdv.uni-mainz.de>
From: "Frank Mittelbach" <frank.mittelbach@LATEX-PROJECT.ORG>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0992C.942D9580
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

Frank Mittelbach writes:
 >
 > A few people will unfortunately get this posting twice since it is both sent
 > to LATEX-L as well as to the Omega developers (several of whom are on
 > LaTeX-L), sorry for that.
 >
 > We thought this advisable as we make a number of suggestions regarding
 > extensions/changes to Omega's character token processing. (Any technical
 > discussion of these suggestions should probably be confined to the Omega
 > developers list though.)

Fortunately it is the weekend and nobody has told us yet, so we can at
least claim we found out ourselves shortly after sending the message out:
Omega already has input modes and input translations which do support what
we were asking for, i.e., the translation from the source document to the
internal Unicode form.

Thus OICR1 = OICR2, and all our rambling about it was wrong.

What seems to remain is:

 a) Problems with controlling these input translations. According to the
 Omega documentation, a change of input translation takes effect from the
 next line of the file. However, consider an example like the following:

    \ocp\OCPa=inutf8

    \def\foo{abcäd} % default seems to be latin1
    \show\foo


    % the following fails (not surprisingly)
    % and can't be corrected later on

    \def\foo{ab
    \InputTranslation currentfile\OCPa
    cÃ¤} % the UTF-8 bytes C3 A4 of "ä", still read as latin1
    \show\foo


 The second \foo will now contain the tokens

   \foo=macro:
    ->ab \InputTranslation currentfile\OCPa c^^c3^^a4.

 Thus, if you ever use this \foo later on, you will get the wrong characters:
 the input was a-umlaut in UTF-8, but what is stored in \foo are the _two_
 characters uppercase-A-with-tilde and currency-sign.
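 To make the byte arithmetic concrete, here is a small sketch in plain Python (not Omega, and not how Omega stores tokens). It only shows that the two UTF-8 bytes of a-umlaut, C3 A4, decode to one character under UTF-8 but to the two characters U+00C3 and U+00A4 under a one-byte Latin-1 reading. The helper caret_form is a hypothetical name; it merely mimics the ^^xx display in the \show output above, assuming character codes below 256.

```python
# Byte-level sketch (plain Python, not Omega): the a-umlaut from the example
# is stored in the file as two UTF-8 bytes, but the line is still read with
# the default one-byte (Latin-1) interpretation.

source_bytes = "\u00e4".encode("utf-8")     # a-umlaut as it sits in the file
intended = source_bytes.decode("utf-8")     # one character, U+00E4
received = source_bytes.decode("latin-1")   # two characters, U+00C3 U+00A4


def caret_form(text: str) -> str:
    # Write every character with code >= 128 as ^^xx, the way the \show
    # output displays them (assumes all codes are below 256).
    return "".join(c if ord(c) < 128 else f"^^{ord(c):02x}" for c in text)


print(source_bytes.hex(" "))                   # c3 a4
print([f"U+{ord(c):04X}" for c in intended])   # ['U+00E4']
print([f"U+{ord(c):04X}" for c in received])   # ['U+00C3', 'U+00A4']
print(caret_form("c" + received))              # c^^c3^^a4
```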

 Furthermore, if this \foo is used anywhere, it will switch the input
 translation to UTF-8 from the next line on, and that line could be in a
 completely different file.
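 The same behaviour can be modelled in a few lines. This is a toy sketch under our own assumptions, not Omega's actual mechanism: SourceFile, read_definition, expand and the file names a.tex and b.tex are hypothetical, a directive switches the encoding only when it is executed, and the switch applies to whichever file is current at that moment. Reading the definition stores the directive unexecuted, so the UTF-8 bytes are still read as Latin-1; using \foo later while reading a second file switches that file instead, and its next Latin-1 line no longer decodes.

```python
# Toy model, NOT Omega's implementation: each file has its own input
# encoding, a directive changes it only when executed, and the change
# applies to whichever file is being read at that moment.

DIRECTIVE = r"\InputTranslation currentfile\OCPa"


class SourceFile:
    def __init__(self, name: str, lines: list[bytes]):
        self.name = name
        self.lines = lines
        self.encoding = "latin-1"  # assumed default one-byte reading
        self.pos = 0

    def read_line(self) -> str:
        raw = self.lines[self.pos]
        self.pos += 1
        # errors="replace" makes a wrong switch visible as U+FFFD
        return raw.decode(self.encoding, errors="replace")


def read_definition(f: SourceFile) -> tuple[str, str]:
    # Like \def: collect lines up to the closing brace WITHOUT executing
    # them, so a directive in the body does not switch this file.
    head, _, rest = f.read_line().partition("{")
    parts = [rest]
    while not parts[-1].endswith("}"):
        parts.append(f.read_line())
    parts[-1] = parts[-1][:-1]  # drop the closing brace
    return head, " ".join(parts)


def expand(body: str, current: SourceFile) -> None:
    # Using the macro executes the stored directive: it switches the
    # file that is current *now*, from its next line on.
    if DIRECTIVE in body:
        current.encoding = "utf-8"


def show(s: str) -> str:
    # Escape non-ASCII characters for printing, leave backslashes alone.
    return s.encode("ascii", "backslashreplace").decode("ascii")


# a.tex: the example above, with a-umlaut saved as UTF-8 bytes.
file_a = SourceFile("a.tex", [rb"\def\foo{ab", DIRECTIVE.encode("ascii"), b"c\xc3\xa4}"])
head, body = read_definition(file_a)
print(head, "->", show(body))
print(file_a.name, file_a.encoding)  # still latin-1

# b.tex (Latin-1) uses \foo; its next line is then decoded as UTF-8.
file_b = SourceFile("b.tex", [rb"\foo", b"\xe4"])
if file_b.read_line() == r"\foo":
    expand(body, file_b)
next_line = file_b.read_line()
print(file_b.name, file_b.encoding, show(next_line))
```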

 This might look like a contrived example, but at a higher level of macro
 encoding this type of problem will happen whenever \InputTranslation is
 used, either directly or within some macro definition (such as a language
 tag), and that use is placed, for example, inside the argument of some
 other tag.

 Since we have been asked to provide input encoding changes for LaTeX within
 paragraphs, e.g., for individual words, exactly this would happen if such a
 change appeared, say, inside the argument of \section.


 b) The other problem that seems to remain is:

 > Another problem of the current model seems to be that, even if trans A did
 > the encoding transformation to Unicode, i.e., we have only a single OICR,
 > transformations of type D (i.e., transformations of character token
 > strings) can't be controlled by a mechanism similar to the one that is
 > available for transformations of type C. In one area we have OCPs, but in
 > the other, when we work on structural issues like building a TOC or
 > arranging data for page representation, no such mechanism is available.
 > Thus it seems interesting to think about whether or not a similar concept
 > (not necessarily the same!) should be made available for this part of the
 > process.
 >
 > In other words, the concept of OCPs makes perfect sense for character
 > string manipulation, but in current Omega one has to [pretend] to typeset
 > something to have them available, whereas a large amount of document
 > processing is concerned with character string manipulation not related to
 > typesetting at all.

What is no longer a problem, though, is the example we gave for the above,
since for that particular case (writing to output streams) Omega provides
output translations.
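A minimal sketch of why this case is covered, again in plain Python, with hypothetical functions write_translated and read_translated standing in for an output translation and the matching input translation (not Omega's API): as long as the translation used when writing, say, a TOC entry and the one used when reading the file back agree, the internal text survives the round trip. Reading it back with a different one-byte translation reproduces exactly the problem from a).

```python
import io

# Hypothetical stand-ins for an output translation (internal text -> bytes)
# and the matching input translation (bytes -> internal text).


def write_translated(text: str, stream: io.BytesIO, encoding: str = "utf-8") -> None:
    stream.write(text.encode(encoding))


def read_translated(stream: io.BytesIO, encoding: str = "utf-8") -> str:
    return stream.getvalue().decode(encoding)


heading = "c\u00e4"  # hypothetical TOC entry in internal (Unicode) form

toc = io.BytesIO()
write_translated(heading, toc)                # bytes written: 63 c3 a4
same = read_translated(toc)                   # matching translation
mismatched = read_translated(toc, "latin-1")  # one-byte reading, as in a)

print(same == heading)     # True
print(ascii(mismatched))   # 'c\xc3\xa4'
```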

We hope this correction gets us a little closer to the truth :-)

frank & chris

------_=_NextPart_001_01C0992C.942D9580--