From: Lars Hellström <lars@abel.math.umu.se>
Sender: Mailing list for the LaTeX3 project <LATEX-L@RELAY.URZ.UNI-HEIDELBERG.DE>
To: Multiple recipients of list LATEX-L
Subject: Re: Multilingual Encodings Summary 2.2
Date: Sat, 12 May 2001 16:40:32 +0100
In-Reply-To: <200105112029.f4BKT3707962@smtp.wanadoo.es>

At 23.24 +0200 01-05-11, Javier Bezos wrote:
>Frank wrote:
>> LaTeX conceptually has only three levels: source, ICR, Output
>
>However, I still think that it's necessary to separate code processing
>from text processing.  Both concepts are mixed up by TeX (and
>therefore LaTeX), making, say, uppercasing a tricky thing.  Remember
>that \uppercase only changes chars in the code, and that
>\MakeUppercase first expands the argument and then applies a set of
>transformations (including math but not including text hidden in
>protected macros!).  Well, since ocp's are applied to text after
>expansion (not including math but including actual text even if
>hidden) we are doing things at the right place and in the right way.

The problem with current Omega is that it only provides text processing via
OCPs, but no code processing. Uppercasing as a stylistic variation is
clearly text processing and appears to be handled well (and in the right
place) by the current Omega. With TeX it is best handled using special
fonts, but current LaTeX has no interface for that, and it would require a
lot of fonts. If uppercasing is done for some other reason, then Omega is no
better than TeX.
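
The difference can be seen already in standard LaTeX, without Omega. A
minimal sketch (the macro name \word is made up for illustration):

```latex
\documentclass{article}
\newcommand{\word}{caf\'e}
\begin{document}
% \uppercase changes only explicit character tokens in its argument;
% the unexpanded macro \word passes through untouched, so this
% still prints "café":
\uppercase{\word}

% \MakeUppercase first expands the argument and then applies the
% case-change tables (which leave \' alone but uppercase the e),
% so this prints "CAFÉ":
\MakeUppercase{\word}
\end{document}
```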

>Another problem is whether input encoding belongs to code transformations
>or text transformations.  Very likely you are right when you say that
>after full expansion it's too late and when reading the source file is
>too early.  An intermediate step seems more sensible, thus making wrong
>the \'e stuff discussed in the recent messages. Another useful addition
>could be an ocp-aware variant of \edef (or a similar device).

Indeed such a device is needed. Ideally it should work in the mouth (so
that it could be used without messing up the kerning).
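
To make the request concrete: \edef expands at definition time, long
before any OCPs run, so an OCP-aware variant would be a genuinely new
device. A sketch of the intended behaviour (\ocpedef is a hypothetical
name, not an existing primitive):

```latex
\def\accented{\'e}
% Plain \edef: \snapshot receives whatever \'e expands to,
% with no OCP translation applied:
\edef\snapshot{\accented}
% Hypothetical OCP-aware variant: the expanded tokens would also be
% pushed through the currently active translation processes, so
% \snapshot would end up holding the translated (e.g. Unicode) text:
%\ocpedef\snapshot{\accented}
```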

>And
>regarding font transformations, they should be handled by fonts, but
>the main problem is that metric information (i.e., tfm) cannot be
>modified from within TeX, except a few parameters; I really wonder
>if allowing more changes, mainly ligatures, is feasible (that
>solution would be better than font ocp's and vf's, I think).

I don't understand this. What kind of font transformations are you
referring to?

>>  my requirement for a usable internal representation is that I can take a
>>  single element of it at any time and it has a well-defined meaning (and a
>>  single one).
>
>Semantically or visually?

I suspect Frank considers meaning to be a semantic concept, not a visual one.

>>> at the LICR level means that the auxiliary files use the Unicode encoding;
>>> if the editor is not a Unicode one these files become unmanageable and
>>> messy.
>>
>> not true. the OICR has to be unicode (or more exactly unique and
>>well-defined
>> in the above sense, can be 20bits for all i care) if Omega ever should
>>go off
>> the ground. but the interface to the external world could apply a
>>well-defined
>> output translation to something else before writing.
>
>:-/ I meant from the user's point of view.  (Perhaps the reply was
>too quick...)  What I mean is that any LaTeX file ("main" or
>auxiliary) should follow the LaTeX syntax in a form closer to the
>"representation" selected by the user (by "representation" I mean
>input encoding and maybe a set of macros).

The problem is that in multilingual documents there may not be a single
such representation---the user can change input encoding just about
anywhere in a document. This is why current LaTeX converts everything to
LICR before it is written to the .aux file: the elements of the input
encoding (as Frank called them above) do not have a single well-defined
meaning. What has been discussed is that one might use some form of
Unicode (most likely UTF-8) in these files instead.
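
A concrete sketch of this conversion in current (8-bit) LaTeX, using
only standard packages:

```latex
\documentclass{article}
\usepackage[latin1]{inputenc}% byte "E9 in the source becomes the LICR \'e
\usepackage[T1]{fontenc}
\begin{document}
% A literal é in the source (byte "E9 under latin1) is turned into the
% encoding-independent internal form \'{e}; that LICR form, not the raw
% byte, is what ends up in the .toc and .aux files:
\section{Café}
\end{document}
```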

>======
>Lars wrote:
>
>> No it wouldn't. If \protect is not \@typeset@protect when \'e is expanded
>> then it will be written to a file as \'e.
>
>Right.  Exactly because of that we should not convert text to Unicode
>at this stage; otherwise we must change the definition depending on
>the file to be read.

We do already change e.g. the \catcode of @ for when .aux files are read.
Changing the input encoding is much more work, but not principally different.
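
This is the familiar kernel mechanism; roughly:

```latex
% What \makeatletter / \makeatother do while .aux, .sty and .cls
% files are being read:
\catcode`\@=11 % @ becomes a letter, so \internal@names are single tokens
% ... material using kernel-internal commands is read here ...
\catcode`\@=12 % @ reverts to an "other" character, as in document text
```

An input-encoding switch around file reading would be analogous in
spirit, though it affects far more characters than just @.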

>We must only move LaTeX code and its context
>information without changing it, so that if it is read correctly in the
>main file, it will be read correctly in the auxiliary file.

I believe one of the main problems for multilinguality in LaTeX today is
that there is no way of recording (or maybe even of determining) the
current context so that this information can be moved around with every
piece of code affected by it. Hence most current commands strive instead to
convert the code to a context-free representation (the LICR) by use of
protected expansion.
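
The protected expansion in question is what the kernel's
\protected@write performs when, say, \label writes to the .aux file.
A minimal sketch (\recordtitle and \storedtitle are made-up names for
illustration):

```latex
\makeatletter
% \protected@write expands its third argument with \protect set so
% that robust commands -- including LICR forms such as \'e -- survive
% as themselves instead of decaying into encoding-specific tokens:
\newcommand{\recordtitle}[1]{%
  \protected@write\@auxout{}{\string\storedtitle{#1}}}
\makeatother
```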

>> But such characters (the Spanish as well as the Hebrew) aren't allowed in
>> names in LaTeX!
>
>But they should be allowed in the future if we want a true
>multilingual environment.

Why? They are not part of any text, but part of the markup!

>> It seems to me that what you are trying to do is to use a modified LaTeX
>> kernel which still does 8-bit input and output (in particular: it encodes
>> every character it puts onto an hlist as an 8-bit quantity) on top of the
>> Omega 16-bit (or whatever it is right now) typesetting engine. Whereas this
>> is more powerful than the current LaTeX in that it can e.g. do
>> language-specific ligature processing without resorting to
>> language-specific fonts, it is no better at handling the problems related
>> to _multilinguality_ because it still cannot handle character sets that
>> span more than one (8-bit) encoding. How would for example the proposed
>> code deal with the (nonsensical but legal) input
>>    a\'{e}\k{e}\cyrya\cyrdje\cyrsacrs\cyrphk\textmu?
>
>I don't understand why you say that.

Because of the example in the summary:
====================================================
          A       B        C        D          E
----------------------------------------------------
TeX   a)  "82     \'e      *    - - - - - >   "E9
      b)  \'e     \'e      *    - - - - - >   "E9
      c)  "82     "82      *    - - - - - >   "82
====================================================
Omega a)  "82     "82     "82     "00E9       "E9
      b)  \'e     \'e     "82     "00E9       "E9

The last line shows \'e being converted to an 8-bit quantity "82
(apparently the input encoding equivalent) before it is converted to
Unicode. LaTeX lives between columns A and C, so there is no hint of any
non-8-bit processing being done.

>In fact I don't understand what you
>say :-) -- it looks very complicated to me. Anyway, it can handle two-byte
>encodings and utf8, and language style files are written using utf8
>(which are directly converted to Unicode without any intermediate
>step).

That's what I would have expected, but the example gives no hint of this
either.

>Regarding the last line, you can escape the current encoding with
>the \unichar macro (which is somewhat tricky, to avoid killing
>ligatures/kerning). As I say in the readme file, applying that trick
>to utf8 didn't work.

Isn't the \char primitive in Omega able to produce arbitrary characters
(at least arbitrary characters in the basic multilingual plane)?
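
For comparison, in 8-bit TeX \char already produces any character of
the current font; in Omega one would expect the same primitive with a
wider range (assuming a Unicode-encoded font is selected):

```latex
\char"E9    % é in an 8-bit (e.g. T1-encoded) font: codes 0-255
\char"00E9  % in Omega: 16-bit codes, so BMP code points such as
            % U+00E9 should be addressable directly
```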

>Actually, this preliminary lambda doesn't convert \'e to é, but to
>e U+0301 (i.e., the corresponding combining char). In the internal
>Unicode step, accents are normalized in this way and then recombined
>by the font ocp. The definition of \' in the la.sd file is very simple:
>
>\DeclareScriptCommand\'[1]{#1\unichar{"0301}}
>
>Very likely, this is one of the parts deserving = improvements.

It looks quite reasonable to me, and it is certainly much better than the
processing depicted in the example. Does this mean that the example should
rather be

    A     B        C          D        E
   \'e   \'e   e^^^^0301   ^^^^00e9   ^^e9

(using the ^^ notation for non-ASCII characters)?

Lars Hellström
