MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C098FB.5FEDB100"
In-Reply-To:  <Pine.LNX.4.10.10102131831200.2744-100000@Sina.sharif.ac.ir>
References: <14985.13977.836075.844694@gargle.gargle.HOWL>            <Pine.LNX.4.10.10102131831200.2744-100000@Sina.sharif.ac.ir>
Subject:      Re: Multilingual Encodings Summary
Date: Sat, 17 Feb 2001 16:46:40 +0100
Message-ID:  <14990.40160.856691.624617@istrati.zdv.uni-mainz.de>
From: "Frank Mittelbach" <frank.mittelbach@LATEX-PROJECT.ORG>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>

------_=_NextPart_001_01C098FB.5FEDB100
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Roozbeh,

 > On Tue, 13 Feb 2001, Marcel Oliver wrote:
 >
 > [Regarding UTF8]
 > > - Diagnostic messages could (although not with current TeX engine)
 > >   be output in the correct script.
 >
 > Really? Not with current TeX engine? Why?

It is not possible for log entries about the typesetting progress: for
example, overfull box warnings show the offending text in the encoding
of the font (which in itself is not wrong, but doesn't necessarily help
if you try to find that text in your source :-)

Nor is it really possible for diagnostic messages produced by the
format or by packages, since the character set used for them is
hardwired into TeX and can at most be changed at invocation. That is,
it is some 8-bit table, and depending on the implementation TeX may
even print the upper half of the 8-bit range in ^^ notation.
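To make the ^^ point concrete, here is a minimal Python sketch of TeX82's default rule for printing a character code to the terminal or log, as given in Knuth's tex.web. The helper names `tex_print_char` and `tex_print_bytes` are hypothetical, and real implementations may change which codes count as printable (that is the "depending on the implementation" above). Feeding it the UTF-8 bytes of a message shows how a diagnostic written in UTF-8 comes out under that default.

```python
def tex_print_char(k: int) -> str:
    """Hypothetical helper: show 8-bit code k the way TeX82's default rule does."""
    if not 0 <= k <= 255:
        raise ValueError("TeX82 characters are 8-bit codes 0..255")
    if 0x20 <= k <= 0x7E:
        return chr(k)                    # visible ASCII (and space) printed as-is
    if k < 0x40:
        return "^^" + chr(k + 0x40)      # control codes 0..31, e.g. 10 -> ^^J
    if k < 0x80:
        return "^^" + chr(k - 0x40)      # only DEL (127) gets here: ^^?
    return "^^" + format(k, "02x")       # upper half: two lowercase hex digits


def tex_print_bytes(data: bytes) -> str:
    """Hypothetical helper: render a byte string through tex_print_char."""
    return "".join(tex_print_char(b) for b in data)


# A UTF-8 message as it would end up in the log under the default rule.
print(tex_print_bytes("café".encode("utf-8")))  # caf^^c3^^a9
```

The accented letter is split into its two UTF-8 bytes, each shown in ^^ notation, so the message is no longer readable in its own script.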

As I understand the situation, Omega has the same defects, though they
show up slightly differently because of its different internal model
(see the longer mail about LICR and OICR).

 > > - The "combining characters" of Unicode are difficult to handle
 > >   with a TeX based parser.  (Does "difficult" mean "impossible to
 > >   get right"???  What are the issues???)
 >
 > Every letter should be made active to look forward to find the
 > combining character sequence after it, and then puts that over its
 > own head! I don't think this is impossible, you need to loop until a
 > non-combining char is found.

David explained what that would do to the tokenisation of \begin etc.
(six tokens instead of one), but yes, you could provide a surface
interface that works this way. Only it would make LaTeX a lot slower
without any benefit for the majority of users. This goes back to my
point: such a change is impossible to make unless it comes with really
attractive features that those users want to have, enough to make them
overlook the other changes.
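The look-ahead loop Roozbeh describes is simple to express outside TeX. Below is a minimal Python sketch (the name `combining_clusters` is hypothetical): each base character looks ahead and absorbs following marks until a non-combining character is found. It treats a character as combining when its Unicode canonical combining class is non-zero, which is a simplification, since a few combining marks have class 0. Doing the same inside TeX would mean running such a loop from an active character for every letter of the input, with the tokenisation and speed costs described above.

```python
import unicodedata


def combining_clusters(text: str) -> list[str]:
    """Split text into base characters, each followed by its combining marks."""
    clusters = []
    i = 0
    while i < len(text):
        j = i + 1
        # Look ahead from the base character and absorb marks until a
        # non-combining character (canonical combining class 0) is found.
        while j < len(text) and unicodedata.combining(text[j]):
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters


# "e" + COMBINING ACUTE ACCENT, then "a" + COMBINING DOT BELOW
# + COMBINING CIRCUMFLEX ACCENT, then a plain "x".
print(combining_clusters("e\u0301a\u0323\u0302x"))
```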

 > > - The output encoding is limited to 8 bit fonts, which may not be
 > >   enough to get correct kerning for some languages. (Can someone
 > >   confirm or correct this???)
 >
 > We need some examples. I can't find any.

Greek might be one, if you require (as LaTeX currently does) that
visible ASCII be part of the font encoding.
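A rough count suggests why Greek is a plausible example. This is a back-of-envelope Python sketch under stated assumptions, not a description of any actual LaTeX Greek encoding: the 94 visible ASCII characters (0x21 to 0x7E) each keep their own slot, and every precomposed polytonic letter in Unicode's Greek Extended block (U+1F00 to U+1FFF) would need a glyph slot of its own. Since TeX only applies a font's kerns between characters of that same font, glyphs that do not fit and end up in a second font cannot be kerned against their neighbours.

```python
import unicodedata

# Assumption: visible ASCII = the 94 graphic characters 0x21..0x7E,
# each keeping its usual slot in the 8-bit font.
VISIBLE_ASCII = range(0x21, 0x7F)
free_slots = 256 - len(VISIBLE_ASCII)  # 162 slots left for everything else

# Assumption: every precomposed letter in the Greek Extended block
# (U+1F00..U+1FFF) needs a slot; the basic Greek alphabet is not even counted.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt"}
polytonic = [cp for cp in range(0x1F00, 0x2000)
             if unicodedata.category(chr(cp)) in LETTER_CATEGORIES]

print(f"free slots: {free_slots}, precomposed Greek letters: {len(polytonic)}")
print("fits in one 8-bit font:", len(polytonic) <= free_slots)
```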

frank

------_=_NextPart_001_01C098FB.5FEDB100--