From: "Frank Mittelbach"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"
Subject: Re: Multilingual Encodings Summary
Date: Sat, 17 Feb 2001 16:46:40 +0100
Message-ID: <14990.40160.856691.624617@istrati.zdv.uni-mainz.de>
References: <14985.13977.836075.844694@gargle.gargle.HOWL>

Roozbeh,

> On Tue, 13 Feb 2001, Marcel Oliver wrote:
>
> [Regarding UTF8]
> > - Diagnostic messages could (although not with current TeX engine) be
> >   output in the correct script.
>
> Really? Not with current TeX engine? Why?

it is not possible for log entries that report typesetting progress: e.g.,
overfull box indications are presented in the encoding of the font (which
in itself is not wrong, but doesn't necessarily help if you try to find that
text in your source :-)

and it is also not really possible for diagnostic messages produced by the
format or packages, since the script that is used for them is hardwired into
TeX and at most changeable at invocation, i.e., it is some 8-bit thingie and
might in fact represent the upper half of the 8-bit range in ^^ notation
(depending on the implementation)
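
To illustrate the ^^ notation mentioned above, here is a minimal Python sketch (not TeX code) of how a byte string could end up looking in a log file. The function name `tex_log_repr` is hypothetical, and the default printable set (visible ASCII only) is an assumption; as noted, which characters are treated as printable depends on the implementation.

```python
def tex_log_repr(data: bytes, printable=range(32, 127)) -> str:
    """Render bytes roughly as a TeX log would show them.

    Assumption: only visible ASCII (32..126) counts as printable;
    real implementations may be configured differently.
    """
    out = []
    for byte in data:
        if byte in printable:
            out.append(chr(byte))                    # shown as-is
        elif byte < 64:
            out.append("^^" + chr(byte + 64))        # e.g. tab (9) -> ^^I
        elif byte < 128:
            out.append("^^" + chr(byte - 64))        # e.g. DEL (127) -> ^^?
        else:
            out.append("^^" + format(byte, "02x"))   # upper half -> ^^xx (hex)
    return "".join(out)


print(tex_log_repr("Mädchen".encode("latin-1")))  # M^^e4dchen
```

With such output, searching the source file for the word reported in an overfull box message fails as soon as the word contains a character from the upper half.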

as I understand the situation, Omega has the same defects, though they appear
slightly different due to a different internal model (see the longer mail
about LICR and OICR)

> > - The "combining characters" of Unicode are difficult to handle with a
> >   TeX based parser.  (Does "difficult" mean "impossible to get
> >   right"???  What are the issues???)
>
> Every letter should be made active to look forward to find the combining
> character sequence after it, and then puts that over its own head! I don't
> think this is impossible, you need to loop until a non-combining char is
> found.

David explained what that would do to tokenisation of \begin etc. (six tokens
instead of one), but yes, you can provide a surface interface that would work
in this way. Only it would make LaTeX a lot slower without any benefit for
the majority of users (which goes back to my point that such a change is
impossible to make unless there are some really cute features those people
wish to have, to make them overlook the other changes)
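
The lookahead loop from the quoted suggestion (collect characters until a non-combining one is found) can be sketched outside TeX as follows. This is only an illustration in Python, using the standard `unicodedata` module; the function name `group_combining` is hypothetical, and the sketch says nothing about the cost of doing the same thing with active characters inside TeX.

```python
import unicodedata


def group_combining(text: str) -> list[str]:
    """Attach each run of combining characters to the base character before it."""
    clusters: list[str] = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # still inside the combining sequence
        else:
            clusters.append(ch)     # a non-combining char starts a new cluster
    return clusters


# "e" followed by COMBINING ACUTE ACCENT (U+0301), then a plain "a"
print(group_combining("e\u0301a") == ["e\u0301", "a"])  # True
```

In TeX the equivalent loop would have to run for every letter in the input, which is where the slowdown comes from.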

> > - The output encoding is limited to 8 bit fonts, which may not be
> >   enough to get correct kerning for some languages. (Can someone
> >   confirm or correct this???)
>
> We need some examples. I can't find any.

Greek might be one if you require (as LaTeX currently does) that visible ASCII
is part of the font encoding.

frank
