Subject: Re: Multilingual Encodings Summary 2.2
From: "jbezos"
Date: Thu, 10 May 2001 18:00:26 +0100
To: "Multiple recipients of list LATEX-L" (Mailing list for the LaTeX3 project)

Quick answers to a couple of points. Lars says:

>The comparison in Section 3.2.1 of how characters are processed in TeX and
>Omega respectively also seems strange. In Omega case (b), column C, we see
>that the LICR character \'e is converted to an 8-bit character "82 before
>some OTP converts it to the Unicode character "00E9 in column D. Surely
>this can't be right---whenever LICR is converted to anything it should be
>to full Unicode, since we will otherwise end up in an encoding morass much
>worse than that in current LaTeX.

Surely it's right :-). Remember that é is not an active character in
Lambda and that OCPs are applied after expansion. Let's consider
the input é\'eé. It's expanded to the character sequence "82 "82 "82,
which is fine. If we define \'e as "00E9, the expansion is "82 "00 "E9
"82, which is definitely wrong, since the stream now mixes two encodings.
Further, converting the input to Unicode at the LICR level means that
the auxiliary files use the Unicode encoding; if the editor is not a
Unicode one, these files become unmanageable and messy. IMO, LICR should
preserve the current LaTeX conventions, and é\'eé should be written to
these files in exactly that way. In other words, any file to be read by
LaTeX should follow the "external" LaTeX conventions and be transcoded
only in the mouth.
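Both problems can be sketched in Python by modeling the character stream as bytes, the way the message writes it. This is a toy illustration, not how Omega actually processes input; the choice of IBM code page 850 as the 8-bit encoding (where "82 is é), UTF-8 for a Unicode auxiliary file, and Latin-1 for the user's own files are assumptions made only for the example.

```python
# Toy illustration, not Omega internals.
# Assumption: the 8-bit encoding is IBM code page 850, where "82 is é.
EIGHT_BIT = "cp850"

# é\'eé with \'e expanding to the 8-bit character "82.
consistent = bytes([0x82, 0x82, 0x82])

# é\'eé with \'e expanding to "00E9, written as "00 "E9 as in the
# message, so the stream now mixes two encodings.
mixed = bytes([0x82, 0x00, 0xE9, 0x82])

# A later 8-bit -> Unicode step assumes one encoding throughout.
print(consistent.decode(EIGHT_BIT))       # ééé
print(len(mixed.decode(EIGHT_BIT)))       # 4 characters instead of 3

# Auxiliary files. Assumption: Unicode is written as UTF-8 and the
# user's editor expects Latin-1.
unicode_aux = "ééé".encode("utf-8")       # é\'eé fully converted to Unicode
print(unicode_aux.decode("latin_1"))      # Ã©Ã©Ã©

# Kept in the external LaTeX convention, the line stays readable.
licr_aux = "é\\'eé".encode("latin_1")
print(licr_aux.decode("latin_1"))         # é\'eé
```

The byte model is deliberately crude: it only shows that once two encodings share one stream, a step that assumes a single encoding cannot recover the intended characters.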

>As I understand the Omega draft documentation, there can be no more than
>one OTP (the \InputTranslation) acting on the input of LaTeX at any time
>and that OTP is only meant to handle the basic conversion from the external
>encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
>Unicode. All this happens way before the input gets tokenized, so there is

In fact, \InputEncoding was not intended for that, but only for
"technical" translations that apply to the whole document, such
as one byte -> two bytes or little endian -> big endian. Its main
problem is that it doesn't translate macros:
\def\myE{É}
\InputEncoding <an encoding>
É\myE

only the explicit É is transcoded. That can even be desirable
in some circumstances, provided you know in advance which encodings
will be used. More dangerous is the following:

\comenzar{enumeración} % Spanish interface with, say, MacRoman
                       % \comenzar means \begin
\InputEncoding <iso hebrew>

\terminar{enumeración} % <- that's transcoded using iso hebrew!
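Both pitfalls follow from the translation being applied to input as it is read. The toy Python model below makes that concrete. `ToyReader` and its methods are hypothetical names, not Omega or Lambda internals; the assumptions that the file is MacRoman and that untranslated input passes bytes through unchanged (modeled as Latin-1) are only for the example, and Python's `iso8859_8` codec stands in for "iso hebrew".

```python
# Hypothetical toy model: input translation is applied to raw bytes at
# read time; macro bodies are stored already decoded.


class ToyReader:
    def __init__(self, codec: str) -> None:
        self.codec = codec    # active input translation
        self.macros = {}      # macro name -> decoded body

    def read(self, raw: bytes) -> str:
        # Translation happens once, when the input is read.
        return raw.decode(self.codec, errors="replace")

    def define(self, name: str, raw_body: bytes) -> None:
        # Like \def: the body is decoded now and stored as characters.
        self.macros[name] = self.read(raw_body)

    def switch(self, codec: str) -> None:
        # Like \InputEncoding: affects only input read from now on.
        self.codec = codec


# Case 1: \def\myE{É} before the switch, then an explicit É after it.
r1 = ToyReader("latin_1")                  # assumed pass-through
r1.define("myE", "É".encode("mac_roman"))
r1.switch("mac_roman")
explicit = r1.read("É".encode("mac_roman"))
from_macro = r1.macros["myE"]
print(repr(explicit), repr(from_macro))    # only the explicit É is right

# Case 2: the same environment name read before and after a switch.
source = "enumeración".encode("mac_roman")
r2 = ToyReader("mac_roman")
begin_name = r2.read(source)               # \comenzar{enumeración}
r2.switch("iso8859_8")                     # \InputEncoding <iso hebrew>
end_name = r2.read(source)                 # \terminar{enumeración}
print(begin_name == end_name)              # False: the names no longer match
```

In the second case the bytes of ó are decoded under a different table after the switch, so a check that \terminar closes the environment opened by \comenzar would see two different names.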

Regards
Javier

