From: "Hans Aberg"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: \InputTranslation
Date: Sun, 27 May 2001 20:10:33 +0100
In-Reply-To: <15120.57621.256542.391864@gargle.gargle.HOWL>

At 13:12 +0200 2001/05/27, Marcel Oliver wrote:
>...it looks like there are a couple of strategies:
>
>1. Store the full language context with every character token sequence
>   along the lines that Javier suggests.

I think that this might turn out to be a no-no for the simple sake of
speed: Characters are at such a fundamental level that they should be
computationally as simple as possible.

I got the impression that the current Omega makes use of only 16-bit
characters (right?). It is, however, possible in C/C++ to guarantee an
integral type with at least 32 bits, if one stays away from wchar_t. :-)
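
To make the point about integer widths concrete, here is a minimal sketch, assuming a C++11 or later compiler. The names `CodePoint` and `fits_in_16_bits` are hypothetical and chosen only for illustration; nothing here is taken from Omega's sources. Under the older C90 and C++98 standards, `unsigned long` gives the same guarantee of at least 32 bits.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical name: a character cell wide enough for values beyond 16 bits.
// std::uint_least32_t always exists and holds at least 32 bits.
// wchar_t has no such guarantee; it is 16 bits wide on some platforms.
using CodePoint = std::uint_least32_t;

// Checked at compile time, so a platform that broke the assumption
// would fail to build instead of silently truncating characters.
static_assert(sizeof(CodePoint) * CHAR_BIT >= 32,
              "CodePoint must hold at least 32 bits");

// True when a character value would survive storage in a 16-bit cell.
constexpr bool fits_in_16_bits(CodePoint c) {
    return c <= 0xFFFFu;
}
```

The choice of a `least` type rather than an exact-width one is deliberate in this sketch: exact-width types such as `std::uint32_t` are optional in the standard, while the `least` variants are always available.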

>2. Treat input encoding completely separate from language context.
>   Input encoding just determines how to get from an arbitrary
>   encoding to the Unicode(-like) ICR.  Thus, switches in the language
>   context have to be tagged explicitly by the user.
...
>3. Extreme version of 2 (the only strategy that seems to be cleanly
>   implementable on current Omega):
>
>   We simply define the \InputTranslation to be fixed on a per-file
>   basis.

I think of a hybrid between these two:

One advantage of the last one, 3, is that formats become independent of IO
encodings: If there is a mechanism external to the file selecting the
encoding, it will be possible to choose the encoding of .aux files etc.,
and then get Omega to read them back without changing any pre-compiled
format. If the only mechanism is selecting the encoding from within a file
that is compiled, this will not be possible.

> In other words, we acknowledge that it does not make any
>   sense in terms of usability to mix input encodings, as such files
>   simply cannot (and should not) be displayed cleanly in any editor.

This does not follow: One can easily define a translation that can handle
different input encodings in the same file.

The requirement is instead that the translator, reading the file byte by
byte, must know when and how to switch. If you integrate these switches
with TeX's macro system, then the switches can be hard to predict, but
that is all.
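
As an illustration of a translator that knows when and how to switch while reading byte by byte, here is a minimal C++ sketch. Everything in it is an assumption made for the example: the in-band marker (ESC followed by `L` or `U`), the choice of ISO-8859-1 and UTF-8 as the two encodings, and the names `Encoding` and `translate`. It is not how Omega's OTPs work, and malformed UTF-8 is deliberately not diagnosed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using CodePoint = std::uint_least32_t;

enum class Encoding { Latin1, Utf8 };

// Hypothetical in-band switch: ESC 'L' selects ISO-8859-1, ESC 'U' selects
// UTF-8. The switch points live in the byte stream itself, so the translator
// can follow them without any help from the macro layer.
std::vector<CodePoint> translate(const std::string& bytes,
                                 Encoding enc = Encoding::Latin1) {
    std::vector<CodePoint> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b == 0x1B && i + 1 < bytes.size() &&
            (bytes[i + 1] == 'L' || bytes[i + 1] == 'U')) {
            enc = (bytes[i + 1] == 'L') ? Encoding::Latin1 : Encoding::Utf8;
            i += 2;  // consume the marker; it produces no character
            continue;
        }
        if (enc == Encoding::Latin1 || b < 0x80) {
            out.push_back(b);  // Latin-1 bytes and ASCII map directly
            ++i;
            continue;
        }
        // UTF-8 multi-byte sequence: the lead byte determines the length.
        const std::size_t len = (b >= 0xF0) ? 4 : (b >= 0xE0) ? 3 : 2;
        CodePoint cp = b & (0xFFu >> (len + 1));  // payload bits of lead byte
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3Fu);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With these assumptions, a Latin-1 byte `0xE9`, then the marker ESC `U`, then the UTF-8 bytes `0xC3 0xA9` all decode to the same character, U+00E9. If the switch were instead issued by a macro, the translator could not know where it takes effect until the macro had been expanded, which is the unpredictability mentioned above.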

On the other hand, Robin Fairbairns didn't like approach 3, because the
directory might become littered with files indicating the encoding.

So why not do this: When Omega starts, one indicates the encoding in the
first file that Omega is reading. This would be a mode (cf. Omega draft,
ch. 12), plus an OTP (loc. cit., ch. 8). There can be some simplifying
defaults corresponding to formats that editors can handle (like ASCII and
Unicode).

Then other files can be opened using information about mode + OTP as I
figure is the case now.

But in addition, one can provide external encoding information about a file
that overrides the translation information in the command opening the file.

This way, even though a format is compiled to write and read .aux files in,
say, Unicode, one may override it and get Omega to write and read .aux files
in, say, UTF-8.
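
The precedence proposed here can be sketched in a few lines of C++. All names (`FileEncoding`, `OpenRequest`, `effective_encoding`) are hypothetical, and the sketch assumes that each source of encoding information can be reduced to a single value; in Omega itself it would be a mode plus an OTP.

```cpp
// Hypothetical encodings; Unspecified means "this source gave no information".
enum class FileEncoding { Unspecified, Ascii, Unicode, Utf8 };

struct OpenRequest {
    FileEncoding from_command;  // translation named by the command opening the file
    FileEncoding external;      // encoding information supplied from outside the file
};

// External information wins over the opening command, which in turn wins
// over the default compiled into the format.
constexpr FileEncoding effective_encoding(FileEncoding format_default,
                                          OpenRequest req) {
    return req.external != FileEncoding::Unspecified       ? req.external
           : req.from_command != FileEncoding::Unspecified ? req.from_command
                                                           : format_default;
}
```

In the .aux example, the format default and the opening command both say `Unicode`, the external information says `Utf8`, and the function returns `Utf8` without the format being recompiled.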

The question though, when playing around with these ideas, is how people
will use the features implemented.

  Hans Aberg
