From: "Robin Fairbairns"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Date: Mon, 14 May 2001 10:10:07 +0100
Subject: Re: Multilingual Encodings Summary 2.2
In-Reply-To: Your message of "Sun, 13 May 2001 21:32:35 +0200."

> >Well, the \InputTranslation and \OutputTranslation primitives of Omega
> >already provide that functionality, so there is no need to deal with
> >variable-sized characters in the TeX programming. The problem is that one
> >might want to employ additional sets of translations (which would then act
> >on streams of equally-sized characters) between those extremes of the
> >program, but Omega doesn't provide for this.
>
> I am not sure what you mean here: UTF-8 is variable sized.

gasp
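To make the two points above concrete: utf-8 spends between one and four bytes per character, whereas an input translation of the kind \InputTranslation performs hands the rest of the program one fixed-size code point per character, so any further translation can work character by character. A minimal Python sketch, assuming Python's built-in codecs as a stand-in for Omega's translations (Omega itself works differently) and a hypothetical middle stage that maps NO-BREAK SPACE to an ordinary space:

```python
from __future__ import annotations


def utf8_widths(text: str) -> list[int]:
    """Bytes UTF-8 uses for each character: between 1 and 4."""
    return [len(ch.encode("utf-8")) for ch in text]


def input_translation(raw: bytes) -> list[int]:
    """Stand-in for an input translation: variable-length UTF-8 bytes
    in, one fixed-size code point per character out."""
    return [ord(ch) for ch in raw.decode("utf-8")]


def middle_translation(chars: list[int]) -> list[int]:
    """Hypothetical extra stage acting on equally-sized characters:
    replace NO-BREAK SPACE (U+00A0) with an ordinary space."""
    return [0x20 if c == 0xA0 else c for c in chars]


def output_translation(chars: list[int]) -> bytes:
    """Stand-in for an output translation: code points back to UTF-8."""
    return "".join(chr(c) for c in chars).encode("utf-8")


sample = "a\u00a0\u00e9\u20ac\U0001d11e"  # a, NBSP, e-acute, euro sign, G clef
print(utf8_widths(sample))  # [1, 2, 2, 3, 4]
raw = sample.encode("utf-8")
chars = input_translation(raw)
print(len(raw), len(chars))  # 12 5
```

The middle stage never has to know how many bytes each character occupied on disk; only the two boundary translations deal with utf-8's variable width.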

> I suggested that for every file not using a 32-bit character type, one has
> an additional file (in ASCII) identified by some kind of file name ending
> with information about the encoding. (For example, if the file "<name>" is
> not 32-bit, there is also an ASCII file named "<name>.encoding".)
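For concreteness, the sidecar scheme proposed in the quoted paragraph could be read roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the sidecar is taken to hold a single Python codec name such as `latin-1`, and because the proposal does not say which 32-bit form or byte order un-annotated files use, big-endian UTF-32 is assumed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Assumption: files without a sidecar are 32-bit; the byte order is a guess.
DEFAULT_32BIT_ENCODING = "utf-32-be"


def sidecar_path(path: Path) -> Path:
    """Hypothetical naming rule from the proposal: "<name>" -> "<name>.encoding"."""
    return path.with_name(path.name + ".encoding")


def decode_with_sidecar(raw: bytes, sidecar_text: Optional[str]) -> str:
    """Decode raw with the codec named in the sidecar, or the 32-bit default."""
    if sidecar_text is None:
        encoding = DEFAULT_32BIT_ENCODING
    else:
        encoding = sidecar_text.strip()
    return raw.decode(encoding)


def read_with_sidecar(path: Path) -> str:
    """Read path, consulting "<name>.encoding" if it exists."""
    side = sidecar_path(path)
    # The sidecar itself is plain ASCII.
    sidecar_text = side.read_text(encoding="ascii") if side.exists() else None
    return decode_with_sidecar(path.read_bytes(), sidecar_text)
```

Every non-32-bit file needs such a companion for the scheme to work, which is the overhead objected to in the reply.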

yeah yeah yeah; all good osi-style practice ... but no-one really uses
much of osi networking nowadays, and for good reason -- the techniques
it employs are too clunky[*] for the real world.

in practice, most people know what encodings their files are in.  and
if they're into unicode, and encoding in utf-8 or utf-16, the chance
that they'll also be using another encoding is likely rather small; if
they're using latin-1 in parallel, its ascii subset will be consumed
quite happily by a utf-8 decoder, and its accented characters almost
never form valid utf-8, so they're easy to detect.  imposing a schema
file on *everything* is wild
overkill.
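The point about latin-1 and a utf-8 decoder leads to the common try-utf-8-first heuristic. This is a sketch of that general technique, not something the post itself proposes, and the function name is hypothetical: a strict utf-8 decode of latin-1 text containing accented characters almost always fails, while latin-1 decoding never fails, so it makes a safe fallback.

```python
from __future__ import annotations


def decode_guess(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding used): strict UTF-8 first, Latin-1 as fallback."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"


print(decode_guess(b"plain ascii"))                 # ('plain ascii', 'utf-8')
print(decode_guess("caf\u00e9".encode("utf-8")))    # ('café', 'utf-8')
print(decode_guess("caf\u00e9".encode("latin-1")))  # ('café', 'latin-1')
```

The guess can be wrong for latin-1 text that happens to form valid utf-8 (the two characters "Ã©", for instance, are the utf-8 bytes for "é"), so this is a heuristic rather than a guarantee.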

> This way, one can provide as many IO code converters as one bothers to
> write, without the extended TeX ever knowing anything about it. (If Omega
> uses C++ for IO, one can use something called a codecvt. Or use pipes,
> where available.)
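The pipe variant mentioned in the quoted paragraph amounts to an external filter that converts bytes to a common encoding before TeX ever sees them. A minimal Python sketch, assuming Python's incremental codecs as the converter (a C++ `codecvt` facet plays a broadly similar role); the function names are hypothetical. Incremental decoding matters because a multi-byte character may be split across two reads.

```python
from __future__ import annotations

import codecs
import io
from typing import BinaryIO


def convert_stream(src: BinaryIO, dst: BinaryIO, from_enc: str,
                   to_enc: str = "utf-8", chunk_size: int = 4096) -> None:
    """Copy src to dst, re-encoding from from_enc to to_enc chunk by chunk."""
    decoder = codecs.getincrementaldecoder(from_enc)()
    encoder = codecs.getincrementalencoder(to_enc)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The decoder buffers any incomplete multi-byte sequence at the
        # end of a chunk and completes it on the next call.
        dst.write(encoder.encode(decoder.decode(chunk)))
    # final=True flushes state and raises if the input ended mid-character.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))


def recode_bytes(data: bytes, from_enc: str, to_enc: str = "utf-8",
                 chunk_size: int = 4096) -> bytes:
    """Convenience wrapper running convert_stream over in-memory buffers."""
    out = io.BytesIO()
    convert_stream(io.BytesIO(data), out, from_enc, to_enc, chunk_size)
    return out.getvalue()
```

Passed `sys.stdin.buffer` and `sys.stdout.buffer`, `convert_stream` becomes a shell filter, e.g. `convert_stream(sys.stdin.buffer, sys.stdout.buffer, "latin-1")` for a latin-1 source file.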

no.  omega does (shame) use clunky old c++ for some parts of its
operation, but it uses its own ocp mechanism for transforming
encodings.  macro coding to switch ocps at input time is trivial, but
not attractive for the normal case of using the same encoding all the
time.

[*] except in the areas "original" ip doesn't natively cope with at
all, like fully-extensible addressing and security.
