MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0DC55.B0A04C00"
In-Reply-To:  Your message of "Sun, 13 May 2001 21:32:35 +0200."               <v03110701b7248c1d8889@[195.100.226.136]>
Content-class: urn:content-classes:message
Subject:      Re: Multilingual Encodings Summary 2.2
Date: Mon, 14 May 2001 10:10:07 +0100
Message-ID:  <E14zEMh-0006vu-00@wisbech.cl.cam.ac.uk>
From: "Robin Fairbairns" <Robin.Fairbairns@CL.CAM.AC.UK>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0DC55.B0A04C00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

> >Well, the \InputTranslation and \OutputTranslation primitives of Omega
> >already provide that functionality, so there is no need to deal with
> >variable-sized characters in the TeX programming. The problem is that
> >one might want to employ additional sets of translations (which would
> >then act on streams of equally-sized characters) between those extremes
> >of the program, but Omega doesn't provide for this.
>
> I am not sure what you mean here: UTF-8 is variable sized.

gasp

> I suggested that for every file not using a 32-bit character type, one
> has an additional file (in ASCII) identified by some kind of file name
> ending with information about the encoding. (For example, if the file
> "<name>" is not 32-bit, there is also an ASCII file named
> "<name>.encoding".)

yeah yeah yeah; all good osi-style practice ... but no-one really uses
much of osi networking nowadays, and for good reason -- the techniques
it employs are too clunky[*] for the real world.

in practice, most people know what encodings their files are in.  and
if they're into unicode, and encoding in utf-8 or utf-16, the chance
that they'll also be using another encoding is likely rather small; if
they're using latin-1 in parallel, its ascii subset will be consumed
quite happily by a utf-8 decoder, and anything else will almost always
be rejected outright, so the mismatch is obvious.  imposing a schema
file on *everything* is wild overkill.
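
as a rough illustration of getting by without a schema file (this is
not anything omega itself does, and decode_guess is a made-up helper
name): try strict utf-8 first and fall back to latin-1 when that
fails.  ascii-only input goes through the utf-8 path unchanged.  the
guess isn't watertight, since a few latin-1 byte pairs also happen to
form valid utf-8.

```python
def decode_guess(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding name), trying strict utf-8 before latin-1."""
    try:
        # strict utf-8 rejects a lone high byte such as latin-1 0xE9
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail
        return raw.decode("latin-1"), "latin-1"


print(decode_guess(b"plain ascii"))   # valid in both encodings
print(decode_guess(b"caf\xc3\xa9"))   # utf-8 bytes, e with acute accent
print(decode_guess(b"caf\xe9"))       # latin-1 bytes for the same word
```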

> This way, one can provide as many IO code converters as one bothers to
> write, without the extended TeX ever knowing anything about it. (If
> Omega uses C++ for IO, one can use something called a codecvt. Or use
> pipes, where available.)

no.  omega does (shame) use clunky old c++ for some parts of its
operation, but it uses its own ocp mechanism for transforming
encodings.  macro coding to switch ocps at input time is trivial, but
not attractive for the normal case of using the same encoding all the
time.

[*] except in the areas "original" ip doesn't natively cope with at
all, like fully-extensible addressing and security.

------_=_NextPart_001_01C0DC55.B0A04C00--