MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C095A9.CDBB7500"
In-Reply-To:  <14984.20086.524553.168238@fell.open.ac.uk>
References: <14982.45082.150652.74719@istrati.zdv.uni-mainz.de>            <v03110701b6a9aae65099@[195.100.226.129]>            <200102091445.JAA00482@plmsc.psu.edu>            <200102091643.RAA23818@mozart.ujf-grenoble.Fr>            <14980.23750.628032.305093@gargle.gargle.HOWL>            <14982.45082.150652.74719@istrati.zdv.uni-mainz.de>
Content-class: urn:content-classes:message
Subject:      Re: LaTeX's internal char representation (UTF8 or Unicode?)
Date: Tue, 13 Feb 2001 11:37:29 +0100
Message-ID:  <l03130301b6aeb47f503a@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C095A9.CDBB7500
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 21.58 +0100 2001-02-12, Chris Rowley wrote:
>I can now ask the following questions:
>
>Do the designers of Omega think that it needs or has a TRM?
>
>Do the designers of LaTeX-for-Omega think that it needs a TRM?

Despite being neither, I would like to state that I think something of =
that
kind will be very useful (and probably necessary). My reason for this =
is
my experience with the "harmless character strings" I implemented in the
xdoc package (see CTAN:macros/latex/exptl/xdoc/) as developing some sort =
of
reasonable data type for text strings made it much easier to pass them
around and do things to them (such as create useful sort keys for =
indices).
It needs to be stressed, though, that the harmless character strings are
something quite different from the TRM Chris writes about, as I try to
describe what some piece of code was "before TeX saw (tokenized) it",
whereas the TRM seems to be what it is well inside TeX.

Before the above, Chris wrote:
>This is a thing that enables a computer-based system for processing
>`text' to represent `text things' so that it can, easily and
>independently, do at least the following (not formal definitions):
>
>-- apply transformations to `text strings';

xdoc does some things of this kind, although probably not very relevant =
to
the current context. Perhaps some existing Omega applications provide
better examples?

>-- reason about `text strings';
>
>-- construct more concrete representations of `text strings' as
>   `relatively positioned unrendered graphical objects';
>
>-- reason about such representations of text strings.

Could you please clarify these last two items? What properties would =
these
things have? Would they have e.g. width? Or is it the kind of thing =
which
becomes trivial in Latin and similar scripts?

>A TRM is none of the following (although for efficiency of
>implementation it may well be closely related to them):
>
>-- a coding for `text files' (such as utf8 or ASCII);
>
>-- an encoding for strings of unrendered glyphs (such as the `text
>   strings' in a dvi file or pdf file);

One thing (not particularly related to the existence of a TRM) which =
would
most likely be needed in the "glorious successor of TeX" is some way of
converting the latter kind of text string (in font) to the former kind, =
for
use in diagnostic and error messages. Already today the contents of =
overfull
hboxes containing math can be very hard to work out from the log =
messages.
But it is probably easier to set up such a conversion if there is a TRM,
since then you "only" need to explicitly define conversions of =
everything
to and from the TRM, instead of separate conversions from each font
encoding to e.g. UTF-8 (and any other output file encoding that might be =
in
use).

Lars Hellstr=F6m

------_=_NextPart_001_01C095A9.CDBB7500
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Diso-8859-1">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
6.5.7654.12">
<TITLE>     Re: LaTeX's internal char representation (UTF8 or =
Unicode?)</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->

<P><FONT SIZE=3D2>At 21.58 +0100 2001-02-12, Chris Rowley wrote:</FONT>

<BR><FONT SIZE=3D2>&gt;I can now ask the following questions:</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;Do the designers of Omega think that it needs or =
has a TRM?</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;Do the designers of LaTeX-for-Omega think that it =
needs a TRM?</FONT>
</P>

<P><FONT SIZE=3D2>Despite being neither, I would like to state that I =
think something of that</FONT>

<BR><FONT SIZE=3D2>kind will be very useful (and probably necessary). My =
reason for this is</FONT>

<BR><FONT SIZE=3D2>my experience with the &quot;harmless character =
strings&quot; I implemented in the</FONT>

<BR><FONT SIZE=3D2>xdoc package (see CTAN:macros/latex/exptl/xdoc/) as =
developing some sort of</FONT>

<BR><FONT SIZE=3D2>reasonable data type for text strings made it much =
easier to pass them</FONT>

<BR><FONT SIZE=3D2>around and do things to them (such as create useful =
sort keys for indices).</FONT>

<BR><FONT SIZE=3D2>It needs to be stressed, though, that the harmless =
character strings are</FONT>

<BR><FONT SIZE=3D2>something quite different from the TRM Chris writes =
about, as I try to</FONT>

<BR><FONT SIZE=3D2>describe what some piece of code was &quot;before TeX =
saw (tokenized) it&quot;,</FONT>

<BR><FONT SIZE=3D2>whereas the TRM seems to be what it is well inside =
TeX.</FONT>
</P>

<P><FONT SIZE=3D2>Before the above, Chris wrote:</FONT>

<BR><FONT SIZE=3D2>&gt;This is a thing that enables a computer-based =
system for processing</FONT>

<BR><FONT SIZE=3D2>&gt;`text' to represent `text things' so that it can, =
easily and</FONT>

<BR><FONT SIZE=3D2>&gt;independently, do at least the following (not =
formal definitions):</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;-- apply transformations to `text =
strings';</FONT>
</P>

<P><FONT SIZE=3D2>xdoc does some things of this kind, although probably =
not very relevant to</FONT>

<BR><FONT SIZE=3D2>the current context. Perhaps some existing Omega =
applications provide</FONT>

<BR><FONT SIZE=3D2>better examples?</FONT>
</P>

<P><FONT SIZE=3D2>&gt;-- reason about `text strings';</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;-- construct more concrete representations of =
`text strings' as</FONT>

<BR><FONT SIZE=3D2>&gt;&nbsp;&nbsp; `relatively positioned unrendered =
graphical objects';</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;-- reason about such representations of text =
strings.</FONT>
</P>

<P><FONT SIZE=3D2>Could you please clarify these last two items? What =
properties would these</FONT>

<BR><FONT SIZE=3D2>things have? Would they have e.g. width? Or is it =
the kind of thing which</FONT>

<BR><FONT SIZE=3D2>becomes trivial in Latin and similar scripts?</FONT>
</P>

<P><FONT SIZE=3D2>&gt;A TRM is none of the following (although for =
efficiency of</FONT>

<BR><FONT SIZE=3D2>&gt;implementation it may well be closely related to =
them):</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;-- a coding for `text files' (such as utf8 or =
ASCII);</FONT>

<BR><FONT SIZE=3D2>&gt;</FONT>

<BR><FONT SIZE=3D2>&gt;-- an encoding for strings of unrendered glyphs =
(such as the `text</FONT>

<BR><FONT SIZE=3D2>&gt;&nbsp;&nbsp; strings' in a dvi file or pdf =
file);</FONT>
</P>

<P><FONT SIZE=3D2>One thing (not particularly related to the existence =
of a TRM) which would</FONT>

<BR><FONT SIZE=3D2>most likely be needed in the &quot;glorious successor =
of TeX&quot; is some way of</FONT>

<BR><FONT SIZE=3D2>converting the latter kind of text string (in font) =
to the former kind, for</FONT>

<BR><FONT SIZE=3D2>use in diagnostic and error messages. Already today =
the contents of overfull</FONT>

<BR><FONT SIZE=3D2>hboxes containing math can be very hard to work out =
from the log messages.</FONT>

<BR><FONT SIZE=3D2>But it is probably easier to set up such a conversion =
if there is a TRM,</FONT>

<BR><FONT SIZE=3D2>since then you &quot;only&quot; need to explicitly =
define conversions of everything</FONT>

<BR><FONT SIZE=3D2>to and from the TRM, instead of separate conversions =
from each font</FONT>

<BR><FONT SIZE=3D2>encoding to e.g. UTF-8 (and any other output file =
encoding that might be in</FONT>

<BR><FONT SIZE=3D2>use).</FONT>
</P>

<P><FONT SIZE=3D2>Lars Hellstr=F6m</FONT>
</P>

</BODY>
</HTML>
------_=_NextPart_001_01C095A9.CDBB7500--