In-Reply-To: <14984.20086.524553.168238@fell.open.ac.uk>
Subject: Re: LaTeX's internal char representation (UTF8 or Unicode?)
Date: Tue, 13 Feb 2001 11:37:29 +0100
From: Lars Hellström
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"

At 21.58 +0100 2001-02-12, Chris Rowley wrote:
>I can now ask the following questions:
>
>Do the designers of Omega think that it needs or has a TRM?
>
>Do the designers of LaTeX-for-Omega think that it needs a TRM?

Despite being neither, I would like to state that I think something of that
kind will be very useful (and probably necessary). My reason for this is
my experience with the "harmless character strings" I implemented in the
xdoc package (see CTAN:macros/latex/exptl/xdoc/): developing some sort of
reasonable data type for text strings made it much easier to pass them
around and do things to them (such as create useful sort keys for indices).
It needs to be stressed, though, that harmless character strings are
something quite different from the TRM Chris writes about, as I try to
describe what some piece of code was "before TeX saw (tokenized) it",
whereas the TRM seems to be what it is well inside TeX.

Before the above, Chris wrote:
>This is a thing that enables a computer-based system for processing
>`text' to represent `text things' so that it can, easily and
>independently, do at least the following (not formal definitions):
>
>-- apply transformations to `text strings';

xdoc does some things of this kind, although probably not very relevant to
the current context. Perhaps some existing Omega applications provide
better examples?

>-- reason about `text strings';
>
>-- construct more concrete representations of `text strings' as
>   `relatively positioned unrendered graphical objects';
>
>-- reason about such representations of text strings.

Could you please clarify these last two items? What properties would these
things have; would they have e.g. width? Or is it the kind of thing which
becomes trivial in Latin and similar scripts?

>A TRM is none of the following (although for efficiency of
>implementation it may well be closely related to them):
>
>-- a coding for `text files' (such as utf8 or ASCII);
>
>-- an encoding for strings of unrendered glyphs (such as the `text
>   strings' in a dvi file or pdf file);

One thing (not particularly related to the existence of a TRM) which would
most likely be needed in the "glorious successor of TeX" is some way of
converting the latter kind of text string (in font) to the former kind, for
use in diagnostic and error messages. Already today the contents of overfull
hboxes containing math can be very hard to work out from the log messages.
But it is probably easier to set up such a conversion if there is a TRM,
since then you "only" need to define explicitly conversions of everything
to and from the TRM, instead of separate conversions from each font
encoding to e.g. UTF-8 (and any other output file encoding that might be in
use).
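
To make the counting argument concrete, here is a minimal sketch in Python
of conversion through a common intermediate form. Everything in it is an
assumption made for illustration: the encoding names "fontenc_a" and
"fontenc_b", the slot tables, and the function glyphs_to_log are invented,
and a plain Python string merely stands in for the TRM, which would in
reality be something richer than a character string.

```python
# Hypothetical sketch: all names and tables are invented for illustration
# and are not part of TeX, Omega, or any existing package.

# Font encoding -> TRM: one table per font encoding (slot -> character).
# Toy data, not the layout of any real font encoding.
FONT_TO_TRM = {
    "fontenc_a": {0x01: "a", 0x02: "ß", 0x03: "∑"},
    "fontenc_b": {0x10: "a", 0x11: "ß", 0x12: "∑"},
}

# TRM -> output encoding: one encoder per log or terminal encoding.
TRM_TO_OUTPUT = {
    "utf-8": lambda text: text.encode("utf-8"),
    # Characters outside ASCII are written as backslash escapes.
    "ascii": lambda text: text.encode("ascii", errors="backslashreplace"),
}


def glyphs_to_log(slots, font_encoding, output_encoding):
    """Convert slot numbers in one font encoding to log bytes via the TRM."""
    to_trm = FONT_TO_TRM[font_encoding]
    # Unknown slots become U+FFFD so that a message is still produced.
    trm_text = "".join(to_trm.get(slot, "\ufffd") for slot in slots)
    return TRM_TO_OUTPUT[output_encoding](trm_text)
```

With two font encodings and two output encodings this needs 2 + 2 tables
rather than 2 x 2 direct converters, and adding a further font encoding
means adding one table, whatever the number of output encodings in use.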

Lars Hellström
