From: Lars Hellström
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: \InputTranslation
Date: Sun, 10 Jun 2001 23:10:06 +0100

At Tue, 5 Jun 2001 13:29:19 +0100, Chris Rowley wrote:
[...]
>Therefore, rather than attempting to categorise the necessary
>information and devise suitable ways to provide it, Frank and I came
>up with the idea of simply supplying a single logical label for every
>ICR string.  Since the first, and still the overwhelmingly most
>diverse,
>parts of this information came from the needs of multi-lingual
>documents, we called this label the `language' (maybe not a good
>choice).  Our thesis is that `every text string must have a
>language-label'.  The only property these labels need (and indeed are
>able) to have is that they \emph{can} help any application or
>sub-process to access the information it needs to process that text
>string.

I suggest that we use the term `context' rather than `language' here.
Quoting Webster's, `context' means:

   The part of a written discourse in which a certain word, phrase
   or passage appears, necessary to point the meaning, as, it is
   hard to tell the exact meaning of a word out of context.

[snip]
>[In order to distinguish these logical language-labels from anything
>else in the TeX world let us call them LLLs.]
>
>In the context of current TeX-related systems this
>means that:
>
>-- whenever a character token list (in an ICR) is constructed or
>   moved, then its LLL must go with it;

The most common event at which a character token list is formed is when a
command is grabbing one of its arguments. With the xparse package in full
control these arguments can be labelled under the current TeX engine, but
it is probably more reasonable to imagine that their attachment is handled
by primitive mechanisms in some extension of TeX. In this case, I suspect
the labels should be thought of as being nestable with separate markers for
beginning and end, so that each token list that is formed gets delimited by
matching begin and end labels that record the current context of the token
list they were extracted from. Thus if we have, in an English context

   \subsubsection{The use of <begin-swedish>\"alv<end-swedish>}

(where the <..> denote such context labels), the token list becoming the
argument of \subsubsection would be

   <begin-english>The use of <begin-swedish>\"alv<end-swedish><end-english>

And then it doesn't matter if it is inserted into a French context table of
contents. Upon being written to an external file, the labels should be
converted to suitable markup.
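
As a rough illustration of the nesting described above, here is a minimal
Python model. It is only a sketch, not TeX: the names Begin, End,
grab_argument, effective_contexts and to_markup are hypothetical, and the
<begin-...>/<end-...> output format is simply the notation used in this
message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Begin:
    """Hypothetical begin-label token recording a context name."""
    context: str


@dataclass(frozen=True)
class End:
    """Hypothetical end-label token matching a Begin."""
    context: str


def grab_argument(tokens, current_context):
    """Delimit a grabbed token list by labels for the context it came from."""
    return [Begin(current_context), *tokens, End(current_context)]


def effective_contexts(tokens, outer_context):
    """Pair each ordinary token with the innermost context enclosing it."""
    stack = [outer_context]
    result = []
    for tok in tokens:
        if isinstance(tok, Begin):
            stack.append(tok.context)
        elif isinstance(tok, End):
            stack.pop()
        else:
            result.append((tok, stack[-1]))
    return result


def to_markup(tokens):
    """Convert labels to explicit markup, as when writing to an external file."""
    parts = []
    for tok in tokens:
        if isinstance(tok, Begin):
            parts.append(f"<begin-{tok.context}>")
        elif isinstance(tok, End):
            parts.append(f"<end-{tok.context}>")
        else:
            parts.append(tok)
    return "".join(parts)


# The argument of \subsubsection, grabbed while in an English context.
argument = grab_argument(
    ["The use of ", Begin("swedish"), r"\"alv", End("swedish")],
    "english",
)

# Inserting it into a French table of contents does not change its contexts.
print(effective_contexts(argument, "french"))
print(to_markup(argument))
```

Because every ordinary token sits inside at least one begin/end pair, the
outer context passed to effective_contexts never applies to it; that is
the sense in which the surrounding French context does not matter.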

An interesting question is whether these labels should be explicit tokens
or be hidden from the user (i.e., argument grabbing and things like
\futurelet look past them). Making them explicit tokens would probably
break tons of code.

As for what the labels should be to the user, I think a scheme of making
them integers is pretty useless (how they are implemented is of course
another matter). A better idea would be to make them some kind of property
lists, i.e., containers for diverse forms of information that are indexed
by some kind of names. Creating new label values from old by copying the
values and then changing some would be useful when defining dialects.
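
The property-list idea can be sketched in a few lines of Python. This is
only a model under assumptions: the function derive, the property names
(name, hyphenation, date_order, quotes) and all of their values are
invented for illustration and do not correspond to any existing package.

```python
def derive(base, **changes):
    """Create a new label by copying an existing one and changing some entries."""
    label = dict(base)      # copy, so the original label is left untouched
    label.update(changes)   # override only the properties that differ
    return label


# A hypothetical context label: a container of properties indexed by name.
english = {
    "name": "english",
    "hyphenation": "english",
    "date_order": "month-day-year",
    "quotes": ("``", "''"),
}

# A dialect is defined by copying the label and changing a few values.
british = derive(
    english,
    name="british",
    hyphenation="british",
    date_order="day-month-year",
)

print(british["quotes"])      # inherited unchanged from the English label
print(english["date_order"])  # the original label is not modified
```

A process that needs some piece of information looks it up by name in the
label attached to the string, which is something an opaque integer label
would not allow without a separate table.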

The main problem I see with context labels is that of when they should be
attached, since one cannot do any context-dependent processing before the
context is determined. I can think of at least three different models:

1. Labels must be present in the input (e.g. encoded using control
characters). This might be nice from an implementation point of view, but
it is probably only realistic if such a system would emerge which is
accepted in a much wider community than that of the users of TeX, due to
the problem of finding suitable editors. This doesn't seem likely.

2. Do as today, i.e., context switches are initiated when commands are
executed. This has the problem that the context isn't completely known
until the text is being typeset, so one cannot do any irreversible
context-dependent processing until then. This seems a bit too restrictive
to me.

3. Have command-like markup for context-switching, but attach labels as
part of the tokenization. This has the merit of looking like current LaTeX
markup and allowing LaTeX to keep all ICR strings fully context-labeled,
but it would also mean that processing of markup is a two-step process
(first all language markup is processed, then all the rest). That doesn't
feel right.
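
To make the two-step nature of model 3 concrete, here is a small Python
sketch of the first step only. It rests on assumptions: the commands
\begincontext{...} and \endcontext are invented for this example, and a
regular expression stands in for what would really be part of the
tokenizer.

```python
import re

# Hypothetical context-switching markup; neither command exists in LaTeX.
SWITCH = re.compile(r"\\begincontext\{(\w+)\}|\\endcontext\b")


def label_contexts(source, outer_context):
    """Step 1: resolve only the context markup, giving (text, context) pairs."""
    stack = [outer_context]
    labelled = []
    pos = 0
    for match in SWITCH.finditer(source):
        if match.start() > pos:
            # Text before the switch belongs to the innermost open context.
            labelled.append((source[pos:match.start()], stack[-1]))
        if match.group(1) is not None:
            stack.append(match.group(1))   # \begincontext{name}
        else:
            stack.pop()                    # \endcontext
        pos = match.end()
    if pos < len(source):
        labelled.append((source[pos:], stack[-1]))
    return labelled


source = r"\section{The use of \begincontext{swedish}\"alv\endcontext}"
for text, context in label_contexts(source, "english"):
    print(context, repr(text))
```

All other markup, here \section and its braces, is still unprocessed
after this step and has to be handled in a second pass over the labelled
pieces.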

Lars Hellström
