MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0F1FA.46E67400"
Content-class: urn:content-classes:message
Subject:      Re: \InputTranslation
Date: Sun, 10 Jun 2001 23:10:06 +0100
Message-ID:  <l03102805b74905cc2b81@[130.239.137.13]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0F1FA.46E67400
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At Tue, 5 Jun 2001 13:29:19 +0100, Chris Rowley wrote:
[...]
>Therefore, rather than attempting to categorise the necessary
>information and devise suitable ways to provide it, Frank and I came
>up with the idea of simply supplying a single logical label for every
>ICR string.  Since the first, and still the overwhelmingly most
>diverse,
>parts of this information came from the needs of multi-lingual
>documents, we called this label the `language' (maybe not a good
>choice).  Our thesis is that `every text string must have a
>language-label'.  The only property these labels need (and indeed are
>able) to have is that they \emph{can} help any application or
>sub-process to access the information it needs to process that text
>string.

I suggest that we use the term `context' rather than `language' here.
Quoting Webster's, `context' means:

   The part of a written discourse in which a certain word, phrase
   or passage appears, necessary to point the meaning, as, it is
   hard to tell the exact meaning of a word out of context.

[snip]
>[In order to distinguish these logical language-labels from anything
>else in the TeX world let us call them LLLs.]
>
>In the context of current TeX-related systems this
>means that:
>
>-- whenever a character token list (in an ICR) is constructed or
>   moved, then its LLL must go with it;

A character token list is most commonly formed when a command grabs one
of its arguments. With the xparse package in full control these arguments
can be labelled under the current TeX engine, but it is probably more
reasonable to imagine that their attachment is handled by primitive
mechanisms in some extension of TeX. In this case, I suspect the labels
should be thought of as being nestable with separate markers for
beginning and end, so that each token list that is formed gets delimited
by matching begin and end labels that record the current context of the
token list it was extracted from. Thus if we have, in an English context

   \subsubsection{The use of <begin-swedish>\"alv<end-swedish>}

(where the <..> denote such context labels), the token list becoming the
argument of \subsubsection would be

   <begin-english>The use of <begin-swedish>\"alv<end-swedish><end-english>

And then it doesn't matter if it is inserted into a table of contents
that is in a French context. Upon being written to an external file, the
labels should be converted to suitable markup.
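
To make the nesting concrete, here is a minimal sketch in Python rather
than TeX, since the labels are imagined as primitives of some future
engine. Everything in it is an assumption made for illustration: the
names Labelled, grab_argument, show and to_markup, and in particular the
\context{...}{...} markup written to the external file, which is not an
existing command.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Labelled:
    """A token list delimited by matching begin/end labels for one context."""
    context: str
    items: List[Union[str, "Labelled"]]


def grab_argument(items, current_context):
    """Model argument grabbing: the grabbed tokens get wrapped in labels
    recording the context they were extracted from."""
    return Labelled(current_context, list(items))


def show(node):
    """Render a token list in the <begin-x>...<end-x> notation."""
    if isinstance(node, str):
        return node
    inner = "".join(show(item) for item in node.items)
    return f"<begin-{node.context}>{inner}<end-{node.context}>"


def to_markup(node):
    """Convert labels to markup for an external file.
    The command name 'context' is hypothetical."""
    if isinstance(node, str):
        return node
    inner = "".join(to_markup(item) for item in node.items)
    return "\\context{" + node.context + "}{" + inner + "}"


# The \subsubsection argument, grabbed in an English context.
title = grab_argument(
    ["The use of ", Labelled("swedish", ['\\"alv'])], "english"
)

# Inserting it into a French table of contents leaves its labels intact.
toc_entry = Labelled("french", [title])

print(show(title))
print(to_markup(title))
```

The point of the tree representation is only that the inner labels travel
with the token list, whatever context it is later inserted into.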

An interesting question is whether these labels should be explicit tokens
or be hidden from the user (i.e., argument grabbing and things like
\futurelet look past them). Making them explicit tokens would probably
break tons of code.

As for what the labels should be to the user, I think a scheme of making
them integers is pretty useless (how they are implemented is of course
another matter). A better idea would be to make them some kind of
property lists, i.e., containers for diverse forms of information that
are indexed by some kind of names. Creating new label values from old by
copying the values and then changing some would be useful when defining
dialects.
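
As a rough illustration of the copy-and-change idea, again in Python and
with made-up contents: the function derive_label and every key and value
below are assumptions for the sake of the example, not a proposal for
what a label should actually contain.

```python
def derive_label(base, **changes):
    """Create a new label by copying an existing one and changing some
    of its entries. The base label is left untouched."""
    label = dict(base)      # copy the property list
    label.update(changes)   # then override selected properties
    return label


# Illustrative property lists; all keys and values are hypothetical.
english = {
    "name": "english",
    "hyphenation": "british",
    "quotes": ("`", "'"),
}

# A dialect defined as "English, except for the following".
american = derive_label(english, name="american", hyphenation="american")

print(american["quotes"])       # inherited from english
print(english["hyphenation"])   # unchanged by the derivation
```

An integer label would need a separate table for each of these
properties, whereas here the label itself carries them.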

The main problem I see with context labels is that of when they should be
attached, since one cannot do any context-dependent processing before the
context is determined. I can think of at least three different models:

1. Labels must be present in the input (e.g. encoded using control
characters). This might be nice from an implementation point of view, but
it is probably only realistic if such a system emerged and was accepted
by a much wider community than that of the users of TeX, due to the
problem of finding suitable editors. This doesn't seem likely.

2. Do as today, i.e., context switches are initiated when commands are
executed. This has the problem that the context isn't completely known
until the text is being typeset, so one cannot do any irreversible
context-dependent processing until then. This seems a bit too
restrictive to me.

3. Have command-like markup for context-switching, but attach labels as
part of the tokenization. This has the merit of looking like current
LaTeX markup and allowing LaTeX to keep all ICR strings fully
context-labelled, but it would also mean that processing of markup is a
two-step process (first all language markup is processed, then all the
rest). That doesn't feel right.
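
For what it is worth, model 3 could look roughly like the Python sketch
below. It assumes a hypothetical pair of switches, \begincontext{name}
and \endcontext, that are consumed during tokenization; each remaining
character is then tagged with the context in force at that point. Neither
command exists, and tagging single characters instead of using begin and
end markers is just a simplification.

```python
import re

# Hypothetical context-switching markup, recognised by the tokenizer.
SWITCH = re.compile(r"\\begincontext\{(\w+)\}|\\endcontext\b")


def tokenize(source, default_context):
    """First pass: consume context markup and label every other character.
    All remaining markup is left alone for a later, second pass."""
    stack = [default_context]
    labelled = []
    pos = 0
    for match in SWITCH.finditer(source):
        # Text before the switch belongs to the context currently in force.
        labelled.extend((ch, stack[-1]) for ch in source[pos:match.start()])
        if match.group(1) is not None:
            stack.append(match.group(1))   # \begincontext{name}
        elif len(stack) > 1:
            stack.pop()                    # \endcontext
        else:
            raise ValueError("unmatched \\endcontext")
        pos = match.end()
    labelled.extend((ch, stack[-1]) for ch in source[pos:])
    return labelled


tokens = tokenize(r"a\begincontext{swedish}b\endcontext c", "english")
print(tokens)
```

The two-step nature is visible here: the context switches are gone after
this pass, while anything like \emph would still be sitting in the output
as labelled characters, waiting to be processed.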

Lars Hellström

------_=_NextPart_001_01C0F1FA.46E67400--