From: Lars Hellström
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: \InputTranslation
Date: Wed, 13 Jun 2001 13:57:54 +0100

At 00.10 +0200 2001-06-11, I wrote:
>The main problem I see with context labels is that of when they should be
>attached, since one cannot do any context-dependent processing before the
>context is determined. I can think of at least three different models:
[snip]
>3. Have command-like markup for context-switching, but attach labels as
>part of the tokenization. This has the merit of looking like current LaTeX
>markup and allowing LaTeX to keep all ICR strings fully context-labeled,
>but it would also mean that processing of markup is a two-step process
>(first all language markup is processed, then all the rest). That doesn't
>feel right.

Here I was thinking of having the processing done by OCPs or something
similar. The problem with this is of course that these OCPs would have to
parse the input rather thoroughly to actually determine that a certain
command is markup for a context switch and not part of something else. In
particular one would need one such interpreting OCP for every set of
catcodes being used, since they otherwise almost certainly would get things
wrong. This is rather unrealistic, and having OCPs doing the interpretation
would probably also in effect be an invitation to syntax inconsistencies.

There is however another way of doing it, by introducing a mechanism which
generalizes \outer (thus I'm still in the game of imagining extensions to
TeX). Let's say a macro is `exceptional' if it uses this mechanism. Like
outer macros, an exceptional macro causes TeX to stop if it occurs in a
place where TeX is "absorbing tokens at high speed" (TeXbook p. 206), but
unlike outer macros it doesn't make TeX report an error. Instead TeX should
make a note of everything it was currently doing and push that onto some
stack, after which it starts executing the replacement text of the macro;
in particular, it must be possible to make assignments. What the macro is
expected to do is to grab its arguments (with whatever catcodes, input
OCPs, language context, etc. in force that are needed for this) and then
return (using some new primitive) the resulting token list to TeX, after
which TeX resumes whatever processing was interrupted by the exceptional
macro.
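
For comparison, this is what \outer does today. A minimal plain TeX
sketch, where \stopme and \grab are names made up for the illustration:

  \outer\def\stopme{}
  \def\grab#1{(#1)}
  \grab{a\stopme b}
  % TeX stops scanning the argument of \grab at \stopme and reports
  % "Forbidden control sequence found while scanning use of \grab".

An exceptional macro would interrupt the scan at the same point, but
its replacement text would be executed instead of an error being raised.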

With such a mechanism, one could in the infamous example

  \newcommand{\foo}{\languageIC{mandarin}{\unichar{<Unicode code>}}}

have \languageIC being such an exceptional macro, and thus have the
\unichar{<Unicode code>} tagged as being mandarin *even in the replacement
text of \foo*!  Even more fun, one could use this mechanism to
define a \verb command that _can_ be used in the arguments (or even
replacement texts) of commands! I suspect such a feature could be a useful
argument in convincing users untroubled by multilinguality problems to
switch to a new typesetting engine.
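
The reason \verb cannot be used in arguments today is that category codes
are attached when the characters are first read. A minimal plain TeX
sketch (\wrap is a name made up for the illustration):

  {\catcode`\_=12 \tt a_b}          % works: _ is read with catcode 12
  \def\wrap#1{#1}
  \wrap{{\catcode`\_=12 \tt a_b}}   % fails: _ was already tokenized with
                                    % catcode 8 when \wrap grabbed #1

An exceptional \verb would presumably get control before the rest of the
argument is tokenized, and could thus set the catcodes in time.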

The crux of the matter is of course how much TeX would have to be changed
to allow such a mechanism. Seeing the exceptional macros wouldn't be a
problem, as TeX is already looking for outer macros. The mechanisms for
expanding the next token are already fairly reentrant, so I wouldn't expect
many problems there either. What could be tricky is actually executing
commands (since TeX's main_control procedure is never called recursively),
but even that doesn't look like such a problem if we stay away from doing
typesetting; it seems after_token (related to \afterassignment) is the only
global variable that definitely must be saved away!
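
That after_token is a single global slot can be observed in today's TeX.
A minimal plain TeX sketch (\first and \second are names made up for the
illustration):

  \def\first{A}\def\second{B}
  \afterassignment\first \afterassignment\second \count255=1
  % only \second is inserted after the assignment; the token saved by
  % the earlier \afterassignment is overwritten

So if the interrupted code has a token pending in after_token, an
assignment made by an exceptional macro would presumably consume it
unless the slot is saved and restored around the call.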

Could perhaps someone with experience of implementing and/or extending TeX
please comment on these ideas?

Lars Hellström
