MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0E611.69309D00"
Content-class: urn:content-classes:message
Subject:      \InputTranslation
Date: Sat, 26 May 2001 19:26:32 +0100
Message-ID:  <15119.62808.151690.192812@gargle.gargle.HOWL>
From: "Marcel Oliver" <oliver@NA.UNI-TUEBINGEN.DE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0E611.69309D00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I'd like to bring the discussion back to the ICR issue, in particular
how a hypothetical successor to TeX should handle input encodings.  I
think the point that Omega does not do it the "right" way has been
made pretty clearly.

But what should be the "right" way?  I repost some thoughts from last
week that seem to have been lost among the \(var)epsilons.

--Marcel

Frank Mittelbach writes:
 >  > In fact, \InputEncoding was not intended for that, but only for
 >  > "technical" translations which applies to the whole document
 >  > as one byte -> two byte or little endian -> big endian. The main
 >  > problem of it is that it doesn't translate macros:
 >  > \def\myE{=C9}
 >  > \InputEncoding <an encoding>
 >  > =C9\myE
 >
 > \InputEncoding is the point where one need to go from external
 > source encoding to OICR that is precisely the wound: the current
 > \InputEncoding isn't doing this job fully (and that it is not clear
 > how to do it properly (to be fair))

How about this:

- There is one default \InputTranslation (this, rather than
  \InputEncoding, is the official name of the Omega command) which may
  need to be specified at the time of format creation.  This encoding
  is the one that all macro names need to be in, as well as the
  encoding initially selected for text (I think it does not make any
  sense to allow for multiply encoded macro names in a single
  document).  As there is no legacy cruft with regard to macro names,
  we may as well force this default encoding to be UTF-8.

- Changes in the \InputTranslation follow the usual TeX scoping rules
  (this is obviously not how Omega currently does it), and take effect
  immediately during the initial tokenization.  This would mean that
  the characters \ { } must be in their expected position in every
  permissible encoding, but I guess that's not any more restrictive
  than what we currently have.  I also assume that TeX (Omega) always
  knows whether it is parsing code or text, so that it can select the
  default for code, and the top of the encoding stack for text.

- Regarding Javier's above example: I think this is the correct and
  expected behavior.  I want to be able to write:

  \begin{chinese}
    \newcommand{\foo}{***something chinese***}
    \newcommand{\bar}{***and some more chinese***}
  \end{chinese}

  The Chinese characters \foo\ and \bar\ are not easy to enter on a
  western keyboard.  If you need to frequently use \foo\ in your
  scholarly discussion of Chinese literature, it is better to first
  define macros for all the Chinese characters you need, and then just
  write \verb|\foo| whenever you need \foo.

  (I don't know if this babel-like begin-end of a language selection
  would actually be legal in the document preamble, but I think the
  strategy is very natural at least.)

- How to deal with \'e and the like may be more of a problem.
  Would it be possible to force immediate expansion into the
  corresponding internal Unicode token?
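
For comparison, a minimal sketch of how the existing LaTeX2e inputenc
package treats the same situations.  It assumes a standard LaTeX2e
installation with T1 fonts; the macro name \myO and the choice of
byte 0xF5 are only for illustration.  The sketch relies on two
points: \inputencoding obeys TeX grouping, and a macro holding an
8-bit character is read in whatever encoding is active where the
macro is used, not where it was defined.  The second point differs
from the behavior argued for above.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\begin{document}

% ^^f5 is TeX notation for the single byte 0xF5 (keeps this file ASCII).
% \myO is a hypothetical name; it stores the byte, not a glyph.
\newcommand{\myO}{^^f5}

Direct: ^^f5, via macro: \myO.   % latin1: both give \~o

\begingroup
  \inputencoding{latin2}         % change is local to this group
  Direct: ^^f5, via macro: \myO. % latin2: both give \H o
\endgroup

Direct: ^^f5, via macro: \myO.   % latin1 again: both give \~o

% Byte 0xE9 is mapped to the ASCII form \'e, so these two agree.
Byte: ^^e9, accent macro: \'e.

\end{document}
```

In LaTeX2e the mapping runs from the input byte to \'e; the last
point above asks for the opposite direction, from \'e to a single
internal token.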

------_=_NextPart_001_01C0E611.69309D00--