MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C09DB1.F7E31300"
In-Reply-To:  <G90LE6$IKzIqNdorqbCj1WU7mj4mvBs9xb4Cd@wanadoo.es>
Content-class: urn:content-classes:message
Subject:      Re: Multilingual Encodings Summary 2.0
Date: Fri, 23 Feb 2001 17:00:48 +0100
Message-ID:  <l03130303b6b94abdcf7c@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C09DB1.F7E31300
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 18.18 +0100 2001-02-19, jbezos wrote:
>> Question raised by this: Can OCPs output control sequences, or do they
>> just produce characters?
>
>They can output any token (IIRC there is a bug
>when \input is used, but I'm not sure). In fact,
>tokens are necessary when translating Unicode to,
>say, OT1.
>
>>Can one specify what catcode the characters should have?
>
>Unfortunately not. The catcodes used are the
>catcodes when the replacement is done. That means
>that "private" names containing @ cannot be
>used (in general, or if \csname is used).

OK, so in fact the OCPs cannot produce tokens (they just output
characters), but that is not really a restriction as long as some
character has catcode 0 and \csname (or some disguise of it) is
available. (The former condition could be a problem in \verb-like
contexts. One would probably have to have a "private escape" character
for the OCPs.)

On the other side of things, how does Omega handle "lost character"
conditions? The current TeX behaviour of ignoring the character and
possibly putting an info message in the log file could certainly be
improved ... When one is typesetting normal text, the appropriate action
would be (a) substitution with a character from another font (LaTeX can
do this for characters that are represented by encoding-specific
commands, but there are no such mechanisms in TeX for explicit character
tokens), (b) an error message, or (c) a combination of the two. When one
is typesetting verbatim (or verbatim-like) text, however, the priorities
are different. In particular, I would like to have some recourse to (d):
typeset a suitable representation (e.g. U+0312, in a suitably distinct
font) of the Unicode code point for the character.
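
To make the difference between (b) and (d) concrete, here is a minimal
sketch in Python. It is not TeX or Omega code; the function names are
made up for illustration, and the string of "available" characters is
only a stand-in for whatever glyphs the current font really provides.

```python
def codepoint_label(ch):
    """Return the conventional U+XXXX label for a single character."""
    return "U+%04X" % ord(ch)


def render_strict(text, available):
    """Policy (b): raise an error for the first character with no glyph."""
    for ch in text:
        if ch not in available:
            raise LookupError("no glyph for " + codepoint_label(ch))
    return text


def render_verbatim(text, available):
    """Policy (d): show a visible code point label for each missing glyph.

    The angle brackets are a hypothetical stand-in for "a suitably
    distinct font".
    """
    return "".join(
        ch if ch in available else "<" + codepoint_label(ch) + ">"
        for ch in text
    )
```

With "ab" as the available characters, render_verbatim turns the string
a, U+0312, b into "a<U+0312>b", whereas render_strict stops with an
error that names U+0312.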

Lars Hellstr=F6m

------_=_NextPart_001_01C09DB1.F7E31300--