MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C09DB3.0C73DB00"
Content-class: urn:content-classes:message
Subject:      Re: LaTeX's internal char prepresentation (UTF8 or Unicode?)
Date: Fri, 23 Feb 2001 17:08:37 +0100
Message-ID:  <l03130300b6bc34f7dec1@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C09DB3.0C73DB00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I would like to point out that the debate on the LICR and related =
matters
has mainly dealt with what one might call the LaTeX Text Character
Model (LTCM), but there is another character model in current LaTeX =
which
should also be given some thought: the LaTeX Math Character Model =
(LMCM).
Possibly one could also distinguish a LaTeX Verbatim Character Model =
(LVCM)
(sorry about all these acronyms), but I'm less certain about that one.

Luckily matters may be easier in these models because there we don't =
have
to deal with that multilingual complex of problems which no one =
completely
understands because no one knows all the languages.

Concerning the LMCM, I believe the expressed opinion was that Greek and
Cyrillic letters (as input characters) should be allowed in math, but
that symbols outside ASCII should not (except when necessary for
compatibility reasons). I suspect user demands may make the latter
restriction problematic if the input encoding becomes Unicode (in some
form), especially if the math characters there get well sorted out,
but that is a distant problem. In the world of 8-bit encodings,
restricting input symbols in math to ASCII is probably the right thing
to do.

Allowing Greek letters does, however, raise some interesting problems.
Many of the Greek letters have var-forms in the current math fonts, so
which form should the input letter select? For example, \epsilon and
\varepsilon are hardly distinct enough to count as different
letters/symbols; they are merely different glyphs, so which one should
it be? I for one much prefer \varepsilon, so I would like to have some
interface which lets the user select this.
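
As a minimal sketch of what such an interface might look like (the
command name \PreferVarForm is hypothetical, not an existing LaTeX
command), one could save the original glyph command under an
orig-prefixed name and let the plain name select the var-form:

```latex
% Hypothetical sketch: \PreferVarForm is an illustrative name only.
% \PreferVarForm{epsilon} saves \epsilon as \origepsilon and then
% makes \epsilon select the glyph of \varepsilon.
\documentclass{article}
\newcommand*\PreferVarForm[1]{%
  \expandafter\let\csname orig#1\expandafter\endcsname
    \csname #1\endcsname
  \expandafter\let\csname #1\expandafter\endcsname
    \csname var#1\endcsname}
\PreferVarForm{epsilon}
\begin{document}
Preferred form: $\epsilon$. Original form: $\origepsilon$.
\end{document}
```

With this in the preamble, \epsilon and \varepsilon give the same
glyph, while \origepsilon still reaches the other form. A real
interface would more likely be a package or font option than a
redefinition like this one.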

In a more general view, one should perhaps try to clear up the LMCM so
that the user commands select characters (or character plus math class)
rather than glyphs. This could make it easier to provide new math fonts,
in that one wouldn't have to concentrate on providing precisely the same
set of glyphs as the CM math fonts do. A font could instead provide more
forms of the characters (very tricky these days, as new glyph forms
require new commands, which make documents that use them incompatible
with other math fonts) or fewer forms (possible by duplicating the
glyphs), as it suits the design.

Lars Hellstr=F6m

------_=_NextPart_001_01C09DB3.0C73DB00--