MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C09917.AC46EA00"
In-Reply-To:  <Pine.GSO.4.33.0102171248120.22958-100000@sun06.ams.org>
References: <v03110700b6b44d211cd3@[195.100.226.146]>
Content-class: urn:content-classes:message
Subject:      Re: LaTeX's internal char prepresentation (UTF8 or Unicode?)
Date: Sat, 17 Feb 2001 20:27:17 +0100
Message-ID:  <v03110700b6b47b231095@[195.100.226.141]>
From: "Hans Aberg" <haberg@MATEMATIK.SU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C09917.AC46EA00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 12:54 -0500 2001/02/17, Barbara Beeton wrote:
>while this would obviously work for text in natural languages,
>unicode will never contain all the possible "embellished" letters
>and symbols used in math.  (and this may include instances with two
>or even more diacritics on a single letter or symbol.)  this set,
>while not infinite, is much too large to want to address even using
>the unicode private area.  but for latex (or any successor) to be
>useful for the particular content for which tex was first developed,
>this has to be taken into account.

I was not thinking of math in particular, but of the other combining
symbols:

In some cases Unicode has a single symbol for a combined math
character (the negation of <=3D>, for example, may have its own
symbol), but in other cases it might not, so that one still has to
write \not\myrelation. (I do not know whether Unicode has changed
lately and now has a lot of math combining characters.)

Actually, even though one could spend some interesting thought on what
to do with Unicode combining characters when they occur in math, I do
not think that the final solution will make much difference, because
the mathematicians will find out how to handle it.

(Or you will have to explain better what you have in mind.)

-- I can add that a simple method to allow different input encodings
when reading from a file <filename> could be to treat it by default as,
say, Unicode, unless there is an ASCII file named, say, <filename>.e
with information about the encoding. (One could also allow the default
encoding to be changed for different files by means of startup
arguments.) This file <filename>.e could hold very simple information,
or be as complex as the preprocessor you bother to write, if, say, you
want mixed encodings or to be able to switch between encodings within
the very same file. -- In effect, one is creating a mini-language for
reading encodings, in such a way that TeX does not have to bother about
it.
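
A minimal sketch of the <filename>.e idea, written in Python purely for
illustration. It rests on assumptions that go beyond the proposal: the
sidecar file holds a single encoding name as its first word, the
default encoding is handed in by the caller (standing in for a startup
argument), and the names sidecar_encoding and read_decoded are
hypothetical.

```python
import os


def sidecar_encoding(filename, default):
    """Return the encoding named in <filename>.e, else the default."""
    if not os.path.exists(filename + ".e"):
        return default
    # Assumption: the sidecar is ASCII and its first word is a codec
    # name that Python knows, such as "latin-1".
    with open(filename + ".e", "r", -1, "ascii") as sidecar:
        return (sidecar.read().split() or [default])[0]


def read_decoded(filename, default):
    """Decode <filename> up front, so the consumer sees only text."""
    with open(filename, "rb") as source:
        return source.read().decode(sidecar_encoding(filename, default))
```

A call such as read_decoded("paper.tex", "utf-8") would consult
paper.tex.e if it exists and fall back to the default otherwise. Mixed
encodings, or switching within one file, would need a richer sidecar
format than this single word.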

  Hans Aberg

------_=_NextPart_001_01C09917.AC46EA00--