MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C09057.39A45780"
Content-class: urn:content-classes:message
Subject:      Re: default inputenc/fontenc tight to language
Date: Tue, 6 Feb 2001 17:09:10 +0100
Message-ID:  <200102061609.LAA21018@pluto.math.albany.edu>
From: "William F. Hammond" <hammond@CSC.ALBANY.EDU>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C09057.39A45780
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Just out of curiosity, I'm wondering what those here think about
Unicode and, in particular:

1.  Is its concept of character -- basically an unsigned 32-bit
    integer -- durable for, say, the next 100 years?

    (As I read the discussion here, I think not.)

2.  Do we think that 2^32 is a wise upper bound?

    (This question vanishes if we think that representing
    characters as integers, rather than as more complicated data
    structures, is inadequate.)
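
To make the two questions concrete, here is a minimal Python 3 sketch
of the integer model of a character.  The helper names
is_scalar_value and bits_needed are hypothetical.  One fact worth
keeping in view: the Unicode code space currently ends at U+10FFFF,
which fits in 21 bits, while UCS-4 stores each character in a full
32-bit word.

```python
# Sketch of the "character as unsigned integer" model (Python 3).
# Helper names are hypothetical, chosen for illustration only.


def is_scalar_value(cp):
    # True if cp is a Unicode scalar value: U+0000..U+10FFFF,
    # excluding the UTF-16 surrogate range U+D800..U+DFFF.
    return cp in range(0x110000) and cp not in range(0xD800, 0xE000)


def bits_needed(cp):
    # Number of bits needed to store the non-negative integer cp.
    return cp.bit_length()


# A, e-acute, FOR ALL, a surrogate, MATHEMATICAL BOLD CAPITAL A, and
# the largest code point; the last line printed shows 21 bits.
for cp in (0x41, 0xE9, 0x2200, 0xD800, 0x1D400, 0x10FFFF):
    print(f"U+{cp:04X}", is_scalar_value(cp), bits_needed(cp))
```

Surrogates are excluded because they exist only to encode
supplementary characters in UTF-16; they never stand for characters
on their own.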

Unicode is directly relevant to the future of LaTeX insofar as LaTeX
is to be robust for formatting XML document types, because ordinary
document content can consist of arbitrary sequences of Unicode
characters.  XML systems are designed to make decisions only where
markup occurs.  It is reasonable for an XML processor that writes a
typesetting language to know the markup ancestry of a character, e.g.,
whether it is within a math zone.  It is not reasonable for it to know
that a particular character must have \ensuremath applied, unless the
processor is itself written in TeX, as David Carlisle's xmltex is.
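
The distinction above can be sketched in Python 3 with the standard
xml.etree.ElementTree module.  The writer passes each element's
ancestry down the tree, so it always knows whether text lies inside a
math zone, yet it makes no per-character decision such as applying
\ensuremath.  The element name math, the $...$ output convention, and
the function names are assumptions for illustration.

```python
# Sketch of an XML-to-LaTeX writer that tracks markup ancestry.
# Assumptions (hypothetical): math zones are <math> elements and
# are written out as $...$.
import xml.etree.ElementTree as ET


def opens_math(tag, ancestry):
    # True if this tag starts a math zone that is not already open.
    return tag in ("math",) and "math" not in ancestry


def walk(elem, ancestry, out):
    # ancestry lists the tags enclosing elem, so the writer always
    # knows whether a character sits inside a math zone.  It does not
    # decide which characters need \ensuremath; text is passed through
    # unchanged, leaving that choice to the TeX side.
    if opens_math(elem.tag, ancestry):
        out.append("$")
    out.append(elem.text or "")
    for child in elem:
        walk(child, ancestry + [elem.tag], out)
        out.append(child.tail or "")
    if opens_math(elem.tag, ancestry):
        out.append("$")
    return out


def to_latex(xml_text):
    return "".join(walk(ET.fromstring(xml_text), [], []))


print(to_latex("<p>Let <math>x+y</math> hold.</p>"))
# prints: Let $x+y$ hold.
```

A real writer would also have to escape LaTeX special characters such
as % and &, which this sketch does not do.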

I note that in GNU Emacs these days characters in buffers and strings
can carry property lists (text properties).
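
For contrast with the bare-integer model, here is a minimal Python 3
sketch of a character that carries its own property list, loosely in
the spirit of Emacs text properties (this is not how Emacs implements
them).  The RichChar type and the "math" property are hypothetical; a
property like that is one way a typesetter could learn that a
character needs math mode without a processor-side table.

```python
# Sketch: a character as a code point plus a property list, loosely
# analogous to Emacs text properties.  RichChar and the "math"
# property are hypothetical names chosen for illustration.
from typing import NamedTuple


class RichChar(NamedTuple):
    code_point: int  # the plain Unicode integer
    props: dict      # extra information attached to this character


def needs_math_mode(ch):
    # Consult the character's own properties; absent means "no".
    return bool(ch.props.get("math", False))


for ch in (RichChar(0x41, {}), RichChar(0x2200, {"math": True})):
    print(f"U+{ch.code_point:04X}", needs_math_mode(ch))
```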

Thanks for your thoughts.

                                    -- Bill

------_=_NextPart_001_01C09057.39A45780--