Subject: Re: Multilingual Encodings Summary 2.2
From: Lars Hellström
Date: Sun, 20 May 2001 23:51:19 +0100
To: Mailing list for the LaTeX3 project (LATEX-L)

At 11.25 +0200 2001-05-20, Marcel Oliver wrote:
>Typesetting math the way it is
>done in TeX _is_ visual mark-up, while (most of) the textual mark-up
>in LaTeX is logical mark-up.

Is there really such a qualitative difference between math and text today?
In what way does e.g.

   Euclid was a geometer.

contain more logical mark-up than

   a \in A

?  Certainly math is visually more complex than text (at least in the Latin
script; I'm not so sure how a comparison with Arabic would turn out), and
manual spacing corrections are more common, but you'll have to elaborate
that idea before I buy it.

>So a distinct MICR will not gain anything (and probably cause multiple
>problems) unless we support full logical mark-up.  However, this is
>really a red herring.  IMHO it will render LaTeX basically unusable
>for tasks it currently excels in (communication between human (!)
>mathematicians), and not add anything to areas where logical markup is
>required (because LaTeX would not be able to use most of the
>additional information anyway).

You're thinking of MathML-style typified markup here, right? I don't see
where there is anything like that in text today---there certainly isn't any
LaTeX markup giving e.g. the grammatical analysis of a sentence (which is
what such typified markup most resembles).

>This leaves two issues:
>
>- Mapping Unicode into the current TeX (plus AMS-fonts etc.) naming
>  scheme, so that people will eventually be able to use a Unicode
>  enabled editor for their source files.  Since people from the AMS
>  (and other math publishers?) have been working on the Unicode math
>  planes, I assume that this is essentially understood.
>
>- "Lost character conditions":  If a font does not provide all
>  variations of a symbol that TeX or Unicode define, it should not
>  quietly resort to a many-to-one mapping, i.e., at least a warning
>  must be issued.  This also seems fairly natural.

For all variations of a symbol that (La)TeX defines I can agree, but I
don't agree when it comes to every variation Unicode defines. (Cf. the idea
of "dumb" typesetting systems below.) In fact, this is a special case of a
more general question: should LaTeX necessarily respect all the (more or
less exact) duplications of characters there are in Unicode, or should it
be allowed to make identifications of characters? I propose the latter.

My main reason for this is that LaTeX is a (comparatively) smart system
which can know things about the context of the text it is typesetting and
can thereby conclude things like "It says XXX in the manuscript, but it
should most likely be YYY instead (because XXX is wrong here, but very
similar to YYY, which is reasonable)."

An example of this: if the input contains the character U+015F (LATIN SMALL
LETTER S WITH CEDILLA) and the current language is Romanian, then it should
probably be a U+0219 (LATIN SMALL LETTER S WITH COMMA BELOW) instead. If
the language had been Turkish, it would have been the other way round. For
dumb systems which do not have such information I can understand that
Unicode must contain a couple of glyph variants in order to produce
acceptable rendering of text.

There are furthermore some conditions which should be met before such
identification is reasonable:

 * It must take place on the input side of LaTeX. Omega currently has no
   reasonable mechanism for this.

 * Which identifications are made should depend on the current context;
   only those that are reasonable should be made. If the language in the
   above example had instead been English, there would be no grounds to
   prefer either character, so we should simply follow the manuscript.

Recall that it is not unusual to see much more drastic "identification
rules" in books of rules for human typesetters.

Lars Hellström

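[The context-dependent identification described above, together with the
"lost character" warning condition, could be sketched roughly as follows.
This is a hypothetical illustration in Python, not actual LaTeX or Omega
code; the names `IDENTIFICATIONS` and `identify` are invented for the
sketch, as are the exact language-keyed tables.]

```python
import sys
import unicodedata

# Pairs of easily-confused Unicode characters, keyed by current language.
# U+015F LATIN SMALL LETTER S WITH CEDILLA vs. U+0219 LATIN SMALL LETTER S
# WITH COMMA BELOW (and the analogous t pair), as in the example above.
IDENTIFICATIONS = {
    "romanian": {"\u015F": "\u0219", "\u0163": "\u021B"},
    "turkish":  {"\u0219": "\u015F", "\u021B": "\u0163"},
    # English: no grounds to prefer either character, so no entries --
    # the manuscript is simply followed as-is.
    "english":  {},
}

def identify(text, language):
    """Apply context-dependent character identifications on the input side.

    Each substitution is reported rather than performed silently, in the
    spirit of the "at least a warning must be issued" condition.
    """
    table = IDENTIFICATIONS.get(language, {})
    out = []
    for ch in text:
        if ch in table:
            repl = table[ch]
            print("Note: %s identified with %s (language: %s)"
                  % (unicodedata.name(ch), unicodedata.name(repl), language),
                  file=sys.stderr)
            out.append(repl)
        else:
            out.append(ch)
    return "".join(out)
```

For instance, `identify("pa\u015fte", "romanian")` would return the string
with U+0219 substituted, while the same call with `"english"` would leave
the manuscript untouched.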