Subject: Re: Multilingual Encodings Summary 2.2
From: Lars Hellström
Date: Sun, 20 May 2001 23:51:19 +0100
To: Mailing list for the LaTeX3 project (LATEX-L)

At 11.25 +0200 2001-05-20, Marcel Oliver wrote:
>Typesetting math the way it is
>done in TeX _is_ visual mark-up, while (most of) the textual mark-up
>in LaTeX is logical mark-up.

Is there really such a qualitative difference between math and text today?
In what way does e.g.

   Euclid was a geometer.

contain more logical mark-up than

   a \in A

?  Certainly math is visually more complex than text (at least in the Latin
script; I'm not so sure how a comparison with Arabic would turn out), and
manual spacing corrections are more common, but you'll have to elaborate
that idea before I buy it.

>So a distinct MICR will not gain anything (and probably cause multiple
>problems) unless we support full logical mark-up.  However, this is
>really a red herring.  IMHO it will render LaTeX basically unusable
>for tasks it currently excels in (communication between human (!)
>mathematicians), and not add anything to areas where logical markup is
>required (because LaTeX would not be able to use most of the
>additional information anyway).

You're thinking of MathML-style typified markup here, right? I don't see
where there is anything like that in text today---there certainly isn't any
LaTeX markup giving e.g. the grammatical analysis of a sentence (which is
what such typified markup most resembles).

>This leaves two issues:
>
>- Mapping Unicode into the current TeX (plus AMS-fonts etc.) naming
>  scheme, so that people will eventually be able to use a Unicode
>  enabled editor for their source files.  Since people from the AMS
>  (and other math publishers?) have been working on the Unicode math
>  planes, I assume that this is essentially understood.
>
>- "Lost character conditions":  If a font does not provide all
>  variations of a symbol that TeX or Unicode define, it should not
>  quietly resort to a many-to-one mapping, i.e., at least a warning
>  must be issued.  This also seems fairly natural.

For all variations of a symbol that (La)TeX defines I can agree, but I
don't agree when it comes to every variation Unicode defines. (Cf. the idea
of "dumb" typesetting systems below.) In fact, this is a special case of a
more general question: should LaTeX necessarily respect all the (more or
less exact) duplications of characters there are in Unicode, or should it
be allowed to make identifications of characters? I propose the latter.

My main reason for this is that LaTeX is a (comparatively) smart system
which can know things about the context of the text it is typesetting and
can thereby conclude things like "It says XXX in the manuscript, but it
should most likely be YYY instead (because XXX is wrong here, but very
similar to YYY, which is reasonable)."

An example of this: if the input contains the character U+015F (LATIN SMALL
LETTER S WITH CEDILLA) and the current language is Romanian, then it should
probably be a U+0219 (LATIN SMALL LETTER S WITH COMMA BELOW) instead. If
the language had been Turkish, it would have been the other way round. For
dumb systems which do not have such information I can understand that
Unicode must contain a couple of glyph variants in order to produce
acceptable rendering of text.

There are furthermore some conditions which should be met before such
identification is reasonable:

 * It must take place on the input side of LaTeX. Omega currently has no
   reasonable mechanism for this.

 * Which identifications are made should depend on the current context;
   only those that are reasonable should be made. If the language in the
   above example had instead been English, there would be no grounds to
   prefer either character, so we should simply follow the manuscript.

Recall that it is not unusual to see much more drastic "identification
rules" in books of rules for human typesetters.

Lars Hellström

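[The context-dependent identification described above, together with the
"lost character" warning condition, could be sketched roughly as follows.
This is a hypothetical illustration in Python, not actual LaTeX or Omega
code; the names `IDENTIFICATIONS` and `identify` are invented for the
sketch, as are the exact language-keyed tables.]

```python
import sys
import unicodedata

# Pairs of easily-confused Unicode characters, keyed by current language.
# U+015F LATIN SMALL LETTER S WITH CEDILLA vs. U+0219 LATIN SMALL LETTER S
# WITH COMMA BELOW (and the analogous t pair), as in the example above.
IDENTIFICATIONS = {
    "romanian": {"\u015F": "\u0219", "\u0163": "\u021B"},
    "turkish":  {"\u0219": "\u015F", "\u021B": "\u0163"},
    # English: no grounds to prefer either character, so no entries --
    # the manuscript is simply followed as-is.
    "english":  {},
}

def identify(text, language):
    """Apply context-dependent character identifications on the input side.

    Each substitution is reported rather than performed silently, in the
    spirit of the "at least a warning must be issued" condition.
    """
    table = IDENTIFICATIONS.get(language, {})
    out = []
    for ch in text:
        if ch in table:
            repl = table[ch]
            print("Note: %s identified with %s (language: %s)"
                  % (unicodedata.name(ch), unicodedata.name(repl), language),
                  file=sys.stderr)
            out.append(repl)
        else:
            out.append(ch)
    return "".join(out)
```

For instance, `identify("pa\u015fte", "romanian")` would return the string
with U+0219 substituted, while the same call with `"english"` would leave
the manuscript untouched.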