Message-ID: <537C728F.8000604@latex-project.org>
Date: Wed, 21 May 2014 11:31:59 +0200
From: Frank Mittelbach
Subject: Re: Unicode math
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

In my opinion the Unicode consortium has not screwed up (backspace backspace backspace ...) has not found the best possible solution for math, and there is no way to *properly* reconcile the two worlds.

Unicode started out as an attempt to codify the plain-text letters of all languages. One of the most important axioms in that respect was the idea that a "letter" is an abstract entity, e.g., Latin-small-a, and that different glyphs in fonts all represent that single entity "a" regardless of the shape or form it takes. So attributes like bold or serif/sans etc. are all outside the scope of Unicode encoding. That makes sense if you try to convey textual meaning, as "word" has a meaning regardless of being in italics or bold or both. (Of course such attributes extend the semantics, e.g., bold may indicate a heading or italic some emphasis, but underlying that, "word" still has a meaning of its own in a language.)

The problem with math, though, is that symbols in math are traditionally not just defined by an abstracted shape; the mathematical community early on used additional attributes of glyphs to convey semantics. So bold lowercase Latin letters may denote vectors, and in one formula an integral symbol and a bold integral may have totally different semantics. On top of that, the semantics may change from field to field or even from paper to paper (so other than calling it a bold-integral there is no way to describe such symbols semantically).

The problem with this is that mathematicians have come up with using effectively any kind of symbol/letter to denote specific semantics, and long ago started to use all kinds of attributes (that Unicode, on the level of plain text, regards as irrelevant) to indicate semantics too. The main point here is that the moment that happens, the attributes become frozen and symbol+attribute combinations become relevant symbols in their own right.

As a result, to express the language of mathematics Unicode would have needed to codify all kinds of letter/symbol+attribute(s) as individual Unicode code points, which is a difficult if not impossible task. Nevertheless, they went for this approach to some extent by codifying mathematical alphabets (mainly digits + a-z + A-Z plus some Greek) and of course a large number of symbols. In the Unicode book it says:

    The alphabets in this block encode only semantic distinction, but not which font will be used to supply the actual plain, script, Fraktur [...]

    Characters from the Mathematical Alphanumeric Symbol block are not to be used for nonmathematical styled text. All mathematical alphanumeric symbols have compatibility decompositions to the base Latin and Greek letters. This does not imply that the use of these characters (I guess the base ones - Frank) is discouraged for mathematical use. Folding away such distinctions [..] is usually not desirable, however, as it loses the semantic distinction for which these characters are encoded.

That is all true and sensible, and to explicitly encode that something is a math-calligraphic S and not just a Latin S (that happens to be in some calligraphic font) is desirable when passing data from one application to the next, as the font information is likely to be lost and thus the semantics. However, it is by no means offering a full codification of mathematical semantics, so at the end of the day you may end up with a mixture of "properly" encoded material plus stuff that lost the semantic distinction. The good part is that it covers a lot, but it is not comprehensive by any means and can't be, due to the approach chosen.

It reminds me a bit of a talk I heard recently where somebody was advocating the use of sub-/superscript Unicode digits to avoid having to type _2 or ^3, arguing that this is easier, nicer, and more readable. Well, to me it isn't the moment you get to real math, because then it gets inconsistent and you end up with mixed syntax.

For the same reason I believe that it would have been better to approach math alphabets differently in Unicode and, instead of codifying a few (with limited letter sets), acknowledge the fact that this "language" has a meta level where symbol+attribute encodes semantics and not just the symbol as such. Anyway, this is neither here nor there, as this is what Unicode offers nowadays.

So where does it fail?

- in the case of attributed mathematical symbols, most prominently using bold as offered by the bm package, resulting in new symbols as far as the semantics are concerned
- in the case of multi-letter symbols (that require a fixed font (i.e., frozen attributes) but with kerning for aesthetic reasons)
- in the case of using alphabets which have not been considered (like two distinctive calligraphic alphabets in parallel, or old German \neq Fraktur (as my Algebra prof did) or Cyrillic or ...)
- in not supporting diacritics for those alphabets (minor case though)

LaTeX2e's math support codified most of the needs of the mathematics language, albeit only within its domain (that is, within the LaTeX syntax), i.e., it wasn't supporting any Unicode code points for math (as they didn't exist). So something like \mathbf was defining individual bold math letters (for which Unicode now has its own code points as long as they are basic Latin), but it was also offering this for word-like symbols such as \mathbf{Set}.

So if one now maps that to a full-fledged text font that supports kerning, you lose the code point semantic distinction outside LaTeX, and if you map it to the Unicode plane then you have to manually deal with kerning for multi-letter sequences (which is non-trivial and can't be perfect) or live with horrible spacing. Or you need to change the interface in LaTeX and offer different commands, or you change internals and distinguish between single-letter and multi-letter arguments. Or ...

frank
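
P.S. To make the \mathbf{Set} trade-off a bit more concrete, here is a minimal sketch in Python, using only the standard unicodedata module. The helper to_math_bold and the two offset constants are my own illustration (not anything LaTeX or unicode-math does internally); it assumes the bold Latin letters sit contiguously from U+1D400 (capitals) and U+1D41A (small letters) in the Mathematical Alphanumeric Symbols block. It shows that mapping a word to that block yields three independent math symbols, and that compatibility normalization (NFKC) folds them back to plain letters, which is exactly the "folding away" the Unicode book warns about.

```python
import unicodedata

# Assumption: bold Latin letters are contiguous in the
# Mathematical Alphanumeric Symbols block.
BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A


def to_math_bold(text):
    """Hypothetical helper: map basic Latin letters to math-bold code points."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # anything else is left untouched
    return "".join(out)


# "Set" becomes three separate symbols; a renderer sees no word to kern.
bold_set = to_math_bold("Set")
names = [unicodedata.name(ch) for ch in bold_set]

# Compatibility normalization folds the semantic distinction away again.
folded = unicodedata.normalize("NFKC", bold_set)

print(names)
print(folded)
```

Going the other way (keeping "Set" as base Latin letters in a bold text font) gives you the kerning but leaves nothing in the code points that says "this is a bold math symbol".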