From: "David Carlisle"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: default inputenc/fontenc tight to language
Date: Tue, 6 Feb 2001 17:31:23 +0100
Message-ID: <200102061631.QAA23642@nag.co.uk>
In-Reply-To: <200102061609.LAA21018@pluto.math.albany.edu> (hammond@CSC.ALBANY.EDU)
References: <200102061609.LAA21018@pluto.math.albany.edu>

> but not reasonable -- unless the
> processor, like David Carlisle's xmltex, is a TeX thing -- for it to
> know that a particular character must have \ensuremath applied.

That isn't clear. A Unicode text processor is supposed to know an awful
lot about each character. It has to "know" that combining characters
combine, and is supposed to know the default writing direction of every
character, and various other properties. The property of being a math
character is really just one of these. In fact it _is_ one of those;
see
http://www.unicode.org/Public/UNIDATA/UnicodeData.html


Informative Categories

 Abbr.    Description
  Lm      Letter, Modifier
  Lo      Letter, Other
  Pc      Punctuation, Connector
  Pd      Punctuation, Dash
  Ps      Punctuation, Open
  Pe      Punctuation, Close
  Pi      Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
  Pf      Punctuation, Final quote (may behave like Ps or Pe depending on usage)
  Po      Punctuation, Other
  Sm      Symbol, Math
^^^^^^^^^^^^^^^^^^^^^^
  Sc      Symbol, Currency
  Sk      Symbol, Modifier
  So      Symbol, Other
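
As a rough sketch of what "knowing" this property could look like in a
processor, the following uses Python's standard unicodedata module to read
the general category of a character. The helper names is_math_symbol and
math_symbols are made up for this illustration, and category Sm is only a
first approximation of "needs math mode": letters used in math (a Greek
alpha, say) are classified as letters, not as Sm.

```python
import unicodedata


def is_math_symbol(ch: str) -> bool:
    """Return True when the general category of ch is Sm (Symbol, Math)."""
    return unicodedata.category(ch) == "Sm"


def math_symbols(text: str) -> list:
    """Collect the characters of text whose general category is Sm."""
    return [ch for ch in text if is_math_symbol(ch)]


if __name__ == "__main__":
    # U+2211 is N-ARY SUMMATION, U+2264 is LESS-THAN OR EQUAL TO.
    for ch in ["a", "+", "$", "\u2211", "\u2264"]:
        print(repr(ch), unicodedata.category(ch), is_math_symbol(ch))
```

A TeX-based processor would need the same table in some other form; the
point is only that the information is part of the character database.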


One of the problems xmltex has is that it _doesn't_ know this stuff
(and doesn't combine combining characters, for example).
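
For comparison, here is a minimal sketch of the combining behaviour in
Python, again using only the standard unicodedata module. The function
names are hypothetical, and testing for a nonzero canonical combining
class is just one possible test; it says nothing about how xmltex would
have to implement this in TeX.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    """Return True when ch has a nonzero canonical combining class."""
    return unicodedata.combining(ch) != 0


def compose(text: str) -> str:
    """Combine base characters with following combining marks (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    composed = compose(decomposed)
    print(len(decomposed), len(composed), hex(ord(composed)))
```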

Unicode as currently devised hasn't got 2^32 characters, just 17 planes
of 2^16, but even so, that's probably enough. But whether the internal
canonical form is a Unicode number or a LaTeX-style 7-bit string \'e,
the issues of mapping between input encodings and this internal form,
and from there to font encodings, are probably about the same.
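
To make the numbers and the two candidate internal forms concrete, here is
a small Python sketch. Everything named in it (LATEX_FORM, to_internal,
to_latex) is hypothetical, and the one-entry table stands in for the much
larger mapping a real inputenc-style layer would need; the font-encoding
step is left out.

```python
import sys

PLANES = 17
PLANE_SIZE = 2 ** 16
CODE_SPACE = PLANES * PLANE_SIZE  # 1114112 code points, far below 2**32

# Hypothetical, deliberately tiny table: Unicode number -> LaTeX 7-bit form.
LATEX_FORM = {0xE9: r"\'e"}


def to_internal(raw: bytes, input_encoding: str) -> list:
    """Map bytes in some input encoding to Unicode numbers."""
    return [ord(ch) for ch in raw.decode(input_encoding)]


def to_latex(code_points) -> str:
    """Map Unicode numbers to the LaTeX form where the table has one."""
    return "".join(LATEX_FORM.get(cp, chr(cp)) for cp in code_points)


if __name__ == "__main__":
    print(CODE_SPACE, CODE_SPACE == sys.maxunicode + 1)
    # The same character arrives as different bytes in different encodings
    # but ends up as the same internal number.
    print(to_internal(b"\xe9", "iso-8859-1"), to_internal(b"\xc3\xa9", "utf-8"))
    print(to_latex(to_internal(b"caf\xe9", "iso-8859-1")))
```

Either way there is one table per input encoding on the way in and one
per font encoding on the way out; only the key in the middle changes.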


David
