From: "Hans Aberg"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: LaTeX's internal char prepresentation (UTF8 or Unicode?)
Date: Mon, 26 Feb 2001 20:34:45 +0100
In-Reply-To: <200102261652.QAA28993@penguin.nag.co.uk>

At 16:52 +0000 2001/02/26, David Carlisle wrote:
>> What characters are included in a set?
>you could look at the tables:-)

Yes, I did (apart from the fact that I could find no convenient archive to
download them from, which made the process excruciatingly slow on my computer).

>>  If it is a-zA-Z0-9 plus undotted ij,
>> one set has 64 characters, giving room for 1024/64 = 16 sets.
>
>It varies from set to set. the basic collections are
>
>letters (a-z A-Z)
>digits
>greek (including variant greek forms)
>
>some of the alphabets don't have greek some don't have greek or letters.

According to the TeXbook, there are 40 Greek characters not identical to
Latin ones. It might be tempting to add a full set of Greek letters, but in
_math_ it seems pointless: letters will mostly appear singly, with no other
suitable context information identifying them as Greek. (By contrast, in
Greek text, one knows from the context that they are semantically Greek
letters, and further they may be drawn from special Greek fonts, giving
them a slightly different look from the Latin letters, which may be drawn
from a different Latin font.)

If the Greek letters appear in shapes
  upright
  slanted
  bold
  bold slanted
that gives 160 characters.

This gives at most (1024 - 160)/64 = 13.5, that is, 13 Latin sets. I think these should be
  Bold
  Italic
  Bold Italic
  Double-struck
  Calligraphic
  Bold Calligraphic
  Script
  Bold Script
  Fraktur
  Sans-serif
  Bold Sans-serif
  Sans-serif Italic
  Sans-serif Bold Italic
with no "Bold Fraktur" and no "Monospace".
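
The counting above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant names are made up for illustration, and the figures (1024 positions, 40 Greek characters, four Greek shapes, 64 characters per Latin set) are simply the ones quoted in this message, not taken from any published table.

```python
# Budget sketch using the figures quoted in this message (assumptions, not a spec).
BLOCK_SIZE = 1024                    # total positions assumed available
LATIN_SET_SIZE = 26 + 26 + 10 + 2    # a-z, A-Z, 0-9, plus undotted i and j
GREEK_CHARS = 40                     # Greek characters not identical to Latin
GREEK_SHAPES = ["upright", "slanted", "bold", "bold slanted"]

# Positions consumed by the Greek characters in all four shapes.
greek_total = GREEK_CHARS * len(GREEK_SHAPES)

# Whole Latin sets that fit in what remains, and the positions left over.
latin_sets, leftover = divmod(BLOCK_SIZE - greek_total, LATIN_SET_SIZE)

print(f"Greek positions: {greek_total}")
print(f"Latin sets that fit: {latin_sets} (with {leftover} positions to spare)")
```

The 32 spare positions are half a Latin set, which is why the division gives 13.5 and only 13 complete sets fit.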

-- The monospace is not really a _math_ font: there is no _semantic_
difference in using a monospace over another font, not even when writing
computer language code. So strictly speaking, it is a form of rendering.

And the "Bold Fraktur" seems unnecessary. That is, unless somebody can
demonstrate that it is in actual use.

By contrast, I can think of a hypothetical example where Calligraphic and
Script are in use in the same formula: I think the "O" of order O(n) (as in
complexity of algorithms, for example) should be in the RSFS-like Script.
But it would be perfectly OK to have Calligraphic letters denoting some
other quantity (say categorical objects, even though some prefer Script for
that too). Well, anyway, one could without too much effort produce sensible
formulas where the two appear side by side, indicating different semantic
meanings.
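
As an illustration of such a formula, here is a small LaTeX sketch. It assumes the mathrsfs package for \mathscr (this message only says "RSFS like"), and the sentence itself is invented purely to put the two alphabets next to each other.

```latex
\documentclass{article}
\usepackage{mathrsfs} % assumption: provides \mathscr in the RSFS script style
\begin{document}
% Calligraphic C names a category; Script O marks the order of growth.
The set $\mathrm{Hom}_{\mathcal{C}}(A, B)$ can be enumerated in
$\mathscr{O}(n)$ steps.
\end{document}
```

Here the Calligraphic and the Script letters carry different meanings in the same sentence, so merging the two alphabets would lose information.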

But perhaps Unicode has already made up its mind, so there is nothing to do
about it...

  Hans Aberg
