From: "Barbara Beeton"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"
Subject: Re: LaTeX's internal char prepresentation (UTF8 or Unicode?)
Date: Sat, 17 Feb 2001 18:54:40 +0100

hans aberg writes

    There appears to be two variations, one based on the original TeX, and one
    with TeX having some kind of extensions.

    As for the second approach, it seems me that the internal representation
    should be 32-bit Unicode. As TeX does not seem well equipped handling the
    encoding issues, one should then hook up a preprocessor providing the
    suitable translations. Thus
        whatever encoding -> preprocessor -> UTeX
    This easy-to-write preprocessor can combine combining characters to single
    Unicode characters, if possible, or otherwise write them on a form that
    UTeX easily can handle, say by switching from postfix to prefix notation,
    or whatever. With further tweaking of the TeX engine it could even combine
    TeX combinations such as "--", "---" into single Unicode characters.
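
The pipeline described in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `preprocess` is hypothetical, Unicode NFC normalization stands in for "combine combining characters to single Unicode characters", and the dash mapping (en dash U+2013, em dash U+2014) is one plausible reading of the "--" and "---" remark, done here in the preprocessor rather than in the TeX engine.

```python
import unicodedata


def preprocess(raw: bytes, encoding: str) -> str:
    """Hypothetical preprocessor stage: decode, compose, map TeX dash ligatures."""
    # "whatever encoding" -> Unicode code points
    text = raw.decode(encoding)
    # Combine base letter + combining marks into one code point where Unicode has one.
    text = unicodedata.normalize("NFC", text)
    # Replace the longer ligature first so "---" is not consumed as "--" plus "-".
    text = text.replace("---", "\u2014")  # em dash
    text = text.replace("--", "\u2013")   # en dash
    return text
```

For instance, `preprocess("e\u0301".encode("utf-8"), "utf-8")` yields the single character U+00E9.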

while this would obviously work for text in natural languages,
unicode will never contain all the possible "embellished" letters
and symbols used in math.  (and this may include instances with two
or even more diacritics on a single letter or symbol.)  this set,
while not infinite, is much too large to want to address even using
the unicode private area.  but for latex (or any successor) to be
useful for the particular content for which tex was first developed,
this has to be taken into account.
                                                        -- bb
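
A small sketch of the point about math accents, assuming Unicode NFC normalization as the composition step: a natural-language letter such as e with acute composes to a single code point, while x with tilde and circumflex has no precomposed form and stays a sequence. The `ACCENTS` table and the function `to_prefix` are hypothetical, showing one way the leftover postfix marks could be rewritten in prefix notation using standard TeX math accent commands.

```python
import unicodedata

# Assumed, deliberately tiny mapping from combining marks to TeX math accents.
ACCENTS = {"\u0302": r"\hat", "\u0303": r"\tilde", "\u0304": r"\bar"}


def to_prefix(text: str) -> str:
    """Compose what Unicode can compose; wrap leftover marks as prefix commands."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in ACCENTS and out:
            # Postfix combining mark -> prefix command around the preceding item.
            out[-1] = f"{ACCENTS[ch]}{{{out[-1]}}}"
        else:
            out.append(ch)
    return "".join(out)


print(to_prefix("e\u0301"))         # one precomposed code point, U+00E9
print(to_prefix("x\u0303\u0302"))   # \hat{\tilde{x}}
```

Stacked marks nest from the inside out, so arbitrary embellishments are handled without requiring a dedicated code point for each combination.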
