From: "Achim Blumensath"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"
Date: Wed, 14 Feb 2001 09:41:51 +0100
Subject: Re: Multilingual Encodings Summary
In-Reply-To: <14985.47705.856506.806342@istrati.zdv.uni-mainz.de>

Hello,

On Tue, 13 Feb 2001, Frank Mittelbach wrote:
> > - On all major platforms, support for editing and displaying UTF8
> >   exists and either is currently moving into mass deployment.  Major
> >   programming languages have UTF8 libraries, so the basic
> >   infrastructure for UTF8 is or will be in place shortly.
>
> remains to be seen. in the long term most likely yes, but how many of the
> people on this list can easily (in their favorite editing system) edit or
> generate a utf8 encoded file? hands up?

The standard encoding of BeOS is UTF8. I don't know whether the number of
TeX installations under BeOS exceeds, say, 100, though.

I don't think that Omega or NTS will replace TeX anytime soon, so here
are some rough ideas on how to implement Unicode support in TeX:

(a) Internally, Unicode characters can be encoded as command sequences of
the form \<hexadecimal code point>, e.g., `A' (U+0041) would become `\0041'.

(b) Each font would define these sequences appropriately, e.g.,
`\def\0041{A}'. Characters not included in the font would raise an
error message.

(c) To convert the input file to the internal representation, one could
write a preprocessor in TeX that is invoked by the \documentclass
command. That's IMHO the easiest way, and I don't think the runtime penalty
would be that great. The preprocessor should leave command sequences and
braces alone, e.g., `\begin{bar}' would become `\begin{\0062\0061\0072}'.
The only problem I see with this approach is \catcode changes.
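To make (a) concrete: an 8-bit TeX reads a UTF-8 file byte by byte, so `ä' arrives as the two bytes C3 A4. It must be assembled into the code point U+00E4 before it can become `\00E4'. The following Python sketch models that step; it is not TeX code. Upper-case hex with at least four digits is an assumption. Code points above U+FFFF simply get more digits, e.g. `\1D400'. The decoder is deliberately minimal and does not reject overlong encodings or surrogates.

```python
def utf8_to_code_points(data: bytes) -> list[int]:
    # Decode UTF-8 by hand, as a byte-oriented TeX macro would have to.
    # Minimal: overlong encodings and surrogates are not rejected.
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:            # 0xxxxxxx: ASCII, one byte
            length, cp = 1, lead
        elif 0xC0 <= lead < 0xE0:  # 110xxxxx: start of a two-byte sequence
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:  # 1110xxxx: start of a three-byte sequence
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:  # 11110xxx: start of a four-byte sequence
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated UTF-8 sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:  # continuation bytes must look like 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)
        points.append(cp)
        i += length
    return points


def internal_name(cp: int) -> str:
    # Internal command name: backslash plus at least four upper-case hex digits.
    return "\\" + format(cp, "04X")


print(" ".join(internal_name(cp) for cp in utf8_to_code_points("Aä€".encode("utf-8"))))
# prints: \0041 \00E4 \20AC
```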
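For (b), a font's definition file could be generated from a table mapping code points to font slots. The table `EXAMPLE_FONT' and its slot numbers are hypothetical. Writing each glyph as `\char"<slot>' is also an assumption. One TeX detail matters here. With digits at their usual category code (12), TeX reads `\0041' as the control symbol `\0' followed by the characters `041'. So `\def\0041{A}' would actually define `\0' as a macro that must be followed by `041'. For that reason the sketch uses `\csname 0041\endcsname'. Also, an undefined `\csname' name silently becomes `\relax' instead of raising an error. The error message from (b) would therefore need an explicit check; `missing_characters' stands in for that check here.

```python
# Hypothetical font table: Unicode code point -> slot in an 8-bit font.
# The slot numbers are invented for illustration only.
EXAMPLE_FONT = {0x0041: 0x41, 0x0061: 0x61, 0x00E4: 0xE4}


def font_definitions(slots: dict[int, int]) -> str:
    # One definition per glyph the font provides. \csname is used because,
    # with digits at catcode 12, TeX would tokenize \0041 as \0 followed by 041.
    lines = []
    for cp, slot in sorted(slots.items()):
        name = format(cp, "04X")
        lines.append(f'\\expandafter\\def\\csname {name}\\endcsname{{\\char"{slot:02X}}}')
    return "\n".join(lines)


def missing_characters(code_points, slots: dict[int, int]) -> list[str]:
    # Names of characters the text needs but the font lacks; in TeX these
    # would need an explicit check, since an undefined \csname becomes \relax.
    return sorted({format(cp, "04X") for cp in code_points if cp not in slots})


print(font_definitions(EXAMPLE_FONT))
print(missing_characters([0x41, 0x20AC], EXAMPLE_FONT))  # ['20AC']
```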
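The scan in (c) can be modelled in Python as below; this is a sketch of the logic, not TeX code. It works on text already decoded from UTF-8 and emits the email's `\XXXX' notation literally. Passing whitespace and plain TeX's other special characters (`$ & # ^ _ ~ %') through unchanged is an assumption. The email itself names only command sequences and braces. The scanner hard-codes the standard category codes. Anything that changes them mid-document, such as `\makeatletter' turning `@' into a letter or verbatim text, would therefore be mis-scanned. This is the \catcode problem mentioned in (c).

```python
# Characters copied unchanged. Braces come from the email; whitespace and the
# other plain-TeX special characters are an assumption of this sketch.
PASS_THROUGH = set("{}$&#^_~% \t\r\n")


def preprocess(text: str) -> str:
    # Rewrite every ordinary character as \XXXX, leaving TeX markup alone.
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            # Control word: backslash followed by ASCII letters, copied verbatim.
            j = i + 1
            while j < n and text[j].isascii() and text[j].isalpha():
                j += 1
            if j == i + 1 and j < n:
                j += 1  # control symbol such as \% or \\: backslash plus one character
            out.append(text[i:j])
            i = j
        elif ch in PASS_THROUGH:
            out.append(ch)
            i += 1
        else:
            out.append("\\" + format(ord(ch), "04X"))
            i += 1
    return "".join(out)


print(preprocess(r"\begin{bar}"))  # \begin{\0062\0061\0072}
```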

Any thoughts?

Achim
--
________________________________________________________________________
                                                              | \_____/ |
   Achim Blumensath                                          \O/ \___/\ |
   Mathematische Grundlagen der Informatik                   =o=  \ /\ \|
   www-mgi.informatik.rwth-aachen.de/~blume                  /"\   o----|
____________________________________________________________________\___|
