Subject: Re: Multilingual Encodings Summary 2.2
Date: Mon, 14 May 2001 11:18:23 +0100
From: "jbezos"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"

Lars wrote
> >For example, removing the fi ligature in Turkish. Or using an alternate
> >orthography in languages with contextual analysis.
>
> That doesn't seem like metric transformations to me, but more like

Actually, they aren't, but for some reason that Knuth
very likely understands, this information is included
in the tfm files (TeX font *metrics*).
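
To make the Turkish example concrete, here is a minimal Python sketch of a ligature step that depends on the language. It is not how TeX reads ligature tables from a tfm file; the function name `apply_ligatures`, the set `NO_FI_LIGATURE` and the language codes are assumptions made only for illustration. The pair "fi" becomes the single character U+FB01 unless the text is marked as Turkish, where the ligature would blur the difference between dotted and dotless i.

```python
# Illustrative only: a toy ligature pass, not TeX's tfm ligature mechanism.
FI_LIGATURE = "\uFB01"  # LATIN SMALL LIGATURE FI

# Hypothetical set of languages in which the fi ligature is suppressed.
NO_FI_LIGATURE = {"tr"}


def apply_ligatures(text: str, lang: str) -> str:
    """Replace 'fi' with the fi ligature unless the language forbids it."""
    if lang in NO_FI_LIGATURE:
        return text
    return text.replace("fi", FI_LIGATURE)


if __name__ == "__main__":
    print(apply_ligatures("final", "en"))  # ligature applied
    print(apply_ligatures("fiyat", "tr"))  # left untouched for Turkish
```

The point of the sketch is that the decision needs the language as an input; the character codes alone do not carry it.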

>   There is some concern that unifying Han characters may lead to confusion
>   because they are sometimes used differently by the various East Asian
>   languages. Computationally, Han character unification presents no more
>   difficulty than employing a single Latin character set that is used to
>   write languages as different as English and French.
>
> If they are not different in Unicode then there probably is no reason to
> make them different in LaTeX either.

As far as Unicode is concerned, that's right because
Unicode doesn't deal with glyphs at all; but when we
have to select a glyph from a font we need some
additional information. (And even Unicode 3.1
provides tag chars for protocols not using
"text" tags like XML or LaTeX.)
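
As a rough illustration of those tag characters, the following Python sketch encodes a language code as a Unicode tag sequence: U+E0001 LANGUAGE TAG followed by tag characters that mirror ASCII at an offset of 0xE0000. The helper names `language_tag` and `read_language_tag` are hypothetical. Note that the Unicode Standard itself discourages this mechanism wherever a higher-level protocol can carry the language, so this only shows what the characters look like.

```python
import unicodedata

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror printable ASCII at this offset


def language_tag(code: str) -> str:
    """Encode an ASCII language code such as 'tr' as Unicode tag characters."""
    if not code.isascii() or not code.isprintable():
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def read_language_tag(sequence: str) -> str:
    """Decode a sequence produced by language_tag back to its ASCII code."""
    if not sequence.startswith(LANGUAGE_TAG):
        raise ValueError("not a language tag sequence")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in sequence[1:])


if __name__ == "__main__":
    tagged = language_tag("tr")
    # Show the official character names of the invisible tag characters.
    for ch in tagged:
        print(f"U+{ord(ch):05X} {unicodedata.name(ch)}")
    print(read_language_tag(tagged))
```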

> >[...], by doing so we are creating again a closed system
> >using its own conventions with no links with external tools adapted
> >to Unicode. I will be able to process a file and extract information

> Depends on what type of information it is. For information specifying the
> language almost certainly yes. If you want to move around information
> saying "the 8-bit characters in this piece of text should be interpreted
> according to the following input encoding" then I would say no (amongst
> other things because it would constitute a representation not known to
> other programs).

And I myself gave a good argument in favour of
that!

> Why should there exist characters which are not encoded using Unicode en
> route from the mouth to the stomach, if we're anyway using Unicode for e.g.
> hyphenation?

Provided we are using Unicode for hyphenation.
This is one of the main problems of TeX: hyphenation
depends on the font encoding (?).

> Exactly in what way normalization should be applied and when clearly needs
> further study.

Agreed.
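
For readers unfamiliar with the normalization question, a small Python sketch using the standard `unicodedata` module shows why it matters: the same accented letter can arrive as one code point or as a base letter plus a combining accent, and the two only compare equal after normalizing. The helper `same_text` is a hypothetical name, and the choice of NFC as the default form is an assumption, not a recommendation from this thread.

```python
import unicodedata

COMPOSED = "\u00E9"     # é as a single code point
DECOMPOSED = "e\u0301"  # e followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(COMPOSED == DECOMPOSED)           # raw comparison of code points
    print(same_text(COMPOSED, DECOMPOSED))  # comparison after normalization
```

Where this step runs (on input, before hyphenation, or before glyph selection) is exactly the open question.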

Javier