From: Lars Hellström
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: Multilingual Encodings Summary 2.2
Date: Mon, 14 May 2001 14:05:35 +0100

At 12.18 +0200 2001-05-14, jbezos wrote:
>Lars wrote
>> >For example, removing the fi ligature in Turkish. Or using an alternate
>> >ortography in languages with contextual analysis.
>>
>> That doesn't seem like metric transformations to me, but more like
>
>Actually, they aren't, but for some reason Knuth
>very likely understands, this information is included
>in the tfm files (text font *metrics*).

Don't overestimate Knuth's foresight here. He needed some place to store
the font-related information TeX would need, so he simply packed all kinds
of information into a single file.

>>   There is some concern that unifying Han characters may lead to confusion
>>   because they are sometimes used differently by the various East Asian
>>   languages. Computationally, Han character unification presents no more
>>   difficulty than employing a single Latin character set that is used to
>>   write languages as different as English and French.
>>
>> If they are not different in Unicode then there probably is no reason to
>> make them different in LaTeX either.
>
>As far as Unicode is concerned, that's right because
>Unicode doesn't deal with glyphs at all; but when we
>have to select a glyph from a font we need some
>additional information. (And even Unicode 3.1
>provides tag chars for protocols not using
>"text" tags like xml or LaTeX.)

If Unicode doesn't consider them to be distinct characters, then I see no
reason why LaTeX should. In general, we don't (as users of TeX or some
extension) select a glyph from a font, as the font is already a mapping
from characters to glyphs. If the user wants a specific rendering of a
character then he should choose a font where the character is rendered in
that way, not request that each font should provide all alternative
renderings.
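
To make the "font as a mapping" view concrete, here is a minimal sketch in Python. The two fonts and all glyph names are invented for illustration; nothing here corresponds to a real font file or to how TeX stores fonts. The point is only that the same character string yields different glyphs depending on which font is selected, without the text itself carrying any glyph information.

```python
# Hypothetical fonts: each one is just a mapping from characters to glyph
# names. The glyph names are invented for this illustration.
FONT_JA = {"直": "choku.ja-style", "a": "a.mincho"}
FONT_ZH = {"直": "zhi.zh-style", "a": "a.song"}


def render(text, font):
    """Return the glyph names that the given font assigns to each character."""
    return [font[ch] for ch in text]


# The character data is identical in both cases; only the font differs.
text = "直a"
glyphs_ja = render(text, FONT_JA)
glyphs_zh = render(text, FONT_ZH)
```

Under this model, asking for a particular rendering of a unified Han character amounts to choosing `FONT_JA` or `FONT_ZH`, not to adding a second code point.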

You may want to compare with the situation in the Latin script some
200--300 years ago. Some languages (e.g. French) were always set in
antiqua, whereas others (e.g. German) were always set in fraktur. Had
computers existed back then there would probably have been tag characters
for selecting antiqua or fraktur in the encodings used, but there would
have been little point in having distinct code points for the antiqua and
fraktur alphabets. Math is of course an exception, since the corresponding
letters aren't semantically equivalent in mathematical formulae.
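
The math exception can be checked against Unicode itself, which (since version 3.1) assigns separate code points to mathematical fraktur letters while leaving text fraktur to font selection. A small sketch using Python's standard `unicodedata` module; the variable names are chosen for this example only.

```python
import unicodedata

plain_a = "A"                  # U+0041, ordinary text letter
math_fraktur_a = "\U0001D504"  # distinct code point used in formulae

# The math letter has its own character name ...
name = unicodedata.name(math_fraktur_a)

# ... but compatibility normalization folds it back to the plain letter,
# which is what one would expect of a mere font variant in running text.
folded = unicodedata.normalize("NFKC", math_fraktur_a)
```

In running text the fraktur/antiqua distinction stays a font matter; only in formulae does the difference get a code point of its own.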

>> Why should there exist characters which are not encoded using Unicode en
>> route from the mouth to the stomach, if we're anyway using Unicode for e.g.
>> hyphenation?
>
>Provided we are using Unicode for hyphenation.
>This is one of the main problems of TeX -- hyphenation
>depends on the font encoding (?).

You have to encode the hyphenation patterns somehow. As Unicode will cover
all known scripts it can be used as a universal encoding. Furthermore I
thought that there were OCPs (acting approximately at \shipout time) that
converted from Unicode to the actual font encodings when they are not the
same. Is this not correct?
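
As a rough sketch of that division of labour (not of how OCPs are actually implemented), the Python below keeps hyphenation data in Unicode and converts to an output encoding only at the very end. The hyphenation table is a hypothetical stand-in for real patterns, and the two Python codecs merely play the role of two different font encodings.

```python
# Hypothetical hyphenation data keyed by Unicode strings. Real TeX patterns
# work differently; this lookup table is an assumption for illustration.
HYPHENATION = {"Hellström": ["Hell", "ström"]}


def hyphenate(word):
    """Split a Unicode word at its (hypothetical) hyphenation points."""
    return HYPHENATION.get(word, [word])


def to_font_encoding(text, codec):
    """Late conversion step: map Unicode text to an 8-bit output encoding."""
    return text.encode(codec)


# Hyphenation happens once, on Unicode text ...
parts = hyphenate("Hellström")

# ... and only the final step depends on the target encoding.
latin1 = [to_font_encoding(p, "iso-8859-1") for p in parts]
macroman = [to_font_encoding(p, "mac_roman") for p in parts]
```

The break points are found before any font encoding is involved, so the same patterns serve both outputs even though the byte for "ö" differs between them.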

Lars Hellström
