MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0DC76.96C33100"
In-Reply-To:  <GDBLYN$Ife5ywbJjaHUQ5xE_bDZCYR7WbXxSsz@wanadoo.es>
Content-class: urn:content-classes:message
Subject:      Re: Multilingual Encodings Summary 2.2
Date: Mon, 14 May 2001 14:05:35 +0100
Message-ID:  <l03130300b72583874ed7@[130.239.20.144]>
From: =?iso-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@MATH.UMU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0DC76.96C33100
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 12.18 +0200 2001-05-14, jbezos wrote:
>Lars wrote
>> >For example, removing the fi ligature in Turkish. Or using an alternate
>> >orthography in languages with contextual analysis.
>>
>> That doesn't seem like metric transformations to me, but more like
>
>Actually, they aren't, but for some reason Knuth
>very likely understands, this information is included
>in the tfm files (text font *metrics*).

Don't overestimate Knuth's foresight here. He needed some place to store
the font-related information TeX would need, so he simply packed all kinds
of information into a single file.

>>   There is some concern that unifying Han characters may lead to confusion
>>   because they are sometimes used differently by the various East Asian
>>   languages. Computationally, Han character unification presents no more
>>   difficulty than employing a single Latin character set that is used to
>>   write languages as different as English and French.
>>
>> If they are not different in Unicode then there probably is no reason to
>> make them different in LaTeX either.
>
>As far as Unicode is concerned, that's right because
>Unicode doesn't deal with glyphs at all; but when we
>have to select a glyph from a font we need some
>additional information. (And even Unicode 3.1
>provides tag chars  for protocols not using
>"text" tags like xml or LaTeX.)

If Unicode doesn't consider them to be distinct characters, then I see no
reason why LaTeX should. In general, we don't (as users of TeX or some
extension) select a glyph from a font, as the font is already a mapping
from characters to glyphs. If the user wants a specific rendering of a
character then he should choose a font where the character is rendered in
that way, not request that each font should provide all alternative
renderings.

You may want to compare with the situation in the Latin script some
200--300 years ago. Some languages (e.g. French) were always set in
antiqua, whereas others (e.g. German) were always set in fraktur. Had
computers existed back then there would probably have been tag characters
for selecting antiqua or fraktur in the encodings used, but there would
have been little point in having distinct code points for the antiqua and
fraktur alphabets. Math is of course an exception, since the corresponding
letters aren't semantically equivalent in mathematical formulae.

>> Why should there exist characters which are not encoded using Unicode en
>> route from the mouth to the stomach, if we're anyway using Unicode for e.g.
>> hyphenation?
>
>Provided we are using Unicode for hyphenation.
>This is one of the main problems of TeX -- hyphenation
>depends on the font encoding (?).

You have to encode the hyphenation patterns somehow. As Unicode will cover
all known scripts it can be used as a universal encoding. Furthermore, I
thought that there were OCPs (acting approximately at \shipout time) that
converted from Unicode to the actual font encodings when they are not the
same. Is this not correct?

Lars Hellstr=F6m

------_=_NextPart_001_01C0DC76.96C33100--