MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0DC5F.4096C380"
Content-class: urn:content-classes:message
Subject:      Re: Multilingual Encodings Summary 2.2
Date: Mon, 14 May 2001 11:18:23 +0100
Message-ID:  <GDBLYN$Ife5ywbJjaHUQ5xE_bDZCYR7WbXxSsz@wanadoo.es>
From: "jbezos" <jbezos@WANADOO.ES>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0DC5F.4096C380
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Lars wrote
> >For example, removing the fi ligature in Turkish. Or using an
> >alternate orthography in languages with contextual analysis.
>
> That doesn't seem like metric transformations to me, but more like

Actually, they aren't, but for some reason that Knuth
very likely understands, this information is included
in the tfm files (TeX font *metrics*).

>   There is some concern that unifying Han characters may lead to
>   confusion because they are sometimes used differently by the
>   various East Asian languages. Computationally, Han character
>   unification presents no more difficulty than employing a single
>   Latin character set that is used to write languages as different
>   as English and French.
>
> If they are not different in Unicode then there probably is no
> reason to make them different in LaTeX either.

As far as Unicode is concerned, that's right, because
Unicode doesn't deal with glyphs at all; but when we
have to select a glyph from a font we need some
additional information. (Even Unicode 3.1 provides
language tag characters for protocols that do not use
textual markup such as XML or LaTeX.)
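
To make the tag mechanism concrete, here is a minimal sketch in
Python 3 of how the Unicode 3.1 tag characters carry a language code
inside plain text: U+E0001 LANGUAGE TAG is followed by tag characters
that mirror ASCII at an offset of U+E0000. The function names and the
sample word are only illustrative assumptions, not part of any library.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000           # tag characters mirror ASCII at this offset


def language_tag(code: str) -> str:
    """Encode an ASCII language code (e.g. "tr") as a tag sequence."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in code)


def strip_tag_characters(text: str) -> str:
    """Drop every character in the tag range U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


# Hypothetical sample: a Turkish word preceded by the tag for "tr".
tagged = language_tag("tr") + "fiyat"
print([f"U+{ord(c):05X}" for c in tagged[:3]])
print(strip_tag_characters(tagged))
```

A program that knows nothing about the tags can strip them and still
get the plain characters; one that does can use the language code when
selecting glyphs, for example to avoid the fi ligature in Turkish.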

> >[...] by doing so we are again creating a closed system
> >using its own conventions with no links with external tools adapted
> >to Unicode. I will be able to process a file and extract information

> Depends on what type of information it is. For information
> specifying the language almost certainly yes. If you want to move
> around information saying "the 8-bit characters in this piece of
> text should be interpreted according to the following input
> encoding" then I would say no (amongst other things because it
> would constitute a representation not known to other programs).

And I myself gave a good argument in favour of
that!

> Why should there exist characters which are not encoded using
> Unicode en route from the mouth to the stomach, if we're anyway
> using Unicode for e.g. hyphenation?

Provided we are using Unicode for hyphenation.
This is one of the main problems of TeX -- hyphenation
depends on the font encoding (?).

> Exactly in what way normalization should be applied and when
> clearly needs further study.

Agreed.

Javier

------_=_NextPart_001_01C0DC5F.4096C380--