MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C099C4.6300A400"
In-Reply-To:  <200102181055.f1IAt4i20466@smtp.wanadoo.es>
Content-class: urn:content-classes:message
Subject:      Re: LaTeX's internal char representation (UTF8 or Unicode?)
Date: Sun, 18 Feb 2001 17:02:19 +0100
Message-ID:  <v03110700b6b59e4db412@[195.100.226.143]>
From: "Hans Aberg" <haberg@MATEMATIK.SU.SE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C099C4.6300A400
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

At 11:51 +0100 2001/02/18, Javier Bezos wrote:
>>  Since we have been asked to provide input encoding changes for
>>  LaTeX within paragraphs, eg for individual words, something like
>>  this would happen if such a change appears, say, inside the
>>  argument of \section.
>
>A system to coordinate preprocess and "internal" process is necessary.

As I see it, the preprocessor should be able to handle mixed encodings,
so that the extended TeX (and LaTeX) sees only Unicode characters and
nothing else.

Also, I think that the use of multiple encodings in a single file is a
pretty transitory thing: MacOS X, due out in a regular version next
month, supports Unicode fully, so editors able to handle Unicode should
become accessible pretty soon (within a few years at most), as the
availability of Unicode on personal computers will push development a
great deal. And the reason for using multiple encodings is probably the
lack of editors that can handle Unicode.

So, I do not think it matters if, for now, one uses a seemingly
complicated system with additional files specifying the encoding:
people will probably soon want to translate their multiple-encoding
files into single-encoding Unicode files instead. (If you formerly wrote
files with a mixture of, say, Russian and Latin encodings, where the
correct rendering could only be seen by changing the settings of the
editor, then once you get hold of a Unicode editor, the first thing you
will want is to be rid of the bother of changing those settings all the
time. Thus, you will want a convenient way of converting your old files
to Unicode so that your new editor can read them. Therefore it is best
if these old mixed-encoding files already carry markup that admits an
easy conversion to Unicode.)

It is always difficult to judge the future, but, well, this is my guess.

  Hans Aberg

------_=_NextPart_001_01C099C4.6300A400--