MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C2C985.A2441780"
In-Reply-To:  <Pine.LNX.4.44.0301311753330.4431-100000@gilas>
References: <15903.14792.193451.96963@istrati.mittelbach-online.de>            <Pine.LNX.4.44.0301311753330.4431-100000@gilas>
Content-class: urn:content-classes:message
Subject:      Re: latex/3480: Support for UTF-8 missing in inputenc.sty
Date: Sat, 1 Feb 2003 00:59:38 +0100
Message-ID: A<15931.3562.730605.294877@istrati.mittelbach-online.de>
Thread-Topic:      Re: latex/3480: Support for UTF-8 missing in inputenc.sty
Thread-Index: AcLJhaMTq6F2s8yVRjyE6dAM5TPcRw==
From: "Frank Mittelbach" <frank.mittelbach@LATEX-PROJECT.ORG>
To: <LATEX-L@listserv.uni-heidelberg.de>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@listserv.uni-heidelberg.de>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C2C985.A2441780
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Roozbeh Pournader writes:

 > > what we try is to provide a utf8 input encoding, how likely is it that some
 > > editor or application generates that Adobe thing? not very i would guess (at
 > > least not now) therefore i would not assign anything.
 >
 > Something that may happen:
 >
 > 1. A TeX document typeset with a PS Type 1 font will have the dotlessj
 > somewhere. After being converted to PDF, you will have the glyph in a PDF
 > document. Adobe tools see a 'dotlessj' there.
 >
 > 2. Someone copies and pastes it from Acrobat Reader into a document using
 > an editor that supports Adobe private use characters. He sees a dotlessj
 > there.

which "some" editor is that? i'm not saying it is not possible, i'm just
saying that as long as something is a) not very likely and b) potentially
controversial we should not, as a first step, make a fixed assignment ...
 >
 > 3. The output is fed back into LaTeX.

not a problem, what would happen is that we get that char U+F6BE
and would say, sorry, nothing set up for this. Then all it needs is

\DeclareUnicodeChar{F6BE}{\j}  % already forgotten what today's syntax is :-)

in the preamble of the document and off we go. 'course if that becomes the
standard situation we might as well put it in, right now i would leave it open
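
A minimal sketch of such a preamble declaration. It assumes the command
name under which this interface was later released in inputenc's utf8
support, \DeclareUnicodeCharacter, rather than the name recalled from
memory above, and a pdfLaTeX run on a UTF-8 encoded source file:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
% Assumption: map the Adobe private-use code point U+F6BE to the
% LICR object \j (dotless j). This is a per-document choice, not a
% kernel default.
\DeclareUnicodeCharacter{F6BE}{\j}
\begin{document}
% Typing the U+F6BE character itself in the source should now give
% the same result as writing \j explicitly.
A dotless j: \j
\end{document}
```

Keeping the declaration in the document preamble leaves the kernel free of
any fixed assignment for a private-use code point.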

 > Unicode doesn't distinguish that much between text and math characters. It
 > says somewhere that you may use a math character as a bullet or something.
 > I guess the best way to implement this is if you saw the character in text
 > mode it is \textasteriskcentered and if you saw it in math mode it is '*'.

that's not the way it works in TeX, is it? at the time the input encoding is
translated to LICR we are before the decision for "text" or "math".  the
naming conventions for the LICR objects are a bit dubious here as they often
say "\text..." but that reflects the major goal for them, ie to make the LICR
objects work in text and with different font encodings.

note that any LICR object, say, \"a is first of all only an abstract name for
the character umlaut-a. it is not the instruction "put an accent on a", nor is
\textsterling the pound glyph; it is the abstract name for the character
pound sterling.
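
As an illustration of the "abstract name" point: the font encoding
definition files attach a different realisation to the same LICR object.
The declarations below are recalled from the standard ot1enc.def and
t1enc.def files; treat the slot numbers as an assumption and check them
against the installed files before relying on them:

```latex
% OT1 has no precomposed umlaut-a, so \"a is built with an accent
% taken from slot 127 of the current font.
\DeclareTextAccent{\"}{OT1}{127}
% T1 has a real glyph for it, so the same \"a selects slot 228.
\DeclareTextComposite{\"}{T1}{a}{228}
% \textsterling is a plain symbol slot in T1.
\DeclareTextSymbol{\textsterling}{T1}{191}
```

The input \"a is the same in both cases; only the font encoding in force
decides whether an accent construction or a single glyph is used.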

technically, all the (text)-font-encoding commands (and the majority of LICR
objects are font-encoding commands) only work in TeX text and not in TeX math
today, which is why naming them \text... was useful at one stage.

the inpmath proposal adds a new dimension to that by basically allowing one to
define a mapping from LICR to math chars/commands/constructs.

if i were to start afresh then the LICR objects should probably get names which
are a bit more neutral, eg \LICR... but then this isn't the way it developed
so we are more or less stuck with the current set of names.

it might well be that U+2217 should be translated to \textasteriskcentered
when inpmath (or rather its successor implementation) is incorporated but as
long as this isn't the case i would not map something that is only likely to
come up in the middle of a math formula to something that LaTeX is going to
choke on if surrounded by $...$
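
For a single document the text/math split can still be made by the user at
typesetting time. A sketch, assuming a LaTeX kernel recent enough to provide
\TextOrMath and the \DeclareUnicodeCharacter interface of inputenc's utf8
support; it is a per-document workaround, not the kernel mapping under
discussion:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
% Assumption: a literal U+2217 in the source should give
% \textasteriskcentered in text and \ast in math.
\DeclareUnicodeCharacter{2217}{%
  \TextOrMath{\textasteriskcentered}{\ast}}
\begin{document}
% The mode test runs when the character is typeset, so the same
% declaration serves both contexts written out here:
Text: \textasteriskcentered\ and math: $a \ast b$.
\end{document}
```

The decision is taken when the character is typeset, not when the input
encoding is translated, which is why it has to live in the definition the
code point expands to.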

 > Anyway, what is the usage of \textasteriskcentered? I may be able to
 > follow it up with Unicode guys and see if we need a character for that.

the only common usage in LaTeX (i think) is as a bullet for some itemize level

good night
frank

------_=_NextPart_001_01C2C985.A2441780--