MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C08D5C.4AEE1B00"
In-Reply-To:  <14970.60068.179603.570418@fell.open.ac.uk>
References: <14968.34118.306909.315983@istrati.zdv.uni-mainz.de>            <200101312200.XAA09346@bar.loria.fr>            <14969.12533.759505.917813@istrati.zdv.uni-mainz.de>            <14970.60068.179603.570418@fell.open.ac.uk>
Content-class: urn:content-classes:message
Subject:      Re: default inputenc/fontenc tight to language
Date: Fri, 2 Feb 2001 22:05:59 +0100
Message-ID:  <14971.8503.549122.613285@istrati.zdv.uni-mainz.de>
From: "Frank Mittelbach" <frank.mittelbach@LATEX-PROJECT.ORG>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C08D5C.4AEE1B00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Chris wrote:

 > > a bit inconsistent that, isn't it?
 >
 > Not really: since input encoding really does mean just that.

i meant it is inconsistent that we got input encodings right but not font encodings
(or rather, we got font encodings right as well, but missed out an important extra bit)

 > Once the text is `inside LaTeX' the input encoding is irrelevant: that
 > is the beauty and strength of the LaTeX text character model.

yes it is :-)

so input encodings are fine.

but the problem that i was trying to point at is this:

 assuming we have a bit of text in the internal LaTeX representation, eg this:

   Trank der G\"otter \M{d} Trank der ...

 then there is no way for LaTeX, without further help, to determine the best
 font encoding to typeset this in.

 why is this so?

 - one would first need to analyse the whole text to find out which collection
   of glyphs is needed (that would result in a number of possible encodings,
   but it might also result in the need for more than one encoding)

 - but which of the possible encodings to use can depend on factors like:
   do i have the desired fonts in this encoding or only in others ...
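A toy sketch of that first analysis, in Python purely for illustration: collect the glyphs a piece of text needs, list the encodings that could typeset all of it on their own, and then filter by which encodings the desired font is actually available in. The encoding tables, the tokenizer and all function names are hypothetical, and the repertoires are tiny made-up subsets, not the real OT1/T1/T4 tables.

```python
import re
import string

# Toy repertoires: made-up subsets, NOT the real OT1/T1/T4 encoding tables.
LETTERS = frozenset(string.ascii_letters)
ENCODINGS = {
    "OT1": LETTERS,                # no precomposed umlauts in this toy model
    "T1": LETTERS | {'\\"o'},      # has a precomposed o-umlaut
    "T4": LETTERS | {"\\M{d}"},    # has the \M{d} glyph
}

# Hypothetical tokenizer: one token per glyph (\"x, \M{x}, or a non-space char).
TOKEN = re.compile(r'\\"[A-Za-z]|\\M\{[A-Za-z]\}|\S')

def glyphs_needed(text):
    """Collect every glyph the whole text needs."""
    return set(TOKEN.findall(text))

def covering_encodings(text, encodings=ENCODINGS):
    """Encodings that can typeset the whole text on their own (may be empty)."""
    needed = glyphs_needed(text)
    return sorted(name for name, glyphs in encodings.items() if needed <= glyphs)

def usable(candidates, available_fonts):
    """Second factor: keep only encodings the desired font exists in."""
    return [enc for enc in candidates if enc in available_fonts]

text = r'Trank der G\"otter \M{d} Trank der'
print(sorted(glyphs_needed(text) - LETTERS))
print(covering_encodings(text))          # no single encoding covers it
print(covering_encodings("Trank der"))
print(usable(covering_encodings("Trank der"), {"T1", "T4"}))
```

Even in this toy model the full example needs more than one encoding, while a plain snippet fits several, and font availability then narrows the choice.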

anyway, even the first analysis is a problem inside TeX, because TeX works
sequentially, so you would need to implement a multi-pass system, learning about all
the snippets of text as you go along and then reusing that information on later
passes. looks like a nightmare to me.
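A sketch of the multi-pass idea under the same toy assumptions. Each pass is sequential and only learns about glyphs as it meets them, but it records what it learned in a dictionary, standing in for something like an auxiliary file, which the next pass can read from the start. `run_pass`, `choose`, `PREFERENCE` and the tables are hypothetical.

```python
import re
import string

# Toy repertoires: made-up subsets, NOT the real OT1/T1/T4 tables.
LETTERS = frozenset(string.ascii_letters)
ENCODINGS = {"OT1": LETTERS, "T1": LETTERS | {'\\"o'}, "T4": LETTERS | {"\\M{d}"}}
PREFERENCE = ["OT1", "T1", "T4"]  # hypothetical order of preference
TOKEN = re.compile(r'\\"[A-Za-z]|\\M\{[A-Za-z]\}|\S')

def choose(needed):
    """Most preferred toy encoding covering all of `needed`, or None."""
    return next((enc for enc in PREFERENCE if needed <= ENCODINGS[enc]), None)

def run_pass(snippets, aux):
    """One sequential pass. `aux` stands in for an auxiliary file: the glyphs
    recorded by the previous pass (empty on the first pass)."""
    known = set(aux.get("glyphs", ()))
    decisions = []
    for snippet in snippets:
        known |= set(TOKEN.findall(snippet))  # learn as we go along
        decisions.append(choose(known))
    return decisions, {"glyphs": known}       # written out for the next pass

snippets = ["Trank der", r'G\"otter', "Trank der"]
aux = {}
for n in (1, 2):
    decisions, aux = run_pass(snippets, aux)
    print("pass", n, decisions)
```

On the first pass the opening `Trank der` gets OT1 because the umlaut has not been seen yet. Only the second pass, reading the recorded glyphs, can use one encoding for all three snippets.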

so if TeX can't do it automatically, we have to tell it what to use, and with
NFSS2 that means telling it which font encodings to use at those points. And this
is bad, because users shouldn't be forced to bother about this font being available
only in encoding A and that one only in B and ...

Karsten pointed to some undocumented alpha code, autofe.sty, which attempts to
provide a solution for the problem. But it is really intended for a
different environment, where you can change font encodings (or can do so more
easily) as you go along.

so back to the strange text above: think about how some algorithm (like
autofe) would go about finding the right encodings, assuming we start in OT1:

 Trank der G   % no problem up to this point

 \"o           %* ahh, now this is in OT1 but it would be far better to use
               % T1 now. but switching would be bad as well since we are in
               % the middle of a word ...
 tter          % so we are now either in T1 or OT1 depending on the
               % decision above

 \M{d}         % but this strange beast only exists in T4 so we have to
               % switch

 Trank der     %* so what do we use now for this?
               %  T4 does contain those letters. do we carry on?

whatever happens at the points marked *, the typeset result would be a mess.
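One possible greedy strategy, sketched in Python on the same toy model. This is not a description of how autofe.sty works; it only makes the decision points explicit. The rule is to stay in the current encoding until a glyph is missing, then switch to the first encoding in a fixed, hypothetical preference order that has it.

```python
import re
import string

# Toy repertoires again: made-up subsets, NOT the real encoding tables.
LETTERS = frozenset(string.ascii_letters)
ENCODINGS = {
    "OT1": LETTERS,
    "T1": LETTERS | {'\\"o'},
    "T4": LETTERS | {"\\M{d}"},
}
PREFERENCE = ["OT1", "T1", "T4"]  # hypothetical order to try when switching
# One glyph per token, plus whitespace runs so word boundaries stay visible.
TOKEN = re.compile(r'\\"[A-Za-z]|\\M\{[A-Za-z]\}|\s+|\S')

def greedy_switch(text, start="OT1"):
    """Return the switches as (glyph, new_encoding, mid_word) triples plus the
    encoding in force at the end of the text."""
    tokens = TOKEN.findall(text)
    current, switches = start, []
    for i, tok in enumerate(tokens):
        if tok.isspace() or tok in ENCODINGS[current]:
            continue  # nothing to decide: keep the current encoding
        # raises StopIteration if no toy encoding has the glyph
        new = next(enc for enc in PREFERENCE if tok in ENCODINGS[enc])
        mid_word = i > 0 and not tokens[i - 1].isspace()
        switches.append((tok, new, mid_word))
        current = new
    return switches, current

switches, final = greedy_switch(r'Trank der G\"otter \M{d} Trank der')
for glyph, enc, mid_word in switches:
    print(glyph, "->", enc, "(mid-word)" if mid_word else "")
print("final encoding:", final)
```

Both starred problems show up here. The switch to T1 happens in the middle of `G\"otter`, and the final `Trank der` stays in T4 only because this rule never switches back. A rule that returned to OT1 would be just as arbitrary.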


when we write

\fontencoding{FOO}\selectfont

we tell the system that we want it to select a font with the current
characteristics (ie family, shape, ...) in a very specific encoding, but what we
should actually say is only "the following text is in a certain glyph
collection, ie contains certain glyphs".

unfortunately we can't express the latter, so we are forced to do the former.
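The difference can be modelled with two toy lookup functions. One asks for an exact encoding, which is effectively what `\fontencoding{FOO}\selectfont` does. The other asks for a glyph collection and lets the system choose among the encodings a family happens to be installed in. Family names, collection names and both tables are hypothetical.

```python
# Hypothetical font tables: which encodings each family is installed in, and
# which glyph collections each encoding covers (toy data, not real tables).
FAMILY_ENCODINGS = {
    "heading-serif": {"T1", "T2A"},
    "toc-sans": {"T1", "X2"},
}
ENCODING_COLLECTIONS = {
    "OT1": {"latin"},
    "T1": {"latin"},
    "T2A": {"latin", "cyrillic"},
    "X2": {"cyrillic"},
}

def select_by_encoding(family, encoding):
    """Request one specific encoding; None if the family lacks it."""
    return (family, encoding) if encoding in FAMILY_ENCODINGS[family] else None

def select_by_collection(family, collection):
    """Request a glyph collection; pick any encoding of this family that covers
    it (alphabetical order here, as an arbitrary tie-break)."""
    for enc in sorted(FAMILY_ENCODINGS[family]):
        if collection in ENCODING_COLLECTIONS[enc]:
            return (family, enc)
    return None

print(select_by_encoding("toc-sans", "T2A"))
print(select_by_collection("toc-sans", "cyrillic"))
print(select_by_collection("heading-serif", "cyrillic"))
```

Asking for T2A fails for a family that exists only in X2. Asking for "Cyrillic glyphs" succeeds with whichever suitable encoding that family has.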

with moving arguments, eg a section head, this becomes a real problem. if the
section head is, say, in Russian (as in Denis' example) we have to somehow state
that the glyph collection for typesetting is one with Cyrillic characters.

since we have no concept for this, we can only express that it should be in the
encoding T2A or X2 or whatever, which is (technically) fine for typesetting the
heading itself. but passing the information about the FONT encoding on to, say,
the toc is wrong, since the toc might be typeset with different fonts or
different sizes for which we do not have T2A fonts but only X2 fonts.

this is, i think, a longer example of what Chris wrote:

 > > but would it help if the language has a tie
 > > to the [font] encoding?
 >
 > Whether the `intended font encoding' should be part of a moving
 > argument leads to an important question.
 >
 > Note the word `intended': will it always be the case that text from a
 > moving argument should be turned into glyphs using the same font encoding
 > as was used for the original text?

no, it need not; it only needs the same glyph collection.

so we would do better to tie "glyph collections" to languages and let the
system worry about which actual font encoding to use, given the other
constraints during the typesetting process.

this is the kind of extension NFSS2 would need in my opinion.
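A sketch of that proposal, using hypothetical names and toy tables. The moving argument records only the language. The language maps to a glyph collection, and each context (heading or toc) resolves that collection against its own fonts. With these tables the same Russian heading comes out in T2A in the heading font and in X2 in the toc font. A recorded encoding of T2A would have no font at all in the toc.

```python
# Hypothetical tables (toy data, not real font or language setups).
LANGUAGE_COLLECTION = {"russian": "cyrillic", "german": "latin"}
FAMILY_ENCODINGS = {"heading-serif": {"T1", "T2A"}, "toc-sans": {"T1", "X2"}}
ENCODING_COLLECTIONS = {"T1": {"latin"}, "T2A": {"latin", "cyrillic"},
                        "X2": {"cyrillic"}}

def resolve(language, family):
    """Pick an encoding of `family` covering the language's glyph collection
    (alphabetical order as an arbitrary tie-break)."""
    collection = LANGUAGE_COLLECTION[language]
    for enc in sorted(FAMILY_ENCODINGS[family]):
        if collection in ENCODING_COLLECTIONS[enc]:
            return enc
    raise LookupError(f"{family} has no encoding covering {collection}")

# The section head goes to the toc with its language, not with an encoding.
toc_entry = {"language": "russian", "title": "(heading text)"}

heading_enc = resolve(toc_entry["language"], "heading-serif")
toc_enc = resolve(toc_entry["language"], "toc-sans")
print(heading_enc, toc_enc)
# Had the heading's encoding been recorded instead, the toc would ask for it:
print(heading_enc in FAMILY_ENCODINGS["toc-sans"])
```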


frank

------_=_NextPart_001_01C08D5C.4AEE1B00--