From: "Frank Mittelbach"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: default inputenc/fontenc tight to language
Date: Fri, 2 Feb 2001 22:05:59 +0100
Message-ID: <14971.8503.549122.613285@istrati.zdv.uni-mainz.de>
In-Reply-To: <14970.60068.179603.570418@fell.open.ac.uk>

Chris wrote:

 > > a bit inconsistent that, isn't it?
 >
 > Not really: since input encoding really does mean just that.

i meant it is inconsistent that we got input encodings right but not font encodings
(or rather, we got font encodings right as well but missed out an important extra bit)

 > Once the text is `inside LaTeX' the input encoding is irrelevant: that
 > is the beauty and strength of the LaTeX text character model.

yes it is :-)

so input encodings are fine.

but the problem that i was trying to point at is this:

 assuming we have a bit of text in the internal LaTeX representation, eg this:

   Trank der G\"otter \M{d} Trank der ...

 then there is no way for LaTeX, without further help, to determine the best
 font encoding to typeset this in.

 why is this so?

 - one would first need to analyse the whole text to find out which collection
   of glyphs is needed (that would result in a number of possible encodings,
   but it might also result in the need for more than one encoding)

 - but which of the possible encodings to use can depend on factors like
   "do i have the desired fonts in this encoding or only in others?" ...

anyway, already the first analysis is a problem inside TeX because TeX works
sequentially, so you would need to implement a multi-pass system, learning about all
the snippets of text as you go along and then reusing that information on later
passes. looks like a nightmare to me.

so if TeX can't do it automatically, we have to tell it what to use, and with
NFSS2 that means telling it which font encodings to use at those points. And this
is bad because users shouldn't be forced to bother about this font only being available
in encoding A and that one in B and ...

Karsten pointed to some undocumented alpha code, autofe.sty, which attempts to
provide a solution for the problem. But this really is intended for a
different environment where you can (or can more easily) change font encodings as
you go along.

so back to the strange text above, and think about how some algorithm (like
autofe) would work on finding the right encodings. assuming we start in OT1:

 Trank der G   % no problem up to this point

 \"o           %* ahh, now this is in OT1 but it would be far better to use T1
               % now. but switching would be bad as well since we are in the
               % middle of a word ...
 tter          % so we are now either in T1 or OT1 depending on the decision
               % above

 \M{d}         % but this strange beast only exists in T4 so we have to switch

 Trank der     %* so what do we use now for this?
               %  T4 does contain those letters. do we carry on?

whatever happens at the points marked *, the typeset result would be a mess.


when we write

\fontencoding{FOO}\selectfont

we tell the system that we want it to select a font with the current
characteristics (ie family, shape, ...) in a very specific encoding, but what we
actually should only say is "the following text is in a certain glyph
collection, ie contains certain glyphs"

we unfortunately can't express the latter so we are forced to do the former.
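
to make "the former" concrete, here is a minimal sketch of an explicit request for a specific encoding. it assumes a standard LaTeX installation that provides the default Computer Modern family in both OT1 and T1 (the EC fonts); the sample text is the one used above, and the choice of these two encodings is only for illustration.

```latex
\documentclass{article}
% Load both encodings; the last one listed (OT1) stays the default.
\usepackage[T1,OT1]{fontenc}
\begin{document}

% Same internal text, font encoding requested explicitly each time.

% OT1: the umlaut is assembled from "o" plus a separate accent glyph.
{\fontencoding{OT1}\selectfont Trank der G\"otter}

% T1: the umlaut is a single precomposed glyph.
{\fontencoding{T1}\selectfont Trank der G\"otter}

\end{document}
```

both lines carry the same internal text; only the requested font encoding differs. nothing in the source says which glyphs the text needs, which is exactly the information that cannot be expressed.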

with a moving argument, eg a section head, this becomes a real problem. if the
section head is, say, in Russian (as in Denis' example) we have to somehow state
that the glyph collection for typesetting is one with cyrillic characters.

since we have no concept for this we can only express that it should be in the
encoding T2A or X2 or whatever, which is (technically) fine for the heading
itself being typeset. but passing the information about the FONT encoding to,
say, the toc is wrong, since the toc might be typeset with different fonts or
different sizes for which we do not have T2A fonts but only X2 fonts.
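
a small sketch of that effect follows. it uses T1 as a stand-in for the cyrillic encoding, purely so that it runs with fonts found in a standard installation; that substitution is an assumption of the sketch, but the mechanism is the same for any encoding.

```latex
\documentclass{article}
% Load both encodings; the last one listed (OT1) stays the default.
\usepackage[T1,OT1]{fontenc}
\begin{document}

\tableofcontents

% The explicit switch is part of the moving argument, so it is
% written to the .toc file together with the heading text.
\section{{\fontencoding{T1}\selectfont Trank der G\"otter}}

\end{document}
```

after two runs the toc entry is set in the explicitly requested encoding, whatever font the table of contents itself uses. if that font did not exist in the requested encoding, LaTeX would have to substitute another one, even though a different encoding with the same glyphs might have been available.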

this is, i think, a longer example of what Chris wrote:

 > > but would it help if the language has a tie
 > > to the [font] encoding?
 >
 > Whether the `intended font encoding' should be part of a moving
 > argument leads to an important question.
 >
 > Note the word `intended': will it always be the case that text from a
 > moving argument should be turned into glyphs using the same font encoding
 > as was used for the original text?

no, it need not; it only needs the same glyph collection.

so we would do better by tying "glyph collections" to languages and letting the
system worry about which actual font encoding to use given other constraints
during the typesetting process.

this is the kind of extension NFSS2 would need in my opinion.



frank
