MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C09148.9469AC00"
In-Reply-To:  <Pine.LNX.4.10.10102071403350.4028-100000@Sina.sharif.ac.ir>
References: <14975.56331.365469.731085@istrati.zdv.uni-mainz.de>            <Pine.LNX.4.10.10102071403350.4028-100000@Sina.sharif.ac.ir>
Content-class: urn:content-classes:message
Subject:      Re: default inputenc/fontenc tight to language
Date: Wed, 7 Feb 2001 21:41:53 +0100
Message-ID:  <14977.45841.640881.805735@istrati.zdv.uni-mainz.de>
From: "Frank Mittelbach" <frank.mittelbach@LATEX-PROJECT.ORG>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Status: R

This is a multi-part message in MIME format.

------_=_NextPart_001_01C09148.9469AC00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

 > > who will? the user groups? for many languages there isn't a user group
 >
 > There are many interested experts around for those languages without a
 > user group. One of the gathering places is the Omega mailing list.

i know that, but that doesn't mean that any of those groups, or the user
groups for that matter, are necessarily qualified to decide on a standard.

just to pick a random example, take my varioref package: the language
support in there comes from users, and it gets changed every now and then
because i get claims that such and such is not the right phrasing. how
should i decide if people from one and the same country claim that the
wording doesn't sound correct?

and changing the default midway (as i did several times in the case of
varioref) is really bad since it makes old documents invalid. but i had to
change it because it turned out that one or the other phrasing was indeed
incorrect.

you can argue that a standard defined by the people interested is better
than none. but it is also true that, if at all possible, you should stick
with a default once it is decided. so the problem is to find out when you
are likely to have enough data to make a decision.

so to come back to inputencs (which the above really was about):

 - right now LaTeX by default lets 8bit chars pass if inputenc is not
   loaded. this is an unfortunate fact of life: no package, only a kernel
   modification, could change that, and within 2e there will be no such
   kernel modification, so we have to live with that for the moment.

 - but i consider this really problematic because the upper part of 8bit
   is unknown territory, and i do not subscribe to Thierry's approach of
   using straight 8bit plus a T1 encoded font and hoping all works out
   well. it is true that for certain languages (including Thierry's and my
   own) it does work if i'm on the right kind of computer, but for others
   it does not, and it certainly wouldn't work if the font encoding
   mechanisms were extended to allow switching encodings according to font
   availability, as suggested.

 - one can summarize the current situation as follows: LaTeX defines a
   default which is "pass whatever is coming straight to the font
   encoding". that requires the input encoding used and the font encoding
   to be the same, and it limits the use of fonts very drastically. it is
   a straight extension of what Don did with 7bit, with the slight
   difference that for 7bit most keyboard encodings are identical.
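
a minimal sketch of the alternative to that default, in which the document
states both encodings explicitly. latin1 and T1 are assumptions picked for
illustration only; the file has to be saved in whatever encoding is named:

```latex
\documentclass{article}
% declare how the bytes in this file are to be interpreted
% (latin1 is an assumed example; use whatever the editor really saves)
\usepackage[latin1]{inputenc}
% choose the font encoding independently of the input encoding
\usepackage[T1]{fontenc}
\begin{document}
8bit letters typed here are first mapped to LaTeX's internal
representation and only then to a slot in the font.
\end{document}
```

with both declarations present, the input encoding and the font encoding
no longer need to coincide.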

I would propose that a follow up kernel (call it ltx3 or whatever, eg a
consolidated version emerging one day from the currently developed x...
packages) would by default make the upper half an error if no input
encoding is specified. Sorry Thierry :-) but you shouldn't feel that bad
about it: a) i'm known to change my mind and b) processors are so fast
these days that you can really work with something like inputenc without
problems; you will not notice it.

in that case you get access to 8bit characters only by specifying an
input/keyboard encoding, but at the same time you are assured that the
document contains all the information necessary to actually process it
correctly elsewhere. you also do not have the potential problem, reported
by =C9ric, that users do not notice that half their letters (ie those with
accents) vanished. the letters wouldn't silently vanish; they would
produce error messages.

now providing default input encodings depending on language would allow a
certain number of people (if you are lucky with your choice, the larger
part of the LaTeX users) to leave out *one* line in the preamble of the
document. but at the same time it would mean that people who naively just
use any key on their keyboard, but have a keyboard incompatible with the
default, would run into exactly the same problem =C9ric reported: they
would now get wrong output without noticing. so then, perhaps not =C9ric
but somebody else would rightly moan about such stupid defaults which make
it likely that people get incorrect documents. so in my opinion there
should be no default for input encodings other than the one which is
currently called "ascii" in inputenc and which makes any 8bit an error.
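
as a sketch of what that default would look like when requested
explicitly with the current inputenc (the document body is an invented
example):

```latex
\documentclass{article}
% "ascii" declares only the 7bit range, so inputenc reports an
% error for any 8bit character found in the file
\usepackage[ascii]{inputenc}
\begin{document}
na\"ive  % accents entered with 7bit commands still work
\end{document}
```

a letter typed directly as an 8bit character in such a file stops the run
with an error instead of silently dropping out of the output.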

the above is only about input encodings; as I said earlier, the situation
for output encodings is different. there are already defaults in current
Babel, and in the implementation i'm working on they will get more
generalised, trying to take into account the problems discussed concerning
the use or non-use of certain encodings for certain fonts.

the main problem i see with defaults for output encodings is that for
languages like French or German there isn't really a good default, because
you will always have a large user group which is dead against one or the
other, eg T1 versus OT1. for other languages it is simpler. however, this
is more a political than a technical question, ie who doesn't like THEM
the day they make X the default for language Y ... :-)
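
for illustration, the choice being argued over amounts to a single
preamble line. a sketch, with T1 picked arbitrarily and the sample words
invented:

```latex
\documentclass{article}
% without this line LaTeX uses its default font encoding, OT1;
% T1 is picked here only for illustration
\usepackage[T1]{fontenc}
\begin{document}
Stra\ss e, \'ecole
\end{document}
```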

frank

------_=_NextPart_001_01C09148.9469AC00--