MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C092DD.88FBB980"
In-Reply-To:  <200102091643.RAA23818@mozart.ujf-grenoble.Fr>
References: <v03110701b6a9aae65099@[195.100.226.129]>            <200102091445.JAA00482@plmsc.psu.edu>            <200102091643.RAA23818@mozart.ujf-grenoble.Fr>
Content-class: urn:content-classes:message
Subject:      Re: inputenc text (and/or math)
Date: Fri, 9 Feb 2001 22:10:30 +0100
Message-ID:  <14980.23750.628032.305093@gargle.gargle.HOWL>
From: "Marcel Oliver" <oliver@NA.UNI-TUEBINGEN.DE>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C092DD.88FBB980
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi,

Some thoughts on the current discussion of input/font encoding
issues (sorry for not quoting all your mails; there are just too
many).  Some of these points have come up in one form or another, so
if I repeat anything, it's because I think it's important...

- I believe the only reasonable default _input encoding_ is UTF-8.
  Being a superset of ASCII while covering all of Unicode, it seems
  the ideal long-term solution to all input-encoding-related problems.
  UTF-8 is also becoming rather well supported by editors and other
  applications.

- In particular, defaulting to any other 8-bit input encoding in
  LaTeX2e should be avoided at all costs, because it would really mess
  up the later upgrade path to UTF-8.  (As far as I understand, the
  proposed default of \usepackage[T1]{fontenc} does not set any
  default 8-bit input encoding.  Is this correct?)

- A regular user should never have to specify the _font encoding_.
  The only choices exposed to the user should be language
  environments (as provided by babel) and font packages (e.g. times,
  palatino).  This means:

  * Babel (or something providing equivalent functionality---I
    strongly believe that it should become part of core LaTeX3) must be
    endowed with a default set of fonts for all languages it supports.
    Some language environment defaults could be marked experimental,
    meaning that the associated fonts and TFMs may change once
    better-quality free fonts become available, but all languages must
    work "out of the box".  On the other hand, languages like German,
    for which the EC fonts are well accepted (?), could be frozen
    straight away.

  * The language environment chooses the default font encoding unless
    a font package is explicitly loaded.  There may be more than one
    language environment per language if different typographical
    esthetics need to be satisfied.

  * Babel must hook into the currently active font package.  If a
    language environment is selected, the font package must be called
    to set itself up.  In other words, every font package must make a
    decision about encoding as a function of the language selected.
    If the language is unknown to the font package, a warning or an
    error must be issued.  (I am sure the set of supported
    language-font pairs will grow quickly if a good mechanism for
    soliciting contributions is implemented.)

  * Maybe one can introduce commands like
      \uselanguage{spanish}
      \usefont{times}
    and autoload the necessary packages, to make clear that these
    attributes function orthogonally to each other and to "ordinary"
    packages.

- Is there really a need to break the distinction between math mode
  and non-math mode?  As far as Greek letters go, the most common one
  is $\mu$ in units.  This raises the question of whether one should
  provide standard markup for units anyway (some journal packages
  already do---there are also spacing issues involved that warrant
  special treatment), for example as a "tools" package in the standard
  LaTeX distribution.  Further, the $\mu$ in units, which are usually
  set in upright shape, should presumably be different from the $\mu$
  in math, which goes with italic letters.  All other uses I can think
  of are either clearly math or clearly Greek, so it seems more
  important to make Greek as such work "out of the box".

- Hyphenation tables should really be in Unicode (and so possibly
  UTF-8 encoded).  Logically they depend on neither the input nor the
  output encoding, and they should work regardless of whether either
  refers to a castrated font set.

- For special needs, such as conveniently typing Cyrillic math in
  7-bit ASCII, one could provide special input encodings.  With full
  Unicode this shouldn't be a problem, should it?
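
The encoding arguments in the first two points can be checked with a small Python sketch.  It assumes ISO-8859-1 (Latin-1) as the example of "any other 8-bit input encoding".  The helpers `decodes_as` and `encodable_in` are illustrative names, not part of any LaTeX tool.  The sketch shows three things:

  * ASCII text is byte-for-byte unchanged under UTF-8.
  * A character outside Latin-1 fits in UTF-8 but not in the 8-bit encoding.
  * A file saved under an 8-bit default is generally not valid UTF-8.  This last point is what would make a later switch of the default painful.

```python
# Minimal sketch: UTF-8 vs. an 8-bit default (ISO-8859-1 assumed as the example).

def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def encodable_in(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# 1. UTF-8 is a superset of ASCII: pure ASCII input is byte-for-byte identical.
ascii_line = r"\section{Introduction} 7-bit input."
assert ascii_line.encode("ascii") == ascii_line.encode("utf-8")

# 2. UTF-8 covers all of Unicode; an 8-bit encoding has at most 256 characters.
cyrillic = "Ж"
print(encodable_in(cyrillic, "utf-8"), encodable_in(cyrillic, "latin-1"))  # True False

# 3. A file written under an 8-bit default is generally not valid UTF-8,
#    so changing the default later would break existing documents.
word = "Grüße"
utf8_bytes = word.encode("utf-8")      # ü -> C3 BC, ß -> C3 9F (two bytes each)
latin1_bytes = word.encode("latin-1")  # ü -> FC, ß -> DF (one byte each)
print(len(utf8_bytes), len(latin1_bytes))                                  # 7 5
print(decodes_as(utf8_bytes, "utf-8"), decodes_as(latin1_bytes, "utf-8"))  # True False
```

The reverse direction is worse, because it fails silently.  Decoding the UTF-8 bytes as Latin-1 never raises an error.  It simply yields wrong characters, for example `Ã¼` in place of `ü`.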

I am aware that some of these demands cannot really be met within
Knuthian TeX, but it seems LaTeX3 is prepared to eventually go beyond
TeX.  So it may be useful to define a minimal set of required
extensions/changes, as the encoding question could be a major
roadblock to enlarging the developer base.  For example, is there much
motivation for anybody to clean up the hyphenation mess before a clean
long-term solution (not just a workaround) is agreed on?
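
The point that hyphenation tables should be Unicode-based can be illustrated with a sketch of Liang-style pattern matching, the scheme TeX's hyphenation is built on, applied to Unicode strings.  Several things here are assumptions:

  * Python is used purely for illustration.
  * The two patterns `1ß` and `1ch` are hypothetical toy patterns, not taken from any real German pattern file.
  * `parse_pattern`, `hyphenate`, `left_min` and `right_min` are illustrative names.

Because the patterns are keyed on Unicode characters, the same word is split identically whether it was read from a UTF-8 file or a Latin-1 file.  Mapping the pieces to whatever slots the output font provides would be a separate, later step, not shown here.

```python
from __future__ import annotations

# Sketch of Liang-style pattern hyphenation on Unicode strings.
# PATTERNS are HYPOTHETICAL toy patterns, not real German hyphenation patterns.


def parse_pattern(pat: str) -> tuple[str, list[int]]:
    """Split e.g. '1ch' into letters 'ch' and inter-letter values [1, 0, 0]."""
    letters, values = "", [0]
    for ch in pat:
        if ch in "0123456789":
            values[-1] = int(ch)  # value for the gap before the next letter
        else:
            letters += ch
            values.append(0)
    return letters, values


def hyphenate(word: str, patterns: dict[str, list[int]],
              left_min: int = 2, right_min: int = 2) -> list[str]:
    """Split a lowercase word at the odd-valued gaps found by the patterns."""
    work = "." + word + "."         # '.' marks the word boundaries
    points = [0] * (len(work) + 1)  # points[p] = value of the gap before work[p]
    for i in range(len(work)):
        for j in range(i + 1, len(work) + 1):
            values = patterns.get(work[i:j])
            if values:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:  # gap before word[m] is gap m + 1 of work
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


PATTERNS = dict(parse_pattern(p) for p in ["1ß", "1ch"])

# The same word arrives from a UTF-8 file and from a Latin-1 file; after
# decoding to Unicode, the pattern lookup is identical.
for encoding in ("utf-8", "latin-1"):
    raw = "grüße".encode(encoding)
    print(encoding, hyphenate(raw.decode(encoding), PATTERNS))
print(hyphenate("machen", PATTERNS))
```

With these toy patterns the sketch prints `['grü', 'ße']` for both encodings and `['ma', 'chen']` for "machen".  Nothing in the pattern table refers to bytes or font slots.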

Just some ideas,

Marcel

------_=_NextPart_001_01C092DD.88FBB980--