From: "Marcel Oliver"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: \InputEncoding [Was: Multilingual Encodings Summary 2.2]
Date: Sat, 19 May 2001 18:20:01 +0100
References: <200105161742.MAA02503@riemann.math.twsu.edu>

Frank Mittelbach writes:
> > In fact, \InputEncoding was not intended for that, but only for
> > "technical" translations which applies to the whole document
> > as one byte -> two byte or little endian -> big endian. The main
> > problem of it is that it doesn't translate macros:
> > \def\myE{É}
> > \InputEncoding <an encoding>
> > É\myE
>
> \InputEncoding is the point where one need to go from external
> source encoding to OICR that is precisely the wound: the current
> \InputEncoding isn't doing this job fully (and that it is not clear
> how to do it properly (to be fair))
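
To make the quoted example concrete, here is a minimal sketch in Python
(not TeX or Omega code).  It assumes the file starts out in ISO 8859-1
and that the switch goes to ISO 8859-7; both encodings and all variable
names are my own choices for illustration, not part of the original
example.

```python
# Hypothetical model of the quoted example; none of this is Omega code.
RAW_E = b"\xc9"  # the single byte that the quoted \def\myE{...} contains

# The macro body is tokenized while ISO 8859-1 (assumed) is active,
# so it is fixed as a character at definition time.
macro_body = RAW_E.decode("iso-8859-1")

# After the switch (ISO 8859-7 is an arbitrary choice), the same byte
# typed directly in the source is decoded differently.
typed_directly = RAW_E.decode("iso-8859-7")

print(repr(macro_body), repr(typed_directly))
```

The two results differ, which is the asymmetry the quoted text points
at: the character typed in the source follows the new encoding, while
the already tokenized macro body does not.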

How about this:

- There is one default \InputEncoding (which may need to be specified
  at the time of format creation).  This encoding is the one that all
  macro names need to be in, as well as the encoding initially
  selected for text (I think it does not make any sense to allow for
  multiply encoded macro names in a single document).  As there is no
  legacy cruft with regard to macro names, we may as well force this
  default encoding to be UTF-8.

- Changes in the \InputEncoding follow the usual TeX scoping rules
  (this is obviously not how Omega currently does it), and take effect
  immediately during the initial tokenization.  This would mean that
  the characters \ { } must be in their expected position in every
  permissible encoding, but I guess that's not any more restrictive
  than what we currently have.  I also assume that TeX (Omega) always
  knows whether it is parsing code or text, so that it can select the
  default for code, and the top of the encoding stack for text.

- Regarding Javier's above example: I think this is the correct and
  expected behavior.  I want to be able to write:

  \begin{chinese}
    \newcommand{\foo}{***something chinese***}
    \newcommand{\bar}{***and some more chinese***}
  \end{chinese}

  The Chinese characters \foo\ and \bar\ are not easy to enter on a
  western keyboard.  If you need to frequently use \foo\ in your
  scholarly discussion of Chinese literature, it is better to first
  define macros for all the Chinese characters you need, and then just
  write \verb|\foo| whenever you need \foo.

  (I don't know if this babel-like begin-end of a language selection
  would actually be legal in the document preamble, but I think the
  strategy is very natural at least.)

- It may be more of a problem how to deal with \'e and the like.
  Would it be possible to force immediate expansion into the
  corresponding internal Unicode token?

Marcel
