Subject: Embarrassingly wrong
Date: Sat, 17 Feb 2001 22:54:05 +0100
From: "Frank Mittelbach"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Message-ID: <14990.62205.711349.925864@istrati.zdv.uni-mainz.de>
In-Reply-To: <14990.30852.571842.571065@istrati.zdv.uni-mainz.de>

Frank Mittelbach writes:
 >
 > A few people will unfortunately get this posting twice since it is both sent
 > to LATEX-L as well as to the Omega developers (several of which are on
 > LaTeX-L), sorry for that.
 >
 > We thought this advisable as we make a number of suggestions regarding
 > extensions/changes to Omega's character token processing. (Any technical
 > discussion of these suggestions should probably be confined to the omega
 > developers list though)

fortunately it is a weekend and nobody has told us yet ... so we can at
least claim we found out ourselves shortly after sending the message out:
Omega already has input modes and translations which do support what we are
asking for, ie the translation from the source document to the internal
unicode form.

thus OICR1=OICR2 and all our rambling about it was wrong

what seems to remain is

 a) problems with controlling these input translations; the way it works in
 omega according to the documentation is that a change applies to the next
 line in a file. However in an example like the following:

    \ocp\OCPa=inutf8

    \def\foo{abcäd} % default seems to be latin1
    \show\foo


    % the following fails (not surprisingly)
    % and can't be corrected later on

    \def\foo{ab
    \InputTranslation currentfile\OCPa
    cÃ¤}
    \show\foo


 the second \foo will now contain the tokens

   \foo=macro:
    ->ab \InputTranslation currentfile\OCPa c^^c3^^a4.

 thus if you ever use this \foo later on you will get the wrong characters
 because the input was umlaut-a in utf8 but what is stored in \foo are the
 _two_ characters (uppercase-A-with-tilde and currency-sign).
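
The byte-level effect described here can be reproduced outside Omega. The following is only a small Python sketch: Python's codecs stand in for the input translations, and none of the names are Omega's.

```python
import unicodedata

# umlaut-a as it sits in a source file that is encoded in UTF-8
raw = "\u00e4".encode("utf-8")        # the two bytes C3 A4

# reading those bytes under a latin1 default yields two characters
misread = raw.decode("latin-1")
names = [unicodedata.name(ch) for ch in misread]

# reading them with the intended translation yields a single character
intended = raw.decode("utf-8")

print(raw.hex(), names, len(intended))
```

The two names reported for the misread bytes match the two characters mentioned above.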

 furthermore if this \foo is used anywhere it will change the input
 translation from the next line on to utf8 and this could be in a completely
 different file.

 This might look like a contrived example but on a higher level of macro
 encoding this type of problem will happen whenever an \InputTranslation is
 used either directly or within some macro definition (like a language tag)
 and that is placed, for example, inside an argument of some other tag.

 Since we have been asked to provide input encoding changes for LaTeX within
 paragraphs, eg for individual words, something like this would happen if such
 a change appears, say, inside the argument of \section.
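
The two effects in a), wrong characters stored at definition time and a translation switch that fires only when the macro is used, can be imitated with a toy model. This is a sketch under loud assumptions: `ToyReader`, the `<switch:utf-8>` marker and `expand` are invented for illustration and say nothing about how Omega is implemented.

```python
class ToyReader:
    """Hypothetical line reader: a switch only affects lines read after it."""

    def __init__(self, encoding="latin-1"):
        self.encoding = encoding

    def read_line(self, raw):
        # undecodable bytes become U+FFFD instead of raising an error
        return raw.decode(self.encoding, errors="replace")


reader = ToyReader()

# Definition time: the switch request sits inside the macro body, so it is
# stored as a token and not executed; the following line is read as latin-1.
SWITCH = "<switch:utf-8>"  # invented marker standing in for \InputTranslation
body = [b"ab", SWITCH.encode("ascii"), b"c\xc3\xa4"]
foo = [reader.read_line(raw) for raw in body]


def expand(tokens, reader):
    """Expand a stored macro: emit text, execute any stored switch request."""
    out = []
    for tok in tokens:
        if tok == SWITCH:
            reader.encoding = "utf-8"  # takes effect for later lines only
        else:
            out.append(tok)
    return "".join(out)


# Use time: the stored characters are already wrong, and the switch now fires.
text = expand(foo, reader)

# A later latin-1 line, possibly from a different file, is now misread.
later = reader.read_line("abc\u00e4d".encode("latin-1"))

print(repr(text), reader.encoding, repr(later))
```

In this model nothing done after the definition can repair the stored text, and the place where the switch finally takes effect depends on where the macro happens to be expanded, which is the situation an argument of \section would create.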


 b) the other problem that seems to remain is:

 > Another problem of the current model seems to be that, even if trans A did the
 > encoding transformation to Unicode ie we have only a single OICR,
 > transformations of type D (ie transformation of character token strings) can't
 > be controlled by a mechanism similar to the one that is available for
 > transformations of type C, ie in one case we have ocps and in the other area,
 > when we work on structural issues like building TOC or arranging data for page
 > representation no such mechanism is available. Thus it seems interesting to
 > think about whether or not a similar concept (not necessarily the same!)
 > should be made available for this part of the process.
 >
 > In other words the concept of ocps makes perfect sense for character string
 > manipulation but one has to [pretend] to typeset something to have them
 > available in current Omega, but a large amount of document processing is
 > concerned with character string manipulation not related to typesetting at
 > all.

what is no longer a problem though is the example we gave for the above since
for that particular case (writing to output streams) Omega provides output
translations.

hope by this correction we got a little closer to the truth :-)

frank & chris
