From: "Hans Aberg"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"
Date: Wed, 14 Feb 2001 16:31:45 +0100
Subject: Re: Side remarks about TeX input sequence
In-Reply-To: <14986.22793.581261.472446@informatik.uni-stuttgart.de>

At 11:08 +0100 2001/02/14, Bernd Raichle wrote:
> > >Incidentally one reason why xmltex can not support utf16 is that
> > >TeX buffers to ^J (or ^M) and throws away any bytes with value 32 that
> > >occur at the end of this buffer, which might just be half of a 16bit
> > >quantity that you'd rather keep. There's no way to control this
> > >behaviour from within TeX.
> >
> > So TeX is a lot less sophisticated than it appears at first sight.
>
>David has simplified it a lot. Instead of saying ``TeX buffers to ^J
>(or ^M)'' it should read ``TeX buffers to the system-dependent and
>file-type-dependent end-of-line marker''.

This is the normal behaviour for "text" files (but not for "binary"
files) in, say, C/C++; people tend to forget about it because TeX is
written in Pascal.
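To make the UTF-16 problem from the quoted message concrete, here is a minimal C++ sketch. It models line input as a split at ^J followed by removal of trailing bytes with value 32; this is a simplification assumed for illustration, not TeX's actual input routine, and `tex_like_lines` and `strip_trailing_spaces` are hypothetical names. In UTF-16LE an em dash (U+2014) is stored as the bytes 0x14 0x20, so a line ending in an em dash loses its second byte, and the 0x00 half of the newline code unit spills into the next "line".

```cpp
#include <string>
#include <vector>

// Hypothetical helper: remove trailing bytes with value 32 (ASCII space).
static void strip_trailing_spaces(std::string& line) {
    while (!line.empty() && line.back() == ' ')
        line.pop_back();
}

// Simplified model (an assumption, not TeX's code) of the behaviour
// described above: the input is split at ^J (byte 0x0A) and every line
// loses its trailing spaces. All data is treated as 8-bit bytes.
std::vector<std::string> tex_like_lines(const std::string& bytes) {
    std::vector<std::string> lines;
    std::string current;
    for (char c : bytes) {
        if (c == '\n') {
            strip_trailing_spaces(current);
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) {  // last line without a terminating ^J
        strip_trailing_spaces(current);
        lines.push_back(current);
    }
    return lines;
}
```

For 8-bit text this is harmless (`"foo  \nbar"` gives `"foo"` and `"bar"`), but for the UTF-16LE bytes of "a" followed by an em dash and a newline, the first line comes back with an odd number of bytes.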

> Nowadays stream-oriented
>files are common, where a special character (^J or ^M) or a special
>combination of characters (^M^J) is used as the end-of-line marker.

UNIX (^J), MacOS (^M) and DOS/Microsoft systems (^M^J).
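A reader that accepts all three conventions is short. The following is a minimal C++ sketch, assuming the file has already been read into an 8-bit byte string; `split_lines` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a byte buffer into lines, accepting ^J (UNIX), ^M (MacOS) and
// ^M^J (DOS/Windows) as end-of-line markers. A final marker does not
// produce an extra empty line.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\r' || c == '\n') {
            lines.push_back(current);
            current.clear();
            // Treat ^M^J as a single marker by skipping the ^J.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
                ++i;
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```

Mixed input such as `"a\nb\rc\r\nd"` yields the four lines `a`, `b`, `c`, `d`.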

> In
>the past and even nowadays there exist other file types where the
>end-of-line marker is not part of the file (i.e. a special character),
>e.g. files with a fixed-width record (aka line) length.

Today, which ones?

>And if you have to deal with files using a fixed-width record length
>usually padded by blanks, it was (and still is?) a good idea to remove
>these padding characters at an appropriate stage ... why not directly
>after reading the line?

No, this is not how parsing is normally done: the normal way is to open
the file, read the characters one by one, and let the lexical analyzer
remove space characters wherever it sees them. A simple way to stack the
input files in, say, C++ is merely to have a function that creates and
opens the stream and then parses it. When the function returns, the
parsed file is closed automatically. Normally, one does not allow lexical
and grammatical constructs to pass over file boundaries (i.e., one does
not allow a word begun in one file to be completed by additional letters
in another file, or matching braces to have the "{" in one file and the
"}" in another), which simplifies this method.

This way one does not have to worry about buffering at all, as it is
handled automatically by the stream classes.
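The stream-based approach described above (read character by character, let the lexer drop whitespace, and stack input files through nested function calls) can be sketched in C++ as follows. The in-memory `FileSystem` map, the `\input` word convention and the names `parse_file` and `flush` are all hypothetical, chosen for illustration; a real implementation would open a `std::ifstream`, whose destructor closes the file when the function returns. Buffering is left entirely to the stream's buffer.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the file system: file name -> file contents.
using FileSystem = std::map<std::string, std::string>;

// Parse one "file" into whitespace-separated words. The word \input makes
// the next word a file name, which is parsed by a recursive call, so the
// input stack is simply the C++ call stack.
void parse_file(const FileSystem& fs, const std::string& name,
                std::vector<std::string>& words) {
    auto it = fs.find(name);
    if (it == fs.end())
        throw std::runtime_error("no such file: " + name);
    std::istringstream in(it->second);  // released when this call returns

    bool expect_filename = false;
    std::string word;
    // Finish the current word: either record it or act on \input.
    auto flush = [&]() {
        if (word.empty()) return;
        if (expect_filename) {
            expect_filename = false;
            parse_file(fs, word, words);  // push a new input level
        } else if (word == "\\input") {
            expect_filename = true;
        } else {
            words.push_back(word);
        }
        word.clear();
    };

    char c;
    while (in.get(c)) {
        if (std::isspace(static_cast<unsigned char>(c)))
            flush();  // whitespace ends a word and is dropped by the lexer
        else
            word.push_back(c);
    }
    flush();  // end of file ends the word: no word spans two files
}
```

Because every word is finished before another file is entered and again at end of file, no token can cross a file boundary, which is exactly the restriction that keeps this scheme simple. Cyclic `\input` chains are not detected in this sketch.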

> > TeX really is a program from another age...
>
>Yes!  TeX was written between 1977 and 1982!

My thought was that perhaps this modern way of handling streams was
(is?) not available in Pascal, so when it had to be implemented by hand,
simplifications were made.

  Hans Aberg
