Re: Side remarks about TeX input sequence

From: Hans Aberg
Sender: Mailing list for the LaTeX3 project
To: Multiple recipients of list LATEX-L
Subject: Re: Side remarks about TeX input sequence
Date: Tue, 13 Feb 2001 18:55:22 +0100

At 11:47 -0500 2001/02/13, Michael John Downes wrote:
>> I am not a TeX guru, but I get the impression that the TeX looks like this:
>> <string of TeX tokens> <not yet gulped up ASCII (or 8-bit)>
>> The string of TeX tokens buffer is normally empty, but sometimes a macro
>> may insert a string of tokens (perhaps a macro expansion can be viewed as
>> though the body is first inserted in this buffer, before being evaluated).
>
>Yes, that is quite true, but Knuth calls that buffer an input stream,
>and there may be multiple nested input streams open at any given moment.

Perhaps I oversimplified: it should probably look like
<string of TeX tokens> <current input stream>
where the input streams are stacked (including the stream buffers, then).
I generally skip over this stacking, because one can easily treat a
set of stacked input streams as a single input stream.
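
A minimal sketch of that last point, in Python rather than TeX: a stack
of character sources that a reader sees as one flat stream. The names
InputStack, push, next_char and read_all are hypothetical, and the marker
characters are only a stand-in for a command that opens a nested input
stream; this is a toy model, not TeX's implementation.

```python
from typing import Dict, Iterator, List, Optional


class InputStack:
    """A stack of character sources that reads as one flat stream."""

    def __init__(self, text: str) -> None:
        self._stack: List[Iterator[str]] = [iter(text)]

    def push(self, text: str) -> None:
        # Open a nested source; it is consumed before the enclosing one resumes.
        self._stack.append(iter(text))

    def next_char(self) -> Optional[str]:
        # Drop exhausted sources until a character is found or nothing is left.
        while self._stack:
            ch = next(self._stack[-1], None)
            if ch is not None:
                return ch
            self._stack.pop()
        return None


def read_all(stack: InputStack, inserts: Dict[str, str]) -> str:
    """Drain the stack; a marker character in `inserts` opens a nested source."""
    out = []
    while True:
        ch = stack.next_char()
        if ch is None:
            break
        if ch in inserts:
            stack.push(inserts[ch])  # assumed stand-in for opening a nested input
        else:
            out.append(ch)
    return "".join(out)
```

The caller only ever asks for the next character; whether it came from
the outermost source or a nested one is invisible to it.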

>> The <not yet gulped up ASCII (or 8-bit)> buffer is read and converted into
>> tokens at need.
>
>TeX reads into the buffer one line at a time.

How can this be true? What happens if a command in the middle of a line
changes the catcodes, or if the line contains a macro that expands to an
\input <filename>?

>In particular, the
>character at the end of the line will be whatever was the value of
>\endlinechar at the point when the read was triggered. But the catcode
>of the endlinechar can be changed at any point until TeX takes it from
>the line buffer and turns it into a token.

I get the impression that you are referring to a technique that TeX uses
merely to buffer the input. In reality this is only a buffering technique,
which does not affect the underlying scheme: first one character at a
time, and then one token of lookahead.
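
A toy model of how the two views fit together, in Python: the whole line
sits in a buffer, but each character gets its category only when it is
turned into a token, so a change made in the middle of the line still
affects the rest of it. The category names and the "!" trigger are
assumptions for illustration, not TeX's actual catcode numbers or syntax.

```python
def tokens(line, catcodes):
    """Yield (char, category) pairs lazily from an already-buffered line."""
    for ch in line:
        # The category is looked up only when this token is requested.
        yield ch, catcodes.get(ch, "other")


def run(line):
    """Tokenize `line`, letting the '!' token change the category of 'a'."""
    catcodes = {"a": "letter", "b": "letter"}
    seen = []
    for ch, cat in tokens(line, catcodes):
        if ch == "!":
            # Hypothetical stand-in for a catcode assignment; it takes
            # effect for the characters still waiting in the line buffer.
            catcodes["a"] = "other"
        seen.append((ch, cat))
    return seen
```

For the line "a!a", the first "a" is tokenized as a letter and the second
as "other", even though both were read into the buffer at the same time.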

>> TeX does not back-track.
>
>\futurelet, \uppercase, \lowercase, \expandafter and perhaps one or two
>others do backtracking in the token stream, but yes there is no
>backtracking in the line buffer.

This is not backtracking, really: backtracking means that one reads ahead,
later decides that the sequence was wrong based on semantic
evaluations, and then re-parses the sequence. The commands that you
mention have a deterministic course of action: they save away some
tokens, and later put them back at the front of the token stream, just as
any macro can do when expanding.
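
The save-and-put-back pattern can be sketched in Python as follows. The
class TokenStream and the functions expand_next and expand_after are
hypothetical names for a simplified model of an expandafter-like step;
nothing is ever re-parsed, tokens are only removed from and pushed onto
the front of a queue.

```python
from collections import deque


class TokenStream:
    """A token queue that supports putting tokens back at the front."""

    def __init__(self, tokens):
        self._queue = deque(tokens)

    def get(self):
        return self._queue.popleft()

    def push_front(self, tokens):
        # extendleft reverses its argument, so reverse first to keep order.
        self._queue.extendleft(reversed(list(tokens)))

    def remaining(self):
        return list(self._queue)


def expand_next(stream, macros):
    """Replace the next token by its macro body, if it has one."""
    tok = stream.get()
    stream.push_front(macros.get(tok, [tok]))


def expand_after(stream, macros):
    """Save one token, expand the token after it, then put the first back."""
    saved = stream.get()
    expand_next(stream, macros)
    stream.push_front([saved])
```

With the tokens A, B, C and a macro table in which B expands to x, y,
one call to expand_after leaves A, x, y, C in the stream. Each step is
decided in advance; no earlier decision is revoked.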

Hans Aberg
