MIME-Version: 1.0
References: <CAFUtaNEvBktQbJytj8MhWuZtOBODbT+AQ_eBHqxhh7DXu29BnQ@mail.gmail.com>
            <CAFUtaNFqo=mHZtP2LOE29S6APD=9m7m7VX2oDptNLh_C+wsj2Q@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Message-ID:  <CANQYN6yfu=65CX47pnz=_kz7DmJCQpHeNuDVbmd5KXCsFGbLMQ@mail.gmail.com>
Date:         Fri, 3 Aug 2012 16:52:50 +0200
Reply-To: Mailing list for the LaTeX3 project
              <LATEX-L@listserv.uni-heidelberg.de>
Sender: Mailing list for the LaTeX3 project <LATEX-L@listserv.uni-heidelberg.de>
From: Bruno Le Floch <blflatex@GMAIL.COM>
Subject: Re: Peek ahead for next token not in token-list
To: LATEX-L@listserv.uni-heidelberg.de
In-Reply-To:  <CAFUtaNFqo=mHZtP2LOE29S6APD=9m7m7VX2oDptNLh_C+wsj2Q@mail.gmail.com>
Precedence: list
Status: R

Hello Joel,

I promised to get back to you earlier but didn't; sorry about that.
I'm replying to two emails in one, so the result is somewhat long, but
hopefully helpful.

> I've been developing my xpeek package [...]
> see <https://github.com/jcsalomon/xpeek>.

I see that you use the "NPC" prefix in xpeek, probably because of some
code I had written (back when you were asking for a \NewPeekCommand
command).  It may be better to use xpeek as a prefix: since there can
be no two packages on CTAN with the same name, using that name as a
prefix for internal commands should avoid clashes.

Furthermore, it would be best to use the convention \__xpeek_...
for internal commands, and \l__xpeek_... for internal variables.  You
probably don't have any public code-level functions \xpeek_... or
variables \l_xpeek_..., but those would be the conventional prefixes.
To make the internal convention more convenient and shorter to type,
we recently introduced l3docstrip.

Replace docstrip by l3docstrip, and replace "xpeek" (or "NPC") by "@@"
in all names.  Then add

%    \begin{macrocode}
%<@@=xpeek>
%    \end{macrocode}

near the start of the implementation section (see e.g., some l3kernel
modules for a model).  This change will make it very easy to change
the module name if needed, will make the code shorter, and will make
the command names less accessible from outside.
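
As a small illustration of the replacement (the function and variable
names here are made up for the example, they are not taken from
xpeek), the .dtx source would contain, inside macrocode environments,

    %<@@=xpeek>
    \bool_new:N \l_@@_found_bool
    \cs_new_protected:Npn \@@_scan:n #1
      { \bool_set_false:N \l_@@_found_bool }

and l3docstrip should then write the following to the .sty file:

    \bool_new:N \l__xpeek_found_bool
    \cs_new_protected:Npn \__xpeek_scan:n #1
      { \bool_set_false:N \l__xpeek_found_bool }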

>     \textit{foof}\xspace.
>     \textit{foof}\xspace!
>
> Thinking about the problem, it seems I need the ability to scan ahead
> in the input stream, ignoring tokens from one list while looking for
> tokens from another.  In Expl3 terms, I’m hoping to define something
> like `\peek_inlist_ignore_auxlist:nnTF`.

It should be \xpeek, or \@@ (transformed to \__xpeek), not \peek in
any case :).  I think it is very important not to use the kernel
namespace even when the command name would make more sense with such a
name.  For instance, in randomwalk.sty I have
\@@_int_set_to_random:Nnn, not \int_set_to_random:Nnn.

>         \peek_ignore_list:N \ignorelist
>         `\l_peek_token'

This syntax is impossible to achieve since \peek_ignore_list:N has no
way to know where the `\l_peek_token' "argument" is supposed to end.

> The direction I’m considering is to read ahead, consuming tokens. Each
> token read is added to a save-list and compared to the ignore-list. If
> it’s on the ignore-list, continue; otherwise put the save-list back on
> the input stream and stop.
>
> Does this sound reasonable so far?

Somewhat reasonable, yes.  I'm not sure what the best approach is.
You need to collect the tokens in your ignore list, and you then need
to perform an action depending on the next token. It is possible to
define \xpeek_collect_do:nn, whose first argument is a list of tokens
to ignore, whose second argument is some operation to perform, which
will receive as an argument the tokens:

    \xpeek_collect_do:nn { abc } { \foo \bar } caada

=>

    \foo \bar { caa } da

Assuming that we have this function (see below for an implementation),
and that the following token (the first which is not collected) has
its meaning copied to \l_peek_token (like any \peek function), then we
can build a \nextnonpunct as

    \DeclareDocumentCommand { \nextnonpunct } { }
      { \xpeek_collect_do:nn { .,!? } { ` \l_peek_token ' \use:n } }

where the \use:n unbraces whatever punctuation \xpeek_collect_do:nn
has collected.
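
As a rough trace, assuming \xpeek_collect_do:nn behaves as described
above (spaces are only there for readability, as in code typed under
\ExplSyntaxOn), the input "\nextnonpunct .,x" would go through

    \nextnonpunct .,x
    => \xpeek_collect_do:nn { .,!? } { ` \l_peek_token ' \use:n } .,x
    => ` \l_peek_token ' \use:n { ., } x
    => ` \l_peek_token ' .,x

At that point \l_peek_token still has the meaning of "x", so the
character between the quotes is the first token after the
punctuation, and the punctuation itself is put back unchanged.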


How is \xpeek_collect_do:nn implemented?  I'm introducing a quark just
to have a macro different from anything you may see when peeking
ahead: then \peek_meaning:NF always takes the F branch.  Not happy
about that hack.

    \quark_new:N \q_@@
    \bool_new:N \l_@@_ignore_bool
    \cs_new_protected:Npn \xpeek_collect_do:nn #1#2
      { \@@_collect_do:nnnn { #1 } { #2 } { } { } }
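    \cs_new_protected:Npn \@@_collect_do:nnnn #1#2#3#4
      {
        % #1: ignore list, #2: code, #3#4: tokens collected so far.
        % The next token is never \q_@@, so the F branch always runs;
        % the peek is only used to set \l_peek_token.
        \peek_meaning:NF \q_@@
          {
            \bool_set_false:N \l_@@_ignore_bool
            \tl_map_inline:nn {#1}
              {
                \token_if_eq_charcode:NNT \l_peek_token ##1
                  {
                    \bool_set_true:N \l_@@_ignore_bool
                    \tl_map_break:
                  }
              }
            % The recursive call is given three arguments only: the
            % fourth is grabbed from the input stream, which is how
            % the peeked token gets collected.
            \bool_if:NTF \l_@@_ignore_bool
              { \@@_collect_do:nnnn {#1} {#2} { #3#4 } }
              { #2 { #3#4 } }
          }
      }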
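
To see how the collection works, here is an informal trace of the
"abc" example from earlier, written by hand from the definitions
above (so take it as a sketch rather than actual \tracingall output).
The recursive call is written with three braced arguments only, so
its fourth argument is grabbed from the input stream:

    \xpeek_collect_do:nn { abc } { \foo \bar } caada
    => \@@_collect_do:nnnn { abc } { \foo \bar } { } { } caada
       % #3 and #4 empty; peek sees "c", which is in the list
    => \@@_collect_do:nnnn { abc } { \foo \bar } { } caada
       % #3 empty, #4 = c; peek sees "a", which is in the list
    => \@@_collect_do:nnnn { abc } { \foo \bar } { c } aada
       % #3 = c, #4 = a; peek sees "a", which is in the list
    => \@@_collect_do:nnnn { abc } { \foo \bar } { ca } ada
       % #3 = ca, #4 = a; peek sees "d", which is not in the list
    => \foo \bar { caa } da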

> To consume tokens one-by-one, I built this function:
>
>     \cs_new_protected:Npn \peek_meaning_really_remove:NTF #1 #2 #3
>       {
>         \peek_meaning_remove:NTF #1
>           { #2 }
>           {
>             \peek_meaning_remove:NT \l_peek_token
>               { #3 }
>           }
>       }

Well, that would remove tokens, not collect them.

> (This should be created via \prg_new_conditional, but I haven’t yet
> figured that out.)

It is (pretty much?) impossible to define peek-like functions as conditionals.

> Is the direction I'm taking appropriate for what I'm trying to do?

Yes.

> Is there some existing functionality that would help that I'm overlooking?

Not really.  I think we should add \peek_after:nw to cover my use of
\peek_meaning:NF \q_@@ in the code above.  That would make the code
reasonably clean.  I've added this function to
l3trial/l3kernel-extras, not on CTAN, only on the SVN repository.

One correct long-term approach would be to provide a parser for some
class of grammar, but that is extremely hard in TeX (the regular
expression parser l3regex took me about 4 months of hard work). So
don't expect this any time soon.

At least for now, I think the \xpeek_collect_do:nn code I give above
is (up to a few improvements) a reasonable approach to practical
situations where someone wants to look ahead in the input stream.  So
I'd say, provide \xpeek_collect_do:nn or similar functionality as a
public code-level function in your xpeek package.

On 7/30/12, Joel C. Salomon <joelcsalomon@gmail.com> wrote:
> After some experimentation, it seems that the \peek_* family of
> functions don't work well inside l3prg conditionals; source3.pdf seems
> to bear this out in the justification for \__peek_def:nnnn.

Indeed: consider

   \prg_new_conditional:Npnn \foo:n #1 { TF }
     { \prg_return_true: }

This is (currently) equivalent to

   \cs_new:Npn \foo:nTF #1 { \prg_return_true: \c_zero }

and the \prg_return_true: \c_zero combination is equivalent to
\use_i:nn (see definition of \prg_return_true:), which selects the
true branch and discards the false branch.  Note how the \foo:nTF
macro only takes one argument: the other two "arguments" are left in
the input stream until the last moment, where \prg_return_true/false:
selects one of the two.  The problem with peek functions is that they
need to see past those conditional branches in the input stream.
Thus, \peek_meaning:NTF is roughly

    \cs_new_protected:Npn \peek_meaning:NTF #1#2#3
      {
        \cs_set_eq:NN \l__peek_search_token #1
        \cs_set_nopar:Npx \__peek_true:w { \exp_not:n {#2} }
        \cs_set_nopar:Npx \__peek_false:w { \exp_not:n {#3} }
        \peek_after:Nw \__peek_meaning:
      }
    \cs_new_protected_nopar:Npn \__peek_meaning:
      {
        \token_if_eq_meaning:NNTF \l__peek_search_token \l_peek_token
          { \__peek_true:w } { \__peek_false:w }
      }

The T and F arguments must be taken out of the input stream, stored
into dedicated functions \__peek_true:w and \__peek_false:w, and put
back after the test.
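
To spell out why the branches get in the way, here is an informal
trace of the \foo:nTF example, based only on the "currently
equivalent" definition quoted above (the internals may well differ
in other versions of the kernel); <true> and <false> stand for the
two branches and "!" for the token one would like to peek at:

    \foo:nTF { x } { <true> } { <false> } !
    => \prg_return_true: \c_zero { <true> } { <false> } !
    => \use_i:nn { <true> } { <false> } !
    => <true> !

While the body of \foo:nTF is being executed (second line), the "!"
is still hidden behind the two brace groups, so a peek performed
there cannot see it.  Only once both branches have been absorbed as
arguments, as \peek_meaning:NTF does, is "!" the next token.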

> On TeX.SE, Clemens Niederberger posted an answer to the specific
> question I'd posed; see <http://tex.stackexchange.com/a/64351/2966>.
> It works well, but it's built on recursive expansion of macros with :w
> specifiers that I'm really not understanding. I'm thinking, therefore,
> that I'm better off getting help implementing the functionality I want
> in parts.

I suspect his solution is needlessly complicated (he seems to test if
the token is in the ignore list in a roundabout way).

> What sorts of restrictions are there on the use of \l_peek_token
> inside the true-code & false-code branches of the \peek_* functions?

None, as far as I know.

> Is it reasonable to use \__peek_def:nnnn to generate something like
> \peek_unconditional:TF? (The false-code branch should never execute, I
> expect.)

Definitely not.  \__peek_def:nnnn is internal, and may change on a
whim.  We have been careful to mark internal functions as such, and
make no guarantee whatsoever that they will remain.  The function you
want is \peek_after:nw (see l3kernel-extras), and for now, you can use
your own copy:

    \tl_new:N \l__xpeek_code_tl
    \cs_new_protected:Npn \xpeek_after:nw #1
      {
        \tl_set:Nn \l__xpeek_code_tl {#1}
        \peek_after:Nw \l__xpeek_code_tl
      }
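
A possible use of this copy, as a minimal sketch: the name
\xpeek_demo_next: is hypothetical and only serves the example.

    \cs_new_protected:Npn \xpeek_demo_next:
      {
        \xpeek_after:nw
          {
            % \l_peek_token now has the meaning of the next token.
            \token_if_eq_charcode:NNTF \l_peek_token !
              { [bang] } { [other] }
          }
      }

With this, "\xpeek_demo_next: !" should typeset "[bang]" in front of
the "!", which itself stays in the input stream.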

> Actually, it's \peek_unconditional_remove:T I think I need.

I don't think you need that one since the token should be kept
somewhere.  The copy \l_peek_token is not appropriate, since that
control sequence will later be changed to the next token in the input
stream.  Think of \l_peek_token as a pointer (that's almost not a
lie), which TeX can unfortunately not dereference.

> \tl_new:N \g_jcs_matchlist_tl
> \tl_new:N \g_jcs_ignorelist_tl
> \tl_new:N \l_jcs_ignored_tokens_tl
>
> \cs_new:Npn \jcs_peek_in_matchlist_ignore_ignorelist:TF #1#2
>   {
>     \tl_clear:N \l_jcs_ignored_tokens_tl
>     \__jcs_peek_in_matchlist_ignore_ignorelist_aux:TF {#1}{#2}
>   }

Braces missing.

> \cs_new:Npn \__jcs_peek_in_matchlist_ignore_ignorelist_aux:TF #1#2
>   {
>     \peek_unconditional_remove:T
>       {
>         \tl_if_in:N?TF \g_jcs_ignorelist_tl { something involving
> \l_peek_token }

Not possible, unfortunately. You have to map through
\g_jcs_ignorelist_tl, comparing \l_peek_token to each token in the
ignorelist (see code for \xpeek_collect_do:nn above).

>           {
>             \tl_put_right:N? \l_jcs_ignored_tokens_tl  { something
> involving \l_peek_token }
>             keep looking, probably by recursing

Yes, that's roughly what I'm doing.  I'm storing the tokens as macro
arguments #3 and #4 of \@@_collect_do:nnnn, but that's not very
sensible; storing them in a token list variable is better.
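
A minimal, untested sketch of that token list variant follows.  All
names in it are hypothetical (in particular \xpeek_collect_do_alt:nn,
chosen so as not to clash with the definition above), and it is
written with explicit \__xpeek names rather than @@:

    \tl_new:N \l__xpeek_collect_ignore_tl
    \tl_new:N \l__xpeek_collect_code_tl
    \tl_new:N \l__xpeek_collected_tl
    \bool_new:N \l__xpeek_collect_bool
    \cs_new_protected:Npn \xpeek_collect_do_alt:nn #1#2
      {
        \tl_set:Nn \l__xpeek_collect_ignore_tl {#1}
        \tl_set:Nn \l__xpeek_collect_code_tl {#2}
        \tl_clear:N \l__xpeek_collected_tl
        \peek_after:Nw \__xpeek_collect_loop:
      }
    \cs_new_protected:Npn \__xpeek_collect_loop:
      {
        % Is the token we just peeked at in the ignore list?
        \bool_set_false:N \l__xpeek_collect_bool
        \tl_map_inline:Nn \l__xpeek_collect_ignore_tl
          {
            \token_if_eq_charcode:NNT \l_peek_token ##1
              { \bool_set_true:N \l__xpeek_collect_bool \tl_map_break: }
          }
        \bool_if:NTF \l__xpeek_collect_bool
          { \__xpeek_collect_store:N }
          % Done: run the code with the collected tokens as a braced
          % argument.
          { \exp_args:NV \l__xpeek_collect_code_tl \l__xpeek_collected_tl }
      }
    \cs_new_protected:Npn \__xpeek_collect_store:N #1
      {
        % #1 is the token we peeked at; save it and look further.
        \tl_put_right:Nn \l__xpeek_collected_tl {#1}
        \peek_after:Nw \__xpeek_collect_loop:
      }

The interface is meant to be the same as before:
"\xpeek_collect_do_alt:nn { abc } { \foo \bar } caada" should again
end up as "\foo \bar { caa } da".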


>             \tl_use:N  \l_jcs_ignored_tokens_tl
>             \tl_if_in:N?TF \g_jcs_matchlistlist_tl { something
> involving \l_peek_token }
>               {#1} {#2}

Again, \tl_if_in is not usable here.  You should probably define an
auxiliary test:

    \bool_new:N \l_@@_bool % used by the test below, must be declared
    \prg_new_protected_conditional:Npnn \@@_if_in:NN #1#2 { TF }
      {
        \bool_set_false:N \l_@@_bool
        \tl_map_inline:Nn #1
          {
            \token_if_eq_charcode:NNT #2 ##1
              { \bool_set_true:N \l_@@_bool \tl_map_break: }
          }
        \bool_if:NTF \l_@@_bool
         { \prg_return_true: } { \prg_return_false: }
      }

Used as \@@_if_in:NNTF \g_jcs_ignorelist_tl \l_peek_token { } { }.
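
For instance, combined with \peek_after:Nw this gives a test on the
next token in the input stream.  The names below are hypothetical,
and \__xpeek_if_in:NNTF is assumed to be the test above after
l3docstrip has replaced @@ by __xpeek:

    \tl_new:N \l__xpeek_punct_tl
    \tl_set:Nn \l__xpeek_punct_tl { .,!? }
    \cs_new_protected:Npn \xpeek_punct_demo:
      { \peek_after:Nw \__xpeek_punct_demo_aux: }
    \cs_new_protected:Npn \__xpeek_punct_demo_aux:
      {
        \__xpeek_if_in:NNTF \l__xpeek_punct_tl \l_peek_token
          { [punctuation] } { [something~else] }
      }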

> Does this sound like the correct path to head down?

Yes.

Best regards,
Bruno