From: Michiel Helvensteijn
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE
Date: Thu, 24 Oct 2013 02:12:10 +0200
Subject: l3regex feature request + a question about the implementation

Hi all! (and in particular, Bruno)

I've been exploring l3regex a bit more. Kudos on that one! It seems to work well, and it will be very useful to me in the near future. It would be even more useful if the public interface could expose just a bit more of the lower-level functionality, which brings me to my feature request:

I'd like to match against a compiled regex, but feed it one token at a time, rather than the entire token list at once. After each token, I want to know whether a match is still possible. If not, I want to go back one step and retrieve the captured groups (and perhaps other available metadata). From how the package is implemented, this functionality should already exist, just not in the public interface.

My motivation behind this request is as follows: I'm still working on a lexical analyzer package (I'm already using it for personal documents; it lets me do some pretty cool things). I now realize that in writing my implementation I'm basically duplicating the effort that Bruno has already gone through (and ending up with significantly less advanced features, I might add). But to use l3regex I need to be able to supply it with tokens while I scan.

Secondly, an implementation question (which might lead to a discussion): You're now running a regex as an NFA (Nondeterministic Finite Automaton), keeping track of all active branches while matching against an input. Is there a particular reason you're not translating it into a DFA (Deterministic Finite Automaton) during the compilation phase, i.e., applying the powerset construction? This would certainly speed up matching a great deal, and reduce memory usage in most cases (except for those artificial ones that would require an exponential number of states). It seems you're already determined to restrict the package to regular languages, so a DFA should always do the trick.
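
To make both points concrete, here is a minimal Python sketch of what I have in mind. It is not how l3regex is implemented, and every name in it (`NFA`, `nfa_step`, `powerset_construction`, `IncrementalMatcher`) is made up for illustration. It takes a hand-written toy NFA for the regex `a*ab`, applies the powerset construction, and then feeds the resulting DFA one symbol at a time, reporting after each symbol whether a match is still possible. It ignores captured groups entirely, which a plain powerset construction does not track.

```python
from collections import deque

# Hypothetical toy NFA for the regex a*ab (no epsilon transitions).
# Maps (state, symbol) -> set of possible next states.
NFA = {
    (0, "a"): {0, 1},  # nondeterministic: stay in the a* loop or take the final 'a'
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPT = {2}
ALPHABET = ("a", "b")


def nfa_step(states, symbol):
    """Advance every active NFA branch by one symbol."""
    result = set()
    for state in states:
        result |= NFA.get((state, symbol), set())
    return frozenset(result)


def powerset_construction():
    """Build a DFA whose states are frozensets of NFA states."""
    start = frozenset({NFA_START})
    table = {}
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = nfa_step(current, symbol)
            table[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    accepting = {s for s in seen if s & NFA_ACCEPT}
    return start, table, accepting


class IncrementalMatcher:
    """Feed the DFA one symbol at a time."""

    def __init__(self):
        self.start, self.table, self.accepting = powerset_construction()
        self.state = self.start

    def feed(self, symbol):
        """Consume one symbol; return True while a match is still possible.

        In this toy example the empty set is the only dead state. In general
        one would precompute which DFA states can still reach an accepting one.
        """
        self.state = self.table.get((self.state, symbol), frozenset())
        return bool(self.state)

    def is_match(self):
        """True if the input consumed so far is accepted."""
        return self.state in self.accepting


if __name__ == "__main__":
    matcher = IncrementalMatcher()
    for token in "aaba":
        alive = matcher.feed(token)
        print(token, "alive" if alive else "dead", matcher.is_match())
```

A lexer built on something like this would remember the last position at which `is_match()` was true and fall back to it once `feed` reports that no match is possible anymore.
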

It's possible I'm not aware of all the complications, though. I'd be interested in hearing your thoughts.

Cheers!

-- 
www.mhelvens.net