Received: from mout.gmx.net (mout.gmx.net [212.227.17.21]) by h1439878.stratoserver.net (8.14.2/8.14.2/Debian-2build1) with ESMTP id r9OMIf7r017333 for ; Fri, 25 Oct 2013 00:18:42 +0200 Received: from relay2.uni-heidelberg.de ([129.206.210.211]) by mx-ha.gmx.net (mxgmx105) with ESMTPS (Nemesis) id 0M6OIR-1VtS7c3Fud-00yQ55 for ; Fri, 25 Oct 2013 00:18:35 +0200 Received: from listserv.uni-heidelberg.de (listserv.uni-heidelberg.de [129.206.100.94]) by relay2.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9OMG6Es001374 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 25 Oct 2013 00:16:06 +0200 Received: from listserv.uni-heidelberg.de (listserv.uni-heidelberg.de [127.0.0.1]) by listserv.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9OM14iU007799; Fri, 25 Oct 2013 00:16:06 +0200 Received: by LISTSERV.UNI-HEIDELBERG.DE (LISTSERV-TCP/IP release 16.0) with spool id 10483471 for LATEX-L@LISTSERV.UNI-HEIDELBERG.DE; Fri, 25 Oct 2013 00:16:06 +0200 Received: from relay.uni-heidelberg.de (relay.uni-heidelberg.de [129.206.100.212]) by listserv.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9OMG55N009353 for ; Fri, 25 Oct 2013 00:16:05 +0200 Received: from mail-oa0-f43.google.com (mail-oa0-f43.google.com [209.85.219.43]) by relay.uni-heidelberg.de (8.14.1/8.14.1) with ESMTP id r9OMFrFF010838 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=FAIL) for ; Fri, 25 Oct 2013 00:15:56 +0200 Received: by mail-oa0-f43.google.com with SMTP id m1so171686oag.30 for ; Thu, 24 Oct 2013 15:15:52 -0700 (PDT) X-Received: by 10.60.125.138 with SMTP id mq10mr17539oeb.90.1382652598793; Thu, 24 Oct 2013 15:09:58 -0700 (PDT) MIME-Version: 1.0 Received: by 10.76.171.5 with HTTP; Thu, 24 Oct 2013 15:09:38 -0700 (PDT) References: <5268D3D0.1050209@residenset.net> <52693F2A.80902@residenset.net> Content-Type: text/plain; charset=ISO-8859-1 X-MIME-Autoconverted: from quoted-printable to 8bit by listserv.uni-heidelberg.de id r9OMG55N009354 Message-ID: Date: Fri, 25 Oct 2013 00:09:38 +0200 Reply-To: Mailing list for the LaTeX3 project Sender: Mailing list for the LaTeX3 project From: Michiel Helvensteijn Subject: Re: l3regex feature request + a question about the implementation To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE In-Reply-To: <52693F2A.80902@residenset.net> Precedence: list List-Help: , List-Unsubscribe: List-Subscribe: List-Owner: List-Archive: Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by relay2.uni-heidelberg.de id r9OMG6Es001374 Envelope-To: X-GMX-Antispam: 0 (Mail was not recognized as spam); Detail=V3; X-GMX-Antivirus: 0 (no virus found) X-UI-Filterresults: notjunk:1;V01:K0:BJmRr7OVEn8=:ypKYtA9HLzyyvImLhMsTleXRFx 1J3Ye5L9uosy7sa3Uhr8Pf+7EfFCQvhuOSpjbj9Oa39ouCY2N0BfEwZyjmstGEBv9XSsSil36 /90tn30b7rQYJp8lyD14+++3z3/+r9dJQtboq/81CGQDFBYZO0Wz+68CWn2ma5yLOxsDBiOm9 6fPMqOz7zxDj8wRVvaEvcuMvhPFay7SFadlar70CJTukOCuvDZYlCUAmZZn7eIDsp6iPoqnNJ PIMs+UEurrWbTvmM2OQWpXYf8Nbe7yrMsbaYS431Z95u1JId1vWZUAA3gVgKVB6LPFLMZGFks 3egfKb8iNlyTzC/dKVbFlxlHSw201dPQFWWHPmeLHMTkANa+CfUjO6H9u1QfjUse+ygkfVxmu p2uILpnRde2GfkdHTNbw/RxO364jGb57kkZgpy1Jxpqq4ZP7MbprKIHdbPMcWsIZfkO1sTPR5 +CsgcHKpgmczLonuzwLza05bG9EFSclfkPJ/IF0sqXoDuZCMwQGitx9PckeX6qqbTEusPPIQ5 P/1nIh2bu+8+y0RxqqMX3qdEcNe1gYL+2P7XLoynWLC//d1um7+imhoLOgrDjYosffPv6C0r2 r0l6/n56INe4rugGlLuXzbmlkvbIhBPtRvtZga96dISp/Hop2QlmA6bILNQM8SQeMyX+rCCa3 5zNM71zt8Eym/mTkePRgSaoCzlkhtDerJ884rXIeDqqWgGAKHHpzYFOQaroLNb1LG562AxsLY ae7aCICBHa9E7fW0VDYBnz3i9iLLuw2BERm7LXwVvkh33SnoWVAYxJBfhuItEdmoUEIxVEfeb NGUK0+N+f6jehu5xZBtHvGbdgLXnDEx7PbJAkSYGNKDwl911eLr6ifw7Ts976SN4JcaCccu58 HtkXyJIieoG1PQmaAOMILvdc27NyEXwHOE36x3Z+oT2/m6OGxvTo2i3lvOCIEnyXHz840MAwc LkKyuL3RRVcU10Qs3zbtT25Q309ZbJiWEiD8JKic4C91BqfLjpOqoA2Pe+1MaNNaHqpAxCtfy 2c/plwA6bfxs9ae5Yyu+NOJ8qg7HLWXP00UQbJEc4dVc/mWA4vP9Oi1j4xbWnLNqCI2o/I/HM +4ycdBGatVJjFse3PIAZiTFwon/iICKJFixf5fIa3aSnuVTECLaaXMH9xGMbHCA+SlsBv5KqV oxuLr2m5pU/VJJKoJjnsZ36976JBk+qhBL6hPNU4eIB5YwW6tP7+WEo46BdKXTJXydcSmIWWm aDKOAUwMSRgT5Zmeq4hoeWjehUw2qW3i7ZDVlOAC/UeGV3cVK4NSIApjm9b4SmAy9ZiTvpXXL yn/GbgxRbWLg8O8TIyXGoW/ykIaMKt6GLezaktRCwq//DxGhm8SFDfu2I6pNWpdC4gMnOzJ59 4BeH4koPegVlBC5911exKgCzaqdkrF8+JFXz2h1LM2zN6uGbBzCfElMKPxsoKbqUbIUwfr3N9 WuFc1tRFZOinZx+Y6wZGdwFCeFR90wPJBLO+3zG5GHtfPL1hpp0Ep3wUZkHQO5mBh8CODoigD dqyCVhNG7kelR7X+3dGE90BxMDL0ypo9YTcq8ZG7d X-UI-Loop:V01:YvtQrCncqdE=:/YvPGw0bC3RMlU0G6+PqeW/UvIwnHBpCvyauoFIsy5s= Status: R X-Status: X-Keywords: X-UID: 7298 On Thu, Oct 24, 2013 at 5:39 PM, Lars Hellstr=F6m wrote: >> Interesting, I haven't come across the idea of a lazy powerset >> construction before. But I don't think I fully understand what you are >> proposing. Using "a family of csnames" suggests to me the general >> technique of maintaining an input-output table for a pure function. > > I was thinking one csname per state of the DFA (subset of states of the > NFA), that in turn acted as a function from input character to new stat= e of > DFA. This should work perfectly well in your one-token-at-a-time settin= g. I was trying to determine what was 'lazy' about it. But I think I get it now. You're only building states as you need them and leaving those transition functions partially defined. You always try the DFA first, but if you run into a dead end (no transition to fit the input), you have to fall back to the NFA. Yes, that might work. >> (2) I would also need the ability to merge new (N|D)FA's into an >> existing one and mark the corresponding accepting states with an >> identifier. I could then query this identifier when a match was found, >> so I could know which original pattern(s) matched. > > Doable, but whichever way you do it, you effectively end up executing a= ll > the component automata in parallel. With DFAs, the new state space is t= he > cartesian product of the component state spaces. With NFAs, you have a = set > of active branches for each component automaton. So if you're anyway fe= eding > data to your automaton one token at a time, then you might as well feed= each > token to a sequence of automata instead. Properly merging DFA's will gain you plenty if some rudimentary optimizations are performed. But not so much for NFA's, as you say, which is a shame. I've defined near 200 separate math-mode lexemes in my thesis so far. I wouldn't want to start up 200 automata for every potential occurrence. My current implementation uses a trie, i.e., a very primitive DFA. --=20 www.mhelvens.net