Received: from mout.gmx.net (mout.gmx.net [212.227.15.19]) by h1439878.stratoserver.net (8.14.2/8.14.2/Debian-2build1) with ESMTP id r9OFgISB014734 for ; Thu, 24 Oct 2013 17:42:19 +0200 Received: from relay.uni-heidelberg.de ([129.206.100.212]) by mx-ha.gmx.net (mxgmx005) with ESMTPS (Nemesis) id 0MT6YQ-1VAeN02uVK-00S8kk for ; Thu, 24 Oct 2013 17:42:12 +0200 Received: from listserv.uni-heidelberg.de (listserv.uni-heidelberg.de [129.206.100.94]) by relay.uni-heidelberg.de (8.14.1/8.14.1) with ESMTP id r9OFdBdq021567 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 24 Oct 2013 17:39:11 +0200 Received: from listserv.uni-heidelberg.de (listserv.uni-heidelberg.de [127.0.0.1]) by listserv.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9OFG0IR023489; Thu, 24 Oct 2013 17:39:07 +0200 Received: by LISTSERV.UNI-HEIDELBERG.DE (LISTSERV-TCP/IP release 16.0) with spool id 10502218 for LATEX-L@LISTSERV.UNI-HEIDELBERG.DE; Thu, 24 Oct 2013 17:39:07 +0200 Received: from relay2.uni-heidelberg.de (relay2.uni-heidelberg.de [129.206.210.211]) by listserv.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9OFd7PJ005998 for ; Thu, 24 Oct 2013 17:39:07 +0200 Received: from csep02.cliche.se (csep02.cliche.se [195.249.40.184]) by relay2.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9OFcsFx019900 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Thu, 24 Oct 2013 17:38:57 +0200 Received: from nova-2.local (unknown [130.243.93.86]) by csep02.cliche.se (Postfix) with ESMTP id 062A318663C for ; Thu, 24 Oct 2013 17:38:53 +0200 (CEST) User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; sv-SE; rv:1.9.2.28) Gecko/20120306 Thunderbird/3.1.20 MIME-Version: 1.0 References: <5268D3D0.1050209@residenset.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed X-MIME-Autoconverted: from quoted-printable to 8bit by listserv.uni-heidelberg.de id r9OFd7PJ005999 Message-ID: <52693F2A.80902@residenset.net> Date: Thu, 24 Oct 2013 17:39:22 +0200 Reply-To: Mailing list for the LaTeX3 project Sender: Mailing list for the LaTeX3 project From: =?ISO-8859-1?Q?Lars_Hellstr=F6m?= Subject: Re: l3regex feature request + a question about the implementation To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE In-Reply-To: Precedence: list List-Help: , List-Unsubscribe: List-Subscribe: List-Owner: List-Archive: Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by relay.uni-heidelberg.de id r9OFdBdq021567 Envelope-To: X-GMX-Antispam: 0 (Mail was not recognized as spam); Detail=V3; X-GMX-Antivirus: 0 (no virus found) X-UI-Filterresults: notjunk:1;V01:K0:Bh6qONQdrXw=:YPcq1/wl02oe/rQ1CJ06MtG50u IAosg0yjVupxltIG+N1EB6niUK2/pMGpL0qrTDBJieDLGFT72s90zYs/CcT8+Ww6/g0qO9C1r bx0JJ3Ts6NXA7wyuTR6K212wEU12f+btDAcX3ZH5tIHMr7qRVPeZmCs1EEMefBs2Iux7WUm4B FK4G+wt2UJKBoVtKQ1Iys/ABv3MeM3pxZmRyiatd0w+uZUdISlfRSErMX+xsKKMrWy1zEnk9D onzusnKzzVQKCwaK0Sbpra4qdcvSHWMS4E4wUmdPRRWadNBcJCcCg5k58m5yRjkjoiiRuiw40 HmvZv98zFm7has2E6sWSuK6f6tOc1osXz6OmQ8GztcnLSKt97kMRgXGlm6pY7eH7RbX/Vj4Ry uO3dUrIC9x4M5jidWswMtX3ypa++oqMVznx8gTIK16vIsraiRRDv41BVjzh8hTYG0NwAf7LXP Qqf/tOpS2uDFzUYco2KcfScqWN94qhl3AMxCdUcGhBNQuwxcwc228oVt88dh/pRbPvwvHhz3p vzG51WJL+KJd4I2EllQyqwe8NU7vlAG6/yRSE7kNUTQzjGhfEs14UCOMFcJqEKHePp4wiEbFN 04tBkf6BimG4nOwsue7LCNwG2arzAt++rMsBi8IPoCY8jYowShJSL5mUYLUmAC74QXEq8YOqQ mCQ163DI7PVUlbWJPJSk+5hMDoIkXpuRt4b/eDODL9eHX8gpblwkCOE7pKHr+UbvZOeh8yDaK p/DkosMVnoww4kY+PwmctD2rwFoQwWivWbl3HzPXqtarXfeUZUlm0jFaiziZv9En2KGiJLAnC zchb1h6DHLZyRguJNQXNJDbL4mR+Or3u62Dj7Zf5PqnjZhBmHAa6KTGMXHGjjqGyvtmfysJLJ MLsAEWWr6Qaq8TAWLqHm7oSlAxgqmnM7lIKS5ntSXWbv07DqjlCi1t7Traw2MJ+kQLXHKasmc GIy6GrImV9qMy5nvPeuT7x/rn3uUg6kPvtln2VwgpnXspnT+mbYjZXkHok9yAlqKOzifyn7Rt UokuiQqPxm8CX1/ble6UHKW1vEONDsqrg3RiFW+VHBnBJyqZnRbVCnD1aVh6fED/rmaMmNgUi xBGHs4c/XWytSdGmsuC3K7+jVNeQ0pwlaQicz4kZcsCdDv4qwYyOUylLiCwLvLOocNLLQbNTx ATbjQWPYWCcmaBiezKJKWIzn3zaS+uMQLSFeEGxg192INi+v9Ijq3rpRhOj3E6Md5O7bjPDLP dTmDxF3iygikJIeR4DAvEoTU6rEKpGLr762//+TobPMUGnMnsjmzY3oWPO+E30bXaYJj/JzSY SbEvmAk5KawWE9o5lcjKX/JxFxCpshE/wlcJ/8p8C0m/Andl2wRAv/pDlARsbCLSu1Oy8Yi++ zoB1UC+YaC4i3li+fYfqt0V32L1p164A6UcVHA3W6OdyeiFiVTgFJdefA74XFEjQkozwW46ku P9IQKYSNTSR8nuU9dpEW61u8SpSPFM2Re1hqH3aOwj4OVZGcxnUbwgM2gWqFJxfW+t4PfO/15 cDVHxpcjvF1vNDbClv+4wDTsBc1FVvaoOOMSVikRC X-UI-Loop:V01:NpZ01pgaR3Y=:jelDEeeoY6G7dWMqNwFsROXrYtHl94/58dDcSB9buFs= Status: R X-Status: X-Keywords: X-UID: 7297 Michiel Helvensteijn skrev 2013-10-24 13.43: > On Thu, Oct 24, 2013 at 10:01 AM, Lars Hellstr=F6m > wrote: >> An optimal approach for your setting would perhaps be a lazy powerset >> construction: don't carry it out for a particular state in the DFA unt= il you >> actually reach that state during matching (should I perhaps have added for clarity) >> , but then record the result; matching typically >> doesn't reach all the states of the DFA, since you don't match against >> arbitrary data. This, however, assumes that you have some efficient me= thod >> of memoising the results from lazy computations. (I suppose a family o= f >> control sequence names could do the trick.) > > Interesting, I haven't come across the idea of a lazy powerset > construction before. But I don't think I fully understand what you are > proposing. Using "a family of csnames" suggests to me the general > technique of maintaining an input-output table for a pure function. I was thinking one csname per state of the DFA (subset of states of the=20 NFA), that in turn acted as a function from input character to new state = of=20 DFA. This should work perfectly well in your one-token-at-a-time setting. > But that would only memoize exact inputs; variations on the same > pattern wouldn't get any benefit, and it wouldn't work in my setting > anyway, where I get one token at a time. My current implementation > uses a "trie" datastructure, which might be one step better than a > simple table. But still not good enough by far, I think. > > When I first read the term "lazy powerset construction" in your post, > I imagined restructuring the automaton on-the-fly somehow, which > sounded pretty cool. In my experience, most operations on automata tend to produce automata wi= th=20 epsilon-transitions, which one then at some cost (not cool) has to get ri= d=20 of before executing them. Unless the operation does something even worse = to=20 the state set... :-( > (2) I would also need the ability to merge new (N|D)FA's into an > existing one and mark the corresponding accepting states with an > identifier. I could then query this identifier when a match was found, > so I could know which original pattern(s) matched. Doable, but whichever way you do it, you effectively end up executing all= =20 the component automata in parallel. With DFAs, the new state space is the= =20 cartesian product of the component state spaces. With NFAs, you have a se= t=20 of active branches for each component automaton. So if you're anyway feed= ing=20 data to your automaton one token at a time, then you might as well feed e= ach=20 token to a sequence of automata instead. Lars Hellstr=F6m