Received: from mout.gmx.net (mout.gmx.net [212.227.15.18]) by h1439878.stratoserver.net (8.14.2/8.14.2/Debian-2build1) with ESMTP id r9ODlc98013590 for ; Thu, 24 Oct 2013 15:47:39 +0200 Received: from relay.uni-heidelberg.de ([129.206.100.212]) by mx-ha.gmx.net (mxgmx108) with ESMTPS (Nemesis) id 0LbgVb-1VxvhI38VX-00lFSS for ; Thu, 24 Oct 2013 15:47:32 +0200 Received: from listserv.uni-heidelberg.de (listserv.uni-heidelberg.de [129.206.100.94]) by relay.uni-heidelberg.de (8.14.1/8.14.1) with ESMTP id r9ODiEvS012011 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 24 Oct 2013 15:44:14 +0200 Received: from listserv.uni-heidelberg.de (listserv.uni-heidelberg.de [127.0.0.1]) by listserv.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9O2253R022095; Thu, 24 Oct 2013 15:44:13 +0200 Received: by LISTSERV.UNI-HEIDELBERG.DE (LISTSERV-TCP/IP release 16.0) with spool id 10489122 for LATEX-L@LISTSERV.UNI-HEIDELBERG.DE; Thu, 24 Oct 2013 15:44:13 +0200 Received: from relay2.uni-heidelberg.de (relay2.uni-heidelberg.de [129.206.210.211]) by listserv.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9ODiDNR000995 for ; Thu, 24 Oct 2013 15:44:13 +0200 Received: from mail-oa0-f52.google.com (mail-oa0-f52.google.com [209.85.219.52]) by relay2.uni-heidelberg.de (8.13.8/8.13.8) with ESMTP id r9ODhoiZ016914 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=FAIL) for ; Thu, 24 Oct 2013 15:43:54 +0200 Received: by mail-oa0-f52.google.com with SMTP id n10so2345702oag.25 for ; Thu, 24 Oct 2013 06:43:50 -0700 (PDT) X-Received: by 10.60.76.72 with SMTP id i8mr2226900oew.11.1382622230496; Thu, 24 Oct 2013 06:43:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.76.171.5 with HTTP; Thu, 24 Oct 2013 06:43:30 -0700 (PDT) References: <5268D3D0.1050209@residenset.net> <52691B49.1070503@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Message-ID: Date: Thu, 24 Oct 2013 15:43:30 +0200 Reply-To: Mailing list for the LaTeX3 project Sender: Mailing list for the LaTeX3 project From: Michiel Helvensteijn Subject: Re: l3regex feature request + a question about the implementation To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE In-Reply-To: <52691B49.1070503@gmail.com> Precedence: list List-Help: , List-Unsubscribe: List-Subscribe: List-Owner: List-Archive: Envelope-To: X-GMX-Antispam: 0 (Mail was not recognized as spam); Detail=V3; X-GMX-Antivirus: 0 (no virus found) X-UI-Filterresults: notjunk:1;V01:K0:PQrIEG4z2Ak=:uuCefvQA2lllZIz5BjspD562Bv oUYw1syB9mqc0+YZbxkbcXCFjhCcSXVpUzBaZrROoiS7qNXi4hlnvIU7jDPlLy3cBeLIsNAUd v1bYaD8haEWemVN/1C9hGKKTCKvXWo6fZ4Niyyt1ZGWDTtuc/awzatiBoPI0YE41JR5ON3WNU rnFWRXeo8lhESvV7v3v0XQfU5B10zAiTaKmGVFKZnExWotgpN+y5/IBoheK2+tklWU+hpAjYH SQD+gdgzcEMNyJKTkzakGIIYkf2U/kSV3+UOES0XFuxZhVvshdabEV9HJJ3lxVai6Gcv78VcU +14JrkmVrJGHhj0mrlZso2UfiTT4NavNV59Gyd5NLlDEfgBAtF5IpdOVotvBOjKwYVkCl3gGR CFNUgCm28+p7MWq2nlnj+p/JZsfOks+6y6iB1jnP2XExZH8SHzNTG+CFxcqNHPsW5gxLLmpLf ajFXx3Kr4/pITc4rnuDl36WB5CsmurkqJlEKWq7R1Lj0545ydJUbOvkFSD4N48CF4GFw6SpLd azT9RM65rH/gffnagQe1N369lt4TYKi1xr7Vyf9Xowv0+00Uv6A2eaz/He8+AnAb0FxcI0zPi iaK2b7X448EyODcbu097Y0dSEGI/YKEimbMWqMtpmxurBFiIDhSsEUAG3XcD3xsBG10LH5UgC lYhMwn7lYmZyN6vOegadTS+AD425Dcf898EmAJJd+6l+kpol68l+CnuBHZfppN6SAtH0dwX4Q LA+p79Tc9Ff4zlL6K6hIglZpTbRNNKXMogbl2k9R090MZGxZBGa1goMyLYJYYx/g8A8+D//8z RpY0+GNfWLeEzmJjXlgCjMTvEnwZBUAfXt5yNegqTRIqaObgyncCBy9urHRGnqtRNrOtVQjsT centKZe5SAXNBKlVg/ppuT32984MnoIhexab2g4H5JhlMIuwhdttp894uah/TA8wxWuJeaWMm HuhtDXDNFvHiYeubmD34WKRjdBXaSo8bJsnHbNbY4x5vwu2vWRHVVotnoDGNjZn6I7An3uwwY YOdgXWYfGkXF6ZHLzUGgUw7wBtVTr0xRdPPAJSMTstl8eahq57yonRYrOpJKBJQAciXzlS7dZ LMN7rhPc5iNqB9lKj9zIf97Yix7k4h6RWzcRw01ytc8qOD2xsRL0z17NeHGPGThbj/q2PMJqW 34IEYorBDPkfwHCcoXUFOjzbqQHoMIoLpyLgJtUg/SZY5vRRqGc/DTncMtpbBv2F9YAK+J5G+ GnnFeFvFF39XS9Gj24+iu8x7wmtfxF+NXpNFtKIw/PgAzRtFMIbZrAVFfIzGYg8H4hsTUbvlK EfFGh/MArWNhbBYK4i0NohsUAN5SIxtu+nHmHBGxVMA/KMJ9KMXktI5S0lmhiyh9LlJmAI0Ls gRO//0giyg9UP+oZQDGKKbZjg5sVrC1Kk8Jqtpt2Yt1eagMA9O6nLTcb/l23ey9kpWZ5FosXR EC7wHNK6RbNrfM7wEonaYwspl5MU/yuwyDKKyimViJYkb6Y2j/JpQTQVfdF0p4k4NdAX0ANHM /5vf1Ef7QurSyN681F/4DWu6b+wzAM7dcoSPBr1gp X-UI-Loop:V01:m3GVm1at7Qs=:aDQTB9/D385LktVD2cAZujKM+4ykrV/9rLQsglY7wrM= Status: R X-Status: X-Keywords: X-UID: 7296 On Thu, Oct 24, 2013 at 3:06 PM, Paulo Roberto Massa Cereda wrote: > What sometimes comes back to haunt me is the fact the in a DFA we have a > transition function which has to be total by definition, so we might end up > with a converted DFA which has 2^n states in the worst case, an the number > of transition might end up being way greater than the number of states. In my experience, practical cases never get the exponential state-space explosion. An example that does: "the language of strings over the alphabet {0,1} in which there are at least n characters, the n-th from last of which is 1" (Wikipedia). Keep in mind, though, that the using a NFA for this (with n+1 states), does *not* get rid of this complexity, but merely moves it from compile-time to run-time, where 2^n active paths will need to be remembered. (No one should be using a regular expression for such a task anyway.) > I think we have benefits and shortcomings in both situations. I'm pro DFA, > but the construction is quite time-consuming. I've been thinking about that. Why not memoize a regex-to-minimized-DFA translation in an auxiliary file? That way you only have to compile a new regex once, even across LaTeX runs. By the way, I have to admit I'm not sure under which optimizations sub-group capturing can survive. > One possible alternative is to > use a NFA and applying Thompson's construction or a similar algorithm I assume this is what l3regex already does, to construct the NFA. -- www.mhelvens.net