From: Lars Hellström
Subject: Re: Strings, and regular expressions
To: LATEX-L@listserv.uni-heidelberg.de
Date: Mon, 10 Oct 2011 23:40:29 +0200
Message-ID: <4E93664D.7090105@residenset.net>

Bruno Le Floch wrote 2011-10-10 17.07:
> Hello all,
>
> We just added on CTAN two related modules:

Did you by any chance mean SVN rather than CTAN? (Or maybe there is a delay in changes becoming visible.)

[snip]

> The l3regex module allows for testing if a string matches a given
> regular expression, counting matches, extracting submatches, splitting
> at occurrences of a regular expression,

Sometimes one goes the other way, and returns all matches of a string. I wouldn't go so far as saying that this is more useful than splitting at things that match, however.

> and doing replacement (see
> documentation for function names).
>
> Speed requirements forbid a back-tracking approach, hence
> back-references cannot be supported. Only "truly regular" features are
> implemented.

Having browsed the code now, I see that you execute the automaton in general NFA form. Have you considered executing it in epsilon-free NFA form instead?
Getting rid of epsilons does not increase the number of states (it can even reduce the number of reachable states), but I'm not sure how it would interact with match path priority...

[snip]

> - Newlines. Currently, "." matches every character; in perl and PCRE
> it should not match new lines. As you know, the situation with new
> lines in TeX is a little bit odd, since they are converted to the
> \endlinechar upon reading, and normally not tokenized, simply giving
> rise to a space or a \par. Should we still decide that "." does not
> match the CR nor LF characters? Or should it simply not match the
> \endlinechar?

The way I'm used to regexps, you can control using "metasyntax" whether . matches newline or not, and likewise how ^ behaves with respect to beginning-of-line, so I wouldn't find it wrong if ^ and \A are synonyms.

> - Same question for caseless matching, and for look-ahead/look-behind
> assertions.

Can you even do assertions without exploding the number of states? (Whenever I've thought about it, I tend to end up with the conclusion that one cannot, but I might be missing something.)

Lars Hellström
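
To illustrate the epsilon-removal step mentioned above, here is a minimal Python sketch. It is not l3regex code: the function names and the dictionary representation of the automaton are assumptions made for the example. The sketch keeps transitions as unordered sets, so it says nothing about match path priority, which is the open question raised above.

```python
from collections import defaultdict

EPS = None  # label used for epsilon transitions in this sketch (an assumption)


def epsilon_closure(state, trans):
    """Return the set of states reachable from `state` via epsilon moves only."""
    seen = {state}
    stack = [state]
    while stack:
        p = stack.pop()
        for label, target in trans.get(p, ()):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def remove_epsilons(start, accepting, trans):
    """Build an equivalent NFA without epsilon moves, reusing the same states."""
    states = set(trans) | {t for edges in trans.values() for _, t in edges} | {start}
    new_trans = defaultdict(set)
    new_accepting = set()
    for q in states:
        closure = epsilon_closure(q, trans)
        # q accepts if it can reach an accepting state through epsilons alone.
        if closure & accepting:
            new_accepting.add(q)
        # q inherits every labelled transition of the states in its closure.
        for p in closure:
            for label, target in trans.get(p, ()):
                if label is not EPS:
                    new_trans[q].add((label, target))
    # Keep only the states still reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for _, target in new_trans.get(q, ()):
            if target not in reachable:
                reachable.add(target)
                stack.append(target)
    pruned = {q: new_trans[q] for q in reachable if q in new_trans}
    return start, new_accepting & reachable, pruned


def accepts(start, accepting, trans, text):
    """Run an epsilon-free NFA on `text` by tracking the set of live states."""
    live = {start}
    for ch in text:
        live = {t for q in live for label, t in trans.get(q, ()) if label == ch}
    return bool(live & accepting)


# NFA for a*b with epsilon moves:
# 0 -eps-> 1, 1 -a-> 2, 2 -eps-> 1, 1 -eps-> 3, 3 -b-> 4 (accepting)
nfa = {0: [(EPS, 1)], 1: [("a", 2), (EPS, 3)], 2: [(EPS, 1)], 3: [("b", 4)]}
start, accepting, trans = remove_epsilons(0, {4}, nfa)
```

In this small example all five original states are reachable, while the epsilon-free automaton only reaches states 0, 2 and 4, so no states are added and two become unreachable.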