Date: Thu, 13 Oct 2011 09:06:25 +0100
From: Joseph Wright
Sender: Mailing list for the LaTeX3 project
Subject: Re: Strings, and regular expressions
To: LATEX-L@listserv.uni-heidelberg.de

On 13/10/2011 08:58, Will Robertson wrote:
> On 12/10/2011, at 1:29 PM, Bruno Le Floch wrote:
>
>> For short strings (e.g., matching \d\d\d\d-\d\d-\d\d on 2011-10-11),
>> one third of the time is spent on building the automaton from the
>> regular expression, and two thirds on running the automaton. I don't
>> know how important that is in practice. Two aspects:
>>
>> - providing it requires more code --- true
>>
>> - the N arguments may be confusing (e.g., some people may think that
>> it expects the regex as a string variable) --- not such a problem
>> because the variable is checked to indeed be a proper compiled regex.
>>
>> If the feeling is that it should go, then I'll remove that this weekend.
>
> I think it would be premature to remove it at this stage. In fact, if
> you were going to remove either of them, it'd make more sense to me to
> remove the slower inline versions.
>
> If you really think the n/N distinction is confusing, what about a
> "currying" mechanism whereby regexes automatically create their
> prebuilt form and save it in an internal macro which is used for
> subsequent calls to the same regex?

As you say, I don't think removing stuff is the right way forward at
this point. I raised this to make sure that the current approach is the
best one: having a set of n/N functions does seem to require quite a lot
of variants in the documentation.

As Will says, an alternative is simply to save all regexes
automatically, and check for the existence of the regex before building
it. That of course has a cost in macros, so the question is how many
regexes are likely to be used. (We are talking about a typesetting
system, so this should not normally be in the hundreds.)

On the other hand, perhaps the distinction is fine, and I'm worrying too
much.
-- 
Joseph Wright
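
The "save automatically, check before building" alternative discussed above can be sketched outside TeX. What follows is a minimal Python illustration of the idea only, not l3regex code: the names compiled_cache, get_regex and matches are hypothetical, and a dictionary entry stands in for the internal macro that would hold each prebuilt regex.

```python
import re

# Hypothetical cache: pattern string -> compiled regex object.
# Each entry plays the role of one saved internal macro.
compiled_cache = {}


def get_regex(pattern):
    """Return the compiled form of pattern, building it only on first use."""
    if pattern not in compiled_cache:  # check for existence before building
        compiled_cache[pattern] = re.compile(pattern)
    return compiled_cache[pattern]


def matches(pattern, text):
    """Inline-style call: the caller only ever passes the pattern as a string."""
    return get_regex(pattern).fullmatch(text) is not None


if __name__ == "__main__":
    date = r"\d\d\d\d-\d\d-\d\d"
    print(matches(date, "2011-10-11"))  # first call builds and stores the regex
    print(matches(date, "2011-10-12"))  # second call reuses the stored regex
    print(len(compiled_cache))          # one entry per distinct pattern
```

With this arrangement the caller sees a single interface, and the build cost is paid once per distinct pattern. The price is one stored entry per pattern, which is the trade-off raised in the message: it stays cheap as long as the number of distinct regexes is small.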