Date: Sun, 16 Oct 2011 21:36:47 -0400
From: Bruno Le Floch
Subject: Re: Strings, and regular expressions
To: LATEX-L@listserv.uni-heidelberg.de

On 10/16/11, Joseph Wright wrote:
> On 10/10/2011 16:07, Bruno Le Floch wrote:
>> The l3str module provides functions to get the length of a string,
>> extract substrings or individual characters, testing for string
>> equality (the current \str_if_eq:nnTF). Some support for encodings is
>> provided: percent encoding, conversion from utf-8 to a string of
>> bytes, and most functions of Heiko Oberdiek's pdfescape package.
>>hly welcome.
>
> Some comments having read the code and documentation.

Thank you Joseph for the cleanup.

> I don't like the name in \str_from_to:nnn - it sounds like a copy
> function. What's wrong with \str_substr:nnn or just \str_sub:nnn?

I couldn't think of an unambiguous name. \str_substr:nnn is fine.

> In the same function, the indexing is described as "\meta{start index}
> (inclusive) and \meta{end index} (exclusive)". This seems very odd to me
> - I'd expect
>
> \str_from_to:nnn { abcdef } { 1 } { 4 }
>
> to leave "bcde" in the input stream.

I followed the Python convention, in which you think of the index as
lying between pairs of characters:

    (0)a(1)b(2)c(3)d(4)e(5)f(6)

Hence, extracting from 1 to 4 gives "bcd".
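For readers who want to check the between-characters convention, here is a
minimal sketch in Python itself. It is not the l3str implementation; the
helper name `substr` is made up for illustration and simply wraps Python's
built-in slicing.

```python
def substr(s: str, start: int, end: int) -> str:
    """Hypothetical helper: return the characters lying between
    gap number `start` and gap number `end` (half-open range)."""
    return s[start:end]


# Gaps are numbered (0)a(1)b(2)c(3)d(4)e(5)f(6).
result = substr("abcdef", 1, 4)
print(result)                # bcd
print(len(result) == 4 - 1)  # True: the length is end minus start
```

With an inclusive end index the same call would return "bcde", which is
the behaviour Joseph expected.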
The advantage of doing it that way is that the length of what you get
is \(4 - 1\). Another advantage is that getting the first characters is
easy: \str_substr:nnn { } { 0 } { }. A drawback is that getting all
characters from a given point to the end is
\str_substr:nnn { } { } { \c_max_int } rather than
\str_substr:nnn { } { } { -1 }. Does that make sense?

> What's the reasoning for "\str_if_contains_char:NN" rather than just
> "\str_if_in:NN"?

The second N argument is not enough to know whether you expect a char
or a string variable. Should I code an expandable \str_if_in:nn?

> I see you have a number of "UTF_viii" functions. I can see that you are
> covering any confusion with UTF-16, but would simply "UTF" be better?

No, although I do agree that "UTF_viii" is long :(. We will need
utf-16 to deal with PDF, as Heiko pointed out in a previous email.
Perhaps we should drop support for utf-8 and instead only support
utf-16?

> I also saw that the docs mentioned "\str_if_UTF_viii:N", which does not
> exist. I've removed it, as I think the docs and the code should match as
> much as possible.

Yes. I never got to implementing it :).

Should we lower-case "utf" in function names?

--
Bruno