From: Joseph Wright
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE
Date: Sun, 8 Feb 2015 12:36:05 +0000
Subject: expl3 case changing functions

Hello all,

A few months ago now we added various expandable case changing functions
to expl3 with clearly 'experimental' status. I've recently had some
useful feedback on aspects of the behaviour and have revised some of the
code. I've now got some more questions, so thought it would be useful to
raise those here. (Note: I've updated the SVN code but this has yet to
go to CTAN. I can arrange a release if people want to test but not grab
via GitHub.)

*Background*

The current implementation has six functions

    \tl_upper_case:n
    \tl_lower_case:n
    \tl_mixed_case:n
    \tl_upper_case:nn
    \tl_lower_case:nn
    \tl_mixed_case:nn

where the two-argument versions deal with language-specific case
changing. The functions are x-type expandable. 'Letters' can be case
changed from the full Unicode range when using XeTeX/LuaTeX and the
mappings do not have to be 1-1 (cf. \uppercase/\lowercase).

There is also \str_fold_case:n which does folding for programmatic
applications. That function has a different set of use cases and is not
considered further here.

*Escaping from case changing*

The current implementation follows a BibTeX-like convention for
preventing case changing: braced content is not changed. In the original
approach there was no mechanism to do case changing inside the argument
to a command as a result. I have now altered this to include a list of
commands where case changing should be applied, so for example it would
be possible to arrange that

    \tl_upper_case:n { Hello~\emph{world} }

will case change the argument to \emph. At present, this functionality
is designed to work with commands taking one argument (i.e.
a second or subsequent argument will be unaffected).

The alternative to such an approach is to case change everything and
provide an escape mechanism (cf. the textcase package and
\NoCaseChange). As a user, I can see advantages to both approaches.

One thing that is not currently covered is dealing automatically with
math mode content. That is doable but would require some consistent
interface. In particular, while dealing with "$ ... $" and "\( ... \)"
is straightforward (single-token delimiters), it would be more
challenging to cover "\begin{math} ... \end{math}" or similar. Some of
this has a relationship to expandability: see the next area.

*Expandability*

The current implementation is expandable as this allows the 'natural'
usage

    \tl_set:Nx \l_tmpa_tl { \tl_upper_case:n { foo } }
    \tl_show:N \l_tmpa_tl % => "FOO"

Expandability imposes some restrictions on the code and does have a
performance cost. The need to deal with changes that are not 1-1 or
have other context-dependence means that the performance aspect is not
so important: a full solution using \uppercase/\lowercase would still
require a mapping or similar to deal with all of the possibilities.

One area that is more tricky in this regard is input which is not fully
expanded. For example

    \def\myname{Joseph Wright}
    \MakeUppercase{Written by \myname}

will yield "WRITTEN BY JOSEPH WRIGHT" as there is an \edef inside the
LaTeX2e command before case changing. In contrast, the expl3 functions
currently do no expansion so

    \tl_upper_case:n { Written~by~\myname }

gives "WRITTEN BY Joseph Wright". Notably, if used in setting a token
list the content would be "WRITTEN BY \myname", i.e. further expansion
is inhibited. It is not clear to me what the 'expected' outcome might
be.
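
To make the storage case above concrete, here is a minimal test file. It
is only a sketch: it assumes the experimental function names used in
this message (\tl_upper_case:n and friends), which may be named
differently in other expl3 releases, and it uses \l_tmpa_tl purely as
scratch storage. According to the description above, the log should
show the stored text with \myname left unexpanded.

```latex
\documentclass{article}
\usepackage{expl3}
% Defined outside expl3 syntax so the space in the name is kept
\newcommand*\myname{Joseph Wright}
\ExplSyntaxOn
% x-type setting: the case changer is expanded, \myname is not
\tl_set:Nx \l_tmpa_tl { \tl_upper_case:n { Written~by~\myname } }
% Inspect what was actually stored (interactive: press return to continue)
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\begin{document}
\end{document}
```

Swapping \tl_show:N for typesetting \l_tmpa_tl in the document body
would then show the mixed-case result, since \myname is only expanded
at that point.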

It would be possible to use f-type expansion to deal with stored tokens
before case changing, but for input such as

    \tl_upper_case:n { Written~by \\ Joseph~Wright }

that could break outcomes with LaTeX2e: \\ would be 'lost' and this
could be problematic if the text was used later in, for example, a
center environment.

A non-expandable implementation could use the same logic as
\MakeUppercase but at the cost that case changing for storage would then
need dedicated functions, for example

    \tl_set_upper_case:Nn
    \tl_set_lower_case:Nnn

This loses the 'natural' approach to case changing inside a tl setting
and requires separate 'set a tl with case changing' and 'typeset case
changed text' functions.

*LICR/Non-native input*

The original implementation for the expl3 functions only case changes
letters. Adding an 'escape' to cover e.g. \emph also allows coverage of
things like "\'{e}" and so it was natural to consider LICR input. I have
therefore extended the code to allow coverage of everything handled by
\MakeUppercase when T1/T2A/T2B/T2C/T4/T5/LGR encodings are in use. There
is of course a performance hit, but this should be comparable to that
for processing letters.

That then leaves the question of input outside of the ASCII range when
using pdfTeX. It would I think be possible to do this using an approach
detecting inputenc active chars, but I am reluctant to go this way (in
the longer term it will be increasingly hard to justify using an 8-bit
program as the world standardises on Unicode). With inputenc loaded,
case changing does work if the input goes via LICR:

    \documentclass{article}
    \usepackage[utf8]{inputenc}
    \usepackage{expl3}
    \makeatletter
    \ExplSyntaxOn
    \cs_generate_variant:Nn \tl_upper_case:n { V }
    \cs_new_protected:Npn \MakeExplUpperCase #1
      {
        \group_begin:
          \protected@edef \l_tmpa_tl {#1}
          \tl_upper_case:V \l_tmpa_tl
        \group_end:
      }
    \ExplSyntaxOff
    \makeatother
    \begin{document}
    \MakeExplUpperCase{Héllo}
    \end{document}

Again, this has a link to expandability.

*Naming*

As noted in previous mails on this topic, the naming here (\tl_...) at
least in part reflects the fact that this code is difficult to name. Any
better naming schemes welcome!

*Conclusions*

The current code works but there are open questions. What I am hoping
for is feedback on the ideas and in particular what issues come up with
real use cases. Ideas about all or any of the above, or indeed other
aspects, most welcome.

-- 
Joseph Wright