From: Joseph Wright
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE
Reply-To: Mailing list for the LaTeX3 project
Subject: Re: expl3 case changing functions
Date: Sun, 8 Feb 2015 22:15:45 +0000
Message-ID: <54D7E011.2080604@morningstar2.co.uk>
In-Reply-To: <54D75835.1070804@morningstar2.co.uk>

On 08/02/2015 12:36, Joseph Wright wrote:
> Hello all,
>
> A few months ago now we added various expandable case changing functions
> to expl3 with clearly 'experimental' status. I've recently had some
> useful feedback on aspects of the behaviour and have revised some of the
> code. I've now got some more questions, so thought it would be useful to
> raise those here. (Note: I've updated the SVN code but this has yet to
> go to CTAN. I can arrange a release if people want to test but not grab
> via GitHub.)
>
> *Background*
>
> The current implementation has six functions
>
>   \tl_upper_case:n
>   \tl_lower_case:n
>   \tl_mixed_case:n
>   \tl_upper_case:nn
>   \tl_lower_case:nn
>   \tl_mixed_case:nn
>
> where the two-argument versions deal with language-specific case
> changing. The functions are x-type expandable. 'Letters' can be case
> changed from the full Unicode range when using XeTeX/LuaTeX and the
> mappings do not have to be 1-1 (cf. \uppercase/\lowercase).
>
> There is also \str_fold_case:n which does folding for programmatic
> applications. That function has a different set of use cases and is not
> considered further here.
>
> *Escaping from case changing*
>
> The current implementation follows a BibTeX-like convention for
> preventing case changing: braced content is not changed. In the original
> approach there was no mechanism to do case changing inside the argument
> to a command as a result.
> I have now altered this to include a list of
> commands where case changing should be applied, so for example it would
> be possible to arrange that
>
>   \tl_upper_case:n { Hello~\emph{world} }
>
> will case change the argument to \emph. At present, this functionality
> is designed to work with commands taking one argument (i.e. a second or
> subsequent argument will be unaffected).
>
> The alternative to such an approach is to case change everything and
> provide an escape mechanism (cf. the textcase package and
> \NoChangeCase). As a user, I can see advantages to both approaches.
>
> One thing that is not currently covered is dealing automatically with
> math mode content. That is doable but would require some consistent
> interface. In particular, while dealing with "$ ... $" and "\( ... \)"
> is straight-forward (single-token delimiters), it would be more
> challenging to cover "\begin{math} ... \end{math}" or similar. Some of
> this has a relationship to expandability: see the next area.
>
> *Expandability*
>
> The current implementation is expandable as this allows the 'natural' usage
>
>   \tl_set:Nx \l_tmpa_tl
>     { \tl_upper_case:n { foo } }
>   \tl_show:N \l_tmpa_tl % => "FOO"
>
> Expandability imposes some restrictions on the code and does have a
> performance knock-on. The need to deal with changes that are not 1-1 or
> have other context-dependence means that the performance aspect is not
> so important: a full solution using \uppercase/\lowercase would still
> require a mapping or similar to deal with all of the possibilities.
>
> One area that is more tricky in this regard is input which is not fully
> expanded. For example
>
>   \def\myname{Joseph Wright}
>   \MakeUppercase{Written by \myname}
>
> will yield "WRITTEN BY JOSEPH WRIGHT" as there is an \edef inside the
> LaTeX2e command before case changing. In contrast, the expl3 functions
> currently do no expansion so
>
>   \tl_upper_case:n { Written~by~\myname }
>
> gives "WRITTEN BY Joseph Wright".
> Notably, if used in setting a token
> list the content would be "WRITTEN BY \myname", i.e. further expansion
> is inhibited.
>
> It is not clear to me what the 'expected' outcome might be. It would be
> possible to use f-type expansion to deal with stored tokens before case
> changing, but for input such as
>
>   \tl_upper_case:n { Written~by \\ Joseph~Wright }
>
> that could break outcomes with LaTeX2e: \\ would be 'lost' and this
> could be problematic if the text was used later in for example a
> center environment. A non-expandable implementation could use the same
> logic as \MakeUppercase but at the cost that case changing for storage
> would then need dedicated functions for example
>
>   \tl_set_upper_case:Nn
>   \tl_set_lower_case:Nnn
>
> This loses the 'natural' approach to case changing inside a tl setting
> and requires separate 'set a tl with case changing' and 'typeset case
> changed text' functions.
>
> *LICR/Non-native input*
>
> The original implementation for the expl3 functions only case changes
> letters. Adding an 'escape' to cover e.g. \emph also allows coverage of
> things like "\'{e}" and so it was natural to consider LICR input. I have
> therefore extended the code to allow coverage of everything handled by
> \MakeUppercase when T1/T2A/T2B/T2C/T4/T5/LGR encodings are in use. There
> is of course a performance hit, but this should be comparable to that
> for processing letters.
>
> That then leaves the question of input outside of the ASCII range when
> using pdfTeX. It would I think be possible to do this using an approach
> detecting inputenc active chars, but I am reluctant to go this way (in
> the longer term it will be increasingly hard to justify using an 8-bit
> program as the world standardises on Unicode).
> With inputenc loaded case
> changing does work if the input goes via LICR
>
>   \documentclass{article}
>   \usepackage[utf8]{inputenc}
>   \usepackage{expl3}
>   \makeatletter
>   \ExplSyntaxOn
>   \cs_generate_variant:Nn \tl_upper_case:n { V }
>   \cs_new_protected:Npn \MakeExplUpperCase #1
>     {
>       \group_begin:
>         \protected@edef \l_tmpa_tl {#1}
>         \tl_upper_case:V \l_tmpa_tl
>       \group_end:
>     }
>   \ExplSyntaxOff
>   \makeatother
>   \begin{document}
>   \MakeExplUpperCase{Héllo}
>   \end{document}
>
> Again, this has a link to expandability.
>
> *Naming*
>
> As noted in previous mails on this topic, the naming here (\tl_...) at
> least in part reflects the fact this code is difficult to name. Any better
> naming schemes welcome!
>
> *Conclusions*
>
> The current code works but there are open questions. What I am hoping
> for is feedback on the ideas and in particular what issues come up with
> real use cases. Ideas about all or any of the above, or indeed other
> aspects, most welcome.

I've had some feedback via other channels and will summarise here 'for
the record'. (Sources: transcript
http://chat.stackexchange.com/transcript/message/19958526#19958526 onward
and direct mail.)

*Escaping from case changing*

David Carlisle points out that using the BibTeX-like approach leaves a
problem with ligatures. Whilst input such as {Text} rather than {T}ext
does help, the alternative route taken by textcase

  \NoChangeCase{Text}

allows for the 'escape' mechanism to be entirely transparent at the
typesetting stage (as the appropriate commands can be equivalent to
\use:n). Barbara Beeton provides a useful example where a brace group is
'trapped' inside a word with the BibTeX-like scheme: for example

  MacArthur => MacARTHUR

requires the input M{ac}Arthur with the current setup, and there is no
way to write this that avoids a ligature break.
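
For illustration, here is a minimal sketch of the textcase route for the
MacArthur example. It assumes the textcase package and its documented
\MakeTextUppercase and \NoChangeCase commands; it is not the expl3 code
under discussion. The escape is a command rather than a bare brace group.

```latex
\documentclass{article}
\usepackage{textcase}
\begin{document}
% \NoChangeCase marks "ac" as exempt from case changing;
% the rest of the word is uppercased as usual.
\MakeTextUppercase{M\NoChangeCase{ac}Arthur}
\end{document}
```

This should typeset as MacARTHUR, with no brace group left in the middle
of the word in the author's input.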

I am therefore minded to alter the approach in this area to follow
textcase: such a change will, if done, include adding a sensible set of
standard commands to the 'ignore list' (\label, \ref, ...). Adopting a
textcase-like approach also suggests that automatically handling math
mode might be desirable: a first pass for that might well be based on
matching single-token delimiters ($...$/\(...\) as standard settings)
with logic that more complex arrangements will be best covered by the
\NoChangeCase concept.

*Expandability*

One approach suggested (again by David C.) to this area is to start with
an assumption of e-TeX (\robustify from the etoolbox package for example
can be used to make existing commands e-TeX protected). With that
assumption, it is relatively straight-forward to expand 'variable-like'
macros and leave 'command-like' ones alone. (I already have code that
does much the same in siunitx.)

Retaining an expandable approach does seem sensible as it allows what
many other languages do: case changing in a 'functional' sense (or
rather as a macro language in an x-type expansion sense). As already
noted, the need for contextual case mappings means that using the TeX
primitives directly still requires a separate mapping phase and so
performance issues are not so significant.

*LICR/Non-native input*

As the code here is being developed primarily for use to support future
work, and that will increasingly mean Unicode-native engines, comments
here suggest sticking to the 'ASCII/Unicode' line taken to date. As
such, pdfTeX use with non-ASCII input will need pre-processing via
\protected@edef as suggested to produce LICR data which can be handled
correctly.

Depending on other feedback, I will likely implement the above changes
over the coming days and then look to update the release code.
--
Joseph Wright