MIME-Version: 1.0
References: <AANLkTinjH5Ppg2QWWBTDBBU9u3Vvdsjnrrr1OvPvWfnA@mail.gmail.com>
            <DA7C7068-6B67-41E9-87F2-21C9E58249A8@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Message-ID:  <AANLkTimhF5fTJJoT3eKHrd=DZE4kOmg60R62pvqxDwDX@mail.gmail.com>
Date:         Tue, 15 Feb 2011 03:22:06 -0500
Reply-To: Mailing list for the LaTeX3 project
              <LATEX-L@listserv.uni-heidelberg.de>
Sender: Mailing list for the LaTeX3 project <LATEX-L@listserv.uni-heidelberg.de>
From: Bruno Le Floch <blflatex@GMAIL.COM>
Subject: Re: Expandable versions of \uppercase, \MakeUppercase, \lowercase, \MakeLowercase
To: LATEX-L@listserv.uni-heidelberg.de
In-Reply-To:  <DA7C7068-6B67-41E9-87F2-21C9E58249A8@gmail.com>
Precedence: list
Status: R

> My personal opinion on uppercasing/lowercasing is that it should be a
> property of the font;

Both Will and Frank agree on this, but with many current fonts that is
not possible. Also, it is in fact possible to write an algorithm that
expandably produces the result of {replacing some tokens by a
corresponding macro} in a given tl. Yes, a macro: it can even take
arguments.

For instance, with my current code (using a specific "case table"),

    \def\foo#1{arg=#1.}
    \expandsome{A\foo BC{\expandthis\foo{\B\expandthis\foo{A}} \D\E} !}

will expand in two steps to

    A\foo BC{arg={\B arg=A.}. \D\E} !

Also, we now have \expandafter:nw which expands the token after its
argument before carrying on with the argument. It works by
"\expandafter-casing" the first argument, namely, replacing every
token by "\expandafter<token>" (including braces and spaces).
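
To make the "\expandafter-casing" idea concrete, here is a hand-written
plain TeX sketch of such a chain. It is not the code of \expandafter:nw
itself, and \pair and \tail are names made up for the illustration.
Every token of the leading material, braces included, is preceded by
\expandafter, so \tail is expanded once before \pair reads its
arguments:

    \def\pair#1#2{(#1)(#2)}
    \def\tail{{b}}
    % one \expandafter in front of each of:  \pair  {  a  }
    \expandafter\pair\expandafter{\expandafter a\expandafter}\tail
    % once the chain has fired, TeX sees \pair{a}{b}

Writing the chain by hand gets tedious quickly, which is presumably
why it is worth generating it expandably from the argument.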


> In terms of the algorithms to perform these operations, I prefer the way
> Joseph's code executes (e.g., keeping the number of csnames low) but I
> prefer the extensibility of Bruno's (although I suspect Bruno's is faster --
> but a better question to ask is whether Joseph's is too slow).

After some work, I realized that there are two points:
(1) whether to use many macros, or look at a bunch of cases for each character.
(2) whether to be careful with braces and spaces or not.

The second point allows us to do what I mentioned above. The first point
is not necessary for this extensibility, and it only affects speed.
We are talking about defining (26 + #accents) macros for uppercase, and
the same number for lowercase (although I guess that with UTF8, this
can become much bigger).
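
As a rough illustration of the "many macros" option in point (1), here
is a minimal plain TeX sketch. It is not the code from ULcase.sty, and
all names (\ucmap, \upper, \upperloop, \upperone, \upperstop, \gobble,
the "uc:" prefix) are hypothetical. It deliberately ignores point (2):
arguments are grabbed undelimited, so spaces are dropped and braced
groups are not handled, and it assumes the input consists of plain
character tokens only.

    % one macro per character: \csname uc:a\endcsname -> A, etc.
    \def\ucmap#1#2{\expandafter\def\csname uc:#1\endcsname{#2}}
    \ucmap aA
    \ucmap bB
    \ucmap cC
    \def\upperstop{\upperstop}% end marker, only ever compared with \ifx
    \def\gobble#1{}
    \def\upper#1{\upperloop#1\upperstop}
    \def\upperloop#1{%
      \ifx\upperstop#1%
        \expandafter\gobble
      \else
        \expandafter\upperone
      \fi {#1}}
    \def\upperone#1{%
      \expandafter\ifx\csname uc:#1\endcsname\relax
        #1% no entry in the table: keep the token unchanged
      \else
        \csname uc:#1\endcsname
      \fi
      \upperloop}
    \edef\result{\upper{abcd}}
    \message{\result}% only a, b, c have entries: \result is ABCd

The lookup is a single \csname per token, at the price of one csname
per table entry. Note that testing an undefined name this way leaves it
equal to \relax, so the number of csnames also grows with the unmapped
characters that are seen.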

For a typical input (sentences, braced stuff) of 5000 tokens, with
\tracingall, running wc on the logs (lines, words, bytes) gives:

 2128102  7187359 67511843 Joseph-ULcase.log
  230901  1161005  8589159 ULcase.log

where ULcase.log is my current version with brace and space checking, and
Joseph-ULcase.log has no brace checking. That is roughly 9 times fewer
lines and 8 times fewer bytes of trace for my version. It could be
optimized significantly (2-3x) by using the fact that the replacement we
want for each token takes no argument, but as I said, I want to stay
general, because that makes it much more powerful.


> something like "\prg_case_str:nVn {#1} \g_uc_replacements_tl { <else> }".

I think that would work. In fact, Joseph's approach combined with
some ideas I have had would allow us to have a

\tl_expand_some:nn {abca} { {a} {A} {b} {\use_ii_i:nn} }   => AAc

And in fact, we _should_ be able to replace #text at definition time
as well, allowing 9 _named_ arguments (I'm not taking this very
seriously ;-) ). Namely, replace #first by #1 and #second by #2 in the
following

\keyworddef\foo#first#second{arg1 is #first, arg2 is #second}


I don't know where I should put the code, so it is at
  http://users.aims.ac.za/~bruno/LaTeX/ULcase/ULcase.sty
Note that it is really just a plain TeX file with no \bye, compilable
with pdftex, pdflatex, etc.

-- 
Regards,
Bruno