From: Lars Hellström
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE
Subject: harmless package public beta
Date: Thu, 27 Mar 2014 19:53:29 +0100
Reply-To: Mailing list for the LaTeX3 project
Message-ID: <533473A9.3020401@residenset.net>

As long-time subscribers of this list may perhaps recall, I have at
times in the past written about the need for a proper representation of
(input) character strings, because documenting code means one needs to
treat identifiers and other character data as text even though they may
contain all sorts of troublesome characters. Back in 2000 I released the
xdoc2 package which has this as one of its functional areas, and it has
served me well, but time has passed and this is one area that has been
due for a modernisation.

What I'm now making a (hopefully brief) "public beta" release of is a
package called `harmless', which is a low-level package containing only
functions for turning user input into "harmless character strings", and
functions for making use of such harmless character strings. The package
can (for the duration of the beta period, at least) be downloaded from

http://www.mdh.se/polopoly_fs/1.57096!/Menu/general/column-content/attachment/harmless.zip

and its typeset documentation is at

http://www.mdh.se/polopoly_fs/1.57095!/Menu/general/column-content/attachment/harmless.pdf

My reason for posting about it here on LATEX-L is twofold.
For one, I think it would be a good idea for the expl3 documenting
system to migrate to this more robust foundation, as I seem to recall
there being identifiers here and there in expl3 that the present system
utterly fails to handle correctly. The second, more immediate reason is
however that I'd like a second opinion as to how well I've managed to
follow the expl3 naming conventions; if there is something I misnamed,
then it would be much better to fix it /before/ uploading a v1.0 to CTAN
than after (even if that is mostly for the sake of principles; I don't
expect a huge following anytime soon).

And just to be clear: harmless is a LaTeX2e package rather than an expl3
package, and it does not require anything expl3, but it seeks to follow
expl3 coding conventions in the two respects of source catcodes and
control sequence naming. The first is a matter of forward compatibility,
as there are plenty of places in the code where an extra space makes a
lot of difference, and getting all of them right when converting to an
ASCII-space-is-ignored \catcode setting (as would probably happen some
day) would be difficult; better then to use that setting from day one.
And if going that far with spaces and ~, one might as well do all of :
and _ too, and use them rather than @ when naming one's own
programming-level commands. (I did not, however, use expl3-style names
for TeX primitives and LaTeX2e commands, because a bulk
search-and-replace of control sequence names will, by contrast, be
pretty straightforward when that day comes.)

Features of the harmless package include:

* Not dependent on \catcode changes.
* Supports both 8-bit and Unicode character models (and the latter is
  not restricted to the BMP).
* Can convert an arbitrary sequence of tokens into a harmless character
  string.
* Supports both 8-bit and UTF-8 as input encoding (see also below on the
  use of LICR commands as escapes for Unicode text).
* Harmless character strings are robust: they can be written to a file
  and then \input again by TeX without distortion.
* Harmless character strings can be typeset by way of the LaTeX internal
  character representation.
* Harmless character strings can be converted to a number of "data"
  formats, including:
  - PDFText
  - XML character data
  - raw character string
  - x-url-encoding
  - UTF-8 and UTF-16
  - sanitized sequence of TeX character tokens (usable in a \csname)
* Specific commands in text to be turned into harmless character strings
  may be used as escapes for hard-to-type things or meta items.
  Predefined sets of escapes (not active by default, but possible to
  activate through a single command) include:
  - backslash + one of space, #, $, %, &, backslash, ^, {, and } for
    that particular character
  - LICR commands for accents and non-A--Z letters (accents make Unicode
    combining characters)
  - \texorpdfstring
  - accent+base combinations for combining characters
  Users can define additional escapes and meta items, using a convenient
  interface.
* As an advanced feature, the body of an environment may be turned into
  a harmless character sequence. But that doesn't have a document-level
  interface.

That's not quite all the features, but it covers the bulk of it. So...
comments, anyone?

Lars Hellström

PS: In case someone wonders "Why XML?", I might add that I sort-of
promised that last year, in a proper research paper no less
(http://ceur-ws.org/Vol-1010/paper-22.pdf). That application of the
harmless package now has a working prototype
(http://openmath.org/pipermail/om/2014-March/001835.html), even if it is
far from feature-complete.
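
For readers who have not met the source-catcode convention mentioned
above (spaces ignored, ~ as a space, : and _ as letters), here is a
minimal sketch of how a LaTeX2e file might adopt it without loading
expl3. It is not taken from the harmless sources: the names
\DemoRestoreCatcodes, \demo_greet:n and \DemoGreet are hypothetical, and
the four assignments are only the subset of the expl3 regime that the
message talks about (expl3 itself changes more, for instance
\endlinechar).

```latex
\documentclass{article}

% Hypothetical helper: remember the current catcodes of space, ~, : and _
% so that they can be restored once the programming-level code is done.
\edef\DemoRestoreCatcodes{%
  \catcode32=\the\catcode32\relax
  \catcode126=\the\catcode126\relax
  \catcode58=\the\catcode58\relax
  \catcode95=\the\catcode95\relax
}

\catcode32=9\relax    % space: ignored in the source
\catcode126=10\relax  % ~: behaves as an ordinary space
\catcode58=11\relax   % colon: letter, allowed in control sequence names
\catcode95=11\relax   % underscore: letter, allowed in names

% Source spaces are now ignored, so the layout below is free-form;
% the ~ is the only thing that produces a space in the replacement text.
\def \demo_greet:n #1 { Hello,~#1! }

% Document-level wrapper (assumption of this sketch), needed because
% \demo_greet:n cannot be typed once the catcodes are restored.
\def \DemoGreet { \demo_greet:n }

\DemoRestoreCatcodes

\begin{document}
\DemoGreet{world}
\end{document}
```

Under these assumptions the document should typeset "Hello, world!". The
point of the sketch is only that stray spaces in the definitions make no
difference, which is the forward-compatibility argument made above.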