Date: Wed, 12 Oct 2011 11:39:58 +0200
From: Heiko Oberdiek
Subject: Re: Strings, and regular expressions
To: LATEX-L@listserv.uni-heidelberg.de
Reply-To: Mailing list for the LaTeX3 project
Message-ID: <20111012093958.GA9734@oberdiek.my-fqdn.de>
References: <4E945D77.6090309@morningstar2.co.uk> <20111011153219.GA3677@oberdiek.my-fqdn.de>

On Tue, Oct 11, 2011 at 11:07:13PM -0400, Bruno Le Floch wrote:

> > hyperref already reencodes bookmark strings with setting
> > pdfencoding=auto. The bookmark string is constructed in
> > Unicode encoding. Then the reencoding to PDFDocEncoding is tried.
> > If successful the result string is used, otherwise the Unicode string.
> > For the reencoding stuff package stringenc is used and doesn't need
> > to be expandable for hyperref.
>
> Thank you Heiko. The stringenc package provides _many_ different
> encodings. Can you point me to which are useful for pdf purposes?

Most important for PDF strings:
* PDFDocEncoding
* UTF-16
(hyperref also uses "ascii-print" with XeTeX because of
encoding problems with \special.)

> I guess that most "iso-..." and "cp..." encodings are an overkill for
> a kernel.

They should be loadable as files, similar to LaTeX's .def files
for inputenc or fontenc. Then the kernel can provide a base set
and others can be provided by other projects. But I don't see
a disadvantage if such a base set is not minimal.

Then, when strings are written to PS/PDF, they need further escaping:
* String escaping, provided by \pdfescapestring.
* Name escaping, provided by \pdfescapename.
* Hex strings, provided by \pdfescapehex.
The latter is also useful in other contexts, e.g. for
protecting arbitrary string data in auxiliary files. In a hex string,
special characters like '{', '}', '\', '#', ... do no harm.
These pdfTeX features are provided for LuaTeX by package `pdftexcmds',
and package `pdfescape' provides them for other engines.

> Also, when you say "Unicode encoding", I presume that this means
> native strings for XeTeX and LuaTeX, but what about pdfTeX? Do you use
> "UTF-16" (if so, LE or BE?), or some other UTF?

In the context of bookmarks and other PDF strings, "Unicode" means
UTF-16 (hyperref uses BE, but there is a byte order mark anyway).
And the strings are a sequence of bytes. The big chars of XeTeX or
LuaTeX don't help, because they get written as UTF-8.

Yours sincerely
  Heiko Oberdiek
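
P.S. To make the byte-level picture concrete, here is a rough sketch in Python (not TeX, and not how hyperref or stringenc are implemented). It tries a one-byte encoding first and falls back to UTF-16BE with a byte order mark, then protects the resulting bytes as a hex string. All function names are made up for this illustration. Python has no PDFDocEncoding codec, so printable ASCII is used as a conservative stand-in; that is an assumption of the sketch, and the real PDFDocEncoding covers more characters. Uppercase hex digits are likewise just a choice made here.

```python
import codecs


def encode_pdf_text(text: str) -> bytes:
    """Return the bytes of a PDF text string for `text` (illustrative only).

    Assumption: printable ASCII stands in for PDFDocEncoding, because
    Python ships no PDFDocEncoding codec.
    """
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        # The one-byte encoding succeeded, so use it.
        return text.encode("ascii")
    # Otherwise fall back to UTF-16, big-endian, with a byte order mark.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def hex_escape(data: bytes) -> str:
    """Write arbitrary bytes as hex digits, two per byte."""
    return data.hex().upper()


def hex_unescape(digits: str) -> bytes:
    """Inverse of hex_escape."""
    return bytes.fromhex(digits)


if __name__ == "__main__":
    for title in ("Introduction", "Größe"):
        raw = encode_pdf_text(title)
        print(title, "->", "<" + hex_escape(raw) + ">")
    # Characters that are special to TeX become plain hex digits.
    print(hex_escape(b"{}\\#"))
```

The second title cannot be expressed in the stand-in encoding, so its bytes start with FE FF followed by two bytes per character. Encoding the same text as UTF-8 would give a different byte sequence, which is why the native Unicode characters of XeTeX or LuaTeX do not produce a valid UTF-16 PDF string on their own.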