From: Heiko Oberdiek
Date: Thu, 13 Oct 2011 12:54:04 +0200
Subject: Re: Strings, and regular expressions
To: LATEX-L@listserv.uni-heidelberg.de
Message-ID: <20111013105404.GA32369@oberdiek.my-fqdn.de>
References: <4E945D77.6090309@morningstar2.co.uk> <20111011153219.GA3677@oberdiek.my-fqdn.de> <20111012093958.GA9734@oberdiek.my-fqdn.de> <20111012180101.GA17461@oberdiek.my-fqdn.de>

On Thu, Oct 13, 2011 at 05:56:14AM -0400, Bruno Le Floch wrote:

> > I wouldn't do it manually. There are mappings files for Unicode:
> >   http://unicode.org/Public/MAPPINGS/
> > In project I am using these mappings together with a perl script
> > to generate the .def files.
>
> Thank you for the link. It seems that the simplest would be to
> directly use the tables provided there as the .def files. Simply
> \catcode`\#=14, and set a few other default catcodes, then input the
> file, looping over the lines.
> Are all of the lines of the form
>
>     0xHH 0xHHHH # comment

Not all, dec-mcs.txt is different:
  sprintf('=%02X U+%04X %s\n', , , )
no comments

> (or comment lines), with H = some hexadecimal digit? In other words,
> are all those encodings 8-bit only, and with only Unicode points
> <65536?

In the directory MAPPINGS there are encodings with more than 8 bits.
And a quick look doesn't reveal Unicode points above U+FFFF.

> > It could be made even expandable in linear time
> > with a large lookup table (256).
>
> Right. I was thinking in terms of UTF-8 for some reason, and the
> lookup table would be too big.

In practice the table would be larger than 256 (16x16) to support
lowercase and uppercase digits ([0-9a-fA-F]). The size would
be 484 = (10 + 2 x 6) x (10 + 2 x 6).

Yours sincerely
  Heiko Oberdiek
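
The table-size arithmetic above can be checked with a short sketch. It assumes the lookup table is keyed on pairs of hexadecimal digits, as the 16x16 figure suggests. Python stands in here for the TeX macros under discussion, and the names PAIR_TO_BYTE and decode_hex are made up for illustration.

```python
import string

# string.hexdigits is "0123456789abcdefABCDEF": 10 + 2 x 6 = 22 characters.
HEX_DIGITS = string.hexdigits

# One entry per ordered pair of digit characters, so "e9", "E9", "eE", ...
# each get their own key. Hypothetical layout, not the list's actual code.
PAIR_TO_BYTE = {
    hi + lo: int(hi + lo, 16)
    for hi in HEX_DIGITS
    for lo in HEX_DIGITS
}


def decode_hex(text):
    """Decode a string of hex digit pairs with one table lookup per pair."""
    if len(text) % 2:
        raise ValueError("odd number of hex digits")
    return bytes(PAIR_TO_BYTE[text[i:i + 2]] for i in range(0, len(text), 2))


if __name__ == "__main__":
    print(len(PAIR_TO_BYTE))                # number of table entries
    print(len(set(PAIR_TO_BYTE.values())))  # number of distinct byte values
    print(decode_hex("4a6F"))
```

The table has 22 x 22 = 484 keys but only 256 distinct values, because each byte with a letter digit is reachable through several spellings. That redundancy is the price of doing one lookup per pair without normalizing case first.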