From: William F Hammond
To: LATEX-L@listserv.uni-heidelberg.de
Subject: Re: LaTeX3 8-bit only?
Date: Wed, 10 Feb 2010 15:27:30 -0500
In-Reply-To: <4B730157.5060605@morningstar2.co.uk> (Joseph Wright's message of "Wed, 10 Feb 2010 18:56:23 +0000")

Joseph Wright writes:

> . . .
> I was thinking of input encodings, where my point was (supposed to) be
> that something like the inputenc "utf8" approach would be an approach
> I hope we can avoid as there are better solutions (in the form of
> engines which deal with the issue). (Of course, that leaves UTF-16
> issues, but I'd hope that engine developments can help out).
>
> (I'd point out that LaTeX3 code is intended for use in new documents,
> and the rest of the computer world is standardising on UTF-8 as far as
> I can see. So I'd hope very much that having an approach based on this
> concept is not too risky.)

At some point I expect that LaTeX will want to provide for arbitrary
Unicode "word" characters in command names. If that is the case, then
shouldn't standard handling of text-encodings for whole document
instances apply?
In particular, I think that text-encoding pre-processing (by something
like GNU "recode") to meet the needs of the particular TeX engine would
be the way to proceed.

Presumably the UTF-8 and UTF-16 text encodings for Unicode are both
supported everywhere in the XML world. While UTF-8 is more efficient
for Western languages, UTF-16 is likely to be favored in regions where
the Unicode blocks devoted to local character sets take 3 bytes per
character in UTF-8 but only 2 in UTF-16.

I also imagine that eventually many documents will originate under
author-level XML document types, and so what is fed to a TeX engine
would then be output from a pipeline. In that case I suppose UTF-8
would be a reasonable standard.

Let me also point out that, to the extent that XML origination is
realized, TeX engines might never need to bite the bullet on non-ASCII
command names. That is, LaTeX provision of non-ASCII command names
could be handled via XML front ends that are sponsored by the LaTeX
Project.

-- Bill
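
A minimal sketch of the encoding size comparison and the pre-processing
step discussed above, assuming Python 3. The helper names
`encoded_sizes` and `transcode` and the sample strings are illustrative
assumptions only; `transcode` merely imitates what a tool such as GNU
"recode" would do in a pipeline and is not its interface.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the 2-byte byte-order mark is not counted
        "utf-16": len(text.encode("utf-16-le")),
    }


def transcode(data, source, target):
    """Re-encode raw bytes from one text encoding to another.

    Hypothetical stand-in for a pre-processing step that converts a
    document to whatever encoding a given TeX engine expects.
    """
    return data.decode(source).encode(target)


western = "LaTeX command names"   # ASCII only: 1 byte each in UTF-8
cjk = "日本語の文書"               # BMP characters: 3 bytes each in UTF-8

print(encoded_sizes(western))
print(encoded_sizes(cjk))

# Convert a Latin-1 encoded fragment to UTF-8 before handing it on.
print(transcode("é".encode("latin-1"), "latin-1", "utf-8"))
```

For the ASCII sample UTF-8 needs half the bytes of UTF-16, while for
the Japanese sample the relation is reversed (3 bytes against 2 per
character), which is the trade-off described in the message.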