From: "Javier Bezos"
Subject: Re: latex/3844: UTF-8 sanitation in inputenc
Date: Fri, 24 Feb 2006 09:46:26 +0100
Message-ID: <004201c6391f$63be8ec0$ed860e3e@PROPIETAFMI1K9>
References: <200602222031.k1MKV39f029701@comedy.dante.de>

Hi all,

> 2) Set the category code of all characters which can
> possibly serve as a consecutive byte in a multibyte
> UTF-8 sequence to 12.

Not a bad idea, but inputenc relies on the fact that all chars
above 127 are active. What happens if the encoding is changed
in the middle of the document?

> The problem in this case is that you can't simply prefix a
> UTF-8 character with \string because the second or third
> byte might still `fire'. You have to parse it bytewise and
> insert \string (or \protect) in front of every single byte.

Maybe defining the chars used as second and third bytes as
\string will be enough, but I'm not sure whether this is a good
idea or whether it will break something (or even whether it's
currently done, to be honest...).

Javier
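
For reference, the quoted proposal 2) could be sketched roughly as below. This is only an illustration, not code from inputenc: it assumes an 8-bit engine such as pdfTeX, takes the UTF-8 continuation bytes to be 128 to 191 (hex 80 to BF), and the loop and the use of the scratch register \count@ are choices made here for the example. Whether inputenc still behaves correctly afterwards is exactly the open question raised above.

```latex
\documentclass{article}
\makeatletter
% Assumption: 8-bit engine, so each input byte is one character token.
% Bytes 128..191 are the only ones that can appear as the second or
% later byte of a multibyte UTF-8 sequence.
\count@=128
\loop
  \catcode\count@=12    % category 12 ("other"): the byte can no longer `fire'
  \advance\count@ by 1
\ifnum\count@<192
\repeat
\makeatother
\begin{document}
Plain ASCII text only.
\end{document}
```

Note that the change is not undone anywhere, so switching the input encoding later in the document would find these bytes no longer active.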