From: "Roozbeh Pournader"
To: "Multiple recipients of list LATEX-L"
Date: Sun, 11 Feb 2001 21:32:45 +0100
Subject: Re: LaTeX's internal char representation (UTF8 or Unicode?)

On Sun, 11 Feb 2001, Frank Mittelbach wrote:

> no i mean at the system level.

On Linux, the system library (that is, glibc) has many functions and
data types for Unicode support. With version 2.2, glibc has become
Unicode-oriented in many ways, and support for basic UTF-8 operations
was there long before 2.2.

> what do you mean by windows2000 autodetects
> them? my understanding of what UTF8 means as a format is that you can't
> autodetect it. As best you can detect that something is not UTF8, but how do
> you want to detect it as being in that format and not in, say, a file written
> with an 8bit inputencoding which happens to just contain an 8bit stream which
> is by chance also conforming to the UTF8 spec?

If it conforms by chance, you are really out of luck. Although such an
example is very simple to construct by hand, finding a non-UTF-8
document that is conformant by accident is almost impossible.
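Accidental conformance is rare because UTF-8 is strict about bytes above 0x7F: each one must either start a multi-byte sequence or be one of the continuation bytes (0x80-0xBF) that follow such a start. In ordinary 8-bit text an accented letter is usually surrounded by plain ASCII, which already breaks that rule. The following is a minimal Python sketch of such a check, written for illustration and not taken from this thread. It assumes the current well-formedness rules of RFC 3629 (at most four bytes, no surrogates, no overlong forms), which are stricter than the six-byte UTF-8 of 2001; the function name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 under RFC 3629 rules."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                       # plain ASCII byte
            i += 1
            continue
        # For each lead byte: how many continuation bytes follow, and the
        # allowed range of the FIRST continuation byte (narrowed to reject
        # overlong forms, UTF-16 surrogates and values above U+10FFFF).
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF   # no overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or b in (0xEE, 0xEF):
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F   # no surrogates U+D800-U+DFFF
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF   # no overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F   # nothing above U+10FFFF
        else:
            return False                   # 80-C1 and F5-FF never start a character
        if i + need >= n:
            return False                   # sequence truncated at end of data
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):       # remaining continuation bytes
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True


word = "Größe"
print(is_valid_utf8(word.encode("latin-1")))   # False: F6 can never start a sequence
print(is_valid_utf8(word.encode("utf-8")))     # True
# A hand-made collision: the Latin-1 bytes of "Ã©" (C3 A9) are also
# valid UTF-8, spelling "é".
print(is_valid_utf8("Ã©".encode("latin-1")))   # True
```

A check like this can only prove that data is not UTF-8; a positive result is strong evidence but not proof, as the last line shows.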

Also, many applications shipped with Windows 2000 write a signature at
the start of the file (U+FEFF, ZERO WIDTH NO-BREAK SPACE, also known as
the byte order mark) when saving, which makes autodetection much
easier. The Unicode Standard accepts this as an autodetection mechanism
and says that this byte sequence (EF BB BF in UTF-8) is very unlikely
to appear in anything other than a UTF-8 file. That said, I have not
had a good experience with it: I don't like my HTML files becoming
non-conformant according to the Unix checkers I have.
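Checking for the signature itself is simpler still. Here is a minimal Python sketch, assuming the input is the raw bytes of a file: `codecs.BOM_UTF8` is the standard-library constant for EF BB BF, and the built-in `utf-8-sig` codec drops one leading signature when decoding. The helper names are hypothetical; `strip_utf8_signature` shows one way to keep files acceptable to tools that, like the checkers mentioned above, do not expect the signature.

```python
import codecs

# codecs.BOM_UTF8 is b'\xef\xbb\xbf', i.e. U+FEFF encoded in UTF-8.


def has_utf8_signature(data: bytes) -> bool:
    """True if the raw file contents begin with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8, silently dropping a leading signature if present.

    The 'utf-8-sig' codec behaves like 'utf-8' except that it skips an
    initial EF BB BF when decoding.
    """
    return data.decode("utf-8-sig")


def strip_utf8_signature(data: bytes) -> bytes:
    """Return the bytes without a leading signature, e.g. before handing
    an HTML file to a checker that does not expect one."""
    if has_utf8_signature(data):
        return data[len(codecs.BOM_UTF8):]
    return data


saved = codecs.BOM_UTF8 + b"<p>hello</p>"
print(has_utf8_signature(saved))      # True
print(decode_utf8_text(saved))        # <p>hello</p>
print(strip_utf8_signature(saved))    # b'<p>hello</p>'
```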

--roozbeh
