From: "Frank Mittelbach"
Reply-To: "Mailing list for the LaTeX3 project"
Subject: Re: latex/3480: Support for UTF-8 missing in inputenc.sty
Date: Sat, 1 Feb 2003 00:59:38 +0100
Message-ID: <15931.3562.730605.294877@istrati.mittelbach-online.de>
References: <15903.14792.193451.96963@istrati.mittelbach-online.de>

Roozbeh Pournader writes:

 > > what we try is to provide a utf8 input encoding, how likely is it that some
 > > editor or application generates that Adobe thing? not very i would guess (at
 > > least not now) therefore i would not assign anything.
 >
 > Something that may happen:
 >
 > 1. A TeX document typeset with a PS Type 1 font will have the dotlessj
 > somewhere. After being converted to PDF, you will have the glyph in a PDF
 > document. Adobe tools see a 'dotlessj' there.
 >
 > 2. Someone copies and pastes it from Acrobat Reader into a document using
 > an editor that supports Adobe private use characters. He sees a dotlessj
 > there.

which "some" editor is that? i'm not saying it is not possible, i'm just
saying that as long as something is a) not very likely b) potentially
controversial we should in the first step not make a fixed assignment ...
 >
 > 3. The output is fed back into LaTeX.

not a problem, what would happen is that we get that char U+F6BE
and would say, sorry, nothing set up for this. Then all it needs is

\DeclareUnicodeChar{F6BE}{\j}  % already forgotten what today's syntax is :-)

in the preamble of the document and off we go. 'course if that becomes the
standard situation we might as well put it in, right now i would leave it open
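
for reference, the utf8 support as later released with inputenc spells this
declaration \DeclareUnicodeCharacter. a minimal sketch, assuming pdfLaTeX with
that released utf8 option; the choice of T1 is only an assumption, any font
encoding that provides \j would do:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}     % assumption: any encoding that defines \j will do
\usepackage[utf8]{inputenc}  % provides \DeclareUnicodeCharacter under pdfLaTeX
% map the Adobe private-use code point for dotlessj to its LICR name
\DeclareUnicodeCharacter{F6BE}{\j}
\begin{document}
% a literal U+F6BE pasted into the text would now be read as \j
dotless j by its LICR name: \j
\end{document}
```

without the declaration, the same pasted character would stop the run with an
error saying that the Unicode character is not set up for use with LaTeX.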

 > Unicode doesn't distinguish that much between text and math characters. It
 > says somewhere that you may use a math character as a bullet or something.
 > I guess the best way to implement this is if you saw the character in text
 > mode it is \textasteriskcentered and if you saw it in math mode it is '*'.

that's not the way it works in TeX, is it? at the time the input encoding is
translated to LICR we are before the decision for "text" or "math". the
naming conventions for the LICR objects are a bit dubious here as they often
say "\text..." but that is the major goal for them, ie to make the LICR objects
work in text and with different font encodings.

note that any LICR object, say, \"a is first of all only an abstract name for
the character umlaut-a. it is not the instruction to put an accent on an a, nor
is \textsterling the pound glyph but the abstract name for the character pounds.
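
to make the "abstract name" point concrete: an input-encoding declaration only
ties a code point to an LICR object, and what gets printed is decided later by
the font encoding in force. a small sketch, assuming the released
\DeclareUnicodeCharacter interface under pdfLaTeX; the two declarations repeat
what the standard setup is expected to provide already and are spelled out only
for illustration:

```latex
\documentclass{article}
\usepackage[T1,OT1]{fontenc}  % load both encodings; the last one (OT1) is the default
\usepackage[utf8]{inputenc}
% input side: code point -> abstract LICR name, no glyph chosen yet
\DeclareUnicodeCharacter{00E4}{\"a}
\DeclareUnicodeCharacter{00A3}{\textsterling}
\begin{document}
% OT1 has no umlaut-a slot, so \"a is built from an accent and a base letter
{\fontencoding{OT1}\selectfont \"a}
% T1 has a dedicated umlaut-a slot, so the same LICR name selects one glyph
{\fontencoding{T1}\selectfont \"a}
\end{document}
```

the source text is identical in both groups; only the font encoding decides
how the name is realised.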

technically, all the (text)-font-encoding commands (and the majority of LICR
objects are font-encoding commands) only work in TeX text and not in TeX math
today, which is why naming them \text... was useful at one stage.

the inpmath proposal adds a new dimension to that by basically allowing one to
define a mapping from LICR to math chars/commands/constructs.

if i were to start afresh then the LICR objects should probably get names which
are a bit more genderless, eg \LICR... but then this isn't the way it developed
so we are more or less stuck with the current set of names.

it might well be that U+2217 should be translated to \textasteriskcentered
when inpmath (or rather its successor implementation) is incorporated, but as
long as this isn't the case i would not map something that is only likely to
come up in the middle of a math formula to something that \LaTeX is going to
choke on if surrounded by $...$
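
until such a mapping exists, a document that really does receive U+2217 in both
contexts could make the choice itself in its preamble. a sketch only, assuming a
LaTeX kernel recent enough to provide \TextOrMath and the released
\DeclareUnicodeCharacter interface under pdfLaTeX; this is a per-document
workaround, not the default mapping discussed in this thread:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
% per-document choice: text symbol in text, ordinary math asterisk in math
\DeclareUnicodeCharacter{2217}{\TextOrMath{\textasteriskcentered}{*}}
\begin{document}
% the same choice written out by hand, without the Unicode character itself
\TextOrMath{\textasteriskcentered}{*} and
$a \TextOrMath{\textasteriskcentered}{*} b$
\end{document}
```

the decision is taken when the command is executed, not when the input encoding
is translated, which is why it has to live in the definition the code point
maps to.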

 > Anyway, what is the usage of \textasteriskcentered? I may be able to
 > follow it up with Unicode guys and see if we need a character for that.

the only common usage in LaTeX (i think) is as a bullet for some itemize level
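
in the standard classes that level is the third one: article.cls and its
siblings set \labelitemiii to \textasteriskcentered. a minimal sketch showing
the default and how a document could select the same symbol for another level
(moving it to the first level is purely for illustration):

```latex
\documentclass{article}
\begin{document}
\begin{itemize}
  \item first level, \textbullet{} by default
  \begin{itemize}
    \item second level
    \begin{itemize}
      \item third level, labelled with \textasteriskcentered{} by default
    \end{itemize}
  \end{itemize}
\end{itemize}

% illustration only: use the centered asterisk at the first level too
\renewcommand{\labelitemi}{\textasteriskcentered}
\begin{itemize}
  \item now labelled with a centered asterisk
\end{itemize}
\end{document}
```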

good night
frank
