Subject: Should chars over 128 be active?
Date: Fri, 1 Nov 1991 10:17:00 +0100
From: "Don Hosek"
Sender: "LaTeX-L Mailing list"
To: "Rainer M. Schoepf"

(because, after all, I don't know when to keep my mouth shut...)

-   Sorry, I'm not a mathematician so I misuse terms all the time. I
-   didn't mean that the mapping was mathematically 1-to-1, but that
-   a single byte must map to a single byte. Check it out.
-
-Now, there I agree. So, if one wants to have \^^80 to mean a c with
-cedilla, he must define a macro. This is only possible (in general)
-if the category code of ^^80 is letter. (See below.)

Bzzt. Guess again. Single-character control sequences can have
any category code one likes. Named commands (to use Norbert
Schwarz's terminology) must (in general) be composed of letters,
although as long as the sequence of letters up to the first
non-letter is unique, this is not a hard-and-fast rule.

-No, I didn't say that. I can see several possibilities for this, like
-having the same code page on both systems, having the TeX job on the
-VAX go through a preprocessing step, or even making some characters
-active that are in different positions in both code pages.
-Making all characters active is, in my opinion, a bad solution --
-for a default.

Same code page on both systems is assuming a lot. Code pages are
rather variable (and can, in fact, vary within documents). Having
a settable code page, while possibly a good idea, is also not
standard issue in TeX implementations. Filters are messy garbage
and best avoided whenever possible; should you have any doubts,
take a look at the documentation for Jacques Goldberg's original
SemiTeX package. Making all characters active is the only
solution that works with all existing TeX 3 implementations
(except those which are seriously broken -- I won't name names
despite my high P.O. factor on this issue) and handles all cases.

-The real problem is this: you *need* to tag the files with the
-character set used, and the program to work on them must use this
-information. Unless this problem is addressed and solved, we cannot
-get any further.

I wouldn't say that it's _necessary_ to be able to have a named
cs with accented characters. Desirable, yes (I find
undiacriticalized Spanish rather unpleasant to read, not to
mention certain amusing puns that can occur through the loss of a
tilde), but necessary? I think the problem is that we're
expecting too big an advance out of what TeX 3 provides for us.
With TeX 2.x, we got by with csnames that weren't quite right.
Unfortunately, with TeX 3.x, DEK did not address the issue
properly, so we must continue to get by with csnames that aren't
quite right (fortunately, the bulk of my coding is in English or
Latin (depending on my mood), so accents aren't generally a
concern for me... this does not mean that I'm ignorant of the
problem).

-   By the way, you didn't address the issue of why making chars
-   128-255 active is so horrible.

-Because characters of catcode 11 are rather special: only they can be
-used in control sequences. And besides, it's a hell of an overhead.

See above regarding cses. As regards the overhead, it's not too
bad.
I haven't run into files where LaTeX plus my activation of high
chars for the PC850 code page didn't fit in a standard TeX
(except files that didn't fit for other, more interesting,
reasons).

-dh
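P.S. For the curious, the activation amounts to something like the
following. This is a sketch only, not the actual PC850 file: the
\highchar register name and the ISO Latin-1 slot "E7 used in the
example definition are illustrative; the real thing gives each of the
128 slots its appropriate accent macro or character.

```latex
% Make every byte in 128-255 an active character.  Works in plain TeX
% or LaTeX; \active is \chardef'd to 13 in plain.tex.
\newcount\highchar
\highchar=128
\loop
  \catcode\highchar=\active
  \advance\highchar by 1
\ifnum\highchar<256 \repeat
% Each active byte is then just a one-character macro; e.g. map ^^e7
% (the Latin-1 slot for c-cedilla, chosen purely as an example) to \c:
\def^^e7{\c c}
```

Note that this is also why the catcode-11 objection doesn't bite: the
high byte never needs to be a letter, since a single character of any
category code can stand alone as a control sequence, and once active
it is simply expanded like any other macro.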
