From: "Don Hosek"
To: "Rainer M. Schoepf"
Sender: "LaTeX-L Mailing list"
Reply-To: "LaTeX-L Mailing list"
Subject: More on 8-bit active... gosh, I'm glad I learned how to change subjects!
Date: Fri, 1 Nov 1991 10:41:00 +0100

-I'm sorry, Don --- you have selectively quoted from the emTeX documentation,
-where a full citation would make it plain that what you are suggesting
-is fundamentally wrong:

Perhaps I'm misreading the extended quotation that you give me,
but it still seems that emTeX is effectively doing a one-byte to
many-byte mapping of the characters (or else initializing
catcodes and macro meanings in a non-standard manner).

->>> This sort of conversion is not possible by changing the
->>> xchar/xord array which assumes a 1-1 mapping of external code
->>> byte to internal code byte. Ergo, use of emTeX's code page
->>> facilities represents a non-TeX extension.

-Thus one \stress {can} use the codepage facilities of emTeX to implement
-a one-one mapping, provided only that the same characters exist in the
-codepage and the internal (?Cork?) representation.

I've never denied it.

->>> How do I define Best? Well, here are the criteria:
->>> - No modifications to the TeX program = should be necessary

-I agree here.

So how do you handle the need to exchange files created
under different code pages (assuming no corruption in exchange)?

->>> (well
->>>   actually, there are two key changes that should be made: the
->>>   basic changes so the program compiles and the modification of
->>>   the xchar initialization on ASCII systems so that 0-31 and
->>>   127-255 are mapped to the identical internal chars. I avoid the
->>>   issue of EBCDIC because it seems too difficult to work with. I
->>>   avoid anything outside of ASCII/EBCDIC since I really don't
->>>   think it gets used at all, at least not in the TeX world. Note
->>>   that I consider anything with characters 32-127 matching the
->>>   table in the TeXbook to be ASCII);

-I also agree with ignoring EBCDIC, but would differ with your
-definition of ASCII ...

I've noticed, but look how loosely I used 1-1! I studied
literature and classics; I'm not much at technical precision
(after all, no one has ever come up with an adequate definition
of "literature" or "classics").

->>> - The solution should be such that if all files for a document
->>>   are shipped to another ASCII system (assuming no corruption in
->>>   transmission, a not unreasonable condition) it will produce
->>>   identical output;

-Provided that the target TeX installation contains provision for
-asserting `the following file is in code <xyz>'.  The originator
-must clearly specify what code he or she has sent.

Yes, but there are precious few existing implementations which
support this (in fact, I can only think of two offhand, and both
of them support it in a manner which *invites* abuse).
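
For concreteness, such an assertion could be spelled as a declaration
executed before reading the foreign file. This is only a sketch:
\inputcodepage and the cp@setup@... hook convention are illustrative
names, not an interface any existing implementation provides, and the
single registered mapping assumes the Cork slot x'E9 for e-acute:

```tex
% Hypothetical sketch: \inputcodepage and cp@setup@... are illustrative
% names, not an existing interface in any TeX implementation.
\def\inputcodepage#1{%
  \expandafter\ifx\csname cp@setup@#1\endcsname\relax
    \errmessage{Unknown code page: #1}%
  \else
    \csname cp@setup@#1\endcsname
  \fi}
% Register one demo code page mapping input x'AA to Cork e-acute (x'E9).
% The \lccode/\lowercase trick defines an active character by its number.
\expandafter\def\csname cp@setup@demo\endcsname{%
  \catcode"AA=\active
  \begingroup \lccode`\~="AA
  \lowercase{\endgroup \def~}{\char"E9 }}
% A document (or a wrapper file) would then assert:  \inputcodepage{demo}
```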

->>> - Multiple code pages must be supportable in a single document.

-I think this last one causes severe problems; I may return to it later.

But don't ignore it, because it's very important. There are some
crucial combinations which are not addressed by any of the
character sets that I've seen (e.g., Slovene/Italian,
Slovene/German, Czech/German...).

->>> Now, having the high characters active is not without its
->>> drawbacks, but I don't think that the Unspecified Horrors
->>> mentioned above are quite as bad as "all the drawbacks that that
->>> implies" implies. The only drawback that I can think of is the
->>> exclusion of active characters from multicharacter csnames (which
->>> I will admit is serious, but imho, well compensated by the
->>> benefits of portability and flexibility given by the scheme I've
->>> outlined above). Beyond that, are there drawbacks that I'm
->>> overlooking? I haven't noticed any in the ten months' experience
->>> I've had printing six-language documents for one of my clients.
->>> Perhaps we could have some specific arguments against my
->>> approach (since I'm using it in a production environment, I
->>> consider it fairly important to know whether I may be
->>> encountering difficulties in the future).

-Well, either I'm going mad (or am already mad, as many may think), or
-else rendering <e-acute> as active precludes the possibility of its
-participating in hyphenation patterns and exceptions; if I am not
-mistaken, this totally precludes the possibility of establishing
-correct hyphenation patterns for the target language.

You're going/gone mad. Exempli gratia:

Let's assume that we have one accented character, x'AA, which
represents e-acute. Our hyphenation pattern is really short and
consists simply of what would be x2^^aa were we to access that
character directly. However, we're assuming the Cork character
set, where e-acute is x'E9 in our font. We could do the following:

\catcode"E9=11 \lccode"E9="E9 % \patterns needs a nonzero \lccode
\patterns{x2^^e9}

\catcode"AA=\active \def^^aa{\char"E9}

Problem goes away.

A more general solution would be like the above except the last
line is replaced with:

\catcode"AA=\active \def^^aa{\'{e}}
\def\'#1{\expandafter\ifx\csname '#1\endcsname\relax
  \accent1#1\else \csname '#1\endcsname\fi}
\expandafter\def\csname 'e\endcsname{\char"E9}

which allows one to even hyphenate a word containing \'e.
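
For a letter with no registered precomposed slot, the \' macro above
falls back to the \accent primitive (slot 1 is where the Cork encoding
keeps the acute accent, hence \accent1). A usage sketch; the \'o line
is just an illustration, since no slot for o-acute was registered:

```tex
\'e  % \csname 'e\endcsname is defined, so this yields \char"E9,
     % and a word containing it remains hyphenatable
\'o  % no \csname 'o\endcsname, so this falls back to \accent1 o,
     % which prints o-acute but blocks hyphenation at that point
```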

Output code page independence can be gained by accessing accented
characters through ligatures (my preference would be the accent
code + the letter, although I've just noticed that Cork does a
wholesale job of moving every single accent (or perhaps just most
of 'em) to a new location).
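
The ligature route lives in the font metrics rather than in macros. In
property-list form (as read by PLtoTF), a ligature program along these
lines would map accent code + letter to the precomposed Cork glyph; the
accent-code slot x'B4 (octal 264) here is purely illustrative:

```
(LIGTABLE
   (LABEL O 264)      % illustrative accent-code character, x'B4
   (LIG C e O 351)    % accent code + e -> e-acute at O 351 (x'E9)
   (STOP)
   )
```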

-dh
