Re: LaTeX 2.09 <Oct 91> beta-test

Subject: Re: LaTeX 2.09 beta-test
Date: Thu, 31 Oct 1991 14:59:18 +0100
Sender: "LaTeX-L Mailing list"
To: "Rainer M. Schoepf"
Reply-To: "LaTeX-L Mailing list"

I'm sorry, Don --- you have selectively quoted from the emTeX documentation,
where a full citation would make it plain that what you are suggesting
is fundamentally wrong:

>>> from \emtex\doc\english\texware.doc:

>>> |The /c option
>>> |-------------

>>> |Currently, only one tcp file is available: 850_tex.tcp. This file converts
>>> |some characters of code page 850 into TeX commands:
>>>                                         %%%%%%%%%%%%

What the documentation \stress {actually} says is: [36 lines follow]

+To create a TCP file you must make a text file in which both the translation
+of special characters into TeX control sequences and the character conversion
+for input and output is given. You can get an example of the format of the
+file by converting the TCP file supplied (850_tex.tcp) into the equivalent
+text file with the command
+
+    maketcp -d 850_tex.tcp example.txt
+
+The text file can contain comments which are lines with a `%' in column 1:
+you can also make your file more readable by inserting blank lines, which
+are ignored. All other lines are either special character conversions or
+input to internal character conversions - the output character conversion
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+table is constructed from the input table. Characters to be converted can be
+entered either as is (a single character) or in hexadecimal, in the TeX 3.0
+format (^^ff). The translation of a special character is entered as follows:
+
+    ^^84 -> \"a     % Umlaut-a
+
+The line begins with the special character (Umlaut-a), followed by a space,
+an arrow (hyphen and greater than character), space and then the TeX control
+sequence which is to replace the character. The `%' and the text following
+it up to the end of the line will be ignored unless it is part of the TeX
+command - in the following, ONE space will not be ignored:
+
+    ^^fe -> \%\     % Tex control sequence: "\%\ "
+
+The conversion of an input character into an internal code (and an internal
+code into an output character) is entered as follows:
+
+    ^^84 ^^e4               % Umlaut-a (PC) -> Umlaut-a (ISO 8859/1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The line begins with the input character followed by a space and the desired
+internal representation (as coded in the TFM file). When the character ^^e4
+is to be output, it will be converted into ^^84 (in this example).
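
For concreteness, here is a minimal sketch in Python of a reader for the
text format quoted above. It is not emTeX's maketcp, merely an illustration
under stated assumptions: the names parse_code, strip_comment and parse_tcp
are invented for this example, hexadecimal codes are assumed to be written
^^xx, and a `%' is assumed to start a comment unless a backslash precedes it.

```python
# Illustrative reader for the TCP text format described in the quoted
# documentation. All function names here are hypothetical.

def parse_code(token):
    """Return the code of '^^xx' (hex) or of a single literal character."""
    if token.startswith("^^") and len(token) == 4:
        return int(token[2:], 16)
    if len(token) == 1:
        return ord(token)
    raise ValueError("bad character token: %r" % token)

def strip_comment(text):
    """Cut the line at the first '%' that is not escaped by a backslash."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2              # skip the escaped character (\% or \" or "\ ")
            continue
        if text[i] == "%":
            return text[:i]
        i += 1
    return text

def parse_tcp(lines):
    """Return (to_tex, to_internal, to_output) tables built from the lines."""
    to_tex = {}        # special character code -> TeX control sequence
    to_internal = {}   # input code -> internal code
    for line in lines:
        if not line.strip() or line.startswith("%"):
            continue   # blank line, or comment with '%' in column 1
        body = strip_comment(line.strip())
        if " -> " in body:
            left, right = body.split(" -> ", 1)
            command = right.rstrip()
            if command.endswith("\\"):
                command += " "  # keep the ONE space of a trailing "\ "
            to_tex[parse_code(left.strip())] = command
        else:
            source, target = body.split()
            to_internal[parse_code(source)] = parse_code(target)
    # The output table is constructed from the input table by inversion.
    to_output = {internal: ext for ext, internal in to_internal.items()}
    return to_tex, to_internal, to_output
```

Feeding it the three example lines from the documentation would give a
translation of code 0x84 to \"a, of 0xfe to "\%\ " with its one space
kept, and an input entry 0x84 -> 0xe4 whose inverse becomes the output entry.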

>>> This sort of conversion is not possible by changing the
>>> xchar/xord array which assumes a 1-1 mapping of external code
>>> byte to internal code byte. Ergo, use of emTeX's code page
>>> facilities represents a non-TeX extension.

Thus one \stress {can} use the codepage facilities of emTeX to implement
a one-one mapping, provided only that the same characters exist in the
codepage and the internal (?Cork?) representation. If some Cork characters
are not available from the codepage, then some other representation must
be sought: this is clearly a valuable area for discussion; if the codepage
contains characters from without the Cork character set, then presumably macros
and/or virtual fonts will be required to reproduce those characters. But for
characters which are in the codepage \stress {and} in Cork, no problem exists,
and they may be treated as letters (or others) throughout TeX.
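
The one-one condition can be checked mechanically. The following Python
sketch fills an xord-style table from a remapping, refuses it unless the
result is a permutation of 0..255, and derives the inverse xchr-style table.
The name build_xord_xchr is hypothetical, and the 0x84/0xe4 swap used to
exercise it is only an illustrative assumption, not the contents of
850_tex.tcp.

```python
# Sketch: turn an input-to-internal remapping into one-one xord/xchr tables.
# build_xord_xchr is an invented name used for illustration only.

def build_xord_xchr(remap):
    """remap: dict of external code -> internal code for codes that differ."""
    xord = list(range(256))          # identity unless remapped
    for external, internal in remap.items():
        xord[external] = internal
    if len(set(xord)) != 256:
        raise ValueError("mapping is not one-one")
    xchr = [0] * 256
    for external, internal in enumerate(xord):
        xchr[internal] = external    # inverse table used for output
    return xord, xchr

# A swap is one-one; moving a single code without relocating the code
# it displaces is not.
xord, xchr = build_xord_xchr({0x84: 0xE4, 0xE4: 0x84})
```

Remapping 0x84 to 0xe4 alone would be rejected, since two external codes
would then share one internal code; the displaced code has to go somewhere.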

>>> If you're going to
>>> use the excuse that you can't reliably send an 8-bit ASCII file
>>> to an EBCDIC system to justify incorporating unnecessary
>>> incompatibilities between TeX implementations, I kind of wonder
>>> what the point of worrying about any sort of TeX consistency is.

No, not an 8-bit ASCII file; a simple 7-bit ASCII file.

>>> -Thus b is 98 inside of TeX even when your computer normally deals with
>>> -EBCDIC or some other non-ASCII system.''
>>>          ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H

>>> Not that significantly different of a statement from mine.

I disagree; very significantly different. DEK clearly refers to
`some other non-ASCII system'; you referred only to EBCDIC. IBM
codepages may well be regarded as `some other non-ASCII System',
even though the 7-bit mappings are probably identical. Notice,
incidentally, the non-isomorphic nature of our communications:
I sent a row of carets under the words `or some other non-ASCII SYSTEM';
what came back from you was a row of control-H's!

>>> How do I define Best? Well here are the criteria:
>>> - No modifications to the TeX program should be necessary

I agree here.

>>> (well
>>>   actually, there are two key changes that should be made: the
>>>   basic changes so the program compiles and the modification of
>>>   the xchar initialization on ASCII systems so that 0-31 and
>>>   127-255 are mapped to the identical internal chars. I avoid the
>>>   issue of EBCDIC because it seems too difficult to work with. I
>>>   avoid anything outside of ASCII/EBCDIC since I really don't
>>>   think it gets used at all, at least not in the TeX world. Note
>>>   that I consider anything with characters 32-127 matching the
>>>   table in the TeXbook to be ASCII);

I also agree with ignoring EBCDIC, but would differ with your
definition of ASCII ...

>>> - The solution should be such that if all files for a document
>>>   are shipped to another ASCII system (assuming no corruption in
>>>   transmission, a not unreasonable condition) it will produce
>>>   identical output;

Provided that the target TeX installation contains provision for
asserting `the following file is in code <xyz>'. The originator
must clearly specify what code he or she has sent.

>>> - Multiple code pages must be supportable in a single document.

I think this last one causes severe problems; I may return to it later.

>>> Now, having the high characters active is not without its
>>> drawbacks, but I don't think that the Unspecified Horrors
>>> mentioned above are quite as bad as "all the drawbacks that that
>>> implies" implies. The only drawback that I can think of is the
>>> exclusion of active characters from multicharacter csnames (which
>>> I will admit is serious, but imho, well compensated by the
>>> benefits of portability and flexibility given by the scheme I've
>>> outlined above). Beyond that, are there drawbacks that I'm
>>> overlooking? I haven't noticed any in the ten months experience
>>> I've had printing six-language documents for one of my clients.
>>> Perhaps, we could have some specific arguments against my
>>> approach (since I'm using it in a production environment, I
>>> consider it fairly important to know whether I may be
>>> encountering difficulties in the future).

Well, either I'm going mad (or am already mad, as many may think), or
else rendering <e-acute> as active precludes the possibility of its
participating in hyphenation patterns and exceptions; if I am not
mistaken, this totally precludes the possibility of establishing
correct hyphenation patterns for the target language.

                                        ** Phil.
