From: Donald Arseneau
Organization: TRIUMF
Sender: Mailing list for the LaTeX3 project
To: Multiple recipients of list LATEX-L
Subject: Re: Multilingual Encodings Summary
Date: Wed, 14 Feb 2001 06:45:07 +0100
Message-ID: <3A8A1B63.1803CD0F@triumf.ca>

Frank Mittelbach <frank.mittelbach@LATEX-PROJECT.ORG> writes:

> > - Hyphenation patterns are specified in terms of the output encoding.
> >   This means that every character appearing in the hyphenation rules
> >   must have a physical slot in the selected font.
>
> only in the internal storage format for patterns used within TeX. On the
> abstract level this is not at all true even though the source format of
> existing patterns tend to be written in this form as well.

Frank also made an earlier comment in this regard, which, while true,
is of marginal relevance to the discussion of multi-lingual/-encoded
documents.

Yes, it is nice if hyphenation pattern input files use symbolic
representations of characters (\ss) rather than hex code values,
so that they can be used for different font encodings.  But the font
encoding must be selected when the format is generated!  This
doesn't help the user who wants to use various font encodings,
and it certainly does not facilitate multi-encoded documents.
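
To make the point concrete, here is a rough sketch of how a pattern
file can keep its source symbolic and still end up tied to one encoding.
It assumes plain TeX catcodes and a format built for T1 (sharp s in
slot "FF); the pattern itself is made up purely for illustration and is
not taken from any real pattern file.

```latex
% Sketch only: read while the format is being built (plain TeX catcodes assumed).
% Assumption: the format is built for T1, where sharp s sits in slot "FF.
\lccode"FF="FF        % the slot needs a nonzero \lccode to count as a letter
\def\ss{^^ff}         % symbolic name resolved to a T1 slot, once and for all
\patterns{1\ss}       % made-up pattern; TeX stores it as character "FF
% For OT1 the same source would need \def\ss{^^19}, and hence a
% separate format or a separate \language number.
```

Once \patterns has been executed, only the slot number is stored; the
symbolic name is gone.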

For hyphenation purposes, multiple encodings must be treated as
multiple languages.  This again points to the need for babel to
specify the desired/required font (encoding) when it selects
a language.
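
As a rough sketch of what that could look like at the document level,
the hypothetical macro \selectlangenc below (not part of babel) switches
the language and the font encoding in one step.  It assumes that both
encodings are loaded via fontenc and that the format contains patterns
suited to the encoding chosen for each language.

```latex
\documentclass{article}
\usepackage[OT1,T1]{fontenc}          % load both encodings; T1 becomes the default
\usepackage[ngerman,english]{babel}

% Hypothetical helper, not provided by babel:
% #1 = babel language name, #2 = font encoding to use with it
\newcommand*\selectlangenc[2]{%
  \selectlanguage{#1}%
  \fontencoding{#2}\selectfont}

\begin{document}
\selectlangenc{ngerman}{T1}
Stra\ss enbahnhaltestelle

\selectlangenc{english}{OT1}
hyphenation
\end{document}
```

The macro only couples the two switches; it does nothing to make the
loaded patterns match the encoding, which is exactly the gap described
above.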

> >   However, logically
> >   hyphenation should not depend on output encoding, and one should be
> >   able to mix fonts with different output encodings without losing
> >   correct hyphenation.
>
> yes, and it is possible without technical problems (in theory)

Possible in TeX as it stands?  Only by loading the patterns
resolved for each encoding.  Or does "in theory" really mean
what it says -- not in practice?

> > - It is rather hard to make a new font available under LaTeX.
> >   Essentially one must create a virtual font which has all the
> >   character slots in the places where hyphenation expects them to be.
>
> wrong.

Wrong...I guess.  Maybe Frank runs LaTeX on initex and has patched
fontenc.sty to load patterns for whatever font encoding is requested.

This does bring us to the point about "internal representation".
TeX has different levels of internals, and at the level where it builds
a horizontal list (as opposed to the higher level of the macro definitions)
the character tokens must map directly to slots in the corresponding font.
Some may call this a lack of a distinct internal representation.
Others may say the relevant representation is in the macros (as
with inputenc).  Still others may say TeX's internal representation
is independent because of virtual fonts.
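
The levels can be lined up on a single character.  The declarations
below should correspond to those in the standard latin1.def, t1enc.def
and ot1enc.def files; treat them as an illustration rather than a
quotation, and note that they are fragments, not a compilable document.

```latex
% inputenc level: input byte 223 (latin1 sharp s) becomes the macro \ss
\DeclareInputText{223}{\ss}

% fontenc level: \ss becomes a slot number that depends on the encoding
\DeclareTextSymbol{\ss}{T1}{255}
\DeclareTextSymbol{\ss}{OT1}{25}

% horizontal list level: only the slot number survives, so the
% hyphenation patterns see character 255 or character 25, never "\ss"
```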

> > - TeX diagnostic messages output the "internal representation", which
> >   can quickly become unreadable for scripts that are not essentially
> >   ASCII.
>
> which diagnostics we are talking about here? some of them are in the font
> encoding (which is not the LICR at all)

I agree.  This is not the issue.  In fact, it is only an issue for
system configuration!  Most TeX implementations now allow messages
to be printed without conversion to ^^ format.
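
A quick way to see what a given installation does is the plain TeX
sketch below.  With TeX's default settings the character should show up
as ^^e4 on the terminal and in the log; an implementation configured to
treat 8-bit characters as printable writes the byte itself.  How that is
configured is implementation-specific and not shown here.

```latex
% plain TeX sketch: write an 8-bit character to the terminal and log
\message{Character "E4 appears as: ^^e4}
\bye
```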

Donald Arseneau                          asnd@triumf.ca
