From: "Frank Mittelbach"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Subject: Re: Multilingual Encodings Summary
Date: Thu, 15 Feb 2001 08:44:11 +0100
Message-ID: <14987.35019.277143.875450@istrati.zdv.uni-mainz.de>
In-Reply-To: <3A8A1B63.1803CD0F@triumf.ca>

Donald,

> > > - Hyphenation patterns are specified in terms of the output encoding.
> > >   This means that every character appearing in the hyphenation rules
> > >   must have a physical slot in the selected font.
> >
> > only in the internal storage format for patterns used within TeX. On the
> > abstract level this is not at all true even though the source format of
> > existing patterns tend to be written in this form as well.
>
> Frank also made an earlier comment in this regard, which, while true,
> is of marginal relevance to the discussion of multi-lingual/-encoded
> documents.

Marcel's summary was trying to put forward technical points so that they can
be weighed against each other. I was simply trying to put them technically
right where I considered them wrong. But I disagree with you when you say
that this point has nothing to do with the discussion.

> Yes, it is nice if hyphenation pattern input files use symbolic
> representations of characters (\ss) rather than hex code values
> so they can be used for different font encodings.  But the font
> encoding must be selected when the format is generated!  This
> doesn't help the user who wants to use various font encodings,
> and it certainly does not facilitate multi-encoded documents.

You are right that the hyphenation patterns have to be selected at format
generation time, so you are unable to extend that set for a single
document. However, present-day TeX implementations typically have enough
room to store multiple pattern sets, which means that for typical usage at
a site you can combine all the patterns needed (for several languages and
several font encodings).

Don't forget that in many cases the largest pattern set can in fact serve
several font encodings, provided those encodings use the same slot positions
for the character set of the corresponding language.
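
As a minimal sketch of that sharing condition (in Python, purely for
illustration): the slot tables ENC_A and ENC_B and the function name
can_share_patterns below are hypothetical and do not describe any real font
encoding or any existing LaTeX interface.

```python
# Hypothetical slot tables: symbolic character name -> slot number.
# The numbers are made up for illustration; they are not real encodings.
ENC_A = {"a": 97, "b": 98, "ss": 255}
ENC_B = {"a": 97, "b": 98, "ss": 25}


def can_share_patterns(charset, enc_a, enc_b):
    """Return True if both encodings put every character of `charset`
    into the same slot, so one internal pattern set can serve both."""
    return all(
        c in enc_a and c in enc_b and enc_a[c] == enc_b[c]
        for c in charset
    )
```

A language whose character set is just {"a", "b"} could share one pattern set
between the two tables; a language that also needs "ss" could not, because
the two tables disagree on its slot.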


> For hyphenation purposes, multiple encodings must be treated as
> multiple languages.

Technically you are right, but with your statement you are in fact pointing
at the basic error Don made with TeX 3.x: calling something \newlanguage and
\language when in fact it should have been called something very different
(e.g. pointer-to-hyphenation-patterns-related-to-some-output-encoding).
A lot of the problems result from using the (TeXnical) term "language".

So no: it is not that multiple encodings have to be treated as multiple
languages. Rather, within one language you need to store, for each font
encoding used, which of the
pointer-to-hyphenation-patterns-related-to-some-output-encoding's you have
to apply when typesetting in this encoding.

And given that you can (these days) store a suitable number of such
pointer-to-hyphenation-patterns-related-to-some-output-encoding's, you can
typeset multiscript documents for a number of combinations of scripts with
a single format. Clearly there is a limit, so if you want to typeset in too
many combinations you need several formats, but in practice that's not a
real issue.

However, to be able to generate those internal
pointer-to-hyphenation-patterns-related-to-some-output-encoding's
automatically, you need the hyphenation patterns stored externally in a form
that is independent of the output encoding.
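
A minimal sketch of such a generation step, again in Python and only as an
illustration: the token-list representation of a pattern, the function name
resolve_pattern and the slot table in the usage note are all assumptions
made here, not an existing pattern file format.

```python
def resolve_pattern(pattern, encoding):
    """Turn an encoding-independent pattern into a slot-coded one.

    `pattern` is a list of tokens: symbolic character names, digit strings
    (hyphenation weights) or "." (word boundary). `encoding` maps symbolic
    character names to slot numbers. Both are hypothetical representations.
    """
    resolved = []
    for token in pattern:
        if token.isdigit() or token == ".":
            # Weights and word-boundary markers do not depend on the encoding.
            resolved.append(token)
        else:
            # Raises KeyError if the encoding has no slot for this character,
            # i.e. the pattern cannot be used with this encoding.
            resolved.append(encoding[token])
    return resolved
```

With a made-up table {"a": 97, "ss": 255}, the pattern ["a", "1", "ss"]
resolves to [97, "1", 255]; running the same symbolic pattern against a
second table yields the internal form for that encoding.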

> This again points to the need for babel to
> specify the desired/required font (encoding) when it selects
> a language.

Yes. If we take the approach outlined with my xnfss code, which enables the
specification and use of multiple encodings per language, then such a list
should be attached to each language. You would then associate with each such
encoding per language a suitable
pointer-to-hyphenation-patterns-related-to-some-output-encoding (in a number
of cases it could be the same one, e.g. OT1 and T1 for German would share the
same one). If you don't have an appropriate
pointer-to-hyphenation-patterns-related-to-some-output-encoding, you could
select the "no-hyphenation" one (or raise an error and ask for a different
format).
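
The association described above could be sketched as a small lookup table.
This is a Python illustration only: the names PATTERN_SETS, NO_HYPHENATION
and select_pattern_set are hypothetical and are not part of xnfss, babel or
TeX; only the German OT1/T1 sharing comes from the text.

```python
# Hypothetical sentinel meaning "typeset without hyphenation".
NO_HYPHENATION = None

# Hypothetical table: (language, font encoding) -> index of a pattern set
# stored in the format. OT1 and T1 share one set for German, as noted above.
PATTERN_SETS = {
    ("german", "OT1"): 0,
    ("german", "T1"): 0,
}


def select_pattern_set(language, encoding, strict=False):
    """Pick the pattern set to use for `language` typeset in `encoding`."""
    key = (language, encoding)
    if key in PATTERN_SETS:
        return PATTERN_SETS[key]
    if strict:
        # Alternative policy: refuse and ask for a different format.
        raise LookupError(
            f"no patterns for {language} in {encoding}; use a different format"
        )
    # Default policy: fall back to the "no-hyphenation" entry.
    return NO_HYPHENATION
```

The strict flag stands for the two policies mentioned: silently switching
hyphenation off, or raising an error so the user picks another format.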

>
> > >   However, logically
> > >   hyphenation should not depend on output encoding, and one should be
> > >   able to mix fonts with different output encodings without losing
> > >   correct hyphenation.
> >
> > yes, and it is possible without technical problems (in theory)
>
> Possible in TeX as it stands???  Only by loading the patterns
> resolved for each encoding.

Of course, but the resolution process would happen absolutely automatically
in the background, which is why I say "possible without technical problems".
And I said "in theory" because that requires hyphenation patterns stored
externally in such a way that I can, for any font encoding, generate from
them the appropriate internal
pointer-to-hyphenation-patterns-related-to-some-output-encoding form without
problems.

> Or does "in theory" really mean
> what it says -- not in practice.

no

>
> > > - It is rather hard to make a new font available under LaTeX.
> > >   Essentially one must create a virtual font which has all the
> > >   character slots in the places where hyphenation expects them to be.
> >
> > wrong.
>
> Wrong...I guess.   Maybe Frank runs LaTeX on initex and has patched
> fontenc.sty to load patterns for whatever font encoding is requested.

Unfortunately that is no longer possible with TeX, as the hyphenation tree
is compacted and doesn't allow additions after its first use (i.e. after a
paragraph has started).

frank
