Subject: Re: announce: inputenc support for utf8
Date: Thu, 5 Jun 2003 20:18:45 +0100
From: Frank Mittelbach
To: Apostolos Syropoulos
Message-ID: <16095.38805.105960.823796@istrati.mittelbach-online.de>

Apostolos,

> > > ... and I would be very happy to work for the support of the Greek
> > > language!
> >
> > that's fine, but before that is possible there would need to be a greek
> > font encoding that conforms to LaTeX specifications. we looked at what is
> > there right now, but all we found have been encodings that replace ascii
> > chars by something else.
>
> That is true. However, I wrote such encoding files for the Greek language
> support of ConTeXt and so I will create the necessary files for LaTeX.

well, it is not just creating the files, one should consider a number of
design issues, eg

 - try to make an encoding that is also suitable for "non-TeX-world" fonts, ie
   avoid the problems we now have with T1 that no PostScript font, say,
   implements all glyphs (fortunately the number of missing glyphs is small
   there but nevertheless)

 - try to get those glyphs together that provide the best benefit, eg what
   needs to be there to allow proper hyphenation (for different
   dialects/languages) --- symbols that are not necessary for this process can
   go in a companion symbol encoding

 - other stuff that i may have forgotten

this is also the reason why i cc'd Vladimir since he may help you with some
insight into how we arrived at the cyrillic encodings eventually

> > see what happened to cyrillic: there is T2(A-C) which are official LaTeX
> > encodings as well as X2 which is an extended encoding where you are on
> > your own. If there will be a greek encoding (or more) that fit those
> > restrictions needed for multi-lingual processing then adding utf8 support
> > will be possible without much fuss.
>
> Okay, but we need a new name for the Greek encoding: LGR is not a proper
> name. Currently there are two basic Greek encodings ISO-8859-7 and
> Windows-1253 (their only difference is the slot reserved for Capital
> Alpha with tonos) and so I believe one file can be used for both
> encodings. So who will ``allocate'' the new encoding name?

those are input encodings, not font encodings, right? though there is no problem
in taking an input encoding as a font encoding, this is not necessarily the
best solution (given the criteria above)
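
as an aside on the two input encodings named in the quoted text: the following is a minimal sketch, not part of the original exchange, that assumes only Python's built-in codecs "iso8859_7" and "cp1253". the helper names slot and differing_slots are hypothetical and chosen for illustration. it looks up where a character sits in each code page and lists the upper-half byte values that the two codecs decode differently, so the statement about the differing slot can be checked rather than taken on trust.

```python
# Sketch: compare slot assignments in ISO-8859-7 and Windows-1253.
# Relies only on the standard-library codecs "iso8859_7" and "cp1253".

def slot(char, codec):
    """Return the single-byte slot that `char` occupies in `codec`."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codec}")
    return encoded[0]


def differing_slots(codec_a, codec_b, start=0xA0):
    """List byte values from `start` to 0xFF that the two codecs decode differently.

    Undefined bytes decode to U+FFFD via errors="replace", so a byte that is
    undefined in both codecs is not reported as a difference.
    """
    result = []
    for byte in range(start, 0x100):
        raw = bytes([byte])
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            result.append(byte)
    return result


if __name__ == "__main__":
    alpha_tonos = "\u0386"  # GREEK CAPITAL LETTER ALPHA WITH TONOS
    for codec in ("iso8859_7", "cp1253"):
        print(codec, hex(slot(alpha_tonos, codec)))
    print([hex(b) for b in differing_slots("iso8859_7", "cp1253")])
```

a check like this only concerns input encodings, ie the mapping from bytes to characters; it says nothing about which glyphs a font encoding should contain.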

as for the name, this would be finally allocated by us, if there is an
agreement within the Greek TeX community that this is the right "font"
encoding(s) to go for, and that there will be some effort to actually produce,
say, virtual fonts in that encoding

the name itself is most likely going to be T7 (or T7A, T7B, ... if there is
more than one encoding) and X7 for an extended encoding that may be necessary
for "traditional greek", where you need many more than 128 glyphs for proper
hyphenation, if i remember correctly. but again i like to stress that this is
only the name in the future; before i would be willing to put into the
documentation that such and such is an official encoding the above process
should have happened, as it would freeze that encoding.

so while the encoding is still being developed, discussed and further
modified, etc, it should either be run under some L* name or as E7 for
experimental, in the same fashion as it was done for other encodings while they
were still under development.

i don't know if you are coming to Brest. in case you do, we might find the time to
talk about it a bit further.

right now other commitments will not allow me to participate at all in any
such activities for a good while

best
frank
