MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C0781A.100ACD00"
Content-class: urn:content-classes:message
Subject:      Re: GELLMU progress
Date: Sat, 6 Jan 2001 20:50:40 +0100
Message-ID:  <200101061950.OAA03845@pluto.math.albany.edu>
From: "William F. Hammond" <hammond@CSC.ALBANY.EDU>
Sender: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
To: "Multiple recipients of list LATEX-L" <LATEX-L@URZ.UNI-HEIDELBERG.DE>
Reply-To: "Mailing list for the LaTeX3 project" <LATEX-L@URZ.UNI-HEIDELBERG.DE>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C0781A.100ACD00
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Hans Aberg <haberg@MATEMATIK.SU.SE> writes:

> >...
> >I would also like to see somebody translate it to TEI and then compare
> >the HTML and LaTeX formattings obtained chez Rahtz from TEI with the
> >native GELLMU formattings.

Actually, I am more interested in getting a copy for Info trees than
in a TEI copy.  (And there is now an SGML version of Texinfo thanks to
Daniele Giacomini that formats to Texinfo.)  I guess I thought that
TEI fans might bite.  I also believe that a DocBook version would
prove useful inasmuch as DocBook is used by the Linux Documentation
Project.

> If you are in the need of various translations, have you tried using Flex
> (lexical analyzer generator) and Bison (parser generator, or
> compiler-compiler), see

Are you saying that it's easier to code translations from XML using
lex and yacc descendants rather than using standard XML tools such as
sgmlspl, jade, or xt?  I find that hard to believe.  (Of course, the
situation before 1996 was different.)

[snip]
> -- I use them together with C++, which is convenient as the latter has
> standard string classes.

Although I've written in C, I've never gotten into C++.  Are there
good regular expression libraries for C++?

> One approach is to parse objects into something like the DOM (Document
> Object Model, http://www.w3.org/), and then onto that hook a program that
> can translate into several different formats.

Of course, sgmlspl, jade, xt, and other standard SGML/XML tools
provide good frameworks for translating into as many different formats
as one likes by writing, respectively, Perl, DSSSL, and XSLT.
(It might also be viable to use David Carlisle's xmltex followed by
Eitan Gurari's tex4ht, in which case one writes TeX.)

The power of sgmlspl (though not the speed) can match that of any
method except possibly when one wants to descend into CDATA segments.
But then if one finds oneself tempted^{1} to do that (as one might,
for example, in typesetting with TeX or LaTeX the name of TeX or LaTeX
or even the ASCII character '~' from an XML document type that does
not provide these things as empty elements^{2}), one should instead
customize one's XML document type.
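
A minimal Python sketch of that advice, assuming a document type that supplies the empties <tex/>, <latex/>, and <tld/> mentioned in note 1. Python's ElementTree stands in here for sgmlspl, jade, or xt; it is a swapped-in tool, not one named above. The LaTeX and HTML output strings, the <p> wrapper, and the function names are illustrative assumptions, not GELLMU's actual translation rules. The point is that the translator emits target-specific text for each empty and never has to search character data for TeX specials.

```python
import html
import xml.etree.ElementTree as ET

# Empty elements standing for things that are awkward in character data.
# The names tex, latex and tld appear in the message; the output strings
# are ordinary LaTeX/HTML choices assumed for this sketch.
EMPTIES = {
    "tex":   {"latex": r"\TeX{}",            "html": "TeX"},
    "latex": {"latex": r"\LaTeX{}",          "html": "LaTeX"},
    "tld":   {"latex": r"\textasciitilde{}", "html": "~"},
}


def text_for(data, target):
    """Pass character data through, escaping only what HTML requires.

    Assumption: the document type supplies empties for TeX-special
    characters, so character data is never searched for them.
    """
    return html.escape(data, quote=False) if target == "html" else data


def translate(elem, target):
    """Render the inline content of elem for target ('latex' or 'html')."""
    parts = []
    if elem.tag in EMPTIES:
        parts.append(EMPTIES[elem.tag][target])
    if elem.text:
        parts.append(text_for(elem.text, target))
    for child in elem:
        parts.append(translate(child, target))
        if child.tail:
            parts.append(text_for(child.tail, target))
    return "".join(parts)


if __name__ == "__main__":
    doc = ET.fromstring("<p>Set with <latex/>, not raw <tex/>: see <tld/>hammond.</p>")
    print(translate(doc, "latex"))
    print(translate(doc, "html"))
```

In this arrangement, supporting another output format means adding one more string per empty rather than teaching the translator to rewrite character data.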

                                    -- Bill

Notes:

1.  There is one reasonable situation where descent into CDATA
*should* take place: math mode contents need to be thoroughly parsed
in translation to MathML from a document type that mathematical
authors will find tolerable.  But there is no issue of that type in
connection with http://math.albany.edu:8010/glf/lfaq.xml although,
alas, one will find <tex/>, <latex/>, and <tld/>.  I wonder how some
of these things would survive a double translation:

      gellmu/article ---(hypothetical)---> TEI ----> LaTeX

2.  The default "article" document type for _regular_ GELLMU provides
three character names for each of the 33 non-alphanumeric but
printable ASCII characters.  Each of those is at risk for some
conceivable translation target.  But an author may simply use one of
these characters for itself when it is safe for both LaTeX and HTML.
And, for example, by default the syntactic translator understands
things like "\$" and "\{".  If the syntactic translator's new internal
verbatim (which becomes <verblist>, a list-like thing) is used (by
calling the front gellmu-verblist for gellmu-trans), then 32
of these 33 names are auto-generated (';' is omitted) from literal
verbatim.  Something almost identical happens to literal inline
material like |*~$\| if "manmac" mode is enabled.
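
A rough Python sketch of the auto-generation described in note 2, assuming a hypothetical naming scheme ("x" plus the character's two-digit hex code, so '~' becomes <x7e/>). GELLMU's actual names, such as <tld/>, are not listed in this message and differ. The sketch shows where the count of 33 comes from (95 printable ASCII characters minus 52 letters and 10 digits) and why, for printable ASCII input, the result needs no further XML escaping: '<' and '&' themselves become empties.

```python
# The 33 printable ASCII characters (0x20..0x7E) that are not letters or
# digits: 95 printable characters minus 52 letters and 10 digits.
NON_ALNUM = frozenset(c for c in map(chr, range(0x20, 0x7F)) if not c.isalnum())

# ';' gets no generated name, matching the count of 32 given in the note.
OMITTED = ";"


def name_for(c):
    """Hypothetical element name: 'x' plus two hex digits, e.g. '~' -> 'x7e'."""
    return f"x{ord(c):02x}"


def verbatim_to_markup(text):
    """Rewrite literal verbatim text as character data plus empty elements.

    Each non-alphanumeric printable ASCII character other than ';' becomes
    an empty element; everything else is copied unchanged.
    """
    out = []
    for c in text:
        if c in NON_ALNUM and c not in OMITTED:
            out.append(f"<{name_for(c)}/>")
        else:
            out.append(c)
    return "".join(out)


if __name__ == "__main__":
    print(len(NON_ALNUM))                # 33
    print(verbatim_to_markup("*~$\\"))   # <x2a/><x7e/><x24/><x5c/>
```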

------_=_NextPart_001_01C0781A.100ACD00--