From: "Hans Aberg"
Sender: "Mailing list for the LaTeX3 project"
To: "Multiple recipients of list LATEX-L"
Reply-To: "Mailing list for the LaTeX3 project"
Subject: Re: GELLMU progress
Date: Sun, 7 Jan 2001 15:57:14 +0100
In-Reply-To: <200101061950.OAA03845@pluto.math.albany.edu>

At 14:50 -0500 1-01-06, William F. Hammond wrote:
>> If you are in the need of various translations, have you tried using Flex
>> (lexical analyzer generator) and Bison (parser generator, or
>> compiler-compiler), see
>
>Are you saying that it's easier to code translations from XML using
>lex and yacc descendants rather than using standard XML tools such as
>sgmlspl, jade, or xt?  I find that hard to believe.  (Of course, the
>situation before 1996 was different.)

I do not know exactly what you want to achieve: I get the impression that
you have a language of your own of some sort and want to be able to
translate it into different formats. If your language is just a dialect of
XML, and there are XML parser generators available similar to Bison, then
use one of those.

The translation I needed was as follows: from my own language, I want to
output C++ code. This proved very difficult, because local code generates
information (such as include files, declarations, and definitions) that
should be output in different places and files in the C++ output.

Therefore, instead of parsing immediately into a new language, I
invented an intermediate "formatting" language: given a set of macro
definitions, normally provided by a formatting file (thus supplying the
specific data of the output language, in my case C++), and a set of
iterated lookup tables (in internal binary format) produced by the parsing,
it knows how to piece together suitable output files.

The idea is to make the actual parsing as independent as possible of any
output language, only producing the lookup tables. Then, by merely switching
the formatting file with the macro definitions, one can generate output for
different languages.

>> -- I use them together with C++, which is convenient as the latter has
>> standard string classes.
>
>Although I've written in C, I've never gotten into C++.  Are there
>good regular expression libraries for C++?

If you need full regular expressions and a full LR(1) parser within your
language, then the simplest approach is to let your language output Flex .l
and Bison .y files; then compile these files using Flex and Bison, and
finally compile the resulting output with a C++ compiler. This is sort of a
standard technique: for example, the Haskell compiler GHC produces
.c files in this way.

Also note that Flex and Bison are themselves compilers, and one can
use Flex and Bison to write new versions of themselves. -- Actually, they
do. :-)

-- I only use C++ because it is convenient to produce an internal binary
representation, which later can be used to produce the C++ output format.
The iterated lookup tables I use are just
  map<string, variable>
(meaning that one can index a finite set of variables by string keys), where
"variable" is a class with suitable lookup information.
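Such an iterated table can be sketched in a few lines. The names here (`variable`, `table_t`, `append_entry`) are illustrative assumptions, not the actual Synergy code:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sketch: a variable holds a leaf string plus a sequence of nested
// scopes, each scope again mapping string keys to variables, so
// lookups can be iterated to any depth.
struct variable {
    std::string value;
    std::vector<std::map<std::string, variable>> scopes;
};

using table_t = std::map<std::string, variable>;

// Append a fresh scope under `key` and set `field` = `v` inside it,
// mirroring the [push_back] indexing used in the examples below.
inline void append_entry(table_t& t, const std::string& key,
                         const std::string& field, const std::string& v) {
    t[key].scopes.push_back({});
    t[key].scopes.back()[field].value = v;
}
```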

Let's take a simple example: in the output of my application, I need to
build a sequence of classes, which can have a sequence of methods, with
definitions that should be output in various places. The main point is that
one has a sequence of lookup localities, as in most modern computer
languages.

In my formatting file, I may have something like the stuff below. Here,
  <#header|...|header#>
or, equivalently,
  <#header|...|#>
encloses a macro definition, and <|header_name|> is an invocation of the
variable "header_name", and so on.

<#header|
#ifndef Synergy_<|header_name|>_header
#define Synergy_<|header_name|>_header

#if !__cplusplus
#error Header file "<|header_name|>" only for C++.
#endif

#include <stdexcept>

#include "data"
#include "construct"

<|header_preamble|>

namespace Synergy {

<|class.declaration|>

} // namespace Synergy

#endif // <|header_name|>_
|header#>

<#class.declaration|
extern Synergy::data global_<|class_name:cpp|>;
class <|class_name:cpp|> : public virtual construct {
public:
  static const char* category;
  static object_method_base* lookup_method(const std::string&);
  static Synergy::data global;
  class object;
  typedef <|class_name:cpp|> constructor;
  virtual root* clone() const { return new Synergy::<|class_name:cpp|>(*this); }
  virtual bool cloneable() { return <|object_cloneable|>; }
  <|object_copy_to_clone_method|>
  virtual Synergy::data method_method(Synergy::data&);
  virtual Synergy::data method_object(Synergy::data& x) { return new object(x); }
  virtual Synergy::data method_object_method(Synergy::data& x);
  <|constructor_method.declaration|>
  <|constructor_cpp.declare|>

  class object : <|object_base|><|object_cpp_base|> {
  public:
    static const char* category;
    static object_method_base* lookup_method(const std::string&);
    <|object_constructor.declaration|>
    virtual root* clone() const { <|clone_method_definition|> }
    <|copy_method|>
    <|object_data|>
    virtual Synergy::data method_constructor(Synergy::data&) { return Synergy::global_<|class_name:cpp|>; }
    virtual Synergy::data method_method(Synergy::data&);
    <|method.declaration|>
    <|object_cpp.declare|>
  };
};
|class.declaration#>

<#method.declaration|
virtual Synergy::data method_<|method_name:cpp|>(Synergy::data&)<|method_is_abstract|>;
|#>
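The <|...|> substitution these templates rely on can be sketched minimally as string splicing against a flat table. The parsing here is deliberately simplistic and the function name `expand` is hypothetical; the real formatter also handles nesting and iteration:

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal sketch of expanding <|name|> invocations in a template
// against a flat lookup table (no iteration, no nested scopes).
inline std::string expand(const std::string& tmpl,
                          const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        std::size_t open = tmpl.find("<|", pos);
        if (open == std::string::npos) { out += tmpl.substr(pos); break; }
        std::size_t close = tmpl.find("|>", open + 2);
        if (close == std::string::npos) { out += tmpl.substr(pos); break; }
        out += tmpl.substr(pos, open - pos);
        auto it = vars.find(tmpl.substr(open + 2, close - open - 2));
        if (it != vars.end()) out += it->second;  // unknown names expand to nothing
        pos = close + 2;
    }
    return out;
}
```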


In my approach each variable can actually have a sequence of lookups
attached to it, so it becomes easy to sequence a series of classes with the
same template.

Suppose that we want to format a class named `foo' with an object method
named `bar' (among other data). Then the C++ code for that (the way I
implemented it) would look something like
    // Create a new class named "foo":
  (*table)["class"][push_back]["class_name"] = "foo";
    // Create a method named "bar" belonging to the last created class ("foo"):
  (*table)["class"][last]["method"][push_back]["method_name"] = "bar";

The formatter then uses this lookup table with the same kind of iterated
localities as in, say, TeX, or any other modern computer language: when one
prints out the "header" macro and it encounters the "class.declaration"
variable, it iterates through all classes using the "class.declaration"
macro definition. Then, within the "class.declaration" definition, when it
encounters "method.declaration", it iterates through all methods
_in_that_class_. If a name is not found locally, it iterates towards the
base to find a more global name.
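That last step, falling back towards the base when a name is missing locally, is ordinary chained scoping; a minimal sketch under assumed names (`locality`, `find`):

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch: each locality points at its enclosing locality; a name
// not found locally is searched for in the parent, like TeX groups.
struct locality {
    std::map<std::string, std::string> names;
    const locality* parent = nullptr;

    const std::string* find(const std::string& key) const {
        auto it = names.find(key);
        if (it != names.end()) return &it->second;
        return parent ? parent->find(key) : nullptr;
    }
};
```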

>> One approach is to parse objects into something like the DOM (Document
>> Object Model, http://www.w3.org/), and then onto that hook a program that
>> can translate into several different formats.
>
>Of course, sgmlspl, jade, xt, and other standard sgml/xml tools
>provide good frameworks for translating into as many different formats
>as one likes by writing, respectively, Perl, DSSSL, and XSLT.
>(Possibly also it would be viable to use David Carlisle's xmltex
>followed by Eitan Gurari's tex4ht in which case one writes TeX.)

So actually, I do not parse into a language, but into a binary model, which
has essentially the same general capacities (a local lookup system) as any
language. Then I use another program to format that into a suitable
language.

>  I wonder how some
>of these things would survive a double translation
>
>      gellmu/article ---(hypothetical)---> TEI ----> LaTeX .

So what I use is something like your "hypothetical" label here, except
that it is not a language that I use but a binary model, a sequence of
iterated lookup tables.

>2.  The default "article" document type for _regular_ GELLMU provides
>three character names for each of the 33 non-alphanumeric but
>printable ASCII characters.

As it is a binary model, such parsing concerns are irrelevant.

For example, I wanted to write classes with _arbitrary_ binary string
names, which does not work in C++, which only allows alphanumeric
names and underscore, with some restrictions. But it is easy to mangle
(encode) arbitrary binary string names, which I did by an addition to the
formatter; then it is also irrelevant what kind of parsing I use in my
original language to produce arbitrary binary string names.
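One way such a mangling can work, sketched here as an assumption rather than the encoding the formatter actually uses: keep alphanumerics, escape every other byte (including `_` itself) as a hex pair, which makes the mapping injective:

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical mangling of an arbitrary binary string into a valid
// C++ identifier body; a real scheme would also guard against a
// leading digit, e.g. by prefixing "Synergy_" as the templates do.
inline std::string mangle(const std::string& name) {
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "_%02X", c);  // '_' itself becomes _5F
            out += buf;
        }
    }
    return out;
}
```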

If one plays this game long enough, one ends up developing a better and
better intermediate binary model. For example, suppose I want to write a
floating-point number. Right now, it would suffice to use, say, the C++
syntax, and parse such numbers as strings which are output verbatim in the
C++ files. But suppose I want to produce output for some language with a
different floating-point syntax than C++. Then it would be natural to
represent the floating-point numbers in some internal binary model, and add
to the formatter the capacity to write them out in different formats.
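As a sketch of that last idea (the target set and the Fortran-style `d0` suffix are merely illustrative): store the number once internally and let a per-language formatter pick the surface syntax:

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum class target { cpp, fortran };

// Illustrative only: one internal double, two output syntaxes.
inline std::string format_float(double x, target t) {
    std::ostringstream os;
    os << x;
    std::string s = os.str();
    if (s.find('.') == std::string::npos && s.find('e') == std::string::npos)
        s += ".0";                               // make it read as a float literal
    if (t == target::fortran) {
        std::size_t e = s.find('e');
        if (e != std::string::npos) s[e] = 'd';  // 1e+20 -> 1d+20
        else s += "d0";                          // 1.5 -> 1.5d0
    }
    return s;
}
```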

Of course, my needs are specialized in OOPL -> OOPL language translations,
and DPL ("document PL") translations may have other needs.

  Hans Aberg
