From: Joachim Schrod
Reply-To: Mailing list for the LaTeX3 project
Subject: XML vs. (La)TeX markup (was: XML, UTF-8 and TeX engines)
Date: Fri, 18 Jul 2003 02:16:32 +0100

>>>>> "BV" == Boris Veytsman <borisv@LK.NET> writes:
JS> From: Joachim Schrod <jschrod@ACM.ORG>

JS> I still like its markup syntax much more than
JS> XML for several reasons that are really off topic here.

BV> Are you sure about the off topic part? I certainly would like to
BV> know these reasons.

OK, you've got me. The real answer would be a paper on the inability
to do pure semantic markup in most situations, and about tag economy,
i.e., the ratio of markup to actual text in a document. It would also
cover the necessity either to have full support from editing
environments or to be able to enter and maintain markup manually in
"standard" editors. Perhaps a presentation at a TeX conference... :-)

But then, I'll try a shorter answer:

 -- Good typesetting needs ``micromarkup'', things that TeX does with
    "~", "\ ", "\,", etc. One can imagine semantic markup for many of
    these items, but the number of markup definitions and the
    cognitive load of choosing the correct markup would be too high
    for almost all authors. There are also cases where semantic markup
    gets difficult; DEK's examples of ~ usage in the TeXbook
    illustrate that well. Overall, I like the economy of input here:
    "~", "\,", or "--" are easier to read and don't disturb the
    input as much as &nbsp;, &spatium;, or &ndash;. Just imagine this
    email written with XML entities... ;-) IMHO, the length of a tag
    should be related to its importance: long tags for important
    things, short tags for unimportant but necessary stuff.

 -- Space handling in TeX is more "natural" than in XML. Not in
    macros, mind you, but in document text. As an example, I like to
    be able to use blank lines to separate paragraphs, as you can see
    in this email. This has been a markup tradition for decades, and
    it has proven useful. As another example, I also like that
    multiple blanks collapse to one; the lack of that drives me mad
    in Word.

 -- I like being able to introduce non-standard TeX markup for
    special situations. E.g., in the TeX Directory Standard, we used
    markup like

      \begin{tdsSummary}
        bibtex/           \BibTeX{} input files
        . bib/            \BibTeX{} databases
        . . base/         base distribution (e.g., \path|xampl.bib|)
        . . misc/         single-file databases
        . <package>/      name of a package
      \end{tdsSummary}

    In the document source, the directory structure is much easier to
    read and to maintain than

      <tdsSummary>
        <entry>
          <directory>bibtex</directory>
          <description>\BibTeX{} input files</description>
        </entry>
       [...]
        <entry>
          <directory><subdir/><variable>package</variable></directory>
          <description>name of a package</description>
        </entry>
      </tdsSummary>

    In the current TeX source, one spots errors immediately (e.g., a
    wrong nesting depth, which the XML version hides in the number of
    <subdir/>s). That would be lost in XML markup. Of course, I'm
    biased since I designed the markup and wrote the macros. :-) SGML
    provided DATATAG for that, but this was thrown out to make the
    parser writers' lives easier. Umpf, how many parser writers do we
    have, compared with the number of authors?

 -- TeX math markup is easier to write and to read than MathML.
    Mathematicians can also use its flexibility to introduce arbitrary
    new expressions in their "natural language math".

 -- Editor support for (La)TeX source input is better than for XML.
    Actually, this is a very delicate and difficult topic that would
    need a paper in itself. Please note that this reflects my current
    view on the state of available tools; there's nothing to prevent
    anybody from creating better XML editors -- they've been promised
    for years, but they don't arrive. Actually, there are good XML
    document editors like FrameMaker, but they're not as
    platform-independent as I would need them to be. (For the record,
    I tried many editors, and currently use psgml-mode in XEmacs. But
    it's not as good as AUC-TeX.)

 -- An often cited reason to use XML markup instead of TeX is the
    better support for validation and transformation of XML documents.
    But IMHO this is overemphasized; it is not needed as often as we
    discuss it. Most XML documents that I've seen don't even conform
    to any schema, so one needs special transformation scripts for
    more document classes than one expects at the start of an XML
    project.

    This is from my practical experience in introducing XML in
    large multinational companies for mission-critical documents.
    There it was very hard to achieve agreement on structures even
    for formal documents like service level agreements -- the ad-hoc
    markup that may be used for informal documents is good for
    nothing. Hell, corporate users don't even use Word document
    styles when they're available and prefer to click on their bold
    and italics buttons or change the type size directly. That's the
    reality I'm doing business in.

    Of course, there are XML validators out there -- one only has to
    fight with the inability to express completely sensible document
    structures in DTDs or schemas. The resulting document structure
    definitions are either very complex or very generic. Style sheets
    for complex schemas are very hard to write; that's one of the
    reasons why we don't have good support for high-quality DocBook
    output. Validation of very generic structures doesn't bring
    enough advantages, because valid documents can still be nonsense.

    Last, but not least: If markup validation is really so important,
    one can and should spend effort to make a TeX validator available.
    There are several TeX parser implementations out there -- I wrote
    one myself in two weeks. (Btw, presented at the TUG conference in
    Santa Barbara, years ago.) They can be put to use with reasonable
    effort.

 -- Actually, IMO the main disadvantage of TeX markup is the shortage
    of skilled people in the job market to implement that markup.
    That makes any manager worth his salary shy away from TeX. For me,
    that's the main reason to use XML: I find more people with the
    needed skills.

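To make the points about micromarkup, space handling, and math input
concrete, here is a minimal sketch of a complete LaTeX file. The
sample sentences are made up for illustration and are taken neither
from the TeXbook nor from the TDS source; the XML and MathML lines in
the comments are hand-written approximations for comparison, and
&spatium; is only the placeholder entity name used above.

      \documentclass{article}
      \begin{document}
      % Micromarkup: tie (~), en dash (--), thin space (\,), control space (\ ).
      See Chapter~3, pages 12--15, e.\,g.\ the second example.
      % Roughly the same sentence with entities:
      %   See Chapter&nbsp;3, pages 12&ndash;15, e.&spatium;g. the second example.

      % The blank line above starts a new paragraph, and the runs of
      % blanks in the next line collapse to single spaces.
      The   spacing   here   is   normalized.

      % Math: the TeX source is short; a MathML version of the same
      % term would be <msup><mi>x</mi><mn>2</mn></msup>.
      The square of $x$ is $x^2$.
      \end{document}
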
But it's late and I should stop here. I hope you got an impression of
my viewpoint. As I've written above, a full elaboration is beyond the
scope of this email discussion.

Cheers,
        Joachim

--
=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-= =3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D= -=3D-=3D-=3D
Joachim = Schrod           &= nbsp;           &n= bsp;          Email: = jschrod@acm.org
Roedermark, Germany

        ``How do we persuade new users that spreading fonts across the page
        like peanut butter across hot toast is not necessarily the route to
        typographic excellence?''                                -- Peter Flynn
