[pooma-dev] docbook overview
Scott Haney
scotth at proximation.com
Fri May 25 14:57:32 UTC 2001
Hi Allan,
I have a few questions and concerns.
Is it clear that HTML with CSS will not work for us?
I ask because there are some very nice WYSIWYG HTML editors like
Dreamweaver that make things a lot easier than even something like
LaTeX, given that we're not doing a lot of equations. Also, the world
seems to be moving to XML and good tools are rapidly being developed to
support this. I hope that we can focus on content and leave it to tools
to figure out how to program a table or a list. I'd also hate for us to
invest a lot in SGML just as people are moving away from it.
This said, I think your points are quite valid in general and maybe, in
practice, DocBook is a little less scary than it sounds from your
message. Therefore, I look forward to your report.
Scott
On Thursday, May 24, 2001, at 05:57 AM, Allan Stokes wrote:
>
> Hello everyone,
>
> Back at the Proximation meeting it was suggested that the Pooma
> documentation be prepared in DocBook format. Not everyone there was
> familiar with DocBook so I'd like to take a few minutes to describe
> DocBook
> and the authoring process so everyone understands the document format.
>
> About ten years ago SGML (Generalized Markup Language) became an
> official
> standard for document markup. This is an extremely rich structure which
> does not lend itself to simple applications. For example, in SGML you
> can
> redeclare the characters which function as markup delimiters. There
> isn't
> much you take for granted without a full parse.
>
> HTML is a simplified language which includes a subset of all possible
> SGML
> documents. The HTML subset was designed to exclude SGML mechanisms
> which
> complicate parsing so that HTML documents would be simple to read and
> process.
>
> Unfortunately, the markup elements included in HTML conflate structure
> with
> layout. Tags such as <p> and <ul> express paragraph and list document
> elements. But you also find stuff like this (as on the Pooma.com home
> page):
>
> <table width="100%" cellpadding="0" cellspacing="0">
> <tr bgcolor="#E5D5C4">
> </tr>
> </table>
>
> The table structure and the visual presentation are hopelessly mangled
> together. This makes HTML a poor choice for creating portable
> documents.
>
> TeX also suffers from unnatural commingling, which is one of several
> reasons
> people find TeX unpleasant to use. In the case of TeX, the problem was
> partly addressed by creating LaTeX as a document language. LaTeX
> allows the
> author to describe the structure of the document directly and (mostly)
> keeps
> the visual mechanics behind the curtains.
>
> DocBook was invented to solve the same problem with HTML: separating
> structure from presentation. The structure DocBook describes is
> everything
> you might want to put in a book.
>
> Here is a small scrap which I created while testing my DocBook tools to
> give
> the flavour of the notation.
>
> <!DOCTYPE Book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
>
> <book>
> <bookinfo>
> <title>Allan's First DocBook Book</title>
> <author>
> <firstname>Allan</firstname>
> <surname>Stokes</surname>
> </author>
> <copyright>
> <year>2001</year>
> <holder>Allan Stokes</holder>
> </copyright>
> </bookinfo>
> <preface>
> <title>Foreward</title>
> <para>I survived PSGML.</para>
> </preface>
> <chapter>
> <title>chapters MUST have a title</title>
> <para>The location of the PSGML tutorial is www.lysator.liu.se
> </para>
> <para>And so it goes.
> </para>
> </chapter>
> </book>
>
> This is a valid SGML document. The first line declares the document
> data
> type (DTD) which governs the structure of the rest of the document.
> The DTD
> formally defines what markup elements are legal, required, optional,
> etc.
>
> If you run this scrap through a formal SGML validator, the validator
> will be
> able to tell you if your document obeys the rules imposed by the DTD.
> This
> fragment references the DocBook DTD for DocBook version 4.1.
>
> With this background I can now define what DocBook is. DocBook is an
> SGML
> DTD. Presently the DocBook DTD is about 300,000 characters of SGML in
> SGML
> DTD syntax which defines the vast majority of document elements which
> any
> technical book might wish to include. O'Reilly has published high
> quality
> technical books directly from DocBook sources.
>
> DocBook is also used for electronic documentation. The Linux
> Documentation
> Project (LDP) has standardized on DocBook, and FreeBSD is in the
> process of
> converting much of their own online documentation into DocBook format.
> The
> tools required to work with DocBook are now standard components on many
> of
> these platforms, so DocBook has become exceptionally portable.
>
> A common stumbling block in understanding DocBook is that DocBook itself
> provides no tools. DocBook is an SGML DTD. The DocBook DTD is
> essentially
> an SGML document in its own right. (In the SGML world, everything is a
> document. When you have a hammer ...)
>
> The core tool for working with DocBook is the SGML validator. A formal
> SGML
> validator parses the DTD to determine the rules which govern the
> document
> instance which follows. The validator will tell you if your DocBook
> document is a legal document. In the purest sense, DocBook defines a
> subset
> of all possible SGML documents as being legal DocBook documents. The
> semantics of the markup are defined by the tools used in the
> postproduction.
>
> All DocBook gives us is a structure for composing well-formed
> documents. We
> need to address the issue of publishing the document separately.
>
> To do this you need an SGML processor which is capable of converting the
> DocBook source document into a backend format. There are many tools out
> there which can manipulate SGML so there are many ways to publish a
> DocBook
> document. In the non-commercial world, DocBook publishing is usually
> done
> with Jade (a free tool written by James Clark).
>
> Jade comes with a set of standard stylesheets for a variety of backend
> formats. The quality of output is quite acceptable using the default
> stylesheets. It is unlikely that we will wish to make any changes to
> the
> standard stylesheets for publishing the Pooma documentation. You can
> typically use Jade without having to deal with much of Jade's
> complexity.
> But I'll describe Jade anyway.
>
> There are a variety of standards for specifying stylesheets. Jade
> supports
> the DSSSL standard. DSSSL stylesheets are written in a Scheme-like
> language. DSSSL processing reminds me of Pooma. The DocBook document
> is an
> SGML tree structure (much like a Pooma template expression), and DSSSL
> behaves like an expression template which walks the tree structure and
> transforms it (decorates, trims, etc.) into a new tree structure, then
> finally you flatten the whole thing for output (e.g. evaluate).
>
> DSSSL is a functional language which iterates via tail recursion. A
> DSSSL
> stylesheet, surprise, is itself a valid SGML document. (It is even
> possible
> to use Jade to transform DSSSL stylesheets into other DSSSL stylesheets.
> It's this kind of thing which gives DSSSL a cult reputation. Let go of
> your
> mouse and back away quietly.)
>
> The primary backend formats supported by Jade are HTML, RTF, TeX. The
> TeX
> produced by Jade is suitable for conversion into high quality
> Postscript or
> PDF.
>
> Jade is easy enough to use, but configuring Jade involves a couple of
> frustrating steps. SGML contains several methods of indirection which
> are
> used extensively to modularize the implementation of the DocBook DTD
> and the
> DSSSL stylesheets. Most SGML entities are reference via public
> identifiers
> (as opposed to host filenames). Most SGML tools resolve the public
> identifiers by searching site-local files called catalogs. The catalog
> files supply local mappings to names which can be resolved on the local
> file
> system. Most SGML toolsets expect the catalog configuration to be done
> manually before all the SGML magic will work.
>
> Invoking Jade with the right combinations of stylesheets and backend
> formats
> is moderately tricky as well.
>
> The "DocBook Definitive Guide" presents a solution to this problem which
> demonstrates typical DSSSL trickery. A DSSSL stylesheet is an SGML
> document
> whose structure is defined by a DSSSL DTD. What they do is modify the
> standard DSSSL DTD to allow annotations to be placed inside the DSSSL
> stylesheet which define what combinations of stylesheets and backends
> are
> appropriate. Then they supply a Perl script which uses Jade to look
> inside
> the DSSSL stylesheet to find the annotations which govern how the
> backend
> format specified on the command line should be produced.
>
> The whole thing is slick when it works, but initially you get a severe
> dose
> of abstraction shock trying to figure out which bit of syntax functions
> at
> what level.
>
> I have most of this stuff working in my own environment. Once I figure
> out
> how I accomplished this feat I think it would be a good idea to
> document my
> Jade configuration and publishing process. Perhaps these notes would be
> suitable for a web page somewhere on pooma.com. (Is that stuff under
> source
> control?)
>
> The final piece of the puzzle is the authoring process.
>
> Generally, everything I've found on the web about authoring DocBook
> applies
> to emacs. emacs has a psgml package which defines a major mode for
> editing
> DocBook documents. psgml defines several menus and many keystrokes for
> applying syntax directed markup. When you load your DocBook document,
> psgml
> parses the DocBook DTD to determine the required markup structure for
> your
> document. It will prompt you on legal completions when you get lost in
> tricky annotations.
>
> It also has hooks to run external SGML validators to find errors in your
> markup. You can also automate the process of publishing your documents
> using Jade by commands invoked from within emacs. There are also a
> number
> of commercial applications which provide the same capabilities (syntax
> directed editing of SGML documents).
>
> psgml is a generic setup. I discovered that I needed to apply a number
> of
> tweaks to emacs to make the configuration usable. For example, psgml
> doesn't necessarily know how you have your catalog files set up when
> invoking external tools.
>
> Maintaining an existing DocBook document manually (e.g. without setting
> up
> psgml) is not much worse than maintaining HTML manually. However, I
> doubt
> anyone would want to crank new material without the support of something
> like psgml.
>
> There's another aspect of DocBook which I should mention briefly because
> many of the web materials talk about it. So far I've described the SGML
> version of DocBook and the tools appropriate for publishing SGML DocBook
> (Jade/DSSSL).
>
> There is also a parallel version of DocBook in XML format. XML is also
> an
> SGML language, simplified even more than HTML. The SGML version of
> DocBook
> is slightly more human friendly so most people write in the SGML
> dialect.
> The interconversion is mechanical.
>
> Jade and DSSSL are mature tools (with many limitations) which are no
> longer
> being aggressively developed, in part because DSSSL/Scheme is regarded
> as
> overkill.
>
> The ongoing work is mostly in the XML camp now. XML defines several
> different stylesheets languages (e.g. CSS) and different transformation
> tools (e.g. XSL/XSLT). When browsers fully support HTML/CSS it will be
> possible to publish XML DocBook documents directly to the web. The web
> browser will do the presentation directly from the style sheet.
>
> The XML version of DocBook uses an XML syntax for declaring the DocBook
> DTD.
> If I understand the situation correctly, the XML syntax is slightly less
> expressive than the SGML syntax, so the two DocBook DTDs are not
> presently
> formally equivalent. The next major version of DocBook is supposed to
> address this issue.
>
> The XML tools at this point are mostly experimental. Within the next
> few
> years it is anticipated that the new XML model will largely supplant the
> Jade/DSSSL process. It will be a different way of publishing the same
> source documents.
>
> I hope that gives people a good overview of what DocBook is, where it is
> going, and how the process fits together. Once I finish collecting my
> installation notes I'll set up my document framework for the Pooma
> concepts.
>
> Allan
More information about the pooma-dev
mailing list