[pooma-dev] docbook overview

Scott Haney scotth at proximation.com
Fri May 25 14:57:32 UTC 2001


Hi Allan,

I have a few questions and concerns.

Is it clear that HTML with CSS will not work for us?

I ask because there are some very nice WYSIWYG HTML editors like 
Dreamweaver that make things a lot easier than even something like 
LaTeX, given that we're not doing a lot of equations. Also, the world 
seems to be moving to XML and good tools are rapidly being developed to 
support this. I hope that we can focus on content and leave it to tools 
to figure out how to program a table or a list. I'd also hate for us to 
invest a lot in SGML just as people are moving away from it.

This said, I think your points are quite valid in general and maybe, in 
practice, DocBook is a little less scary than it sounds from your 
message. Therefore, I look forward to your report.

Scott

On Thursday, May 24, 2001, at 05:57 AM, Allan Stokes wrote:

>
> Hello everyone,
>
> Back at the Proximation meeting it was suggested that the Pooma
> documentation be prepared in DocBook format.  Not everyone there was
> familiar with DocBook so I'd like to take a few minutes to describe 
> DocBook
> and the authoring process so everyone understands the document format.
>
> In 1986 SGML (Standard Generalized Markup Language) became an official
> ISO standard for document markup.  This is an extremely rich structure
> which does not lend itself to simple applications.  For example, in SGML
> you can redeclare the characters which function as markup delimiters.
> There isn't much you can take for granted without a full parse.
>
> HTML is a simplified language which includes a subset of all possible 
> SGML
> documents.  The HTML subset was designed to exclude SGML mechanisms 
> which
> complicate parsing so that HTML documents would be simple to read and
> process.
>
> Unfortunately, the markup elements included in HTML conflate structure 
> with
> layout.  Tags such as <p> and <ul> express paragraph and list document
> elements.  But you also find stuff like this (as on the Pooma.com home
> page):
>
> <table width="100%" cellpadding="0" cellspacing="0">
> <tr bgcolor="#E5D5C4">
> </tr>
> </table>
>
> The table structure and the visual presentation are hopelessly tangled
> together.  This makes HTML a poor choice for creating portable documents.
>
> TeX also suffers from unnatural commingling, which is one of several 
> reasons
> people find TeX unpleasant to use.  In the case of TeX, the problem was
> partly addressed by creating LaTeX as a document language.  LaTeX 
> allows the
> author to describe the structure of the document directly and (mostly) 
> keeps
> the visual mechanics behind the curtains.
>
> DocBook addresses the same problem that afflicts HTML: separating
> structure from presentation.  The structure DocBook describes is
> everything you might want to put in a book.
>
> Here is a small scrap which I created while testing my DocBook tools to 
> give
> the flavour of the notation.
>
> <!DOCTYPE Book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
>
> <book>
>   <bookinfo>
>     <title>Allan's First DocBook Book</title>
>     <author>
>       <firstname>Allan</firstname>
>       <surname>Stokes</surname>
>     </author>
>     <copyright>
>       <year>2001</year>
>       <holder>Allan Stokes</holder>
>     </copyright>
>   </bookinfo>
>   <preface>
>     <title>Foreword</title>
>     <para>I survived PSGML.</para>
>   </preface>
>   <chapter>
>     <title>chapters MUST have a title</title>
>     <para>The location of the PSGML tutorial is www.lysator.liu.se
>     </para>
>     <para>And so it goes.
>     </para>
>   </chapter>
> </book>
>
> This is a valid SGML document.  The first line is the document type
> declaration, which names the document type definition (DTD) governing the
> structure of the rest of the document.  The DTD formally defines which
> markup elements are legal, required, optional, etc.
>
> If you run this scrap through a formal SGML validator, the validator 
> will be
> able to tell you if your document obeys the rules imposed by the DTD.  
> This
> fragment references the DocBook DTD for DocBook version 4.1.
>
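As a rough illustration of the kind of rule a validator enforces, here is a
minimal Python sketch.  It is not an SGML validator: Python's standard
library does not validate against a DTD, so the sketch parses an XML-style
copy of the scrap (DOCTYPE line omitted) and hand-codes one rule from the
example, that every chapter must have a title.  The function name
chapters_missing_title and the untitled second chapter are invented for the
illustration.

```python
import xml.etree.ElementTree as ET

# XML-style copy of the scrap, with a deliberately broken second chapter.
SCRAP = """
<book>
  <preface>
    <title>Foreword</title>
    <para>I survived PSGML.</para>
  </preface>
  <chapter>
    <title>chapters MUST have a title</title>
    <para>And so it goes.</para>
  </chapter>
  <chapter>
    <para>This chapter forgot its title.</para>
  </chapter>
</book>
"""


def chapters_missing_title(xml_text):
    """Return the 1-based positions of <chapter> elements with no <title> child."""
    root = ET.fromstring(xml_text)
    missing = []
    for position, chapter in enumerate(root.iter("chapter"), start=1):
        if chapter.find("title") is None:
            missing.append(position)
    return missing


if __name__ == "__main__":
    print(chapters_missing_title(SCRAP))
```

A real validator derives checks like this from the DTD itself instead of
having them written by hand, which is the point of declaring the DTD on the
first line.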
> With this background I can now define what DocBook is.  DocBook is an SGML
> DTD.  Presently the DocBook DTD is about 300,000 characters of SGML
> declarations, which define the vast majority of document elements that any
> technical book might wish to include.  O'Reilly has published high-quality
> technical books directly from DocBook sources.
>
> DocBook is also used for electronic documentation.  The Linux 
> Documentation
> Project (LDP) has standardized on DocBook, and FreeBSD is in the 
> process of
> converting much of their own online documentation into DocBook format.  
> The
> tools required to work with DocBook are now standard components on many 
> of
> these platforms, so DocBook has become exceptionally portable.
>
> A common stumbling block in understanding DocBook is that DocBook itself
> provides no tools.  DocBook is an SGML DTD.  The DocBook DTD is 
> essentially
> an SGML document in its own right.  (In the SGML world, everything is a
> document.  When you have a hammer ...)
>
> The core tool for working with DocBook is the SGML validator.  A formal 
> SGML
> validator parses the DTD to determine the rules which govern the 
> document
> instance which follows.  The validator will tell you if your DocBook
> document is a legal document.  In the purest sense, DocBook defines a 
> subset
> of all possible SGML documents as being legal DocBook documents.  The
> semantics of the markup are defined by the tools used in the 
> postproduction.
>
> All DocBook gives us is a structure for composing valid documents.
> Publishing the document is a separate issue which we still need to address.
>
> To do this you need an SGML processor which is capable of converting the
> DocBook source document into a backend format.  There are many tools out
> there which can manipulate SGML so there are many ways to publish a 
> DocBook
> document.  In the non-commercial world, DocBook publishing is usually 
> done
> with Jade (a free tool written by James Clark).
>
> Jade comes with a set of standard stylesheets for a variety of backend
> formats.  The quality of output is quite acceptable using the default
> stylesheets.  It is unlikely that we will wish to make any changes to 
> the
> standard stylesheets for publishing the Pooma documentation.  You can
> typically use Jade without having to deal with much of Jade's 
> complexity.
> But I'll describe Jade anyway.
>
> There are a variety of standards for specifying stylesheets.  Jade supports
> the DSSSL standard.  DSSSL stylesheets are written in a Scheme-like
> language.  DSSSL processing reminds me of Pooma.  The DocBook document is
> an SGML tree structure (much like a Pooma template expression), and DSSSL
> behaves like an expression template which walks the tree structure and
> transforms it (decorates, trims, etc.) into a new tree structure.  Finally
> you flatten the whole thing for output (i.e. evaluate it).
>
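The walk-transform-flatten idea can be sketched in a few lines of Python.
This is only an analogy under stated assumptions, not DSSSL and not how Jade
works internally: it uses the standard xml.etree.ElementTree module on an
XML-style fragment, and the TAG_MAP table and the transform and flatten
function names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DocBook element names to HTML tags.
TAG_MAP = {"book": "div", "chapter": "section", "title": "h2", "para": "p"}


def transform(node):
    """Recursively build a new HTML tree from a DocBook-like tree."""
    out = ET.Element(TAG_MAP.get(node.tag, "span"))
    out.text = node.text
    for child in node:
        new_child = transform(child)
        new_child.tail = child.tail  # keep any text that follows the child
        out.append(new_child)
    return out


def flatten(node):
    """Serialize the transformed tree to a string (the 'evaluate' step)."""
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    source = ET.fromstring(
        "<chapter><title>Intro</title><para>Hello</para></chapter>"
    )
    print(flatten(transform(source)))
```

The source tree is left untouched and a new tree is built, which is roughly
the separation between the document and the stylesheet applied to it.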
> DSSSL is a functional language which iterates via tail recursion.  A 
> DSSSL
> stylesheet, surprise, is itself a valid SGML document.  (It is even 
> possible
> to use Jade to transform DSSSL stylesheets into other DSSSL stylesheets.
> It's this kind of thing which gives DSSSL a cult reputation.  Let go of 
> your
> mouse and back away quietly.)
>
> The primary backend formats supported by Jade are HTML, RTF, and TeX.  The
> TeX produced by Jade is suitable for conversion into high quality
> PostScript or PDF.
>
> Jade is easy enough to use, but configuring Jade involves a couple of
> frustrating steps.  SGML contains several methods of indirection which are
> used extensively to modularize the implementation of the DocBook DTD and
> the DSSSL stylesheets.  Most SGML entities are referenced via public
> identifiers (as opposed to host filenames).  Most SGML tools resolve the
> public identifiers by searching site-local files called catalogs.  The
> catalog files map public identifiers to names which can be resolved on the
> local file system.  Most SGML toolsets expect the catalog configuration to
> be done manually before all the SGML magic will work.
>
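To make the catalog lookup concrete, here is a minimal Python sketch of
resolving a public identifier.  It assumes the simple one-entry-per-line
form PUBLIC "public id" "local path" and ignores every other catalog
feature.  The local paths, the second catalog entry, and the parse_catalog
and resolve function names are hypothetical, not taken from any real
installation.

```python
import shlex

# Hypothetical catalog contents; the paths are placeholders.
CATALOG_TEXT = '''
PUBLIC "-//OASIS//DTD DocBook V4.1//EN" "docbook-4.1/docbook.dtd"
PUBLIC "-//Example//DTD Hypothetical V1.0//EN" "example/hypothetical.dtd"
'''


def parse_catalog(text):
    """Map public identifiers to local paths, reading PUBLIC entries only."""
    mapping = {}
    for line in text.splitlines():
        tokens = shlex.split(line)  # handles the double-quoted fields
        if len(tokens) == 3 and tokens[0] == "PUBLIC":
            mapping[tokens[1]] = tokens[2]
    return mapping


def resolve(public_id, catalogs):
    """Search the catalogs in order; return the first match or None."""
    for catalog in catalogs:
        if public_id in catalog:
            return catalog[public_id]
    return None


if __name__ == "__main__":
    catalog = parse_catalog(CATALOG_TEXT)
    print(resolve("-//OASIS//DTD DocBook V4.1//EN", [catalog]))
```

When no catalog maps the identifier, resolve returns None, which corresponds
to the failure a tool reports when the catalog setup has not been done.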
> Invoking Jade with the right combinations of stylesheets and backend 
> formats
> is moderately tricky as well.
>
> "DocBook: The Definitive Guide" presents a solution to this problem which
> demonstrates typical DSSSL trickery.  A DSSSL stylesheet is an SGML
> document whose structure is defined by a DSSSL DTD.  The authors modify the
> standard DSSSL DTD to allow annotations to be placed inside the DSSSL
> stylesheet which define what combinations of stylesheets and backends are
> appropriate.  Then they supply a Perl script which uses Jade to look inside
> the DSSSL stylesheet to find the annotations which govern how the backend
> format specified on the command line should be produced.
>
> The whole thing is slick when it works, but initially you get a severe 
> dose
> of abstraction shock trying to figure out which bit of syntax functions 
> at
> what level.
>
> I have most of this stuff working in my own environment.  Once I figure 
> out
> how I accomplished this feat I think it would be a good idea to 
> document my
> Jade configuration and publishing process.  Perhaps these notes would be
> suitable for a web page somewhere on pooma.com.  (Is that stuff under 
> source
> control?)
>
> The final piece of the puzzle is the authoring process.
>
> Generally, everything I've found on the web about authoring DocBook applies
> to Emacs.  Emacs has a psgml package which defines a major mode for editing
> DocBook documents.  psgml defines several menus and many keystrokes for
> applying syntax-directed markup.  When you load your DocBook document,
> psgml parses the DocBook DTD to determine the required markup structure for
> your document.  It will prompt you with legal completions when you get lost
> in tricky annotations.
>
> It also has hooks to run external SGML validators to find errors in your
> markup.  You can also automate the process of publishing your documents
> with Jade through commands invoked from within Emacs.  There are also a
> number of commercial applications which provide the same capabilities
> (syntax-directed editing of SGML documents).
>
> psgml is a generic setup.  I discovered that I needed to apply a number 
> of
> tweaks to emacs to make the configuration usable.  For example, psgml
> doesn't necessarily know how you have your catalog files set up when
> invoking external tools.
>
> Maintaining an existing DocBook document manually (i.e. without setting up
> psgml) is not much worse than maintaining HTML manually.  However, I doubt
> anyone would want to crank out new material without the support of
> something like psgml.
>
> There's another aspect of DocBook which I should mention briefly because
> many of the web materials talk about it.  So far I've described the SGML
> version of DocBook and the tools appropriate for publishing SGML DocBook
> (Jade/DSSSL).
>
> There is also a parallel version of DocBook in XML format.  XML is also
> derived from SGML: a subset simplified even further than HTML.  The SGML
> version of DocBook is slightly more human friendly so most people write in
> the SGML dialect.  The interconversion is mechanical.
>
> Jade and DSSSL are mature tools (with many limitations) which are no 
> longer
> being aggressively developed, in part because DSSSL/Scheme is regarded 
> as
> overkill.
>
> The ongoing work is mostly in the XML camp now.  The XML world offers
> several different stylesheet languages (e.g. CSS) and transformation
> languages (e.g. XSL/XSLT).  When browsers fully support XML with CSS it
> will be possible to publish XML DocBook documents directly to the web.  The
> web browser will do the presentation directly from the style sheet.
>
> The XML version of DocBook uses an XML syntax for declaring the DocBook 
> DTD.
> If I understand the situation correctly, the XML syntax is slightly less
> expressive than the SGML syntax, so the two DocBook DTDs are not 
> presently
> formally equivalent.  The next major version of DocBook is supposed to
> address this issue.
>
> The XML tools at this point are mostly experimental.  Within the next 
> few
> years it is anticipated that the new XML model will largely supplant the
> Jade/DSSSL process.  It will be a different way of publishing the same
> source documents.
>
> I hope that gives people a good overview of what DocBook is, where it is
> going, and how the process fits together.  Once I finish collecting my
> installation notes I'll set up my document framework for the Pooma 
> concepts.
>
> Allan


