Manual Patch: New Introductory Chapter

Mon Dec 17 18:28:52 UTC 2001

This patch mainly adds an introductory chapter and a very small part of
the sequential program chapter.

2001-Dec-17  Jeffrey D. Oldham  <oldham at codesourcery.com>

	* concepts.xml: Minor wordsmithing fixes, e.g., removal of old
	temporary paragraphs, spelling changes, and better use of entities.
	* glossary.xml: s/multi-processor/multiprocessor/
	(architecture): New entry.
	(first class): Refill.
	* introduction.xml: New introductory chapter.
	* makefile (manual.dvi): Add dependence on introduction.xml and
	glossary.xml.
	* manual.xml: Add a few new entity declarations and use them.
	Move introductory chapter material to introduction.xml.  Begin
	writing sequential program chapter.  Add a few bibliographic
	entries.
	* tutorial.xml: Add more uses of entities changed in manual.xml.
	* figures/introduction.mp: New figure illustrating role of Pooma
	in science/math process.

Applied to	mainline
Approved by	me!

Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com
-------------- next part --------------
Index: concepts.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v
retrieving revision 1.2
diff -c -p -r1.2 concepts.xml
*** concepts.xml	2001/12/14 04:18:13	1.2
--- concepts.xml	2001/12/17 16:56:50
***************
*** 1,9 ****
  <chapter id="concepts">
   <title>Overview of &pooma; Concepts</title>

-  <para>FIXME: How does multi-threaded computation fit into the
-  model?</para>
- 
   <para>In the previous chapter, we presented several different
   implementations of the &doof2d; simulation program.  The
   implementations illustrate the various containers, computation modes,
--- 1,6 ----
***************
*** 33,39 ****
      <term>computation environment</term>
      <listitem>
       <para>description of resources for computing, e.g., single
!      processor or multi-processor.</para>
      </listitem>
     </varlistentry>
    </variablelist>
--- 30,36 ----
      <term>computation environment</term>
      <listitem>
       <para>description of resources for computing, e.g., single
!      processor or multiprocessor.</para>
      </listitem>
     </varlistentry>
    </variablelist>
***************
*** 115,121 ****
    containers.</para>

    <para>This section describes many concepts, but one need not
!   understand them all to begin programming with the &pooma; Toolkit.
    First, we introduce the different &pooma;'s containers and describe
    how to choose an appropriate one for a particular task.  <xref
    linkend="concepts-sequential_containers-declarations-dependences"></xref>
--- 112,118 ----
    containers.</para>

    <para>This section describes many concepts, but one need not
!   understand them all to begin programming with the &poomatoolkit;.
    First, we introduce the different &pooma;'s containers and describe
    how to choose an appropriate one for a particular task.  <xref
    linkend="concepts-sequential_containers-declarations-dependences"></xref>
***************
*** 361,367 ****
     memory.  The layout specifies the processors and memory to use for
     each particular index.  A container's layout for a uniprocessor
     implementation consists of its domain, the processor, and its
!    memory.  For a multi-processor implementation, the layout maps
     portions of the domain to (possibly different) processors and
     memory.</para>

--- 358,364 ----
     memory.  The layout specifies the processors and memory to use for
     each particular index.  A container's layout for a uniprocessor
     implementation consists of its domain, the processor, and its
!    memory.  For a multiprocessor implementation, the layout maps
     portions of the domain to (possibly different) processors and
     memory.</para>

***************
*** 422,428 ****

     <para>In the previous section, we introduced the concepts important
     when declaring containers for use on uniprocessor computers.  When
!    using multi-processor computers, we augment these concepts with
     those for distributed computation.  Reading this section is
     important only for running a program on multiple processors.  Many
     of these concepts were introduced in <xref
--- 419,425 ----

     <para>In the previous section, we introduced the concepts important
     when declaring containers for use on uniprocessor computers.  When
!    using multiprocessor computers, we augment these concepts with
     those for distributed computation.  Reading this section is
     important only for running a program on multiple processors.  Many
     of these concepts were introduced in <xref
***************
*** 434,447 ****
     distributed container.</para>

     <para>As we noted in <xref
!    linkend="tutorial-array_distributed"></xref>, a &pooma;
!    programmer must specify how each container's domain should be
!    distributed among the available processors and memory spaces.
!    Using this information, the Toolkit automatically distributes the
!    data among the available processors and handles any required
!    communication among them.  The three concepts necessary for
!    declaring distributed containers are a partition, a guard layer,
!    and a context mapper tag.</para>

     <para>A <glossterm
     linkend="glossary-partition"><firstterm>partition</firstterm></glossterm>
--- 431,444 ----
     distributed container.</para>

     <para>As we noted in <xref
!    linkend="tutorial-array_distributed"></xref>, a &pooma; programmer
!    must specify how each container's domain should be distributed
!    among the available processors and memory spaces.  Using this
!    information, the &toolkit; automatically distributes the data among
!    the available processors and handles any required communication
!    among them.  The three concepts necessary for declaring distributed
!    containers are a partition, a guard layer, and a context mapper
!    tag.</para>

     <para>A <glossterm
     linkend="glossary-partition"><firstterm>partition</firstterm></glossterm>
***************
*** 615,622 ****
     &pooma; uses the communication library to copy information among
     contexts, all of which is hidden from both the programmer and the
     user.  &pooma; works with the Message Passing Interface (&mpi;)
!    Communications Library (FIXME: xref linkend="mpi99", <ulink
!    url="http://www-unix.mcs.anl.gov/mpi/"></ulink>) and the &mm;
     Shared Memory Library.  See <xref
     linkend="installation-distributed_computing"></xref> for details.</para>
    </section>
--- 612,620 ----
     &pooma; uses the communication library to copy information among
     contexts, all of which is hidden from both the programmer and the
     user.  &pooma; works with the Message Passing Interface (&mpi;)
!    Communications Library 
! <!-- FIXME: xref linkend="mpi99" -->
!    (<ulink url="http://www-unix.mcs.anl.gov/mpi/"></ulink>) and the &mm;
     Shared Memory Library.  See <xref
     linkend="installation-distributed_computing"></xref> for details.</para>
    </section>
Index: glossary.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v
retrieving revision 1.3
diff -c -p -r1.3 glossary.xml
*** glossary.xml	2001/12/14 04:18:13	1.3
--- glossary.xml	2001/12/17 16:56:50
***************
*** 17,22 ****
--- 17,31 ----

   <glossdiv id="glossary-a">
    <title>A</title>
+   <glossentry id="glossary-architecture">
+    <glossterm>architecture</glossterm>
+    <glossdef>
+     <para>particular hardware (processor) interface.  Example
+     architectures include <quote>linux</quote>, <quote>sgin32</quote>,
+     <quote>sgi64</quote>, and <quote>sun</quote>.</para>
+    </glossdef>
+   </glossentry>
+ 
    <glossentry id="glossary-array">
     <glossterm>&array;</glossterm>
     <glossdef>
***************
*** 25,32 ****
      ignoring the time to compute the values if applicable.  &array;s
      are <link linkend="glossary-first_class">first-class
      object</link>s.  <link
! 			    linkend="glossary-dynamicarray">&dynamicarray;</link>s and <link
! 											     linkend="glossary-field">&field;</link>s generalize &array;.</para>
      <glossseealso otherterm="glossary-dynamicarray">&dynamicarray;</glossseealso>
      <glossseealso otherterm="glossary-field">&field;</glossseealso>
     </glossdef>
--- 34,42 ----
      ignoring the time to compute the values if applicable.  &array;s
      are <link linkend="glossary-first_class">first-class
      object</link>s.  <link
!     linkend="glossary-dynamicarray">&dynamicarray;</link>s and <link
!     linkend="glossary-field">&field;</link>s generalize
!     &array;.</para>
      <glossseealso otherterm="glossary-dynamicarray">&dynamicarray;</glossseealso>
      <glossseealso otherterm="glossary-field">&field;</glossseealso>
     </glossdef>
***************
*** 165,171 ****
     <glossdef>
      <para>computing environment with one or more processors each
      having associated memory, possibly shared.  In some contexts, it
!     refers to strictly multi-processor computation.</para>
      <glossseealso otherterm="glossary-computing_environment">computing environment</glossseealso>
      <glossseealso otherterm="glossary-sequential">sequential computing environment</glossseealso>
     </glossdef>
--- 175,181 ----
     <glossdef>
      <para>computing environment with one or more processors each
      having associated memory, possibly shared.  In some contexts, it
!     refers to strictly multiprocessor computation.</para>
      <glossseealso otherterm="glossary-computing_environment">computing environment</glossseealso>
      <glossseealso otherterm="glossary-sequential">sequential computing environment</glossseealso>
     </glossdef>
***************
*** 364,370 ****
      <para>a map from an index to processor(s) and memory used to
      compute the container's associated value.  For a uniprocessor
      implementation, a container's layout always consists of its
!     domain, the processor, and its memory.  For a multi-processor
      implementation, the layout maps portions of the domain to
      (possibly different) processors and memory.</para>
      <glossseealso otherterm="glossary-container">container</glossseealso>
--- 374,380 ----
      <para>a map from an index to processor(s) and memory used to
      compute the container's associated value.  For a uniprocessor
      implementation, a container's layout always consists of its
!     domain, the processor, and its memory.  For a multiprocessor
      implementation, the layout maps portions of the domain to
      (possibly different) processors and memory.</para>
      <glossseealso otherterm="glossary-container">container</glossseealso>
***************
*** 572,580 ****
      <para>a container derived from another.  The former's domain is a
      subset of the latter's, but, where the domains intersect,
      accessing a value through the view is the same as accessing it
!     through the original container.  Only &array;s, &dynamicarray;s,
!     and &field;s support views.</para>
!     <glossseealso otherterm="glossary-container">container</glossseealso>
     </glossdef>
    </glossentry>
   </glossdiv>
--- 582,591 ----
      <para>a container derived from another.  The former's domain is a
      subset of the latter's, but, where the domains intersect,
      accessing a value through the view is the same as accessing it
!     through the original container.  In Fortran 90, these are
!     called array sections.  Only &array;s, &dynamicarray;s, and
!     &field;s support views.</para> <glossseealso
!     otherterm="glossary-container">container</glossseealso>
     </glossdef>
    </glossentry>
   </glossdiv>
Index: introduction.xml
===================================================================
RCS file: introduction.xml
diff -N introduction.xml
*** /dev/null	Fri Mar 23 21:37:44 2001
--- introduction.xml	Mon Dec 17 09:56:51 2001
***************
*** 0 ****
--- 1,348 ----
+ <chapter id="introduction">
+  <title>Introduction</title>
+ 
+  <para>The Parallel Object-Oriented Methods and Applications
+  (<acronym>POOMA</acronym>) &toolkitcap; is a &cc; &toolkit; for
+  writing high-performance scientific programs for sequential and
+  distributed computation.  The &toolkit; provides a variety of
+  tools:
+  <itemizedlist spacing="compact">
+    <listitem>
+     <para>containers and other abstractions suitable for scientific
+     computation,</para>
+    </listitem>
+    <listitem>
+     <para>several container storage classes to reduce a program's
+     storage requirements,</para>
+    </listitem>
+    <listitem>
+     <para>support for a variety of computation modes including
+     data-parallel expressions, stencil-based computations, and lazy
+     evaluation,</para>
+    </listitem>
+    <listitem>
+     <para>support for writing parallel and distributed programs,</para>
+    </listitem>
+    <listitem>
+     <para>automatic creation of all interprocessor communication for
+     parallel and distributed programs, and</para>
+    </listitem>
+    <listitem>
+     <para>automatic out-of-order execution and loop rearrangement
+     for fast program execution.</para>
+    </listitem>
+   </itemizedlist>
+  Since the &toolkit; provides high-level abstractions, &pooma;
+  programs are much shorter than corresponding &fortran; or &c;
+  programs, requiring less time to write and less time to debug.
+  Using these high-level abstractions, the same code runs on a wide
+  variety of computers almost as fast as carefully crafted
+  machine-specific hand-written programs.  The &toolkit; is freely
+  available, open-source software compatible with any modern &cc;
+  compiler.</para>
+ 
+  <formalpara><title>&pooma; Goals.</title>
+   <para>The goals for the &poomatoolkit; have remained unchanged
+   since its inception in 1994:
+   <orderedlist>
+    <listitem>
+     <para>Code portability across serial, distributed, and parallel
+     architectures with no change to source code.</para>
+    </listitem>
+    <listitem>
+     <para>Development of reusable, cross-problem-domain components
+     to enable rapid application development.</para>
+    </listitem>
+    <listitem>
+     <para>Code efficiency for kernels and components relevant to
+     scientific simulation.</para>
+    </listitem>
+    <listitem>
+     <para>[&toolkitcap;] design and development driven by
+     applications from a diverse set of scientific problem
+     domains.</para>
+    </listitem>
+    <listitem>
+     <para>Shorter time from problem inception to working parallel
+     simulations.</para>
+ <!-- FIXME: Add citation to pooma95, p. 3 -->
+    </listitem>
+   </orderedlist>
+  </para>
+  </formalpara>
+ 
+  <formalpara><title>Code Portability for Sequential and Distributed Programs.</title>
+  <para>&pooma; programs run on sequential, distributed, and parallel
+  computers with no change in source code.  The programmer writes two
+  or three lines specifying how each container's domain should be
+  distributed among available processors.  Using these directives and
+  run-time information about the computer's configuration, the
+  &toolkit; automatically distributes pieces of the container
+  domains, called <firstterm>patch</firstterm>es, among the available
+  processors.  If a computation needs values from another patch,
+  &pooma; automatically passes the value to the place it is needed.
+  The same program, and even the same executable, works regardless of
+  the number of the available processors and the size of the
+  containers' domains.  A programmer interested in only sequential
+  execution can omit the two or three lines specifying how the
+  domains are to be distributed.</para>
+  </formalpara>
+ 
+  <figure float="1" id="introduction-science_algorithms">
+   <title>Science, Algorithms, Engineering, and &pooma;</title>
+   <mediaobject>
+    <imageobject>
+     <imagedata fileref="figures/introduction.101" format="EPS" align="center"></imagedata>
+    </imageobject>
+    <textobject>
+     <phrase>how &pooma; helps translate algorithms into programs</phrase>
+    </textobject>
+    <caption>
+     <para>In the translation from theoretical science and math to
+     computational science and math to computer programs, &pooma;
+     containers ease the translation of algorithms to computer
+     programs.</para>
+    </caption>
+   </mediaobject>
+  </figure>
+ 
+  <formalpara><title>Rapid Application Development.</title>
+  <para>The &poomatoolkit; is designed to enable rapid development of
+  scientific and distributed applications.  For example, its vector,
+  matrix, and tensor classes model the corresponding mathematical
+  concepts.  Its &array; and &field; classes model the discrete
+  spaces and mathematical arrays frequently found in computational
+  science and math.  See <xref
+  linkend="introduction-science_algorithms"></xref>.  The left column
+  illustrates theoretical science and math, the middle column
+  computational science and math, and the right column computer
+  science implementations.  For example, theoretical physics
+  frequently uses continuous fields in three-dimensional space, while
+  algorithms for the corresponding computational physics problem
+  usually use discrete fields.  &pooma; containers, classes, and
+  functions ease the engineering to map these algorithms to computer
+  programs.  For example, the &pooma; &field; container models
+  discrete fields; both map locations in discrete space to values and
+  permit computations of spatial distances and values.  The &pooma;
+  &array; container models the mathematical concept of an array, used
+  in numerical analysis.</para>
+  </formalpara>
+ 
+  <para>&pooma; containers support a variety of computation modes,
+  easing transition of algorithms into code.  For example, many
+  algorithms for solving partial differential equations use
+  stencil-based computations.  &pooma; supports stencil-based
+  computations on &array;s and &field;s.  It also supports
+  data-parallel computation.  For computations where one &field;'s
+  values are a function of several other &field;s' values, the
+  programmer can specify a relation.  Relations are lazily evaluated;
+  whenever the dependent &field;'s values are needed and it is
+  related to a &field; whose values have changed, the former
+  &field;'s values are computed.  Lazy evaluation also assists
+  correctness by eliminating the (frequently forgotten) need for a
+  programmer to ensure a &field;'s values are up-to-date before being
+  used.</para>
+ 
+  <formalpara><title>Efficient Code.</title>
+  <para>&pooma; incorporates a variety of techniques to ensure it
+  produces code that executes as quickly as special-case,
+  hand-written code.
+ <!-- FIXME: Do I present execution numbers here? -->
+  These techniques include extensive use of templates, out-of-order
+  evaluation to permit communication and computation to overlap,
+  availability of guard layers to reduce processors' synchronicity,
+  and use of &pete; to produce fast inner loops.</para>
+  </formalpara>
+ 
+  <para>Using templates permits the expressiveness of using pointers
+  and function arguments but ensures as much work as possible
+  occurs at compile time, not run time.  Also, more code is exposed
+  to the compiler's optimizer, further speeding execution.  For
+  example, use of template parameters to define the &pooma; &array;
+  container permits the use of specialized data storage classes
+  called engines, fast creation of views of a portion of an &array;,
+  and polymorphic indexing.  An &array;'s engine template parameter
+  specifies how data is stored and indexed.  Some &array;s expect
+  almost all values to be used, while others might be mostly empty.
+  In the latter case, using a specialized engine storing the few
+  nonzero values would greatly reduce space requirements.  Using
+  engines also permits fast creation of container views, known as
+  <firstterm>array sections</firstterm> in Fortran 90.  A view's
+  engine is the same as the original container's engine, while the
+  view object maps its restricted domain to the original domain.
+  Space requirements and execution time are minimal.  Using templates
+  also permits containers to support polymorphic indexing, e.g.,
+  indexing both by integers and by three-dimensional coordinates.
+  For example, a container defers returning values to its engine
+  using a templatized index operator.  The engine can define indexing
+  functions with different function arguments, without the need to
+  add corresponding container functions.  Some of these features can
+  be expressed without using templates, but doing so increases
+  execution time.  For example, a container could have a pointer to
+  an engine object, but this requires a pointer dereference for each
+  operation.  Implementing polymorphic indexing without templates
+  would require adding a virtual function corresponding to each of the
+  indexing functions.</para>
+ 
+ <!-- FIXME: Are the claims concerning out-of-order evaluation I make true? -->
+ 
+  <para>To ensure multiprocessor &pooma; programs execute quickly, it
+  is important that interprocessor communication overlaps with
+  intraprocessor computation as much as possible and communication is
+  minimized.  Asynchronous communication, out-of-order evaluation, and
+  use of guard layers all help achieve this.  &pooma; uses the
+  asynchronous communication facilities of the &cheetah; communication
+  library.  When a processor needs data stored or computed by another
+  processor, a message is sent between the two.  For synchronous
+  communication, the sender must issue an explicit send, and the
+  recipient must issue an explicit receive.  This synchronizes them.
+  &cheetah; permits the sender to put and get data without the
+  intervention of the remote site and also invoke functions at the
+  remote site to ensure the data is up-to-date.  Thus, out-of-order
+  evaluation must be supported.  Out-of-order evaluation has another
+  benefit: only computations directly or indirectly related to values
+  that are printed need occur.</para>
+ 
+  <para>Using guard layers also helps overlap communication and
+  computation.  For distributed computation, each container's domain is
+  split into pieces distributed among the available processors.
+  Frequently, computing a container value is local, involving just the
+  value itself and a few neighbors.  Computing a value near the edge of
+  a processor's domain may require knowing a few values from a
+  neighboring domain.  Guard layers permit these values to be copied
+  locally so they need not be repeatedly communicated.</para>
+ 
+  <para>&pooma; uses &pete; technology to ensure inner loops using
+  &pooma;'s object-oriented containers run as quickly as hand-coded
+  <!-- FIXME: Add a citation to Dr. Dobb's Journal article
+  pete-99. --> loops.  &pete; (the Portable Expression Template
+  Engine) uses expression-template technology to convert
+  data-parallel statements frequently found in the inner loops of
+  programs into efficient loops without any intermediate
+  computations.  For example, consider evaluating the <statement>A +=
+  -B + 2 * C;</statement> statement where <varname>A</varname> and
+  <varname>C</varname> are <type>vector&lt;double&gt;</type>s and
+  <varname>B</varname> is a <type>vector&lt;int&gt;</type>.
+  Ordinary evaluation might introduce intermediaries for
+  <statement>-B</statement>, <statement>2*C</statement>, and their
+  sum.  The presence of these intermediaries in inner loops can
+  measurably slow evaluation.  To produce a loop without
+  intermediaries, &pete; stores each expression as a parse tree.  The
+  resulting parse trees can be combined into a larger parse tree.
+  Using its templates, the parse tree is converted, at compile time,
+  to an outer loop with contents corresponding to evaluating each
+  component of the result.  Thus, no intermediate values are computed
+  or stored.  For example, the code corresponding to <statement>A +=
+  -B + 2 * C;</statement> is 
+  <programlisting>
+  vector&lt;double&gt;::iterator iterA = A.begin();
+  vector&lt;int&gt;::const_iterator iterB = B.begin();
+  vector&lt;double&gt;::const_iterator iterC = C.begin();
+  while (iterA != A.end()) {
+    *iterA += -*iterB + 2 * *iterC;
+    ++iterA; ++iterB; ++iterC;
+  }
+  </programlisting>
+  Furthermore, since the code is available at compile time, not run
+  time, it can be further optimized, e.g., by moving any
+  loop-invariant code out of the loop.</para>
+ 
+  <formalpara><title>Used for Diverse Set of Scientific Problems.</title>
+  <para>&pooma; has been used to solve a wide variety of scientific
+  problems.  Most recently, physicists at Los Alamos National
+  Laboratory implemented an entire library of hydrodynamics codes as
+  part of the U.S. government's Science-based Stockpile Stewardship
+  (<acronym>SBSS</acronym>) program to simulate nuclear weapons.
+  Other applications include a matrix solver, an accelerator code
+  simulating the dynamics of high-intensity charged particle beams in
+  linear accelerators, and a Monte Carlo neutron transport
+  code.</para>
+  </formalpara>
+ 
+  <formalpara><title>Easy Implementation.</title>
+  <para>&pooma;'s tools greatly reduce the time to implement
+  applications.  As we noted above, &pooma;'s containers and
+  expression syntax model the computational models and algorithms
+  most frequently found in scientific programs.  Using these
+  high-level tools, which are known to be correct, reduces the time
+  needed to debug programs.  Programmers can write and test programs
+  using their one- or two-processor personal computers.  With no
+  additional work, the same program runs on computers with hundreds
+  of processors; the code is exactly the same, and the &toolkit;
+  automatically handles distribution of the data, all data
+  communication, and all synchronization.  Using all these tools
+  greatly reduces programming time.  For example, a team of two
+  physicists and two support people at Los Alamos National Laboratory
+  implemented a suite of hydrodynamics kernels in six months.  Their
+  work replaced the previous suite of less-powerful kernels which had
+  taken sixteen people several years to implement and debug.  Despite
+  not previously implementing any of the kernels, they averaged one
+  new kernel every three days, including the time to read the
+  corresponding scientific papers!</para>
+  </formalpara>
+ 
+  <section id="introduction-pooma_history">
+   <title>History of &pooma;</title>
+ 
+   <para>The &poomatoolkit; was developed at Los Alamos National
+   Laboratory to assist nuclear fusion and fission research.
+   In 1994, the &toolkit; grew out of the Object-Oriented
+   Particle Simulation (OOPS) class library developed for
+   particle-in-cell simulations.  The goals of the Framework, as it
+   was called at the time, were driven by the Numerical Tokamak's
+   <quote>Parallel Platform Paradox</quote>:
+   <blockquote>
+    <para>The average time required to implement a moderate-sized
+    application on a parallel computer architecture is equivalent to
+    the half-life of the latest parallel supercomputer.</para>
+   </blockquote>
+   The framework's goal of being able to quickly write efficient
+   scientific code that could be run on a wide variety of platforms
+   remains unchanged today.  Development, driven mainly by the
+   Advanced Computing Laboratory at Los Alamos, proceeded rapidly.
+   A matrix solver application was written using the framework.
+ <!-- FIXME: Add citation to pooma-sc95. -->
+   Support for hydrodynamics, Monte Carlo simulations, and molecular
+   dynamics modeling soon followed.</para>
+ 
+   <para>By 1998, &pooma; was part of the U.S. Department of
+   Energy's Accelerated Strategic Computing Initiative
+   (<acronym>ASCI</acronym>).  The Comprehensive Test Ban Treaty
+   forbade nuclear weapons testing, so tests were instead simulated.
+   <acronym>ASCI</acronym>'s goal was to radically advance the state
+   of the art in high-performance computing and numerical simulations
+   so the nuclear weapon simulations could use 100-teraflop
+   computers.  A linear accelerator code <application
+   class='software'>linac</application> and a Monte Carlo neutron
+   transport code <application class='software'>MC++</application>
+   were written.
+ <!-- FIXME: Add citation to pooma-siam98. -->
+   </para>
+ 
+   <para>&pooma; 2 involved a new conceptual framework and a
+   complete rewriting of the source code to improve performance.  The
+ <!-- FIXME: Add a citation to iscope98.pdf. -->
+   &array; class was introduced with its use of engines, separating
+   container use from container storage.  An asynchronous scheduler
+   permitted out-of-order execution to improve cache coherency.
+   Incorporating the <application class="software">Portable
+   Expression Template Engine</application> (<acronym>PETE</acronym>)
+   permitted faster loop execution.  Soon, container views and
+   <type>ConstantFunction</type> and <type>IndexFunction</type>
+   engines were added.  Release 2.1.0 included &field;s with
+   their spatial extent and &dynamicarray;s with the ability to
+   dynamically change their domain sizes.  Support for particles and
+   their interaction with &field;s was added.  The &pooma; messaging
+   implementation was revised in release 2.3.0.  Use of the
+   &cheetah; Library separated &pooma; from the actual messaging
+   library used.  Support for applications running on clusters of
+   computers was added.  During the past two years, the &field;
+   abstraction and implementation were improved to increase their
+   flexibility, add support for multiple values and materials in the
+   same cell, and permit lazy evaluation.  Simultaneously, the
+   execution speed of the inner loops was greatly increased.  The
+   particle code has not yet been ported to the new &field;
+   abstraction.</para>
+  </section>
+ 
+ </chapter>
Index: makefile
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/makefile,v
retrieving revision 1.3
diff -c -p -r1.3 makefile
*** makefile	2001/12/14 04:18:13	1.3
--- makefile	2001/12/17 16:56:51
*************** CXXFLAGS= -g -Wall -pedantic -W -Wstrict
*** 25,31 ****

  all: manual.ps

! manual.dvi: manual.xml concepts.xml tutorial.xml

  %.all:	%.ps %.pdf %.html
  	chmod 644 $*.ps $*.pdf
--- 25,31 ----

  all: manual.ps

! manual.dvi: manual.xml introduction.xml tutorial.xml concepts.xml glossary.xml

  %.all:	%.ps %.pdf %.html
  	chmod 644 $*.ps $*.pdf
Index: manual.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v
retrieving revision 1.3
diff -c -p -r1.3 manual.xml
*** manual.xml	2001/12/14 04:18:13	1.3
--- manual.xml	2001/12/17 16:56:54
***************
*** 30,35 ****
--- 30,37 ----
    <!-- Produce a notation for a double dash.  Without this, TeX produces an en-hyphen. -->
  <!ENTITY doof2d "<command>Doof2d</command>" >
    <!-- Produce a notation for the Doof2d program.  -->
+ <!ENTITY fortran "<application class='software'>Fortran</application>">
+   <!-- Produce a notation for the Fortran programming language.  -->
  <!ENTITY make "<application class='software'>Make</application>">
    <!-- Produce a notation for the GNU Make program.  -->
  <!ENTITY mm "<application class='software'>MM</application>">
***************
*** 42,48 ****
    <!-- Produce a notation for the PETE library.  -->
  <!ENTITY pooma "<application class='software'>POOMA</application>">
    <!-- Produce a notation for Pooma software.  -->
! <!ENTITY poomaToolkit "<application class='software'>POOMA Toolkit</application>">
    <!-- Produce a notation for the Pooma toolkit.  -->
  <!ENTITY purify "<application class='software'>Purify</application>">
    <!-- Produce a notation for the Purify library.  -->
--- 44,50 ----
    <!-- Produce a notation for the PETE library.  -->
  <!ENTITY pooma "<application class='software'>POOMA</application>">
    <!-- Produce a notation for Pooma software.  -->
! <!ENTITY poomatoolkit "<application class='software'>POOMA &toolkitcap;</application>">
    <!-- Produce a notation for the Pooma toolkit.  -->
  <!ENTITY purify "<application class='software'>Purify</application>">
    <!-- Produce a notation for the Purify library.  -->
***************
*** 53,58 ****
--- 55,64 ----
    <!-- Produce a notation for the C++ Standard Template Library software package.  -->
  <!ENTITY tau "<application class='software'>Tau</application>">
    <!-- Produce a notation for the Tau software package.  -->
+ <!ENTITY toolkit "toolkit">
+   <!-- Produce a notation for the name of the Pooma software.  -->
+ <!ENTITY toolkitcap "Toolkit">
+   <!-- Produce a capitalized version of &toolkit;.  -->

  <!-- Type Entity Declarations -->

***************
*** 74,79 ****
--- 80,88 ----
    <!-- The "DynamicArray" type. -->
  <!ENTITY engine "<type>Engine</type>">
    <!-- The "Engine" type. -->
+ <!ENTITY false "<statement>false</statement>">
+   <!-- The false Boolean value. -->
+   <!-- Modify its tag to the appropriate one. -->
  <!ENTITY field "<type>Field</type>">
    <!-- The "Field" type. -->
  <!ENTITY inform "<type>Inform</type>">
***************
*** 88,99 ****
--- 97,113 ----
    <!-- The Pooma matrix type. -->
  <!ENTITY multipatch "<type>MultiPatch</type>">
    <!-- The "MultiPatch" engine without template parameters. -->
+ <!ENTITY options "<type>Options</type>">
+   <!-- The &pooma; options type. -->
  <!ENTITY replicatedtag "<type>ReplicatedTag</type>">
    <!-- The ReplicatedTag Layout type. -->
  <!ENTITY stencil "<type>Stencil</type>">
    <!-- The "Stencil" type. -->
  <!ENTITY tensor "<type>Tensor</type>">
    <!-- The Pooma tensor type. -->
+ <!ENTITY true "<statement>true</statement>">
+   <!-- The true Boolean value. -->
+   <!-- Modify its tag to the appropriate one. -->
  <!ENTITY vector "<type>Vector</type>">
    <!-- The "Vector" type. -->

***************
*** 135,171 ****
    <!-- spelling: dependence, not dependency -->
    <!-- spelling: element-wise, not elementwise -->
    <!-- phrase: function object, not functor -->
    <!-- spelling: multidimensional, not multi-dimensional -->
    <!-- spelling: multiprocessor, not multi-processor -->
    <!-- spelling: nonzero, not non-zero -->

  <!-- External File Entities -->
! <!ENTITY doof2d-c-element SYSTEM "./programs/Doof2d-C-element-annotated.cpp">
    <!-- hand-coded Doof2d implementation -->
! <!ENTITY doof2d-array-element SYSTEM "./programs/Doof2d-Array-element-annotated.cpp">
    <!-- Array element-wise Doof2d implementation -->
! <!ENTITY doof2d-array-parallel SYSTEM "./programs/Doof2d-Array-parallel-annotated.cpp">
    <!-- Array data-parallel Doof2d implementation -->
! <!ENTITY doof2d-array-stencil SYSTEM "./programs/Doof2d-Array-stencil-annotated.cpp">
    <!-- Array stencil Doof2d implementation -->
! <!ENTITY doof2d-array-distributed SYSTEM "./programs/Doof2d-Array-distributed-annotated.cpp">
    <!-- distributed Array stencil Doof2d implementation -->
! <!ENTITY doof2d-field-parallel SYSTEM "./programs/Doof2d-Field-parallel-annotated.cpp">
    <!-- Field data-parallel Doof2d implementation -->
! <!ENTITY doof2d-field-distributed SYSTEM "./programs/Doof2d-Field-distributed-annotated.cpp">
    <!-- Field data-parallel distributed Doof2d implementation -->
! <!ENTITY concepts-chapter SYSTEM "concepts.xml">
!   <!-- Pooma concepts chapter -->
! <!ENTITY glossary-chapter SYSTEM "glossary.xml">
!   <!-- glossary -->
! <!ENTITY tutorial-chapter SYSTEM "tutorial.xml">
!   <!-- Doof2d tutorial programs chapter -->
  ]>

  <book>
   <bookinfo>
    <title>&pooma;</title>
!   <subtitle>A &cc; Toolkit for High-Performance Parallel Scientific Computing</subtitle>
    <author><firstname>Jeffrey</firstname><othername
    role='mi'>D.</othername><surname>Oldham</surname>
     <affiliation>
--- 149,195 ----
    <!-- spelling: dependence, not dependency -->
    <!-- spelling: element-wise, not elementwise -->
    <!-- phrase: function object, not functor -->
+   <!-- spelling: interprocessor, not inter-processor -->
    <!-- spelling: multidimensional, not multi-dimensional -->
    <!-- spelling: multiprocessor, not multi-processor -->
    <!-- spelling: nonzero, not non-zero -->

+ <!-- External Chapters -->
+ <!ENTITY concepts-chapter SYSTEM "concepts.xml">
+   <!-- Pooma concepts chapter -->
+ <!ENTITY glossary-chapter SYSTEM "glossary.xml">
+   <!-- glossary -->
+ <!ENTITY introductory-chapter SYSTEM "introduction.xml">
+   <!-- Pooma introductory chapter -->
+ <!ENTITY tutorial-chapter SYSTEM "tutorial.xml">
+   <!-- Doof2d tutorial programs chapter -->
+ 
  <!-- External File Entities -->
! <!-- Doof2d Programs -->
! <!ENTITY doof2d-c-element SYSTEM "./programs/examples/Doof2d/Doof2d-C-element-annotated.cpp">
    <!-- hand-coded Doof2d implementation -->
! <!ENTITY doof2d-array-element SYSTEM "./programs/examples/Doof2d/Doof2d-Array-element-annotated.cpp">
    <!-- Array element-wise Doof2d implementation -->
! <!ENTITY doof2d-array-parallel SYSTEM "./programs/examples/Doof2d/Doof2d-Array-parallel-annotated.cpp">
    <!-- Array data-parallel Doof2d implementation -->
! <!ENTITY doof2d-array-stencil SYSTEM "./programs/examples/Doof2d/Doof2d-Array-stencil-annotated.cpp">
    <!-- Array stencil Doof2d implementation -->
! <!ENTITY doof2d-array-distributed SYSTEM "./programs/examples/Doof2d/Doof2d-Array-distributed-annotated.cpp">
    <!-- distributed Array stencil Doof2d implementation -->
! <!ENTITY doof2d-field-parallel SYSTEM "./programs/examples/Doof2d/Doof2d-Field-parallel-annotated.cpp">
    <!-- Field data-parallel Doof2d implementation -->
! <!ENTITY doof2d-field-distributed SYSTEM "./programs/examples/Doof2d/Doof2d-Field-distributed-annotated.cpp">
    <!-- Field data-parallel distributed Doof2d implementation -->
! 
! <!-- Sequential Programs -->
! <!ENTITY initialize-finalize SYSTEM "./programs/examples/Sequential/initialize-finalize-annotated.cpp">
!   <!-- illustrate initialize() and finalize() -->
  ]>

  <book>
   <bookinfo>
    <title>&pooma;</title>
!   <subtitle>A &cc; &toolkitcap; for High-Performance Parallel Scientific Computing</subtitle>
    <author><firstname>Jeffrey</firstname><othername
    role='mi'>D.</othername><surname>Oldham</surname>
     <affiliation>
***************
*** 254,353 ****

   <part id="programming">
    <title>Programming with &pooma;</title>
- 
-   <chapter id="introduction">
-    <title>Introduction</title>
- 
-    <para>QUESTION: Add a partintro to the part above?</para>
- 
-    <para>&pooma; abbreviates <quote>Parallel Object-Oriented Methods
-    and Application</quote>.</para>

!    <para>This document is an introduction to &pooma; v2.1, a &cc;
!    toolkit for high-performance scientific computation.  &pooma;
!    runs efficiently on single-processor desktop machines,
!    shared-memory multiprocessors, and parallel supercomputers
!    containing dozens or hundreds of processors. What's more, by making
!    extensive use of the advanced features of the ANSI/ISO &cc;
!    standard—particularly templates—&pooma; presents a
!    compact, easy-to-read interface to its users.</para>
! 
!    <para>From Section  of
!    <filename>papers/iscope98.pdf</filename>:</para>
! 
!    <para>Scientific software developers have struggled with the need
!    to express mathematical abstractions in an elegant and maintainable
!    way without sacrificing performance.  The &pooma; (Parallel
!    Object-Oriented Methods and Applications) framework, written in
!    <acronym>ANSI</acronym>/<acronym>ISO</acronym> &cc;, has
!    demonstrated both high expressiveness and high performance for
!    large-scale scientific applications on platforms ranging from
!    workstations to massively parallel supercomputers.  &pooma; provides
!    high-level abstractions for multidimensional arrays, physical
!    meshes, mathematical fields, and sets of particles.  &pooma; also
!    exploits techniques such as expression templates to optimize serial
!    performance while encapsulating the details of parallel
!    communication and supporting block-based data compression.
!    Consequently, scientists can quickly assemble parallel simulation
!    codes by focusing directly on the physical abstractions relevant to
!    the system under study and not the technical difficulties of
!    parallel communication and machine-specific optimization.</para>
! 
!    <para>ADD: diagram of science and &pooma;.  See the diagram that
!    Mark and I wrote.</para>
! 
!    <para>Mention efficient evaluation of &pooma; expressions.  See
!    <filename>pooma-publications/iscope98.pdf</filename>,
!    Section 4.</para>
! 
!    <section id="introduction-pooma_evolution">
!     <title>Evolution of &pooma;</title>
! 
!     <para>QUESTION: Is this interesting?  Even if it is, it should be
!     short.</para>
! 
!     <para>The file <filename>papers/SCPaper-95.html</filename>
!     describes ?&pooma;1? and its abstraction layers.</para>
! 
!     <para>The "Introduction" of
!     <filename>papers/Siam0098.ps</filename> describes the DoE's
!     funding motivation for &pooma;: Accelerated Strategic Computing
!     Initiative (ASCI) and Science-based Stockpile Stewardship (SBSS),
!     pp. 1–2.</para>
! 
!     <para>See list of developers on p. 1 of
!     <filename>papers/pooma.ps</filename>.</para>
! 
!     <para>See list of developers on p. 1 of
!     <filename>papers/pooma.ps</filename>.  See history and motivation
!     on p. 3 of <filename>papers/pooma.ps</filename>.</para>

!     <para>Use <filename class="libraryfile">README</filename> for
!     information.</para>

-     <blockquote>
-      <attribution><filename
- 			    class="libraryfile">introduction.html</filename></attribution>
- 
-      <para>&pooma; was designed and implemented by scientists working
-      at the Los Alamos National Laboratory's Advanced Computing
-      Laboratory. Between them, these scientists have written and tuned
-      large applications on almost every commercial and experimental
-      supercomputer built in the last two decades. As the technology
-      used in those machines migrates down into departmental computing
-      servers and desktop multiprocessors, &pooma; is a vehicle for its
-      designers' experience to migrate as well. In particular,
-      &pooma;'s authors understand how to get good performance out of
-      modern architectures, with their many processors and multi-level
-      memory hierarchies, and how to handle the subtly complex problems
-      that arise in real-world applications.</para>
-     </blockquote>
- 
-    </section>
- 
-   </chapter>
- 
- 
    &tutorial-chapter;

    &concepts-chapter;
--- 278,288 ----

   <part id="programming">
    <title>Programming with &pooma;</title>

!

!   &introductory-chapter;

    &tutorial-chapter;

    &concepts-chapter;
***************
*** 356,361 ****
--- 291,305 ----
    <chapter id="sequential">
     <title>Writing Sequential Programs</title>

+    <para>FIXME: Explain the chapter's purpose.
+ HERE</para>
+ 
+    <para>FIXME: Explain the format of each section.
+ HERE</para>
+ 
+    <para>FIXME: Explain the order of the sections.
+ HERE</para>
+ 
     <para>Proposed order.  Basically follow the order in the proposed
     reference section.
      <orderedlist>
***************
*** 373,380 ****
      </orderedlist>
      Include views of containers in the appropriate sections.</para>

- <!-- HERE -->
- 
     <para><emphasis>&c;: A Reference Manual</emphasis> uses this
     structure for &c; libraries:
      <orderedlist>
--- 317,322 ----
***************
*** 524,555 ****
       </listitem>
      </orderedlist>
     </para>

!    <para>&pooma; can reorder computations to permit more efficient
!    computation.  When running a sequential program, reordering may
!    permit omission of unneeded computations.  For example, if only
!    values from a particular field are printed, only computations
!    involving the field and containers dependent on it need to occur.
!    When running a distributed program, reordering may permit
!    computation and communication among processors to overlap.  &pooma;
!    automatically tracks dependences between data-parallel expressions,
!    ensuring correct ordering.  It does not track statements accessing
!    particular &array; and &field; values so the programmer must
!    precede these statements with calls to
!    <function>Pooma::blockAndEvaluate()</function>.  Each call forces
!    the executable to wait until all computation has completed.  Thus,
!    the desired values are known to be available.  In practice, some
!    calls to <function>Pooma::blockAndEvaluate</function> may not be
!    necessary, but omitting them requires knowledge of &pooma;'s
!    dependence computations, so the &author; recommends calling
!    <function>Pooma::blockAndEvaluate</function> before each access to
!    a particular value in an &array; or &field;.  Omitting a necessary
!    call may lead to a race condition.  See <xref
!    linkend="debugging_profiling-missing_blockandevaluate"></xref> for
!    instructions how to diagnose and eliminate these race conditions.</para>

     <para>UNFINISHED</para>

     <section id="sequential-benchmarks">
      <title>&benchmark; Programs</title>

--- 466,663 ----
       </listitem>
      </orderedlist>
     </para>
+ 
+    <section id="sequential-begin_end">
+     <title>Beginning and Ending &pooma; Programs</title>
+ 
+     <para>Every &pooma; program must begin with a call to
+     <function>initialize</function> and end with a call to
+     <function>finalize</function>.  These functions respectively
+     prepare and shut down &pooma;'s run-time structures.</para>
+ 
+     <section id="sequential-begin_end-files">
+      <title>Files</title>
+ 
+      <programlisting>
+      #include "Pooma/Pooma.h"  // or "Pooma/Arrays.h" or "Pooma/Fields.h" or ...
+      </programlisting>
+     </section>
+ 
+     <section id="sequential-begin_end-declarations">
+       <title>Declarations</title>
+ 
+      <funcsynopsis>
+       <funcprototype>
+        <funcdef>bool <function>Pooma::initialize</function></funcdef>
+        <paramdef>
+         <parameter class="function">int &argc,</parameter>
+         <parameter class="function">char ** &argv,</parameter>
+         <parameter class="function">bool initRTS = true,</parameter>
+         <parameter class="function">bool getCLArgsArch = true,</parameter>
+         <parameter class="function">bool initArch = true</parameter>
+        </paramdef>
+       </funcprototype>
+ 
+       <funcprototype>
+        <funcdef>bool <function>Pooma::initialize</function></funcdef>
+        <paramdef>
+         <parameter class="function">Pooma::Options &opts,</parameter>
+         <parameter class="function">bool initRTS = true,</parameter>
+         <parameter class="function">bool initArch = true</parameter>
+        </paramdef>
+       </funcprototype>
+ 
+       <funcprototype>
+        <funcdef>bool <function>Pooma::finalize</function></funcdef>
+        <void></void>
+       </funcprototype>
+ 
+       <funcprototype>
+        <funcdef>bool <function>Pooma::finalize</function></funcdef>
+        <paramdef>
+         <parameter class="function">bool quitRTS,</parameter>
+         <parameter class="function">bool quitArch</parameter>
+        </paramdef>
+       </funcprototype>
+      </funcsynopsis>
+     </section>
+ 
+     <section id="sequential-begin_end-description">
+      <title>Description</title>
+ 
+      <para>Before its use, the &poomatoolkit; must be initialized by a
+      call to <function>initialize</function>.  This usually occurs in
+      the <function>main</function> function.  The first form removes
+      and processes any &pooma;-specific arguments from the
+      command-line arguments <varname>argv</varname> and
+      <varname>argc</varname>.  <xref
+      linkend="sequential-options"></xref> describes these options.
+      The third, fourth, and fifth arguments all have a default value
+      of &true;.  If <parameter class="function">initRTS</parameter> is
+      &true;, the run-time system is initialized, e.g., the contexts
+      are prepared for use.  If <parameter
+      class="function">getCLArgsArch</parameter> is &true;,
+      architecture-specific command-line arguments are removed from
+      <varname>argv</varname> and <varname>argc</varname>.
+      Architecture-specific initialization occurs if <parameter
+      class="function">initArch</parameter> is &true;.  An <link
+      linkend="glossary-architecture">architecture</link> is specified
+      by a hardware interface, e.g., processor type, but frequently is
+      also associated with an operating system or compiler.  For
+      example, Metrowerks for the Macintosh has an
+      architecture-specific initialization.  The function always
+      returns &true;.</para>
+ 
+      <para><function>initialize</function>'s alternative form
+      assumes the &pooma;-specific and architecture-specific
+      command-line arguments have already been removed from
+      <varname>argv</varname> and <varname>argc</varname> and stored in
+      <parameter class="function">opts</parameter>.  Its other two
+      parameters have the same meaning, and the two functions'
+      semantics are otherwise the same.</para>
+ 
+      <para>After its use, the &poomatoolkit; should be shut down using
+      a call to <function>finalize</function>.  This usually occurs in
+      the <function>main</function> function.  The former, and more
+      frequently used, form first prints any statistics and turns off
+      all default &pooma; streams.  Then it shuts down the run-time
+      system if it was previously initialized and then shuts down
+      architecture-specific objects if they were previously
+      initialized.  The latter form provides explicit control over
+      whether the run-time system (<parameter
+      class="function">quitRTS</parameter>) and architecture-specific
+      objects (<parameter class="function">quitArch</parameter>) are
+      shut down.  Both forms always return &true;.</para>
+ 
+      <para>Including almost any &pooma; header file, rather than just
+      <filename class="headerfile">Pooma/Pooma.h</filename>, suffices
+      since most other &pooma; header files include it.</para>
+     </section>
+ 
+     <section id="sequential-begin_end-example">
+      <title>Example Program</title>
+ 
+      <para>Since every &pooma; program must call
+      <function>initialize</function> and
+      <function>finalize</function>, the simplest &pooma; program also
+      must call them.  This program also illustrates their usual
+      use.</para>
+ 
+      &initialize-finalize;
+     </section>
+ 
+    </section><!-- end sequential-begin_end -->
+ 
+    <section id="sequential-options">
+     <title>&pooma; Command-line Options</title>
+ 
+     <para>Every &pooma; program accepts a set of &pooma;-specific
+     command-line options to set values at run-time.</para>
+ 
+     <section id="sequential-options-list">
+      <title>Options Summary</title>
+ 
+      <variablelist>
+       <varlistentry>
+        <term><parameter class="option">&dashdash;pooma-info</parameter></term>
+        <listitem>
+ 	<para>
+ HERE  Who uses this?</para>
+        </listitem>
+       </varlistentry>
+ <!-- HERE -->
+      </variablelist>

!      <para>FIXME: Be sure to list default values.</para>
! 
! <!-- HERE -->
! 
!     </section>
! 
! <!-- HERE -->
! 
!     <para>QUESTION: Should I defer documenting &options; to the
!     reference manual, instead just listing commonly used options in
!     the previous section?
! 
! UNFINISHED</para>
! 
!    </section><!-- end sequential-options -->
! 
!    <section>
!     <title>TMP: Place these somewhere.</title>

+     <para>&pooma; can reorder computations to permit more efficient
+     computation.  When running a sequential program, reordering may
+     permit omission of unneeded computations.  For example, if only
+     values from a particular field are printed, only computations
+     involving the field and containers dependent on it need to occur.
+     When running a distributed program, reordering may permit
+     computation and communication among processors to overlap.
+     &pooma; automatically tracks dependences between data-parallel
+     expressions, ensuring correct ordering.  It does not track
+     statements accessing particular &array; and &field; values so the
+     programmer must precede these statements with calls to
+     <function>Pooma::blockAndEvaluate()</function>.  Each call forces
+     the executable to wait until all computation has completed.  Thus,
+     the desired values are known to be available.  In practice, some
+     calls to <function>Pooma::blockAndEvaluate</function> may not be
+     necessary, but omitting them requires knowledge of &pooma;'s
+     dependence computations, so the &author; recommends calling
+     <function>Pooma::blockAndEvaluate</function> before each access to
+     a particular value in an &array; or &field;.  Omitting a necessary
+     call may lead to a race condition.  See <xref
+     linkend="debugging_profiling-missing_blockandevaluate"></xref> for
+     instructions on how to diagnose and eliminate these race
+     conditions.</para>
+ 
+     <para>QUESTION: Where should the various &pooma; streams be discussed?</para>
+ 
     <para>UNFINISHED</para>

+    </section>
+ 
+ 
     <section id="sequential-benchmarks">
      <title>&benchmark; Programs</title>

***************
*** 561,573 ****
     </section>

-    <section id="sequential-inform">
-     <title>Using <type>Inform</type>s for Output</title>
- 
-     <para>UNFINISHED</para>
-    </section>
- 
- 
     <section>
      <title>Miscellaneous</title>

--- 669,674 ----
***************
*** 604,610 ****
      &pooma; II's expression trees and expression engines.</para>

      <para>COMMENT: <filename
! 			     class="libraryfile">background.html</filename> has some related
      &pete; material.</para>
     </section>

--- 705,711 ----
      &pooma; II's expression trees and expression engines.</para>

      <para>COMMENT: <filename
!     class="libraryfile">background.html</filename> has some related
      &pete; material.</para>
     </section>

***************
*** 652,659 ****
        in the input domain: A(i1, i2, ..., iN).</para>

       <para>The &pooma; multi-dimensional Array concept is similar to
!       the Fortran 90 array facility, but extends it in several
!       ways. Both &pooma; and Fortran arrays can have up to seven
        dimensions, and can serve as containers for arbitrary
        types. Both support the notion of views of a portion of the
        array, known as array sections in F90. The &pooma; Array concept
--- 753,760 ----
        in the input domain: A(i1, i2, ..., iN).</para>

       <para>The &pooma; multi-dimensional Array concept is similar to
!       the &fortran; 90 array facility, but extends it in several
!       ways. Both &pooma; and &fortran; arrays can have up to seven
        dimensions, and can serve as containers for arbitrary
        types. Both support the notion of views of a portion of the
        array, known as array sections in F90. The &pooma; Array concept
***************
*** 664,673 ****
        depending on the particular type of the Array being
        indexed.</para>

!      <para>Fortran arrays are dense and the elements are arranged
  		   according to column-major conventions. Therefore, X(i1,i2)
        refers to element number i1-1+(i2-1)*numberRowsInA. However, as
!       Fig. 1 shows, Fortran-style "Brick" storage is not the only
        storage format of interest to scientific programmers. For
        compatibility with C conventions, one might want to use an array
        featuring dense, row-major storage (a C-style Brick). To save
--- 765,774 ----
        depending on the particular type of the Array being
        indexed.</para>

!      <para>&fortran; arrays are dense and the elements are arranged
  		   according to column-major conventions. Therefore, X(i1,i2)
        refers to element number i1-1+(i2-1)*numberRowsInA. However, as
!       Fig. 1 shows, &fortran;-style "Brick" storage is not the only
        storage format of interest to scientific programmers. For
        compatibility with C conventions, one might want to use an array
        featuring dense, row-major storage (a C-style Brick). To save
***************
*** 691,697 ****
       themselves in the template parameters for the &pooma; Array
       class. The template
       <programlisting>
! 		      template <int Dim, class T = double, class EngineTag = Brick>
       class Array;
       </programlisting>
       is a specification for creating a set of classes all named
--- 792,798 ----
       themselves in the template parameters for the &pooma; Array
       class. The template
       <programlisting>
!      template <int Dim, class T = double, class EngineTag = Brick>
       class Array;
       </programlisting>
       is a specification for creating a set of classes all named
***************
*** 771,777 ****
      general Engine template whose template parameters are identical to
      those of Array. Next, the Array template determines the type of
      scalar arguments (indices) to be used in operator(), the function
!     that implements &pooma;'s Fortran-style indexing syntax X(i1,i2):
      <programlisting>
      typedef typename Engine_t::Index_t Index_t;
      </programlisting>
--- 872,878 ----
      general Engine template whose template parameters are identical to
      those of Array. Next, the Array template determines the type of
      scalar arguments (indices) to be used in operator(), the function
!     that implements &pooma;'s &fortran;-style indexing syntax X(i1,i2):
      <programlisting>
      typedef typename Engine_t::Index_t Index_t;
      </programlisting>
***************
*** 816,822 ****
      framework.</para>

      <para>Figure 3 illustrates the "Brick" specialization of the
!     Engine template, which implements Fortran-style lookup into a
      block of memory. First, there is the general Engine template,
      which is empty as there is no default behavior for an unknown
      EngineTag. The general template is therefore not a model for the
--- 917,923 ----
      framework.</para>

      <para>Figure 3 illustrates the "Brick" specialization of the
!     Engine template, which implements &fortran;-style lookup into a
      block of memory. First, there is the general Engine template,
      which is empty as there is no default behavior for an unknown
      EngineTag. The general template is therefore not a model for the
***************
*** 826,832 ****
      specialization of the Engine template. Finally, there is the
      partial specialization of the Engine template. Examining its body,
      we see the required Index_t typedef and the required operator(),
!     which follows the Fortran prescription for generating an offset
      into the data block based on the row, column, and the number of
      rows. All of the requirements are met, so the Brick-Engine class
      is a model of the Engine concept.</para>
--- 927,933 ----
      specialization of the Engine template. Finally, there is the
      partial specialization of the Engine template. Examining its body,
      we see the required Index_t typedef and the required operator(),
!     which follows the &fortran; prescription for generating an offset
      into the data block based on the row, column, and the number of
      rows. All of the requirements are met, so the Brick-Engine class
      is a model of the Engine concept.</para>
***************
*** 1899,1904 ****
--- 2000,2023 ----
      <title>TMP: What do we do with these …? Remove this
      section.</title>

+     <blockquote>
+      <attribution><filename
+       class="libraryfile">introduction.html</filename></attribution>
+ 
+      <para>&pooma; was designed and implemented by scientists working
+      at the Los Alamos National Laboratory's Advanced Computing
+      Laboratory. Between them, these scientists have written and tuned
+      large applications on almost every commercial and experimental
+      supercomputer built in the last two decades. As the technology
+      used in those machines migrates down into departmental computing
+      servers and desktop multiprocessors, &pooma; is a vehicle for its
+      designers' experience to migrate as well. In particular,
+      &pooma;'s authors understand how to get good performance out of
+      modern architectures, with their many processors and multi-level
+      memory hierarchies, and how to handle the subtly complex problems
+      that arise in real-world applications.</para>
+     </blockquote>
+ 
      <para>QUESTION: Do we describe the &leaffunctor;s specialized for
      &array;s in <filename
      class="headerfile">src/Array/Array.h</filename> or in the &pete;
***************
*** 2879,2898 ****
    </chapter>

!   <chapter id="where-place-these_ref">
     <title>TMP: Where do we describe these files?</title>

     <itemizedlist>
      <listitem>
       <para><filename
! 		     class="headerfile">src/Utilities/Conform.h</filename>: tag for
       checking whether terms in expression have conforming
       domains</para>
      </listitem>

      <listitem>
       <para><filename
! 		     class="headerfile">src/Utilities/DerefIterator.h</filename>:
       <type>DerefIterator<T></type> and
       <type>ConstDerefIterator<T></type> automatically
       dereference themselves to maintain <literal>const</literal>
--- 2998,3017 ----
    </chapter>

!   <chapter id="where_place_these_ref">
     <title>TMP: Where do we describe these files?</title>

     <itemizedlist>
      <listitem>
       <para><filename
!      class="headerfile">src/Utilities/Conform.h</filename>: tag for
       checking whether terms in expression have conforming
       domains</para>
      </listitem>

      <listitem>
       <para><filename
!      class="headerfile">src/Utilities/DerefIterator.h</filename>:
       <type>DerefIterator<T></type> and
       <type>ConstDerefIterator<T></type> automatically
       dereference themselves to maintain <literal>const</literal>
***************
*** 2901,2910 ****

      <listitem>
       <para><filename
! 		     class="headerfile">src/Utilities/Observable.h</filename>,
       <filename class="headerfile">src/Utilities/Observer.h</filename>,
       and <filename
! 		   class="headerfile">src/Utilities/ObserverEvent.h</filename>:
       <type>Observable<T></type>,
       <type>SingleObserveable<T></type>,
       <type>Observer<T></type>, and <type>ObserverEvent</type>
--- 3020,3029 ----

      <listitem>
       <para><filename
!      class="headerfile">src/Utilities/Observable.h</filename>,
       <filename class="headerfile">src/Utilities/Observer.h</filename>,
       and <filename
!      class="headerfile">src/Utilities/ObserverEvent.h</filename>:
       <type>Observable<T></type>,
       <type>SingleObserveable<T></type>,
       <type>Observer<T></type>, and <type>ObserverEvent</type>
***************
*** 2915,2920 ****
--- 3034,3053 ----

    </chapter>

+ 
+   <chapter id="needed_reference_items_ref">
+    <title>TMP: Items to Discuss in Reference Manual</title>
+ 
+    <itemizedlist>
+     <listitem>
+      <para>Discuss &options; and related material.  Add developer
+      command-line options listed in <filename
+      class="library">Utilities/Options.cmpl.cpp</filename> and also
+      possibly <parameter class="option">&dashdash;pooma-threads
+      <replaceable>n</replaceable></parameter>.</para>
+     </listitem>
+    </itemizedlist>
+   </chapter>
   </part>

***************
*** 2946,2952 ****

      <para>Section 3, "Sample Applications" of
      <filename>papers/SiamOO98_paper.ps</filename> describes porting a
!     particle program written using High-Performance Fortran to
      &pooma; and presumably why particles were added to &pooma;.  It
      also describes <application>MC++</application>, a Monte Carlo
      neutron transport code.</para>
--- 3079,3085 ----

      <para>Section 3, "Sample Applications" of
      <filename>papers/SiamOO98_paper.ps</filename> describes porting a
!     particle program written using High-Performance &fortran; to
      &pooma; and presumably why particles were added to &pooma;.  It
      also describes <application>MC++</application>, a Monte Carlo
      neutron transport code.</para>
***************
*** 3332,3338 ****

      <listitem>
       <para>QUESTION: How do &pooma; parallel concepts compare with
!      Fortran D or high-performance Fortran FINISH CITE:
       {koelbel94:_high_perfor_fortr_handb}?</para>
      </listitem>

--- 3465,3471 ----

      <listitem>
       <para>QUESTION: How do &pooma; parallel concepts compare with
!      &fortran; D or high-performance &fortran; FINISH CITE:
       {koelbel94:_high_perfor_fortr_handb}?</para>
      </listitem>

***************
*** 3500,3505 ****
--- 3633,3856 ----
     <title>Using MPI</title>
     <subtitle>Portable Parallel Programming with the Message-Passing Interface</subtitle>
     <edition>second edition</edition>
+   </biblioentry>
+ 
+   <biblioentry>
+    <abbrev>pooma95</abbrev>
+    <authorgroup>
+     <author>
+      <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
+      <affiliation>
+       <orgname>Dakota Software Systems, Inc.</orgname>
+       <address><city>Rapid City</city><state>SD</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Susan</firstname><othername role="mi">R.</othername><surname>Atlas</surname>
+      <affiliation>
+       <orgname>Parallel Solutions, Inc.</orgname>
+       <address><city>Santa Fe</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Subhankar</firstname><surname>Banerjee</surname>
+      <affiliation>
+       <orgname>New Mexico State University</orgname>
+       <address><city>Las Cruces</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
+      <affiliation>
+       <orgname>University of Illinois at Urbana-Champaign</orgname>
+       <address><city>Urbana-Champaign</city><state>IL</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
+      <affiliation>
+       <orgname>California Institute of Technology</orgname>
+       <address><city>Pasadena</city><state>CA</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Katarzyna</firstname><surname>Keahey</surname>
+      <affiliation>
+       <orgname>Indiana University</orgname>
+       <address><city>Bloomington</city><state>IN</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Marydell</firstname><surname>Tholburn</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+    </authorgroup>
+    <title>&pooma;</title>
+    <subtitle>A Framework for Scientific Simulation on Parallel Architectures</subtitle>
+    <releaseinfo>unpublished</releaseinfo>
+   </biblioentry>
+ 
+   <biblioentry>
+    <abbrev>pooma-sc95</abbrev>
+    <authorgroup>
+     <author>
+      <firstname>Susan</firstname><surname>Atlas</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Subhankar</firstname><surname>Banerjee</surname>
+      <affiliation>
+       <orgname>New Mexico State University</orgname>
+       <address><city>Las Cruces</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
+      <affiliation>
+       <orgname>Advanced Computing Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>M.</firstname><surname>Srikant</surname>
+      <affiliation>
+       <orgname>New Mexico State University</orgname>
+       <address><city>Las Cruces</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Marydell</firstname><surname>Tholburn</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+    </authorgroup>
+    <title>&pooma;</title>
+    <subtitle>A High Performance Distributed Simulation Environment for
+    Scientific Applications</subtitle>
+ <!-- FIXME: Where list Supercomputing 1995? -->
+   </biblioentry>
+ 
+   <biblioentry>
+    <abbrev>pooma-siam98</abbrev>
+    <authorgroup>
+     <author>
+      <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>James</firstname><othername role="mi">A.</othername><surname>Crotinger</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Scott</firstname><othername role="mi">W.</othername><surname>Haney</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Stephen</firstname><othername role="mi">A.</othername><surname>Smith</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+     <author>
+      <firstname>Timothy</firstname><othername role="mi">J.</othername><surname>Williams</surname>
+      <affiliation>
+       <orgname>Los Alamos National Laboratory</orgname>
+       <address><city>Los Alamos</city><state>NM</state></address>
+      </affiliation>
+     </author>
+    </authorgroup>
+    <title>Rapid Application Development and Enhanced Code
+    Interoperability using the &pooma; Framework</title>
+ <!-- FIXME: Where list SIAM Workshop ... 1998? -->
+   </biblioentry>
+ 
+   <biblioentry>
+ <!-- FIXME: Change the year when we learn it. -->
+    <abbrev>pete-99</abbrev>
+    <authorgroup>
+     <author>
+      <firstname>Scott</firstname><surname>Haney</surname>
+     </author>
+     <author>
+      <firstname>James</firstname><surname>Crotinger</surname>
+     </author>
+     <author>
+      <firstname>Steve</firstname><surname>Karmesin</surname>
+     </author>
+     <author>
+      <firstname>Stephen</firstname><surname>Smith</surname>
+     </author>
+    </authorgroup>
+    <title>Easy Expression Templates Using &pete;: The Portable
+    Expression Template Engine</title>
+ <!-- FIXME: When and where was this published? -->
    </biblioentry>
   </bibliography>

Index: tutorial.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v
retrieving revision 1.2
diff -c -p -r1.2 tutorial.xml
*** tutorial.xml	2001/12/14 04:18:13	1.2
--- tutorial.xml	2001/12/17 16:56:55
***************
*** 36,42 ****
     </listitem>
     <listitem>
      <para>a data-parallel &pooma; &field; implementation for
!     multi-processor execution.</para>
     </listitem>
    </itemizedlist>
   </para>
--- 36,42 ----
     </listitem>
     <listitem>
      <para>a data-parallel &pooma; &field; implementation for
!     multiprocessor execution.</para>
     </listitem>
    </itemizedlist>
   </para>
***************
*** 94,100 ****
   zero.</para>

   <para>Before presenting various implementations of %doof2d;, we
!  explain how to install the &poomaToolkit;.</para>

   <para>REMOVE: &doof2d; algorithm and code is illustrated in
   Section 4.1 of
--- 94,100 ----
   zero.</para>

   <para>Before presenting various implementations of %doof2d;, we
!  explain how to install the &poomatoolkit;.</para>

   <para>REMOVE: &doof2d; algorithm and code is illustrated in
   Section 4.1 of
***************
*** 111,117 ****
    <quote>LINUXgcc.conf</quote> is not available.</para>

    <para>In this section, we describe how to obtain, build, and
!   install the &poomaToolkit;.  We focus on installing under the
    Unix operating system.  Instructions for installing on computers
    running Microsoft Windows or MacOS, as well as more extensive
    instructions for Unix, appear in <xref
--- 111,117 ----
    <quote>LINUXgcc.conf</quote> is not available.</para>

    <para>In this section, we describe how to obtain, build, and
!   install the &poomatoolkit;.  We focus on installing under the
    Unix operating system.  Instructions for installing on computers
    running Microsoft Windows or MacOS, as well as more extensive
    instructions for Unix, appear in <xref
***************
*** 142,148 ****
    <command>&dashdash;arch</command> option is the name of the corresponding
    configuration file, omitting its <filename
    class="libraryfile">.conf</filename> suffix.  The
!   <command>&dashdash;opt</command> indicates the &poomaToolkit; will
    contain optimized source code, which makes the code run more
    quickly but may impede debugging.  Alternatively, the
    <command>&dashdash;debug</command> option supports debugging.  The
--- 142,148 ----
    <command>&dashdash;arch</command> option is the name of the corresponding
    configuration file, omitting its <filename
    class="libraryfile">.conf</filename> suffix.  The
!   <command>&dashdash;opt</command> indicates the &poomatoolkit; will
    contain optimized source code, which makes the code run more
    quickly but may impede debugging.  Alternatively, the
    <command>&dashdash;debug</command> option supports debugging.  The
***************
*** 178,184 ****
   <section id="tutorial-hand_coded">
    <title>Hand-Coded Implementation</title>

!   <para>Before implementing &doof2d; using the &poomaToolkit;, we
    present a hand-coded implementation of &doof2d;.  See <xref
    linkend="tutorial-hand_coded-doof2d"></xref>.  After querying the
    user for the number of averagings, the arrays' memory is
--- 178,184 ----
   <section id="tutorial-hand_coded">
    <title>Hand-Coded Implementation</title>

!   <para>Before implementing &doof2d; using the &poomatoolkit;, we
    present a hand-coded implementation of &doof2d;.  See <xref
    linkend="tutorial-hand_coded-doof2d"></xref>.  After querying the
    user for the number of averagings, the arrays' memory is
***************
*** 290,296 ****
   <section id="tutorial-array_elementwise">
    <title>Element-wise &array; Implementation</title>

!   <para>The simplest way to use the &poomaToolkit; is to
    use the &pooma; &array; class instead of &c; arrays.  &array;s
    automatically handle memory allocation and deallocation, support a
    wider variety of assignments, and can be used in expressions.
--- 290,296 ----
   <section id="tutorial-array_elementwise">
    <title>Element-wise &array; Implementation</title>

!   <para>The simplest way to use the &poomatoolkit; is to
    use the &pooma; &array; class instead of &c; arrays.  &array;s
    automatically handle memory allocation and deallocation, support a
    wider variety of assignments, and can be used in expressions.
***************
*** 309,315 ****
       class="headerfile">Pooma/Arrays.h</filename> must be included.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-pooma_initialize">
!      <para>The &poomaToolkit; structures must be constructed before
       their use.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-domain">
--- 309,315 ----
       class="headerfile">Pooma/Arrays.h</filename> must be included.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-pooma_initialize">
!      <para>The &poomatoolkit; structures must be constructed before
       their use.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-domain">
***************
*** 347,359 ****
       memory leaks.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-pooma_finish">
!      <para>The &poomaToolkit; structures must be destructed after
       their use.</para>
      </callout>
     </calloutlist>
    </example>

!   <para>We describe the use of &array; and the &poomaToolkit; in
    <xref linkend="tutorial-array_elementwise-doof2d"></xref>.
    &array;s, declared in the <filename
    class="headerfile">Pooma/Arrays.h</filename>, are first-class
--- 347,359 ----
       memory leaks.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-pooma_finish">
!      <para>The &poomatoolkit; structures must be destructed after
       their use.</para>
      </callout>
     </calloutlist>
    </example>

!   <para>We describe the use of &array; and the &poomatoolkit; in
    <xref linkend="tutorial-array_elementwise-doof2d"></xref>.
    &array;s, declared in the <filename
    class="headerfile">Pooma/Arrays.h</filename>, are first-class
***************
*** 391,398 ****
    elements.  This is possible because the array knows the extent of
    its domain.</para>

!   <para>Any program using the &poomaToolkit; must initialize the
!   toolkit's data structures using
    <statement>Pooma::initialize(argc,argv)</statement>.  This
    extracts &pooma;-specific command-line options from the
    command-line arguments in <varname>argv</varname> and initializes
--- 391,398 ----
    elements.  This is possible because the array knows the extent of
    its domain.</para>

!   <para>Any program using the &poomatoolkit; must initialize the
!   &toolkit;'s data structures using
    <statement>Pooma::initialize(argc,argv)</statement>.  This
    extracts &pooma;-specific command-line options from the
    command-line arguments in <varname>argv</varname> and initializes
***************
*** 408,414 ****

    <para>&pooma; supports data-parallel &array; accesses.  Many
    algorithms are more easily expressed using data-parallel
!   expressions.  Also, the &poomaToolkit; might be able to reorder
    the data-parallel computations to be more efficient or distribute
    them among various processors.  In this section, we concentrate
    the differences between the data-parallel implementation of
--- 408,414 ----

    <para>&pooma; supports data-parallel &array; accesses.  Many
    algorithms are more easily expressed using data-parallel
!   expressions.  Also, the &poomatoolkit; might be able to reorder
    the data-parallel computations to be more efficient or distribute
    them among various processors.  In this section, we concentrate
    the differences between the data-parallel implementation of
***************
*** 618,624 ****
    indicates a particular dimension.  Index parameters
    <varname>i</varname> and <varname>j</varname> are in dimension 0
    and 1.  <methodname>upperExtent</methodname> serves an
!   analogous purpose.  The &poomaToolkit; uses these functions when
    distribution computation among various processors, but it does not
    use these functions to ensure nonexistent &array; values are not
    accessed.  Caveat stencil user!</para>
--- 618,624 ----
    indicates a particular dimension.  Index parameters
    <varname>i</varname> and <varname>j</varname> are in dimension 0
    and 1.  <methodname>upperExtent</methodname> serves an
!   analogous purpose.  The &poomatoolkit; uses these functions when
    distribution computation among various processors, but it does not
    use these functions to ensure nonexistent &array; values are not
    accessed.  Caveat stencil user!</para>
***************
*** 632,638 ****
    To convert a program designed for uniprocessor execution to a
    program designed for multiprocessor execution, the programmer need
    only specify how each container's domain should be split into
!   <quote>patches</quote>.  The &poomaToolkit; automatically
    distributes the data among the available processors and handles
    any required communication among processors.</para>

--- 632,638 ----
    To convert a program designed for uniprocessor execution to a
    program designed for multiprocessor execution, the programmer need
    only specify how each container's domain should be split into
!   <quote>patches</quote>.  The &poomatoolkit; automatically
    distributes the data among the available processors and handles
    any required communication among processors.</para>

***************
*** 746,761 ****
    configured for the particular run-time system.  See <xref
    linkend="installation-distributed_computing"></xref>.</para>

!   <para>A <firstterm>layout</firstterm> combines patches with
!   contexts so the program can be executed.  If &distributedtag; is
!   specified, the patches are distributed among the available
!   contexts.  If &replicatedtag; is specified, each set of patches is
!   replicated among each context.  Regardless, the containers'
!   domains are now distributed among the contexts so the program can
!   run.  When a patch needs data from another patch, the &pooma;
!   toolkit sends messages to the desired patch uses a message-passing
!   library.  All such communication is automatically performed by the
!   toolkit with no need for programmer or user input.</para>

    <para>FIXME: The two previous paragraphs demonstrate confusion
    between <quote>run-time system</quote> and <quote>message-passing
--- 746,761 ----
    configured for the particular run-time system.  See <xref
    linkend="installation-distributed_computing"></xref>.</para>

!   <para>A <firstterm>layout</firstterm> combines patches with contexts
!   so the program can be executed.  If &distributedtag; is specified,
!   the patches are distributed among the available contexts.  If
!   &replicatedtag; is specified, each set of patches is replicated
!   among each context.  Regardless, the containers' domains are now
!   distributed among the contexts so the program can run.  When a patch
!   needs data from another patch, the &poomatoolkit; sends messages to
!   the desired patch using a message-passing library.  All such
!   communication is automatically performed by the &toolkit; with no need
!   for programmer or user input.</para>

    <para>FIXME: The two previous paragraphs demonstrate confusion
    between <quote>run-time system</quote> and <quote>message-passing
***************
*** 803,811 ****
    ></type> or <type>MultiPatch<UniformTag,
    Remote<CompressibleBrick> ></type> engines.</para>

!   <para>The computations for a distributed implementation are
!   exactly the same as for a sequential implementation.  The &pooma;
!   Toolkit and a message-passing library automatically perform all
    computation.</para>

    <para>Input and output for distributed programs is different than
--- 803,811 ----
    ></type> or <type>MultiPatch<UniformTag,
    Remote<CompressibleBrick> ></type> engines.</para>

!   <para>The computations for a distributed implementation are exactly
!   the same as for a sequential implementation.  The &poomatoolkit; and
!   a message-passing library automatically perform all
    computation.</para>

    <para>Input and output for distributed programs is different than
***************
*** 988,994 ****
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-mesh">
       <para>The mesh and centering declarations are the same for
!      uniprocessor and multi-processor implementations.</para>
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-remote">
       <para>The <type>MultiPatch</type> &engine; distributes requests
--- 988,994 ----
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-mesh">
       <para>The mesh and centering declarations are the same for
!      uniprocessor and multiprocessor implementations.</para>
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-remote">
       <para>The <type>MultiPatch</type> &engine; distributes requests
***************
*** 1032,1038 ****
      </listitem>
      <listitem>
       <para>The computation for uniprocessor or distributed
!      implementations remains the same.  The &pooma; toolkit
       automatically handles all communication necessary to ensure
       up-to-date values are available when needed.</para>
      </listitem>
--- 1032,1038 ----
      </listitem>
      <listitem>
       <para>The computation for uniprocessor or distributed
!      implementations remains the same.  The &poomatoolkit;
       automatically handles all communication necessary to ensure
       up-to-date values are available when needed.</para>
      </listitem>
Index: figures/introduction.mp
===================================================================
RCS file: introduction.mp
diff -N introduction.mp
*** /dev/null	Fri Mar 23 21:37:44 2001
--- introduction.mp	Mon Dec 17 09:56:55 2001
***************
*** 0 ****
--- 1,194 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec14
+ %% Pooma
+ 
+ %% Illustrations for Introduction
+ 
+ %% Assumes TEX=latex.
+ 
+ input boxes;
+ 
+ verbatimtex
+ \documentclass[10pt]{article}
+ \usepackage{amsmath}
+ \input{macros.ltx}
+ \begin{document}
+ etex
+ 
+ %% Relationship between science, computational science, and Pooma.
+ beginfig(101)
+   numeric unit; unit = 0.8cm;
+   numeric horizSpace; horizSpace = 8unit;
+   numeric vertSpace; vertSpace = unit;
+   numeric nuBoxes;		% number of boxes
+ 
+   % Ensure a list of boxes all have the same width.
+   % input <- suffixes for the boxes;
+   % output-> all boxes have the same width (maximum picture width + 2defaultdx)
+   vardef samewidth(suffix $)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight($)(t);
+     numericSetWidth(xpart(p_)+2defaultdx)($)(t);
+   enddef;
+   
+   % Ensure a list of boxes all have the same height.
+   % input <- suffixes for the boxes;
+   % output-> all boxes have the same height (maximum picture height + 2defaultdy)
+   vardef sameheight(suffix $)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight($)(t);
+     numericSetHeight(ypart(p_)+2defaultdy)($)(t);
+   enddef;
+   
+   % Given a list of boxes, determine the maximum picture width and
+   % maximum picture height.
+   % input <- suffixes for the boxes
+   % output-> pair of maximum picture width and height
+   vardef maxWidthAndHeight(suffix f)(text t) =
+     save w_, h_; numeric w_, h_;
+     w_ = xpart((urcorner pic_.f - llcorner pic_.f));
+     h_ = ypart((urcorner pic_.f - llcorner pic_.f));
+     forsuffixes uu = t:
+       if xpart((urcorner pic_.uu - llcorner pic_.uu)) > w_ :
+ 	w_ := xpart((urcorner pic_.uu - llcorner pic_.uu));
+       fi
+       if ypart((urcorner pic_.uu - llcorner pic_.uu)) > h_ :
+ 	h_ := ypart((urcorner pic_.uu - llcorner pic_.uu));
+       fi
+     endfor
+     (w_, h_)
+   enddef;
+ 
+   % Given a width, ensure a box has the given width.
+   % input <- box width
+   %          suffix for the one box
+   % output-> the box has the given width by setting its .dx
+   vardef numericSetWidthOne(expr width)(suffix f) =
+     f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
+   enddef;
+   
+   % Given a width, ensure all boxes have the given width.
+   % input <- box width
+   %          suffixes for the boxes
+   % output-> all boxes have the given width by setting their .dx
+   vardef numericSetWidth(expr width)(suffix f)(text t) =
+     f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
+     forsuffixes $ = t:
+       $.dx = 0.5(width - xpart(urcorner pic_.$ - llcorner pic_.$));
+     endfor
+   enddef;
+ 
+   % Given a height, ensure all boxes have the given height.
+   % input <- box height
+   %          suffixes for the boxes
+   % output-> all boxes have the given height by setting their .dy
+   vardef numericSetHeight(expr height)(suffix f)(text t) =
+     f.dy = 0.5(height - ypart(urcorner pic_.f - llcorner pic_.f));
+     forsuffixes $ = t:
+       $.dy = 0.5(height - ypart(urcorner pic_.$ - llcorner pic_.$));
+     endfor
+   enddef;
+   
+   % Ensure a list of boxes and circles all have the same width, height,
+   % and diameter.
+   % input <- suffixes for the boxes and circles
+   % output-> all boxes have .dx and .dy set so they have the same width,
+   %           height, and radius
+   % The boxes are squares and the circles are circular, not oval.
+   vardef sameWidthAndHeight(suffix f)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight(f)(t);
+     if (xpart(p_)+2defaultdx >= ypart(p_)+2defaultdy):
+       numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
+       numericSetHeight(xpart(p_)+2defaultdx)(f)(t);
+     else:
+       numericSetWidth(ypart(p_)+2defaultdy)(f)(t);
+       numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
+     fi
+   enddef;
+ 
+   % Ensure a list of boxes and circles all have the same width and
+   % the same height.  Unlike sameWidthAndHeight, the width and height
+   % can differ.
+   % input <- suffixes for the boxes and circles
+   % output-> all boxes have .dx and .dy set so they share one width
+   %           and one height
+   % The boxes can be rectangles and the circles can be ovals.
+   vardef sameWidthSameHeight(suffix f)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight(f)(t);
+     numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
+     numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
+   enddef;
+ 
+   % Create the boxes.
+   boxit.b0(btex \textsl{science / math} etex);
+   boxit.b1(btex \textsl{algorithms} etex);
+   boxit.b2(btex \textsl{engineering} etex);
+   boxit.b3(btex \strut $\real^{\dimension} \maps \text{values}$ etex);
+   boxit.b4(btex \strut $\text{discrete space} \maps \text{values}$ etex);
+   boxit.b5(btex \strut $(\text{layout}, \text{engine}) \maps \text{values}$ etex);
+   boxit.b6(btex \strut linear algebra etex);
+   boxit.b7(btex \strut $\naturalNus^{\dimension} \maps \text{values}$ etex);
+   boxit.b8(btex etex);
+   nuBoxes = 8;
+   boxit.b9(btex \textsl{implementation} etex);
+   sameWidthSameHeight(b3,b4,b5,b6,b7,b8);
+   for t = 0 upto nuBoxes+1:
+     fixsize(b[t]);
+   endfor
+   
+   % Position the boxes.
+   b0.c = origin;
+   for t = 0 step 3 until nuBoxes:
+     b[t+2].c - b[t+1].c = b[t+1].c - b[t].c = (horizSpace, 0);
+   endfor
+   for t = 0 step 3 until nuBoxes-3:
+     b[t].s - b[t+3].n = (0, vertSpace);
+   endfor
+   b9.c = 0.5[b1.c,b2.c];
+   
+   % Draw the boxes.
+   for t = 0 upto nuBoxes+1:
+     if unknown(b[t].c):
+       show t;
+       show b[t].c;
+     fi
+   endfor
+ 
+   for t = 0 upto 2:
+     drawunboxed(b[t]);
+   endfor
+   for t = 3 upto nuBoxes-1:
+     drawboxed(b[t]);
+   endfor
+   drawunboxed(b9);
+   
+   % Label the boxes.
+   label.top(btex continuous field etex, b3.n);
+   label.top(btex discrete field etex, b4.n);
+   label.top(btex \pooma\ container etex, b5.n);
+   label.top(btex mathematical array etex, b7.n);
+ %  label.top(btex custom implementation etex, b8.n);
+ 
+   % Draw the arrows.
+   vardef drawAndLabelArrow(expr start, stop, txt, parr) =
+     path p; p = start -- stop;
+     drawarrow p;
+     label.top(txt rotated angle (direction parr of p), point parr of p);
+   enddef;
+   vardef drawAndLabelArrowDashed(expr start, stop, txt, parr) =
+     path p; p = start -- stop;
+     drawarrow p dashed evenly;
+     label.top(txt rotated angle (direction parr of p), point parr of p);
+   enddef;
+ %  drawAndLabelArrowDashed(b4.e, b8.w, btex etex, 0.5);
+ %  drawAndLabelArrowDashed(b7.e, b8.w, btex etex, 0.5);
+   drawAndLabelArrow(b3.e, b4.w, btex discretization etex, 0.5);
+   drawAndLabelArrow(b4.e, b5.w, btex \type{Field} etex, 0.5);
+   drawAndLabelArrow(b6.e, b7.w, btex \begin{tabular}{c} numerical\\ analysis \end{tabular} etex, 0.5);
+   drawAndLabelArrow(b7.e, b5.w, btex \type{Array} etex, 0.3);
+   
+ endfig;
+ 
+ bye