Manual: Wordsmithing Changes to First Four Chapters

Thu Jan 24 05:18:51 UTC 2002

2002-Jan-23  Jeffrey D. Oldham  <oldham at codesourcery.com>

        These changes move the manual toward delivery.  Unfinished
        sections remain in the DocBook source code but are not printed.
        The first two chapters were indexed.  Wordsmithing throughout the
        document (hopefully) improved the exposition.

        * arrays.xml: Wordsmithing.  Finish describing the Domain use
        section.
        * concepts.xml: Wordsmithing.
        * data-parallel.xml: Rewrite "naive".
        * glossary.xml: Add some indexing.
        (stride): Fix definition.
        (template): New definition.
        * introduction.xml: Index.  Wordsmith.
        * manual.xml: Add 'unfinished', 'temporary', 'naive', and
        'naivecap' entities.  Change names of entities that used capital
        letters.  Write short description of container views.  Comment out
        unfinished sections.
        * template.xml: Index.  s/<</&openopen;/g.  Wordsmith.
        * tutorial.xml: Wordsmith.

Applied to	mainline.
Approved by	Bill Clinton.

Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com
-------------- next part --------------
Index: arrays.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/arrays.xml,v
retrieving revision 1.2
diff -c -p -r1.2 arrays.xml
*** arrays.xml	2002/01/22 15:48:49	1.2
--- arrays.xml	2002/01/24 04:56:31
***************
*** 1,3 ****
--- 1,4 ----
+ <!-- FIXME: Index this file. -->
    <chapter id="arrays">
     <title>&array; Containers</title>

***************
*** 961,967 ****
       <para>Since an &array; can be queried for its domain, we briefly
       describe some &domain; operations.  A fuller description,
       including arithmetic operations, occur in <xref
!      linkend="views"></xref>.</para>

       <table frame="none" colsep="0" rowsep="0" tocentry="1"
  	   orient="port" pgwide="0" id="arrays-domains-use-table">
--- 962,973 ----
       <para>Since an &array; can be queried for its domain, we briefly
       describe some &domain; operations.  A fuller description,
       including arithmetic operations, occur in <xref
!      linkend="views"></xref>.  As we mentioned in <xref
!      linkend="arrays-domains-declarations"></xref>, the <filename
!      class="headerfile">Pooma/Domains.h</filename> header file
!      declares &domain;s, but most container header files automatically
!      include <filename class="headerfile">Pooma/Domains.h</filename>
!      so no explicit inclusion is usually necessary.</para>

       <table frame="none" colsep="0" rowsep="0" tocentry="1"
  	   orient="port" pgwide="0" id="arrays-domains-use-table">
***************
*** 976,983 ****
         </thead>
         <tfoot>
  	<row>
! 	 <entry>Other &domain; accessors are described in <xref
! 	 linkend="views"></xref>.</entry>
  	</row>
         </tfoot>
         <tbody>
--- 982,990 ----
         </thead>
         <tfoot>
  	<row>
! 	 <entry><type>D</type> abbreviates the particular &domain;
! 	 type, e.g., &interval; or &grid;.  Other &domain; accessors
! 	 are described in <xref linkend="views"></xref>.</entry>
  	</row>
         </tfoot>
         <tbody>
***************
*** 1037,1053 ****

       <para>&domain; member functions are listed in <xref
       linkend="arrays-domains-use-table"></xref>.  Functions applicable
!      to one-dimensional and multidimensional &domain;s are listed
       before functions that only applicable to one-dimensional
       &domain;s.  The <methodname>size</methodname> member function
       yields the total number of indices in a given &domain;.  If and
       only if this number is zero, <methodname>empty</methodname> will
       yield &true;.  A multidimensional
       <type>domain&lt;&dim;&gt;</type> is the direct product of &dim;
!      one-dimensional &domain;s.
! 
! HERE</para>
!    </section>
     </section>

--- 1044,1087 ----

       <para>&domain; member functions are listed in <xref
       linkend="arrays-domains-use-table"></xref>.  Functions applicable
!      to both one-dimensional and multidimensional &domain;s are listed
       before functions that only applicable to one-dimensional
       &domain;s.  The <methodname>size</methodname> member function
       yields the total number of indices in a given &domain;.  If and
       only if this number is zero, <methodname>empty</methodname> will
       yield &true;.  A multidimensional
       <type>domain&lt;&dim;&gt;</type> is the direct product of &dim;
!      one-dimensional &domain;s.  The <methodname>operator[](int
!      dimension)</methodname> operator extracts the one-dimensional
!      &domain; corresponding to its parameter.  For example, the three
!      <type>Range&lt;1&gt;</type> (one-dimensional) &domain;s can be
!      extracted from a <type>Range&lt;3&gt;</type>
!      object <varname>r</varname> using
!      <statement>r[0]</statement>, <statement>r[1]</statement>, and
!      <statement>r[2]</statement>.</para>
! 
!      <para>&domain; accessors applicable only to one-dimensional
!      &domain;s are listed in the second half of <xref
!      linkend="arrays-domains-use-table"></xref>.  The
!      <methodname>length</methodname> member function, analogous to the
!      multidimensional <methodname>size</methodname> function, returns
!      the number of indices in the &domain;.  The
!      <methodname>first</methodname> and <methodname>last</methodname>
!      member functions return the domain's beginning and ending
!      indices.  The <methodname>begin</methodname> and
!      <methodname>end</methodname> member functions return input
!      iterators pointing to these respective locations.  They have type
!      <type>D&lt;1&gt;::iterator</type>, where <type>D</type>
!      abbreviates the &domain;'s type, e.g., &interval; or &grid;.
!      <!-- FIXME: Do I need to explain input iterators and their use?
!      --> The <methodname>min</methodname> and
!      <methodname>max</methodname> member functions return the minimum
!      and maximum indices in the &domain; object, respectively.  For
!      &locone; and &intervalone;, these are the same as
!      <methodname>first</methodname> and <methodname>last</methodname>,
!      but &rangeone; and &gridone; can have their largest index at the
!      beginning of their &domain;s.</para>
!     </section>
     </section>

*************** std::cout &openopen; a.read(2,-2) &openo
*** 1814,1820 ****

     <section id="arrays-dynamic_arrays">
!     <title>&dynamicarray;s: Dynamically Changing Domain Sizes</title>

      <para>&array;s have fixed domains so the set of valid indices
      remains fixed after declaration.  The &dynamicarray; class
--- 1848,1854 ----

     <section id="arrays-dynamic_arrays">
!     <title>&dynamicarray;s</title>

      <para>&array;s have fixed domains so the set of valid indices
      remains fixed after declaration.  The &dynamicarray; class
Index: concepts.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v
retrieving revision 1.7
diff -c -p -r1.7 concepts.xml
*** concepts.xml	2002/01/22 15:48:49	1.7
--- concepts.xml	2002/01/24 04:56:32
***************
*** 1,3 ****
--- 1,4 ----
+ <!-- FIXME: Index this file. -->
  <chapter id="concepts">
   <title>Overview of &pooma; Concepts</title>

***************
*** 13,22 ****
   separate categories:
   <variablelist>
    <varlistentry>
!     <term>container</term>
      <listitem>
!      <para>data structure holding one or more values and usually addressed
!      by indices</para>
      </listitem>
     </varlistentry>
     <varlistentry>
--- 14,23 ----
   separate categories:
   <variablelist>
    <varlistentry>
!     <term>containers</term>
      <listitem>
!      <para>data structures holding one or more values and usually accessed
!      using indices</para>
      </listitem>
     </varlistentry>
     <varlistentry>
***************
*** 34,41 ****
      </listitem>
     </varlistentry>
    </variablelist>
!   See <xref linkend="concepts-table"></xref>.  Many &pooma; programs
!   select one possibility from each column.  For example, <xref
    linkend="tutorial-array_stencil-doof2d"></xref> used &array;
    containers and stencils for sequential computation, while <xref
    linkend="tutorial-field_distributed-doof2d"></xref> used &field;
--- 35,43 ----
      </listitem>
     </varlistentry>
    </variablelist>
!   <xref linkend="concepts-table"></xref> categorizes the &pooma;
!   concepts.  Many &pooma; programs select one possibility from each
!   category.  For example, <xref
    linkend="tutorial-array_stencil-doof2d"></xref> used &array;
    containers and stencils for sequential computation, while <xref
    linkend="tutorial-field_distributed-doof2d"></xref> used &field;
***************
*** 103,115 ****
    <para>Most &pooma; programs use <firstterm>containers</firstterm> to
    store groups of values.  &pooma; containers are objects that store
    other objects such as numbers or vectors.  They control allocation
!   and deallocation of and access to these stored objects.  They are a
!   generalization of &c; arrays, but &pooma; containers are first-class
!   objects so they can be used directly in expressions.  They are
!   similar to &cc; containers such as <type>vector</type>,
!   <type>list</type>, and <type>stack</type>.  See <xref
!   linkend="concepts-containers-table"></xref> for a summary of the
!   containers.</para>

    <para>This section describes many concepts, but one need not
    understand them all to begin programming with the &poomatoolkit;.
--- 105,117 ----
    <para>Most &pooma; programs use <firstterm>containers</firstterm> to
    store groups of values.  &pooma; containers are objects that store
    other objects such as numbers or vectors.  They control allocation
!   and deallocation of these stored objects and access to them.  They
!   are a generalization of &c; arrays, but &pooma; containers are
!   first-class objects so they can be used directly in expressions.
!   They are also similar to &cc; containers such as
!   <type>vector</type>, <type>list</type>, and <type>stack</type>.  See
!   <xref linkend="concepts-containers-table"></xref> for a summary of
!   the containers.</para>

    <para>This section describes many concepts, but one need not
    understand them all to begin programming with the &poomatoolkit;.
***************
*** 126,131 ****
--- 128,137 ----
    multiple processors.  The programs in the previous chapter
    illustrate many of these concepts.</para>

+   <para><xref linkend="concepts-containers-table"></xref> briefly
+   describes the six &pooma; containers.  They are more fully described
+   in the paragraphs below.</para>
+ 
    <table frame="none" colsep="0" rowsep="0" tocentry="1"
  	   orient="port" pgwide="0" id="concepts-containers-table">
     <title>&pooma; Container Summary</title>
***************
*** 201,210 ****
    to &array;s, each cell may contain multiple values and multiple
    materials.  A &field;'s <glossterm
    linkend="glossary-mesh">mesh</glossterm> stores its spatial
!   characteristics and can map yield, e.g., the cell at a particular
!   point, the distance between two cells, or a cell's normals.  A
!   &field; should be used whenever geometric or spatial computations
!   are needed, multiple values per index are desired, or a computation
    involves more than one material.</para>

  <!-- FIXME: Want firstterm around tensor. -->
--- 207,216 ----
    to &array;s, each cell may contain multiple values and multiple
    materials.  A &field;'s <glossterm
    linkend="glossary-mesh">mesh</glossterm> stores its spatial
!   characteristics and can yield, e.g., the cell at a particular point,
!   the distance between two cells, or a cell's normals.  A &field;
!   should be used whenever geometric or spatial computations are
!   needed, multiple values per index are desired, or a computation
    involves more than one material.</para>

  <!-- FIXME: Want firstterm around tensor. -->
***************
*** 230,254 ****
    multiplying a &matrix; and a &vector;.</para>

    <para>The data of an &array;, &dynamicarray;, or &field; can be
!   viewed using more than one container by taking a view.  A <glossterm
    linkend="glossary-view"><firstterm>view</firstterm></glossterm> of
    an existing container &container; is a container whose domain
!   is a subset of &container;.  The subset can equal the original
!   domain.  A view acts like a reference in that changing any of the
!   view's values also changes the original container's and vice versa.
!   While users sometimes explicitly create views, they are perhaps more
!   frequently created as temporaries in expressions.  For example, if
!   <varname>A</varname> is an &array; and <varname>I</varname> is a
!   domain, <statement>A(I) - A(I-1)</statement> uses two views to form
!   the difference between adjacent values.</para>

    <section id="concepts-containers-choosing">
     <title>Choosing a Container</title>

     <para>The two most commonly used &pooma; containers are &array;s
!    and &field;s, while &vector;, &matrix;, or &tensor; frequently
!    represent mathematical objects.  <xref
     linkend="concepts-containers-choice_table"></xref> contains a
     decision tree describing how to choose an appropriate
     container.</para>
--- 236,262 ----
    multiplying a &matrix; and a &vector;.</para>

    <para>The data of an &array;, &dynamicarray;, or &field; can be
!   accessed using more than one container by taking a view.  A
!   <glossterm
    linkend="glossary-view"><firstterm>view</firstterm></glossterm> of
    an existing container &container; is a container whose domain
!   is a subset of &container;'s domain.  The subset can equal the
!   original domain.  A view acts like a reference in that changing any
!   of the view's values also changes the original container's and vice
!   versa.  While users sometimes explicitly create views, they are
!   perhaps more frequently created as temporaries in expressions.  For
!   example, if <varname>A</varname> is an &array; and
!   <varname>I</varname> is a domain, <statement>A(I) -
!   A(I-1)</statement> uses two views to form the difference between
!   adjacent values.</para>

    <section id="concepts-containers-choosing">
     <title>Choosing a Container</title>

     <para>The two most commonly used &pooma; containers are &array;s
!    and &field;s, while &vector;, &matrix;, and &tensor; represent
!    mathematical objects.  <xref
     linkend="concepts-containers-choice_table"></xref> contains a
     decision tree describing how to choose an appropriate
     container.</para>
***************
*** 299,304 ****
--- 307,324 ----
     in declaring them.  Concepts specific to distributed computation
     are described in the next section.</para>

+    <para><xref
+    linkend="concepts-sequential_containers-declarations-dependences"></xref>
+    illustrates the containers and the concepts involved in their
+    declarations.  The containers are listed in the top row.  Lines
+    connect these containers to the components necessary for their
+    declarations.  For example, an &array; declaration requires an
+    engine and a layout.  These, in turn, can depend on other &pooma;
+    concepts.  Declarations necessary only for distributed, or
+    multiprocessor, computation are also indicated.  Given a desired
+    container, one can use this figure to determine the concepts needed
+    to declare a particular container.</para>
+ 
     <figure float="1" id="concepts-sequential_containers-declarations-dependences">
      <title>Concepts For Declaring Containers</title>
      <mediaobject>
***************
*** 311,328 ****
      </mediaobject>
     </figure>

-    <para><xref
-    linkend="concepts-sequential_containers-declarations-dependences"></xref>
-    illustrates the containers and the concepts involved in their
-    declarations.  The containers are listed in the top row.  Lines
-    connect these containers to the components necessary for their
-    declarations.  For example, an &array; declaration requires an
-    &engine; and a layout.  These, in turn, can depend on other &pooma;
-    concepts.  Declarations necessary only for distributed, or
-    multiprocessor, computation are surrounded by dashed lines.  These
-    dependences to indicate the concepts needed for a particular
-    container.</para>
- 
     <para>An <glossterm
     linkend="glossary-engine"><firstterm>engine</firstterm></glossterm>
     stores and, if necessary, computes a container's values.  A
--- 331,336 ----
***************
*** 332,338 ****
     for all indices can use a constant engine, which need only store
     one value for the entire domain.  A &compressiblebrick; &engine;
     reduces its space requirements to a constant whenever all its
!    values are the same.  The separation also permits taking <link
     linkend="glossary-view">view</link>s of containers without copying
     storage.</para>

--- 340,347 ----
     for all indices can use a constant engine, which need only store
     one value for the entire domain.  A &compressiblebrick; &engine;
     reduces its space requirements to a constant whenever all its
!    values are the same.  The separation between a container and its
!    engine also permits taking <link
     linkend="glossary-view">view</link>s of containers without copying
     storage.</para>

***************
*** 350,360 ****

     <para>A <glossterm
     linkend="glossary-layout"><firstterm>layout</firstterm></glossterm>
!    maps <link linkend="glossary-domain">domain</link> <glossterm linkend="glossary-index">indices</glossterm> to the
!    processors and computer memory used by a container's engines.  See
!    <xref
     linkend="concepts-containers-declarations-computational_implementation"></xref>.
!    A program computes a container's values using a processor and
     memory.  The layout specifies the processors and memory to use for
     each particular index.  A container's layout for a uniprocessor
     implementation consists of its domain, the processor, and its
--- 359,369 ----

     <para>A <glossterm
     linkend="glossary-layout"><firstterm>layout</firstterm></glossterm>
!    maps <link linkend="glossary-domain">domain</link> <glossterm
!    linkend="glossary-index">indices</glossterm> to the processors and
!    computer memory used by a container's engines.  See <xref
     linkend="concepts-containers-declarations-computational_implementation"></xref>.
!    A program computes a container's values using these processors and
     memory.  The layout specifies the processors and memory to use for
     each particular index.  A container's layout for a uniprocessor
     implementation consists of its domain, the processor, and its
***************
*** 378,388 ****
     interval [0,n).  A domain need not contain all integral points
     between its endpoints.  A <glossterm
     linkend="glossary-stride"><firstterm>stride</firstterm></glossterm>
!    is a subset of an interval consisting of regularly-spaced points.
!    A <glossterm
     linkend="glossary-range"><firstterm>range</firstterm></glossterm>
     is a subset of an interval of regularly-spaced points specified by
!    strides.</para>

     <para>A &field;'s <glossterm
     linkend="glossary-mesh"><firstterm>mesh</firstterm></glossterm>
--- 387,396 ----
     interval [0,n).  A domain need not contain all integral points
     between its endpoints.  A <glossterm
     linkend="glossary-stride"><firstterm>stride</firstterm></glossterm>
!    indicates a regular spacing between points.  A <glossterm
     linkend="glossary-range"><firstterm>range</firstterm></glossterm>
     is a subset of an interval of regularly-spaced points specified by
!    a stride.</para>

     <para>A &field;'s <glossterm
     linkend="glossary-mesh"><firstterm>mesh</firstterm></glossterm>
***************
*** 399,422 ****
     linkend="glossary-point">point</link> in &space; corresponding to
     the cell in the lower, left corner of its <link
     linkend="glossary-domain">domain</link>.  Combining this, the
!    domain, and the cell size fully specifies the mesh's map from
!    indices to &space;.</para>

     <para>A mesh's <glossterm
     linkend="glossary-cell_size"><firstterm>cell
!    size</firstterm></glossterm> specifies the spatial dimensions of
!    a &field; <link linkend="glossary-cell">cell</link>, e.g., its
!    width, height, and depth, in &space;.  Combining this, the
!    domain, and the corner position fully specifies the mesh's map
!    from indices to &space;.</para>
    </section>

    <section id="concepts-containers-distributed_declarations">
     <title>Declaring Distributed Containers</title>

!    <para>In the previous section, we introduced the concepts important
!    when declaring containers for use on uniprocessor computers.  When
     using multiprocessor computers, we augment these concepts with
     those for distributed computation.  Reading this section is
     important only for running a program on multiple processors.  Many
--- 407,430 ----
     linkend="glossary-point">point</link> in &space; corresponding to
     the cell in the lower, left corner of its <link
     linkend="glossary-domain">domain</link>.  Combining this, the
!    domain, and the cell size can specify the mesh's map from indices
!    to &space;.</para>

     <para>A mesh's <glossterm
     linkend="glossary-cell_size"><firstterm>cell
!    size</firstterm></glossterm> specifies the spatial dimensions of a
!    &field; <link linkend="glossary-cell">cell</link>, e.g., its width,
!    height, and depth, in &space;.  Combining this, the domain,
!    and the corner position can specify the mesh's map from indices to
!    &space;.</para>
    </section>

    <section id="concepts-containers-distributed_declarations">
     <title>Declaring Distributed Containers</title>

!    <para>In the previous section, we introduced the important concepts
!    for declaring containers for use on uniprocessor computers.  When
     using multiprocessor computers, we augment these concepts with
     those for distributed computation.  Reading this section is
     important only for running a program on multiple processors.  Many
***************
*** 457,463 ****
     linkend="glossary-external_guard_layer"><firstterm>external guard
     layer</firstterm></glossterm> specifies values surrounding the
     entire domain.  Its presence eases computation along the domain's
!    edges by permitting the same computations as for more internal
     computations.  An <glossterm
     linkend="glossary-internal_guard_layer"><firstterm>internal guard
     layer</firstterm></glossterm> duplicates values from adjacent
--- 465,471 ----
     linkend="glossary-external_guard_layer"><firstterm>external guard
     layer</firstterm></glossterm> specifies values surrounding the
     entire domain.  Its presence eases computation along the domain's
!    edges by permitting the same computations as for more-internal
     computations.  An <glossterm
     linkend="glossary-internal_guard_layer"><firstterm>internal guard
     layer</firstterm></glossterm> duplicates values from adjacent
***************
*** 488,503 ****

    <para>&pooma; computations can be expressed using a variety of
    modes.  Many &pooma; computations involve &array; or &field;
!   containers, but how their values are accessed and the associated
!   algorithms using them varies.  For example, element-wise computation
    involves explicitly accessing a container's values.  A data-parallel
!   computation uses expressions to represent larger subsets of a
!   container's values.  Stencil-based computations express a
!   computation as repeatedly applying a local computation to each
!   element of an array.  A relation among containers establishes a
!   dependency among them so the values of one container are updated
!   whenever any other's values change.  A program may use any or all of
!   these styles, which are described below.</para>

    <para><glossterm
    linkend="glossary-element_wise"><firstterm>Element-wise</firstterm></glossterm>
--- 496,511 ----

    <para>&pooma; computations can be expressed using a variety of
    modes.  Many &pooma; computations involve &array; or &field;
!   containers, but how their values are accessed and how the associated
!   algorithms use them varies.  For example, element-wise computation
    involves explicitly accessing a container's values.  A data-parallel
!   computation operates on larger subsets of a container's values.
!   Stencil-based computations express a computation as repeatedly
!   applying a local computation to each element of an array.  A
!   relation among containers establishes a dependency among them so the
!   values of one container are updated whenever any other's values
!   change.  A program may use any or all of these styles, which are
!   described below.</para>

    <para><glossterm
    linkend="glossary-element_wise"><firstterm>Element-wise</firstterm></glossterm>
***************
*** 515,524 ****
    linkend="tutorial-array_parallel-doof2d"></xref>,
    <statement>a(I,J)</statement> represents the subset of &array;
    <varname>a</varname>'s values having coordinates in the domain
!   specified by the one-dimensional &interval;s <varname>I</varname>
!   and <varname>J</varname>.  Using data-parallel expressions
!   frequently eliminates the need for writing explicit loops in
!   code.</para>

    <para>A <glossterm
    linkend="glossary-stencil"><firstterm>stencil</firstterm></glossterm>
--- 523,532 ----
    linkend="tutorial-array_parallel-doof2d"></xref>,
    <statement>a(I,J)</statement> represents the subset of &array;
    <varname>a</varname>'s values having coordinates in the domain
!   specified by the direct product of one-dimensional &interval;s
!   <varname>I</varname> and <varname>J</varname>.  Using
!   data-parallel expressions frequently eliminates the need for writing
!   explicit loops.</para>

    <para>A <glossterm
    linkend="glossary-stencil"><firstterm>stencil</firstterm></glossterm>
***************
*** 550,557 ****
   <section id="concepts-computation_environment">
    <title>Computation Environment</title>

!   <para>A &pooma; program can execute on a wide variety of computers.
!    The default <glossterm
     linkend="glossary-sequential"><firstterm>sequential computing
     environment</firstterm></glossterm> consists of one processor and
     its associated memory, as found on a personal computer.  In
--- 558,565 ----
   <section id="concepts-computation_environment">
    <title>Computation Environment</title>

!   <para>The same &pooma; program can execute on a wide variety of
!    computers.  The default <glossterm
     linkend="glossary-sequential"><firstterm>sequential computing
     environment</firstterm></glossterm> consists of one processor and
     its associated memory, as found on a personal computer.  In
***************
*** 574,580 ****
        library.</para>
       </listitem>
       <listitem>
!       <para>The &pooma; executable must be run using the library.</para>
      </listitem>
     </orderedlist>
     All of these were illustrated in <xref
--- 582,589 ----
        library.</para>
       </listitem>
       <listitem>
!       <para>The &pooma; executable must be run using the
!       communications library.</para>
      </listitem>
     </orderedlist>
     All of these were illustrated in <xref
***************
*** 611,619 ****
     contexts, all of which is hidden from both the programmer and the
     user.  &pooma; works with the Message Passing Interface (&mpi;)
     Communications Library 
! <!-- FIXME: xref linkend="mpi99" -->
!    (<ulink url="http://www-unix.mcs.anl.gov/mpi/"></ulink>) and the &mm;
!    Shared Memory Library.  See <xref
!    linkend="installation-distributed_computing"></xref> for details.</para>
    </section>
  </chapter>
--- 620,628 ----
     contexts, all of which is hidden from both the programmer and the
     user.  &pooma; works with the Message Passing Interface (&mpi;)
     Communications Library 
! <!-- FIXME: xref linkend="mpi99" (<ulink url="http://www-unix.mcs.anl.gov/mpi/"></ulink>) -->
!    and the &mm; Shared Memory Library.  See <xref
!    linkend="installation-distributed_computing"></xref> for
!    details.</para>
    </section>
  </chapter>
Index: data-parallel.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/data-parallel.xml,v
retrieving revision 1.1
diff -c -p -r1.1 data-parallel.xml
*** data-parallel.xml	2002/01/14 17:33:33	1.1
--- data-parallel.xml	2002/01/24 04:56:34
***************
*** 14,20 ****

     <para>After introducing data-parallel expressions and statements,
     we present the corresponding &pooma; syntax.  Then we present its
!    implementation, which uses expression-template technology.  A naive
     data-parallel implementation might generate temporary variables,
     cluttering a program's inner loops and slowing its execution.
     Instead, &pooma; uses &pete, the Portable Expression Template
--- 14,20 ----

     <para>After introducing data-parallel expressions and statements,
     we present the corresponding &pooma; syntax.  Then we present its
!    implementation, which uses expression-template technology.  A &naive;
     data-parallel implementation might generate temporary variables,
     cluttering a program's inner loops and slowing its execution.
     Instead, &pooma; uses &pete, the Portable Expression Template
***************
*** 51,57 ****
      height h and to an entire field of particles with
      masses m and heights h.  Our algorithm works with
      data-parallel syntax, and we would like to write the corresponding
!     computer program using data-parallel syntax as well..</para>
     </section>

--- 51,57 ----
      height h and to an entire field of particles with
      masses m and heights h.  Our algorithm works with
      data-parallel syntax, and we would like to write the corresponding
!     computer program using data-parallel syntax as well.</para>
     </section>

*************** std::cout << A-B << std::endl;
*** 881,887 ****

      <para>Data-parallel statements involving containers occur
      frequently in the inner loops of scientific programs so their
!     efficient execution is important.  A naive implementation for
      these statements may create and destroy containers holding
      intermediate values, slowing execution considerably.
      In 1995, Todd <!-- FIXME: Add citations to vandevoorde-95 and
--- 881,887 ----

      <para>Data-parallel statements involving containers occur
      frequently in the inner loops of scientific programs so their
!     efficient execution is important.  A &naive; implementation for
      these statements may create and destroy containers holding
      intermediate values, slowing execution considerably.
      In 1995, Todd <!-- FIXME: Add citations to vandevoorde-95 and
*************** std::cout << A-B << std::endl;
*** 894,900 ****
      framework, is also available separately from &pooma; at
      <ulink url="http://www.acl.lanl.gov/pete/"></ulink>.</para>

!     <para>In this section, we first describe how a naive
      implementation may slow execution.  Then, we describe &pete;'s
      faster implementation.  A data-parallel statement is converted
      into a parse tree, rather than immediately evaluating it.  The
--- 894,900 ----
      framework, is also available separately from &pooma; at
      <ulink url="http://www.acl.lanl.gov/pete/"></ulink>.</para>

!     <para>In this section, we first describe how a &naive;
      implementation may slow execution.  Then, we describe &pete;'s
      faster implementation.  A data-parallel statement is converted
      into a parse tree, rather than immediately evaluating it.  The
*************** std::cout << A-B << std::endl;
*** 909,918 ****
      types are traversed and code is produced without the need for any
      intermediate values.  We present the implementation in <xref
  								 linkend="data_parallel-implementation-pete"></xref>, but first we
!     explain the difficulties caused by the naive implementation.</para>

      <section id="data_parallel-implementation-naive">
!      <title>Naive Implementation</title>

       <para>A conventional implementation to evaluate data-parallel
       expressions might overload arithmetic operator functions.
--- 909,918 ----
      types are traversed and code is produced without the need for any
      intermediate values.  We present the implementation in <xref
  								 linkend="data_parallel-implementation-pete"></xref>, but first we
!     explain the difficulties caused by the &naive; implementation.</para>

      <section id="data_parallel-implementation-naive">
!      <title>&naivecap; Implementation</title>

       <para>A conventional implementation to evaluate data-parallel
       expressions might overload arithmetic operator functions.
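
Note on the hunk above: the cost of a conventional operator-overloading
implementation can be sketched in a few lines of C++.  This is only an
illustration under stated assumptions, not code from POOMA or PETE: the
names Vec, temporaries_created, and assign_sum3 are hypothetical.  Each
overloaded + allocates a container for its intermediate result, while a
single fused loop, the kind of code an expression-template framework aims
to produce, needs none.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter, present only to make the temporaries visible.
static int temporaries_created = 0;

// Hypothetical stand-in for a container; not a POOMA class.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
};

// Conventional overloading: every + builds and returns a new container.
inline Vec operator+(const Vec& a, const Vec& b) {
    assert(a.data.size() == b.data.size());
    Vec result(a.data.size());
    ++temporaries_created;
    for (std::size_t i = 0; i < a.data.size(); ++i)
        result.data[i] = a.data[i] + b.data[i];
    return result;
}

// Hand-fused alternative: one loop over the elements, no intermediate
// containers.
inline void assign_sum3(Vec& out, const Vec& a, const Vec& b, const Vec& c) {
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = a.data[i] + b.data[i] + c.data[i];
}
```

Evaluating a + b + c with the overloaded operator runs two loops and
allocates two containers, whereas assign_sum3 runs one loop and allocates
nothing.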
Index: glossary.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v
retrieving revision 1.7
diff -c -p -r1.7 glossary.xml
*** glossary.xml	2002/01/22 15:48:49	1.7
--- glossary.xml	2002/01/24 04:56:35
***************
*** 99,105 ****
    <glossentry id="glossary-compile_time">
     <glossterm>compile time</glossterm>
     <glossdef>
!     <para>time in the process from writing a program to executing it
      when the program is compiled by a compiler.  This is also called
      <firstterm>compilation time</firstterm>.</para>
      <glossseealso otherterm="glossary-programming_time">programming time</glossseealso>
--- 99,113 ----
    <glossentry id="glossary-compile_time">
     <glossterm>compile time</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-compile_time">
!      <primary>compile time</primary>
!     </indexterm>
!     <indexterm>
!      <primary>compilation time</primary>
!      <see>compile time.</see>
!     </indexterm>
!     in the process from writing a program to executing it, the time
      when the program is compiled by a compiler.  This is also called
      <firstterm>compilation time</firstterm>.</para>
      <glossseealso otherterm="glossary-programming_time">programming time</glossseealso>
***************
*** 306,316 ****
     </glossdef>
    </glossentry>

- <!-- HERE -->
    <glossentry id="glossary-enumeration">
     <glossterm>enumeration</glossterm>
     <glossdef>
!     <para>distinct &cc; integral type with named constants.  These are
      frequently used in template programming because they can be used
      as template arguments.</para>
     </glossdef>
--- 314,327 ----
     </glossdef>
    </glossentry>

    <glossentry id="glossary-enumeration">
     <glossterm>enumeration</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-enumeration">
!      <primary>enumeration</primary>
!     </indexterm>
!     &cc; integral type with named constants.  These are
      frequently used in template programming because they can be used
      as template arguments.</para>
     </glossdef>
***************
*** 324,330 ****
    <glossentry id="glossary-external_guard_layer">
     <glossterm>external guard layer</glossterm>
     <glossdef>
!     <para><link linkend="glossary-guard_layer">guard layer</link>
      surrounding a container's domain used to ease computation along
      the domain's edges by permitting the same computations as for
      more internal computations.  It is an optimization, not required
--- 335,350 ----
    <glossentry id="glossary-external_guard_layer">
     <glossterm>external guard layer</glossterm>
     <glossdef>
!     <para>
!      <indexterm zone="glossary-external_guard_layer">
!       <primary>guard layer</primary>
!       <secondary>external</secondary>
!      </indexterm>
!      <indexterm>
!       <primary>external guard layer</primary>
!       <see>guard layer, external.</see> 
!      </indexterm>
!     <link linkend="glossary-guard_layer">guard layer</link>
      surrounding a container's domain used to ease computation along
      the domain's edges by permitting the same computations as for
      more internal computations.  It is an optimization, not required
***************
*** 382,394 ****
    <glossentry id="glossary-function_template">
     <glossterm>function template</glossterm>
     <glossdef>
!     <para>a definition of an unbounded set of related functions, all
!     having the same name but whose parameter types can depend on
!     template parameters.  They are particularly useful when
!     overloading <glossterm
!     linkend="glossary-operator_function">operator
!     functions</glossterm> to accept parameters that themselves depend
!     on templates.</para>
     </glossdef>
    </glossentry>
   </glossdiv>
--- 402,434 ----
    <glossentry id="glossary-function_template">
     <glossterm>function template</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-function_template">
!      <primary>function</primary>
!      <secondary>template</secondary>
!     </indexterm>
!     a definition of an unbounded set of related functions, all having
!     the same name but whose types can depend on template parameters.
!     They are particularly useful when overloading
!     <indexterm>
!      <primary>overloaded function</primary>
!      <see>function, overloaded.</see>
!     </indexterm>
!     <indexterm>
!      <primary>function</primary>
!      <secondary>overloaded</secondary>
!     </indexterm>
!     <glossterm linkend="glossary-operator_function">operator
!     functions</glossterm>
!     <indexterm>
!      <primary>operator function</primary>
!      <see>function, operator.</see>
!     </indexterm>
!     <indexterm>
!      <primary>function</primary>
!      <secondary>operator</secondary>
!     </indexterm>
!     to accept parameters that themselves depend on templates.</para>
     </glossdef>
    </glossentry>
   </glossdiv>
***************
*** 399,417 ****
    <glossentry id="glossary-guard_layer">
     <glossterm>guard layer</glossterm>
     <glossdef>
!     <para>domain surrounding each patch of a container's domain.  It
      contains read-only values.  <link
      linkend="glossary-external_guard_layer">External guard
      layer</link>s ease programming, while <link
      linkend="glossary-internal_guard_layer">internal guard
      layer</link>s permit each patch's computation to be occur without
      copying values from adjacent patches.  They are optimizations, not
!     required for program correctness.</para> <glossseealso
!     otherterm="glossary-external_guard_layer">external guard
!     layer</glossseealso> <glossseealso
!     otherterm="glossary-internal_guard_layer">internal guard
!     layer</glossseealso> <glossseealso
!     otherterm="glossary-partition">partition</glossseealso>
      <glossseealso otherterm="glossary-patch">patch</glossseealso>
      <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
--- 439,460 ----
    <glossentry id="glossary-guard_layer">
     <glossterm>guard layer</glossterm>
     <glossdef>
!     <para>
!      <indexterm zone="glossary-guard_layer">
!       <primary>guard layer</primary>
!      </indexterm>
!     domain surrounding each patch of a container's domain.  It
      contains read-only values.  <link
      linkend="glossary-external_guard_layer">External guard
      layer</link>s ease programming, while <link
      linkend="glossary-internal_guard_layer">internal guard
      layer</link>s permit each patch's computation to be occur without
      copying values from adjacent patches.  They are optimizations, not
!     required for program correctness.</para>
!     <glossseealso otherterm="glossary-external_guard_layer">external guard layer</glossseealso>
!     <glossseealso otherterm="glossary-internal_guard_layer">internal
! guard layer</glossseealso>
!     <glossseealso otherterm="glossary-partition">partition</glossseealso>
      <glossseealso otherterm="glossary-patch">patch</glossseealso>
      <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
***************
*** 448,454 ****
    <glossentry id="glossary-internal_guard_layer">
     <glossterm>internal guard layer</glossterm>
     <glossdef>
!     <para><link linkend="glossary-guard_layer">guard layer</link>
      containing copies of adjacent patches' values.  These copies can
      permit an individual patch's computation to occur without asking
      adjacent patches for values.  This can speed computation but are
--- 491,506 ----
    <glossentry id="glossary-internal_guard_layer">
     <glossterm>internal guard layer</glossterm>
     <glossdef>
!     <para>
!      <indexterm zone="glossary-internal_guard_layer">
!       <primary>guard layer</primary>
!       <secondary>internal</secondary>
!      </indexterm>
!      <indexterm>
!       <primary>internal guard layer</primary>
!       <see>guard layer, internal.</see> 
!      </indexterm>
!     <link linkend="glossary-guard_layer">guard layer</link>
      containing copies of adjacent patches' values.  These copies can
      permit an individual patch's computation to occur without asking
      adjacent patches for values.  This can speed computation but are
***************
*** 498,503 ****
--- 550,560 ----
   <glossdiv id="glossary-m">
    <title>M</title>

+   <glossentry id="glossary-real_matrix">
+    <glossterm>matrix</glossterm>
+    <glosssee otherterm="glossary-matrix"></glosssee>
+   </glossentry>
+ 
    <glossentry id="glossary-mesh">
     <glossterm>mesh</glossterm>
     <glossdef>
***************
*** 516,524 ****
    <glossentry id="glossary-operator_function">
     <glossterm>operator function</glossterm>
     <glossdef>
!     <para>function defining an operator's code.  For example,
!     <function>operator+</function> defines the result of using the
!     <operator>+</operator>.</para>
     </glossdef>
    </glossentry>
   </glossdiv>
--- 573,586 ----
    <glossentry id="glossary-operator_function">
     <glossterm>operator function</glossterm>
     <glossdef>
!     <para>
!      <indexterm zone="glossary-operator_function">
!       <primary>function</primary>
!       <secondary>operator</secondary>
!      </indexterm>
!     function defining the code invoked by a &cc; operator.  For
!     example, the <function>operator+</function> function defines the
!     result of using the <operator>+</operator> operator.</para>
     </glossdef>
    </glossentry>
   </glossdiv>
***************
*** 545,551 ****
    <glossentry id="glossary-patch">
     <glossterm>patch</glossterm>
     <glossdef>
!     <para>subset of a container's domain with values computed by a
      particular context.  A partition splits a domain into patches.  It
      may be surrounded by external and internal guard layers.</para>
      <glossseealso otherterm="glossary-partition">partition</glossseealso>
--- 607,617 ----
    <glossentry id="glossary-patch">
     <glossterm>patch</glossterm>
     <glossdef>
!     <para>
!      <indexterm zone="glossary-patch">
!       <primary>patch</primary>
!      </indexterm>
!     subset of a container's domain with values computed by a
      particular context.  A partition splits a domain into patches.  It
      may be surrounded by external and internal guard layers.</para>
      <glossseealso otherterm="glossary-partition">partition</glossseealso>
***************
*** 568,574 ****
    <glossentry id="glossary-programming_time">
     <glossterm>programming time</glossterm>
     <glossdef>
!     <para>time in the process from writing a program to executing it
      when the program is being written by a programmer.</para>
      <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
      <glossseealso otherterm="glossary-run_time">run time</glossseealso>
--- 634,644 ----
    <glossentry id="glossary-programming_time">
     <glossterm>programming time</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-programming_time">
!      <primary>programming time</primary>
!     </indexterm>
!     in the process from writing a program to executing it, the time
      when the program is being written by a programmer.</para>
      <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
      <glossseealso otherterm="glossary-run_time">run time</glossseealso>
***************
*** 613,619 ****
    <glossentry id="glossary-run_time">
     <glossterm>run time</glossterm>
     <glossdef>
!     <para>time in the process from writing a program to executing it
      when the program is executed.  This is also called
      <firstterm>execution time</firstterm>.</para>
      <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
--- 683,697 ----
    <glossentry id="glossary-run_time">
     <glossterm>run time</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-run_time">
!      <primary>run time</primary>
!     </indexterm>
!     <indexterm>
!      <primary>execution time</primary>
!      <see>run time.</see>
!     </indexterm>
!     in the process from writing a program to executing it, the time
      when the program is executed.  This is also called
      <firstterm>execution time</firstterm>.</para>
      <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
***************
*** 654,665 ****
    <glossentry id="glossary-stride">
     <glossterm>stride</glossterm>
     <glossdef>
!     <para>a subset of regularly-spaced points in an integral
!     interval.  For example, the set of points a, a+2, a+4, …,
!     b-2, b is specified by [a,b] with stride 2.  It is a
!     domain.</para>
      <glossseealso otherterm="glossary-range">range</glossseealso>
!     <glossseealso otherterm="glossary-interval">interval</glossseealso>
      <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
    </glossentry>
--- 732,743 ----
    <glossentry id="glossary-stride">
     <glossterm>stride</glossterm>
     <glossdef>
!     <para>spacing between regularly-spaced points in a domain.  For
!     example, the set of points a, a+2, a+4, …, b-2, b is
!     specified by [a,b] with stride 2.  It is a domain.</para>
      <glossseealso otherterm="glossary-range">range</glossseealso>
!     <glossseealso
!     otherterm="glossary-interval">interval</glossseealso>
      <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
    </glossentry>
***************
*** 681,690 ****
   <glossdiv id="glossary-t">
    <title>T</title>

    <glossentry id="glossary-template_instantiation">
     <glossterm>template instantiation</glossterm>
     <glossdef>
!     <para>applying a template class to template parameters to create a
      type.  For example, <statement>foo&lt;double,3&gt;</statement>
      instantiates <statement>template &lt;typename T, int n&gt; class
      foo</statement> with the type &double; and the constant
--- 759,788 ----
   <glossdiv id="glossary-t">
    <title>T</title>

+   <glossentry id="glossary-template">
+    <glossterm>template</glossterm>
+    <glossdef>
+     <para>
+     <indexterm zone="glossary-template">
+      <primary>template</primary>
+     </indexterm>
+     class or function definition having template parameters.
+     These parameters' values are used at compile time, not run time,
+     so they may include types and other compile-time values.
+     <!-- FIXME: Strengthen this definition. --></para>
+     <glossseealso otherterm="glossary-template_instantiation">template instantiation</glossseealso>
+     <glossseealso otherterm="glossary-template_specialization">template specialization</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
    <glossentry id="glossary-template_instantiation">
     <glossterm>template instantiation</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-template_instantiation">
!      <primary>template instantiation</primary>
!     </indexterm>
!     applying a template class to template arguments to create a
      type.  For example, <statement>foo&lt;double,3&gt;</statement>
      instantiates <statement>template &lt;typename T, int n&gt; class
      foo</statement> with the type &double; and the constant
***************
*** 696,702 ****
    <glossentry id="glossary-template_specialization">
     <glossterm>template specialization</glossterm>
     <glossdef>
!     <para>class or function definition for a particular (special)
      subset of template arguments.</para>
     </glossdef>
    </glossentry>
--- 794,804 ----
    <glossentry id="glossary-template_specialization">
     <glossterm>template specialization</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-template_specialization">
!      <primary>template specialization</primary>
!     </indexterm>
!     class or function definition for a particular (special)
      subset of template arguments.</para>
     </glossdef>
    </glossentry>
***************
*** 724,730 ****
    <glossentry id="glossary-trait">
     <glossterm>trait</glossterm>
     <glossdef>
!     <para>a characteristic of a type.</para>
      <glossseealso otherterm="glossary-traits_class">traits class</glossseealso>
     </glossdef>
    </glossentry>
--- 826,836 ----
    <glossentry id="glossary-trait">
     <glossterm>trait</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-trait">
!      <primary>trait</primary>
!     </indexterm>
!     a characteristic of a type.</para>
      <glossseealso otherterm="glossary-traits_class">traits class</glossseealso>
     </glossdef>
    </glossentry>
***************
*** 732,739 ****
    <glossentry id="glossary-traits_class">
     <glossterm>traits class</glossterm>
     <glossdef>
!     <para>a class containing one or more traits all describing a
!     particular type's chacteristics.</para>
      <glossseealso otherterm="glossary-trait">trait</glossseealso>
     </glossdef>
    </glossentry>
--- 838,849 ----
    <glossentry id="glossary-traits_class">
     <glossterm>traits class</glossterm>
     <glossdef>
!     <para>
!     <indexterm zone="glossary-traits_class">
!      <primary>traits class</primary>
!     </indexterm>
!     a class containing one or more traits all describing a particular
!     type's characteristics.</para>
      <glossseealso otherterm="glossary-trait">trait</glossseealso>
     </glossdef>
    </glossentry>
***************
*** 771,783 ****
    <glossentry id="glossary-view">
     <glossterm>view of a container</glossterm>
     <glossdef>
!     <para>a container derived from another.  The former's domain is a
      subset of the latter's, but, where the domains intersect,
      accessing a value through the view is the same as accessing it
      through the original container.  In Fortran 90, these are
      called array sections.  Only &array;s, &dynamicarray;s, and
!     &field;s support views.</para> <glossseealso
!     otherterm="glossary-container">container</glossseealso>
     </glossdef>
    </glossentry>
   </glossdiv>
--- 881,893 ----
    <glossentry id="glossary-view">
     <glossterm>view of a container</glossterm>
     <glossdef>
!     <para>a container derived from another.  The view's domain is a
      subset of the latter's, but, where the domains intersect,
      accessing a value through the view is the same as accessing it
      through the original container.  In Fortran 90, these are
      called array sections.  Only &array;s, &dynamicarray;s, and
!     &field;s support views.</para>
!     <glossseealso otherterm="glossary-container">container</glossseealso>
     </glossdef>
    </glossentry>
   </glossdiv>
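
Note on the glossary entries above: the terms template, template
instantiation, template specialization, enumeration, and traits class can
be seen together in a short C++ sketch.  It follows the glossary's own
foo<double,3> example; the member at and the traits class is_floating are
hypothetical names chosen for illustration and are not part of POOMA.

```cpp
#include <cassert>

// Class template taking a type and a compile-time integer, mirroring the
// glossary's foo<double,3> example.
template <typename T, int n>
class foo {
public:
    // An enumeration exposes the template argument as a compile-time value.
    enum { size = n };

    // Bounds-checked element access (hypothetical member).
    T& at(int i) {
        assert(i >= 0 && i < n);
        return values[i];
    }

private:
    T values[n];
};

// Hypothetical traits class: the primary template supplies the default
// trait for every type.
template <typename T>
struct is_floating {
    enum { value = 0 };
};

// Template specialization: a separate definition for the argument double.
template <>
struct is_floating<double> {
    enum { value = 1 };
};
```

Writing foo<double,3> instantiates the class template with the type
double and the constant 3, and is_floating<double>::value selects the
specialization at compile time.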
Index: introduction.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/introduction.xml,v
retrieving revision 1.3
diff -c -p -r1.3 introduction.xml
*** introduction.xml	2002/01/14 17:33:34	1.3
--- introduction.xml	2002/01/24 04:56:35
***************
*** 20,32 ****
     </listitem>
     <listitem>
      <para>automatic creation of all interprocessor communication for
!     parallel and distributed programs</para>
     </listitem>
     <listitem>
-     <para>several container storage classes to reduce a program's
-     storage requirements, and</para>
-    </listitem>
-    <listitem>
      <para>automatic out-of-order execution and loop rearrangement
      for fast program execution.</para>
     </listitem>
--- 20,28 ----
     </listitem>
     <listitem>
      <para>automatic creation of all interprocessor communication for
!     parallel and distributed programs, and</para>
     </listitem>
     <listitem>
      <para>automatic out-of-order execution and loop rearrangement
      for fast program execution.</para>
     </listitem>
***************
*** 44,50 ****
   <section id="introduction-goals">
    <title>&pooma; Goals</title>

!   <para>The goals for the &poomatoolkit; have remained unchanged
    since its conception in 1994:
    <orderedlist>
     <listitem>
--- 40,50 ----
   <section id="introduction-goals">
    <title>&pooma; Goals</title>

!   <para><indexterm zone="introduction-goals">
!          <primary>&pooma;</primary>
!          <secondary>goals</secondary>
!         </indexterm>
!   The goals for the &poomatoolkit; have remained unchanged
    since its conception in 1994:
    <orderedlist>
     <listitem>
***************
*** 74,89 ****

    <bridgehead id="introduction-goals-portability" renderas="sect2">Code Portability for Sequential and Distributed Programs</bridgehead>

!   <para>The same &pooma; programs run on sequential, distributed, and
    parallel computers.  No change in source code is required.  Two or
!   three lines specifying how each container's domain should be
    distributed among available processors.  Using these directives and
    run-time information about the computer's configuration, the
    &toolkit; automatically distributes pieces of the container domains,
    called <link
    linkend="glossary-patch"><firstterm>patches</firstterm></link>,
    among the available processors.  If a computation needs values from
!   another patch, &pooma; automatically passes the value to the patch
    where it is needed.  The same program, and even the same executable,
    works regardless of the number of the available processors and the
    size of the containers' domains.  A programmer interested in only
--- 74,92 ----

    <bridgehead id="introduction-goals-portability" renderas="sect2">Code Portability for Sequential and Distributed Programs</bridgehead>

!   <para><indexterm zone="introduction-goals-portability">
!          <primary>code portability</primary>
!         </indexterm>
!   The same &pooma; programs run on sequential, distributed, and
    parallel computers.  No change in source code is required.  Two or
!   three lines specify how each container's domain should be
    distributed among available processors.  Using these directives and
    run-time information about the computer's configuration, the
    &toolkit; automatically distributes pieces of the container domains,
    called <link
    linkend="glossary-patch"><firstterm>patches</firstterm></link>,
    among the available processors.  If a computation needs values from
!   another patch, &pooma; automatically passes the values to the patch
    where it is needed.  The same program, and even the same executable,
    works regardless of the number of the available processors and the
    size of the containers' domains.  A programmer interested in only
***************
*** 92,116 ****

    <bridgehead id="introduction-goals-rapid_development" renderas="sect2">Rapid Application Development</bridgehead>

!   <para>The &poomatoolkit; is designed to enable rapid development of
    scientific and distributed applications.  For example, its vector,
    matrix, and tensor classes model the corresponding mathematical
    concepts.  Its &array; and &field; classes model the discrete spaces
!   and mathematical arrays frequently found in computational science and
!   math.  See <xref linkend="introduction-science_algorithms"></xref>.
!   The left column indicates theoretical science and math concepts, the
!   middle column computational science and math concepts, and the right
!   column computer science implementations.  For example, theoretical
!   physics frequently uses continuous fields in three-dimension space,
!   while algorithms for a corresponding computational physics problem
!   usually uses discrete fields.  &pooma; containers, classes, and
!   functions ease engineering computer programs for these algorithms.
!   For example, the &pooma; &field; container models discrete fields;
!   both map locations in discrete space to values and permit
!   computations of spatial distances and values.  The &pooma; &array;
!   container models the mathematical concept of an array, used in
!   numerical analysis.</para>

    <figure float="1" id="introduction-science_algorithms">
     <title>How &pooma; Fits Into the Scientific Process</title>
     <mediaobject>
--- 95,125 ----

    <bridgehead id="introduction-goals-rapid_development" renderas="sect2">Rapid Application Development</bridgehead>

!   <para><indexterm zone="introduction-goals-rapid_development">
!          <primary>rapid development</primary>
!         </indexterm>
!   The &poomatoolkit; is designed to enable rapid development of
    scientific and distributed applications.  For example, its vector,
    matrix, and tensor classes model the corresponding mathematical
    concepts.  Its &array; and &field; classes model the discrete spaces
!   and mathematical arrays frequently found in computational science
!   and math.  See <xref
!   linkend="introduction-science_algorithms"></xref>.  The left column
!   indicates theoretical science and math concepts, the middle column
!   computational science and math concepts, and the right column
!   computer science implementations.  For example, theoretical physics
!   frequently uses continuous fields in three-dimensional space, while
!   algorithms for a corresponding computational physics problem usually
!   use discrete fields.  &pooma; containers, classes, and functions
!   ease engineering computer programs for these algorithms.  For
!   example, the &pooma; &field; container models discrete fields: both
!   map locations in discrete space to values and permit computations of
!   spatial distances and values.  The &pooma; &array; container models
!   the mathematical concept of an array, frequently used in numerical
!   analysis.</para>

+   <!-- FIXME: How can we include this figure in the HTML version? -->
+ 
    <figure float="1" id="introduction-science_algorithms">
     <title>How &pooma; Fits Into the Scientific Process</title>
     <mediaobject>
***************
*** 121,129 ****
       <phrase>&pooma; helps translate algorithms into programs.</phrase>
      </textobject>
      <caption>
!      <para>In the translation from theoretical science and math to
!      computational science and math to computer programs, &pooma; eases
!      the implementation of algorithms as computer programs.</para>
      </caption>
     </mediaobject>
    </figure>
--- 130,138 ----
       <phrase>&pooma; helps translate algorithms into programs.</phrase>
      </textobject>
      <caption>
!      <para>In the translation from theoretical science to
!      computational science to computer programs, &pooma; eases the
!      implementation of algorithms as computer programs.</para>
      </caption>
     </mediaobject>
    </figure>
***************
*** 131,191 ****
    <para>&pooma; containers support a variety of computation modes,
    easing translation of algorithms into code.  For example, many
    algorithms for solving partial differential equations use
!   stencil-based computations.  &pooma; supports stencil-based
!   computations on &array;s and &field;s.  It also supports
!   data-parallel computation similar to &fortran 90 syntax.  For
!   computations where one &field;'s values is a function of several
!   other &field;'s values, the programmer can specify a relation.
!   Relations are lazily evaluated: whenever the dependent &field;'s
!   values are needed and it is dependent on a &field; whose values have
!   changed, its values are computed.  Lazy evaluation also assists
!   correctness by eliminating the frequently forgotten need for a
!   programmer to ensure a &field;'s values are up-to-date before being
!   used.</para>

    <bridgehead id="introduction-goals-efficient" renderas="sect2">Efficient Code</bridgehead>

    <para>&pooma; incorporates a variety of techniques to ensure it
!   produces code that executes as quickly as special-case,
!   hand-written code.
!  <!-- FIXME: Do I present execution numbers here? -->
!   These techniques include extensive use of templates, out-of-order
!   evaluation, use of guard layers, and production of fast inner loops.</para>
! 
!   <para>&pooma;'s uses of &cc; templates permits the expressiveness
!   from using pointers and function arguments but ensures as much as
!   work as possible occurs at compile time, not run time.  This speeds
!   programs' execution.  Since more code is produced at compile time,
!   more code is available to the compiler's optimizer, further speeding
!   execution.  The &pooma; &array; container benefits from the use of
!   template parameters.  Their use permits the use of specialized data
!   storage classes called <link
    linkend="glossary-engine"><firstterm>engines</firstterm></link>.  An
!   &array;'s &engine; template parameter specifies how data is stored and
!   indexed.  Some &array;s expect almost all values to be used, while
!   others might be mostly vacant.  In the latter case, using a
    specialized engine storing the few nonzero values greatly reduces
!   space requirements.  Using engines also permits fast creation of
!   container views, known as <firstterm>array sections</firstterm> in
!   Fortran 90.  A view's engine is the same as the original
!   container's engine, but the view object maps its restricted domain to
!   the original domain.  Space requirements and execution time to use
!   views are minimal.  Using templates also permits containers to
!   support polymorphic indexing, e.g., indexing both by integers and by
!   three-dimensional coordinates.  A container defers indexing
!   operations to its engine's templatized index operator.  Since it uses
!   templates, the &engine; can define indexing functions with different
!   function arguments, without the need to add corresponding container
!   functions.  Some of these benefits of using templates can be
!   expressed without them, but doing so increases execution time.  For
!   example, a container could have a pointer to an engine object, but
!   this requires a pointer dereference for each operation.  Implementing
!   polymorphic indexing without templates would require adding virtual
!   functions corresponding to each of the indexing functions.</para>

!   <para>To ensure multiprocessor &pooma; programs execute quickly, it
    is important that interprocessor communication overlaps with
    intraprocessor computations as much as possible and that
    communication is minimized.  Asynchronous communication, out-of-order
--- 140,235 ----
    <para>&pooma; containers support a variety of computation modes,
    easing translation of algorithms into code.  For example, many
    algorithms for solving partial differential equations use
!   stencil-based computations, so &pooma; supports stencil-based
!   computations on &array;s and &field;s.  &pooma; also supports
!   data-parallel computation similar to &fortran 90 syntax.
!   <indexterm class="startofrange"
!   id="introduction-goals-rapid_development-index-relations">
!    <primary>relation</primary>
!   </indexterm>
!   To ease implementing computations where one &field;'s values are a
!   function of several other &field;s' values, the programmer can
!   specify a <glossterm
!   linkend="glossary-relation">relation</glossterm>.  Relations are
!   lazily evaluated: whenever the dependent &field;'s values are needed
!   and they are dependent on a &field; whose values have changed, the
!   values are computed.  Relations also assist correctness by
!   eliminating the frequently forgotten need for a programmer to ensure
!   a &field;'s values are up-to-date before being used.<indexterm
!   class="endofrange"
!   startref="introduction-goals-rapid_development-index-relations"></indexterm></para>

    <bridgehead id="introduction-goals-efficient" renderas="sect2">Efficient Code</bridgehead>

    <para>&pooma; incorporates a variety of techniques to ensure it
!   produces code that executes as quickly as special-case, hand-written
!   code.  These techniques include extensive use of templates,
!   out-of-order evaluation, use of guard layers, and production of fast
!   inner loops.</para>
! 
!   <para><indexterm class="startofrange"
!   id="introduction-goals-efficient-index-templates_use">
!    <primary>templates</primary>
!    <secondary>use</secondary>
!   </indexterm>
!   &pooma;'s use of &cc; templates ensures as much work as possible
!   occurs at compile time, not run time.  This speeds programs'
!   execution.  Since more code is produced at compile time, more code
!   is available to the compiler's optimizer, further speeding
!   execution.
!   <indexterm class="startofrange"
!   id="introduction-goals-efficient-index-templates-engines">
!    <primary>engines</primary>
!   </indexterm>
!   The &pooma; &array; container benefits from the use of template
!   parameters.  Their use permits the use of specialized data storage
!   classes called <link
    linkend="glossary-engine"><firstterm>engines</firstterm></link>.  An
!   &array;'s &engine; template parameter specifies how data is stored
!   and indexed.  Some &array;s expect almost all values to be used,
!   while others might be mostly empty.  In the latter case, using a
    specialized engine storing the few nonzero values greatly reduces
!   storage requirements.  Using engines also permits fast creation of
!   container views, known as <indexterm><primary>array
!   sections</primary></indexterm><firstterm>array sections</firstterm>
!   in &fortran; 90.  A view's engine is the same as the original
!   container's engine, but the view object's restricted domain is a
!   subset of the original domain.  Space requirements and execution
!   time to use views are minimal.  <indexterm class="endofrange"
!   startref="introduction-goals-efficient-index-templates-engines"></indexterm>
!   </para>

+   <para id="introduction-goals-efficient-polymorphic_indexing">
+   <indexterm zone="introduction-goals-efficient-polymorphic_indexing">
+    <primary>polymorphic indexing</primary>
+   </indexterm>
+   Using templates also permits containers to support polymorphic
+   indexing, e.g., indexing both by integers and by three-dimensional
+   coordinates.  A container uses templatized indexing functions that
+   defer indexing operations to its engine's index operators.  Since
+   the container uses templates, the &engine; can define indexing
+   functions with different function arguments, without the need to add
+   corresponding container functions.  Some of these benefits of using
+   templates can be expressed without them, but doing so increases
+   execution time.  For example, a container could have a pointer to an
+   engine object, but this requires a pointer dereference for each
+   operation.  Implementing polymorphic indexing without templates
+   would require adding virtual functions corresponding to each of the
+   indexing functions.
+   <indexterm class="endofrange"
+   startref="introduction-goals-efficient-index-templates_use"></indexterm>
+   </para>
+ 
   <!-- FIXME: Are the claims concerning out-of-order evaluation I make true? -->

!   <para id="introduction-goals-efficient-asynchronous_communication">
!   <indexterm zone="introduction-goals-efficient-asynchronous_communication">
!    <primary>asynchronous communication</primary>
!   </indexterm>
!   <indexterm zone="introduction-goals-efficient-asynchronous_communication">
!    <primary>&cheetah;</primary>
!   </indexterm>
!   To ensure multiprocessor &pooma; programs execute quickly, it
    is important that interprocessor communication overlaps with
    intraprocessor computations as much as possible and that
    communication is minimized.  Asynchronous communication, out-of-order
***************
*** 199,241 ****
    sender to put and get data without synchronizing with the recipient
    processor, and it also permits invoking functions at remote sites to
    ensure desired data is up-to-date.  Thus, out-of-order evaluation
    must be supported.  Out-of-order evaluation also has another benefit:
    Only computations directly or indirectly related to values that are
!   printed need occur.</para>

!   <para>Surrounding a patch with <link
    linkend="glossary-guard_layer"><firstterm>guard
    layers</firstterm></link> can help reduce interprocessor
    communication.  For distributed computation, each container's domain
    is split into pieces distributed among the available processors.
    Frequently, computing a container value is local, involving just the
!   value itself and a few neighbors, but computing a value near the edge
!   of a processor's domain may require knowing a few values from a
    neighboring domain.  Guard layers permit these values to be copied
    locally so they need not be repeatedly communicated.</para>

!   <para>&pooma; uses &pete; technology to ensure inner loops involving
    &pooma;'s object-oriented containers run as quickly as hand-coded
!   <!-- FIXME: Add a citation to Dr. Dobb's Journal article pete-99. -->
!   loops.  &pete; (the Portable Expression Template Engine) uses
!   expression-template technology to convert data-parallel statements
!   in the inner loops of programs into efficient loops
!   without any intermediate computations.  For example, consider
!   evaluating the statement
!   <programlisting>
!   A += -B + 2 * C;</programlisting>
!   where <varname>A</varname> and <varname>C</varname> are
    <type>vector<double></type>s and <varname>B</varname> is a
!   <type>vector<int></type>.  Naive evaluation might introduce
    intermediaries for <statement>-B</statement>,
    <statement>2*C</statement>, and their sum.  The presence of these
!   intermediaries in inner loops can measurably slow evaluation.  To
    produce a loop without intermediaries, &pete; stores each expression
!   as a parse tree.  The resulting parse trees can be combined into a
!   larger parse tree.  Using its templates, the parse tree is converted,
!   at compile time, to a loop evaluating each component of the result.
!   Thus, no intermediate values are computed or stored.  For example,
!   the code corresponding to the statement above is
    <programlisting>
    vector<double>::iterator iterA = A.begin();
    vector<int>::const_iterator iterB = B.begin();
--- 243,308 ----
    sender to put and get data without synchronizing with the recipient
    processor, and it also permits invoking functions at remote sites to
    ensure desired data is up-to-date.  Thus, out-of-order evaluation
+   <indexterm>
+    <primary>out-of-order evaluation</primary>
+   </indexterm>
+   <indexterm>
+    <primary>evaluation</primary>
+    <secondary>out-of-order</secondary>
+    <see>out-of-order evaluation.</see>
+   </indexterm>
+ <!-- FIXME: Add glossary entry for out-of-order evaluation. -->
    must be supported.  Out-of-order evaluation also has another benefit:
    Only computations directly or indirectly related to values that are
!   printed need occur.
!   </para>

!   <para id="introduction-goals-efficient-guard_layers">
!   <indexterm zone="introduction-goals-efficient-guard_layers">
!    <primary>guard layer</primary>
!   </indexterm>
!   Surrounding a patch with <link
    linkend="glossary-guard_layer"><firstterm>guard
    layers</firstterm></link> can help reduce interprocessor
    communication.  For distributed computation, each container's domain
    is split into pieces distributed among the available processors.
    Frequently, computing a container value is local, involving just the
!   value itself and a few neighbors, but computing a value near the
!   edge of a processor's domain may require knowing a few values from a
    neighboring domain.  Guard layers permit these values to be copied
    locally so they need not be repeatedly communicated.</para>

!   <para id="introduction-goals-efficient-pete">
!   <indexterm zone="introduction-goals-efficient-pete">
!    <primary>&pete;</primary>
!   </indexterm>
!   <indexterm>
!    <primary><application class="software">Portable Expression Template Engine</application></primary>
!    <see>&pete;.</see>
!   </indexterm>
!   <indexterm zone="introduction-goals-efficient-pete">
!    <primary>inner-loop evaluation</primary>
!   </indexterm>
!   &pooma; uses &pete; technology to ensure inner loops involving
    &pooma;'s object-oriented containers run as quickly as hand-coded
!   <!-- FIXME: Add a citation to Dr. Dobb's Journal article
!   pete-99. --> loops.  &pete; (the <application>Portable Expression Template
!   Engine</application>) uses expression-template technology to convert data-parallel
!   statements into efficient loops without any intermediate
!   computations.  For example, consider evaluating the statement
! <programlisting>
! A += -B + 2 * C;
! </programlisting> where <varname>A</varname> and <varname>C</varname> are
    <type>vector<double></type>s and <varname>B</varname> is a
!   <type>vector<int></type>.  &naivecap; evaluation might introduce
    intermediaries for <statement>-B</statement>,
    <statement>2*C</statement>, and their sum.  The presence of these
!   intermediaries in inner loops can measurably slow performance.  To
    produce a loop without intermediaries, &pete; stores each expression
!   as a parse tree.  Using its templates, the parse tree is
!   converted, at compile time, to a loop directly evaluating each component of
!   the result without computing intermediate values.
!   For example, the code corresponding to the statement above is
    <programlisting>
    vector<double>::iterator iterA = A.begin();
    vector<int>::const_iterator iterB = B.begin();
***************
*** 244,267 ****
      *iterA += -*iterB + 2 * *iterC;
      ++iterA; ++iterB; ++iterC;
    }</programlisting>
!   Furthermore, since the code is available at compile, not run, time,
    it can be further optimized, e.g., moving any loop-invariant code out
    of the loop.</para>

    <bridgehead id="introduction-goals-scientific" renderas="sect2">Used for Diverse Set of Scientific Problems</bridgehead>

    <para>&pooma; has been used to solve a wide variety of scientific
    problems.  Most recently, physicists at Los Alamos National
!   Laboratory implemented an entire library of hydrodynamics codes as
    part of the U.S. government's science-based Stockpile Stewardship
!   Program to simulate nuclear weapons.  Other applications include a
    matrix solver, an accelerator code simulating the dynamics of
    high-intensity charged particle beams in linear accelerators, and a
!   Monte Carlo neutron transport code.</para>

    <bridgehead id="introduction-goals-easy_implementation" renderas="sect2">Easy Implementation</bridgehead>

!   <para>&pooma;'s tools greatly reduce the time to implement
    applications.  As we noted above, &pooma;'s containers and expression
    syntax model the computational models and algorithms most frequently
    found in scientific programs.  These high-level tools are known to be
--- 311,357 ----
      *iterA += -*iterB + 2 * *iterC;
      ++iterA; ++iterB; ++iterC;
    }</programlisting>
!   Furthermore, since the code is available at compile time, not run time,
    it can be further optimized, e.g., moving any loop-invariant code out
    of the loop.</para>

+ 
    <bridgehead id="introduction-goals-scientific" renderas="sect2">Used for Diverse Set of Scientific Problems</bridgehead>

    <para>&pooma; has been used to solve a wide variety of scientific
    problems.  Most recently, physicists at Los Alamos National
!   Laboratory
!   <indexterm>
!    <primary>Los Alamos National Laboratory</primary>
!   </indexterm>
!   implemented an entire library of hydrodynamics codes
!   <indexterm>
!    <primary>hydrodynamics</primary>
!   </indexterm>
!   as
    part of the U.S. government's science-based Stockpile Stewardship
!   Program
!   <indexterm>
!    <primary>Stockpile Stewardship Program</primary>
!   </indexterm>
!   to simulate nuclear weapons.  Other applications include a
    matrix solver, an accelerator code simulating the dynamics of
    high-intensity charged particle beams in linear accelerators, and a
!   Monte Carlo
!   <indexterm>
!    <primary>Monte Carlo simulation</primary>
!   </indexterm>
!   neutron transport code.</para>

+ 
    <bridgehead id="introduction-goals-easy_implementation" renderas="sect2">Easy Implementation</bridgehead>

!   <para id="introduction-goals-easy_implementation-ease">
!   <indexterm zone="introduction-goals-easy_implementation-ease">
!    <primary>&pooma;</primary>
!    <secondary>ease of writing programs</secondary>
!   </indexterm>
!   &pooma;'s tools greatly reduce the time to implement
    applications.  As we noted above, &pooma;'s containers and expression
    syntax model the computational models and algorithms most frequently
    found in scientific programs.  These high-level tools are known to be
***************
*** 271,280 ****
    computers.  With no additional work, the same program runs on
    computers with hundreds of processors; the code is exactly the same,
    and the &toolkit; automatically handles distribution of the data, all
!   data communication, and all synchronization.  The net results is a
    significant reduction in programming time.  For example, a team of
    two physicists and two support people at Los Alamos National
!   Laboratory implemented a suite of hydrodynamics kernels in six
    months.  Their work replaced a previous suite of less-powerful
    kernels which had taken sixteen people several years to implement and
    debug.  Despite not have previously implemented any of the kernels,
--- 361,378 ----
    computers.  With no additional work, the same program runs on
    computers with hundreds of processors; the code is exactly the same,
    and the &toolkit; automatically handles distribution of the data, all
!   data communication, and all synchronization.  The net result is a
    significant reduction in programming time.  For example, a team of
    two physicists and two support people at Los Alamos National
!   Laboratory
!   <indexterm>
!    <primary>Los Alamos National Laboratory</primary>
!   </indexterm>
!   implemented a suite of hydrodynamics kernels
!   <indexterm>
!    <primary>hydrodynamics</primary>
!   </indexterm>
!   in six
    months.  Their work replaced a previous suite of less-powerful
    kernels which had taken sixteen people several years to implement and
    debug.  Despite not have previously implemented any of the kernels,
***************
*** 283,352 ****
   </section><!-- introduction-goals -->

   <section id="introduction-performance">
    <title>&pooma; Produces Fast Programs</title>

    <para>almost as fast as &c;.  wide variety of configurations: one
    processor, many processors, give performance data for at least two
!   different programs
! HERE</para>

!   <para>describe &doof2d; here

    &doof2d; is a two-dimensional diffusion simulation program.
    Initially, all values in the square two-dimensional grid are zero
!   except for the central value.  
! 
! HERE</para>

   </section>

- <!-- HERE -->

   <section id="introduction-open_source">
    <title>&pooma; is Free, Open-Source Software</title>

    <para>The &poomatoolkit; is open-source software.  Anyone may
    download, read, redistribute, and modify the &pooma; source code.
!   If an application requires a specialized container, any programmer
!   may add it.  Any programmer can extend it to solve problems in
!   previously unsupported domains.  Companies using the &toolkit; can
!   read the source code to ensure it has no hidden back doors or
!   security holes.  It may be downloaded for free and used for
!   perpetuity.  There are no annual licenses and no on-going costs.  By
!   keeping their own copies, companies are guaranteed the software will
!   never disappear.  In summary, the &poomatoolkit; is free, low-risk
!   software.</para>
   </section>

   <section id="introduction-pooma_history">
    <title>History of &pooma;</title>

    <para>The &poomatoolkit; was developed at Los Alamos National
!   Laboratory to assist nuclear fusion and fission research.
    In 1994, the &toolkit; grew out of the <application
    class='software'>Object-Oriented Particle Simulation</application>
!   class library developed for particle-in-cell simulations.  The goals
    of the Framework, as it was called at the time, were driven by the
!   Numerical Tokamak's <quote>Parallel Platform Paradox</quote>:
    <blockquote>
     <para>The average time required to implement a moderate-sized
     application on a parallel computer architecture is equivalent to
     the half-life of the latest parallel supercomputer.</para>
    </blockquote>
    The framework's goal of being able to quickly write efficient
    scientific code that could be run on a wide variety of platforms
    remains unchanged today.  Development, mainly at the
!   Advanced Computing Laboratory at Los Alamos, proceeded rapidly.
!   A matrix solver application was written using the framework.
  <!-- FIXME: Add citation to pooma-sc95. -->
!   Support for hydrodynamics, Monte Carlo simulations, and molecular
!   dynamics modeling soon followed.</para>
! 
!   <para>By 1998, &pooma; was part of the U.S. Department of
!   Energy's Accelerated Strategic Computing Initiative
!   (<acronym>ASCI</acronym>).  The Comprehensive Test Ban Treaty forbid
    nuclear weapons testing so they were instead simulated using
    computers.  <acronym>ASCI</acronym>'s goal was to radically advance
    the state of the art in high-performance computing and numerical
--- 381,516 ----
   </section><!-- introduction-goals -->

+ <![%unfinished;[
   <section id="introduction-performance">
    <title>&pooma; Produces Fast Programs</title>

    <para>almost as fast as &c;.  wide variety of configurations: one
    processor, many processors, give performance data for at least two
!   different programs UNFINISHED</para>

!   <para>describe &doof2d; at this location

    &doof2d; is a two-dimensional diffusion simulation program.
    Initially, all values in the square two-dimensional grid are zero
!   except for the central value.  UNFINISHED</para>

   </section>
+ ]]>  <!-- end unfinished -->

   <section id="introduction-open_source">
    <title>&pooma; is Free, Open-Source Software</title>

+   <indexterm zone="introduction-open_source">
+    <primary>open-source software</primary>
+   </indexterm>
+   <indexterm zone="introduction-open_source">
+    <primary>&pooma;</primary>
+    <secondary>open-source</secondary>
+   </indexterm>
+ 
    <para>The &poomatoolkit; is open-source software.  Anyone may
    download, read, redistribute, and modify the &pooma; source code.
!   If an application requires a specialized container not already
!   available, any programmer may add it.  Any programmer can extend it
!   to solve problems in previously unsupported domains.  Companies
!   using the &toolkit; can read the source code to ensure it has no
!   security holes.  It may be downloaded for free
!   and used in perpetuity.  There are no annual licenses and no
!   on-going costs.  By keeping their own copies, companies are
!   guaranteed the software will never disappear.  In summary, the
!   &poomatoolkit; is free, low-risk software.</para>
   </section>

   <section id="introduction-pooma_history">
    <title>History of &pooma;</title>

+   <indexterm zone="introduction-pooma_history">
+    <primary>&pooma;</primary>
+    <secondary>history</secondary>
+   </indexterm>
+   <indexterm zone="introduction-pooma_history">
+    <primary>Los Alamos National Laboratory</primary>
+   </indexterm>
+ 
    <para>The &poomatoolkit; was developed at Los Alamos National
!   Laboratory to assist nuclear fusion
!   <indexterm>
!    <primary>fusion</primary>
!   </indexterm>
!   and fission
!   <indexterm>
!    <primary>fission</primary>
!   </indexterm>
!   research.
    In 1994, the &toolkit; grew out of the <application
    class='software'>Object-Oriented Particle Simulation</application>
!   <indexterm>
!    <primary>Object-Oriented Particle Simulation Library</primary>
!   </indexterm>
!   Class Library developed for particle-in-cell simulations.  The goals
    of the Framework, as it was called at the time, were driven by the
!   Numerical Tokamak's 
!   <indexterm>
!    <primary>Tokamak</primary>
!   </indexterm>
!   <quote>Parallel Platform Paradox</quote>:
    <blockquote>
     <para>The average time required to implement a moderate-sized
     application on a parallel computer architecture is equivalent to
     the half-life of the latest parallel supercomputer.</para>
    </blockquote>
+   <indexterm>
+    <primary>Parallel Platform Paradox</primary>
+   </indexterm>
    The framework's goal of being able to quickly write efficient
    scientific code that could be run on a wide variety of platforms
    remains unchanged today.  Development, mainly at the
!   Advanced Computing Laboratory
!   <indexterm>
!    <primary>Los Alamos National Laboratory</primary>
!    <secondary>Advanced Computing Laboratory</secondary>
!   </indexterm>
!   at Los Alamos, proceeded rapidly.  A matrix solver application was
!   written using the framework.
  <!-- FIXME: Add citation to pooma-sc95. -->
!   Support for hydrodynamics,
!   <indexterm>
!    <primary>hydrodynamics</primary>
!   </indexterm>
!   Monte Carlo simulations,
!   <indexterm>
!    <primary>Monte Carlo simulation</primary>
!   </indexterm>
!   and molecular dynamics
!   <indexterm>
!    <primary>molecular dynamics modeling</primary>
!   </indexterm>
!   modeling soon followed.</para>
! 
!   <para id="introduction-pooma_history-asci">
!   By 1998, &pooma; was part of the U.S. Department of
!   Energy's
!   <indexterm>
!    <primary>Department of Energy</primary>
!   </indexterm>
!   Accelerated Strategic Computing Initiative
!   (<acronym>ASCI</acronym>).
!   <indexterm zone="introduction-pooma_history-asci">
!    <primary>Department of Energy</primary>
!    <secondary>Accelerated Strategic Computing Initiative</secondary>
!   </indexterm>
!   <indexterm>
!    <primary>Accelerated Strategic Computing Initiative</primary>
!    <see>Department of Energy, Accelerated Strategic Computing Initiative.</see>
!   </indexterm>
!   The Comprehensive Test Ban Treaty
!   <indexterm>
!    <primary>Comprehensive Test Ban Treaty</primary>
!   </indexterm>
!   forbade
    nuclear weapons testing so they were instead simulated using
    computers.  <acronym>ASCI</acronym>'s goal was to radically advance
    the state of the art in high-performance computing and numerical
*************** HERE</para>
*** 361,388 ****
    <para>&pooma; 2 involved a new conceptual framework and a
    complete rewriting of the source code to improve performance.  The
  <!-- FIXME: Add a citation to iscope98.pdf. -->
!   &array; class was introduced with its use of &engine;s, separating
!   container use from container storage.  An asynchronous scheduler
!   permitted out-of-order execution to improve cache coherency.
    Incorporating the <application class="software">Portable
    Expression Template Engine</application> (<acronym>PETE</acronym>)
!   permitted faster loop execution.  Soon, container views and
!   <type>ConstantFunction</type> and <type>IndexFunction</type>
!   &engine;s were added.  Release 2.1.0 included &field;s with
!   their spatial extent and &dynamicarray;s with the ability to
!   dynamically change its domain size.  Support for particles and
    their interaction with &field;s were added.  The &pooma; messaging
    implementation was revised in release 2.3.0.  Use of the
!   &cheetah; Library separated &pooma; from the actual messaging
    library used, and support for applications running on clusters of
    computers was added.  <ulink
!   url="http://www.codesourcery.com">CodeSourcery, LLC</ulink>, and
!   <ulink url="www.proximation.com">Proximation, LLC</ulink>, took
    over &pooma; development from Los Alamos National Laboratory.
    During the past two years, the &field;
!   abstraction and implementation was improved to increase its
    flexibility, add support for multiple values and materials in the
!   same cell, and permit lazy evaluation.  Simultaneously, the
    execution speed of the inner loops was greatly increased.</para>
   </section>

--- 525,609 ----
    <para>&pooma; 2 involved a new conceptual framework and a
    complete rewriting of the source code to improve performance.  The
  <!-- FIXME: Add a citation to iscope98.pdf. -->
!   &array; class
!   <indexterm>
!    <primary>&array;</primary>
!   </indexterm>
!   was introduced with its use of &engine;s,
!   <indexterm>
!    <primary>&engine;</primary>
!   </indexterm>
!   separating
!   container use from container storage.  A new asynchronous scheduler
!   permitted out-of-order execution
!   <indexterm>
!    <primary>out-of-order evaluation</primary>
!   </indexterm>
!   to improve cache coherency.
    Incorporating the <application class="software">Portable
    Expression Template Engine</application> (<acronym>PETE</acronym>)
!   <indexterm>
!    <primary>&pete;</primary>
!   </indexterm>
!   permitted faster loop execution.  Soon, container views
!   <indexterm>
!    <primary>container</primary>
!    <secondary>view</secondary>
!   </indexterm>
!   and
!   <type>ConstantFunction</type>
!   <indexterm>
!    <primary>&engine;</primary>
!    <secondary><type>ConstantFunction</type></secondary>
!   </indexterm>
!   and <type>IndexFunction</type>
!   <indexterm>
!    <primary>&engine;</primary>
!    <secondary><type>IndexFunction</type></secondary>
!   </indexterm>
!   &engine;s were added.  Release 2.1.0 included &field;s
!   <indexterm>
!    <primary>&field;</primary>
!   </indexterm>
!   with
!   their spatial extent and &dynamicarray;s
!   <indexterm>
!    <primary>&dynamicarray;</primary>
!   </indexterm>
!   with the ability to
!   dynamically change domain size.  Support for particles and
    their interaction with &field;s were added.  The &pooma; messaging
    implementation was revised in release 2.3.0.  Use of the
!   &cheetah; Library
!   <indexterm>
!    <primary>&cheetah;</primary>
!   </indexterm>
!   separated &pooma; from the actual messaging
    library used, and support for applications running on clusters of
    computers was added.  <ulink
!   url="http://www.codesourcery.com/">CodeSourcery, LLC</ulink>,
!   <indexterm>
!    <primary>CodeSourcery, LLC</primary>
!   </indexterm>
!   and
!   <ulink url="http://www.proximation.com/">Proximation, LLC</ulink>,
!   <indexterm>
!    <primary>Proximation, LLC</primary>
!   </indexterm>
!   took
    over &pooma; development from Los Alamos National Laboratory.
    During the past two years, the &field;
!   abstraction
!   <indexterm>
!    <primary>&field;</primary>
!   </indexterm>
!   and implementation were improved to increase its
    flexibility, add support for multiple values and materials in the
!   same cell, and permit lazy evaluation.
!   <indexterm>
!    <primary>lazy evaluation</primary>
!   </indexterm>
!   Simultaneously, the
    execution speed of the inner loops was greatly increased.</para>
   </section>

Index: manual.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v
retrieving revision 1.8
diff -c -p -r1.8 manual.xml
*** manual.xml	2002/01/22 15:48:49	1.8
--- manual.xml	2002/01/24 04:56:38
***************
*** 1,4 ****
--- 1,6 ----
  <?xml version="1.0"?>
+ <!-- FIXME: Index this file. -->
+ <!-- FIXME: What font does DocBook/JadeTeX use?  Can we use it for the figures? -->

  <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "file://usr/lib/sgml/docbookx.dtd" [

***************
*** 7,12 ****
--- 9,20 ----
    <!-- UPDATE: Check before publishing to see if any needs changing. -->
    <!-- ADD: Write more material. -->

+ <!-- Conditional Inclusion Entity Declarations -->
+ <!ENTITY % unfinished "IGNORE">
+   <!-- Unfinished sections should not be included in published versions. -->
+ <!ENTITY % temporary "INCLUDE">
+   <!-- Temporary sections to be included in published versions until the final version is written. -->
+ 
  <!-- Index Entity Declarations -->
  <!ENTITY genindex.sgm SYSTEM "genindex.sgm">

***************
*** 158,163 ****
--- 166,173 ----

  <!ENTITY container "C">
    <!-- an abbreviation for a canonical container -->
+ <!ENTITY containerdomain "D">
+   <!-- an abbreviation for a canonical container domain -->
  <!ENTITY n "n">
    <!-- the size of one dimension of an array -->
  <!ENTITY space "ℜ<superscript>d</superscript>">
***************
*** 173,194 ****

  <!-- &pooma; URLs and Files -->

! <!ENTITY poomaDownloadPage '<ulink url="http://pooma.codesourcery.com/pooma/download">http://pooma.codesourcery.com/pooma/download</ulink>'>
    <!-- The WWW page supporting downloading the &pooma; source code. -->
    <!-- UPDATE this URL. -->
! <!ENTITY poomaHomePage '<ulink url="http://www.pooma.com/">http://www.pooma.com/</ulink>'>
    <!-- The canonical Pooma home page. -->
    <!-- UPDATE this filename. -->
! <!ENTITY poomaSource "pooma-2.3.0">
    <!-- The Pooma source code directory. -->
! <!ENTITY poomaSourceFile "&poomaSource;.tgz">
    <!-- The Pooma source code archive. -->
! <!ENTITY poomaExampleDirectory "examples/Manual">
    <!-- The directory holding this manual's example codes. -->

  <!-- Spelling and Formatting Decisions -->
  <!ENTITY author "author">
    <!-- A word describing an author xor authors. -->
    <!-- spelling: dependence, not dependency -->
    <!-- spelling: element-wise, not elementwise -->
    <!-- phrase: function object, not functor -->
--- 183,210 ----

  <!-- &pooma; URLs and Files -->

! <!ENTITY poomadownloadpage '<ulink url="http://pooma.codesourcery.com/pooma/download">http://pooma.codesourcery.com/pooma/download</ulink>'>
    <!-- The WWW page supporting downloading the &pooma; source code. -->
    <!-- UPDATE this URL. -->
! <!ENTITY poomahomepage '<ulink url="http://www.pooma.com/">http://www.pooma.com/</ulink>'>
    <!-- The canonical Pooma home page. -->
    <!-- UPDATE this filename. -->
! <!ENTITY poomasource "pooma-2.3.0">
    <!-- The Pooma source code directory. -->
!   <!-- UPDATE this filename. -->
! <!ENTITY poomasourcefile "&poomasource;.tgz">
    <!-- The Pooma source code archive. -->
! <!ENTITY poomaexampledirectory "examples/Manual">
    <!-- The directory holding this manual's example codes. -->

  <!-- Spelling and Formatting Decisions -->
  <!ENTITY author "author">
    <!-- A word describing an author xor authors. -->
+ <!ENTITY naive "naïve">
+   <!-- The word "na\"{\i}ve." -->
+ <!ENTITY naivecap "Naïve">
+   <!-- The word "Na\"{\i}ve," i.e., the capitalized &naive;. -->
+   <!-- The Pooma source code directory. -->
    <!-- spelling: dependence, not dependency -->
    <!-- spelling: element-wise, not elementwise -->
    <!-- phrase: function object, not functor -->
***************
*** 260,277 ****
      <orgname>CodeSourcery, LLC</orgname>
     </affiliation>
    </author>
!   <copyright><year>2001</year><holder>CodeSourcery, LLC (<ulink url="http://www.codesourcery.com"></ulink>)</holder></copyright>
!   <contractsponsor>Los Alamos National Laboratory<ulink url="http://www.lanl.gov"></ulink></contractsponsor>
    <legalnotice>
     <!-- FIXME: What is the correct legal notice? -->
     <para>All rights reserved.  This document may not be redistributed in any form without the express permission of the author.</para>
    </legalnotice>
    <revhistory>
     <revision>
!     <revnumber>0.01</revnumber>
!     <date>2002 Jan 14</date>
      <authorinitials>jdo</authorinitials>
!     <revremark>first draft</revremark>
     </revision>
    </revhistory>
   </bookinfo>
--- 276,293 ----
      <orgname>CodeSourcery, LLC</orgname>
     </affiliation>
    </author>
!   <copyright><year>2002</year><holder>CodeSourcery, LLC (<ulink url="http://www.codesourcery.com/"></ulink>)</holder></copyright>
!   <contractsponsor>Los Alamos National Laboratory<ulink url="http://www.lanl.gov/"></ulink></contractsponsor>
    <legalnotice>
     <!-- FIXME: What is the correct legal notice? -->
     <para>All rights reserved.  This document may not be redistributed in any form without the express permission of the author.</para>
    </legalnotice>
    <revhistory>
     <revision>
!     <revnumber>1.00</revnumber>
!     <date>2002 Jan 23</date>
      <authorinitials>jdo</authorinitials>
!     <revremark>First publication.</revremark>
     </revision>
    </revhistory>
   </bookinfo>
***************
*** 279,284 ****
--- 295,301 ----
   <!-- FINISH: May we have a short table of contents followed by a -->
   <!-- complete table of contents? -->

+ <![%unfinished;[
   <preface id="preface">
    <title>Preface</title>

***************
*** 338,349 ****
--- 355,369 ----
    </section>

   </preface>
+ ]]>  <!-- end unfinished -->

+ <![%unfinished;[
   <part id="programming">
    <title>Programming with &pooma;</title>

  <!-- FIXME: Add a partintro to the part above? -->
+ ]]>  <!-- end unfinished -->

    &introductory-chapter; 

***************
*** 420,429 ****
      components of each vector in an &array; to form its own &array;.
      Since each container has one or more &engine;s, we can also
      describe the latter category as containers that compute their
!     values using other containers' values.  A <type>MultiPatch</type>
!     &engine; distributes its domain among various processors and
!     memory spaces, each responsible for computing values associated
!     with a portion, or patch, of the domain.</para>

      <para>Just as multiple containers can use the same engine,
      multiple &engine;s can use the same underlying data.  As we
--- 440,449 ----
      components of each vector in an &array; to form its own &array;.
      Since each container has one or more &engine;s, we can also
      describe the latter category as containers that compute their
!     values using other containers' values.  A &multipatch; &engine;
!     distributes its domain among various processors and memory spaces,
!     each responsible for computing values associated with a portion,
!     or patch, of the domain.</para>

      <para>Just as multiple containers can use the same engine,
      multiple &engine;s can use the same underlying data.  As we
***************
*** 491,497 ****
  	<row>
  	 <entry>&dynamic;</entry>
  	 <entry>is a one-dimensional &brick; with dynamically
!          resizable domain.  HERE ever explicitly declare these?</entry>
  	</row>
  	<row rowsep="1">
  	 <entry>&engine;s That Compute</entry>
--- 511,518 ----
  	<row>
  	 <entry>&dynamic;</entry>
  	 <entry>is a one-dimensional &brick; with dynamically
!          resizable domain.  This should be used with &dynamicarray;,
! 	 not &array;.</entry>
  	</row>
  	<row rowsep="1">
  	 <entry>&engine;s That Compute</entry>
***************
*** 620,626 ****
      <methodname>operator()</methodname> take <type>Loc&lt;1&gt;</type>
      or one &int; parameter.  In addition, the one-dimensional domain
      can be dynamically resized using <methodname>create</methodname>
!     and <methodname>destroy</methodname>; see .  

  HERE Dynamic. How does one change the domain size?  What is the model?</para>

--- 641,647 ----
      <methodname>operator()</methodname> take <type>Loc&lt;1&gt;</type>
      or one &int; parameter.  In addition, the one-dimensional domain
      can be dynamically resized using <methodname>create</methodname>
!     and <methodname>destroy</methodname>; see .

  HERE Dynamic. How does one change the domain size?  What is the model?</para>

*************** HERE Dynamic. How does one change the do
*** 696,708 ****
--- 717,780 ----
    <chapter id="views">
     <title>Container Views</title>

+    <indexterm zone="views">
+     <primary>container</primary>
+     <secondary>view</secondary>
+    </indexterm>
+    <indexterm>
+     <primary>view of a container</primary>
+     <see>container, view.</see>
+    </indexterm>
+ 
+ <![%temporary;[
+ 
+    <para>A <glossterm linkend="glossary-view"><firstterm>view of a
+    container &container;</firstterm></glossterm> is a container
+    accessing a subset of &container;'s domain &containerdomain;
+    and values.  The subset can include all of &containerdomain;.
+    A <quote>view</quote> is so named because it is a different way to
+    access, or view, another container's values.  Both the container
+    and its view share the same underlying engine so changing values in
+    one also changes them in the other.</para>
+ 
+    <para>A view is created by following a container's name by
+    parentheses containing a domain &containerdomain;.  For
+    example, consider this code extracted from <xref
+    linkend="tutorial-array_parallel-doof2d"></xref> in <xref
+    linkend="tutorial-array_data_parallel"></xref>.
+ <programlisting>
+ Interval&lt;1&gt; N(0, n-1);
+ Interval&lt;2&gt; vertDomain(N, N);
+ Interval&lt;1&gt; I(1,n-2);
+ Interval&lt;1&gt; J(1,n-2);
+ Array&lt;2, double, Brick&gt; a(vertDomain);
+ Array&lt;2, double, Brick&gt; b(vertDomain);
+ a(I,J) = (1.0/9.0) *
+   (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
+    b(I  ,J+1) + b(I  ,J  ) + b(I  ,J-1) +
+    b(I-1,J+1) + b(I-1,J  ) + b(I-1,J-1));
+ </programlisting>  The last statement creates ten views.  For example,
+ 
+    <statement>a(I,J)</statement> creates a view of
+    <varname>a</varname> using the smaller domain specified by
+    <varname>I</varname> and <varname>J</varname>.  This omits the
+    outermost rows and columns of <varname>a</varname>.  The views
+    of <varname>b</varname> illustrate the use of views in
+    data-parallel statements.  <statement>b(I-1,J-1)</statement> has a
+    subset shifted up one row and left one column compared with
+    <statement>b(I,J)</statement>.</para>
+ ]]>  <!-- end temporary -->
+ <![%unfinished;[
     <para>Be sure to list the various arithmetic operations on domains
     that can be used.  This was deferred from the &array; and domain
     chapter.  Explain &array;'s <function>comp</function> function.</para>

  <!-- FIXME: Finish this chapter. -->
+ ]]>  <!-- end unfinished -->
    </chapter>

+ 
+ <![%unfinished;[
    <chapter id="sequential">
     <title>Writing Sequential Programs</title>

*************** UNFINISHED</para>
*** 1086,1095 ****
      dependence computations, so the &author; recommends calling
      <function>Pooma::blockAndEvaluate</function> before each access to
      a particular value in an &array; or &field;.  Omitting a necessary
!     call may lead to a race condition.  See <xref
      linkend="debugging_profiling-missing_blockandevaluate"></xref> for
      instructions how to diagnose and eliminate these race
!     conditions.</para>

      <para>Where talk about various &pooma; streams?</para>

--- 1158,1171 ----
      dependence computations, so the &author; recommends calling
      <function>Pooma::blockAndEvaluate</function> before each access to
      a particular value in an &array; or &field;.  Omitting a necessary
!     call may lead to a race condition.
! <![%unfinished;[
!     See <xref
      linkend="debugging_profiling-missing_blockandevaluate"></xref> for
      instructions how to diagnose and eliminate these race
!     conditions.
! ]]>  <!-- end unfinished -->
! </para>

      <para>Where talk about various &pooma; streams?</para>

*************** UNFINISHED</para>
*** 1193,1199 ****
        in the input domain: A(i1, i2, ..., iN).</para>

       <para>The &pooma; multidimensional Array concept is similar to
!       the &fortran; 90 array facility, but extends it in several
        ways. Both &pooma; and &fortran; arrays can have up to seven
        dimensions, and can serve as containers for arbitrary
        types. Both support the notion of views of a portion of the
--- 1269,1275 ----
        in the input domain: A(i1, i2, ..., iN).</para>

       <para>The &pooma; multidimensional Array concept is similar to
!       the &fortran; 90 array facility, but extends it in several
        ways. Both &pooma; and &fortran; arrays can have up to seven
        dimensions, and can serve as containers for arbitrary
        types. Both support the notion of views of a portion of the
*************** UNFINISHED</para>
*** 1492,1498 ****
      &pooma; II's expression trees and expression engines.</para>

      <variablelist>
!      <varlistentry><term><type>MultiPatch</type> Engine</term>
        <listitem><para>From <filename
        class="libraryfile">README</filename>: To actually use multiple
        contexts effectively, you need to use the MultiPatch engine with
--- 1568,1574 ----
      &pooma; II's expression trees and expression engines.</para>

      <variablelist>
!      <varlistentry><term>&multipatch; Engine</term>
        <listitem><para>From <filename
        class="libraryfile">README</filename>: To actually use multiple
        contexts effectively, you need to use the MultiPatch engine with
*************** UNFINISHED</para>
*** 1508,1515 ****
--- 1584,1593 ----

     </section>
    </chapter>
+ ]]>  <!-- end unfinished -->

+ <![%unfinished;[
    <chapter id="parallel">
     <title>Writing Distributed Programs</title>

*************** UNFINISHED</para>
*** 1562,1569 ****
--- 1640,1649 ----
     </section>

    </chapter>
+ ]]>  <!-- end unfinished -->

+ <![%unfinished;[
    <chapter id="debugging_profiling">
     <title>Debugging and Profiling &pooma; Programs</title>

*************** UNFINISHED</para>
*** 1607,1615 ****
--- 1687,1700 ----
        region's size should reveal where calls are missing.</para>
      </section>
    </chapter>
+ ]]>  <!-- end unfinished -->
+ 

+ <![%unfinished;[
   </part>
+ ]]>  <!-- end unfinished -->

+ <![%unfinished;[
   <part id="reference">
    <title>&pooma; Reference Manual</title>

*************** UNFINISHED</para>
*** 3489,3496 ****
--- 3574,3583 ----
     </itemizedlist>
    </chapter>
   </part>
+ ]]>  <!-- end unfinished -->

+ <![%unfinished;[
   <appendix id="future_development">
    <title>Future Development</title>

*************** UNFINISHED</para>
*** 3610,3615 ****
--- 3697,3703 ----
    </section>

   </appendix>
+ ]]>  <!-- end unfinished -->

   <appendix id="installation">
*************** UNFINISHED</para>
*** 3644,3650 ****
       <orderedlist spacing="compact">
  	<listitem>
  	 <para>Download the library from the &pooma; Download page
!          available off the &pooma; home page (&poomaHomePage;).</para>
  	</listitem>
  	<listitem>
  	 <para>Extract the source code using <command>tar xzvf
--- 3732,3738 ----
       <orderedlist spacing="compact">
  	<listitem>
  	 <para>Download the library from the &pooma; Download page
!          available off the &pooma; home page (&poomahomepage;).</para>
  	</listitem>
  	<listitem>
  	 <para>Extract the source code using <command>tar xzvf
*************** UNFINISHED</para>
*** 3715,3721 ****
       <orderedlist spacing="compact">
        <listitem>
         <para>Download the library from the &pooma; Download page
!        available off the &pooma; home page (&poomaHomePage;).</para>
        </listitem>
        <listitem>
         <para>Extract the source code using <command>tar xzvf
--- 3803,3809 ----
       <orderedlist spacing="compact">
        <listitem>
         <para>Download the library from the &pooma; Download page
!        available off the &pooma; home page (&poomahomepage;).</para>
        </listitem>
        <listitem>
         <para>Extract the source code using <command>tar xzvf
*************** UNFINISHED</para>
*** 3863,3868 ****
--- 3951,3957 ----
   </appendix>

+ <![%unfinished;[
   <appendix id="compilation_errors">
    <title>Dealing with Compilation Errors</title>

*************** UNFINISHED</para>
*** 4039,4044 ****
--- 4128,4134 ----
    </section>

   </appendix>
+ ]]>  <!-- end unfinished -->

   &bibliography-chapter;
Index: template.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/template.xml,v
retrieving revision 1.1
diff -c -p -r1.1 template.xml
*** template.xml	2002/01/14 17:33:34	1.1
--- template.xml	2002/01/24 04:56:39
***************
*** 1,7 ****
    <chapter id="template_programming">
     <title>Programming with Templates</title>

!    <para>&pooma; extensively uses &cc; templates to support type
     polymorphism without incurring any run-time cost.  In this chapter,
     we briefly introduce using templates in &cc; programs by relating
     them to <quote>ordinary</quote> &cc; constructs such as values,
--- 1,16 ----
    <chapter id="template_programming">
     <title>Programming with Templates</title>

!    <indexterm zone="template_programming">
!     <primary>templates</primary>
!    </indexterm>
!    <indexterm>
!     <primary>template programming</primary>
!     <see>templates</see>
!    </indexterm>
! 
!    <para>&pooma; extensively uses &cc; <glossterm
!    linkend="glossary-template">template</glossterm>s to support type
     polymorphism without incurring any run-time cost.  In this chapter,
     we briefly introduce using templates in &cc; programs by relating
     them to <quote>ordinary</quote> &cc; constructs such as values,
***************
*** 9,69 ****
     templates will occur repeatedly:
     <itemizedlist>
       <listitem>
!       <para>Template programming occurs at compile time, not run time.
!       That is, template operations occur within the compiler, not when
!       a program runs.</para>
       </listitem>
       <listitem>
!       <para>Templates permit declaring families of classes using a
!       single declaration.  For example, the &array; template
!       declaration permits using arrays with many different value
        types, e.g., arrays of integers, arrays of floating point
        numbers, and arrays of arrays.</para>
      </listitem>
     </itemizedlist>
!    For those interested in the implementation of &pooma;, we close
!    with a discussion of some template programming concepts used in the
!    implementation but not likely to be used by &pooma; users.</para>

     <section id="template_programming-compile_time">
!     <title>Templates Occur at Compile-Time</title>

      <para>&pooma; uses &cc; templates to support type polymorphism
      without incurring any run-time cost as a program executes.  All
      template operations are performed at compile time by the
      compiler.</para>

!     <para>Prior to the introduction of templates, almost all a
      program's interesting computation occurred when it was executed.
      When writing the program, the programmer, at <glossterm
      linkend="glossary-programming_time"><firstterm>programming
!     time</firstterm></glossterm>, would specify which statements and
!     expressions would occur and which types to use.  At <glossterm
      linkend="glossary-compile_time"><firstterm>compile
      time</firstterm></glossterm>, the compiler would convert the
      program's source code into an executable program.  Even though the
      compiler uses the types to produce the executable, no interesting
      computation would occur.  At <glossterm
      linkend="glossary-run_time"><firstterm>run
!     time</firstterm></glossterm>, the resulting executable program
      would actually perform the operations.</para>

      <para>The introduction of templates permits interesting
      computation to occur while the compiler produces the executable.
!     Most interesting is template instantiation, which produces a type
      at compile time.  For example, the &array; <quote>type</quote>
      definition requires template parameters <varname>Dim</varname>,
      <varname>T</varname>, and <varname>EngineTag</varname>, specifying
!     its dimension, the type of its elements, and its &engine; type.  To
      use this, a programmer specifies values for the template
      parameters:
      <statement><type>Array&lt;2,double,Brick&gt;</type></statement>
!     specifies a dimension of 2, an element type of &double;, and the
!     &brick; &engine; type.  At compile time, the compiler creates a type
!     definition by substituting the values for the template parameters
!     in the template definition.  The substitution is analogous to the
!     run-time application of a function to specific values.</para>

      <para>All computation not involving run-time input or output can
      occur at program time, compile time, or run time, whichever is
--- 18,107 ----
     templates will occur repeatedly:
     <itemizedlist>
       <listitem>
!       <para>Template programming constructs execute at compile time,
!       not run time.  That is, template operations occur within the
!       compiler, not when a program runs.</para>
       </listitem>
       <listitem>
!       <para id="template_programming-introduction-main_uses-type_polymorphism">Templates permit declaring families of classes using a
!       single declaration.  For example, the &array;
!       <indexterm>
!        <primary>&array;</primary>
!       </indexterm>
!       <indexterm zone="template_programming-introduction-main_uses-type_polymorphism">
!        <primary>type polymorphism</primary>
!       </indexterm>
!       template
!       declaration permits using &array;s with many different value
        types, e.g., arrays of integers, arrays of floating point
        numbers, and arrays of arrays.</para>
      </listitem>
     </itemizedlist>
!    For those interested in the implementation of &pooma;, we close the
!    section with a discussion of some template programming concepts
!    used in the implementation but not likely to be used by &pooma;
!    users.</para>

     <section id="template_programming-compile_time">
!     <title>Templates Execute at Compile-Time</title>
! 
!     <indexterm zone="template_programming-compile_time">
!      <primary>compile time</primary>
!     </indexterm>
!     <indexterm zone="template_programming-compile_time">
!      <primary>compiler</primary>
!     </indexterm>

      <para>&pooma; uses &cc; templates to support type polymorphism
      without incurring any run-time cost as a program executes.  All
      template operations are performed at compile time by the
      compiler.</para>

!     <para>Prior to the introduction of templates, almost all of a
      program's interesting computation occurred when it was executed.
      When writing the program, the programmer, at <glossterm
      linkend="glossary-programming_time"><firstterm>programming
!     time</firstterm></glossterm>,
!     <indexterm>
!      <primary>programming time</primary>
!     </indexterm>
!     would specify which statements and expressions would occur and
!     which types to use.  At <glossterm
      linkend="glossary-compile_time"><firstterm>compile
      time</firstterm></glossterm>, the compiler would convert the
      program's source code into an executable program.  Even though the
      compiler uses the types to produce the executable, no interesting
      computation would occur.  At <glossterm
      linkend="glossary-run_time"><firstterm>run
!     time</firstterm></glossterm>,
!     <indexterm>
!      <primary>run time</primary>
!     </indexterm>
!     the resulting executable program
      would actually perform the operations.</para>

      <para>The introduction of templates permits interesting
      computation to occur while the compiler produces the executable.
!     Most interesting is template instantiation,
!     <indexterm>
!      <primary>template</primary>
!      <secondary>instantiation</secondary>
!     </indexterm>
!     which produces a type
      at compile time.  For example, the &array; <quote>type</quote>
      definition requires template parameters <varname>Dim</varname>,
      <varname>T</varname>, and <varname>EngineTag</varname>, specifying
!     its dimension, the type of its values, and its &engine; type.  To
      use this, a programmer specifies values for the template
      parameters:
      <statement><type>Array&lt;2,double,Brick&gt;</type></statement>
!     specifies a dimension of 2, a value type of &double;, and the
!     &brick; &engine; type.  At compile time, the compiler creates a
!     type definition by substituting the values for the template
!     parameters in the templatized type definition.  The substitution
!     is analogous to the run-time application of a function to specific
!     values.</para>

      <para>All computation not involving run-time input or output can
      occur at program time, compile time, or run time, whichever is
***************
*** 71,83 ****
      computations by hand rather than writing code to compute it.  &cc;
      templates are Turing-complete so they can compute anything
      computable.  Unfortunately, syntax for compile-time computation is
!     more difficult than for run-time computation, and also current
      compilers are not as efficient as code executed by hardware.
!     Run-time &cc; constructs are Turing-complete so using templates is
      unnecessary.  Thus, we can shift computation to the time which
      best trades off the ease of expressing syntax with the speed of
      computation by programmer, compiler, or computer chip.  For
!     example, &pooma; uses expression template technology to speed
      run-time execution of data-parallel statements.  The &pooma;
      developers decided to shift some of the computation from run-time
      to compile-time using template computations.  The resulting
--- 109,129 ----
      computations by hand rather than writing code to compute it.  &cc;
      templates are Turing-complete so they can compute anything
      computable.  Unfortunately, syntax for compile-time computation is
!     more difficult than for run-time computation.  Also, current
      compilers are not as efficient as code executed by hardware.
!     Run-time &cc; constructs are Turing-complete
!     <indexterm>
!      <primary>Turing complete</primary>
!     </indexterm>
!     so using templates is
      unnecessary.  Thus, we can shift computation to the time which
      best trades off the ease of expressing syntax with the speed of
      computation by programmer, compiler, or computer chip.  For
!     example, &pooma; uses expression template technology
!     <indexterm>
!      <primary>expression templates</primary>
!     </indexterm>
!     to speed
      run-time execution of data-parallel statements.  The &pooma;
      developers decided to shift some of the computation from run-time
      to compile-time using template computations.  The resulting
***************
*** 100,111 ****
         parameters, both of which are used in this book.</para>
        </listitem>
        <listitem>
!        <para>template instantiation, i.e., specifying a particular
!        type by specifying values for template parameters.</para>
        </listitem>
        <listitem>
!        <para>nested type names, which are types specified within a
!        class definition.</para>
        </listitem>
       </itemizedlist>
      We discuss each of these below.</para>
--- 146,170 ----
         parameters, both of which are used in this book.</para>
        </listitem>
        <listitem>
!        <para>template instantiation,
!        <indexterm>
! 	<primary>template</primary>
! 	<secondary>instantiation</secondary>
!        </indexterm>
!        i.e., specifying a particular type by specifying values for
!        template parameters.</para>
        </listitem>
        <listitem>
!        <para>nested type names,
!        <indexterm>
! 	<primary>nested type</primary>
! 	<see>type, nested.</see>
!        </indexterm>
!        <indexterm>
! 	<primary>type</primary>
! 	<secondary>nested</secondary>
!        </indexterm>
!        which are types specified within a class definition.</para>
        </listitem>
       </itemizedlist>
      We discuss each of these below.</para>
***************
*** 174,179 ****
--- 233,242 ----
      brackets (<statement>&lt;&gt;</statement>).  For example,
      <type>pair&lt;int&gt;</type> <glossterm
      linkend="glossary-template_instantiation"><firstterm>instantiates</firstterm></glossterm>
+     <indexterm>
+      <primary>template</primary>
+      <secondary>instantiation</secondary>
+     </indexterm>
      the <classname>pair</classname> template class definition with
      <varname>T</varname> equal to &int;.  That is, the compiler
      creates a definition for <type>pair&lt;int&gt;</type> by copying
***************
*** 184,193 ****
      The result is a definition exactly the same as
      <classname>pairOfInts</classname>.</para>

!      <table frame="none" colsep="0" rowsep="0" tocentry="1"
! 	    orient="port" pgwide="0"
! 	    id="template_programming-template_use-correspondence_table">
!       <title>Correspondences Between Run-Time and Compile-Time
       Programming Constructs</title>

        <tgroup cols="3" align="left">
--- 247,286 ----
      The result is a definition exactly the same as
      <classname>pairOfInts</classname>.</para>

!     <para>As we mentioned above, template instantiation
!     <indexterm>
!      <primary>template</primary>
!      <secondary>instantiation</secondary>
!     </indexterm>
!     is analogous to function application.
!     <indexterm>
!      <primary>function</primary>
!      <secondary>application</secondary>
!     </indexterm>
!     A template class is analogous to a
!     function.  The analogy between compile-time and run-time
!     programming constructs can be extended.  <xref
!     linkend="template_programming-template_use-correspondence_table"></xref>
!     lists these correspondences.  For example, at run time, values
!     consist of things such as integers, floating point numbers,
!     pointers, functions, and objects.  Programs compute by operating
!     on these values.  The compile-time values
!     <indexterm>
!      <primary>compile time</primary>
!      <secondary>value</secondary>
!     </indexterm>
!     include types, and
!     compile-time operations use these types.  For both run-time and
!     compile-time programming, &cc; defines default sets of values that
!     all conforming compilers must support.  For example,
!     <statement>3</statement> and <statement>6.022e+23</statement> are
!     run-time values that any &cc; compiler must accept.  It must also
!     accept the &int;, &bool;, and <type>int*</type> types.</para>
! 
!     <table frame="none" colsep="0" rowsep="0" tocentry="1"
! 	   orient="port" pgwide="0"
! 	   id="template_programming-template_use-correspondence_table">
!      <title>Correspondences Between Run-Time and Compile-Time
       Programming Constructs</title>

        <tgroup cols="3" align="left">
***************
*** 198,204 ****
  	 <entry>compile time</entry>
  	</row>
         </thead>
!        <tbody>
  	<row>
  	 <entry>values</entry>
  	 <entry>integers, strings, objects, functions, …</entry>
--- 291,297 ----
  	 <entry>compile time</entry>
  	</row>
         </thead>
!        <tbody valign="top">
  	<row>
  	 <entry>values</entry>
  	 <entry>integers, strings, objects, functions, …</entry>
***************
*** 222,236 ****
  	</row>
  	<row>
  	 <entry>packaging repeated operations</entry>
! 	 <entry>A function generalizes a particular operation applied to
! 	different values.  The function parameters are placeholders
! 	for particular values.</entry>
! 	 <entry>A template class generalizes a particular class
! 	definition using different types.  The template parameters are
! 	placeholders for particular values.</entry>
  	</row>
  	<row>
! 	 <entry>application</entry>
  	 <entry>Use a function by appending function arguments
  	surrounded by parentheses.</entry>
  	 <entry>Use a template class by appending template arguments
--- 315,342 ----
  	</row>
  	<row>
  	 <entry>packaging repeated operations</entry>
! 	 <entry>A function
!          <indexterm>
!           <primary>function</primary>
!          </indexterm>
!          generalizes a particular operation applied to different
!          values.  The function parameters are placeholders for
!          particular values.</entry>
!          <entry>A template class generalizes a particular class
!          definition using different types.  The template parameters
!          are placeholders for particular values.</entry>
  	</row>
  	<row>
! 	 <entry>application
!          <indexterm>
!           <primary>function</primary>
!           <secondary>application</secondary>
!          </indexterm>
!          <indexterm>
!           <primary>application</primary>
!           <see>function, application.</see>
!          </indexterm>
!          </entry>
  	 <entry>Use a function by appending function arguments
  	surrounded by parentheses.</entry>
  	 <entry>Use a template class by appending template arguments
***************
*** 239,262 ****
         </tbody>
        </tgroup>
       </table>
- 
-     <para>As we mentioned above, template instantiation is analogous
-     to function application.  A template class is analogous to a
-     function.  The analogy between compile-time and run-time
-     programming constructs can be extended.  <xref
-     linkend="template_programming-template_use-correspondence_table"></xref>
-     lists these correspondences.  For example, at run time, values
-     consist of things such as integers, floating point numbers,
-     pointers, functions, and objects.  Programs compute by operating
-     on these values.  The compile-time values include types, and
-     compile-time operations use these types.  For both run-time and
-     compile-time programming, &cc; defines default sets of values that
-     all conforming compilers must support.  For example,
-     <statement>3</statement> and <statement>6.022e+23</statement> are
-     run-time values that any &cc; compiler must accept.  It must also
-     accept the &int;, &bool;, and <type>int*</type> types.</para>

!     <para>The set of supported run-time and compile-time values can be
      extended.  Run-time values can be extended by creating new
      objects.  Although not part of the default set of values, these
      objects are treated and operated on as values.  To extend the set
--- 345,359 ----
         </tbody>
        </tgroup>
       </table>

!     <para id="template_programming-template_use-extensions">
!     <indexterm zone="template_programming-template_use-extensions">
!      <primary>object</primary>
!     </indexterm>
!     <indexterm zone="template_programming-template_use-extensions">
!      <primary>class definition</primary>
!     </indexterm>
!     The set of supported run-time and compile-time values can be
      extended.  Run-time values can be extended by creating new
      objects.  Although not part of the default set of values, these
      objects are treated and operated on as values.  To extend the set
***************
*** 268,282 ****
      built-in types, these types can be used in the same way that any
      other types can be used, e.g., declaring variables.</para>

!     <para>Functions generalize similar run-time operations, while
      template class generalize similar class definitions.  A function
      definition generalizes a repeated run-time operation.  For
      example, consider repeatedly printing the largest of two numbers:
  <programlisting>
! std::cout << (3 > 4 ? 3 : 4) << std::endl;
! std::cout << (4 > -13 ? 4 : -13) << std::endl;
! std::cout << (23 > 4 ? 23 : 4) << std::endl;
! std::cout << (0 > 3 ? 0 : 3) << std::endl;
  </programlisting>  Each statement is exactly the same except for the
  repeated two values.  Thus, we can generalize these statements writing
  a function:
--- 365,383 ----
      built-in types, these types can be used in the same way that any
      other types can be used, e.g., declaring variables.</para>

!     <para id="template_programming-template_use-functions">
!     <indexterm zone="template_programming-template_use-functions">
!      <primary>function</primary>
!     </indexterm>
!     Functions generalize similar run-time operations, while
      template class generalize similar class definitions.  A function
      definition generalizes a repeated run-time operation.  For
      example, consider repeatedly printing the largest of two numbers:
  <programlisting>
! std::cout &openopen; (3 > 4 ? 3 : 4) &openopen; std::endl;
! std::cout &openopen; (4 > -13 ? 4 : -13) &openopen; std::endl;
! std::cout &openopen; (23 > 4 ? 23 : 4) &openopen; std::endl;
! std::cout &openopen; (0 > 3 ? 0 : 3) &openopen; std::endl;
  </programlisting>  Each statement is exactly the same except for the
  repeated two values.  Thus, we can generalize these statements writing
  a function:
*************** void maxOut(int a, int b)
*** 285,294 ****
  { std::cout &openopen; (a > b ? a : b) &openopen; std::endl; }
  </programlisting>  The function's body consists of the statement with
  variables substituted for the two particular values.  Each parameter
! is a placeholder that, when used, holds one particular value among the
! set of possible integral values.  The function must be named to permit
! its use, and declarations for its two parameters follow.  Using the
! function simplifies the code:
  <programlisting>
  maxOut(3, 4);
  maxOut(4, -13);
--- 386,395 ----
  { std::cout &openopen; (a > b ? a : b) &openopen; std::endl; }
  </programlisting>  The function's body consists of the statement with
  variables substituted for the two particular values.  Each parameter
! variable is a placeholder that, when used, holds one particular value
! among the set of possible integral values.  The function must be named
! to permit its use, and declarations for its two parameters follow.
! Using the function simplifies the code:
  <programlisting>
  maxOut(3, 4);
  maxOut(4, -13);
*************** maxOut(0, 3);
*** 298,306 ****
      parentheses surrounding specific values for its parameters, but
      the function's return type is omitted.</para>

!     <para>A template class definition generalizes repeated class
      definitions.  If two class definitions differ only in a few types,
!     template parameters can be substituted.  Each parameter is a
      placeholder that, when used, holds one particular value, i.e.,
      type, among the set of possible values.  The class definition is
      named to permit its use, and declarations for its parameters
--- 399,417 ----
      parentheses surrounding specific values for its parameters, but
      the function's return type is omitted.</para>

!     <para id="template_programming-template_use-template_class">
!     <indexterm zone="template_programming-template_use-template_class">
!      <primary>template</primary>
!      <secondary>definition</secondary>
!     </indexterm>
!     A template class definition generalizes repeated class
      definitions.  If two class definitions differ only in a few types,
!     template parameters
!     <indexterm>
!      <primary>template</primary>
!      <secondary>parameter</secondary>
!     </indexterm>
!     can be substituted.  Each parameter is a
      placeholder that, when used, holds one particular value, i.e.,
      type, among the set of possible values.  The class definition is
      named to permit its use, and declarations for its parameters
*************** maxOut(0, 3);
*** 313,323 ****
      Note the notation for the template class parameters.
      <statement>template <typename T></statement>
      <emphasis>precedes</emphasis> the class definition.  The keyword
!     <keywordname>typename</keywordname> indicates the template
      parameter is a type.  <varname>T</varname> is the template
      parameter's name.  (We could have used any other identifier such
      as <varname>pairElementType</varname> or <varname>foo</varname>.)
!     Note that using <keywordname>class</keywordname> is equivalent to
      using <keywordname>typename</keywordname> so <statement>template
      <class T></statement> is equivalent to <statement>template
      <typename T></statement>.  While declaring a template class
--- 424,442 ----
      Note the notation for the template class parameters.
      <statement>template <typename T></statement>
      <emphasis>precedes</emphasis> the class definition.  The keyword
!     <keywordname>typename</keywordname>
!     <indexterm>
!      <primary><keywordname>typename</keywordname></primary>
!     </indexterm>
!     indicates the template
      parameter is a type.  <varname>T</varname> is the template
      parameter's name.  (We could have used any other identifier such
      as <varname>pairElementType</varname> or <varname>foo</varname>.)
!     Note that using <keywordname>class</keywordname>
!     <indexterm>
!      <primary><keywordname>class</keywordname></primary>
!     </indexterm>
!     is equivalent to
      using <keywordname>typename</keywordname> so <statement>template
      <class T></statement> is equivalent to <statement>template
      <typename T></statement>.  While declaring a template class
*************** maxOut(0, 3);
*** 327,336 ****
      for its parameters.  As we showed above,
      <statement>pair<int></statement> <glossterm
      linkend="glossary-template_instantiation">instantiates</glossterm>
      the template class <classname>pair</classname> with &int; for its
      type parameter <varname>T</varname>.</para>

!     <para>In template programming, nested type names store
      compile-time data that can be used within template classes.  Since
      compile-time class definitions are analogous to run-time objects
      and the latter stores named values, nested type names are values,
--- 446,468 ----
      for its parameters.  As we showed above,
      <statement>pair<int></statement> <glossterm
      linkend="glossary-template_instantiation">instantiates</glossterm>
+     <indexterm>
+      <primary>template</primary>
+      <secondary>instantiation</secondary>
+     </indexterm>
      the template class <classname>pair</classname> with &int; for its
      type parameter <varname>T</varname>.</para>

!     <para id="template_programming-template_use-nested_types">
!     <indexterm zone="template_programming-template_use-nested_types">
!      <primary>type</primary>
!      <secondary>nested</secondary>
!     </indexterm>
!     <indexterm>
!      <primary>nested type</primary>
!      <see>type, nested.</see>
!     </indexterm>
!     In template programming, nested type names store
      compile-time data that can be used within template classes.  Since
      compile-time class definitions are analogous to run-time objects
      and the latter stores named values, nested type names are values,
*************** maxOut(0, 3);
*** 338,349 ****
      template class &array; has an nested type name for the type of its
      domain:
  <programlisting>
! 		 typedef typename Engine_t::Domain_t Domain_t;
! </programlisting> This <keywordname>typedef</keywordname>, i.e., type
      definition, defines the type <type>Domain_t</type> as equivalent
      to <type>Engine_t::Domain_t</type>.  The
!     <operator>::</operator> operator selects the
!     <type>Domain_t</type> nested type from inside the
      <type>Engine_t</type> type.  This illustrates how to access
      &array;'s <type>Domain_t</type> when not within &array;'s scope:
      <type>Array<Dim, T, EngineTag>::Domain_t</type>.  The
--- 470,493 ----
      template class &array; has an nested type name for the type of its
      domain:
  <programlisting>
! typedef typename Engine_t::Domain_t Domain_t;
! </programlisting> This <keywordname>typedef</keywordname>,
!     <indexterm>
!      <primary><keywordname>typedef</keywordname></primary>
!      <see>type, definition.</see>
!     </indexterm>
!     <indexterm>
!      <primary>type</primary>
!      <secondary>definition</secondary>
!     </indexterm>
!     i.e., type
      definition, defines the type <type>Domain_t</type> as equivalent
      to <type>Engine_t::Domain_t</type>.  The
!     <operator>::</operator> operator
!     <indexterm>
!      <primary><operator>::</operator> operator</primary>
!     </indexterm>
!     selects the <type>Domain_t</type> nested type from inside the
      <type>Engine_t</type> type.  This illustrates how to access
      &array;'s <type>Domain_t</type> when not within &array;'s scope:
      <type>Array<Dim, T, EngineTag>::Domain_t</type>.  The
*************** maxOut(0, 3);
*** 363,371 ****
      &poomatoolkit;.  In this section, we present template programming
      techniques used to implement &pooma;.  We extend the
      correspondence between compile-time template programming
!     constructs and run-time constructs.  Reading this section is not
!     necessary unless you wish to understand how &pooma; is
!     implemented.</para>

      <para>In the previous section, we used a correspondence between
      run-time and compile-time programming constructs to introduce
--- 507,515 ----
      &poomatoolkit;.  In this section, we present template programming
      techniques used to implement &pooma;.  We extend the
      correspondence between compile-time template programming
!     constructs and run-time constructs started in the previous
!     section.  Reading this section is not necessary unless you wish to
!     understand how &pooma; is implemented.</para>

      <para>In the previous section, we used a correspondence between
      run-time and compile-time programming constructs to introduce
*************** maxOut(0, 3);
*** 390,396 ****
  	<entry>compile time</entry>
         </row>
        </thead>
!       <tbody>
         <row>
  	<entry>values</entry>
  	<entry>integers, strings, objects, functions, …</entry>
--- 534,540 ----
  	<entry>compile time</entry>
         </row>
        </thead>
!       <tbody valign="top">
         <row>
  	<entry>values</entry>
  	<entry>integers, strings, objects, functions, …</entry>
*************** maxOut(0, 3);
*** 414,430 ****
  	<entry>values stored in a collection</entry>
  	<entry>An object stores values.</entry>
  	<entry>A <glossterm linkend="glossary-traits_class">traits
! 	class</glossterm> contains values describing a type.</entry>
         </row>
         <row>
  	<entry>extracting values from collections</entry>
  	<entry>An object's named values are extracted using the
! 	<operator>.</operator> operator</entry>
  	<entry>A class's nested types and classes are extracted using
! 	the <operator>::</operator> operator.</entry>
         </row>
         <row>
! 	<entry>control flow to choose among operations</entry>
  	<entry><keywordname>if</keywordname>, <keywordname>while</keywordname>, <keywordname>goto</keywordname>, …</entry>
  	<entry>template class specializations with pattern matching</entry>
         </row>
--- 558,595 ----
  	<entry>values stored in a collection</entry>
  	<entry>An object stores values.</entry>
  	<entry>A <glossterm linkend="glossary-traits_class">traits
! 	class</glossterm>
!         <indexterm>
!          <primary>traits class</primary>
!         </indexterm>
!         <indexterm>
!          <primary>class</primary>
!          <secondary>traits</secondary>
!          <see>traits class</see>
!         </indexterm>
!         contains values describing a type.</entry>
         </row>
         <row>
  	<entry>extracting values from collections</entry>
  	<entry>An object's named values are extracted using the
! 	<operator>.</operator> operator.
!         <indexterm>
!          <primary><operator>.</operator> operator</primary>
!         </indexterm>
!         </entry>
  	<entry>A class's nested types and classes are extracted using
! 	the <operator>::</operator> operator.
!         <indexterm>
!          <primary><operator>::</operator> operator</primary>
!         </indexterm>
!         </entry>
         </row>
         <row>
! 	<entry>control flow
!         <indexterm>
!          <primary>control flow</primary>
!         </indexterm>
!         to choose among operations</entry>
  	<entry><keywordname>if</keywordname>, <keywordname>while</keywordname>, <keywordname>goto</keywordname>, …</entry>
  	<entry>template class specializations with pattern matching</entry>
         </row>
*************** maxOut(0, 3);
*** 432,444 ****
       </tgroup>
      </table>

!     <para>The only compile-time value described in the previous
!     section was types, but any compile-time constant can also be used.
      Integral literals, <keywordname>const</keywordname> variables, and
      other constructs can be used, but the main use is enumerations.
      An <glossterm
      linkend="glossary-enumeration"><firstterm>enumeration</firstterm></glossterm>
!     enumeration is a distinct integral type with named constants.  For
      example, the &array; declaration declares two separate
      enumerations:
  <programlisting>
--- 597,619 ----
       </tgroup>
      </table>

!     <para>
!     <indexterm class="startofrange"
! 	       id="template_programming-pooma_implementation-index-enumeration">
!      <primary>enumeration</primary>
!     </indexterm>
!     <indexterm class="startofrange"
! 	       id="template_programming-pooma_implementation-index-compile_time_values">
!      <primary>compile time</primary>
!      <secondary>value</secondary>
!     </indexterm>
!     The only compile-time values described in the previous
!     section were types, but any compile-time constant can also be used.
      Integral literals, <keywordname>const</keywordname> variables, and
      other constructs can be used, but the main use is enumerations.
      An <glossterm
      linkend="glossary-enumeration"><firstterm>enumeration</firstterm></glossterm>
!     is a distinct integral type with named constants.  For
      example, the &array; declaration declares two separate
      enumerations:
  <programlisting>
*************** enum { dimensionPlusRank = dimensions + 
*** 480,502 ****
       </listitem>
      </itemizedlist>
      The use of non-integral constant values such as floating-point
!     numbers at compile time is restricted.</para>
! 
!     <para>Other compile-time values include pointers and references to
!     objects and functions and executable code.  For example, a pointer
!     to a function sometimes is passed to a template function to
!     perform a specific task.  Even though executable code cannot be
!     directly represented in a program, it is a compile-time value
!     which the compiler uses.  A simple example is a class that is
!     created by template instantiation, e.g.,
!     <type>pair<int></type>.  Conceptually, the &int; template
      argument is substituted throughout the <type>pair</type> template
      class to produce a class definition.  Although neither the
      programmer nor the user sees this class definition, it is
      represented inside the compiler, which can use and manipulate the
      code.</para>
! 
!     <para>Through template programming, the compiler's optimizer can
      transform complicated code into much simpler code.  In <xref
      linkend="data_parallel-implementation"></xref>, we describe the
      complicated template code used to implement efficiently
--- 655,723 ----
       </listitem>
      </itemizedlist>
      The use of non-integral constant values such as floating-point
!     numbers at compile time is restricted.
!     <indexterm class="endofrange"
! 	       startref="template_programming-pooma_implementation-index-enumeration">
!     </indexterm>
!     </para>
! 
!     <para>Other compile-time values include pointers
!     <indexterm>
!      <primary>pointer</primary>
!     </indexterm>
!     to objects and
!     functions, references
!     <indexterm>
!      <primary>reference</primary>
!     </indexterm>
!     to objects and functions, and executable
!     code.  For example, a pointer to a function
!     <indexterm>
!      <primary>pointer</primary>
!      <secondary>function</secondary>
!     </indexterm>
!     <indexterm>
!      <primary>function pointer</primary>
!      <see>pointer, function.</see>
!     </indexterm>
!     sometimes is passed to
!     a template function to perform a specific task.  Even though
!     executable code
!     <indexterm>
!      <primary>executable code</primary>
!     </indexterm>
!     cannot be directly represented in a program, it is
!     a compile-time value which the compiler uses.  A simple example is
!     a class that is created by template instantiation,
!     <indexterm>
!      <primary>template</primary>
!      <secondary>instantiation</secondary>
!     </indexterm>
!     e.g., <type>pair<int></type>.  Conceptually, the &int; template
      argument is substituted throughout the <type>pair</type> template
      class to produce a class definition.  Although neither the
      programmer nor the user sees this class definition, it is
      represented inside the compiler, which can use and manipulate the
      code.</para>
!     <indexterm class="endofrange"
! 	       startref="template_programming-pooma_implementation-index-compile_time_values">
!     </indexterm>
! 
!     <para id="template_programming-pooma_implementation-optimization">
!     Through template programming, the compiler's optimizer
!     <indexterm>
!      <primary>optimizer</primary>
!      <see>compiler, optimizer.</see>
!     </indexterm>
!     <indexterm>
!      <primary>optimization</primary>
!      <see>compiler, optimizer.</see>
!     </indexterm>
!     <indexterm zone="template_programming-pooma_implementation-optimization">
!      <primary>compiler</primary>
!      <secondary>optimizer</secondary>
!     </indexterm>
!     can
      transform complicated code into much simpler code.  In <xref
      linkend="data_parallel-implementation"></xref>, we describe the
      complicated template code used to implement efficiently
*************** struct usuallySimpleClass<false> {
*** 537,544 ****
      compilers that translate &cc; code into &c; code may permit
      inspecting the resulting code.  For example, using the
      <option>&dashdash;keep_gen_c</option> command-line option with the
!     KAI &cc; compiler<!-- FIXME: Reference or link? --> creates a file
!     containing the result of intermediate code.  Unfortunately,
      reading and understanding the code is frequently difficult.
      Perhaps future &cc; compilers will support easy inspection of
      optimized code.</para>
--- 758,775 ----
      compilers that translate &cc; code into &c; code may permit
      inspecting the resulting code.  For example, using the
      <option>&dashdash;keep_gen_c</option> command-line option with the
!     <application class="software">KAI &cc; compiler</application>
!     <!-- FIXME: Reference or link? -->
!     <indexterm>
!      <primary>compiler</primary>
!      <secondary>KAI</secondary>
!     </indexterm>
!     <indexterm>
!      <primary><application class="software">KAI &cc; compiler</application></primary>
!      <see>compiler, KAI.</see>
!     </indexterm>
!     creates a file
!     containing the intermediate code.  Unfortunately,
      reading and understanding the code is frequently difficult.
      Perhaps future &cc; compilers will support easy inspection of
      optimized code.</para>
*************** struct usuallySimpleClass<false> {
*** 550,556 ****
      <operator>></operator> and <operator>==</operator>.  At run
      time, the category of strings can be compared using
      <operator>==</operator> and characters can be extracted using
!     subscripts and the <operator>[]</operator> operator.  Compile-time
      operations are more limited.  Types may be declared and used.  The
      <keywordname>sizeof</keywordname> operator yields the number of
      bytes to represent an object of the specified type.  Enumerations,
--- 781,787 ----
      <operator>></operator> and <operator>==</operator>.  At run
      time, the category of strings can be compared using
      <operator>==</operator> and characters can be extracted using
!     subscripts with the <operator>[]</operator> operator.  Compile-time
      operations are more limited.  Types may be declared and used.  The
      <keywordname>sizeof</keywordname> operator yields the number of
      bytes to represent an object of the specified type.  Enumerations,
*************** struct usuallySimpleClass<false> {
*** 562,582 ****
      used as template arguments.  At compile time, pointers and
      references to objects and functions can be used as template
      arguments, while the category of executable code supports no
!     operations.  (The compiler's optimizer may simplify it,
!     though.)</para>
! 
!     <para>At run time, an object can store multiple values, each
      having its own name.  For example, a <type>pair<int></type>
      object <varname>p</varname> stores two &int;s named
      <methodname>left_</methodname> and
      <methodname>right_</methodname>.  The <operator>.</operator>
!     operator extracts a named member from an object:
      <statement>p.left_</statement>.  At compile time, a class can
      store multiple values, each having its own name.  These are
      sometimes called <glossterm
      linkend="glossary-traits_class"><firstterm>traits
      classes</firstterm></glossterm>.  For example, implementing
!     data-parallel operations requiring storing the a tree of types.
      The <type>ExpressionTraits<BinaryNode<Op, Left,
      Right&closeclose;</type> traits class stores the types of a binary
      node representing the operation of <varname>Op</varname> on left
--- 793,829 ----
      used as template arguments.  At compile time, pointers and
      references to objects and functions can be used as template
      arguments, while the category of executable code supports no
!     operations.  (The compiler's optimizer
!     <indexterm>
!      <primary>compiler</primary>
!      <secondary>optimizer</secondary>
!     </indexterm>
!     may simplify it, though.)</para>
! 
!     <para id="template_programming-pooma_implementation-traits_class">
!     <indexterm zone="template_programming-pooma_implementation-traits_class">
!      <primary>traits class</primary>
!     </indexterm>
!     At run time, an object
!     <indexterm>
!      <primary>object</primary>
!     </indexterm>
!     can store multiple values, each
      having its own name.  For example, a <type>pair<int></type>
      object <varname>p</varname> stores two &int;s named
      <methodname>left_</methodname> and
      <methodname>right_</methodname>.  The <operator>.</operator>
!     operator
!     <indexterm>
!      <primary><operator>.</operator> operator</primary>
!     </indexterm>
!     extracts a named member from an object:
      <statement>p.left_</statement>.  At compile time, a class can
      store multiple values, each having its own name.  These are
      sometimes called <glossterm
      linkend="glossary-traits_class"><firstterm>traits
      classes</firstterm></glossterm>.  For example, implementing
!     data-parallel operations requiring storing a tree of types.
      The <type>ExpressionTraits<BinaryNode<Op, Left,
      Right&closeclose;</type> traits class stores the types of a binary
      node representing the operation of <varname>Op</varname> on left
*************** struct ExpressionTraits<BinaryNode&lt
*** 590,629 ****
    typedef typename CombineExpressionTraits<Left_t, Right_t>::Type_t Type_t;
  };
  </programlisting> consists of a class definition and internal type
! 
!     definitions.  This traits class contains three values, all types,
!     named <type>Left_t</type>, <type>Right_t</type>, and
      <type>Type_t</type>, representing the type of the left child, the
!     right child, and the entire node, respectively.  No enumerations
!     or constant values occur.  See <xref
      linkend="data_parallel-implementation"></xref> for more details
!     regarding the implementation of data-parallel operators.  Many
!     traits classes, such as this one, use internal type definitions to
!     store values.</para>
! 
!     <para>The example also illustrates using the
!     <operator>::</operator> operator to extract a member of a traits
!     class.  The type <type>ExpressionTraits<Left></type>
!     contains an internal type definition of <type>Type_t</type>.
!     Using the <operator>::</operator> operator extracts it:
      <statement>ExpressionTraits<Left>::Type_t</statement>.
      Enumerations and other values can also be extracted.  For example,
      <statement>Array<2, int, Brick>::dimensions</statement>
      yields the dimension of the array's domain.</para>

!     <para>Control flow determines which code is used.  At run time,
      control-flow statements such as <keywordname>if</keywordname>,
      <keywordname>while</keywordname>, and
      <keywordname>goto</keywordname> determine which statements to
      execute.  Template programming uses two mechanisms: template class
      specializations and pattern matching.  These are similar to
!     control flow for functional programming languages.  A <glossterm
      linkend="glossary-traits_class"><firstterm>template class
      specialization</firstterm></glossterm> is a class definition
      specific to one or more template arguments.  For example, the
!     implementation for data-parallel operations uses the templated
!     <type>CreateLeaf</type>.  The default definition works for any
!     template argument <varname>T</varname>:
  <programlisting>
  template<class T>
  struct CreateLeaf
--- 837,891 ----
    typedef typename CombineExpressionTraits<Left_t, Right_t>::Type_t Type_t;
  };
  </programlisting> consists of a class definition and internal type
!     definitions.  This traits class contains three values, all types
!     and named <type>Left_t</type>, <type>Right_t</type>, and
      <type>Type_t</type>, representing the type of the left child, the
!     right child, and the entire node, respectively.  Many traits
!     classes, such as this one, use internal type definitions to store
!     values.  No enumerations or constant values occur in this traits
!     class, but other such classes include them.  See <xref
      linkend="data_parallel-implementation"></xref> for more details
!     regarding the implementation of data-parallel operators.</para>
! 
!     <para id="template_programming-pooma_implementation-double_colon_operator">
!     <indexterm zone="template_programming-pooma_implementation-double_colon_operator">
!      <primary><operator>::</operator> operator</primary>
!     </indexterm>
!     The example also illustrates using the <operator>::</operator>
!     operator to extract a member of a traits class.  The type
!     <type>ExpressionTraits<Left></type> contains an internal
!     type definition of <type>Type_t</type>.  Using the
!     <operator>::</operator> operator extracts it:
      <statement>ExpressionTraits<Left>::Type_t</statement>.
      Enumerations and other values can also be extracted.  For example,
      <statement>Array<2, int, Brick>::dimensions</statement>
      yields the dimension of the array's domain.</para>

!     <para id="template_programming-pooma_implementation-template_specialization">
!     <indexterm zone="template_programming-pooma_implementation-template_specialization">
!      <primary>template</primary>
!      <secondary>specialization</secondary>
!     </indexterm>
!     <indexterm class="startofrange" id="template_programming-pooma_implementation-index-control_flow">
!      <primary>control flow</primary>
!     </indexterm>
!     Control flow determines which code is used.  At run time,
      control-flow statements such as <keywordname>if</keywordname>,
      <keywordname>while</keywordname>, and
      <keywordname>goto</keywordname> determine which statements to
      execute.  Template programming uses two mechanisms: template class
      specializations and pattern matching.  These are similar to
!     control flow in functional programming languages.  A <glossterm
      linkend="glossary-traits_class"><firstterm>template class
      specialization</firstterm></glossterm> is a class definition
      specific to one or more template arguments.  For example, the
!     implementation for data-parallel operations
!     <indexterm>
!      <primary>data-parallel operation</primary>
!     </indexterm>
!     uses the templated <type>CreateLeaf</type>.  The default
!     definition works for any template
!     argument <varname>T</varname>:
  <programlisting>
  template<class T>
  struct CreateLeaf
*************** struct CreateLeaf<Expression<T&clo
*** 644,650 ****
      <type>CreateLeaf</type>'s template argument is an
      <type>Expression</type> type.</para>

!     <para>Pattern matching of template arguments to template
      parameters determines which template code is used.  The code
      associated with the match that is most specific is the one that is
      used.  For example, <type>CreateLeaf<int></type> uses the
--- 906,921 ----
      <type>CreateLeaf</type>'s template argument is an
      <type>Expression</type> type.</para>

!     <para id="template_programming-pooma_implementation-pattern_matching">
!     <indexterm zone="template_programming-pooma_implementation-pattern_matching">
!      <primary>template</primary>
!      <secondary>pattern matching</secondary>
!     </indexterm>
!     <indexterm>
!      <primary>pattern matching</primary>
!      <see>template, pattern matching.</see>
!     </indexterm>
!     Pattern matching of template arguments to template
      parameters determines which template code is used.  The code
      associated with the match that is most specific is the one that is
      used.  For example, <type>CreateLeaf<int></type> uses the
*************** struct CreateLeaf<Expression<T&clo
*** 663,672 ****

      <para>Control flow using template specializations and pattern
      matching is similar to <keywordname>switch</keywordname>
!     statements.  A <keywordname>switch</keywordname> statement has a
      condition and one or more pairs of case labels and associated
      code.  The code associated with the the case label whose value
!     matched the condition is executed.  If no case label matches the
      condition, the default code, if present, is used.  In template
      programming, instantiating a template, e.g.,
      <type>CreateLeaf<Expression<int&closeclose;</type> serves as
--- 934,947 ----

      <para>Control flow using template specializations and pattern
      matching is similar to <keywordname>switch</keywordname>
!     statements.
!     <indexterm>
!      <primary><keywordname>switch</keywordname></primary>
!     </indexterm>
!     A <keywordname>switch</keywordname> statement has a
      condition and one or more pairs of case labels and associated
      code.  The code associated with the the case label whose value
!     matches the condition is executed.  If no case label matches the
      condition, the default code, if present, is used.  In template
      programming, instantiating a template, e.g.,
      <type>CreateLeaf<Expression<int&closeclose;</type> serves as
*************** struct CreateLeaf<Expression<T&clo
*** 681,689 ****
      default label since it matches any arguments.  If no set of
      template parameters match (which is impossible for our example) or
      if more than one set are best matches, the code is
!     incorrect.</para>
! 
!     <para>Functions as well as classes may be templated.  All the
      concepts needed to understand function templates have already been
      introduced so we illustrate using an example.  The templated
      function <function>f</function> takes one parameter of any type:
--- 956,975 ----
      default label since it matches any arguments.  If no set of
      template parameters match (which is impossible for our example) or
      if more than one set are best matches, the code is
!     incorrect.
!     <indexterm class="endofrange" startref="template_programming-pooma_implementation-index-control_flow">
!     </indexterm></para>
! 
!     <para>
!     <indexterm class="startofrange" id="template_programming-pooma_implementation-function_template">
!      <primary>template</primary>
!      <secondary>function</secondary>
!     </indexterm>
!     <indexterm>
!      <primary>function template</primary>
!      <see>template, function.</see>
!     </indexterm>
!     Functions as well as classes may be templated.  All the
      concepts needed to understand function templates have already been
      introduced so we illustrate using an example.  The templated
      function <function>f</function> takes one parameter of any type:
*************** void f(const T& t) { … }
*** 697,704 ****
      functions equivalent to <function>f(const int&amp)</function>,
      <function>f(const bool&amp)</function>, <function>f(const
      int*&amp)</function>, ….  Using a templated class
!     definition with a static member function, we can define an
!     equivalent function:
  <programlisting>
  template <typename T>
  class F {
--- 983,1008 ----
      functions equivalent to <function>f(const int&amp)</function>,
      <function>f(const bool&amp)</function>, <function>f(const
      int*&amp)</function>, ….  Using a templated class
!     definition with a static member function,
!     <indexterm>
!      <primary>function</primary>
!      <secondary>static member</secondary>
!     </indexterm>
!     <indexterm>
!      <primary>static member function</primary>
!      <see>function, static member</see>
!     </indexterm>
!     we can define an equivalent function:
!     <indexterm>
!      <primary>function</primary>
!      <secondary>static member</secondary>
!      <tertiary>equivalence with function template</tertiary>
!     </indexterm>
!     <indexterm>
!      <primary>template</primary>
!      <secondary>function</secondary>
!      <tertiary>equivalence with static member function</tertiary>
!     </indexterm>
  <programlisting>
  template <typename T>
  class F {
*************** class F {
*** 706,735 ****
  };
  </programlisting>  Both the templated class and the templated function
      take the same template arguments, but the class uses a static
!     member function so the notation to invoke it is slightly more
!     verbose: <statement>F<T>::f(t)</statement>.  The advantage
!     of a function template is that it can be overloaded, particularly
!     operator functions.  For example, the <operator>+</operator>
!     operator is overloaded to add two &array;s, which require template
!     parameters to specify:
  <programlisting>
  template <int D1,class T1,class E1,int D2,class T2,class E2>
  // complicated return type omitted
  operator+(const Array<D1,T1,E1> & l,const Array<D2,T2,E2> & r);
  </programlisting>  Without using function templates, it would not be
- 
      possible to write expressions such as <statement>a1 +
      a2</statement>.  Member functions can also be templated.  This
      permits, for example, overloading of assignment operators defined
!     within templated classes.</para>
! 
!     <para>Function objects are frequently useful in run-time code.
      They consist of a function plus some additional storage and are
      usually implemented as structures with data members and a function
!     call operator.  Analogous classes can be used at compile time.
!     Using the transformation introduced in the previous paragraph, we
      see that any function can be transformed into a class containing a
!     static member function.  Internal type definitions, enumerations,
      and static constant values can be added to the class.  The static
      member function can use these values during its computation.  The
      <type>CreateLeaf</type> structure, introduced above, illustrates this.
--- 1010,1077 ----
  };
  </programlisting>  Both the templated class and the templated function
      take the same template arguments, but the class uses a static
!     member function.  Thus, the notation to invoke it is slightly more
!     verbose: <statement>F<T>::f(t)</statement>.</para>
! 
!     <para id="template_programming-pooma_implementation-function_template-overloaded">
!     <indexterm zone="template_programming-pooma_implementation-function_template-overloaded">
!      <primary>function</primary>
!      <secondary>overloaded</secondary>
!     </indexterm>
!     <indexterm zone="template_programming-pooma_implementation-function_template-overloaded">
!      <primary>function</primary>
!      <secondary>operator</secondary>
!     </indexterm>
!     An advantage of a function template is that it can be overloaded,
!     which is particularly useful for operator functions.  For example,
!     the <operator>+</operator> operator is overloaded to add two
!     &array;s, whose types require template parameters to specify:
  <programlisting>
  template <int D1,class T1,class E1,int D2,class T2,class E2>
  // complicated return type omitted
  operator+(const Array<D1,T1,E1> & l,const Array<D2,T2,E2> & r);
  </programlisting>  Without using function templates, it would not be
      possible to write expressions such as <statement>a1 +
      a2</statement>.  Member functions can also be templated.  This
      permits, for example, overloading of assignment operators defined
!     within templated classes.
!     <indexterm class="endofrange" startref="template_programming-pooma_implementation-function_template">
!     </indexterm>
!     </para>
! 
!     <para>Function objects
!     <indexterm>
!      <primary>function</primary>
!      <secondary>object</secondary>
!     </indexterm>
!     are frequently useful in run-time code.
      They consist of a function plus some additional storage and are
      usually implemented as structures with data members and a function
!     call operator.
!     <indexterm>
!      <primary>function</primary>
!      <secondary>call operator</secondary>
!     </indexterm>
!     Analogous classes can be used at compile time.
!     Using the transformation
!     <indexterm>
!      <primary>function</primary>
!      <secondary>static member</secondary>
!      <tertiary>equivalence with function template</tertiary>
!     </indexterm>
!     <indexterm>
!      <primary>template</primary>
!      <secondary>function</secondary>
!      <tertiary>equivalence with static member function</tertiary>
!     </indexterm>
!     introduced in the previous paragraph, we
      see that any function can be transformed into a class containing a
!     static member function.
!     <indexterm>
!      <primary>function</primary>
!      <secondary>static member</secondary>
!     </indexterm>
!     Internal type definitions, enumerations,
      and static constant values can be added to the class.  The static
      member function can use these values during its computation.  The
      <type>CreateLeaf</type> structure, introduced above, illustrates this.
Index: tutorial.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v
retrieving revision 1.6
diff -c -p -r1.6 tutorial.xml
*** tutorial.xml	2002/01/22 15:48:49	1.6
--- tutorial.xml	2002/01/24 04:56:40
***************
*** 1,17 ****
  <chapter id="tutorial">
   <title>A Tutorial Introduction</title>

-  <para>UPDATE: In the following paragraph, fix the cross-reference
-  to the actual section.</para>
- 
   <para>&pooma; provides different containers and processor
   configurations and supports different implementation styles, as
!  described in <xref linkend="introduction"></xref>.  In this
!  chapter, we present several different implementations of the
!  &doof2d; two-dimensional diffusion simulation program:
    <itemizedlist spacing="compact">
     <listitem>
!     <para>a C-style implementation omitting any use of &pooma;
      computing each array element individually,</para>
     </listitem>
     <listitem>
--- 1,15 ----
+ <!-- FIXME: Index this file. -->
  <chapter id="tutorial">
   <title>A Tutorial Introduction</title>

   <para>&pooma; provides different containers and processor
   configurations and supports different implementation styles, as
!  described in <xref linkend="introduction-goals"></xref>.  In this
!  chapter, we present several different implementations of the &doof2d;
!  two-dimensional diffusion simulation program:
    <itemizedlist spacing="compact">
     <listitem>
!     <para>a C-style implementation omitting any use of &pooma; and
      computing each array element individually,</para>
     </listitem>
     <listitem>
***************
*** 40,52 ****
     </listitem>
    </itemizedlist>
   </para>
!  <para>These illustrate the &array;, &field;, &engine;, layout,
!  mesh, and domain data types.  They also illustrate various
!  immediate computation styles (element-wise accesses, data-parallel
!  expressions, and stencil computation) and various processor
!  configurations (one sequential processor and multiple
!  processors).</para>

   <figure float="1" id="tutorial-doof2d_averagings">
    <title>&doof2d; Averagings</title>
    <mediaobject>
--- 38,68 ----
     </listitem>
    </itemizedlist>
   </para>
!  <para>These illustrate the &array;, &field;, &engine;, layout, mesh,
!  and &domain; data types.  They also illustrate various immediate
!  computation styles (element-wise accesses, data-parallel expressions,
!  and stencil computation) and various processor configurations (one
!  processor and multiple processors).</para>
! 
!  <para>The &doof2d; diffusion program starts with a two-dimensional
!  grid of values.  To model an initial density, all grid values are
!  zero except for one nonzero value in the center.  In each averaging,
!  each grid element, except the outermost ones, updates its value by
!  averaging its value and those of its eight neighbors.  To avoid overwriting
!  grid values before all their uses occur, we use two arrays, reading
!  the first and writing the second and then reversing their roles
!  within each iteration.</para>
! 
!  <para>We illustrate the averagings in <xref
!  linkend="tutorial-doof2d_averagings"></xref>.  Initially, only the
!  center element has a nonzero value.  To form the first averaging, each
!  element's new value equals the average of its and its neighbors'
!  previous values.  Thus, the initial nonzero value spreads to a
!  three-by-three grid.  The averaging continues, spreading to a
!  five-by-five grid of nonzero values.  Values in the outermost grid cells
!  are always zero.</para>

+ <!-- FIXME: Fix the layout, somehow. -->
   <figure float="1" id="tutorial-doof2d_averagings">
    <title>&doof2d; Averagings</title>
    <mediaobject>
***************
*** 75,177 ****
    </mediaobject>
   </figure>

!  <para>The &doof2d; diffusion program starts with a two-dimensional
!  grid of values.  To model an initial density, all grid values are
!  zero except for one nonzero value in the center.  Each averaging,
!  each grid element, except the outermost ones, updates its value by
!  averaging its value and its eight neighbors.  To avoid overwriting
!  grid values before all their uses occur, we use two arrays, reading
!  the first and writing the second and then reversing their roles
!  within each iteration.</para>
! 
!  <para>Figure <xref linkend="tutorial-doof2d_averagings"></xref>
!  illustrates the averagings.  Initially, only the center element has
!  nonzero value.  To form the first averaging, each element's new
!  value equals the average of its and its neighbors' previous values.
!  Thus, the initial nonzero value spreads to a three-by-three grid.
!  The averaging continues, spreading to a five-by-five grid of
!  nonzero values.  Values in outermost grid cells are always
!  zero.</para>
! 
!  <para>Before presenting various implementations of %doof2d;, we
   explain how to install the &poomatoolkit;.</para>

   <para>REMOVE: &doof2d; algorithm and code is illustrated in
   Section 4.1 of
   <filename>pooma-publications/pooma.ps</filename>.  It includes a
   figure illustrating parallel communication of data.</para>

   <section id="tutorial-installation">
    <title>Installing &pooma;</title>

    <para>ADD: How does one install &pooma; using Windows or Mac?</para>

    <para>UPDATE: Make a more recent &pooma; source code file
!   available on &poomaDownloadPage;.  For example,
    <quote>LINUXgcc.conf</quote> is not available.</para>

    <para>In this section, we describe how to obtain, build, and
    install the &poomatoolkit;.  We focus on installing under the
!   Unix operating system.  Instructions for installing on computers
    running Microsoft Windows or MacOS, as well as more extensive
    instructions for Unix, appear in <xref
!   linkend="installation"></xref>.</para>

    <para>Obtain the &pooma; source code <filename
!   path="http://www.codesourcery.com/pooma/downloads_folder/">&poomaSourceFile;</filename>
!   from the &pooma; download page (&poomaDownloadPage;) available off
!   the &pooma; home page (&poomaHomePage;).  The <quote>tgz</quote>
    indicates this is a compressed tar archive file.  To extract the
!   source files, use <command>tar xzvf &poomaSourceFile;</command>.
    Move into the source code directory <filename
!   class="directory">&poomaSource;</filename> directory; e.g.,
!   <command>cd &poomaSource;</command>.</para>

!   <para>Configuring the source code prepares the necessary paths for
!   compilation.  First, determine a configuration file in
!   corresponding to your operating system and compiler in the
!   <filename class="directory">config/arch/</filename> directory.
!   For example, <filename
    class="libraryfile">LINUXgcc.conf</filename> supports compiling
!   under a &linux; operating system with &gcc; and <filename
    class="libraryfile">SGI64KCC.conf</filename> supports compiling
!   under a 64-bit <application>SGI</application> Unix operating
!   system with &kcc;.  Then, configure the source code:
!   <command>./configure &dashdash;arch LINUXgcc &dashdash;opt &dashdash;suite
    LINUXgcc-opt</command>.  The architecture argument to the
!   <command>&dashdash;arch</command> option is the name of the corresponding
!   configuration file, omitting its <filename
    class="libraryfile">.conf</filename> suffix.  The
    <command>&dashdash;opt</command> indicates the &poomatoolkit; will
!   contain optimized source code, which makes the code run more
!   quickly but may impede debugging.  Alternatively, the
!   <command>&dashdash;debug</command> option supports debugging.  The
!   <glossterm linkend="glossary-suite_name">suite name</glossterm>
    can be any arbitrary string.  We chose
!   <command>LINUXgcc-opt</command> to remind us of the architecture
!   and optimization choice.  <filename
    class="libraryfile">configure</filename> creates subdirectories
!   named by the suite name <quote>LINUXgcc-opt</quote> for use when
!   compiling the source files.  Comments at the beginning of
!   <filename
    class="libraryfile">lib/<replaceable>suiteName</replaceable>/PoomaConfiguration.h</filename>
    record the configuration arguments.</para>

!   <para>To compile the source code, set the
!   <envar>POOMASUITE</envar> environment variable to the suite name
!   and then type <command>make</command>.  To set the environment
!   variable for the <application>bash</application> shell use
!   <command>export
    POOMASUITE=<replaceable>suiteName</replaceable></command>,
!   substituting the suite name's
!   <replaceable>suiteName</replaceable>.  For the
!   <application>csh</application> shell, use <command>setenv
    POOMASUITE LINUXgcc-opt</command>.  Issuing the
    <command>make</command> command compiles the &pooma; source code
    files to create the &pooma; library.  The &pooma; makefiles assume
!   the <trademark>GNU</trademark> &make; so substitute the proper
!   command if necessary.  The &pooma; library can be found in, e.g.,
!   <filename
    class="libraryfile">lib/LINUXgcc-opt/libpooma-gcc.a</filename>.</para>
   </section>

--- 91,183 ----
    </mediaobject>
   </figure>

!  <para>Before presenting the various implementations of &doof2d;, we
   explain how to install the &poomatoolkit;.</para>

+ <![%unfinished;[
   <para>REMOVE: &doof2d; algorithm and code is illustrated in
   Section 4.1 of
   <filename>pooma-publications/pooma.ps</filename>.  It includes a
   figure illustrating parallel communication of data.</para>
+ ]]>  <!-- end unfinished -->

+ 
   <section id="tutorial-installation">
    <title>Installing &pooma;</title>

+ <![%unfinished;[
    <para>ADD: How does one install &pooma; using Windows or Mac?</para>

    <para>UPDATE: Make a more recent &pooma; source code file
!   available on &poomadownloadpage;.  For example,
    <quote>LINUXgcc.conf</quote> is not available.</para>
+ ]]>  <!-- end unfinished -->

    <para>In this section, we describe how to obtain, build, and
    install the &poomatoolkit;.  We focus on installing under the
!   Unix operating system.
! <![%unfinished;[
!   Instructions for installing on computers
    running Microsoft Windows or MacOS, as well as more extensive
    instructions for Unix, appear in <xref
!   linkend="installation"></xref>.
! ]]>  <!-- end unfinished -->
!   </para>

    <para>Obtain the &pooma; source code <filename
!   path="http://www.codesourcery.com/pooma/downloads_folder/">&poomasourcefile;</filename>
!   from the &pooma; download page (&poomadownloadpage;) available off
!   the &pooma; home page (&poomahomepage;).  The <quote>tgz</quote>
    indicates this is a compressed tar archive file.  To extract the
!   source files, use <command>tar xzvf &poomasourcefile;</command>.
    Move into the source code directory <filename
!   class="directory">&poomasource;</filename>; e.g.,
!   <command>cd &poomasource;</command>.</para>

!   <para>Configuring the source code determines file names needed for
!   compilation.  First, determine a configuration file in the <filename
!   class="directory">config/arch/</filename> directory corresponding to
!   your operating system and compiler.  For example, <filename
    class="libraryfile">LINUXgcc.conf</filename> supports compiling
!   under a &linux; operating system with &gcc;, while <filename
    class="libraryfile">SGI64KCC.conf</filename> supports compiling
!   under a 64-bit <application>SGI</application> Unix operating system
! <!-- FIXME: Center the following command. -->
!   with &kcc;.  Next, configure the source code: <command>./configure
!   &dashdash;arch LINUXgcc &dashdash;opt &dashdash;suite
    LINUXgcc-opt</command>.  The architecture argument to the
!   <command>&dashdash;arch</command> option is the name of the
!   corresponding configuration file, omitting its <filename
    class="libraryfile">.conf</filename> suffix.  The
    <command>&dashdash;opt</command> indicates the &poomatoolkit; will
!   contain optimized source code, which makes the code run more quickly
!   but may impede debugging.  Alternatively, use the
!   <command>&dashdash;debug</command> option, which supports debugging.
!   The <glossterm linkend="glossary-suite_name">suite name</glossterm>
    can be any arbitrary string.  We chose
!   <command>LINUXgcc-opt</command> to remind us of the architecture and
!   optimization choice.  <filename
    class="libraryfile">configure</filename> creates subdirectories
!   named <quote>LINUXgcc-opt</quote> for use when compiling the source
!   files.  Comments at the beginning of <filename
    class="libraryfile">lib/<replaceable>suiteName</replaceable>/PoomaConfiguration.h</filename>
    record the configuration arguments.</para>

!   <para>To compile the source code, set the <envar>POOMASUITE</envar>
!   environment variable to the suite name and then type
!   <command>make</command>.  To set the environment variable for the
! <!-- FIXME: Center the following command. -->
!   <application>bash</application> shell use <command>export
    POOMASUITE=<replaceable>suiteName</replaceable></command>,
!   substituting the suite name for <replaceable>suiteName</replaceable>.
! <!-- FIXME: Center the following command. -->
!   For the <application>csh</application> shell, use <command>setenv
    POOMASUITE LINUXgcc-opt</command>.  Issuing the
    <command>make</command> command compiles the &pooma; source code
    files to create the &pooma; library.  The &pooma; makefiles assume
!   that <trademark>GNU</trademark> &make; is available, so substitute the
!   proper command to run <trademark>GNU</trademark> &make; if
!   necessary.  The &pooma; library can be found in, e.g., <filename
    class="libraryfile">lib/LINUXgcc-opt/libpooma-gcc.a</filename>.</para>
   </section>

***************
*** 181,209 ****
    <para>Before implementing &doof2d; using the &poomatoolkit;, we
    present a hand-coded implementation of &doof2d;.  See <xref
    linkend="tutorial-hand_coded-doof2d"></xref>.  After querying the
!   user for the number of averagings, the arrays' memory is
!   allocated.  Since the arrays' size is not known at compile time,
!   the arrays are accesses via pointers to allocated dynamic memory.
!   This memory is deallocated at the program's end to avoid memory
!   leaks.  The arrays are initialized with initial conditions.  For
!   the <varname>b</varname> array, all values except the central ones
!   have nonzero values.  Only the outermost values of the
    <varname>a</varname> array need be initialized to zero, but we
!   instead initialize them all using the loop used by
!   <varname>b</varname>.</para>

!   <para>The simulation's kernel consists of triply nested loops.
!   The outermost loop controls the number of iterations.  The inner
    nested loops iterate through the arrays' elements, excepting the
!   outermost elements; note the loop indices range from 1 to n-2
!   while the array indices range from 0 to n-1.  Each
!   <varname>a</varname> value is assigned the average of its
!   corresponding value in <varname>b</varname> and the latter's
!   neighbors.  Values in the two-dimensional grids are accessed using
!   two sets of brackets, e.g., <statement>a[i][j]</statement>.  After
!   assigning values to <varname>a</varname>, a second averaging reads
!   values in <varname>a</varname>, writing values in
!   <varname>b</varname>.</para>

    <para>After the kernel finishes, the final central value is
    printed.  If the desired number of averagings is even, the value
--- 187,214 ----
    <para>Before implementing &doof2d; using the &poomatoolkit;, we
    present a hand-coded implementation of &doof2d;.  See <xref
    linkend="tutorial-hand_coded-doof2d"></xref>.  After querying the
!   user for the number of averagings, the arrays' memory is allocated.
!   Since the arrays' size is not known at compile time, the arrays are
!   accessed via pointers to allocated dynamic memory.  This memory is
!   deallocated at the program's end to avoid memory leaks.  The arrays
!   are initialized with initial conditions.  For the
!   <varname>b</varname> array, all values except the central one are
!   zero.  Only the outermost values of the
    <varname>a</varname> array need be initialized to zero, but we
!   instead initialize them all using the same loop
!   that initializes <varname>b</varname>.</para>

!   <para>The simulation's kernel consists of triply nested loops.  The
!   outermost loop controls the number of iterations.  The two inner
    nested loops iterate through the arrays' elements, excepting the
!   outermost elements; note the loop indices range from 1 to n-2 while
!   the array indices range from 0 to n-1.  Each <varname>a</varname>
!   value is assigned the average of its corresponding value in
!   <varname>b</varname> and the latter's neighbors.  Values in the
!   two-dimensional grids are accessed using two sets of brackets, e.g.,
!   <statement>a[i][j]</statement>.  After assigning values to
!   <varname>a</varname>, a second averaging reads values in
!   <varname>a</varname>, writing values in <varname>b</varname>.</para>

    <para>After the kernel finishes, the final central value is
    printed.  If the desired number of averagings is even, the value
***************
*** 241,248 ****
       <varname>a</varname> array.</para>
      </callout>
      <callout arearefs="tutorial-hand_coded-doof2d-constants">
!      <para>These constants indicate the number of iterations, and
!      the average weighting.</para>
      </callout>
      <callout arearefs="tutorial-hand_coded-doof2d-first_write">
       <para>Each <varname>a</varname> value, except an outermost one,
--- 246,252 ----
       <varname>a</varname> array.</para>
      </callout>
      <callout arearefs="tutorial-hand_coded-doof2d-constants">
!      <para>This constant indicates the average's weighting.</para>
      </callout>
      <callout arearefs="tutorial-hand_coded-doof2d-first_write">
       <para>Each <varname>a</varname> value, except an outermost one,
***************
*** 268,289 ****

    <para>To compile the executable, change directories to the &pooma;
    <filename
!   class="directory">&poomaExampleDirectory;/Doof2d</filename>
    directory.  Ensure the <envar>POOMASUITE</envar> environment
    variable specifies the desired suite name
    <replaceable>suiteName</replaceable>, as we did when compiling
!   &pooma; in the previous section <xref
!   linkend="tutorial-installation"></xref>.  Issuing the
!   <command>make Doof2d-C-element</command> command creates the
    executable
    <command><replaceable>suiteName</replaceable>/Doof2d-C-element</command>.</para>

!   <para>When running the executable, specify the desired a
!   nonnegative number of averagings and the nonnegative number of
!   grid cells along any dimension.  The resulting grid has the same
!   number of cells along each dimension.  After the executable
!   finishes, the resulting value of the central element is
!   printed.</para>
   </section>

--- 272,291 ----

    <para>To compile the executable, change directories to the &pooma;
    <filename
!   class="directory">&poomaexampledirectory;/Doof2d</filename>
    directory.  Ensure the <envar>POOMASUITE</envar> environment
    variable specifies the desired suite name
    <replaceable>suiteName</replaceable>, as we did when compiling
!   &pooma; in <xref linkend="tutorial-installation"></xref>.  Issuing
!   the <command>make Doof2d-C-element</command> command creates the
    executable
    <command><replaceable>suiteName</replaceable>/Doof2d-C-element</command>.</para>

!   <para>When running the executable, specify the desired nonnegative
!   number of averagings and the nonnegative number of grid cells along
!   any dimension.  The resulting grid has the same number of cells
!   along each dimension.  After the executable finishes, the resulting
!   value of the central element is printed.</para>
   </section>

***************
*** 314,323 ****
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-domain">
       <para>Before creating an &array;, its domain must be specified.
!      The <varname>N</varname> interval represents the
!      one-dimensional integral set {0, 1, 2, …, n-1}.  An
!      <type>Interval<2></type> object represents the entire
!      two-dimensional index domain.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-array_creation">
       <para>An &array;'s template parameters indicate its dimension,
--- 316,325 ----
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-domain">
       <para>Before creating an &array;, its domain must be specified.
!      The <varname>N</varname> &interval; represents the
!      one-dimensional integral set {0, 1, 2, …, n-1}.  The
!      <type>Interval<2></type> <varname>vertDomain</varname>
!      object represents the entire two-dimensional index domain.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-array_creation">
       <para>An &array;'s template parameters indicate its dimension,
***************
*** 330,349 ****
       domain.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-initialization">
!      <para>The first statement initializes all &array; values to the
!      same scalar value.  This is possible because each &array;
!      <quote>knows</quote> its domain.  The second statement
!      illustrates &array; element access.  Indices, separated by
       commas, are surrounded by parentheses rather than surrounded by
       square brackets (<statement>[]</statement>).</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-first_write">
       <para>&array; element access uses parentheses, rather than
!      square brackets</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-deallocation">
!      <para>Since &array;s are first-class objects, they
!      automatically deallocate any memory they require, eliminating
       memory leaks.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-pooma_finish">
--- 332,349 ----
       domain.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-initialization">
!      <para>The first loop initializes all &array; values to the
!      same scalar value.  The second statement
!      illustrates assigning one &array; value.  Indices, separated by
       commas, are surrounded by parentheses rather than surrounded by
       square brackets (<statement>[]</statement>).</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-first_write">
       <para>&array; element access uses parentheses, rather than
!      square brackets.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-deallocation">
!      <para>The &array;s deallocate any memory they require, eliminating
       memory leaks.</para>
      </callout>
      <callout arearefs="tutorial-array_elementwise-doof2d-pooma_finish">
***************
*** 364,370 ****
    <para>The creation of the <varname>a</varname> and
    <varname>b</varname> &array;s requires an object specifying their
    index domains.  Since these are two-dimensional arrays, their
!   index domains are also two dimensional.  The two-dimensional
    <type>Interval<2></type> object is the Cartesian product of
    two one-dimensional <type>Interval<1></type> objects, each
    specifying the integral set {0, 1, 2, …, n-1}.</para>
--- 364,370 ----
    <para>The creation of the <varname>a</varname> and
    <varname>b</varname> &array;s requires an object specifying their
    index domains.  Since these are two-dimensional arrays, their
!   index domains are also two-dimensional.  The two-dimensional
    <type>Interval<2></type> object is the Cartesian product of
    two one-dimensional <type>Interval<1></type> objects, each
    specifying the integral set {0, 1, 2, …, n-1}.</para>
***************
*** 373,387 ****
    type of its values, and how the values are stored.  Both
    <varname>a</varname> and <varname>b</varname> are two-dimension
    arrays storing &double;s so their <varname>dimension</varname>
!   is 2 and its element type is &double;.  An &engine; stores an
!   &array;'s values.  For example, a &brick; &engine; explicitly
!   stores all values.  A &compressiblebrick; &engine; also explicitly
!   stores values if more than value is present, but, if all values
!   are the same, storage for just that value is required.  Since an
!   engine can store its values any way it desires, it might instead
!   compute its values using a function or compute the values stored
!   in separate engines.  In practice, most explicitly specified
!   &engine;s are either &brick; or &compressiblebrick;.</para>

    <para>&array;s support both element-wise access and scalar
    assignment.  Element-wise access uses parentheses, not square
--- 373,387 ----
    type of its values, and how the values are stored.  Both
    <varname>a</varname> and <varname>b</varname> are two-dimension
    arrays storing &double;s so their <varname>dimension</varname>
!   is 2 and their value type is &double;.  An &engine; stores an
!   &array;'s values.  For example, a &brick; &engine; explicitly stores
!   all values.  A &compressiblebrick; &engine; also explicitly stores
!   values if more than one value is present, but, if all values are the
!   same, storage for just that value is required.  Since an engine can
!   store its values any way it desires, it might instead compute its
!   values using a function or compute using values stored in separate
!   engines.  In practice, most explicitly specified &engine;s are
!   either &brick; or &compressiblebrick;.</para>

    <para>&array;s support both element-wise access and scalar
    assignment.  Element-wise access uses parentheses, not square
***************
*** 389,405 ****
    specifies the central element.  The scalar assignment <statement>b
    = 0.0</statement> assigns the same 0.0 value to all array
    elements.  This is possible because the array knows the extent of
!   its domain.</para>

    <para>Any program using the &poomatoolkit; must initialize the
    &toolkit;'s data structures using
!   <statement>Pooma::initialize(argc,argv)</statement>.  This
!   extracts &pooma;-specific command-line options from the
!   command-line arguments in <varname>argv</varname> and initializes
!   the inter-processor communication and other data structures.  When
!   finished, <statement>Pooma::finalize()</statement> ensures all
!   computation has finished and the communication and other data
!   structures are destructed.</para>
   </section>

--- 389,406 ----
    specifies the central element.  The scalar assignment <statement>b
    = 0.0</statement> assigns the same 0.0 value to all array
    elements.  This is possible because the array knows the extent of
!   its domain.  We illustrate these data-parallel statements in the
!   next section.</para>

    <para>Any program using the &poomatoolkit; must initialize the
    &toolkit;'s data structures using
!   <statement>Pooma::initialize(argc,argv)</statement>.  This extracts
!   &pooma;-specific command-line options from the program's
!   command-line arguments and initializes the interprocessor
!   communication and other data structures.  When finished,
!   <statement>Pooma::finalize()</statement> ensures all computation and
!   communication have finished and the data structures are
!   destructed.</para>
   </section>

***************
*** 408,437 ****

    <para>&pooma; supports data-parallel &array; accesses.  Many
    algorithms are more easily expressed using data-parallel
!   expressions.  Also, the &poomatoolkit; might be able to reorder
!   the data-parallel computations to be more efficient or distribute
!   them among various processors.  In this section, we concentrate
!   the differences between the data-parallel implementation of
!   &doof2d; listed in <xref
!   linkend="tutorial-array_parallel-doof2d"></xref> and the
!   element-wise implementation listed in the previous section <xref
!   linkend="tutorial-array_elementwise"></xref>.</para>

    <example id="tutorial-array_parallel-doof2d">
     <title>Data-Parallel &array; Implementation of &doof2d;</title>
     &doof2d-array-parallel;
     <calloutlist>
      <callout arearefs="tutorial-array_parallel-doof2d-blockAndEvaluate">
       <para>&pooma; may reorder computation of statements.  Calling
       <function>Pooma::blockAndEvaluate</function> ensures all
       computation finishes before accessing a particular array
       element.</para>
      </callout>
-     <callout arearefs="tutorial-array_parallel-doof2d-innerdomain">
-      <para>These variables specify one-dimensional domains {1, 2,
-      …, n-2}.  Their Cartesian product specifies the domain
-      of the array values that are modified.</para>
-     </callout>
      <callout arearefs="tutorial-array_parallel-doof2d-first_write">
       <para>Data-parallel expressions replace nested loops and array
       element accesses.  For example, <statement>a(I,J)</statement>
--- 409,437 ----

    <para>&pooma; supports data-parallel &array; accesses.  Many
    algorithms are more easily expressed using data-parallel
!   expressions.  Also, the &poomatoolkit; can sometimes reorder the
!   data-parallel computations to be more efficient or distribute them
!   among various processors.  In this section, we concentrate on the
!   differences between the data-parallel implementation of &doof2d;
!   listed in <xref linkend="tutorial-array_parallel-doof2d"></xref> and
!   the element-wise implementation listed in the previous
!   section.</para>

    <example id="tutorial-array_parallel-doof2d">
     <title>Data-Parallel &array; Implementation of &doof2d;</title>
     &doof2d-array-parallel;
     <calloutlist>
+     <callout arearefs="tutorial-array_parallel-doof2d-innerdomain">
+      <para>These variables specify one-dimensional domains {1, 2,
+      …, n-2}.  Their Cartesian product specifies the domain
+      of the array values that are modified.</para>
+     </callout>
      <callout arearefs="tutorial-array_parallel-doof2d-blockAndEvaluate">
       <para>&pooma; may reorder computation of statements.  Calling
       <function>Pooma::blockAndEvaluate</function> ensures all
       computation finishes before accessing a particular array
       element.</para>
      </callout>
      <callout arearefs="tutorial-array_parallel-doof2d-first_write">
       <para>Data-parallel expressions replace nested loops and array
       element accesses.  For example, <statement>a(I,J)</statement>
***************
*** 443,462 ****
     </calloutlist>
    </example>

!   <para>Data-parallel expressions apply domain objects to containers
!   to indicate a set of parallel expressions.  For example, in the
!   program listed above, <statement>a(I,J)</statement> specifies all
!   of <varname>a</varname> array excepting the outermost elements.
!   The array's <varname>vertDomain</varname> domain consists of the
!   Cartesian product of {0, 1, 2, …, n-1} and itself, while
    <varname>I</varname> and <varname>J</varname> each specify {1, 2,
    …, n-2}.  Thus, <statement>a(I,J)</statement> is the subset
!   with a domain of the Cartesian product of {1, 2, …, n-2}
!   and itself.  It is called a <firstterm>view</firstterm> of an
!   array.  It is itself an array, with a domain and supporting
!   element access, but its storage is the same as
!   <varname>a</varname>'s.  Changing a value in
!   <statement>a(I,J)</statement> also changes the same value in
    <varname>a</varname>.  Changing a value in the latter also changes
    the former if the value is not one of <varname>a</varname>'s
    outermost elements.  The expression
--- 443,461 ----
     </calloutlist>
    </example>

!   <para>Data-parallel expressions use containers and domain objects to
!   indicate a set of parallel expressions.  For example, in the program
!   listed above, <statement>a(I,J)</statement> specifies the subset of
!   the <varname>a</varname> array omitting the outermost elements.  The
!   array's <varname>vertDomain</varname> domain consists of the
!   Cartesian product of {0, 1, 2, …, n-1} with itself, while
    <varname>I</varname> and <varname>J</varname> each specify {1, 2,
    …, n-2}.  Thus, <statement>a(I,J)</statement> is the subset
!   with a domain of the Cartesian product of {1, 2, …, n-2} with
!   itself.  It is called a <firstterm>view</firstterm> of an array.  It
!   is itself an &array;, with a domain and supporting element access, but
!   its storage is the same as <varname>a</varname>'s.  Changing a value
!   in <statement>a(I,J)</statement> also changes the same value in
    <varname>a</varname>.  Changing a value in the latter also changes
    the former if the value is not one of <varname>a</varname>'s
    outermost elements.  The expression
***************
*** 465,474 ****
    product of {2, 3, …, n-1}, i.e., the same domain as
    <statement>a(I,J)</statement> but shifted up one unit and to the
    right one unit.  Only an &interval;'s value, not its name, is
!   important.  Thus, all uses of <varname>J</varname> in this program
    could be replaced by <varname>I</varname> without changing the
    semantics.</para>

    <figure float="1" id="tutorial-array_parallel-doof2d-adding_arrays">
     <title>Adding &array;s</title>
     <mediaobject>
--- 464,483 ----
    product of {2, 3, …, n-1}, i.e., the same domain as
    <statement>a(I,J)</statement> but shifted up one unit and to the
    right one unit.  Only an &interval;'s value, not its name, is
!   important, so all uses of <varname>J</varname> in this program
    could be replaced by <varname>I</varname> without changing the
    semantics.</para>

+   <para>The statement assigning to <statement>a(I,J)</statement>
+   illustrates that &array;s may participate in expressions.  Each
+   addend is a view of an array, which is itself an array.  The views'
+   indices are zero-based so their sum can be formed by adding
+   identically indexed elements of each array.  For example, the lower,
+   left element of the result equals the sum of the lower, left
+   elements of the addend arrays.  <xref
+   linkend="tutorial-array_parallel-doof2d-adding_arrays"></xref>
+   illustrates adding two arrays.</para>
+ 
    <figure float="1" id="tutorial-array_parallel-doof2d-adding_arrays">
     <title>Adding &array;s</title>
     <mediaobject>
***************
*** 476,528 ****
       <imagedata fileref="figures/doof2d.210" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>Adding two arrays with different domains is supported.</phrase>
      </textobject>
      <caption>
!      <para>When adding arrays, values in corresponding positions are
!      added even if they have different indices, indicated by the
!      small numbers adjacent to the arrays.</para>
      </caption>
     </mediaobject>
    </figure>

-   <para>The statement assigning to <statement>a(I,J)</statement>
-   illustrates that &array;s may participate in expressions.  Each
-   addend is a view of an array, which is itself an array.  Each view
-   has the same domain size so their sum can be formed by
-   corresponding elements of each array.  For example, the lower,
-   left element of the result equals the sum of the lower, left
-   elements of the addend arrays.  For the computation, indices are
-   ignored; only the relative positions within each domain are used.
-   <xref
-   linkend="tutorial-array_parallel-doof2d-adding_arrays"></xref>
-   illustrates adding two arrays with different domain indices.  The
-   indices are indicated by the small numbers to the left and the
-   bottom of the arrays.  Even though 9 and 3 have different indices
-   (1,1) and (2,0), they are added to each other because they have
-   the same relative positions within the addends.</para>
- 
    <para>Just before accessing individual &array; values, the code
    contains calls to <function>Pooma::blockAndEvaluate</function>.
    &pooma; may reorder computation or distribute them among various
    processors.  Before reading an individual &array; value, calling
!   the function ensures all computations affecting its value have
!   finished, i.e., it has the correct value.  Calling this function
!   is necessary only when accessing individual array elements because
!   &pooma; cannot determine when to call the function itself. For
!   example, before printing an array, &pooma; will call
!   <function>blockAndEvaluate</function> itself.</para>
   </section>

   <section id="tutorial-array_stencil">
    <title>Stencil &array; Implementation</title>

!   <para>Many computations are local, computing an &array;'s value by
!   using close-by &array; values.  Encapsulating this computation in
!   a stencil can yield faster code because the compiler can determine
!   all accesses come from the same array.  Each stencil consists of a
!   function object and an indication of the stencil's extent.</para>

    <example id="tutorial-array_stencil-doof2d">
     <title>Stencil &array; Implementation of &doof2d;</title>
--- 485,523 ----
       <imagedata fileref="figures/doof2d.210" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>Adding two arrays is supported.</phrase>
      </textobject>
      <caption>
!      <para>When adding arrays, values with the same indices, indicated
!      by the small numbers adjacent to the arrays, are added.</para>
      </caption>
     </mediaobject>
    </figure>

    <para>Just before accessing individual &array; values, the code
    contains calls to <function>Pooma::blockAndEvaluate</function>.
    &pooma; may reorder computation or distribute them among various
    processors.  Before reading an individual &array; value, calling
!   this function ensures all computations affecting its value have
!   finished, i.e., it has the correct value.  Calling this function is
!   necessary only when accessing individual array elements.  For
!   example, before the data-parallel operation of printing an array,
!   &pooma; will call <function>blockAndEvaluate</function>
!   itself.</para>
   </section>

   <section id="tutorial-array_stencil">
    <title>Stencil &array; Implementation</title>

!   <para>Many scientific computations are localized, computing an
!   array's value by using neighboring values.  Encapsulating this local
!   computation in a <glossterm
!   linkend="glossary-stencil"><firstterm>stencil</firstterm></glossterm>
!   can yield faster code because the compiler can determine that all
!   array accesses use the same array.  Each stencil consists of a
!   function object and an indication of which neighbors participate in
!   the function's computation.</para>

    <example id="tutorial-array_stencil-doof2d">
     <title>Stencil &array; Implementation of &doof2d;</title>
***************
*** 546,552 ****
      <callout arearefs="tutorial-array_stencil-doof2d-stencil_extent">
       <para>These two functions indicate the stencil's size.  For
       each dimension, the stencil extends one cell to the left of (or
!      below) its center and also one call to the right (or above) its
       center.</para>
      </callout>
      <callout
--- 541,547 ----
      <callout arearefs="tutorial-array_stencil-doof2d-stencil_extent">
       <para>These two functions indicate the stencil's size.  For
       each dimension, the stencil extends one cell to the left of (or
!      below) its center and also one cell to the right (or above) its
       center.</para>
      </callout>
      <callout
***************
*** 564,584 ****
     </calloutlist>
    </example>

!   <para>Before we describe how to create a stencil, we describe how
!   to apply a stencil to an array, yielding values.  To compute the
!   value associated with index position (1,3), the stencil's center
!   is placed at (1,3).  The stencil's
!   <function>upperExtent</function> and
!   <function>lowerExtent</function> functions indicate which &array;
!   values the stencil's function will use.  See <xref
    linkend="tutorial-array_stencil-doof2d-apply_stencil"></xref>.
!   Applying the stencil's function call
!   <function>operator()</function> yields the computed value.  To
!   compute multiple &array; values, apply a stencil to the array and
!   a domain object: <statement>stencil(b,
!   interiorDomain)</statement>.  This applies the stencil to each
!   position in the domain.  The user must ensure that applying the
!   stencil does not access nonexistent &array; values.</para>

    <figure float="1" id="tutorial-array_stencil-doof2d-apply_stencil">
     <title>Applying a Stencil to an &array;</title>
--- 559,578 ----
     </calloutlist>
    </example>

!   <para>Before we describe how to create a stencil, we describe how to
!   apply a stencil to an array, yielding computed values.  To compute
!   the value associated with index position (1,3), the stencil's center
!   is placed at (1,3).  The stencil's <function>upperExtent</function>
!   and <function>lowerExtent</function> functions indicate which
!   &array; values the stencil's function will use.  See <xref
    linkend="tutorial-array_stencil-doof2d-apply_stencil"></xref>.
!   Applying the stencil's function call <function>operator()</function>
!   yields the computed value.  To compute multiple &array; values,
!   apply a stencil to the array and a domain object:
!   <statement>stencil(b, interiorDomain)</statement>.  This applies the
!   stencil to each position in the domain.  The user must ensure that
!   applying the stencil does not access nonexistent &array;
!   values.</para>

    <figure float="1" id="tutorial-array_stencil-doof2d-apply_stencil">
     <title>Applying a Stencil to an &array;</title>
***************
*** 592,598 ****
      <caption>
       <para>To compute the value associated with index position (1,3)
       of an array, place the stencil's center, indicated with dashed
!      lines, at the position.  The computation involves the array
       values covered by the array and delineated by
       <function>upperExtent</function> and
       <function>lowerExtent</function>.</para>
--- 586,592 ----
      <caption>
       <para>To compute the value associated with index position (1,3)
       of an array, place the stencil's center, indicated with dashed
!      lines, at the position (1,3).  The computation involves the array
       values covered by the array and delineated by
       <function>upperExtent</function> and
       <function>lowerExtent</function>.</para>
***************
*** 607,625 ****
    must define a function call <function>operator()</function> with a
    container parameter and index parameters.  The number of index
    parameters, indicating the stencil's center, must equal the
!   container's dimension.  For example, <type>DoofNinePt</type>
!   defines <methodname>operator()(const C& c, int i, int
!   j)</methodname>.  We templated the container type
!   <varname>C</varname> although this is not strictly necessary.  The
!   two index parameters <varname>i</varname> and <varname>j</varname>
!   ensure the stencil works with two-dimensional containers.  The
!   <methodname>lowerExtent</methodname> indicates how far to the left
!   (or below) the stencil extends beyond its center.  Its parameter
!   indicates a particular dimension.  Index parameters
    <varname>i</varname> and <varname>j</varname> are in dimension 0
    and 1.  <methodname>upperExtent</methodname> serves an
    analogous purpose.  The &poomatoolkit; uses these functions when
!   distribution computation among various processors, but it does not
    use these functions to ensure nonexistent &array; values are not
    accessed.  Caveat stencil user!</para>
   </section>
--- 601,619 ----
    must define a function call <function>operator()</function> with a
    container parameter and index parameters.  The number of index
    parameters, indicating the stencil's center, must equal the
!   container's dimension.  For example, <type>DoofNinePt</type> defines
!   <methodname>operator()(const C& c, int i, int j)</methodname>.  We
!   templated the container type <varname>C</varname> although this is
!   not strictly necessary.  The two index parameters
!   <varname>i</varname> and <varname>j</varname> ensure the stencil
!   works with two-dimensional containers.  The
!   <methodname>lowerExtent</methodname> function indicates how far to
!   the left (or below) the stencil extends beyond its center.  Its
!   parameter indicates a particular dimension.  Index parameters
    <varname>i</varname> and <varname>j</varname> are in dimension 0
    and 1.  <methodname>upperExtent</methodname> serves an
    analogous purpose.  The &poomatoolkit; uses these functions when
!   distributing computation among various processors, but it does not
    use these functions to ensure nonexistent &array; values are not
    accessed.  Caveat stencil user!</para>
   </section>
***************
*** 634,640 ****
    only specify how each container's domain should be split into
    <quote>patches</quote>.  The &poomatoolkit; automatically
    distributes the data among the available processors and handles
!   any required communication among processors.</para>

    <example id="tutorial-array_distributed-doof2d">
     <title>Distributed Stencil &array; Implementation of &doof2d;</title>
--- 628,637 ----
    only specify how each container's domain should be split into
    <quote>patches</quote>.  The &poomatoolkit; automatically
    distributes the data among the available processors and handles
!   any required communication among processors.  <xref
!   linkend="tutorial-array_distributed-doof2d"></xref> illustrates how
!   to write a distributed version of the stencil program (<xref
!   linkend="tutorial-array_stencil-doof2d"></xref>).</para>

    <example id="tutorial-array_distributed-doof2d">
     <title>Distributed Stencil &array; Implementation of &doof2d;</title>
***************
*** 644,655 ****
       <para>Multiple copies of a distributed program may
       simultaneously run, perhaps each having its own input and
       output.  Thus, we use command-line arguments to pass input to
!      the program.  Using an &inform; object ensures only one program
       produces output.</para>
      </callout>
      <callout arearefs="tutorial-array_distributed-doof2d-layout">
       <para>The <type>UniformGridPartition</type> declaration
!      specifies how an array's domain will be partition, of split,
       into patches.  Guard layers are an optimization that can reduce
       data communication between patches.  The
       <type>UniformGridLayout</type> declaration applies the
--- 641,652 ----
       <para>Multiple copies of a distributed program may
       simultaneously run, perhaps each having its own input and
       output.  Thus, we use command-line arguments to pass input to
!      the program.  Using an &inform; object ensures only one copy
       produces output.</para>
      </callout>
      <callout arearefs="tutorial-array_distributed-doof2d-layout">
       <para>The <type>UniformGridPartition</type> declaration
!      specifies how an array's domain will be partitioned, or split,
       into patches.  Guard layers are an optimization that can reduce
       data communication between patches.  The
       <type>UniformGridLayout</type> declaration applies the
***************
*** 657,664 ****
       patches among various processors.</para>
      </callout>
      <callout arearefs="tutorial-array_distributed-doof2d-remote">
!      <para>The <type>MultiPatch</type> &engine; distributes requests
!      for &array; values to the associated patch.  Since a patch may
       associated with a different processor, its
       <quote>remote</quote> &engine; has type
       <type>Remote<Brick></type>.  &pooma; automatically
--- 654,661 ----
       patches among various processors.</para>
      </callout>
      <callout arearefs="tutorial-array_distributed-doof2d-remote">
!      <para>The &multipatch; &engine; distributes requests
!      for &array; values to the associated patches.  Since a patch may
       associated with a different processor, its
       <quote>remote</quote> &engine; has type
       <type>Remote<Brick></type>.  &pooma; automatically
***************
*** 675,690 ****

    <para>Supporting distributed computation requires only minor code
    changes.  These changes specify how each container's domain is
!   distributed among the available processors and how input and
!   output occurs.  The rest of the program, including all the
!   computations, remains the same.  When running, the &pooma;
!   executable interacts with the run-time library to determine which
!   processors are available, distributes the containers' domains, and
!   automatically handles all necessary interprocessor communication.
!   The same executable runs on one or many processors.  Thus, the
!   programmer can write one program, debugging it on a uniprocessor
!   computer and running it on a supercomputer.</para>

    <figure float="1" id="tutorial-array_distributed-doof2d-distributed_model">
     <title>The &pooma; Distributed Computation Model</title>
     <mediaobject>
--- 672,713 ----

    <para>Supporting distributed computation requires only minor code
    changes.  These changes specify how each container's domain is
!   distributed among the available processors and how input and output
!   occurs.  The rest of the program, including all the computations,
!   remains the same.  When running, the &pooma; executable interacts
!   with the run-time library to determine which processors are
!   available, distributes the containers' domains, and automatically
!   handles all necessary interprocessor communication.  The same
!   executable runs on one or many processors.  Thus, the programmer can
!   write one program, debugging it on a uniprocessor computer and
!   running it on a supercomputer.</para>
! 
!   <para>&pooma;'s distributed computing model separates container
!   domain concepts from computer configuration concepts.  See <xref
!   linkend="tutorial-array_distributed-doof2d-distributed_model"></xref>.
!   The statements in the program indicate how each container's domain
!   will be partitioned.  This process is represented in the upper left
!   corner of the figure.  A user-specified
!   <firstterm>partition</firstterm> specifies how to split the domain
!   into pieces.  For example, the illustrated partition splits the
!   domain into three equal-sized pieces along the x-dimension and two
!   equal-sized pieces along the y-dimension.  Applying the partition to
!   the domain creates <firstterm>patches</firstterm>.  The partition
!   also specifies external and internal guard layers.  A
!   <firstterm>guard layer</firstterm> is a domain surrounding a patch.
!   A patch's computation only reads but does not write these guarded
!   values.  An <firstterm>external guard layer</firstterm> conceptually
!   surrounds the entire container domain with boundary values whose
!   presence permits all domain computations to be performed the same
!   way even for computed values along the domain's edge.  An
!   <firstterm>internal guard layer</firstterm> duplicates values from
!   adjacent patches so communication need not occur during a patch's
!   computation.  The use of guard layers is an optimization; using
!   external guard layers eases programming and using internal guard
!   layers reduces communication among processors.  Their use is not
!   required.</para>

+ <!-- FIXME: Fix the "Computer Configuration" text layout so it does not overlap the box. -->
    <figure float="1" id="tutorial-array_distributed-doof2d-distributed_model">
     <title>The &pooma; Distributed Computation Model</title>
     <mediaobject>
***************
*** 695,732 ****
       <phrase>the &pooma; distributed computation model</phrase>
      </textobject>
      <caption>
!      <para>The &pooma; distributed computation model combines
!      partitioning containers' domains and the computer configuration
!      to create a layout.</para>
      </caption>
     </mediaobject>
    </figure>

-   <para>&pooma;'s distributed computing model separates container
-   domain concepts from computer configuration concepts.  See <xref
-   linkend="tutorial-array_distributed-doof2d-distributed_model"></xref>.
-   The program indicates how each container's domain will be
-   partitioned.  This process is represented in the upper left corner
-   of the figure.  A user-specified <firstterm>partition</firstterm>
-   specifies how to split the domain into pieces.  For example, the
-   illustrated partition splits the domain into three equal-sized
-   pieces along the x-dimension and two equal-sized pieces along the
-   y-dimension.  Thus, the domain is split into
-   <firstterm>patches</firstterm>.  The partition also specifies
-   external and internal guard layers.  A <firstterm>guard
-   layer</firstterm> is a domain surrounding a patch.  A patch's
-   computation only reads but does not write these guarded values.
-   An <firstterm>external guard layer</firstterm> conceptually
-   surrounds the entire container domain with boundary values whose
-   presence permits all domain computations to be performed the same
-   way even for values along the domain's edge.  An
-   <firstterm>internal guard layer</firstterm> duplicates values from
-   adjacent patches so communication need not occur during a patch's
-   computation.  The use of guard layers is an optimization; using
-   external guard layers eases programming and using internal guard
-   layers reduces communication among processors.  Their use is not
-   required.</para>
- 
    <para>The computer configuration of shared memory and processors
    is determined by the run-time system.  See the upper right portion
    of <xref
--- 718,730 ----
       <phrase>the &pooma; distributed computation model</phrase>
      </textobject>
      <caption>
!      <para>The &pooma; distributed computation model creates a layout
!      by combining a partitioning of the containers' domains and the
!      computer configuration.</para>
      </caption>
     </mediaobject>
    </figure>

    <para>The computer configuration of shared memory and processors
    is determined by the run-time system.  See the upper right portion
    of <xref
***************
*** 738,765 ****
    supercomputer consisting of desktop computers networked together
    might have as many contexts as computers.  The run-time system,
    e.g., the Message Passing Interface (&mpi;) Communications Library
!   (FIXME: xref linkend="mpi99", <ulink
!   url="http://www-unix.mcs.anl.gov/mpi/"></ulink>) or the &mm;
    Shared Memory Library (<ulink
    url="http://www.engelschall.com/sw/mm/"></ulink>), communicates
    the available contexts to the executable.  &pooma; must be
!   configured for the particular run-time system.  See <xref
    linkend="installation-distributed_computing"></xref>.</para>

    <para>A <firstterm>layout</firstterm> combines patches with contexts
    so the program can be executed.  If &distributedtag; is specified,
    the patches are distributed among the available contexts.  If
!   &replicatedtag; is specified, each set of patches is replicated
!   among each context.  Regardless, the containers' domains are now
    distributed among the contexts so the program can run.  When a patch
    needs data from another patch, the &poomatoolkit; sends messages to
!   the desired patch uses a message-passing library.  All such
!   communication is automatically performed by the &toolkit; with no need
!   for programmer or user input.</para>
! 
!   <para>FIXME: The two previous paragraphs demonstrate confusion
!   between <quote>run-time system</quote> and <quote>message-passing
!   library</quote>.</para>

    <para>Incorporating &pooma;'s distributed computation model into a
    program requires writing very few lines of code.  <xref
--- 736,759 ----
    supercomputer consisting of desktop computers networked together
    might have as many contexts as computers.  The run-time system,
    e.g., the Message Passing Interface (&mpi;) Communications Library
!   <!-- FIXME: xref linkend="mpi99", <ulink
!   url="http://www-unix.mcs.anl.gov/mpi/"></ulink> --> or the &mm;
    Shared Memory Library (<ulink
    url="http://www.engelschall.com/sw/mm/"></ulink>), communicates
    the available contexts to the executable.  &pooma; must be
!   configured for the particular run-time system in use.  See <xref
    linkend="installation-distributed_computing"></xref>.</para>

    <para>A <firstterm>layout</firstterm> combines patches with contexts
    so the program can be executed.  If &distributedtag; is specified,
    the patches are distributed among the available contexts.  If
!   &replicatedtag; is specified, each set of patches is replicated on
!   each context.  Regardless, the containers' domains are now
    distributed among the contexts so the program can run.  When a patch
    needs data from another patch, the &poomatoolkit; sends messages to
!   the desired patch using the message-passing library.  All such
!   communication is automatically performed by the &toolkit; with no
!   need for programmer or user input.</para>

    <para>Incorporating &pooma;'s distributed computation model into a
    program requires writing very few lines of code.  <xref
***************
*** 772,783 ****
    copy of adjacent patches' outermost values.  This may speed
    computation because a patch need not synchronize its computation
    with other patches' processors.  Since each value's computation
!   requires knowing its surrounding neighbors, the internal guard
    layer is one layer deep.  The second <type>GuardLayers</type>
    argument specifies no external guard layer.  External guard layers
!   simplify computing values along the edges of domains.  Since the
!   program already uses only the interior domain for computation, we
!   do not use this feature.</para>

    <para>The <varname>layout</varname> declaration creates a
    <type>UniformGridLayout</type> layout.  As <xref
--- 766,777 ----
    copy of adjacent patches' outermost values.  This may speed
    computation because a patch need not synchronize its computation
    with other patches' processors.  Since each value's computation
!   requires knowing its surrounding neighbors, this internal guard
    layer is one layer deep.  The second <type>GuardLayers</type>
    argument specifies no external guard layer.  External guard layers
!   simplify computing values along the edges of domains.  Since our
!   program already uses only the interior domain for computation, we do
!   not use this feature.</para>

    <para>The <varname>layout</varname> declaration creates a
    <type>UniformGridLayout</type> layout.  As <xref
***************
*** 787,843 ****
    comprise <varname>layout</varname>'s three parameters; the
    contexts are implicitly supplied by the run-time system.</para>

!   <para>To create a distributed &array;, it should be created using
!   a &layout; object and have a &multipatch; &engine;.  Prior
!   implementations designed for uniprocessors constructed the
!   container using a &domain; object.  A distributed implementation
!   uses a &layout; object, which conceptually specifies a &domain;
!   object and its distribution throughout the computer.  A
!   &multipatch; &engine; supports computations using multiple patches.
!   The <type>UniformTag</type> indicates the patches all have the
!   same size.  Since patches may reside on different contexts, the
!   second template parameter is <type>Remote</type>.  Its
!   <type>Brick</type> template parameter specifies the &engine; for a
!   particular patch on a particular context.  Most distributed
!   programs use <type>MultiPatch<UniformTag, Remote<Brick>
!   ></type> or <type>MultiPatch<UniformTag,
!   Remote<CompressibleBrick> ></type> &engine;s.</para>

    <para>The computations for a distributed implementation are exactly
    the same as for a sequential implementation.  The &poomatoolkit; and
!   a message-passing library automatically perform all
    computation.</para>

    <para>Input and output for distributed programs is different than
!   for sequential programs.  Although the same instructions run on
!   each context, each context may have its own input and output
!   streams.  To avoid dealing with multiple input streams, we pass
!   the input via command-line arguments, which are replicated for
!   each context.  Using &inform; streams avoids having multiple
!   output streams print.  Any context can print to an &inform; stream
!   but only text sent to context 0 is sent.  At the beginning of
!   the program, we create an &inform; object.  Throughout the rest of
!   the program, we use it instead of <varname>std::cout</varname> and
    <varname>std::cerr</varname>.</para>

    <para>The command to run the program is dependent on the run-time
    system.  To use &mpi; with the Irix 6.5 operating system, one
    can use the <command>mpirun</command> command.  For example,
!   <statement>mpirun -np 4 Doof2d-Array-distributed -mpi 2 10
!   1000</statement> invokes the &mpi; run-time system with four
!   processors.  The <statement>-mpi</statement> option tells the
!   &pooma; executable <command>Doof2d-Array-distributed</command> to
!   use the &mpi; Library.  The remaining arguments specify the number
!   of processors, the number of averagings, and the array size.  The
!   first and last values are used for each dimension.  For example,
!   if three processors are specified, then the x-dimension will have
!   three processors and the y-dimension will have three processors,
!   totalling nine processors.  The command
!   <statement>Doof2d-Array-distributed -shmem -np 4 2 10
!   1000</statement> uses the &mm; Shared Memory Library
!   (<statement>-shmem</statement>) and four processors.  As for
!   &mpi;, the remaining command-line arguments are specified on a
!   per-dimension basis for the two-dimensional program.</para>
   </section>

--- 781,837 ----
    comprise <varname>layout</varname>'s three parameters; the
    contexts are implicitly supplied by the run-time system.</para>

!   <para>A distributed &array; should be created using a
!   &layout; object and have a &multipatch; &engine; rather than using a
!   &domain; object and a &brick; &engine; as we did for the
!   uniprocessor implementations.  A distributed implementation uses a
!   &layout; object, which conceptually specifies a &domain; object and
!   its distribution throughout the computer.  A &multipatch; &engine;
!   supports computations using multiple patches.  The
!   <type>UniformTag</type> indicates the patches all have the same
!   size.  Since patches may reside on different contexts, the second
!   template parameter is <type>Remote</type>.  Its <type>Brick</type>
!   template parameter specifies the &engine; for a particular patch on
!   a particular context.  Most distributed programs use
!   <type>MultiPatch<UniformTag, Remote<Brick> ></type> or
!   <type>MultiPatch<UniformTag, Remote<CompressibleBrick>
!   ></type> &engine;s.</para>

    <para>The computations for a distributed implementation are exactly
    the same as for a sequential implementation.  The &poomatoolkit; and
!   a message-passing library automatically perform all the
    computation.</para>

    <para>Input and output for distributed programs is different than
!   for sequential programs.  Although the same instructions run on each
!   context, each context may have its own input and output streams.  To
!   avoid dealing with multiple input streams, we pass the input via
!   command-line arguments, which are replicated for each context.
!   Using &inform; streams avoids having multiple output streams print.
!   Any context can print to an &inform; stream, but only text printed by
!   context 0 is displayed.  At the beginning of the program, we
!   create an &inform; object named <varname>output</varname>.
!   Throughout the rest of the program, we use it instead of
!   <varname>std::cout</varname> and
    <varname>std::cerr</varname>.</para>

    <para>The command to run the program is dependent on the run-time
    system.  To use &mpi; with the Irix 6.5 operating system, one
    can use the <command>mpirun</command> command.  For example,
!   <command>mpirun -np 4 Doof2d-Array-distributed -mpi 2 10
!   1000</command> invokes the &mpi; run-time system with four
!   processors.  The <option>-mpi</option> option tells the &pooma;
!   executable <command>Doof2d-Array-distributed</command> to use the
!   &mpi; Library.  The remaining arguments specify the number of
!   processors, the number of averagings, and the array size.  The first
!   and last values are the same for each dimension.  For example, if three
!   processors are specified, then the x-dimension will have three
!   processors and the y-dimension will have three processors, totaling
!   nine processors.  The command <command>Doof2d-Array-distributed
!   -shmem -np 4 2 10 1000</command> uses the &mm; Shared Memory Library
!   (<option>-shmem</option>) and four processors.  As for &mpi;, the
!   remaining command-line arguments are specified on a per-dimension
!   basis for the two-dimensional program.</para>
   </section>

***************
*** 845,851 ****
    <title>Data-Parallel &field; Implementation</title>

    <para>&pooma; &array;s support many scientific computations, but
!   many scientific computations require values distributed throughout
    space, and &array;s have no spatial extent.  &pooma; &field;s,
    supporting a superset of &array; functionality, model values
    distributed throughout space.</para>
--- 839,845 ----
    <title>Data-Parallel &field; Implementation</title>

    <para>&pooma; &array;s support many scientific computations, but
!   other scientific computations require values distributed throughout
    space, and &array;s have no spatial extent.  &pooma; &field;s,
    supporting a superset of &array; functionality, model values
    distributed throughout space.</para>
***************
*** 861,869 ****

    <para>In this section, we implement the &doof2d; two-dimensional
    diffusion simulation program using &field;s.  This simulation does
!   not require any &field;-specific features, but we chose to present
    this program rather than one using &field;-specific features to
!   permit comparisons with the &array; versions, especially <xref
    linkend="tutorial-array_parallel-doof2d"></xref>.</para>

    <example id="tutorial-field_parallel-doof2d">
--- 855,863 ----

    <para>In this section, we implement the &doof2d; two-dimensional
    diffusion simulation program using &field;s.  This simulation does
!   not require any &field;-specific features, but we present
    this program rather than one using &field;-specific features to
!   facilitate comparison with the &array; versions, especially <xref
    linkend="tutorial-array_parallel-doof2d"></xref>.</para>

    <example id="tutorial-field_parallel-doof2d">
***************
*** 876,886 ****
       included.</para>
      </callout>
      <callout arearefs="tutorial-field_parallel-doof2d-mesh">
!      <para>These statements specify the spacing and number of
!      &field; values.  First, a layout is explicitly.  Then, a mesh,
!      which specifies the spacing between cells, is created.  The
!      &field;'s centering specifies one cell-centered value per
!      cell.</para>
      </callout>
      <callout arearefs="tutorial-field_parallel-doof2d-field_creation">
       <para>&field;'s first template parameter specifies the type of
--- 870,879 ----
       included.</para>
      </callout>
      <callout arearefs="tutorial-field_parallel-doof2d-mesh">
!      <para>These statements specify the spacing and number of &field;
!      values.  First, a layout is specified.  Then, a mesh, which
!      specifies the spacing between cells, is created.  The &field;'s
!      centering specifies one cell-centered value per cell.</para>
      </callout>
      <callout arearefs="tutorial-field_parallel-doof2d-field_creation">
       <para>&field;'s first template parameter specifies the type of
***************
*** 907,931 ****
    Since the above program is designed for uniprocessor computation,
    specifying the domain specifies the layout.  A &field;'s
    <firstterm>mesh</firstterm> specifies its spatial extent.  For
!   example, one can ask the mesh for the distance between two cells
!   or for the normals to a particular cell.  Cells in a
    <type>UniformRectilinearMesh</type> all have the same size and are
!   parallelepipeds.  To create the mesh, one specifies the layout,
!   the location of the spatial point corresponding to the lower, left
    domain location, and the size of a particular cell.  Since this
!   program does not use mesh computations, our choices do not much
!   matter.  We specify the domain's lower, left corner is at spatial
!   location (0.0, 0.0) and each cell's width and height is 1.
!   Thus, the middle of the cell at domain position (3,4) is (3.5,
!   4.5).</para>

    <para>A &field; cell can contain one or more values although each
!   cell must have the same arrangement.  For this simulation, we
!   desire one value per cell so we place that position at the cell's
    center, i.e., a cell centering.  The
    <function>canonicalCentering</function> function returns such a
!   centering.  We defer discussion of the latter two arguments to
!   <xref linkend="sequential"></xref>.</para>

    <para>A &field; declaration is analogous to an &array; declaration
    but must also specify a centering and a mesh.  In <xref
--- 900,924 ----
    Since the above program is designed for uniprocessor computation,
    specifying the domain specifies the layout.  A &field;'s
    <firstterm>mesh</firstterm> specifies its spatial extent.  For
!   example, one can ask the mesh for the distance between two cells or
!   for the normals to a particular cell.  Cells in a
    <type>UniformRectilinearMesh</type> all have the same size and are
!   parallelepipeds.  To create the mesh, one specifies the layout, the
!   location of the spatial point corresponding to the lower, left
    domain location, and the size of a particular cell.  Since this
!   program does not use mesh computations, our choices do not matter.
!   We specify the domain's lower, left corner as spatial location (0.0,
!   0.0) and each cell's width and height as 1.  Thus, the middle
!   of the cell at domain position (3,4) is (3.5, 4.5).</para>

    <para>A &field; cell can contain one or more values although each
!   cell must have the same arrangement of values.  For this simulation,
!   we desire one value per cell so we place that position at the cell's
    center, i.e., a cell centering.  The
    <function>canonicalCentering</function> function returns such a
!   centering.  <![%unfinished;[ We defer discussion of the latter two arguments
!   to <xref linkend="sequential"></xref> ]]> <!-- FIXME
!   unfinished --> .</para>

    <para>A &field; declaration is analogous to an &array; declaration
    but must also specify a centering and a mesh.  In <xref
***************
*** 938,954 ****
    the &engine; type.  Since a &field; has a centering and a mesh in
    addition to a layout, those arguments are also necessary.</para>

!   <para>&field; operations are a superset of &array; operations so
!   the &doof2d; computations are the same as for <xref
!   linkend="tutorial-array_parallel-doof2d"></xref>.  &field;
!   accesses require parentheses, not square brackets, and accesses to
!   particular values should be preceded by calls to
    <function>Pooma::blockAndEvaluate</function>.</para>

    <para>To summarize, &field;s support multiple values per cell and
    have spatial extent.  Thus, their declarations must specify a
    centering and a mesh.  Otherwise, a &field; program is similar to
!   one with &array;s.</para>
   </section>

--- 931,947 ----
    the &engine; type.  Since a &field; has a centering and a mesh in
    addition to a layout, those arguments are also necessary.</para>

!   <para>&field; operations are a superset of &array; operations so the
!   &doof2d; computations are the same as in <xref
!   linkend="tutorial-array_parallel-doof2d"></xref>.  &field; accesses
!   require parentheses, not square brackets, and accesses to individual
!   values should be preceded by calls to
    <function>Pooma::blockAndEvaluate</function>.</para>

    <para>To summarize, &field;s support multiple values per cell and
    have spatial extent.  Thus, their declarations must specify a
    centering and a mesh.  Otherwise, a &field; program is similar to
!   one using &array;s.</para>
   </section>

***************
*** 956,970 ****
    <title>Distributed &field; Implementation</title>

    <para>A &pooma; program using &field;s can execute on one or more
!   processors.  In <xref
!   linkend="tutorial-array_distributed"></xref>, we demonstrated how
!   to modify a uniprocessor stencil &array; implementation to run on
!   multiple processors.  In this section, we demonstrate that the
!   uniprocessor data-parallel &field; implementation of the previous
!   section can be converted.  Only the container declarations change;
!   the computations do not.  Since the changes are exactly analogous
!   to those in <xref linkend="tutorial-array_distributed"></xref>,
!   our exposition here will be shorter.</para>

    <example id="tutorial-field_distributed-doof2d">
     <title>Distributed Data-Parallel &field; Implementation of &doof2d;</title>
--- 949,963 ----
    <title>Distributed &field; Implementation</title>

    <para>A &pooma; program using &field;s can execute on one or more
!   processors.  In <xref linkend="tutorial-array_distributed"></xref>,
!   we demonstrated how to modify a uniprocessor stencil &array;
!   implementation to run on multiple processors.  In this section, we
!   demonstrate that the uniprocessor data-parallel &field;
!   implementation of the previous section can be similarly converted.
!   Only the container declarations change; the computations do not.
!   Since the changes are exactly analogous to those in <xref
!   linkend="tutorial-array_distributed"></xref>, our exposition here
!   will be shorter.</para>

    <example id="tutorial-field_distributed-doof2d">
     <title>Distributed Data-Parallel &field; Implementation of &doof2d;</title>
***************
*** 974,985 ****
       <para>Multiple copies of a distributed program may
       simultaneously run, perhaps each having its own input and
       output.  Thus, we use command-line arguments to pass input to
!      the program.  Using an &inform; stream ensures only one program
       produces output.</para>
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-layout">
       <para>The <type>UniformGridPartition</type> declaration
!      specifies how an array's domain will be partition, of split,
       into patches.  Guard layers are an optimization that can reduce
       data communication between patches.  The
       <type>UniformGridLayout</type> declaration applies the
--- 967,978 ----
       <para>Multiple copies of a distributed program may
       simultaneously run, perhaps each having its own input and
       output.  Thus, we use command-line arguments to pass input to
!      the program.  Using an &inform; stream ensures only one copy
       produces output.</para>
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-layout">
       <para>The <type>UniformGridPartition</type> declaration
!      specifies how an array's domain will be partitioned, or split,
       into patches.  Guard layers are an optimization that can reduce
       data communication between patches.  The
       <type>UniformGridLayout</type> declaration applies the
***************
*** 991,998 ****
       uniprocessor and multiprocessor implementations.</para>
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-remote">
!      <para>The <type>MultiPatch</type> &engine; distributes requests
!      for &array; values to the associated patch.  Since a patch may
       associated with a different processor, its
       <quote>remote</quote> engine has type
       <type>Remote<Brick></type>.  &pooma; automatically
--- 984,991 ----
       uniprocessor and multiprocessor implementations.</para>
      </callout>
      <callout arearefs="tutorial-field_distributed-doof2d-remote">
!      <para>The &multipatch; &engine; distributes requests
!      for &field; values to the associated patch.  Since a patch may be
       associated with a different processor, its
       <quote>remote</quote> engine has type
       <type>Remote<Brick></type>.  &pooma; automatically
***************
*** 1038,1051 ****
      </listitem>
      <listitem>
       <para>The command to invoke a distributed program is
!      system-dependent.  For example, the <statement>mpirun -np 4
!      Doof2d-Field-distributed -mpi 2 10 1000</statement> command
       might use &mpi; communication.
!      <statement>Doof2d-Field-distributed -shmem -np 4 2 10
!      1000</statement> might use the &mm; Shared Memory Library.</para>
      </listitem>
    </itemizedlist>
    </para>
   </section>
! <!-- FIXME: Do I need a chapter conclusion? -->
  </chapter>
--- 1031,1044 ----
      </listitem>
      <listitem>
       <para>The command to invoke a distributed program is
!      system-dependent.  For example, the <command>mpirun -np 4
!      Doof2d-Field-distributed -mpi 2 10 1000</command> command
       might use &mpi; communication.
!      <command>Doof2d-Field-distributed -shmem -np 4 2 10
!      1000</command> might use the &mm; Shared Memory Library.</para>
      </listitem>
    </itemizedlist>
    </para>
   </section>
! <!-- FIXME: Add a chapter conclusion. -->
  </chapter>