Patch: Recent Manual Changes

Fri Jan 4 10:44:00 UTC 2002

This patch mainly adds two mostly finished chapters to the R2 manual
in progress: one on understanding and using data-parallel operators
and one on writing programs using templates.

2002-Jan-04  Jeffrey D. Oldham  <oldham at codesourcery.com>

	* bibliography.xml: New file containing bibliographic information.
	* concepts.xml: Clarify containers that map indices to values.
	* glossary.xml: Add entries for compilation time, compile time,
	conformable containers, conformable domains, execution time,
	instantiation, programming time, run time, template instantiation,
	trait, traits class, Turing complete.
	* introduction.xml: Many minor changes mainly involving formatting
	and word choice.  Add sections discussing program execution speed
	and open-source software.
	* manual.xml: Add several new entity definitions.  Add unfinished
	chapter discussing writing programs using templates.  Add unfinished
	data-parallel operator chapter.  Many other minor changes.  Move
	bibliography to separate file.
	* tutorial.xml: Minor wordsmithing changes.
	* figures/box-macros.mp: New file containing macros to create
	boxes in illustrations.
	* figures/data-parallel.mp: New file illustrating data-parallel
	operations.
	* figures/doof2d.mp: Replace definitions with inclusion of
	grid-macros.mp.
	* figures/grid-macros.mp: New file containing macros to create
	grids.
	* figures/introduction.mp: Use box-macros.mp.
	* programs/Doof2d-Array-distributed-annotated.patch: Moved to
	different directory.
	* programs/Doof2d-Array-element-annotated.patch: Likewise.
	* programs/Doof2d-Array-parallel-annotated.patch: Likewise.
	* programs/Doof2d-Array-stencil-annotated.patch: Likewise.
	* programs/Doof2d-C-element-annotated.patch: Likewise.
	* programs/Doof2d-Field-distributed-annotated.patch: Likewise.
	* programs/Doof2d-Field-parallel-annotated.patch: Likewise.
	* programs/makefile: Likewise.

Applied to	mainline.

Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com
-------------- next part --------------
Index: bibliography.xml
===================================================================
RCS file: bibliography.xml
diff -N bibliography.xml
*** /dev/null	Fri Mar 23 21:37:44 2001
--- bibliography.xml	Fri Jan  4 10:14:05 2002
***************
*** 0 ****
--- 1,277 ----
+ <!-- Bibliography -->
+ 
+ <bibliography id="bibliography">
+  <title>Bibliography</title>
+ 
+  <para>FIXME: How do I process these entries?</para>
+ 
+  <biblioentry>
+   <abbrev>mpi99</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>William</firstname><surname>Gropp</surname>
+    </author>
+    <author>
+     <firstname>Ewing</firstname><surname>Lusk</surname>
+    </author>
+    <author>
+     <firstname>Anthony</firstname><surname>Skjellum</surname>
+    </author>
+   </authorgroup>
+   <copyright>
+    <year>1999</year>
+    <holder>Massachusetts Institute of Technology</holder>
+   </copyright>
+   <isbn>0-262-57132-3</isbn>
+   <publisher>
+    <publishername>The MIT Press</publishername>
+    <address>Cambridge, MA</address>
+   </publisher>
+   <title>Using MPI</title>
+   <subtitle>Portable Parallel Programming with the Message-Passing Interface</subtitle>
+   <edition>second edition</edition>
+  </biblioentry>
+ 
+  <biblioentry>
+   <abbrev>pooma95</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
+     <affiliation>
+      <orgname>Dakota Software Systems, Inc.</orgname>
+      <address><city>Rapid City</city><state>SD</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Susan</firstname><othername role="mi">R.</othername><surname>Atlas</surname>
+     <affiliation>
+      <orgname>Parallel Solutions, Inc.</orgname>
+      <address><city>Santa Fe</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Subhankar</firstname><surname>Banerjee</surname>
+     <affiliation>
+      <orgname>New Mexico State University</orgname>
+      <address><city>Las Cruces</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
+     <affiliation>
+      <orgname>University of Illinois at Urbana-Champaign</orgname>
+      <address><city>Urbana-Champaign</city><state>IL</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
+     <affiliation>
+      <orgname>California Institute of Technology</orgname>
+      <address><city>Pasadena</city><state>CA</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Katarzyna</firstname><surname>Keahey</surname>
+     <affiliation>
+      <orgname>Indiana University</orgname>
+      <address><city>Bloomington</city><state>IN</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Marydell</firstname><surname>Tholburn</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+   </authorgroup>
+   <title>&pooma;</title>
+   <subtitle>A Framework for Scientific Simulation on Parallel Architectures</subtitle>
+   <releaseinfo>unpublished</releaseinfo>
+  </biblioentry>
+ 
+  <biblioentry>
+   <abbrev>pooma-sc95</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>Susan</firstname><surname>Atlas</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Subhankar</firstname><surname>Banerjee</surname>
+     <affiliation>
+      <orgname>New Mexico State University</orgname>
+      <address><city>Las Cruces</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
+     <affiliation>
+      <orgname>Advanced Computing Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>M.</firstname><surname>Srikant</surname>
+     <affiliation>
+      <orgname>New Mexico State University</orgname>
+      <address><city>Las Cruces</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Marydell</firstname><surname>Tholburn</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+   </authorgroup>
+   <title>&pooma;</title>
+   <subtitle>A High Performance Distributed Simulation Environment for
+   Scientific Applications</subtitle>
+ <!-- FIXME: Where list Supercomputing 1995? -->
+  </biblioentry>
+ 
+  <biblioentry>
+   <abbrev>pooma-siam98</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>James</firstname><othername role="mi">A.</othername><surname>Crotinger</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Scott</firstname><othername role="mi">W.</othername><surname>Haney</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Stephen</firstname><othername role="mi">A.</othername><surname>Smith</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+    <author>
+     <firstname>Timothy</firstname><othername role="mi">J.</othername><surname>Williams</surname>
+     <affiliation>
+      <orgname>Los Alamos National Laboratory</orgname>
+      <address><city>Los Alamos</city><state>NM</state></address>
+     </affiliation>
+    </author>
+   </authorgroup>
+   <title>Rapid Application Development and Enhanced Code
+   Interoperability using the &pooma; Framework</title>
+ <!-- FIXME: Where list SIAM Workshop ... 1998? -->
+  </biblioentry>
+ 
+  <biblioentry>
+   <abbrev>pete-99</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>Scott</firstname><surname>Haney</surname>
+    </author>
+    <author>
+     <firstname>James</firstname><surname>Crotinger</surname>
+    </author>
+    <author>
+     <firstname>Steve</firstname><surname>Karmesin</surname>
+    </author>
+    <author>
+     <firstname>Stephen</firstname><surname>Smith</surname>
+    </author>
+   </authorgroup>
+   <title>&pete;: The Portable Expression Template Engine.  1999 October,
+ \emph{Dr. Dobb's Journal}, vol.24, nu.10, pp.88--95</title>
+ <!-- FIXME: Fix the tagging. -->
+  </biblioentry>
+ 
+  <biblioentry>
+   <abbrev>veldhuizen-95</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>Todd</firstname><surname>Veldhuizen</surname>
+    </author>
+   </authorgroup>
+   <title>Expression Templates.  1995 June, \emph{&cc; Report}, vol.7,
+ nu.5, pp.26--31.  Also available at http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html</title>
+ <!-- FIXME: Fix the tagging. -->
+  </biblioentry>
+ 
+  <biblioentry>
+   <abbrev>vandevoorde-95</abbrev>
+   <authorgroup>
+    <author>
+     <firstname>David</firstname><surname>Vandevoorde</surname>
+    </author>
+   </authorgroup>
+   <title>\texttt{valarray&lt;Troy&gt;}: An Implementation of a Numerical
+ Array.  1995.  unpublished.  Available at ftp://ftp.cs.rpi.edu/pub/vandevod/Valarray/Documents/valarray.ps.</title>
+ <!-- FIXME: Fix the tagging. -->
+  </biblioentry>
+ 
+ 
+ </bibliography>
Index: concepts.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v
retrieving revision 1.3
diff -c -p -r1.3 concepts.xml
*** concepts.xml	2001/12/17 17:27:41	1.3
--- concepts.xml	2002/01/04 17:14:05
***************
*** 343,349 ****
  	<imagedata fileref="figures/concepts.101" format="EPS" align="center"></imagedata>
       </imageobject>
       <textobject>
! 	<phrase>maps from indices to values</phrase>
       </textobject>
      </mediaobject>
     </figure>
--- 343,349 ----
  	<imagedata fileref="figures/concepts.101" format="EPS" align="center"></imagedata>
       </imageobject>
       <textobject>
! 	<phrase>&array;s and &field;s map from indices to values.</phrase>
       </textobject>
      </mediaobject>
     </figure>
Index: glossary.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v
retrieving revision 1.4
diff -c -p -r1.4 glossary.xml
*** glossary.xml	2001/12/17 17:27:41	1.4
--- glossary.xml	2002/01/04 17:14:06
***************
*** 91,96 ****
--- 91,112 ----
     </glossdef>
    </glossentry>

+   <glossentry id="glossary-compilation_time">
+    <glossterm>compilation time</glossterm>
+    <glosssee otherterm="glossary-compile_time"></glosssee>
+   </glossentry>
+ 
+   <glossentry id="glossary-compile_time">
+    <glossterm>compile time</glossterm>
+    <glossdef>
+     <para>the stage, in the process from writing a program to executing
+     it, at which a compiler compiles the program.  This is also called
+     <firstterm>compilation time</firstterm>.</para>
+     <glossseealso otherterm="glossary-programming_time">programming time</glossseealso>
+     <glossseealso otherterm="glossary-run_time">run time</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
    <glossentry id="glossary-computing_environment">
     <glossterm>computing environment</glossterm>
     <glossdef>
***************
*** 102,107 ****
--- 118,145 ----
     </glossdef>
    </glossentry>

+   <glossentry id="glossary-conformable_containers">
+    <glossterm>conformable containers</glossterm>
+    <glossdef>
+     <para>containers with conformable domains.</para>
+     <glossseealso otherterm="glossary-conformable_domains">conformable domains</glossseealso>
+     <glossseealso otherterm="glossary-data_parallel">data parallel</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
+   <glossentry id="glossary-conformable_domains">
+    <glossterm>conformable domains</glossterm>
+    <glossdef>
+     <para>domains with the <quote>same shape</quote> so that
+     corresponding dimensions have the same number of elements.
+     Scalars, deemed conformable with any domain, get
+     <quote>expanded</quote> to the domain's shape.  Binary operators
+     can operate on containers with conformable domains.</para>
+     <glossseealso otherterm="glossary-conformable_containers">conformable containers</glossseealso>
+     <glossseealso otherterm="glossary-data_parallel">data parallel</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
    <glossentry id="glossary-container">
     <glossterm>container</glossterm>
     <glossdef>
***************
*** 240,245 ****
--- 278,288 ----
     </glossdef>
    </glossentry>

+   <glossentry id="glossary-execution_time">
+    <glossterm>execution time</glossterm>
+    <glosssee otherterm="glossary-run_time"></glosssee>
+   </glossentry>
+ 
    <glossentry id="glossary-external_guard_layer">
     <glossterm>external guard layer</glossterm>
     <glossdef>
***************
*** 297,311 ****
     <glossdef>
      <para>domain surrounding each patch of a container's domain.  It
      contains read-only values.  <link
! 				       linkend="glossary-external_guard_layer">External guard
      layer</link>s ease programming, while <link
! 						 linkend="glossary-internal_guard_layer">internal guard
      layer</link>s permit each patch's computation to be occur without
!     copying values from adjacent patches.  They are optimizations,
!     not required for program correctness.</para>
!     <glossseealso otherterm="glossary-external_guard_layer">external guard layer</glossseealso>
!     <glossseealso otherterm="glossary-internal_guard_layer">internal guard layer</glossseealso>
!     <glossseealso otherterm="glossary-partition">partition</glossseealso>
      <glossseealso otherterm="glossary-patch">patch</glossseealso>
      <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
--- 340,356 ----
     <glossdef>
      <para>domain surrounding each patch of a container's domain.  It
      contains read-only values.  <link
!     linkend="glossary-external_guard_layer">External guard
      layer</link>s ease programming, while <link
!     linkend="glossary-internal_guard_layer">internal guard
      layer</link>s permit each patch's computation to be occur without
!     copying values from adjacent patches.  They are optimizations, not
!     required for program correctness.</para> <glossseealso
!     otherterm="glossary-external_guard_layer">external guard
!     layer</glossseealso> <glossseealso
!     otherterm="glossary-internal_guard_layer">internal guard
!     layer</glossseealso> <glossseealso
!     otherterm="glossary-partition">partition</glossseealso>
      <glossseealso otherterm="glossary-patch">patch</glossseealso>
      <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
***************
*** 319,331 ****
     <glossterm>index</glossterm>
     <glossdef>
      <para>a position in a <link
! 				 linkend="glossary-domain">domain</link> usually denoted by an
      ordered tuple.  More than one index are called <link
! 							  linkend="glossary-indices">indices</link>.</para>
!     <glossseealso otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
    </glossentry>

    <glossentry id="glossary-indices">
     <glossterm>indices</glossterm>
     <glossdef>
--- 364,381 ----
     <glossterm>index</glossterm>
     <glossdef>
      <para>a position in a <link
!     linkend="glossary-domain">domain</link> usually denoted by an
      ordered tuple.  More than one index are called <link
!     linkend="glossary-indices">indices</link>.</para> <glossseealso
!     otherterm="glossary-domain">domain</glossseealso>
     </glossdef>
    </glossentry>

+   <glossentry id="glossary-instantiation">
+    <glossterm>instantiation</glossterm>
+    <glosssee otherterm="glossary-template_instantiation">template instantiation</glosssee>
+   </glossentry>
+ 
    <glossentry id="glossary-indices">
     <glossterm>indices</glossterm>
     <glossdef>
***************
*** 439,444 ****
--- 489,504 ----
      <glossseealso otherterm="glossary-index">index</glossseealso>
     </glossdef>
    </glossentry>
+ 
+   <glossentry id="glossary-programming_time">
+    <glossterm>programming time</glossterm>
+    <glossdef>
+     <para>the stage, in the process from writing a program to executing
+     it, at which a programmer writes the program.</para>
+     <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
+     <glossseealso otherterm="glossary-run_time">run time</glossseealso>
+    </glossdef>
+   </glossentry>
   </glossdiv>

   <glossdiv id="glossary-r">
***************
*** 480,485 ****
--- 540,556 ----
      <glossseealso otherterm="glossary-stencil">stencil</glossseealso>
     </glossdef>
    </glossentry>
+ 
+   <glossentry id="glossary-run_time">
+    <glossterm>run time</glossterm>
+    <glossdef>
+     <para>the stage, in the process from writing a program to executing
+     it, at which the program is executed.  This is also called
+     <firstterm>execution time</firstterm>.</para>
+     <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
+     <glossseealso otherterm="glossary-programming_time">programming time</glossseealso>
+    </glossdef>
+   </glossentry>
   </glossdiv>

   <glossdiv id="glossary-s">
***************
*** 541,546 ****
--- 612,629 ----
   <glossdiv id="glossary-t">
    <title>T</title>

+   <glossentry id="glossary-template_instantiation">
+    <glossterm>template instantiation</glossterm>
+    <glossdef>
+     <para>applying a class template to template arguments to create a
+     type.  For example, <statement>foo&lt;double,3&gt;</statement>
+     instantiates <statement>template &lt;typename T, int n&gt; class
+     foo</statement> with the type &double; and the constant
+     integer 3.  Template instantiation is analogous to applying a
+     function to function arguments.</para>
+    </glossdef>
+   </glossentry>
+ 
    <glossentry id="glossary-tensor">
     <glossterm>&tensor;</glossterm>
     <glossdef>
***************
*** 558,563 ****
--- 641,673 ----
      mathematical matrices as first-class objects.</para>
      <glossseealso otherterm="glossary-tensor">&tensor;</glossseealso>
      <glossseealso otherterm="glossary-vector">&vector;</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
+   <glossentry id="glossary-trait">
+    <glossterm>trait</glossterm>
+    <glossdef>
+     <para>a characteristic of a type.</para>
+     <glossseealso otherterm="glossary-traits_class">traits class</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
+   <glossentry id="glossary-traits_class">
+    <glossterm>traits class</glossterm>
+    <glossdef>
+     <para>a class containing one or more traits all describing a
+     particular type's characteristics.</para>
+     <glossseealso otherterm="glossary-trait">trait</glossseealso>
+    </glossdef>
+   </glossentry>
+ 
+   <glossentry id="glossary-Turing_complete">
+    <glossterm>Turing complete</glossterm>
+    <glossdef>
+     <para>describes a language that can compute anything that can be
+     computed.  That is, the language for computation is as powerful as
+     it can be.  Most widespread programming languages are
+     Turing complete, including &cc;, &c;, and &fortran;.</para>
     </glossdef>
    </glossentry>
   </glossdiv>
Index: introduction.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/introduction.xml,v
retrieving revision 1.1
diff -c -p -r1.1 introduction.xml
*** introduction.xml	2001/12/17 17:27:41	1.1
--- introduction.xml	2002/01/04 17:14:06
***************
*** 2,21 ****
   <title>Introduction</title>

   <para>The Parallel Object-Oriented Methods and Applications
!  <acronym>POOMA</acronym> &toolkitcap; is a &cc; &toolkit; for
!  writing high-performance scientific programs for sequential and
!  distributed computation.  The &toolkit; provides a variety of
!  tools:
   <itemizedlist spacing="compact">
     <listitem>
      <para>containers and other abstractions suitable for scientific
      computation,</para>
     </listitem>
     <listitem>
-     <para>several container storage classes to reduce a program's
-     storage requirements,</para>
-    </listitem>
-    <listitem>
      <para>support for a variety of computation modes including
      data-parallel expressions, stencil-based computations, and lazy
      evaluation,</para>
--- 2,16 ----
   <title>Introduction</title>

   <para>The Parallel Object-Oriented Methods and Applications
!  (<acronym>POOMA</acronym>) &toolkitcap; is a &cc; &toolkit; for
!  writing high-performance scientific programs.  The &toolkit; provides
!  a variety of tools:
   <itemizedlist spacing="compact">
     <listitem>
      <para>containers and other abstractions suitable for scientific
      computation,</para>
     </listitem>
     <listitem>
      <para>support for a variety of computation modes including
      data-parallel expressions, stencil-based computations, and lazy
      evaluation,</para>
***************
*** 25,31 ****
     </listitem>
     <listitem>
      <para>automatic creation of all interprocessor communication for
!     parallel and distributed programs, and</para>
     </listitem>
     <listitem>
      <para>automatic out-of-order execution and loop rearrangement
--- 20,30 ----
     </listitem>
     <listitem>
      <para>automatic creation of all interprocessor communication for
!     parallel and distributed programs,</para>
!    </listitem>
!    <listitem>
!     <para>several container storage classes to reduce a program's
!     storage requirements, and</para>
     </listitem>
     <listitem>
      <para>automatic out-of-order execution and loop rearrangement
***************
*** 34,53 ****
    </itemizedlist>
   Since the &toolkit; provides high-level abstractions, &pooma;
   programs are much shorter than corresponding &fortran; or &c;
!  programs, requiring less time to write and less time to debug.
!  Using these high-level abstractions, the same code runs on a wide
!  variety of computers almost as fast as carefully crafted
!  machine-specific hand-written programs.  The &toolkit; is freely
!  available, open-source software compatible with any modern &cc;
!  compiler.</para>

!  <formalpara><title>&pooma; Goals.</title>
    <para>The goals for the &poomatoolkit; have remained unchanged
!   since its inception in 1994:
    <orderedlist>
     <listitem>
      <para>Code portability across serial, distributed, and parallel
!     architectures with no change to source code.</para>
     </listitem>
     <listitem>
      <para>Development of reusable, cross-problem-domain components
--- 33,55 ----
    </itemizedlist>
   Since the &toolkit; provides high-level abstractions, &pooma;
   programs are much shorter than corresponding &fortran; or &c;
!  programs and require less time to write and less time to debug.
!  Using these high-level abstractions, the same code runs on
!  sequential, parallel, and distributed computers.  It runs almost as
!  fast as carefully crafted machine-specific hand-written programs.
!  The &toolkit; is freely available, open-source software compatible
!  with any modern &cc; compiler.</para>

! 
!  <section id="introduction-goals">
!   <title>&pooma; Goals</title>
! 
    <para>The goals for the &poomatoolkit; have remained unchanged
!   since its conception in 1994:
    <orderedlist>
     <listitem>
      <para>Code portability across serial, distributed, and parallel
!     architectures without any change to the source code.</para>
     </listitem>
     <listitem>
      <para>Development of reusable, cross-problem-domain components
***************
*** 58,66 ****
      scientific simulation.</para>
     </listitem>
     <listitem>
!     <para>[&toolkitcap;] design and development driven by
!     applications from a diverse set of scientific problem
!     domains.</para>
     </listitem>
     <listitem>
      <para>Shorter time from problem inception to working parallel
--- 60,67 ----
      scientific simulation.</para>
     </listitem>
     <listitem>
!     <para>&toolkitcap; design and development driven by applications
!     from a diverse set of scientific problem domains.</para>
     </listitem>
     <listitem>
      <para>Shorter time from problem inception to working parallel
***************
*** 68,296 ****
  <!-- FIXME: Add citation to pooma95, p. 3 -->
     </listitem>
    </orderedlist>
!  </para>
!  </formalpara>

-  <formalpara><title>Code Portability for Sequential and Distributed Programs.</title>
-  <para>&pooma; programs run on sequential, distributed, and parallel
-  computers with no change in source code.  The programmer writes two
-  or three lines specifying how each container's domain should be
-  distributed among available processors.  Using these directives and
-  run-time information about the computer's configuration, the
-  &toolkit; automatically distributes pieces of the container
-  domains, called <firstterm>patch</firstterm>es, among the available
-  processors.  If a computation needs values from another patch,
-  &pooma; automatically passes the value to the place it is needed.
-  The same program, and even the same executable, works regardless of
-  the number of the available processors and the size of the
-  containers' domains.  A programmer interested in only sequential
-  execution can omit the two or three lines specifying how the
-  domains are to be distributed.</para>
-  </formalpara>
- 
-  <figure float="1" id="introduction-science_algorithms">
-   <title>Science, Algorithms, Engineering, and &pooma;</title>
-   <mediaobject>
-    <imageobject>
-     <imagedata fileref="figures/introduction.101" format="EPS" align="center"></imagedata>
-    </imageobject>
-    <textobject>
-     <phrase>how &pooma; helps translate algorithms into programs</phrase>
-    </textobject>
-    <caption>
-     <para>In the translation from theoretical science and math to
-     computational science and math to computer programs, &pooma;
-     containers eases the translation of algorithms to computer
-     programs.</para>
-    </caption>
-   </mediaobject>
-  </figure>
- 
-  <formalpara><title>Rapid Application Development.</title>
-  <para>The &poomatoolkit; is designed to enable rapid development of
-  scientific and distributed applications.  For example, its vector,
-  matrix, and tensor classes model the corresponding mathematical
-  concepts.  Its &array; and &field; classes model the discrete
-  spaces and mathematical arrays frequently found in computational
-  science and math.  See <xref
-  linkend="introduction-science_algorithms"></xref>.  The left column
-  illustrates theoretical science and math, the middle column
-  computational science and math, and the right column computer
-  science implementations.  For example, theoretical physics
-  frequently uses continuous fields in three-dimension space, while
-  algorithms for the corresponding computational physics problem
-  usually uses discrete fields.  &pooma; containers, classes, and
-  functions ease the engineering to map these algorithms to computer
-  programs.  For example, the &pooma; &field; container models
-  discrete fields; both map locations in discrete space to values and
-  permit computations of spatial distances and values.  The &pooma;
-  &array; container models the mathematical concept of an array, used
-  in numerical analysis.</para>
-  </formalpara>
- 
-  <para>&pooma; containers support a variety of computation modes,
-  easing transition of algorithms into code.  For example, many
-  algorithms for solving partial differential equations use
-  stencil-based computations.  &pooma; supports stencil-based
-  computations on &array;s and &field;s.  It also supports
-  data-parallel computation.  For computations where one &field;'s
-  values is a function of several other &field;'s values, the
-  programmer can specify a relation.  Relations are lazily evaluated;
-  whenever the dependent &field;'s values are needed and it is
-  related to a &field; whose values have changed, the former
-  &field;'s values are computed.  Lazy evaluation also assists
-  correctness by eliminating the (frequently forgotten) need for a
-  programmer to ensure a &field;'s values are up-to-date before being
-  used.</para>
- 
-  <formalpara><title>Efficient Code.</title>
-  <para>&pooma; incorporates a variety of techniques to ensure it
-  produces code that executes as quickly as special-case,
-  hand-written code.
- <!-- FIXME: Do I present execution numbers here? -->
-  These techniques include extensive use of templates, out-of-order
-  evaluation to permit communication and computation to overlap,
-  availability of guard layers to reduce processors' synchronicity,
-  and use of &pete; to produce fast inner loops.</para>
-  </formalpara>
- 
-  <para>Using templates permits the expressiveness of using pointers
-  and function arguments but ensures as much as work as possible
-  occurs at compile time, not run time.  Also, more code is exposed
-  to the compiler's optimizer, further speeding execution.  For
-  example, use of template parameters to define the &pooma; &array;
-  container permits the use of specialized data storage classes
-  called engines, fast creation of views of a portion of an &array;,
-  and polymorphic indexing.  An &array;'s engine template parameter
-  specifies how data is stored and indexed.  Some &array;s expect
-  almost all values to be used, while others might be mostly empty.
-  In the latter case, using a specialized engine storing the few
-  nonzero values would greatly reduce space requirements.  Using
-  engines also permits fast creation of container views, known as
-  <firstterm>array sections</firstterm> in Fortran 90.  A view's
-  engine is the same as the original container's engine, while the
-  view object maps its restricted domain to the original domain.
-  Space requirements and execution time are minimal.  Using templates
-  also permits containers to support polymorphic indexing, e.g.,
-  indexing both by integers and by three-dimensional coordinates.
-  For example, a container defers returning values to its engine
-  using a templatized index operator.  The engine can define indexing
-  functions with different function arguments, without the need to
-  add corresponding container functions.  Some of these features can
-  be expressed without using templates, but doing so increases
-  execution time.  For example, a container could have a pointer to
-  an engine object, but this requires a pointer dereference for each
-  operation.  Implementing polymorphic indexing without templates
-  would require adding virtual function corresponding to each of the
-  indexing functions.</para>
- 
- <!-- FIXME: Are the claims concerning out-of-order evaluation I make true? -->
- 
-  <para>To ensure multiprocessor &pooma; programs execute quickly, it
-  is important that interprocessor communication overlaps with
-  intraprocessor computation as much as possible and communication is
-  minimized.  Asynchronous communication, out-of-order evaluation, and
-  use of guard layers all help achieve this.  &pooma; uses the
-  asynchronous communication facilities of the &cheetah; communication
-  library.  When a processor needs data stored or computed by another
-  processor, a message is sent between the two.  For synchronous
-  communication, the sender must issue an explicit send, and the
-  recipient must issue an explicit receive.  This synchronizes them.
-  &cheetah; permits the sender to put and get data without the
-  intervention of the remote site and also invoke functions at the
-  remote site to ensure the data is up-to-date.  Thus, out-of-order
-  evaluation must be supported.  Out-of-order evaluation has another
-  benefit: only computations directly or indirectly related to values
-  that are printed need occur.</para>
- 
-  <para>Using guard layers also helps overlap communication and
-  computation.  For distributed computation, each container's domain is
-  split into pieces distributed among the available processors.
-  Frequently, computing a container value is local, involving just the
-  value itself and a few neighbors.  Computing a value near the edge of
-  a processor's domain may require knowing a few values from a
-  neighboring domain.  Guard layers permit these values to be copied
-  locally so they need not be repeatedly communicated.</para>
- 
-  <para>&pooma; uses &pete; technology to ensure inner loops using
-  &pooma;'s object-oriented containers run as quickly as hand-coded
-  <!-- FIXME: Add a citation to Dr. Dobb's Journal article
-  pete-99. --> loops.  &pete; (the Portable Expression Template
-  Engine) uses expression-template technology to convert
-  data-parallel statements frequently found in the inner loops of
-  programs into efficient loops without any intermediate
-  computations.  For example, consider evaluating the <statement>A +=
-  -B + 2 * C;</statement> statement where <varname>A</varname> and
-  <varname>C</varname> are <type>vector<double></type>s and
-  <varname>B</varname> is a <type>vector<int></type>s.
-  Ordinary evaluation might introduce intermediaries for
-  <statement>-B</statement>, <statement>2*C</statement>, and their
-  sum.  The presence of these intermediaries in inner loops can
-  measurably slow evaluation.  To produce a loop without
-  intermediaries, &pete; stores each expression as a parse tree.  The
-  resulting parse trees can be combined into a larger parse tree.
-  Using its templates, the parse tree is converted, at compile time,
-  to an outer loop with contents corresponding to evaluating each
-  component of the result.  Thus, no intermediate values are computed
-  or stored.  For example, the code corresponding to <statement>A +=
-  -B + 2 * C;</statement> is 
-  <programlisting>
-  vector<double>::iterator iterA = A.begin();
-  vector<int>::const_iterator iterB = B.begin();
-  vector<double>::const_iterator iterC = C.begin();
-  while (iterA != A.end()) {
-    *iterA += -*iterB + 2 * *iterC;
-    ++iterA; ++iterB; ++iterC;
-  }
-  </programlisting>
-  Furthermore, since the code is available at compile-, not run-, time,
-  it can be further optimized, e.g., moving any loop-invariant code out
-  of the loop.</para>
- 
-  <formalpara><title>Used for Diverse Set of Scientific Problems.</title>
-  <para>&pooma; has been used to solve a wide variety of scientific
-  problems.  Most recently, physicists at Los Alamos National
-  Laboratory implemented an entire library of hydrodynamics codes as
-  part of the U.S. government's Science-based Stockpile Stewardship
-  (<acronym>SBSS</acronym>) program to simulate nuclear weapons.
-  Other applications include a matrix solver, an accelerator code
-  simulating the dynamics of high-intensity charged particle beams in
-  linear accelerators, and a Monte Carlo neutron transport
-  code.</para>
-  </formalpara>
- 
-  <formalpara><title>Easy Implementation.</title>
-  <para>&pooma;'s tools greatly reduce the time to implement
-  applications.  As we noted above, &pooma;'s containers and
-  expression syntax model the computational models and algorithms
-  most frequently found in scientific programs.  Using these
-  high-level tools which are known to be correct reduce the time
-  needed to debug programs.  Programmers can write and test programs
-  using their one or two-processor personal computers.  With no
-  additional work, the same program runs on computers with hundreds
-  of processors; the code is exactly the same, and the &toolkit;
-  automatically handles distribution of the data, all data
-  communication, and all synchronization.  Using all these tools
-  greatly reduces programming time.  For example, a team of two
-  physicists and two support people at Los Alamos National Laboratory
-  implemented a suite of hydrodynamics kernels in six months.  Their
-  work replaced the previous suite of less-powerful kernels which had
-  taken sixteen people several years to implement and debug.  Despite
-  not previously implementing any of the kernels, they averaged one
-  new kernel every three days, including the time to read the
-  corresponding scientific papers!</para>
-  </formalpara>

   <section id="introduction-pooma_history">
    <title>History of &pooma;</title>

!   <para>The &poomatoolkit; developed at Los Alamos National
    Laboratory to assist nuclear fusion and fission research.
!   In 1994, the &toolkit; grew out of the Object-Oriented
!   Particle Simulation (OOPS) class library developed for
!   particle-in-cell simulations.  The goals of the Framework, as it
!   was called at the time, were driven by the Numerical Tokamak's
!   <quote>Parallel Platform Paradox</quote>:
    <blockquote>
     <para>The average time required to implement a moderate-sized
     application on a parallel computer architecture is equivalent to
--- 69,335 ----
  <!-- FIXME: Add citation to pooma95, p. 3 -->
     </listitem>
    </orderedlist>
!   Below, we discuss how &pooma; achieves these goals.
!   </para>
! 
!   <bridgehead id="introduction-goals-portability" renderas="sect2">Code Portability for Sequential and Distributed Programs</bridgehead>
! 
!   <para>The same &pooma; programs run on sequential, distributed, and
!   parallel computers.  No change in source code is required.  Two or
!   three lines specify how each container's domain should be
!   distributed among available processors.  Using these directives and
!   run-time information about the computer's configuration, the
!   &toolkit; automatically distributes pieces of the container domains,
!   called <link
!   linkend="glossary-patch"><firstterm>patches</firstterm></link>,
!   among the available processors.  If a computation needs values from
!   another patch, &pooma; automatically passes the value to the patch
!   where it is needed.  The same program, and even the same executable,
!   works regardless of the number of available processors and the
!   size of the containers' domains.  A programmer interested in only
!   sequential execution can omit the two or three lines specifying how
!   the domains are to be distributed.</para>
! 
!   <bridgehead id="introduction-goals-rapid_development" renderas="sect2">Rapid Application Development</bridgehead>
! 
!   <para>The &poomatoolkit; is designed to enable rapid development of
!   scientific and distributed applications.  For example, its vector,
!   matrix, and tensor classes model the corresponding mathematical
!   concepts.  Its &array; and &field; classes model the discrete spaces
!   and mathematical arrays frequently found in computational science and
!   math.  See <xref linkend="introduction-science_algorithms"></xref>.
!   The left column indicates theoretical science and math concepts, the
!   middle column computational science and math concepts, and the right
!   column computer science implementations.  For example, theoretical
!   physics frequently uses continuous fields in three-dimensional space,
!   while algorithms for a corresponding computational physics problem
!   usually use discrete fields.  &pooma; containers, classes, and
!   functions ease engineering computer programs for these algorithms.
!   For example, the &pooma; &field; container models discrete fields;
!   both map locations in discrete space to values and permit
!   computations of spatial distances and values.  The &pooma; &array;
!   container models the mathematical concept of an array, used in
!   numerical analysis.</para>
! 
!   <figure float="1" id="introduction-science_algorithms">
!    <title>How &pooma; Fits Into the Scientific Process</title>
!    <mediaobject>
!     <imageobject>
!      <imagedata fileref="figures/introduction.101" format="EPS" align="center"></imagedata>
!     </imageobject>
!     <textobject>
!      <phrase>&pooma; helps translate algorithms into programs.</phrase>
!     </textobject>
!     <caption>
!      <para>In the translation from theoretical science and math to
!      computational science and math to computer programs, &pooma; eases
!      the implementation of algorithms as computer programs.</para>
!     </caption>
!    </mediaobject>
!   </figure>
! 
!   <para>&pooma; containers support a variety of computation modes,
!   easing translation of algorithms into code.  For example, many
!   algorithms for solving partial differential equations use
!   stencil-based computations.  &pooma; supports stencil-based
!   computations on &array;s and &field;s.  It also supports
!   data-parallel computation similar to Fortran 90 syntax.  For
!   computations where one &field;'s values are a function of several
!   other &field;s' values, the programmer can specify a relation.
!   Relations are lazily evaluated: whenever the dependent &field;'s
!   values are needed and it is dependent on a &field; whose values have
!   changed, its values are computed.  Lazy evaluation also assists
!   correctness by eliminating the frequently forgotten need for a
!   programmer to ensure a &field;'s values are up-to-date before being
!   used.</para>
! 
!   <bridgehead id="introduction-goals-efficient" renderas="sect2">Efficient Code</bridgehead>
! 
!   <para>&pooma; incorporates a variety of techniques to ensure it
!   produces code that executes as quickly as special-case,
!   hand-written code.
!  <!-- FIXME: Do I present execution numbers here? -->
!   These techniques include extensive use of templates, out-of-order
!   evaluation, use of guard layers, and production of fast inner loops.</para>
! 
!   <para>&pooma;'s use of &cc; templates permits the expressiveness
!   of pointers and function arguments but ensures as much work as
!   possible occurs at compile time, not run time.  This speeds
!   programs' execution.  Since more code is produced at compile time,
!   more code is available to the compiler's optimizer, further speeding
!   execution.  The &pooma; &array; container benefits from the use of
!   template parameters.  Their use permits the use of specialized data
!   storage classes called <link
!   linkend="glossary-engine"><firstterm>engines</firstterm></link>.  An
!   &array;'s engine template parameter specifies how data is stored and
!   indexed.  Some &array;s expect almost all values to be used, while
!   others might be mostly vacant.  In the latter case, using a
!   specialized engine storing the few nonzero values greatly reduces
!   space requirements.  Using engines also permits fast creation of
!   container views, known as <firstterm>array sections</firstterm> in
!   Fortran 90.  A view's engine is the same as the original
!   container's engine, but the view object maps its restricted domain to
!   the original domain.  Space requirements and execution time to use
!   views are minimal.  Using templates also permits containers to
!   support polymorphic indexing, e.g., indexing both by integers and by
!   three-dimensional coordinates.  A container defers indexing
!   operations to its engine's templatized index operator.  Since it uses
!   templates, the engine can define indexing functions with different
!   function arguments, without the need to add corresponding container
!   functions.  Some of these benefits can be obtained without
!   templates, but doing so increases execution time.  For
!   example, a container could have a pointer to an engine object, but
!   this requires a pointer dereference for each operation.  Implementing
!   polymorphic indexing without templates would require adding virtual
!   functions corresponding to each of the indexing functions.</para>
! 
!  <!-- FIXME: Are the claims concerning out-of-order evaluation I make true? -->
! 
!   <para>To ensure multiprocessor &pooma; programs execute quickly, it
!   is important that interprocessor communication overlaps with
!   intraprocessor computations as much as possible and that
!   communication is minimized.  Asynchronous communication, out-of-order
!   evaluation, and use of guard layers all help achieve these goals.
!   &pooma; uses the asynchronous communication facilities of the
!   &cheetah; communication library.  When a processor needs data that is
!   stored or computed by another processor, a message is sent between
!   the two.  If synchronous communication were used, the sender would
!   have to issue an explicit send and the recipient an explicit
!   receive, synchronizing the two processors.  &cheetah; permits the
!   sender to put and get data without synchronizing with the recipient
!   processor, and it also permits invoking functions at remote sites to
!   ensure desired data is up-to-date.  Thus, out-of-order evaluation
!   must be supported.  Out-of-order evaluation has another benefit:
!   only computations directly or indirectly related to values that are
!   printed need occur.</para>
! 
!   <para>Surrounding a patch with <link
!   linkend="glossary-guard_layer"><firstterm>guard
!   layers</firstterm></link> can help reduce interprocessor
!   communication.  For distributed computation, each container's domain
!   is split into pieces distributed among the available processors.
!   Frequently, computing a container value is local, involving just the
!   value itself and a few neighbors, but computing a value near the edge
!   of a processor's domain may require knowing a few values from a
!   neighboring domain.  Guard layers permit these values to be copied
!   locally so they need not be repeatedly communicated.</para>
! 
!   <para>&pooma; uses &pete; technology to ensure inner loops involving
!   &pooma;'s object-oriented containers run as quickly as hand-coded
!   <!-- FIXME: Add a citation to Dr. Dobb's Journal article pete-99. -->
!   loops.  &pete; (the Portable Expression Template Engine) uses
!   expression-template technology to convert data-parallel statements
!   in the inner loops of programs into efficient loops
!   without any intermediate computations.  For example, consider
!   evaluating the statement
!   <programlisting>
!   A += -B + 2 * C;</programlisting>
!   where <varname>A</varname> and <varname>C</varname> are
!   <type>vector<double></type>s and <varname>B</varname> is a
!   <type>vector<int></type>.  Naive evaluation might introduce
!   intermediaries for <statement>-B</statement>,
!   <statement>2*C</statement>, and their sum.  The presence of these
!   intermediaries in inner loops can measurably slow evaluation.  To
!   produce a loop without intermediaries, &pete; stores each expression
!   as a parse tree.  The resulting parse trees can be combined into a
!   larger parse tree.  Using its templates, the parse tree is converted,
!   at compile time, to a loop evaluating each component of the result.
!   Thus, no intermediate values are computed or stored.  For example,
!   the code corresponding to the statement above is
!   <programlisting>
!   vector<double>::iterator iterA = A.begin();
!   vector<int>::const_iterator iterB = B.begin();
!   vector<double>::const_iterator iterC = C.begin();
!   while (iterA != A.end()) {
!     *iterA += -*iterB + 2 * *iterC;
!     ++iterA; ++iterB; ++iterC;
!   }</programlisting>
!   Furthermore, since the code is available at compile, not run, time,
!   it can be further optimized, e.g., moving any loop-invariant code out
!   of the loop.</para>
! 
!   <bridgehead id="introduction-goals-scientific" renderas="sect2">Used for Diverse Set of Scientific Problems</bridgehead>
! 
!   <para>&pooma; has been used to solve a wide variety of scientific
!   problems.  Most recently, physicists at Los Alamos National
!   Laboratory implemented an entire library of hydrodynamics codes as
!   part of the U.S. government's science-based Stockpile Stewardship
!   Program to simulate nuclear weapons.  Other applications include a
!   matrix solver, an accelerator code simulating the dynamics of
!   high-intensity charged particle beams in linear accelerators, and a
!   Monte Carlo neutron transport code.</para>
! 
!   <bridgehead id="introduction-goals-easy_implementation" renderas="sect2">Easy Implementation</bridgehead>
! 
!   <para>&pooma;'s tools greatly reduce the time to implement
!   applications.  As we noted above, &pooma;'s containers and expression
!   syntax model the computational models and algorithms most frequently
!   found in scientific programs.  These high-level tools are known to be
!   correct and reduce the time to debug programs.  Since the same
!   programs run on one processor and multiple processors, programmers
!   can write and test programs using their one- or two-processor personal
!   computers.  With no additional work, the same program runs on
!   computers with hundreds of processors; the code is exactly the same,
!   and the &toolkit; automatically handles distribution of the data, all
!   data communication, and all synchronization.  The net result is a
!   significant reduction in programming time.  For example, a team of
!   two physicists and two support people at Los Alamos National
!   Laboratory implemented a suite of hydrodynamics kernels in six
!   months.  Their work replaced a previous suite of less-powerful
!   kernels which had taken sixteen people several years to implement and
!   debug.  Despite not having previously implemented any of the kernels,
!   they implemented one new kernel every three days, including the time
!   to read the corresponding scientific papers!</para>
!  </section><!-- introduction-goals -->
! 
! 
!  <section id="introduction-performance">
!   <title>&pooma; Produces Fast Programs</title>
! 
!   <para>almost as fast as &c;.  wide variety of configurations: one
!   processor, many processors, give performance data for at least two
!   different programs
! HERE</para>
! 
!   <para>describe &doof2d; here
! 
!   &doof2d; is a two-dimensional diffusion simulation program.
!   Initially, all values in the square two-dimensional grid are zero
!   except for the central value.  
! 
! HERE</para>
! 
!  </section>
! 
! <!-- HERE -->
! 
!  <section id="introduction-open_source">
!   <title>&pooma; is Free, Open-Source Software</title>
! 
!   <para>The &poomatoolkit; is open-source software.  Anyone may
!   download, read, redistribute, and modify the &pooma; source code.
!   If an application requires a specialized container, any programmer
!   may add it.  Any programmer can extend it to solve problems in
!   previously unsupported domains.  Companies using the &toolkit; can
!   read the source code to ensure it has no hidden back doors or
!   security holes.  It may be downloaded for free and used in
!   perpetuity.  There are no annual licenses and no on-going costs.  By
!   keeping their own copies, companies are guaranteed the software will
!   never disappear.  In summary, the &poomatoolkit; is free, low-risk
!   software.</para>
!  </section>

   <section id="introduction-pooma_history">
    <title>History of &pooma;</title>

!   <para>The &poomatoolkit; was developed at Los Alamos National
    Laboratory to assist nuclear fusion and fission research.
!   In 1994, the &toolkit; grew out of the <application
!   class='software'>Object-Oriented Particle Simulation</application>
!   class library developed for particle-in-cell simulations.  The goals
!   of the Framework, as it was called at the time, were driven by the
!   Numerical Tokamak's <quote>Parallel Platform Paradox</quote>:
    <blockquote>
     <para>The average time required to implement a moderate-sized
     application on a parallel computer architecture is equivalent to
***************
*** 298,304 ****
    </blockquote>
    The framework's goal of being able to quickly write efficient
    scientific code that could be run on a wide variety of platforms
!   remains unchanged today.  Development, driven mainly by the
    Advanced Computing Laboratory at Los Alamos, proceeded rapidly.
    A matrix solver application was written using the framework.
  <!-- FIXME: Add citation to pooma-sc95. -->
--- 337,343 ----
    </blockquote>
    The framework's goal of being able to quickly write efficient
    scientific code that could be run on a wide variety of platforms
!   remains unchanged today.  Development, mainly at the
    Advanced Computing Laboratory at Los Alamos, proceeded rapidly.
    A matrix solver application was written using the framework.
  <!-- FIXME: Add citation to pooma-sc95. -->
***************
*** 307,321 ****

    <para>By 1998, &pooma; was part of the U.S. Department of
    Energy's Accelerated Strategic Computing Initiative
!   (<acronym>ASCI</acronym>).  The Comprehensive Test Ban Treaty
!   forbid nuclear weapons testing so they were instead simulated.
!   <acronym>ASCI</acronym>'s goal was to radically advance the state
!   of the art in high-performance computing and numerical simulations
!   so the nuclear weapon simulations could use 100-teraflop
!   computers.  A linear accelerator code <application
    class='software'>linac</application> and a Monte Carlo neutron
!   transport code <application class='software'>MC++</application>
!   were written.
  <!-- FIXME: Add citation to pooma-siam98. -->
    </para>

--- 346,360 ----

    <para>By 1998, &pooma; was part of the U.S. Department of
    Energy's Accelerated Strategic Computing Initiative
!   (<acronym>ASCI</acronym>).  The Comprehensive Test Ban Treaty forbade
!   nuclear weapons testing, so the weapons were instead simulated using
!   computers.  <acronym>ASCI</acronym>'s goal was to radically advance
!   the state of the art in high-performance computing and numerical
!   simulations so the nuclear weapon simulations could use 100-teraflop
!   parallel computers.  A linear accelerator code <application
    class='software'>linac</application> and a Monte Carlo neutron
!   transport code <application class='software'>MC++</application> were
!   among the codes written.
  <!-- FIXME: Add citation to pooma-siam98. -->
    </para>

***************
*** 332,348 ****
    engines were added.  Release 2.1.0 included &field;s with
    their spatial extent and &dynamicarray;s with the ability to
    dynamically change its domain size.  Support for particles and
!   their interaction with &field;s was added.  The &pooma; messaging
    implementation was revised in release 2.3.0.  Use of the
    &cheetah; Library separated &pooma; from the actual messaging
!   library used.  Support for applications running on clusters of
!   computers was added.  During the past two years, the &field;
    abstraction and implementation was improved to increase its
    flexibility, add support for multiple values and materials in the
    same cell, and permit lazy evaluation.  Simultaneously, the
!   execution speed of the inner loops was greatly increased.  The
!   particle code has not yet been ported to the new &field;
!   abstraction.</para>
   </section>

  </chapter>
--- 371,389 ----
    engines were added.  Release 2.1.0 included &field;s with
    their spatial extent and &dynamicarray;s with the ability to
    dynamically change its domain size.  Support for particles and
!   their interaction with &field;s were added.  The &pooma; messaging
    implementation was revised in release 2.3.0.  Use of the
    &cheetah; Library separated &pooma; from the actual messaging
!   library used, and support for applications running on clusters of
!   computers was added.  <ulink
!   url="http://www.codesourcery.com">CodeSourcery, LLC</ulink>, and
!   <ulink url="http://www.proximation.com">Proximation, LLC</ulink>, took
!   over &pooma; development from Los Alamos National Laboratory.
!   During the past two years, the &field;
    abstraction and implementation was improved to increase its
    flexibility, add support for multiple values and materials in the
    same cell, and permit lazy evaluation.  Simultaneously, the
!   execution speed of the inner loops was greatly increased.</para>
   </section>

  </chapter>
Index: manual.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v
retrieving revision 1.4
diff -c -p -r1.4 manual.xml
*** manual.xml	2001/12/17 17:27:41	1.4
--- manual.xml	2002/01/04 17:14:10
***************
*** 26,31 ****
--- 26,33 ----
    <!-- Modify this to the desired formatting. -->
  <!ENTITY cheetah "<application class='software'>Cheetah</application>" >
    <!-- Produce a notation for the Cheetah Library.  -->
+ <!ENTITY closeclose "> >" >
+   <!-- Produce a notation for ">>", which frequently occurs with templates.  Without this, TeX produces a shift symbol. -->
  <!ENTITY dashdash "- -" >
    <!-- Produce a notation for a double dash.  Without this, TeX produces an en-hyphen. -->
  <!ENTITY doof2d "<command>Doof2d</command>" >
***************
*** 38,47 ****
    <!-- Produce a notation for the MM Library.  -->
  <!ENTITY mpi "<application class='software'>MPI</application>">
    <!-- Produce a notation for the MPI package.  -->
  <!ENTITY pdt "<application class='software'>PDToolkit</application>">
    <!-- Produce a notation for the PDT software package.  -->
  <!ENTITY pete "<application class='software'>PETE</application>">
!   <!-- Produce a notation for the PETE library.  -->
  <!ENTITY pooma "<application class='software'>POOMA</application>">
    <!-- Produce a notation for Pooma software.  -->
  <!ENTITY poomatoolkit "<application class='software'>POOMA &toolkitcap;</application>">
--- 40,51 ----
    <!-- Produce a notation for the MM Library.  -->
  <!ENTITY mpi "<application class='software'>MPI</application>">
    <!-- Produce a notation for the MPI package.  -->
+ <!ENTITY openopen "< <" >
+   <!-- Produce a notation for "<<", which frequently occurs with output.  Without this, TeX produces a shift symbol. -->
  <!ENTITY pdt "<application class='software'>PDToolkit</application>">
    <!-- Produce a notation for the PDT software package.  -->
  <!ENTITY pete "<application class='software'>PETE</application>">
!   <!-- Produce a notation for the PETE framework.  -->
  <!ENTITY pooma "<application class='software'>POOMA</application>">
    <!-- Produce a notation for Pooma software.  -->
  <!ENTITY poomatoolkit "<application class='software'>POOMA &toolkitcap;</application>">
***************
*** 87,92 ****
--- 91,98 ----
    <!-- The "Field" type. -->
  <!ENTITY inform "<type>Inform</type>">
    <!-- The "Inform" output type. -->
+ <!ENTITY int "<type>int</type>">
+   <!-- The C "int" type. -->
  <!ENTITY interval "<type>Interval</type>">
    <!-- The "Interval" type. -->
  <!ENTITY layout "<type>Layout</type>">
***************
*** 155,162 ****
--- 161,172 ----
    <!-- spelling: nonzero, not non-zero -->

  <!-- External Chapters -->
+ <!ENTITY bibliography-chapter SYSTEM "bibliography.xml">
+   <!-- bibliography -->
  <!ENTITY concepts-chapter SYSTEM "concepts.xml">
    <!-- Pooma concepts chapter -->
+ <!ENTITY data-parallel-chapter SYSTEM "data-parallel.xml">
+   <!-- data-parallel expressions chapter -->
  <!ENTITY glossary-chapter SYSTEM "glossary.xml">
    <!-- glossary -->
  <!ENTITY introductory-chapter SYSTEM "introduction.xml">
***************
*** 183,189 ****

  <!-- Sequential Programs -->
  <!ENTITY initialize-finalize SYSTEM "./programs/examples/Sequential/initialize-finalize-annotated.cpp">
!   <!-- illustrate initialize() and finalize() -->
  ]>

  <book>
--- 193,205 ----

<!ENTITY initialize-finalize SYSTEM "./programs/examples/Sequential/initialize-finalize-annotated.cpp">
!   
! 
! 
! <!ENTITY pairs-untemplated SYSTEM "./programs/examples/Templates/pairs-untemplated-annotated.cpp">
!   
! <!ENTITY pairs-templated SYSTEM "./programs/examples/Templates/pairs-templated-annotated.cpp">
!   
  ]>

  <book>
***************
*** 205,211 ****
    <revhistory>
     <revision>
      <revnumber>0.01</revnumber>
!     <date>2001 Nov 26</date>
      <authorinitials>jdo</authorinitials>
      <revremark>first draft</revremark>
     </revision>
--- 221,227 ----
    <revhistory>
     <revision>
      <revnumber>0.01</revnumber>
!     <date>2001 Dec 18</date>
      <authorinitials>jdo</authorinitials>
      <revremark>first draft</revremark>
     </revision>
***************
*** 280,292 ****
    <title>Programming with &pooma;</title>

!   &introductory-chapter;

    &tutorial-chapter;

    &concepts-chapter;

    <chapter id="sequential">
     <title>Writing Sequential Programs</title>
--- 296,1819 ----
    <title>Programming with &pooma;</title>

  <!-- FIXME: Add a partintro to the part above? -->
+ 
+   &introductory-chapter; 
+ 
+ 
+   <chapter id="template_programming">
+    <title>Programming with Templates</title>
+ 
+    <para>&pooma; extensively uses &cc; templates to support type
+    polymorphism without any run-time cost.  In this chapter, we
+    briefly introduce using templates in &cc; programs by relating them
+    to <quote>ordinary</quote> &cc; constructs such as values, objects,
+    and classes.  The two main concepts underlying &cc; templates will
+    occur repeatedly:
+    <itemizedlist>
+     <listitem>
+      <para>Template programming occurs at compile time, not run
+      time.  That is, template operations occur inside the compiler,
+      not when a program runs.</para>
+     </listitem>
+     <listitem>
+      <para>Templates permit declaring families of classes with a
+      single declaration.  For example, the &array; template
+      declaration permits using arrays with many different element
+      types, e.g., arrays of integers, arrays of floating point
+      numbers, and arrays of arrays.</para>
+     </listitem>
+    </itemizedlist>
+    For those interested in the implementation of &pooma;, we close
+    with a discussion of some template programming concepts used in the
+    implementation but not likely to be used by &pooma; users.</para>
+ 
+    <section id="template_programming-compile_time">
+     <title>Templates Occur at Compile Time</title>
+ 
+     <para>&pooma; uses templates to support type polymorphism without
+     incurring any run-time cost as a program executes.  All template
+     operations are performed at compile time by the compiler.</para>
+ 
+     <para>Prior to the introduction of templates, almost all of a
+     program's interesting computation occurred when it was executed.
+     When writing the program, the programmer, at <glossterm
+     linkend="glossary-programming_time"><firstterm>programming
+     time</firstterm></glossterm>, would specify which statements and
+     expressions would occur and which types to use.  At <glossterm
+     linkend="glossary-compile_time"><firstterm>compile
+     time</firstterm></glossterm>, the compiler converts the program's
+     source code into an executable program.  Even though the compiler
+     uses the types to produce the executable, no interesting
+     computation occurs.  At <glossterm
+     linkend="glossary-run_time"><firstterm>run
+     time</firstterm></glossterm>, the resulting executable program
+     actually performs the operations.</para>
+ 
+     <para>The introduction of templates permits interesting
+     computation to occur while the compiler produces the executable.
+     Most interesting is template instantiation, which produces a type
+     at compile time.  For example, the &array; <quote>type</quote>
+     definition requires template parameters <varname>Dim</varname>,
+     <varname>T</varname>, and <varname>EngineTag</varname>, specifying
+     its dimension, the type of its elements, and its engine type.  To
+     use this, a programmer specifies values for the template
+     parameters:
+     <statement><type>Array<2,double,Brick></type></statement>.
+     At compile time, the compiler creates a type definition by
+     substituting the values for the template parameters in the
+     template definition.  The substitution is analogous to the
+     run-time application of a function to specific values.</para>
+ 
+     <para>All computation not involving run-time input or output can
+     occur at programming time, compile time, or run time, whichever is
+     most convenient.  At programming time, a programmer can perform
+     computations by hand rather than writing code to compute them.  &cc;
+     templates are Turing-complete so they can compute anything computable.
+     Unfortunately, syntax for compile-time computation is more
+     difficult than for run-time computation, and current compilers
+     are not as efficient as executables.  Run-time &cc; constructs are also
+     Turing-complete so using templates is never required.  Thus, we shift
+     computation to the time which best trades off the ease of
+     expressing syntax with the speed of computation by programmer,
+     compiler, or computer chip.  For example, &pooma; uses expression
+     template technology to speed run-time execution of data-parallel
+     statements.  The &pooma; developers decided to shift some of the
+     computation from run time to compile time using template
+     computations.  The resulting run-time code runs more quickly, but
+     compiling the code takes longer.  Also, programming time for the
+     &pooma; developers increased significantly, but, since most users
+     are most concerned about decreasing run times, they made this
+     choice.</para>
+ 
+    </section>
+ 
+ 
+    <section id="template_programming-template_use">
+     <title>Template Programming for &pooma; Users</title>
+ 
+     <para>Most &pooma; users need only understand a subset of
+     available tools for template programming.  These tools include
+     <itemizedlist>
+       <listitem>
+        <para>reading template declarations and understanding template
+        parameters, which are used in this book.</para>
+       </listitem>
+      <listitem>
+       <para>template instantiation, specifying a particular type by
+       specifying values for template parameters.</para>
+      </listitem>
+       <listitem>
+        <para>nested type names, which are types specified within a
+        class definition.</para>
+       </listitem>
+     </itemizedlist>
+     We discuss these below.</para>

!     <example id="template_programming-template_use-untemplated_pair_example">
!      <title>Classes Storing Pairs of Values</title>
! &pairs-untemplated;
!     </example>
! 
!     <para>Templates generalize writing class declarations by
!     permitting class declarations dependent on other types.  For
!     example, consider writing a class storing a pair of integers and a
!     class storing a pair of doubles.  See <xref
!     linkend="template_programming-template_use-untemplated_pair_example"></xref>.
!     Almost all of the code for the two definitions is the same.  Both
!     definitions define a class that has a constructor and stores
!     two values named <varname>left</varname> and
!     <varname>right</varname> having the same type.  Only the classes'
!     names and their uses of types differ.</para>
! 
!     <example id="template_programming-template_use-templated_pair_example">
!      <title>Templated Class Storing Pairs of Values</title>
! &pairs-templated;
!      <calloutlist>
!       <callout
!        arearefs="template_programming-template_use-templated_pair_program-template_declaration">
!        <para>Template parameters are written before, not after, a
!        class name.</para>
!       </callout>
!       <callout
!        arearefs="template_programming-template_use-templated_pair_program-constructor">
!        <para>The constructor has two parameters with the type <varname>T</varname>.</para>
!       </callout>
!       <callout
!        arearefs="template_programming-template_use-templated_pair_program-members">
!        <para>An object stores two values having type <varname>T</varname>.</para>
!       </callout>
!       <callout
!        arearefs="template_programming-template_use-templated_pair_program-use">
!        <para>To use a templated class, specify the template
!        parameter's argument after the class's name and surrounded by
!        angle brackets (<statement><></statement>).</para>
!       </callout>
!      </calloutlist>
!     </example>
! 
!     <para>Using templates, we can use a template parameter to
!     represent the differing types and write one templated
!     class definition.  See <xref
!     linkend="template_programming-template_use-templated_pair_example"></xref>.
!     The templated class definition is a copy of the common portions of
!     the two preceding definitions.  Because the two definitions differ
!     only in their use of the &int; and &double; types, we replace
!     these concrete types with a template
!     parameter <varname>T</varname>.  We
!     <emphasis>precede</emphasis>, not follow, the class definition
!     with <statement>template <typename T></statement>.  The
!     constructor's parameters' types are changed
!     to <varname>T</varname> as are the data members'
!     types.</para>
! 
!     <para>To use a template class definition, template arguments
!     follow the class name surrounded by angle
!     brackets (<statement><></statement>).  For example,
!     <type>pair<int></type> <glossterm
!     linkend="glossary-template_instantiation"><firstterm>instantiates</firstterm></glossterm>
!     the <classname>pair</classname> template class definition with
!     <varname>T</varname> equal to &int;.  That is, the compiler
!     creates a definition for <type>pair<int></type> by copying
!     <classname>pair</classname>'s template definition and substituting
!     &int; for each occurrence of <varname>T</varname>.  The copy
!     omits the template parameter declaration <statement>template
!     <typename T></statement> at the beginning of its definition.
!     The result is a definition exactly the same as
!     <classname>pairOfInts</classname>.</para>

+     <table frame="none" colsep="0" rowsep="0" tocentry="1"
+      orient="port" pgwide="0"
+      id="template_programming-template_use-correspondence_table">
+      <title>Correspondences Between Run-Time and Compile-Time
+      Programming Constructs</title>
+      
+      <tgroup cols="3" align="left">
+       <thead>
+        <row>
+ 	<entry></entry>
+ 	<entry>run time</entry>
+ 	<entry>compile time</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+ 	<entry>values</entry>
+ 	<entry>integers, strings, objects, functions, …</entry>
+ 	<entry>types, …</entry>
+        </row>
+        <row>
+ 	<entry>create a value to store multiple values</entry>
+ 	<entry>object creation</entry>
+ 	<entry>class definition</entry>
+        </row>
+        <row>
+ 	<entry>values stored in a collection</entry>
+ 	<entry>data member, member function</entry>
+ 	<entry>nested type name, nested class, static member function,
+ 	constant integral values</entry>
+        </row>
+        <row>
+ 	<entry>placeholder for <quote>any particular value</quote></entry>
+ 	<entry>variable, e.g., <quote>any int</quote></entry>
+ 	<entry>template parameter, e.g., <quote>any type</quote></entry>
+        </row>
+        <row>
+ 	<entry>repeated operations</entry>
+ 	<entry>A function generalizes a particular operation applied to
+ 	different values.  The function parameters are placeholders
+ 	for particular values.</entry>
+ 	<entry>A template class generalizes a particular class
+ 	definition using different types.  The template parameters are
+ 	placeholders for particular types.</entry>
+        </row>
+        <row>
+ 	<entry>application</entry>
+ 	<entry>Use a function by appending function arguments
+ 	surrounded by parentheses.</entry>
+ 	<entry>Use a template class by appending template arguments
+ 	surrounded by angle brackets (<>).</entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+ 
+     <para>As we mentioned above, template instantiation is analogous
+     to function application.  A template class is analogous to a
+     function.  The analogy between compile-time and run-time
+     programming constructs can be extended.  At run time, values used
+     consist of things such as integers, floating point numbers,
+     pointers, functions, and objects.  Programs compute by operating
+     on these values at run time.  At compile time, the values used
+     include types.  Compile-time operations use these types.  &cc;
+     defines default sets of values that all conforming compilers must
+     support.  Object creation extends the set of run-time values,
+     while a class definition extends the set of compile-time types.</para>
+ 
+     <para>Functions generalize similar run-time operations, while
+     template classes generalize similar class definitions.  A function
+     definition generalizes similar run-time operations.  For
+     example, consider repeatedly printing the larger of two numbers:
+ <programlisting>
+ std::cout << (3 > 4 ? 3 : 4) << std::endl;
+ std::cout << (4 > -13 ? 4 : -13) << std::endl;
+ std::cout << (23 > 4 ? 23 : 4) << std::endl;
+ std::cout << (0 > 3 ? 0 : 3) << std::endl;
+ </programlisting>  Each statement is exactly the same except for its
+ two values.  Thus, we can generalize these statements by writing a function.
+ <programlisting>
+ void maxOut(int a, int b)
+ { std::cout &openopen; (a > b ? a : b) &openopen; std::endl; }
+ </programlisting>  The function's body consists of a statement with
+ variables substituted for the two particular values.  Each parameter
+ is a placeholder that, when used, holds one particular value among the
+ set of possible integral values.  The function must be named to permit
+ its use, and declarations for its two parameters follow.  Using the
+ function simplifies the code:
+ <programlisting>
+ maxOut(3, 4);
+ maxOut(4, -13);
+ maxOut(23, 4);
+ maxOut(0, 3);
+ </programlisting>  To use a function, the function's name precedes
+     parentheses surrounding specific values for its parameters.  The
+     function's return value does not appear.</para>
+ 
+     <para>A template class definition generalizes similar class
+     definitions.  If two class definitions differ only in a few types,
+     template parameters can be substituted.  Each parameter is a
+     placeholder that, when used, holds one particular value, i.e.,
+     type, among the set of possible values.  The class definition is
+     named to permit its use, and declarations for its parameters
+     precede it.  The example found in the previous section illustrates
+     this transformation.  Compare the original, untemplated classes in
+     <xref
+     linkend="template_programming-template_use-untemplated_pair_example"></xref>
+     with the templated class in <xref
+     linkend="template_programming-template_use-templated_pair_example"></xref>.
+     Note the notation for the template class parameters.
+     <statement>template <typename T></statement>
+     <emphasis>precedes</emphasis> the class definition.  The keyword
+     <keywordname>typename</keywordname> indicates the template
+     parameter is a type.  <varname>T</varname> is the template
+     parameter's name.  Note that using
+     <keywordname>class</keywordname> is equivalent to using
+     <keywordname>typename</keywordname> so <statement>template
+     <class T></statement> is equivalent to <statement>template
+     <typename T></statement>.  Using a templated class requires
+     postfix, not prefix, notation.  The class's name precedes angle
+     brackets (<>) surrounding specific values (types) for
+     its parameters.  As we showed above,
+     <statement>pair<int></statement> <glossterm
+     linkend="glossary-template_instantiation">instantiates</glossterm>
+     the template class <classname>pair</classname> with &int; for its
+     type parameter <varname>T</varname>.</para>
+ 
+     <para>In template programming, nested type names store
+     compile-time data that can be used within template classes.  Since
+     compile-time class definitions are analogous to run-time objects
+     and the latter store named values, nested type names are values,
+     i.e., types, stored within class definitions.  For example, the
+     template class &array; has a nested type name for the type of its
+     domain:
+ <programlisting>
+ typedef typename Engine_t::Domain_t Domain_t;
+ </programlisting> This <keywordname>typedef</keywordname>, i.e., type
+     definition, defines the type <type>Domain_t</type> as equivalent
+     to <type>Engine_t::Domain_t</type>.  The <statement>::</statement>
+     operator selects the <type>Domain_t</type> nested type from inside
+     the <type>Engine_t</type> type.  This illustrates how to access
+     &array;'s <type>Domain_t</type> when not within &array;'s scope:
+     <type>Array<Dim, T, EngineTag>::Domain_t</type>.  The
+     analogy between object members and nested type names suggests
+     their usefulness.  Just as run-time object members store information
+     for later use, nested type names store type information for later
+     use at compile time.  Using nested type names has no impact on the
+     speed of executing programs.</para>
+    </section>
+ 
+ 
+    <section id="template_programming-pooma_implementation">
+     <title>Template Programming Used to Write &pooma;</title>
+ 
+     <para>The preceding section presented template programming tools
+     needed to read this &book; and write programs using the
+     &poomatoolkit;.  In this section, we present template programming
+     techniques used to implement &pooma;.  We extend the
+     correspondence between compile-time template programming
+     constructs and run-time constructs.  Reading this section is not
+     necessary unless you wish to understand how &pooma; works.</para>
+ 
+     <table frame="none" colsep="0" rowsep="0" tocentry="1"
+      orient="port" pgwide="0"
+      id="template_programming-pooma_implementation-correspondence_table">
+      <title>More Correspondences Between Compile-Time and Run-Time
+      Programming Constructs</title>
+      
+      <tgroup cols="3" align="left">
+       <thead>
+        <row>
+ 	<entry></entry>
+ 	<entry>run time</entry>
+ 	<entry>compile time</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+ 	<entry>values</entry>
+ 	<entry>integers, strings, objects, functions, …</entry>
+ 	<entry>types, constant integers and enumerations, …</entry>
+        </row>
+        <row>
+ 	<entry>control flow to choose among operations</entry>
+ 	<entry><keywordname>if</keywordname>, <keywordname>while</keywordname>, <keywordname>goto</keywordname>, …</entry>
+ 	<entry>template class specializations with pattern matching</entry>
+        </row>
+        <row>
+ 	<entry>values stored in a collection</entry>
+ 	<entry>An object stores values.</entry>
+ 	<entry>A <glossterm linkend="glossary-traits_class">traits
+ 	class</glossterm> contains values describing a type.</entry>
+        </row>
+        <row>
+ 	<entry></entry>
+ 	<entry></entry>
+ 	<entry></entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+ 
+ 
+     <para>
+ 
+ HERE</para>
+    </section>
+ 
+ <!-- HERE -->
+ 
+   </chapter>
+ 
+ 
    &tutorial-chapter;

    &concepts-chapter;

+   <!-- FIXME: Revert to &data-parallel-chapter; -->
+ 
+   <chapter id="data_parallel">
+    <title>Data-Parallel Expressions</title>
+ 
+    <para>In the previous sections, we accessed container values one at
+    a time.  Accessing more than one value in a container required
+    writing an explicit loop.  Scientists and engineers commonly
+    operate on sets of values, treated as an aggregate.  For example, a
+    vector is a one-dimensional collection of data and two vectors can be
+    added together.  A matrix is a two-dimensional collection of data,
+    and a scalar and a matrix can be multiplied.  A <glossterm
+    linkend="glossary-data_parallel"><firstterm>data-parallel
+    expression</firstterm></glossterm> simultaneously uses multiple
+    container values.  &pooma; supports data-parallel syntax.</para>
+ 
+    <para>After introducing data-parallel expressions and statements,
+    we present the corresponding &pooma; syntax.  Then we present its
+    implementation, which uses expression-template technology.  A naive
+    data-parallel implementation might generate temporary variables,
+    cluttering a program's inner loops and slowing its execution.
+    Instead, &pooma; uses &pete;, the Portable Expression Template
+    Engine.  Using expression templates, it constructs a parse tree of
+    expressions and corresponding types, which is then quickly
+    evaluated without the need for temporary variables.</para>
+ 
+ 
+    <section id="data_parallel-multiple_values">
+     <title>Expressions with More Than One Container Value</title>
+ 
+     <para>Science and mathematics are filled with aggregated values.  A vector
+     contains several components, and a matrix is a two-dimensional
+     object.  Operations on individual values are frequently extended
+     to operations on these aggregated values.  For example, two
+     vectors having the same length are added by adding corresponding
+     components.  The product of two matrices is defined in terms of
+     sums and products of their components.  The sine of an array is an
+     array containing the sine of every value in it.</para>
+ 
+     <para>A <glossterm
+     linkend="glossary-data_parallel"><firstterm>data-parallel
+     expression</firstterm></glossterm> simultaneously refers to
+     multiple container values.  Data-parallel statements, i.e.,
+     statements using data-parallel expressions, frequently occur in
+     scientific programs.  For example, the sum of two vectors v and w
+     is written as v+w.  Algorithms frequently use data-parallel
+     syntax.  Consider, for example, computing the total energy E
+     as the sum of kinetic energy K and potential energy U.
+     For a simple particle subject to the earth's gravity, the kinetic
+     energy K equals mv<superscript>2</superscript>/2, and the
+     potential energy U equals mgh.  These formulae apply to both
+     an individual particle with a particular mass m and
+     height h and to an entire field of particles with
+     masses m and heights h.  Our algorithm works with
+     data-parallel syntax, and we would like to write the corresponding
+     computer program using data-parallel syntax as well.</para>
+    </section>
+ 
+ 
+    <section id="data_parallel-use">
+     <title>Their Use</title>
+ 
+     <para>&pooma; containers can be used in data-parallel expressions
+     and statements.  The basic guidelines are simple:
+     <itemizedlist>
+      <listitem>
+        <para>The &cc; built-in and mathematical operators operate on
+        an entire container by operating element-wise on its values.</para>
+      </listitem>
+      <listitem>
+       <para>Binary operators operate only on containers with the same
+       domain types by combining values with the same indices.  If the
+       result is a container, it has a domain equal to the left operand's
+       domain.</para>
+      </listitem>
+      <listitem>
+       <para>For assignment operators, the domains of the left
+       operand and the right operand must have the same type and
+       be conformable, i.e., have the <quote>same shape</quote>.</para>
+      </listitem>
+     </itemizedlist>
+     </para>
+ 
+     <para>The operators operate element-wise on containers' values.
+     For example, if <varname>A</varname> is a one-dimensional array,
+     <statement>-<varname>A</varname></statement> is a one-dimensional
+     array with the same size such that the value at the
+     i<superscript>th</superscript> position equals -A(i).  If
+     <varname>A</varname> and <varname>B</varname> are two-dimensional
+     &array;s on the same domain,
+     <statement><varname>A</varname>+<varname>B</varname></statement>
+     is an array on the same domain with values equaling the sum of
+     corresponding values in <varname>A</varname> and
+     <varname>B</varname>.</para>
+ 
+     <figure float="1" id="data_parallel-use-addition_example">
+      <title>Adding &array;s with Different Domains</title>
+      <mediaobject>
+       <imageobject>
+        <imagedata fileref="figures/data-parallel.212" format="EPS" align="center"></imagedata>
+       </imageobject>
+       <textobject>
+        <phrase>Adding two arrays with different domains adds values
+        with the same indices.</phrase>
+       </textobject>
+       <caption>
+        <para>Adding &array;s with different domains is supported.
+        Solid lines indicate the domains' extent.  Values with the same
+        indices are added.</para>
+       </caption>
+      </mediaobject>
+     </figure>
+ 
+     <para>Binary operators operate on containers with the same domain
+     types.  The domains' indices need not be the same, but the result
+     will have a domain equal to the left operand's domain.  For example, the
+     sum of an &array; <varname>A</varname> with a one-dimensional
+     interval [0,3] and an &array; <varname>B</varname> with
+     a one-dimensional interval [1,2] is well-defined because both
+     domains are one-dimensional intervals.  The result is an &array;
+     with a one-dimensional interval [0,3].  Its first and last
+     entries equal <varname>A</varname>'s first and last entries, while
+     its middle two entries are the sums
+     <statement>A(1)+B(1)</statement> and
+     <statement>A(2)+B(2)</statement>.  We assume zero is the
+     default value for the type of values stored
+     in <varname>B</varname>.  A more complicated example of
+     adding two &array;s with different domains is illustrated in <xref
+     linkend="data_parallel-use-addition_example"></xref>.  Code for
+     these &array;s could be
+ <programlisting>
+ Interval<1> H(0,2), I(1,3), J(2,4);
+ Array<2, double, Brick> A(I,I), B(J,H);
+ // ... fill A and B with values ...
+ ... = A + B;
+ </programlisting>Both <varname>A</varname> and
+     <varname>B</varname> have domains of two-dimensional intervals so
+     they may be added, but their domains' extents differ, as indicated
+     by the solid lines in the figure.  The sum has domain equal to the
+     left operand's domain.  Values with the same indices are added.  For
+     example, <statement>A(2,2)</statement> and
+     <statement>B(2,2)</statement> are added.  <varname>B</varname>'s
+     domain does not include index (1,1) so, when adding
+     <statement>A(1,1)</statement> and <statement>B(1,1)</statement>,
+     the default value for <varname>B</varname>'s value type is used.
+     Usually this is 0.  Thus, <statement>A(1,1) +
+     B(1,1)</statement> equals <statement>9 + 0</statement>.</para>
+ 
+     <para>Operations with &array;s and scalar values are supported.
+     Conceptually, a scalar value can be thought of as an &array; with
+     any desired domain and having the same value everywhere.  For
+     example, consider
+ <programlisting>
+ Array<1, double, Brick> D(Interval<1>(7,10));
+ D += 2*D + 7;
+ </programlisting><statement>2*D</statement> obeys the guidelines
+     because the scalar <statement>2</statement> can be thought of as
+     an array with the same domain as <varname>D</varname>.  It has the
+     same value <statement>2</statement> everywhere.  Likewise the
+     conceptual domain for the scalar <statement>7</statement> is the
+     same as <statement>2*D</statement>'s domain.  Thus,
+     <statement>2*D(i) + 7</statement> is added to
+     <statement>D(i)</statement> wherever index i is in
+     <varname>D</varname>'s domain.  In practice, the &toolkit; does
+     not first convert scalar values to arrays but instead uses them
+     directly in expressions.</para>
+ 
+     <para>Assignment to containers is also supported.  The domain
+     types of the assignment's left-hand side and its right-hand side
+     must be the same.  Their indices need not be the same, but they
+     must correspond.  That is, the domains must be <glossterm
+     linkend="glossary-conformable_domains"><firstterm>conformable
+     domains</firstterm></glossterm>, or have the <quote>same
+     shape</quote>, i.e., have the same number of indices for each
+     dimension.  For example, the one-dimensional interval [0,3] is
+     conformable to the one-dimensional interval [1,4] because they
+     both have the same number of indices in each dimension.  The
+     domains of <varname>A</varname> and <varname>B</varname>, as
+     declared
+ <programlisting>
+ Interval<1> H(0,2), I(1,3), J(2,4), K(0,4);
+ Array<2, double, Brick> A(I,I), B(H,J), C(I,K);
+ </programlisting> are conformable because each dimension has the same
+     number of indices.  <varname>A</varname> and <varname>C</varname>
+     are not conformable because, while their first dimensions are
+     conformable, their second dimensions are not conformable.  One has
+     three indices while the other has five.  We define <glossterm
+     linkend="glossary-conformable_containers"><firstterm>conformable
+     containers</firstterm></glossterm> to be containers with
+     conformable domains.</para>
+ 
+     <para>When assigning to a container, corresponding container
+     values are assigned.  (Since the left-hand side and the right-hand
+     side are conformable, corresponding values exist.)  In this code
+     fragment,
+ <programlisting>
+ Array<1, double, Brick> A(Interval<1>(0,1));
+ Array<1, double, Brick> B(Interval<1>(1,2));
+ A = B;
+ </programlisting> <statement>A(0)</statement> is assigned
+     <statement>B(1)</statement> and <statement>A(1)</statement> is
+     assigned <statement>B(2)</statement>.</para>
+ 
+     <para>Assigning a scalar value to an &array; also is supported,
+     but assigning an &array; to a scalar is not.  A scalar value is
+     conformable to any domain because, conceptually, it can be viewed
+     as an &array; with any desired domain and having the same value
+     everywhere.  Thus, the assignment <statement>B = 3</statement>
+     ensures every value in <varname>B</varname> equals 3.  Even
+     though a scalar value is conformable to any &array;, it is not an
+     l-value so it cannot appear on the left-hand side of an
+     assignment.</para>
+ 
+     <para>Data-parallel expressions can involve typical mathematical
+     functions and output operations.  For example,
+     <statement>sin(A)</statement> yields an &array; with values equal
+     to the sine of each of &array; <varname>A</varname>'s values.
+     <statement>dot(A,B)</statement> has values equaling the dot
+     product of corresponding values in &array;s <varname>A</varname>
+     and <varname>B</varname>.  The contents of an entire &array; can
+     be easily printed to standard output.  For example, the program
+ <programlisting>
+ Array<1, double, Brick> A(Interval<1>(0,2));
+ Array<1, double, Brick> B(Interval<1>(1,3));
+ A = 1.0;
+ B = 2.0;
+ std::cout << A-B << std::endl;
+ </programlisting> yields
+     <computeroutput>
+     (000:002:001) = 1 -1 -1</computeroutput>.  The initial
+     <computeroutput>(000:002:001)</computeroutput> indicates the
+     &array;'s domain ranges from 0 to 2 with a stride of 1.  The
+     three values in <statement>A-B</statement> follow.</para>
+ 
+     <para>So far, all of the above examples illustrating data-parallel
+     expressions and statements operate on all of a container's values.
+     Frequently, operating on a subset is useful.  In &pooma;, a subset
+     of a container's values is called a view.  Combining views and
+     data-parallel expressions will enable us to more succinctly and more
+     easily write the diffusion program.  Views are discussed in the
+     next chapter.</para>
+ 
+ <!-- HERE -->
+ 
+     <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ 	   orient="port" pgwide="0">
+      <title>Operators Permissible for Data-Parallel Expressions</title>
+      
+      <tgroup cols="2" align="left">
+       <thead>
+        <row>
+ 	<entry></entry>
+ 	<entry>supported operators</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+ 	<entry>unary operators</entry>
+ 	<entry>+, -, ~, !
+ HERE</entry>
+        </row>
+        <row>
+ 	<entry>binary operators</entry>
+ 	<entry>+, -, *, /, %, &, |, ^
+ HERE</entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+ 
+     <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ 	   orient="port" pgwide="0">
+       <title>Mathematical Operators Permissible for Data-Parallel Expressions</title>
+       
+       <tgroup cols="2" align="left">
+        <thead>
+ 	<row>
+ 	 <entry>function</entry>
+ 	 <entry>effect</entry>
+ 	</row>
+        </thead>
+       <tfoot>
+        <row>
+ 	<entry>Every effort has been made to present accurate
+         information, but restrictions caused by the underlying
+         functions may further restrict the data-parallel
+         functions.</entry>
+        </row>
+       </tfoot>
+        <tbody>
+        <row>
+ 	<entry><statement>Array<T1> peteCast (const T1&, const Array<T>& A)</statement></entry>
+ 	<entry>Returns the casting of the array's values to type <type>T1</type>.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> ldexp (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	<entry>Multiplies <varname>A</varname>'s values by the
+ 	corresponding integral power of two in <varname>B</varname>.</entry>
+        </row>
+ <!-- HERE Reorder the above to be more sensible and add headings. -->
+        <row rowsep="1">
+ 	<entry>Trigonometric and Hyperbolic Operators</entry>
+ 	<entry><statement>#include <math.h></statement></entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> cos (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the cosines of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> sin (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the sines of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> tan (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the tangents of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> acos (const Array<T1>& A)</statement></entry>
+ 	<entry>Returns the arc cosines of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> asin (const Array<T1>& A)</statement></entry>
+ 	<entry>Returns the arc sines of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> atan (const Array<T1>& A)</statement></entry>
+ 	<entry>Returns the arc tangents of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> cosh (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the hyperbolic cosines of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> sinh (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the hyperbolic sines of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> tanh (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the hyperbolic tangents of the array's values.</entry>
+        </row>
+        <row rowsep="1">
+ 	<entry>Absolute Value and Rounding Operators</entry>
+ 	<entry><statement>#include <math.h></statement></entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> fabs (const Array<T1>& A)</statement></entry>
+ 	<entry>Returns the absolute values of the floating point
+ 	numbers in the array.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> ceil (const Array<T1>& A)</statement></entry>
+ 	<entry>For each of the array's values, returns the smallest integer
+ 	greater than or equal to it (as a floating point number).</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> floor (const Array<T1>& A)</statement></entry>
+ 	<entry>For each of the array's values, returns the largest integer
+ 	less than or equal to it (as a floating point number).</entry>
+        </row>
+        <row rowsep="1">
+ 	<entry>Powers, Exponentiation, and Logarithmic Operators</entry>
+ 	<entry><statement>#include <math.h></statement></entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> PETE_identity (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the array.  That is, it applies the identity operation.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> sqrt (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the square roots of the array's values.</entry>
+        </row>
+ 	<row>
+ 	 <entry><statement>Array<T> pow2 (const Array<T>& A)</statement></entry>
+ 	 <entry>Returns the squares of <varname>A</varname>'s values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> pow3 (const Array<T>& A)</statement></entry>
+ 	 <entry>Returns the cubes of <varname>A</varname>'s values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> pow4 (const Array<T>& A)</statement></entry>
+ 	 <entry>Returns the fourth powers of <varname>A</varname>'s values.</entry>
+ 	</row>
+        <row>
+ 	<entry><statement>Array<T> exp (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the exponentiations of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> log (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the natural logarithms of the array's values.</entry>
+        </row>
+        <row>
+ 	<entry><statement>Array<T> log10 (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the base-10 logarithms of the array's values.</entry>
+        </row>
+        <row rowsep="1">
+ 	<entry>Operators Involving Complex Numbers</entry>
+ 	<entry><statement>#include <complex></statement></entry>
+        </row>
+        <row>
+ 	 <entry><statement>Array<T> real (const Array<complex<T&closeclose;& A)</statement></entry>
+ 	 <entry>Returns the real parts of <varname>A</varname>'s complex numbers.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> imag (const Array<complex<T&closeclose;& A)</statement></entry>
+ 	 <entry>Returns the imaginary parts of <varname>A</varname>'s complex numbers.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> abs (const Array<complex<T&closeclose;& A)</statement></entry>
+ 	<entry>Returns the absolute values (magnitudes) of
+ 	<varname>A</varname>'s complex numbers.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> abs (const Array<T>& A)</statement></entry>
+ 	<entry>Returns the absolute values of <varname>A</varname>'s values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> arg (const Array<complex<T&closeclose;& A)</statement></entry>
+ 	 <entry>Returns the angle representations (in radians) of the
+ 	 polar representations of <varname>A</varname>'s complex
+ 	 numbers.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> norm (const Array<complex<T&closeclose;& A)</statement></entry>
+ 	 <entry>Returns the squared absolute values of
+ 	 <varname>A</varname>'s complex numbers.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<complex<T&closeclose; conj (const Array<complex<T&closeclose;& A)</statement></entry>
+ 	 <entry>Returns the complex conjugates of
+ 	 <varname>A</varname>'s complex numbers.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<complex<T&closeclose; polar (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns the complex numbers created from polar
+ 	 coordinates (magnitudes and phase angles) in corresponding
+ 	 arrays.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<complex<T&closeclose; polar (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns the complex numbers created from polar
+ 	 coordinates with magnitude <varname>l</varname> and
+ 	 phase angles in the array.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<complex<T&closeclose; polar (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns the complex numbers created from polar
+ 	 coordinates with magnitudes in the array and phase
+ 	 angle <varname>r</varname>.</entry>
+ 	</row>
+         <row rowsep="1">
+ 	 <entry>Operators Involving Matrices and Tensors</entry>
+ 	 <entry><statement>#include "Pooma/Tiny.h"</statement></entry>
+         </row>
+ 	<row>
+ 	 <entry><statement>T trace (const Array<T>& A)</statement></entry>
+ 	 <entry>Returns the sum of <varname>A</varname>'s diagonal
+ 	 entries, viewed as a matrix.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>T det (const Array<T>& A)</statement></entry>
+          <entry>Returns the determinant of <varname>A</varname>, viewed as a matrix.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> transpose (const Array<T>& A)</statement></entry>
+ 	 <entry>Returns the transpose of <varname>A</varname>, viewed as a matrix.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> symmetrize (const Array<T>& A)</statement></entry>
+ 	 <entry>Returns the tensors of <varname>A</varname> with the
+ 	 requested output symmetry.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> dot (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns the dot products of values in the two arrays.
+ 	 Value type <type>T</type> equals the type of the
+ 	 <function>dot</function> operating on <type>T1</type>
+ 	 and <type>T2</type>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> dot (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns the dot products of values in the array
+ 	 with <varname>r</varname>.
+ 	 Value type <type>T</type> equals the type of the
+ 	 <function>dot</function> operating on <type>T1</type>
+ 	 and <type>T2</type>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> dot (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns the dot products of <varname>l</varname> with
+ 	 values in the array.  Value type <type>T</type> equals the type of the
+ 	 <function>dot</function> operating on <type>T1</type>
+ 	 and <type>T2</type>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> dot (const Array<T1>& A, const T2& B)</statement></entry>
+ 	 <entry>Returns the dot products of values in the array
+ 	 with <varname>B</varname>.
+ 	 Value type <type>T</type> equals the type of the
+ 	 <function>dot</function> operating on <type>T1</type>
+ 	 and <type>T2</type>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<Tensor<T&closeclose; outerProduct (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns tensors created by computing the outer product
+ 	 of corresponding vectors in the two arrays.  Value
+ 	 type <type>T</type> equals the type of the product of
+ 	 <type>T1</type> and <type>T2</type>.  The vectors
+ 	 must have the same length.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<Tensor<T&closeclose; outerProduct (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns tensors created by computing the outer product
+ 	 of <varname>l</varname> with the vectors in the array.  Value
+ 	 type <type>T</type> equals the type of the product of
+ 	 <type>T1</type> and <type>T2</type>.  The vectors
+ 	 must have the same length.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<Tensor<T&closeclose; outerProduct (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns tensors created by computing the outer product
+ 	 of vectors in the array with <varname>r</varname>.  Value
+ 	 type <type>T</type> equals the type of the product of
+ 	 <type>T1</type> and <type>T2</type>.  The vectors
+ 	 must have the same length.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>TinyMatrix<T> outerProductAsTinyMatrix (const Array<T1>& A, const
+ 	 Array<T2>& B)</statement></entry>
+ 	 <entry>Returns matrices created by computing the outer product
+ 	 of corresponding vectors in the two arrays.  Value
+ 	 type <type>T</type> equals the type of the product of
+ 	 <type>T1</type> and <type>T2</type>.  The vectors must have
+ 	 the same length.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>TinyMatrix<T> outerProductAsTinyMatrix (const T1& l, const
+ 	 Array<T2>& A)</statement></entry>
+          <entry>Returns matrices created by computing the outer
+ 	 product of <varname>l</varname> with the vectors in the array.  Value
+ 	 type <type>T</type> equals the type of the product of
+ 	 <type>T1</type> and <type>T2</type>.  The vectors must
+ 	 have the same length.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>TinyMatrix<T> outerProductAsTinyMatrix (const
+ 	 Array<T1>& A, const T2& r)</statement></entry>
+          <entry>Returns matrices created by computing the outer
+ 	 product of the vectors in the array
+ 	 with <varname>r</varname>.  Value
+ 	 type <type>T</type> equals the type of the product of
+ 	 <type>T1</type> and <type>T2</type>.  The vectors must
+ 	 have the same length.</entry>
+ 	</row>
+         <row rowsep="1">
+ 	 <entry>Comparison Operators</entry>
+         </row>
+ 	<row>
+ 	 <entry><statement>Array<T> max (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns the maximums of corresponding array values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> max (const T1& l, const Array<T2>& A)</statement></entry>
+ 	<entry>Returns the maximums of <varname>l</varname> with the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> max (const Array<T1>& A, const T2& r)</statement></entry>
+ 	<entry>Returns the maximums of the array's values with <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> min (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns the minimums of corresponding array values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> min (const T1& l, const Array<T2>& A)</statement></entry>
+ 	<entry>Returns the minimums of <varname>l</varname> with the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<T> min (const Array<T1>& A, const T2& r)</statement></entry>
+ 	<entry>Returns the minimums of the array's values with <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> LT (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns booleans from using the less-than
+ 	 operator < to compare corresponding array values in
+ 	 <varname>A</varname> and <varname>B</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> LT (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns booleans from using the less-than
+ 	 operator < to compare <varname>l</varname> with the array's
+ 	 values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> LT (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns booleans from using the less-than
+ 	 operator < to compare the array's
+ 	 values with <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> LE (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns booleans from using the less-than-or-equal
+ 	 operator ≤ to compare array values in
+ 	 <varname>A</varname> and <varname>B</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> LE (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns booleans from using the less-than-or-equal
+ 	 operator ≤ to compare <varname>l</varname> with the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> LE (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns booleans from using the less-than-or-equal
+ 	 operator ≤ to compare the array's values
+ 	 with <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> GE (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns booleans from using the greater-than-or-equal
+ 	 operator ≥ to compare array values in
+ 	 <varname>A</varname> and <varname>B</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> GE (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns booleans from using the greater-than-or-equal
+ 	 operator ≥ to compare <varname>l</varname> with the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> GE (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns booleans from using the greater-than-or-equal
+ 	 operator ≥ to compare the array's values with <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> GT (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns booleans from using the greater-than
+ 	 operator > to compare array values in
+ 	 <varname>A</varname> and <varname>B</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> GT (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns booleans from using the greater-than
+ 	 operator > to compare <varname>l</varname> with the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> GT (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns booleans from using the greater-than
+ 	 operator > to compare the array's values with <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> EQ (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns booleans from determining whether
+ 	 corresponding array values in <varname>A</varname> and
+ 	 <varname>B</varname> are equal.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> EQ (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns booleans from determining whether
+ 	 <varname>l</varname> equals the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> EQ (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns booleans from determining whether the array's values equal <varname>r</varname>.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> NE (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ 	 <entry>Returns booleans from determining whether
+ 	 corresponding array values in <varname>A</varname> and
+ 	 <varname>B</varname> are not equal.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> NE (const T1& l, const Array<T2>& A)</statement></entry>
+ 	 <entry>Returns booleans from determining whether
+ 	 <varname>l</varname> does not equal the array's values.</entry>
+ 	</row>
+ 	<row>
+ 	 <entry><statement>Array<bool> NE (const Array<T1>& A, const T2& r)</statement></entry>
+ 	 <entry>Returns booleans from determining whether the 
+ 	 array's values are not equal to <varname>r</varname>.</entry>
+ 	</row>
+ <!-- FIXME: Add dotdot from src/Array/PoomaArrayOperators.h if it is defined. -->
+        </tbody>
+       </tgroup>
+      </table>
+ 
+ <para>We need to explain that proper types must be chosen.  For
+ example, cos on complex and double works but ceil on complex does not.
+ HERE</para>
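+ <para>As a rough illustration of this type restriction, the following
+ sketch uses only the &cc; standard library, not &pooma;.  The helper
+ names are invented for this example.  It relies on the fact that the
+ standard library overloads <function>cos</function> for both &double;
+ and complex arguments but provides no <function>ceil</function> for
+ complex arguments, so the corresponding call is left commented out:

```cpp
#include <cmath>
#include <complex>

// Hypothetical helpers for illustration only; they are not part of POOMA.
double cosOfDouble(double x) { return std::cos(x); }

std::complex<double> cosOfComplex(const std::complex<double>& z) {
  return std::cos(z);  // <complex> supplies a cos overload for complex values
}

double ceilOfDouble(double x) { return std::ceil(x); }

// The complex analogue would not compile because the standard library
// declares no ceil overload accepting std::complex:
//
// std::complex<double> ceilOfComplex(const std::complex<double>& z) {
//   return std::ceil(z);
// }
```

+ A data-parallel function applied to a container presumably inherits
+ the same restriction from the element-wise function it applies.</para>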
+ 
+ 
+ <!-- HERE -->
+ 
+    </section>
+ 
+ 
+    <section id="data_parallel-implementation">
+     <title>Implementation of Data-Parallel Statements</title>
+ 
+     <para>Data-parallel statements involving containers occur
+     frequently in the inner loops of scientific programs so their
+     efficient execution is important.  A naive implementation for
+     these statements may create and destroy containers holding
+     intermediate values, slowing execution considerably.
+     In 1995, Todd <!-- FIXME: Add citations to vandevoorde-95 and
+     veldhuizen-95. --> Veldhuizen and David Vandevoorde developed an
+     expression-template technique to transform arithmetic expressions
+     involving array-like containers into efficient loops without using
+     temporaries.  Despite its perceived complexity, &pooma;
+     incorporated the technology.  The resulting framework, &pete;
+     (the <application>Portable Expression Template Engine</application>),
+     is also available separately from &pooma; at
+     <ulink url="http://www.acl.lanl.gov/pete/"></ulink>.</para>
+ 
+     <para>In this section, we first describe how a naive
+     implementation may slow execution.  Then, we describe &pete;'s
+     faster implementation.  Rather than being evaluated immediately,
+     a data-parallel statement is first converted into a parse tree.
+     The parse tree has two representations.  Its run-time
+     representation holds run-time values.  Its compile-time
+     representation records the types of the tree's values.  After a
+     parse tree for the entire statement is constructed, it is
+     evaluated.  Since it is a data-parallel statement, this
+     evaluation involves at least one loop.  At run time, each loop
+     iteration computes and assigns one container value.  At compile time, when
+     the code for the loop iteration is produced, the parse tree's
+     types are traversed and code is produced without the need for any
+     intermediate values.  We present the implementation in <xref
+     linkend="data_parallel-implementation-pete"></xref>, but first we
+     explain the difficulties caused by the naive implementation.</para>
+ 
+     <section id="data_parallel-implementation-naive">
+      <title>Naive Implementation</title>
+ 
+      <para>A conventional implementation to evaluate data-parallel
+      expressions might overload arithmetic operator functions.
+      Consider this program fragment:
+ <programlisting>
+ Interval<1> I(0,3);
+ Array<1, double, Brick> A(I), B(I);
+ A = 1.0;
+ B = 2.0;
+ A += -A + 2*B;
+ std::cout << A << std::endl;
+ </programlisting> Our goal is to transform the data-parallel
+      statement <statement>A += -A + 2*B</statement> into a single
+      loop, preferably without intermediary containers.  To simplify
+      notation, let <type>Ar</type> abbreviate the type
+      <type>Array<1, double, Brick></type>.</para>
+      
+      <para>Using overloaded arithmetic operators would require using
+      intermediate containers to evaluate the statement.  For example,
+      <!-- FIXME: What is the proper tag for an inline function
+      prototype? --> the sum's left operand <statement>-A</statement>
+      would be computed by the overloaded unary operator <statement>Ar
+      operator-(const Ar&)</statement>, which would produce an
+      intermediate &array;.  <statement>Ar operator*(double,
+      const Ar&)</statement> would produce another intermediate
+      &array; holding <statement>2*B</statement>.  Yet another
+      intermediate container would hold their sum, all before
+      performing the assignment.  Thus, three intermediate containers
+      would be created and destroyed.  Below, we show these are
+      unnecessary.</para>
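+      <para>The cost described above can be sketched with a toy
+      container.  The <type>Vec</type> class and the counting function
+      below are hypothetical stand-ins written for this illustration,
+      not &pooma;'s &array;.  Each overloaded operator allocates a
+      fresh container, and a counter records how many are created:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a one-dimensional container; not POOMA's Array.
struct Vec {
  static int temporaries;  // number of containers created by operators
  std::vector<double> v;
  explicit Vec(std::size_t n, double x = 0.0) : v(n, x) {}
  Vec& operator+=(const Vec& r) {
    assert(v.size() == r.v.size());
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += r.v[i];
    return *this;
  }
};
int Vec::temporaries = 0;

// Each overloaded operator allocates and fills a new container.
Vec operator-(const Vec& a) {
  Vec t(a.v.size());
  ++Vec::temporaries;
  for (std::size_t i = 0; i < t.v.size(); ++i) t.v[i] = -a.v[i];
  return t;
}

Vec operator*(double s, const Vec& a) {
  Vec t(a.v.size());
  ++Vec::temporaries;
  for (std::size_t i = 0; i < t.v.size(); ++i) t.v[i] = s * a.v[i];
  return t;
}

Vec operator+(const Vec& a, const Vec& b) {
  assert(a.v.size() == b.v.size());
  Vec t(a.v.size());
  ++Vec::temporaries;
  for (std::size_t i = 0; i < t.v.size(); ++i) t.v[i] = a.v[i] + b.v[i];
  return t;
}

// Evaluates A += -A + 2*B and reports how many temporaries were built.
int countNaiveTemporaries(double& firstValue) {
  Vec A(4, 1.0), B(4, 2.0);
  Vec::temporaries = 0;
  A += -A + 2 * B;
  firstValue = A.v[0];
  return Vec::temporaries;
}
```

+      Calling <function>countNaiveTemporaries</function> reports
+      three containers, one per operator, in addition to four separate
+      loops over the data.</para>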
+     </section>
+ 
+     <section id="data_parallel-implementation-pete">
+      <title>Portable Expression Template Engine</title>
+ 
+      <para>&pooma; uses &pete;, the <application>Portable Expression
+      Template Engine</application> framework, to evaluate
+      data-parallel statements using efficient loops without
+      intermediate values.  &pete; uses expression-template technology.
+      Instead of eagerly evaluating a data-parallel statement's
+      subexpressions, it defers evaluation and builds a parse
+      tree of the required computations.  The parse tree's type records
+      the types of all of its subtrees.  Then, the parse tree is
+      evaluated using an evaluator determined by the left-hand side's
+      type.  This container type determines how to loop through its
+      domain.  In each loop iteration, the corresponding value of the
+      right-hand side is evaluated.  No intermediate loops or temporary
+      values are needed.</para>
+ 
+      <figure float="1" id="data_parallel-implementation-pete-tree_figure">
+       <title>Annotated Parse Tree for <statement>-A + 2*B</statement></title>
+       <mediaobject>
+        <imageobject>
+ 	<imagedata fileref="figures/data-parallel.101" format="EPS" align="center"></imagedata>
+        </imageobject>
+        <textobject>
+ 	<phrase>A parse tree for the statement is produced.</phrase>
+        </textobject>
+        <caption>
+ 	<para>The parse tree for <statement>-A + 2*B</statement> with
+         type annotations.  The complete type of a node equals the
+         concatenation of the preorder traversal of annotated types.</para>
+        </caption>
+       </mediaobject>
+      </figure>
+ 
+      <para>Before explaining the implementation, let us illustrate
+      using our example statement <statement>A += -A + 2*B</statement>.
+      Evaluating the right-hand side creates a parse tree similar to
+      the one in <xref
+      linkend="data_parallel-implementation-pete-tree_figure"></xref>.
+      For example, the overloaded unary minus operator yields a tree
+      node representing <statement>-A</statement>, having a unary-minus
+      function object, and having type
+      <type>Expression<UnaryNode<OpMinus,Ar&closeclose;</type>.
+      The binary nodes continue the construction process yielding a
+      parse tree object for the entire right-hand side and having type
+      <type>Expression<BinaryNode<OpAdd, UnaryNode<OpMinus,
+      Ar>,
+      BinaryNode<OpMultiply, Scalar<int>, Ar&closeclose; ></type>.
+      Evaluating the left-hand side yields an object
+      representing <varname>A</varname>.</para>
+ 
+      <para>Finally, the assignment operator <statement>+=</statement>
+      calls the <function>evaluate</function> function corresponding to
+      the left-hand side's type.  At compile time, it produces the code
+      for the computation.  Since this templated function is
+      specialized on the type of the left-hand side, it generates a
+      loop through the left-hand side's container.  In the loop body,
+      the <function>forEach</function> function produces code for the
+      right-hand side expression at a specific position using a
+      post-order parse-tree traversal.  At a leaf, this evaluation
+      queries the leaf's container for a specified value or extracts a
+      scalar value.  At an interior node, its children's results are
+      combined using its function operator.  One loop performs the
+      entire assignment.  It is important to note that the type of the
+      entire right-hand side is known at compile time.  Thus, all of
+      these <function>evaluate</function>,
+      <function>forEach</function>, and function-object calls can be
+      inlined at compile time to yield simple code without
+      any temporary containers and, one hopes, as fast as hand-written
+      loops!</para>
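+      <para>The idea can be sketched in miniature.  The code below is
+      a simplified illustration of the expression-template technique,
+      not &pete;'s implementation.  All the class names
+      (<type>Expr</type>, <type>Vec</type>, <type>Neg</type>,
+      <type>Scale</type>, <type>Sum</type>) are invented for this
+      example.  Operators return lightweight nodes that refer to their
+      operands, and the assignment operator runs the only loop:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical miniature of the technique; none of these names are PETE's.
template <class D>
struct Expr {
  const D& self() const { return static_cast<const D&>(*this); }
};

struct Vec : Expr<Vec> {
  static int constructed;  // counts containers that are actually built
  std::vector<double> v;
  Vec(std::size_t n, double x) : v(n, x) { ++constructed; }
  double operator[](std::size_t i) const { return v[i]; }

  // The single loop: each iteration evaluates the whole right-hand side.
  template <class E>
  Vec& operator+=(const Expr<E>& rhs) {
    const E& e = rhs.self();
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += e[i];
    return *this;
  }
};
int Vec::constructed = 0;

template <class E>
struct Neg : Expr<Neg<E> > {  // node representing -e
  const E& e;
  explicit Neg(const E& x) : e(x) {}
  double operator[](std::size_t i) const { return -e[i]; }
};

template <class E>
struct Scale : Expr<Scale<E> > {  // node representing s*e
  double s;
  const E& e;
  Scale(double k, const E& x) : s(k), e(x) {}
  double operator[](std::size_t i) const { return s * e[i]; }
};

template <class L, class R>
struct Sum : Expr<Sum<L, R> > {  // node representing l+r
  const L& l;
  const R& r;
  Sum(const L& a, const R& b) : l(a), r(b) {}
  double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// The operators build tree nodes instead of computing values.
template <class E>
Neg<E> operator-(const Expr<E>& a) { return Neg<E>(a.self()); }

template <class E>
Scale<E> operator*(double s, const Expr<E>& a) { return Scale<E>(s, a.self()); }

template <class L, class R>
Sum<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
  return Sum<L, R>(a.self(), b.self());
}

// Evaluates A += -A + 2*B and reports how many containers were built.
int fusedContainerCount(double& firstValue) {
  Vec::constructed = 0;
  Vec A(4, 1.0), B(4, 2.0);
  A += -A + 2 * B;  // right-hand side has type Sum<Neg<Vec>, Scale<Vec> >
  firstValue = A[0];
  return Vec::constructed;
}
```

+      Only the two named containers are ever constructed.  The nodes
+      hold references to temporaries, which is safe here only because
+      the tree is consumed within the same statement that builds
+      it.</para>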
+ 
+      <para>To implement this scheme, we need &pooma; code to both
+      create the parse tree and to evaluate it.  We describe parse tree
+      creation first.  Parse trees consist of leaves,
+      <type>UnaryNode</type>s, <type>BinaryNode</type>s, and
+      <type>TrinaryNode</type>s.  Since <type>TrinaryNode</type>s are
+      similar to <type>BinaryNode</type>s, we omit describing them.  A
+      <type>BinaryNode</type>'s three template parameters correspond to
+      the three things it must store:
+      <variablelist>
+        <varlistentry>
+ 	<term><statement>Op</statement></term>
+ 	<listitem>
+ 	 <para>the type of the node's operation.  For example, the
+ 	 <type>OpAdd</type> type represents adding two operands
+ 	 together.</para>
+ 	</listitem>
+        </varlistentry>
+        <varlistentry>
+ 	<term><statement>Left</statement></term>
+ 	<listitem>
+ 	 <para>the type of the left child.</para>
+ 	</listitem>
+        </varlistentry>
+        <varlistentry>
+ 	<term><statement>Right</statement></term>
+ 	<listitem>
+ 	 <para>the type of the right child.</para>
+ 	</listitem>
+        </varlistentry>
+       </variablelist>
+     The node stores its left and right children.</para>
+ 
+     <para><type>BinaryNode</type> does not need to store any
+     representation of the node's operation.  Instead the
+     <type>Op</type> type is an empty structure declaring a function
+     object.  For example, <type>OpAdd</type>'s function object is
+     declared as
+ <programlisting>
+ template<class T1, class T2>
+ inline typename BinaryReturn<T1, T2, OpAdd>::Type_t
+ operator()(const T1 &a, const T2 &b) const
+ {
+   return (a + b);
+ }
+ </programlisting>  Since it has two template arguments, it can be
+     applied to operands of any type.  Because of &cc; type
+     conversions, the type of the result is determined using the
+     <type>BinaryReturn</type> traits class.  Consider adding an &int;
+     and a &double;.  <type>BinaryReturn<int, double,
+     OpAdd>::Type_t</type> equals &double;.  Inlining the function
+     ensures all this syntax is eliminated, leaving behind just an
+     addition.</para>
+ 
+     <para><type>UnaryNode</type>s are similar but have only two
+     template parameters and store only one child.</para>
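+     <para>The pieces described so far can be put together in a small
+     sketch.  The names carry a <quote>Sketch</quote> suffix to stress
+     that they are simplified, hypothetical versions and not &pete;'s
+     classes.  In particular, the toy traits class below handles only
+     the one type combination discussed above:

```cpp
// Hypothetical simplified versions; PETE's real classes differ in detail.

// Toy return-type trait: by default the result has the left operand's
// type; a specialization promotes int + double to double.
template <class T1, class T2, class Op>
struct BinaryReturnSketch {
  typedef T1 Type_t;
};

template <class Op>
struct BinaryReturnSketch<int, double, Op> {
  typedef double Type_t;
};

// An empty structure: the operation is carried by the type alone.
struct OpAddSketch {
  template <class T1, class T2>
  typename BinaryReturnSketch<T1, T2, OpAddSketch>::Type_t
  operator()(const T1& a, const T2& b) const {
    return a + b;
  }
};

// The node stores its two children; Op appears only as a type.
template <class Op, class Left, class Right>
struct BinaryNodeSketch {
  Left left;
  Right right;
  BinaryNodeSketch(const Left& l, const Right& r) : left(l), right(r) {}
};

// Builds a node for a + b and applies the node's operation to its children.
double addIntAndDouble(int a, double b) {
  BinaryNodeSketch<OpAddSketch, int, double> node(a, b);
  return OpAddSketch()(node.left, node.right);
}
```

+     Because <type>OpAddSketch</type> has no data members, creating
+     it on the fly costs nothing once the call is inlined.</para>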
+ 
+      <para>Parse tree leaves are created by the
+      <type>CreateLeaf</type> class and its specializations.  The
+      default leaf is a scalar so it has the most general definition:
+ <programlisting>
+ template<class T>
+ struct CreateLeaf
+ {
+   typedef Scalar<T> Leaf_t;
+ 
+   inline static
+   Leaf_t make(const T &a)
+   {
+     return Scalar<T>(a);
+   }
+ };
+ </programlisting> The <type>Scalar</type> class stores the scalar
+     value.  <type>CreateLeaf</type>'s <type>Leaf_t</type> type
+     names the type of the leaf it creates.  The <statement>static</statement>
+     <function>make</function> function is invoked by an overloaded
+     operator function when creating its children.</para>
+ 
+     <para>The <type>CreateLeaf</type> class is specialized for &array;s:
+ <programlisting>
+ template<int Dim, class T, class EngineTag>
+ struct CreateLeaf<Array<Dim, T, EngineTag> >
+ {
+   typedef Array<Dim, T, EngineTag> Input_t;
+   typedef Reference<Input_t> Leaf_t;
+   typedef Leaf_t Return_t;
+   inline static
+   Return_t make(const Input_t &a)
+     {
+       return Leaf_t(a);
+     }
+ };
+ </programlisting>  The &array; object is stored as a
+     <type>Reference</type>, rather than directly as for scalars.</para>
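+     <para>The difference between the two kinds of leaves can be seen
+     in a self-contained sketch.  The classes below are hypothetical
+     miniatures named with a <quote>Sketch</quote> suffix, and
+     <type>ToyArray</type> is an invented stand-in for an &array;:

```cpp
// Hypothetical miniatures; not PETE's Scalar, Reference, or CreateLeaf.
template <class T>
struct ScalarSketch {
  T value;  // scalars are copied into the leaf
  explicit ScalarSketch(const T& t) : value(t) {}
};

template <class T>
struct ReferenceSketch {
  const T& ref;  // containers are referred to, not copied
  explicit ReferenceSketch(const T& t) : ref(t) {}
};

struct ToyArray {
  double data[4];
};

// General case: treat the argument as a scalar.
template <class T>
struct CreateLeafSketch {
  typedef ScalarSketch<T> Leaf_t;
  static Leaf_t make(const T& a) { return Leaf_t(a); }
};

// Specialization: a container becomes a reference leaf.
template <>
struct CreateLeafSketch<ToyArray> {
  typedef ReferenceSketch<ToyArray> Leaf_t;
  static Leaf_t make(const ToyArray& a) { return Leaf_t(a); }
};

// A scalar leaf keeps its own copy, so later changes are not seen.
bool scalarLeafCopies() {
  int x = 2;
  CreateLeafSketch<int>::Leaf_t leaf = CreateLeafSketch<int>::make(x);
  x = 5;
  return leaf.value == 2;
}

// A container leaf refers to the original, so later changes are seen.
bool arrayLeafRefers() {
  ToyArray a = {{1.0, 2.0, 3.0, 4.0}};
  CreateLeafSketch<ToyArray>::Leaf_t leaf = CreateLeafSketch<ToyArray>::make(a);
  a.data[0] = 9.0;
  return &leaf.ref == &a && leaf.ref.data[0] == 9.0;
}
```

+     Storing a reference avoids copying a container's values each
+     time it appears in an expression.</para>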
+ 
+     <para>To simplify the next step of overloading arithmetic
+     operators, a parse tree's topmost type is an
+     <type>Expression</type>.</para>
+ 
+     <para>Now that we have defined the node classes, the &cc;
+     arithmetic operators must be overloaded to return the appropriate
+     parse tree.  For example, the unary minus operator
+     <function>operator-</function>, overloaded to accept an &array;
+     argument, should create a <type>UnaryNode</type> having an &array;
+     as its child, which will be a leaf:
+ <programlisting>
+ template<int D1,class T1,class E1>
+ inline typename MakeReturn<UnaryNode<OpUnaryMinus,
+   typename CreateLeaf<Array<D1,T1,E1> >::Leaf_t> >::Expression_t
+ operator-(const Array<D1,T1,E1> & l)
+ {
+   typedef UnaryNode<OpUnaryMinus,
+     typename CreateLeaf<Array<D1,T1,E1> >::Leaf_t> Tree_t;
+   return MakeReturn<Tree_t>::make(Tree_t(
+     CreateLeaf<Array<D1,T1,E1> >::make(l)));
+ }
+ </programlisting>  <type>Tree_t</type> specifies the node's unique
+     type.  Constructing the object first involves creating a leaf
+     containing the &array; reference through the call to
+     <function>CreateLeaf<Array<D1,T1,E1>
+     >::make</function>.  The call to
+     <function>MakeReturn<Tree_t>::make</function> permits
+     programmers to store trees in different formats.  The &pooma;
+     implementation stores them as <type>Expression</type>s.  The
+     function's return type names the same type as the
+     <statement>return</statement> statement's value, except that it
+     extracts the type from <type>MakeReturn</type>'s internal
+     <type>Expression_t</type> type.</para>
+ 
+     <para>Specializing all the operators for &array;s using such
+     complicated code is likely to be error-prone, so &pete; provides a way
+     to automate it.  Using its <command>MakeOperators</command>
+     command with this input:
+ <programlisting>
+ classes
+ -----
+   ARG   = "int D[n],class T[n],class E[n]"
+   CLASS = "Array<D[n],T[n],E[n]>"
+ </programlisting> automatically generates code for all the needed operators.
+     The <quote>[n]</quote> strings are used to number arguments for binary
+     and ternary operators.</para>
+ 
+     <para>Assignment operators must also be specialized for &array;.
+     Inside the &array; class definition, each such operator just
+     invokes the <function>assign</function> function with a corresponding
+     function object.  For example, <function>operator+=</function>
+     invokes <statement>assign(*this, rhs, OpAddAssign())</statement>.
+     <varname>rhs</varname> is the parse tree object for the right-hand
+     side.  Calling this function invokes
+     <function>evaluate</function>, which begins the evaluation.</para>
+ 
+     <para>Before we explain the evaluation, let us summarize the
+     effect of the code so far described.  If we are considering run
+     time, parse trees for the left-hand and right-hand sides have been
+     constructed.  If we are considering compile time, the types of
+     these parse trees are known.  At compile time, the
+     <function>evaluate</function> function described below will
+     generate a loop through the left-hand side container's domain.
+     The loop's body will have code computing a container's value.  At
+     run time, this code will read values from containers, but the
+     run-time parse tree object itself will not be traversed!</para>
+ 
+     <para>We now explore the evaluation, concentrating on compile
+     time, not run time.  <function>evaluate</function> is an
+     overloaded function specialized on the type of the left-hand side.
+     In our example, the left-hand side is a one-dimensional &array;,
+     so <function>evaluate(const Ar& a, const Op& op, const
+     RHS& rhs)</function> is inlined into a loop like
+ <programlisting>
+ int end = a's domain[0].first() + a's domain[0].length();
+ for (int i = a's domain[0].first(); i < end; ++i)
+   op(a(i), rhs.read(i));
+ </programlisting>  <varname>a</varname> is the array,
+     <varname>op</varname> is a function object representing the
+     assignment operation, and <varname>rhs</varname> is the right-hand
+     side's parse tree.</para>
+ 
+     <para>Evaluating <statement>rhs.read(i)</statement> inlines into a
+     call to the <function>forEach</function> function.  This function
+     performs a <emphasis>compile-time</emphasis> post-order parse-tree
+     traversal.  Its general form is
+ <programlisting>
+ forEach(const Expression& e, const LeafTag& f, const CombineTag& c).
+ </programlisting> That is, it traverses the nodes of the
+     <type>Expression</type> object <varname>e</varname>.  At
+     leaves, it applies the operation specified by
+     <type>LeafTag</type> <varname>f</varname>.  At interior
+     nodes, it combines the results using the <type>CombineTag</type>
+     operator <varname>c</varname>.  It inlines into a call to
+ <programlisting>
+     ForEach<Expression, LeafTag, CombineTag>::apply(e, f, c).
+ </programlisting>  The <function>apply</function> function continues
+     the traversal through the tree.  For our example,
+     <type>LeafTag</type> equals <type>EvalLeaf<1></type>, and
+     <type>CombineTag</type> equals <type>OpCombine</type>.  The former
+     indicates that, when reaching a leaf, the leaf should be a
+     one-dimensional container which should be evaluated
+     at the position stored in the <type>EvalLeaf</type> object.  The
+     <type>OpCombine</type> class applies an interior node's
+     <type>Op</type> to the results of its children.</para>
+ 
+     <para><type>ForEach</type> structures are specialized for the
+     various node types.  For example, the specialization for
+     <type>UnaryNode</type> is
+ <programlisting>
+ template<class Op, class A, class FTag, class CTag>
+ struct ForEach<UnaryNode<Op, A>, FTag, CTag>
+ {
+   typedef typename ForEach<A, FTag, CTag>::Type_t TypeA_t;
+   typedef typename Combine1<TypeA_t, Op, CTag>::Type_t Type_t;
+   inline static
+   Type_t apply(const UnaryNode<Op, A> &expr, const FTag &f, 
+     const CTag &c) 
+   {
+     return Combine1<TypeA_t, Op, CTag>::
+       combine(ForEach<A, FTag, CTag>::apply(expr.child(), f, c), c);
+   }
+ };
+ </programlisting>  Since this structure is specialized for
+     <type>UnaryNode</type>s, the first parameter of its
+     <statement>static</statement> <function>apply</function> function
+     is a <type>UnaryNode</type>.  After recursively traversing its child,
+     it invokes the combination function indicated by the
+     <type>Combine1</type> traits class.  In our example, the
+     <varname>c</varname> function object should be applied.  Other
+     combiners have different roles.  For example, using the
+     <type>NullCombine</type> tag indicates the child's result should
+     not be combined; the traversal occurs just for its side effects.</para>
+ 
+     <para>Leaves are treated as the default behavior so they are not
+     specialized:
+ <programlisting>
+ template<class Expr, class FTag, class CTag>
+ struct ForEach
+ {
+   typedef typename LeafFunctor<Expr, FTag>::Type_t Type_t;
+   inline static
+   Type_t apply(const Expr &expr, const FTag &f, const CTag &)
+   {
+     return LeafFunctor<Expr, FTag>::apply(expr, f);
+   }
+ };
+ </programlisting>  Thus, <type>LeafFunctor</type>'s
+     <function>apply</function> member is called.  <type>Expr</type>
+     represents the expression type, e.g., an &array;, and
+     <type>FTag</type> is the <type>LeafTag</type>, e.g.,
+     <type>EvalLeaf</type>.  The <type>LeafFunctor</type> specialization
+     for &array; passes the index stored by the <type>EvalLeaf</type>
+     object to the &array;'s engine, which returns the corresponding
+     value.</para>
+ 
+      <para>If one uses an aggressive optimizing compiler, code
+      resulting from the <function>evaluate</function> function
+      corresponds to this pseudocode:
+ <programlisting>
+ int end = A.domain[0].first() + A.domain[0].length();
+ for (int i = A.domain[0].first(); i < end; ++i)
+   A.engine(i) += -A.engine.read(i) + 2 * B.engine.read(i);
+ </programlisting>  The loop iterates through <varname>A</varname>'s
+     domain, using &array;'s engines to obtain values and assigning
+     values.  Notice there is no use of the run-time parse tree so the
+     optimizer can eliminate the code to construct it.  All the work to
+     construct the parse tree by overloading operators is unimportant
+     at run time, but it certainly helped the compiler produce improved
+     code.</para>
+ 
+      <para>&pete;'s expression template technology may be complicated,
+      using parse trees and their types, but the code it produces is
+      not.  Using the technology is also easy.  All data-parallel
+      statements are automatically converted.  In the next chapter, we
+      explore views of containers, permitting use of container subsets
+      and making data-parallel expressions even more useful.</para>
+     </section>
+ 
+    </section>
+ 
+   </chapter>
+   
+ 

    <chapter id="sequential">
     <title>Writing Sequential Programs</title>
*************** HERE</para>
*** 297,303 ****
     <para>FIXME: Explain the format of each section.
  HERE</para>

!    <para>FIXME: Explain the order  of the sections.
  HERE</para>

     <para>Proposed order.  Basically follow the order in the proposed
--- 1824,1830 ----
     <para>FIXME: Explain the format of each section.
  HERE</para>

!    <para>FIXME: Explain the order of the sections.
  HERE</para>

     <para>Proposed order.  Basically follow the order in the proposed
*************** HERE</para>
*** 475,490 ****
      <function>finalize</function>.  These functions respectively
      prepare and shut down &pooma;'s run-time structures.</para>

!     <section id="sequential-begin_end-files">
!      <title>Files</title>

       <programlisting>
       #include "Pooma/Pooma.h"  // or "Pooma/Arrays.h" or "Pooma/Fields.h" or ...
       </programlisting>
-     </section>

!     <section id="sequential-begin_end-declarations">
!       <title>Declarations</title>

       <funcsynopsis>
        <funcprototype>
--- 2002,2014 ----
      <function>finalize</function>.  These functions respectively
      prepare and shut down &pooma;'s run-time structures.</para>

!     <bridgehead id="sequential-begin_end-files" renderas="sect2">Files</bridgehead>

       <programlisting>
       #include "Pooma/Pooma.h"  // or "Pooma/Arrays.h" or "Pooma/Fields.h" or ...
       </programlisting>

!     <bridgehead id="sequential-begin_end-declarations" renderas="sect2">Declarations</bridgehead>

       <funcsynopsis>
        <funcprototype>
*************** HERE</para>
*** 520,529 ****
         </paramdef>
        </funcprototype>
       </funcsynopsis>
-     </section>

!     <section id="sequential-begin_end-description">
!      <title>Description</title>

       <para>Before its use, the &poomatoolkit; must be initialized by a
       call to <function>initialize</function>.  This usually occurs in
--- 2044,2051 ----
         </paramdef>
        </funcprototype>
       </funcsynopsis>

!     <bridgehead id="sequential-begin_end-description" renderas="sect2">Description</bridgehead>

       <para>Before its use, the &poomatoolkit; must be initialized by a
       call to <function>initialize</function>.  This usually occurs in
*************** HERE</para>
*** 572,581 ****
       <para>Including almost any &pooma; header file, rather than just
       <filename class="headerfile">Pooma/Pooma.h</filename> suffices
       since most other &pooma; header files include it.</para>
-     </section>

!     <section id="sequential-begin_end-example">
!      <title>Example Program</title>

       <para>Since every &pooma; program must call
       <function>initialize</function> and
--- 2094,2101 ----
       <para>Including almost any &pooma; header file, rather than just
       <filename class="headerfile">Pooma/Pooma.h</filename> suffices
       since most other &pooma; header files include it.</para>

!     <bridgehead id="sequential-begin_end-example" renderas="sect2">Example Program</bridgehead>

       <para>Since every &pooma; program must call
       <function>initialize</function> and
*************** HERE</para>
*** 584,599 ****
       use.</para>

       &initialize-finalize;
-     </section>

     </section><!-- end sequential-begin_end -->

     <section id="sequential-options">
      <title>&pooma; Command-line Options</title>

      <para>Every &pooma; program accepts a set of &pooma;-specific
      command-line options to set values at run-time.</para>

      <section id="sequential-options-list">
       <title>Options Summary</title>

--- 2104,2163 ----
       use.</para>

       &initialize-finalize;

     </section><!-- end sequential-begin_end -->

+ 
+    <section id="sequential-global">
+     <title>Global Variables</title>
+ 
+     <para>&pooma; makes a few global variables available after
+     initialization.</para>
+ 
+     <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ 	   orient="port" pgwide="0">
+      <title>&pooma; Global Variables</title>
+      
+      <tgroup cols="2" align="left">
+       <thead>
+        <row>
+ 	<entry>variable</entry>
+ 	<entry>description</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+ 	<entry>&inform; <varname>pinfo</varname></entry>
+ 	<entry>output stream used to print informative messages to the
+ 	user while the program executes.  The stream accepts a
+ 	superset of standard output operations.</entry>
+        </row>
+        <row>
+ 	<entry>&inform; <varname>pwarn</varname></entry>
+ 	<entry>HERE output stream used to print informative messages to the
+ 	user while the program executes.  The stream accepts a
+ 	superset of standard output operations.</entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+ 
+    </section>
+ 
+ <!-- HERE -->
+ 
     <section id="sequential-options">
      <title>&pooma; Command-line Options</title>

      <para>Every &pooma; program accepts a set of &pooma;-specific
      command-line options to set values at run-time.</para>

+     <para>QUESTION: Should I defer documenting &options; to the
+     reference manual, instead just listing commonly used options in
+     the previous section?
+ 
+ UNFINISHED</para>
+ 
      <section id="sequential-options-list">
       <title>Options Summary</title>

*************** HERE</para>
*** 601,614 ****
        <varlistentry>
         <term><parameter class="option">&dashdash;pooma-info</parameter></term>
         <listitem>
! 	<para>
! HERE  Who uses this?</para>
         </listitem>
        </varlistentry>
  <!-- HERE -->
       </variablelist>

       <para>FIXME: Be sure to list default values.</para>

  <!-- HERE -->

--- 2165,2181 ----
        <varlistentry>
         <term><parameter class="option">&dashdash;pooma-info</parameter></term>
         <listitem>
! 	<para>enable use of the <varname>pinfo</varname> stream, used to
! 	print informative messages to the user while the program
! 	executes.</para>
         </listitem>
        </varlistentry>
  <!-- HERE -->
       </variablelist>

       <para>FIXME: Be sure to list default values.</para>
+ <!-- HERE: need to describe the pinfo, pwarn, and perr streams somewhere.  To do so requires describing informs.-->
+ <!-- HERE: Which streams are buffered and which are not? -->

  <!-- HERE -->

*************** HERE  Who uses this?</para>
*** 616,627 ****

  <!-- HERE -->

-     <para>QUESTION: Should I defer documenting &options; to the
-     reference manual, instead just listing commonly used options in
-     the previous section?
- 
- UNFINISHED</para>
- 
     </section><!-- end sequential-options -->

     <section>
--- 2183,2188 ----
*************** UNFINISHED</para>
*** 740,746 ****
        code. An Array maps a fairly arbitrary input domain to an
        arbitrary range of outputs. When used by itself, an &array;
        object <varname>A</varname> refers to all of the values in its
! 				  domain. Element-wise mathematical operations or functions can be
        applied to an array using straightforward notation, like A + B
        or sin(A). Expressions involving Array objects are themselves
        Arrays. The operation A(d), where d is a domain object that
--- 2301,2307 ----
        code. An Array maps a fairly arbitrary input domain to an
        arbitrary range of outputs. When used by itself, an &array;
        object <varname>A</varname> refers to all of the values in its
!       domain. Element-wise mathematical operations or functions can be
        applied to an array using straightforward notation, like A + B
        or sin(A). Expressions involving Array objects are themselves
        Arrays. The operation A(d), where d is a domain object that
*************** UNFINISHED</para>
*** 1188,1195 ****
     class="libraryfile">.cmpl.cpp</filename>, <filename
     class="libraryfile">.mk</filename>, <filename
     class="libraryfile">.conf</filename>.  Should we also explain use
!    of <literal>inline</literal> even when necessary and the template
!    model, <!-- FIXME: s/literal/keyword/ --> e.g., including <filename
     class="libraryfile">.cpp</filename> files.</para>

     <para>QUESTION: What are the key concepts around which to organize
--- 2749,2756 ----
     class="libraryfile">.cmpl.cpp</filename>, <filename
     class="libraryfile">.mk</filename>, <filename
     class="libraryfile">.conf</filename>.  Should we also explain use
!    of <keywordname>inline</keywordname> even when necessary and the template
!    model, e.g., including <filename
     class="libraryfile">.cpp</filename> files.</para>

     <para>QUESTION: What are the key concepts around which to organize
*************** UNFINISHED</para>
*** 1420,1426 ****
  	<entry><para>dimension</para></entry>
         </row>
         <row>
! 	<entry><varname>T</varname></entry>
  	<entry><para>array element type</para></entry>
         </row>
         <row>
--- 2981,2987 ----
  	<entry><para>dimension</para></entry>
         </row>
         <row>
! 	<entry><type>T</type></entry>
  	<entry><para>array element type</para></entry>
         </row>
         <row>
*************** UNFINISHED</para>
*** 3014,3021 ****
       class="headerfile">src/Utilities/DerefIterator.h</filename>:
       <type>DerefIterator<T></type> and
       <type>ConstDerefIterator<T></type> automatically
!      dereference themselves to maintain <literal>const</literal>
!      correctness.  <!-- FIXME: s/literal/keyword/ --></para>
      </listitem>

      <listitem>
--- 4575,4582 ----
       class="headerfile">src/Utilities/DerefIterator.h</filename>:
       <type>DerefIterator<T></type> and
       <type>ConstDerefIterator<T></type> automatically
!      dereference themselves to maintain <keywordname>const</keywordname>
!      correctness.</para>
      </listitem>

      <listitem>
*************** UNFINISHED</para>
*** 3042,3048 ****
      <listitem>
       <para>Discuss &options; and related material.  Add developer
       command-line options listed in <filename
!      class="library">Utilities/Options.cmpl.cpp</filename> and also
       possibly <parameter class="option">&dashdash;pooma-threads
       <replaceable>n</replaceable></parameter>.</para>
      </listitem>
--- 4603,4609 ----
      <listitem>
       <para>Discuss &options; and related material.  Add developer
       command-line options listed in <filename
!      class="libraryfile">Utilities/Options.cmpl.cpp</filename> and also
       possibly <parameter class="option">&dashdash;pooma-threads
       <replaceable>n</replaceable></parameter>.</para>
      </listitem>
*************** UNFINISHED</para>
*** 3600,3859 ****

   </appendix>

- 
-  <!-- Bibliography -->
- 
-  <bibliography id="bibliography">
-   <title>Bibliography</title>
- 
-   <para>FIXME: How do I process these entries?</para>
- 
-   <biblioentry>
-    <abbrev>mpi99</abbrev>
-    <authorgroup>
-     <author>
-      <firstname>William</firstname><surname>Gropp</surname>
-     </author>
-     <author>
-      <firstname>Ewing</firstname><surname>Lusk</surname>
-     </author>
-     <author>
-      <firstname>Anthony</firstname><surname>Skjellum</surname>
-     </author>
-    </authorgroup>
-    <copyright>
-     <year>1999</year>
-     <holder>Massachusetts Institute of Technology</holder>
-    </copyright>
-    <isbn>0-262-57132-3</isbn>
-    <publisher>
-     <publishername>The MIT Press</publishername>
-     <address>Cambridge, MA</address>
-    </publisher>
-    <title>Using MPI</title>
-    <subtitle>Portable Parallel Programming with the Message-Passing Interface</subtitle>
-    <edition>second edition</edition>
-   </biblioentry>
- 
-   <biblioentry>
-    <abbrev>pooma95</abbrev>
-    <authorgroup>
-     <author>
-      <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
-      <affiliation>
-       <orgname>Dakota Software Systems, Inc.</orgname>
-       <address><city>Rapid City</city><state>SD</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Susan</firstname><othername role="mi">R.</othername><surname>Atlas</surname>
-      <affiliation>
-       <orgname>Parallel Solutions, Inc.</orgname>
-       <address><city>Santa Fe</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Subhankar</firstname><surname>Banerjee</surname>
-      <affiliation>
-       <orgname>New Mexico State University</orgname>
-       <address><city>Las Cruces</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
-      <affiliation>
-       <orgname>University of Illinois at Urbana-Champaign</orgname>
-       <address><city>Urbana-Champaign</city><state>IL</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
-      <affiliation>
-       <orgname>California Institute of Technology</orgname>
-       <address><city>Pasadena</city><state>CA</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Katarzyna</firstname><surname>Keahey</surname>
-      <affiliation>
-       <orgname>Indiana University</orgname>
-       <address><city>Bloomington</city><state>IN</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Marydell</firstname><surname>Tholburn</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-    </authorgroup>
-    <title>&pooma;</title>
-    <subtitle>A Framework for Scientific Simulation on Parallel Architectures</subtitle>
-    <releaseinfo>unpublished</releaseinfo>
-   </biblioentry>
- 
-   <biblioentry>
-    <abbrev>pooma-sc95</abbrev>
-    <authorgroup>
-     <author>
-      <firstname>Susan</firstname><surname>Atlas</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Subhankar</firstname><surname>Banerjee</surname>
-      <affiliation>
-       <orgname>New Mexico State University</orgname>
-       <address><city>Las Cruces</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
-      <affiliation>
-       <orgname>Advanced Computing Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>M.</firstname><surname>Srikant</surname>
-      <affiliation>
-       <orgname>New Mexico State University</orgname>
-       <address><city>Las Cruces</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Marydell</firstname><surname>Tholburn</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-    </authorgroup>
-    <title>&pooma;</title>
-    <subtitle>A High Performance Distributed Simulation Environment for
-    Scientific Applications</subtitle>
- <!-- FIXME: Where list Supercomputing 1995? -->
-   </biblioentry>
- 
-   <biblioentry>
-    <abbrev>pooma-siam98</abbrev>
-    <authorgroup>
-     <author>
-      <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>James</firstname><othername role="mi">A.</othername><surname>Crotinger</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Scott</firstname><othername role="mi">W.</othername><surname>Haney</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Stephen</firstname><othername role="mi">A.</othername><surname>Smith</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-     <author>
-      <firstname>Timothy</firstname><othername role="mi">J.</othername><surname>Williams</surname>
-      <affiliation>
-       <orgname>Los Alamos National Laboratory</orgname>
-       <address><city>Los Alamos</city><state>NM</state></address>
-      </affiliation>
-     </author>
-    </authorgroup>
-    <title>Raid Application Development and Enhanced Code
-    Interoperability using the &pooma; Framework</title>
- <!-- FIXME: Where list SIAM Workshop ... 1998? -->
-   </biblioentry>
- 
-   <biblioentry>
- <!-- FIXME: Change the year when we learn it. -->
-    <abbrev>pete-99</abbrev>
-    <authorgroup>
-     <author>
-      <firstname>Scott</firstname><surname>Haney</surname>
-     </author>
-     <author>
-      <firstname>James</firstname><surname>Crotinger</surname>
-     </author>
-     <author>
-      <firstname>Steve</firstname><surname>Karmesin</surname>
-     </author>
-     <author>
-      <firstname>Stephen</firstname><surname>Smith</surname>
-     </author>
-    </authorgroup>
-    <title>Easy Expression Templates Using &pete;: The Portable
-    Expression Template Engine</title>
- <!-- FIXME: When and where was this published? -->
-   </biblioentry>
-  </bibliography>

   &glossary-chapter; 

--- 5161,5168 ----

   </appendix>

+  &bibliography-chapter;

   &glossary-chapter; 

Index: tutorial.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v
retrieving revision 1.3
diff -c -p -r1.3 tutorial.xml
*** tutorial.xml	2001/12/17 17:27:42	1.3
--- tutorial.xml	2002/01/04 17:14:11
***************
*** 54,60 ****
      <imagedata fileref="figures/doof2d.201" format="EPS" align="center"></imagedata>
     </imageobject>
     <textobject>
!     <phrase>The Initial Configuration</phrase>
     </textobject>
    </mediaobject>
    <mediaobject>
--- 54,60 ----
      <imagedata fileref="figures/doof2d.201" format="EPS" align="center"></imagedata>
     </imageobject>
     <textobject>
!     <phrase>The Initial &doof2d; Configuration</phrase>
     </textobject>
    </mediaobject>
    <mediaobject>
***************
*** 476,482 ****
       <imagedata fileref="figures/doof2d.210" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>Adding two arrays with different domains.</phrase>
      </textobject>
      <caption>
       <para>When adding arrays, values in corresponding positions are
--- 476,482 ----
       <imagedata fileref="figures/doof2d.210" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>Adding two arrays with different domains is supported.</phrase>
      </textobject>
      <caption>
       <para>When adding arrays, values in corresponding positions are
***************
*** 587,593 ****
       <imagedata fileref="figures/doof2d.211" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>Apply a stencil to position (1,3) of an array.</phrase>
      </textobject>
      <caption>
       <para>To compute the value associated with index position (1,3)
--- 587,593 ----
       <imagedata fileref="figures/doof2d.211" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>Apply a stencil to position (1,3) of an &array;.</phrase>
      </textobject>
      <caption>
       <para>To compute the value associated with index position (1,3)
***************
*** 692,698 ****
       <imagedata fileref="figures/distributed.101" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>the &pooma; distributed computation model.</phrase>
      </textobject>
      <caption>
       <para>The &pooma; distributed computation model combines
--- 692,698 ----
       <imagedata fileref="figures/distributed.101" format="EPS" align="center"></imagedata>
      </imageobject>
      <textobject>
!      <phrase>the &pooma; distributed computation model</phrase>
      </textobject>
      <caption>
       <para>The &pooma; distributed computation model combines
Index: figures/box-macros.mp
===================================================================
RCS file: box-macros.mp
diff -N box-macros.mp
*** /dev/null	Fri Mar 23 21:37:44 2001
--- box-macros.mp	Fri Jan  4 10:14:11 2002
***************
*** 0 ****
--- 1,106 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec20
+ %% Pooma
+ 
+ %% Macros to Improve Boxes
+ 
+ %% Assumes 'input boxes;'
+ 
+   % Ensure a list of boxes all have the same width.
+   % input <- suffixes for the boxes;
+   % output-> all boxes have the same width (maximum picture width + 2 defaultdx)
+   vardef samewidth(suffix $)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight($)(t);
+     numericSetWidth(xpart(p_)+2defaultdx)($)(t);
+   enddef;
+   
+   % Ensure a list of boxes all have the same height.
+   % input <- suffixes for the boxes;
+   % output-> all boxes have the same height (maximum picture height + 2 defaultdy)
+   vardef sameheight(suffix $)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight($)(t);
+     numericSetHeight(ypart(p_)+2defaultdy)($)(t);
+   enddef;
+   
+   % Given a list of boxes, determine the maximum picture width and
+   % maximum picture height.
+   % input <- suffixes for the boxes
+   % output-> pair of maximum picture width and height
+   vardef maxWidthAndHeight(suffix f)(text t) =
+     save w_, h_; numeric w_, h_;
+     w_ = xpart((urcorner pic_.f - llcorner pic_.f));
+     h_ = ypart((urcorner pic_.f - llcorner pic_.f));
+     forsuffixes uu = t:
+       if xpart((urcorner pic_.uu - llcorner pic_.uu)) > w_ :
+ 	w_ := xpart((urcorner pic_.uu - llcorner pic_.uu));
+       fi
+       if ypart((urcorner pic_.uu - llcorner pic_.uu)) > h_ :
+ 	h_ := ypart((urcorner pic_.uu - llcorner pic_.uu));
+       fi
+     endfor
+     (w_, h_)
+   enddef;
+ 
+   % Given a width, ensure a box has the given width.
+   % input <- box width
+   %          suffix for the one box
+   % output-> the box has the given width by setting its .dx
+   vardef numericSetWidthOne(expr width)(suffix f) =
+     f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
+   enddef;
+   
+   % Given a width, ensure all boxes have the given width.
+   % input <- box width
+   %          suffixes for the boxes
+   % output-> all boxes have the given width by setting their .dx
+   vardef numericSetWidth(expr width)(suffix f)(text t) =
+     f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
+     forsuffixes $ = t:
+       $.dx = 0.5(width - xpart(urcorner pic_.$ - llcorner pic_.$));
+     endfor
+   enddef;
+ 
+   % Given a height, ensure all boxes have the given height.
+   % input <- box height
+   %          suffixes for the boxes
+   % output-> all boxes have the given height by setting their .dy
+   vardef numericSetHeight(expr height)(suffix f)(text t) =
+     f.dy = 0.5(height - ypart(urcorner pic_.f - llcorner pic_.f));
+     forsuffixes $ = t:
+       $.dy = 0.5(height - ypart(urcorner pic_.$ - llcorner pic_.$));
+     endfor
+   enddef;
+   
+   % Ensure that a list of boxes and circles all have the same width, height,
+   % and diameter.
+   % input <- suffixes for the boxes and circles
+   % output-> all boxes have .dx and .dy set so they have the same width,
+   %           height, and radius
+   % The boxes are squares and the circles are circular, not oval.
+   vardef sameWidthAndHeight(suffix f)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight(f)(t);
+     if (xpart(p_)+2defaultdx >= ypart(p_)+2defaultdy):
+       numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
+       numericSetHeight(xpart(p_)+2defaultdx)(f)(t);
+     else:
+       numericSetWidth(ypart(p_)+2defaultdy)(f)(t);
+       numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
+     fi
+   enddef;
+ 
+   % Ensure that a list of boxes and circles all have the same width and
+   % the same height.  Unlike sameWidthAndHeight, the width and height
+   % can differ.
+   % input <- suffixes for the boxes and circles
+   % output-> all boxes have .dx and .dy set so they have the same width
+   %           and the same height
+   % The boxes can be rectangles and the circles can be ovals.
+   vardef sameWidthSameHeight(suffix f)(text t) =
+     save p_; pair p_;
+     p_ = maxWidthAndHeight(f)(t);
+     numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
+     numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
+   enddef;
Index: figures/data-parallel.mp
===================================================================
RCS file: data-parallel.mp
diff -N data-parallel.mp
*** /dev/null	Fri Mar 23 21:37:44 2001
--- data-parallel.mp	Fri Jan  4 10:14:11 2002
***************
*** 0 ****
--- 1,157 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec20
+ %% Pooma
+ 
+ %% Illustrations for the Data-Parallel Chapter
+ 
+ %% Assumes TEX=latex.
+ 
+ input boxes;
+ input box-macros;
+ input grid-macros;
+ 
+ verbatimtex
+ \documentclass[10pt]{article}
+ \input{macros.ltx}
+ \begin{document}
+ etex
+ 
+ %% Parse Tree for Example Statement A += -A + 2*B
+ beginfig(101)
+   numeric unit; unit = 1.5cm;
+   numeric xunit; xunit = unit;
+   numeric yunit; yunit = unit;
+   
+   %% Create the tree nodes.
+   circleit.b0(btex \statement{+=} etex);
+   circleit.b1(btex \varname{A} etex);
+   circleit.b2(btex \statement{+} etex);
+   circleit.b3(btex \statement{-} etex);
+   circleit.b4(btex \varname{A} etex);
+   circleit.b5(btex \statement{*} etex);
+   circleit.b6(btex \statement{2} etex);
+   circleit.b7(btex \varname{B} etex);
+   numeric nuBoxes; nuBoxes = 8;
+   sameWidthAndHeight(b0,b1,b2,b3,b4,b5,b6,b7);
+   
+   %% Position the tree nodes.
+   b2.c = origin;
+   b0.c - 0.5[b1.c,b2.c] = (0,yunit);
+   b2.c - 0.5[b3.c,b5.c] = (0,yunit);
+   b3.c - 0.5[b4.c,b6.c] = (0,yunit);
+   b5.c - 0.5[b6.c,b7.c] = (0,yunit);
+   b1.c - b2.c = b3.c - b5.c = b4.c - b6.c = (-xunit,0);
+   
+   %% Draw the tree.
+   for t = 2 upto 7:
+     drawboxed(b[t]);
+   endfor
+   vardef drawEdge(expr start, stop) =
+     draw b[start].c -- b[stop].c cutbefore bpath b[start] cutafter bpath b[stop];
+   enddef;
+   for t = (2,3), (2,5), (3,4), (5,6), (5,7):
+     drawEdge(xpart(t),ypart(t));
+   endfor
+ 
+   %% Label the nodes' types.
+ % TMP  label.rt(btex \type{OpAddAssign} etex, b0.e);
+ % TMP  label.rt(btex \type{Expression} etex, 0.5[b0.c,b2.c]);
+   label.top(btex \type{Expression} etex, b2.n);
+ % TMP  label.lft(btex \type{Ar} etex, b1.w);
+   label.rt(btex \type{BinaryNode<OpAdd,} etex, b2.e);
+   label.lft(btex \type{UnaryNode<OpMinus,} etex, b3.w);
+   label.lft(btex \type{Ar} etex, b4.w);
+   label.rt(btex \type{BinaryNode<OpMultiply,} etex, b5.e);
+   label.bot(btex \type{Scalar<int>} etex, b6.s);
+   label.rt(btex \type{Ar} etex, b7.e);
+   
+ endfig;
+ 
+ 
+ %% An illustration of the addition of arrays.
+ beginfig(212)
+   numeric unit; unit = 0.9cm;	% width or height of an individual grid cell
+   numeric nuCells; nuCells = 5;	% number of cells in each dimension
+ 				% This number should be odd.
+   numeric nuArrayCells; nuArrayCells = 3;
+ 				% number of cells in array in each dimension
+   numeric operatorWidth; operatorWidth = 1.5;
+   				% horizontal space for an operator as
+   				% a multiple of "unit"
+   
+   %% Determine the locations of the arrays.
+   z0 = origin;
+   z1 = z0 + unit * (nuCells+operatorWidth,0);
+   z2 - z1 = z1 - z0;
+ 
+   %% Draw the grid cells and the operators.
+   for t = 0 upto 2:
+     drawGridDashed(nuCells, unit, z[t]);
+   endfor
+   for t = 0 upto 1:
+     drawGrid(nuArrayCells, unit, z[t]+unit*(1,1));
+   endfor
+   drawGrid(nuArrayCells, unit, z2+unit*(2,0));
+  
+   label(btex = etex, z1 + unit*(-0.6operatorWidth, 0.5nuCells));
+   label(btex + etex, z2 + unit*(-0.6operatorWidth, 0.5nuCells));
+   
+   %% Label the indices.
+   % Label b(I,J) grid indices.
+   for t = 0 upto 2:
+     labelCellBottom(btex \footnotesize 0 etex, (0,0), z[t]);
+     labelCellBottom(btex \footnotesize 1 etex, (1,0), z[t]);
+     labelCellBottom(btex \footnotesize 2 etex, (2,0), z[t]);
+     labelCellBottom(btex \footnotesize 3 etex, (3,0), z[t]);
+     labelCellBottom(btex \footnotesize 4 etex, (4,0), z[t]);
+     labelCellLeft(btex \footnotesize 0 etex, (0,0), z[t]);
+     labelCellLeft(btex \footnotesize 1 etex, (0,1), z[t]);
+     labelCellLeft(btex \footnotesize 2 etex, (0,2), z[t]);
+     labelCellLeft(btex \footnotesize 3 etex, (0,3), z[t]);
+     labelCellLeft(btex \footnotesize 4 etex, (0,4), z[t]);
+   endfor
+   
+   %% Label the grid cells' values.
+   % Label b(I,J) grid values.
+   pair zShift;
+   zShift := z1 + unit*(1,1);
+   labelCell(btex \normalsize 9 etex, (0,0), zShift);
+   labelCell(btex \normalsize 11 etex, (1,0), zShift);
+   labelCell(btex \normalsize 13 etex, (2,0), zShift);
+   labelCell(btex \normalsize 17 etex, (0,1), zShift);
+   labelCell(btex \normalsize 19 etex, (1,1), zShift);
+   labelCell(btex \normalsize 21 etex, (2,1), zShift);
+   labelCell(btex \normalsize 25 etex, (0,2), zShift);
+   labelCell(btex \normalsize 27 etex, (1,2), zShift);
+   labelCell(btex \normalsize 29 etex, (2,2), zShift);
+   % Label b(I+1,J-1) grid values.
+   zShift := z2 + unit*(2,0);
+   labelCell(btex \normalsize 3 etex, (0,0), zShift);
+   labelCell(btex \normalsize 5 etex, (1,0), zShift);
+   labelCell(btex \normalsize 7 etex, (2,0), zShift);
+   labelCell(btex \normalsize 11 etex, (0,1), zShift);
+   labelCell(btex \normalsize 13 etex, (1,1), zShift);
+   labelCell(btex \normalsize 15 etex, (2,1), zShift);
+   labelCell(btex \normalsize 19 etex, (0,2), zShift);
+   labelCell(btex \normalsize 21 etex, (1,2), zShift);
+   labelCell(btex \normalsize 23 etex, (2,2), zShift);
+   % Label b(I,J)+b(I+1,J-1) grid values.
+   zShift := z0 + unit*(1,1);
+   labelCell(btex \normalsize 9 etex, (0,0), zShift);
+   labelCell(btex \normalsize 22 etex, (1,0), zShift);
+   labelCell(btex \normalsize 26 etex, (2,0), zShift);
+   labelCell(btex \normalsize 17 etex, (0,1), zShift);
+   labelCell(btex \normalsize 38 etex, (1,1), zShift);
+   labelCell(btex \normalsize 42 etex, (2,1), zShift);
+   labelCell(btex \normalsize 25 etex, (0,2), zShift);
+   labelCell(btex \normalsize 27 etex, (1,2), zShift);
+   labelCell(btex \normalsize 29 etex, (2,2), zShift);
+ 
+   %% Label the grids.
+   labelGrid(btex $A+B$ etex, nuCells, z0);
+   labelGrid(btex $A$ etex, nuCells, z1);
+   labelGrid(btex $B$ etex, nuCells, z2);
+ endfig;
+ 
+ 
+ bye
Index: figures/doof2d.mp
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/figures/doof2d.mp,v
retrieving revision 1.2
diff -c -p -r1.2 doof2d.mp
*** figures/doof2d.mp	2001/12/11 20:36:13	1.2
--- figures/doof2d.mp	2002/01/04 17:14:11
*************** verbatimtex
*** 12,46 ****
  \begin{document}
  etex

! % Draw a set of grid cells.
! vardef drawGrid(expr nuCells, unit, llCorner) =
!   for i = 0 upto nuCells-1:
!     for j = 0 upto nuCells-1:
!       draw unitsquare scaled unit shifted (llCorner + unit*(i,j));
!     endfor
!   endfor
! enddef;
! 
! % Label the specified grid, grid cell, or its edge.
! % Place a value at the center of a grid cell.
! vardef labelCell(expr lbl, xy, llCorner) =
!   label(lbl, llCorner + unit*(xy + 0.5*(1,1)));
! enddef;
! 
! % Label the bottom of a grid cell.
! vardef labelCellBottom(expr lbl, xy, llCorner) =
!   label.bot(lbl, llCorner + unit*(xy + 0.5*(1,0)));
! enddef;
! 
! % Label the left side of a grid cell.
! vardef labelCellLeft(expr lbl, xy, llCorner) =
!   label.lft(lbl, llCorner + unit*(xy + 0.5*(0,1)));
! enddef;
! 
! % Label the top of a grid.
! vardef labelGrid(expr lbl, nuCells, llCorner) =
!   label.top(lbl, llCorner + unit*(nuCells/2,nuCells));
! enddef;

  %% Global Declarations
  numeric unit; unit = 0.9cm;	% width or height of an individual grid cell
--- 12,18 ----
  \begin{document}
  etex

! input grid-macros;

  %% Global Declarations
  numeric unit; unit = 0.9cm;	% width or height of an individual grid cell
Index: figures/grid-macros.mp
===================================================================
RCS file: grid-macros.mp
diff -N grid-macros.mp
*** /dev/null	Fri Mar 23 21:37:44 2001
--- grid-macros.mp	Fri Jan  4 10:14:11 2002
***************
*** 0 ****
--- 1,45 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec21
+ %% Pooma
+ 
+ %% Macros for Drawing Grids
+ 
+ % Draw a set of grid cells.
+ vardef drawGrid(expr nuCells, unit, llCorner) =
+   for i = 0 upto nuCells-1:
+     for j = 0 upto nuCells-1:
+       draw unitsquare scaled unit shifted (llCorner + unit*(i,j));
+     endfor
+   endfor
+ enddef;
+ 
+ % Draw a set of grid cells with dashed lines.
+ vardef drawGridDashed(expr nuCells, unit, llCorner) =
+   for i = 0 upto nuCells-1:
+     for j = 0 upto nuCells-1:
+       draw unitsquare scaled unit shifted (llCorner + unit*(i,j)) dashed evenly;
+     endfor
+   endfor
+ enddef;
+ 
+ % Label the specified grid, grid cell, or its edge.
+ % Place a value at the center of a grid cell.
+ vardef labelCell(expr lbl, xy, llCorner) =
+   label(lbl, llCorner + unit*(xy + 0.5*(1,1)));
+ enddef;
+ 
+ % Label the bottom of a grid cell.
+ vardef labelCellBottom(expr lbl, xy, llCorner) =
+   label.bot(lbl, llCorner + unit*(xy + 0.5*(1,0)));
+ enddef;
+ 
+ % Label the left side of a grid cell.
+ vardef labelCellLeft(expr lbl, xy, llCorner) =
+   label.lft(lbl, llCorner + unit*(xy + 0.5*(0,1)));
+ enddef;
+ 
+ % Label the top of a grid.
+ vardef labelGrid(expr lbl, nuCells, llCorner) =
+   label.top(lbl, llCorner + unit*(nuCells/2,nuCells));
+ enddef;
+ 
Index: figures/introduction.mp
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/figures/introduction.mp,v
retrieving revision 1.1
diff -c -p -r1.1 introduction.mp
*** figures/introduction.mp	2001/12/17 17:27:42	1.1
--- figures/introduction.mp	2002/01/04 17:14:11
***************
*** 7,12 ****
--- 7,13 ----
  %% Assumes TEX=latex.

  input boxes;
+ input box-macros;

  verbatimtex
  \documentclass[10pt]{article}
*************** beginfig(101)
*** 21,125 ****
    numeric horizSpace; horizSpace = 8unit;
    numeric vertSpace; vertSpace = unit;
    numeric nuBoxes;		% number of boxes
- 
-   % Ensure a list of boxes all have the same width.
-   % input <- suffixes for the boxes;
-   % output-> all boxes have the same width (maximum picture width + defaultdx)
-   vardef samewidth(suffix $)(text t) =
-     save p_; pair p_;
-     p_ = maxWidthAndHeight($)(t);
-     numericSetWidth(xpart(p_)+2defaultdx)($)(t);
-   enddef;
-   
-   % Ensure a list of boxes all have the same height.
-   % input <- suffixes for the boxes;
-   % output-> all boxes have the same height (maximum picture height + defaultdy)
-   vardef sameheight(suffix $)(text t) =
-     save p_; pair p_;
-     p_ = maxWidthAndHeight($)(t);
-     numericSetWidth(ypart(p_)+2defaultdy)($)(t);
-   enddef;
-   
-   % Given a list of boxes, determine the maximum picture width and
-   % maximum picture height.
-   % input <- suffixes for the boxes
-   % output-> pair of maximum picture width and height
-   vardef maxWidthAndHeight(suffix f)(text t) =
-     save w_, h_; numeric w_, h_;
-     w_ = xpart((urcorner pic_.f - llcorner pic_.f));
-     h_ = ypart((urcorner pic_.f - llcorner pic_.f));
-     forsuffixes uu = t:
-       if xpart((urcorner pic_.uu - llcorner pic_.uu)) > w_ :
- 	w_ := xpart((urcorner pic_.uu - llcorner pic_.uu));
-       fi
-       if ypart((urcorner pic_.uu - llcorner pic_.uu)) > h_ :
- 	h_ := ypart((urcorner pic_.uu - llcorner pic_.uu));
-       fi
-     endfor
-     (w_, h_)
-   enddef;
- 
-   % Given a width, ensure a box has the given width.
-   % input <- box width
-   %          suffix for the one box
-   % output-> the box has the given width by setting its .dx
-   vardef numericSetWidthOne(expr width)(suffix f) =
-     f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
-   enddef;
-   
-   % Given a width, ensure all boxes have the given width.
-   % input <- box width
-   %          suffixes for the boxes
-   % output-> all boxes have the given width by setting their .dx
-   vardef numericSetWidth(expr width)(suffix f)(text t) =
-     f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
-     forsuffixes $ = t:
-       $.dx = 0.5(width - xpart(urcorner pic_.$ - llcorner pic_.$));
-     endfor
-   enddef;
- 
-   % Given a height, ensure all boxes have the given height.
-   % input <- box height
-   %          suffixes for the boxes
-   % output-> all boxes have the given height by setting their .dx
-   vardef numericSetHeight(expr height)(suffix f)(text t) =
-     f.dy = 0.5(height - ypart(urcorner pic_.f - llcorner pic_.f));
-     forsuffixes $ = t:
-       $.dy = 0.5(height - ypart(urcorner pic_.$ - llcorner pic_.$));
-     endfor
-   enddef;
-   
-   % Ensure a list of boxes and circles all to have the same width, height,
-   % and diameter.
-   % input <- suffixes for the boxes and circles
-   % output-> all boxes have .dx and .dy set so they have the same width,
-   %           height, and radius
-   % The boxes are squares and the circles are circular, not oval.
-   vardef sameWidthAndHeight(suffix f)(text t) =
-     save p_; pair p_;
-     p_ = maxWidthAndHeight(f)(t);
-     if (xpart(p_)+2defaultdx >= ypart(p_)+2defaultdy):
-       numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
-       numericSetHeight(xpart(p_)+2defaultdx)(f)(t);
-     else:
-       numericSetWidth(ypart(p_)+2defaultdy)(f)(t);
-       numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
-     fi
-   enddef;
- 
-   % Ensure a list of boxes and circles all to have the same width and
-   % the same height.  Unlike sameWidthAndHeight, the width and height
-   % can differ.
-   % input <- suffixes for the boxes and circles
-   % output-> all boxes have .dx and .dy set so they have the same width,
-   %           height, and radius
-   % The boxes are squares and the circles are circular, not oval.
-   vardef sameWidthSameHeight(suffix f)(text t) =
-     save p_; pair p_;
-     p_ = maxWidthAndHeight(f)(t);
-     numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
-     numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
-   enddef;

    % Create the boxes.
    boxit.b0(btex \textsl{science / math} etex);
--- 22,27 ----
Index: programs/Doof2d-Array-distributed-annotated.patch
===================================================================
RCS file: Doof2d-Array-distributed-annotated.patch
diff -N Doof2d-Array-distributed-annotated.patch
*** /tmp/cvsKKb5AR	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,184 ****
- *** Doof2d-Array-distributed.cpp	Wed Dec  5 14:04:36 2001
- --- Doof2d-Array-distributed-annotated.cpp	Wed Dec  5 14:07:56 2001
- ***************
- *** 1,3 ****
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Arrays.h"	// has Pooma's Array
-   
- --- 1,5 ----
- ! <programlisting id="tutorial-array_distributed-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Arrays.h"	// has Pooma's Array
-   
- ***************
- *** 14,18 ****
-     // (i,j).  The "C" template parameter permits use of this stencil
-     // operator with both Arrays and Fields.
- !   template <class C>
-     inline
-     typename C::Element_t
- --- 16,20 ----
-     // (i,j).  The "C" template parameter permits use of this stencil
-     // operator with both Arrays and Fields.
- !   template <class C>
-     inline
-     typename C::Element_t
- ***************
- *** 42,46 ****
-     // canot use standard input and output.  Instead we use command-line
-     // arguments, which are replicated, for input, and we use an Inform
- !   // stream for output.
-     Inform output;
-   
- --- 44,48 ----
-     // canot use standard input and output.  Instead we use command-line
-     // arguments, which are replicated, for input, and we use an Inform
- !   // stream for output.  <co id="tutorial-array_distributed-doof2d-io"></co>
-     Inform output;
-   
- ***************
- *** 48,52 ****
-     if (argc != 4) {
-       // Incorrect number of command-line arguments.
- !     output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
-       return EXIT_FAILURE;
-     }
- --- 50,54 ----
-     if (argc != 4) {
-       // Incorrect number of command-line arguments.
- !     output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
-       return EXIT_FAILURE;
-     }
- ***************
- *** 55,63 ****
-     // Determine the number of processors.
-     long nuProcessors;
- !   nuProcessors = strtol(argv[1], &tail, 0);
-   
-     // Determine the number of averagings.
-     long nuAveragings, nuIterations;
- !   nuAveragings = strtol(argv[2], &tail, 0);
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- --- 57,65 ----
-     // Determine the number of processors.
-     long nuProcessors;
- !   nuProcessors = strtol(argv[1], &tail, 0);
-   
-     // Determine the number of averagings.
-     long nuAveragings, nuIterations;
- !   nuAveragings = strtol(argv[2], &tail, 0);
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- ***************
- *** 65,69 ****
-     // the grid.
-     long n;
- !   n = strtol(argv[3], &tail, 0);
-     // The dimension must be a multiple of the number of processors
-     // since we are using a UniformGridLayout.
- --- 67,71 ----
-     // the grid.
-     long n;
- !   n = strtol(argv[3], &tail, 0);
-     // The dimension must be a multiple of the number of processors
-     // since we are using a UniformGridLayout.
- ***************
- *** 71,80 ****
-   
-     // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<2> interiorDomain(I,I);
-   
-     // Create the distributed arrays.
- --- 73,82 ----
-   
-     // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<2> interiorDomain(I,I);
-   
-     // Create the distributed arrays.
- ***************
- *** 83,98 ****
-     // dimension.  Guard layers optimize communication between patches.
-     // Internal guards surround each patch.  External guards surround
- !   // the entire array domain.
- !   UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! 				    GuardLayers<2>(1),  // internal
- ! 				    GuardLayers<2>(0)); // external
- !   UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-   
-     // The template parameters indicate 2 dimensions and a 'double'
-     // element type.  MultiPatch indicates multiple computation patches,
-     // i.e., distributed computation.  The UniformTag indicates the
- !   // patches should have the same size.  Each patch has Brick type.
- !   Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
- !   Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);
-   
-     // Set up the initial conditions.
- --- 85,100 ----
-     // dimension.  Guard layers optimize communication between patches.
-     // Internal guards surround each patch.  External guards surround
- !   // the entire array domain.  <co id="tutorial-array_distributed-doof2d-layout"></co>
- !   UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! 				    GuardLayers<2>(1),  // internal
- ! 				    GuardLayers<2>(0)); // external
- !   UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-   
-     // The template parameters indicate 2 dimensions and a 'double'
-     // element type.  MultiPatch indicates multiple computation patches,
-     // i.e., distributed computation.  The UniformTag indicates the
- !   // patches should have the same size.  Each patch has Brick type.  <co id="tutorial-array_distributed-doof2d-remote"></co>
- !   Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
- !   Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);
-   
-     // Set up the initial conditions.
- ***************
- *** 104,112 ****
-   
-     // Create the stencil performing the computation.
- !   Stencil<DoofNinePt> stencil;
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.
-       a(interiorDomain) = stencil(b, interiorDomain);
-   
- --- 106,114 ----
-   
-     // Create the stencil performing the computation.
- !   Stencil<DoofNinePt> stencil;
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.  <co id="tutorial-array_distributed-doof2d-first_write"></co>
-       a(interiorDomain) = stencil(b, interiorDomain);
-   
- ***************
- *** 117,121 ****
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The arrays are automatically deallocated.
- --- 119,123 ----
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The arrays are automatically deallocated.
- ***************
- *** 125,126 ****
- --- 127,129 ----
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Array-element-annotated.patch
===================================================================
RCS file: Doof2d-Array-element-annotated.patch
diff -N Doof2d-Array-element-annotated.patch
*** /tmp/cvslmAiwW	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,143 ****
- *** Doof2d-Array-element.cpp	Tue Dec  4 12:02:10 2001
- --- Doof2d-Array-element-annotated.cpp	Tue Dec  4 12:24:25 2001
- ***************
- *** 1,5 ****
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
- ! #include "Pooma/Arrays.h"	// has Pooma's Array
-   
-   // Doof2d: Pooma Arrays, element-wise implementation
- --- 1,6 ----
- ! <programlisting id="tutorial-array_elementwise-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
- ! #include "Pooma/Arrays.h"	// has Pooma's Array  <co id="tutorial-array_elementwise-doof2d-header"></co>
-   
-   // Doof2d: Pooma Arrays, element-wise implementation
- ***************
- *** 7,17 ****
-   int main(int argc, char *argv[])
-   {
- !   // Prepare the Pooma library for execution.
-     Pooma::initialize(argc,argv);
-     
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- --- 8,18 ----
-   int main(int argc, char *argv[])
-   {
- !   // Prepare the Pooma library for execution.  <co id="tutorial-array_elementwise-doof2d-pooma_initialize"></co>
-     Pooma::initialize(argc,argv);
-     
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- ***************
- *** 19,37 ****
-     // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
- !   // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
- !   // Create the arrays.
-     // The template parameters indicate 2 dimensions, a 'double' element
-     // type, and ordinary 'Brick' storage.
- !   Array<2, double, Brick> a(vertDomain);
- !   Array<2, double, Brick> b(vertDomain);
-   
-     // Set up the initial conditions.
- !   // All grid values should be zero except for the central value.
-     for (int j = 1; j < n-1; j++)
-       for (int i = 1; i < n-1; i++)
- --- 20,38 ----
-     // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
- !   // Specify the arrays' domains [0,n) x [0,n).  <co id="tutorial-array_elementwise-doof2d-domain"></co>
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
- !   // Create the arrays.  <co id="tutorial-array_elementwise-doof2d-array_creation"></co>
-     // The template parameters indicate 2 dimensions, a 'double' element
-     // type, and ordinary 'Brick' storage.
- !   Array<2, double, Brick> a(vertDomain);
- !   Array<2, double, Brick> b(vertDomain);
-   
-     // Set up the initial conditions.
- !   // All grid values should be zero except for the central value.  <co id="tutorial-array_elementwise-doof2d-initialization"></co>
-     for (int j = 1; j < n-1; j++)
-       for (int i = 1; i < n-1; i++)
- ***************
- *** 43,51 ****
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
-       // Read from b.  Write to a.
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
- !         a(i,j) = weight *
-             (b(i+1,j+1) + b(i+1,j  ) + b(i+1,j-1) +
-              b(i  ,j+1) + b(i  ,j  ) + b(i  ,j-1) +
- --- 44,52 ----
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
-       // Read from b.  Write to a.
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
- !         a(i,j) = weight *  <co id="tutorial-array_elementwise-doof2d-first_write"></co>
-             (b(i+1,j+1) + b(i+1,j  ) + b(i+1,j-1) +
-              b(i  ,j+1) + b(i  ,j  ) + b(i  ,j-1) +
- ***************
- *** 53,58 ****
-   
-       // Read from a.  Write to b.
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
-           b(i,j) = weight *
-             (a(i+1,j+1) + a(i+1,j  ) + a(i+1,j-1) +
- --- 54,59 ----
-   
-       // Read from a.  Write to b.
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
-           b(i,j) = weight *
-             (a(i+1,j+1) + a(i+1,j  ) + a(i+1,j-1) +
- ***************
- *** 62,71 ****
-   
-     // Print out the final central value.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
- !   // The arrays are automatically deallocated.
-   
- !   // Tell the Pooma library execution has finished.
-     Pooma::finalize();
-     return EXIT_SUCCESS;
-   }
- --- 63,74 ----
-   
-     // Print out the final central value.
- !   Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
- !   // The arrays are automatically deallocated.  <co id="tutorial-array_elementwise-doof2d-deallocation"></co>
-   
- !   // Tell the Pooma library execution has finished.  <co id="tutorial-array_elementwise-doof2d-pooma_finish"></co>
-     Pooma::finalize();
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Array-parallel-annotated.patch
===================================================================
RCS file: Doof2d-Array-parallel-annotated.patch
diff -N Doof2d-Array-parallel-annotated.patch
*** /tmp/cvsuReKr3	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,116 ****
- *** Doof2d-Array-parallel.cpp	Tue Dec  4 11:49:43 2001
- --- Doof2d-Array-parallel-annotated.cpp	Tue Dec  4 12:24:36 2001
- ***************
- *** 1,4 ****
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Arrays.h"	// has Pooma's Array
-   
- --- 1,5 ----
- ! <programlisting id="tutorial-array_parallel-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Arrays.h"	// has Pooma's Array
-   
- ***************
- *** 12,17 ****
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- --- 13,18 ----
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- ***************
- *** 19,43 ****
-     // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
-     // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
- !   // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<1> J(1,n-2);
-   
-     // Create the arrays.
-     // The template parameters indicate 2 dimensions, a 'double' element
-     // type, and ordinary 'Brick' storage.
- !   Array<2, double, Brick> a(vertDomain);
- !   Array<2, double, Brick> b(vertDomain);
-   
-     // Set up the initial conditions.
-     // All grid values should be zero except for the central value.
-     a = b = 0.0;
- !   // Ensure all data-parallel computation finishes before accessing a value.
-     Pooma::blockAndEvaluate();
-     b(n/2,n/2) = 1000.0;
- --- 20,44 ----
-     // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
-     // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
- !   // Set up interior domains [1,n-1) x [1,n-1) for computation.  <co id="tutorial-array_parallel-doof2d-innerdomain"></co>
- !   Interval<1> I(1,n-2);
- !   Interval<1> J(1,n-2);
-   
-     // Create the arrays.
-     // The template parameters indicate 2 dimensions, a 'double' element
-     // type, and ordinary 'Brick' storage.
- !   Array<2, double, Brick> a(vertDomain);
- !   Array<2, double, Brick> b(vertDomain);
-   
-     // Set up the initial conditions.
-     // All grid values should be zero except for the central value.
-     a = b = 0.0;
- !   // Ensure all data-parallel computation finishes before accessing a value.  <co id="tutorial-array_parallel-doof2d-blockAndEvaluate"></co>
-     Pooma::blockAndEvaluate();
-     b(n/2,n/2) = 1000.0;
- ***************
- *** 47,52 ****
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.
-       a(I,J) = weight *
-         (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
- --- 48,53 ----
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.  <co id="tutorial-array_parallel-doof2d-first_write"></co>
-       a(I,J) = weight *
-         (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
- ***************
- *** 63,67 ****
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The arrays are automatically deallocated.
- --- 64,68 ----
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The arrays are automatically deallocated.
- ***************
- *** 71,72 ****
- --- 72,74 ----
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Array-stencil-annotated.patch
===================================================================
RCS file: Doof2d-Array-stencil-annotated.patch
diff -N Doof2d-Array-stencil-annotated.patch
*** /tmp/cvsLwSPO9	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,152 ****
- *** Doof2d-Array-stencil.cpp	Tue Dec  4 11:49:39 2001
- --- Doof2d-Array-stencil-annotated.cpp	Tue Dec  4 12:26:46 2001
- ***************
- *** 1,9 ****
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Arrays.h"	// has Pooma's Array
-   
-   // Doof2d: Pooma Arrays, stencil implementation
-   
- ! // Define the stencil class performing the computation.
-   class DoofNinePt
-   {
- --- 1,10 ----
- ! <programlisting id="tutorial-array_stencil-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Arrays.h"	// has Pooma's Array
-   
-   // Doof2d: Pooma Arrays, stencil implementation
-   
- ! // Define the stencil class performing the computation.  <co id="tutorial-array_stencil-doof2d-stencil"></co>
-   class DoofNinePt
-   {
- ***************
- *** 14,19 ****
-     // This stencil operator is applied to each interior domain position
-     // (i,j).  The "C" template parameter permits use of this stencil
- !   // operator with both Arrays and Fields.
- !   template <class C>
-     inline
-     typename C::Element_t
- --- 15,20 ----
-     // This stencil operator is applied to each interior domain position
-     // (i,j).  The "C" template parameter permits use of this stencil
- !   // operator with both Arrays and Fields.  <co id="tutorial-array_stencil-doof2d-stencil_operator"></co>
- !   template <class C>
-     inline
-     typename C::Element_t
- ***************
- *** 26,30 ****
-     }
-   
- !   inline int lowerExtent(int) const { return 1; }
-     inline int upperExtent(int) const { return 1; }
-   
- --- 27,31 ----
-     }
-   
- !   inline int lowerExtent(int) const { return 1; }  <co id="tutorial-array_stencil-doof2d-stencil_extent"></co>
-     inline int upperExtent(int) const { return 1; }
-   
- ***************
- *** 42,47 ****
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- --- 43,48 ----
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- ***************
- *** 49,68 ****
-     // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
-     // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<2> interiorDomain(I,I);
-   
-     // Create the arrays.
-     // The template parameters indicate 2 dimensions, a 'double' element
-     // type, and ordinary 'Brick' storage.
- !   Array<2, double, Brick> a(vertDomain);
- !   Array<2, double, Brick> b(vertDomain);
-   
-     // Set up the initial conditions.
- --- 50,69 ----
-     // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
-     // Specify the arrays' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<2> interiorDomain(I,I);
-   
-     // Create the arrays.
-     // The template parameters indicate 2 dimensions, a 'double' element
-     // type, and ordinary 'Brick' storage.
- !   Array<2, double, Brick> a(vertDomain);
- !   Array<2, double, Brick> b(vertDomain);
-   
-     // Set up the initial conditions.
- ***************
- *** 73,82 ****
-     b(n/2,n/2) = 1000.0;
-   
- !   // Create the stencil performing the computation.
- !   Stencil<DoofNinePt> stencil;
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.
-       a(interiorDomain) = stencil(b, interiorDomain);
-   
- --- 74,83 ----
-     b(n/2,n/2) = 1000.0;
-   
- !   // Create the stencil performing the computation.  <co id="tutorial-array_stencil-doof2d-stencil_creation"></co>
- !   Stencil<DoofNinePt> stencil;
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.  <co id="tutorial-array_stencil-doof2d-first_write"></co>
-       a(interiorDomain) = stencil(b, interiorDomain);
-   
- ***************
- *** 87,91 ****
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The arrays are automatically deallocated.
- --- 88,92 ----
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The arrays are automatically deallocated.
- ***************
- *** 95,96 ****
- --- 96,98 ----
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-C-element-annotated.patch
===================================================================
RCS file: Doof2d-C-element-annotated.patch
diff -N Doof2d-C-element-annotated.patch
*** /tmp/cvs2hDHVf	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,150 ****
- *** Doof2d-C-element.cpp	Tue Nov 27 08:36:38 2001
- --- Doof2d-C-element-annotated.cpp	Tue Nov 27 12:08:03 2001
- ***************
- *** 1,4 ****
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   
-   // Doof2d: C-like, element-wise implementation
- --- 1,5 ----
- ! <programlisting id="tutorial-hand_coded-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   
-   // Doof2d: C-like, element-wise implementation
- ***************
- *** 6,30 ****
-   int main()
-   {
- !   // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- !   // Use two-dimensional grids of values.
-     double **a;
-     double **b;
-   
-     // Ask the user for the number n of elements along one dimension of
- !   // the grid.
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
- !   // Allocate the arrays.
-     typedef double* doublePtr;
-     a = new doublePtr[n];
-     b = new doublePtr[n];
- !   for (int i = 0; i < n; i++) {
-       a[i] = new double[n];
-       b[i] = new double[n];
- --- 7,31 ----
-   int main()
-   {
- !   // Ask the user for the number of averagings.  <co id="tutorial-hand_coded-doof2d-nuaveragings"></co>
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- !   // Use two-dimensional grids of values.  <co id="tutorial-hand_coded-doof2d-array_storage"></co>
-     double **a;
-     double **b;
-   
-     // Ask the user for the number n of elements along one dimension of
- !   // the grid.  <co id="tutorial-hand_coded-doof2d-grid_size"></co>
-     long n;
- !   std::cout << "Please enter the array size: ";
- !   std::cin >> n;
-   
- !   // Allocate the arrays.  <co id="tutorial-hand_coded-doof2d-allocation"></co>
-     typedef double* doublePtr;
-     a = new doublePtr[n];
-     b = new doublePtr[n];
- !   for (int i = 0; i < n; i++) {
-       a[i] = new double[n];
-       b[i] = new double[n];
- ***************
- *** 32,49 ****
-   
-     // Set up the initial conditions.
- !   // All grid values should be zero except for the central value.
- !   for (int j = 0; j < n; j++)
- !     for (int i = 0; i < n; i++)
-         a[i][j] = b[i][j] = 0.0;
-     b[n/2][n/2] = 1000.0;
-   
- !   // In the average, weight elements with this value.
-     const double weight = 1.0/9.0;
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
-           a[i][j] = weight *
-             (b[i+1][j+1] + b[i+1][j  ] + b[i+1][j-1] +
- --- 33,50 ----
-   
-     // Set up the initial conditions.
- !   // All grid values should be zero except for the central value.  <co id="tutorial-hand_coded-doof2d-initialization"></co>
- !   for (int j = 0; j < n; j++)
- !     for (int i = 0; i < n; i++)
-         a[i][j] = b[i][j] = 0.0;
-     b[n/2][n/2] = 1000.0;
-   
- !   // In the average, weight elements with this value.  <co id="tutorial-hand_coded-doof2d-constants"></co>
-     const double weight = 1.0/9.0;
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.  <co id="tutorial-hand_coded-doof2d-first_write"></co>
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
-           a[i][j] = weight *
-             (b[i+1][j+1] + b[i+1][j  ] + b[i+1][j-1] +
- ***************
- *** 51,57 ****
-              b[i-1][j+1] + b[i-1][j  ] + b[i-1][j-1]);
-   
- !     // Read from a.  Write to b.
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
-           b[i][j] = weight *
-             (a[i+1][j+1] + a[i+1][j  ] + a[i+1][j-1] +
- --- 52,58 ----
-              b[i-1][j+1] + b[i-1][j  ] + b[i-1][j-1]);
-   
- !     // Read from a.  Write to b.  <co id="tutorial-hand_coded-doof2d-second_write"></co>
- !     for (int j = 1; j < n-1; j++)
- !       for (int i = 1; i < n-1; i++)
-           b[i][j] = weight *
-             (a[i+1][j+1] + a[i+1][j  ] + a[i+1][j-1] +
- ***************
- *** 60,68 ****
-     }
-   
- !   // Print out the final central value.
- !   std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl;
-   
- !   // Deallocate the arrays.
- !   for (int i = 0; i < n; i++) {
-       delete [] a[i];
-       delete [] b[i];
- --- 61,69 ----
-     }
-   
- !   // Print out the final central value.  <co id="tutorial-hand_coded-doof2d-answer"></co>
- !   std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl;
-   
- !   // Deallocate the arrays.  <co id="tutorial-hand_coded-doof2d-deallocation"></co>
- !   for (int i = 0; i < n; i++) {
-       delete [] a[i];
-       delete [] b[i];
- ***************
- *** 73,74 ****
- --- 74,76 ----
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Field-distributed-annotated.patch
===================================================================
RCS file: Doof2d-Field-distributed-annotated.patch
diff -N Doof2d-Field-distributed-annotated.patch
*** /tmp/cvsF2z45n	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,176 ****
- *** Doof2d-Field-distributed.cpp	Wed Dec  5 14:05:10 2001
- --- Doof2d-Field-distributed-annotated.cpp	Wed Dec  5 14:41:24 2001
- ***************
- *** 1,3 ****
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Fields.h"	// has Pooma's Field
-   
- --- 1,4 ----
- ! <programlisting id="tutorial-field_distributed-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <stdlib.h>		// has EXIT_SUCCESS
-   #include "Pooma/Fields.h"	// has Pooma's Field
-   
- ***************
- *** 12,16 ****
-     // canot use standard input and output.  Instead we use command-line
-     // arguments, which are replicated, for input, and we use an Inform
- !   // stream for output.
-     Inform output;
-   
- --- 13,17 ----
-     // canot use standard input and output.  Instead we use command-line
-     // arguments, which are replicated, for input, and we use an Inform
- !   // stream for output.  <co id="tutorial-field_distributed-doof2d-io"></co>
-     Inform output;
-   
- ***************
- *** 18,22 ****
-     if (argc != 4) {
-       // Incorrect number of command-line arguments.
- !     output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
-       return EXIT_FAILURE;
-     }
- --- 19,23 ----
-     if (argc != 4) {
-       // Incorrect number of command-line arguments.
- !     output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
-       return EXIT_FAILURE;
-     }
- ***************
- *** 25,33 ****
-     // Determine the number of processors.
-     long nuProcessors;
- !   nuProcessors = strtol(argv[1], &tail, 0);
-   
-     // Determine the number of averagings.
-     long nuAveragings, nuIterations;
- !   nuAveragings = strtol(argv[2], &tail, 0);
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- --- 26,34 ----
-     // Determine the number of processors.
-     long nuProcessors;
- !   nuProcessors = strtol(argv[1], &tail, 0);
-   
-     // Determine the number of averagings.
-     long nuAveragings, nuIterations;
- !   nuAveragings = strtol(argv[2], &tail, 0);
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- ***************
- *** 35,39 ****
-     // the grid.
-     long n;
- !   n = strtol(argv[3], &tail, 0);
-     // The dimension must be a multiple of the number of processors
-     // since we are using a UniformGridLayout.
- --- 36,40 ----
-     // the grid.
-     long n;
- !   n = strtol(argv[3], &tail, 0);
-     // The dimension must be a multiple of the number of processors
-     // since we are using a UniformGridLayout.
- ***************
- *** 41,50 ****
-   
-     // Specify the fields' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<1> J(1,n-2);
-   
-     // Partition the fields' domains uniformly, i.e., each patch has the
- --- 42,51 ----
-   
-     // Specify the fields' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<1> J(1,n-2);
-   
-     // Partition the fields' domains uniformly, i.e., each patch has the
- ***************
- *** 52,74 ****
-     // dimension.  Guard layers optimize communication between patches.
-     // Internal guards surround each patch.  External guards surround
- !   // the entire field domain.
- !   UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! 				    GuardLayers<2>(1),  // internal
- ! 				    GuardLayers<2>(0)); // external
- !   UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-   
-     // Specify the fields' mesh, i.e., its spatial extent, and its
- !   // centering type.
- !   UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
- !   Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);
-   
-     // The template parameters indicate a mesh and a 'double'
-     // element type.  MultiPatch indicates multiple computation patches,
-     // i.e., distributed computation.  The UniformTag indicates the
- !   // patches should have the same size.  Each patch has Brick type.
- !   Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag,
- !     Remote<Brick> > > a(cell, layout, mesh);
- !   Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag,
- !     Remote<Brick> > > b(cell, layout, mesh);
-   
-     // Set up the initial conditions.
- --- 53,75 ----
-     // dimension.  Guard layers optimize communication between patches.
-     // Internal guards surround each patch.  External guards surround
- !   // the entire field domain.  <co id="tutorial-field_distributed-doof2d-layout"></co>
- !   UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! 				    GuardLayers<2>(1),  // internal
- ! 				    GuardLayers<2>(0)); // external
- !   UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-   
-     // Specify the fields' mesh, i.e., its spatial extent, and its
- !   // centering type.  <co id="tutorial-field_distributed-doof2d-mesh"></co>
- !   UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
- !   Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);
-   
-     // The template parameters indicate a mesh and a 'double'
-     // element type.  MultiPatch indicates multiple computation patches,
-     // i.e., distributed computation.  The UniformTag indicates the
- !   // patches should have the same size.  Each patch has Brick type.  <co id="tutorial-field_distributed-doof2d-remote"></co>
- !   Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag,
- !     Remote<Brick> > > a(cell, layout, mesh);
- !   Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag,
- !     Remote<Brick> > > b(cell, layout, mesh);
-   
-     // Set up the initial conditions.
- ***************
- *** 83,87 ****
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
-       // Read from b.  Write to a.
-       a(I,J) = weight *
- --- 84,88 ----
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
-       // Read from b.  Write to a.
-       a(I,J) = weight *
- ***************
- *** 99,103 ****
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The fields are automatically deallocated.
- --- 100,104 ----
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The fields are automatically deallocated.
- ***************
- *** 107,108 ****
- --- 108,110 ----
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Field-parallel-annotated.patch
===================================================================
RCS file: Doof2d-Field-parallel-annotated.patch
diff -N Doof2d-Field-parallel-annotated.patch
*** /tmp/cvswOFpSv	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,120 ****
- *** Doof2d-Field-parallel.cpp	Tue Dec  4 10:01:28 2001
- --- Doof2d-Field-parallel-annotated.cpp	Tue Dec  4 11:04:26 2001
- ***************
- *** 1,5 ****
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
- ! #include "Pooma/Fields.h"	// has Pooma's Field
-   
-   // Doof2d: Pooma Fields, data-parallel implementation
- --- 1,6 ----
- ! <programlisting id="tutorial-field_parallel-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream>		// has std::cout, ...
- ! #include <stdlib.h>		// has EXIT_SUCCESS
- ! #include "Pooma/Fields.h"	// has Pooma's Field  <co id="tutorial-field_parallel-doof2d-header"></co>
-   
-   // Doof2d: Pooma Fields, data-parallel implementation
- ***************
- *** 12,17 ****
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- --- 13,18 ----
-     // Ask the user for the number of averagings.
-     long nuAveragings, nuIterations;
- !   std::cout << "Please enter the number of averagings: ";
- !   std::cin >> nuAveragings;
-     nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-   
- ***************
- *** 19,44 ****
-     // the grid.
-     long n;
- !   std::cout << "Please enter the field size: ";
- !   std::cin >> n;
-   
-     // Specify the fields' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<1> J(1,n-2);
-   
-     // Specify the fields' mesh, i.e., its spatial extent, and its
- !   // centering type.
- !   DomainLayout<2> layout(vertDomain);
- !   UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
- !   Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);
-   
-     // Create the fields.
-     // The template parameters indicate a mesh, a 'double' element
- !   // type, and ordinary 'Brick' storage.
- !   Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh);
- !   Field<UniformRectilinearMesh<2>, double, Brick> b(cell, layout, mesh);
-   
-     // Set up the initial conditions.
- --- 20,45 ----
-     // the grid.
-     long n;
- !   std::cout << "Please enter the field size: ";
- !   std::cin >> n;
-   
-     // Specify the fields' domains [0,n) x [0,n).
- !   Interval<1> N(0, n-1);
- !   Interval<2> vertDomain(N, N);
-   
-     // Set up interior domains [1,n-1) x [1,n-1) for computation.
- !   Interval<1> I(1,n-2);
- !   Interval<1> J(1,n-2);
-   
-     // Specify the fields' mesh, i.e., its spatial extent, and its
- !   // centering type.  <co id="tutorial-field_parallel-doof2d-mesh"></co>
- !   DomainLayout<2> layout(vertDomain);
- !   UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
- !   Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);
-   
-     // Create the fields.
-     // The template parameters indicate a mesh, a 'double' element
- !   // type, and ordinary 'Brick' storage.  <co id="tutorial-field_parallel-doof2d-field_creation"></co>
- !   Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh);
- !   Field<UniformRectilinearMesh<2>, double, Brick> b(cell, layout, mesh);
-   
-     // Set up the initial conditions.
- ***************
- *** 51,56 ****
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.
-       a(I,J) = weight *
-         (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
- --- 52,57 ----
-   
-     // Perform the simulation.
- !   for (int k = 0; k < nuIterations; ++k) {
- !     // Read from b.  Write to a.  <co id="tutorial-field_parallel-doof2d-first_write"></co>
-       a(I,J) = weight *
-         (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
- ***************
- *** 67,71 ****
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The fields are automatically deallocated.
- --- 68,72 ----
-     // Print out the final central value.
-     Pooma::blockAndEvaluate();	// Ensure all computation has finished.
- !   std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-   
-     // The fields are automatically deallocated.
- ***************
- *** 75,76 ****
- --- 76,78 ----
-     return EXIT_SUCCESS;
-   }
- + </programlisting>
--- 0 ----
Index: programs/makefile
===================================================================
RCS file: makefile
diff -N makefile
*** /tmp/cvsfaiLlD	Fri Jan  4 10:14:11 2002
--- /dev/null	Fri Mar 23 21:37:44 2001
***************
*** 1,12 ****
- ### Oldham, Jeffrey D.
- ### 2001Nov27
- ### Pooma
- ###
- ### Produce Annotated Source Code
- 
- all: Doof2d-C-element-annotated.cpp Doof2d-Array-element-annotated.cpp \
-      Doof2d-Array-parallel-annotated.cpp Doof2d-Array-stencil-annotated.cpp \
-      Doof2d-Array-distributed-annotated.cpp
- 
- %-annotated.cpp: %-annotated.patch %.cpp
- 	patch -o $@ < $<
--- 0 ----
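
Note on the removed listings: every Doof2d variant above uses the same
control flow.  It sets nuIterations = (nuAveragings+1)/2, performs two
averagings per iteration (read b and write a, then read a and write b),
and prints a's central value for an odd nuAveragings and b's otherwise.
The following is a minimal stand-alone sketch of that logic, modeled on
the C-like element-wise version.  It is not one of the moved files.  The
names Grid, averageInterior, and centralValue are hypothetical, std::vector
replaces the hand-allocated double** storage, and the middle row of the
nine-point sum is an assumption because the hunks above elide it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: an n x n grid of doubles.
using Grid = std::vector<std::vector<double>>;

// Perform one nine-point averaging over the interior [1,n-1) x [1,n-1):
// read from src, write to dst.  Boundary values of dst are left untouched.
void averageInterior(const Grid& src, Grid& dst)
{
  const long n = static_cast<long>(src.size());
  const double weight = 1.0/9.0;  // In the average, weight elements with this value.
  for (long j = 1; j < n-1; j++)
    for (long i = 1; i < n-1; i++)
      dst[i][j] = weight *
        (src[i+1][j+1] + src[i+1][j  ] + src[i+1][j-1] +
         src[i  ][j+1] + src[i  ][j  ] + src[i  ][j-1] +  // assumed middle row
         src[i-1][j+1] + src[i-1][j  ] + src[i-1][j-1]);
}

// Return the central grid value after nuAveragings averagings of an
// n x n grid that is zero everywhere except for a central value of 1000.
double centralValue(long n, long nuAveragings)
{
  assert(n >= 3 && nuAveragings >= 0);

  // All grid values should be zero except for the central value.
  Grid a(n, std::vector<double>(n, 0.0));
  Grid b(n, std::vector<double>(n, 0.0));
  b[n/2][n/2] = 1000.0;

  // Each iteration performs two averagings.
  const long nuIterations = (nuAveragings+1)/2;
  for (long k = 0; k < nuIterations; ++k) {
    averageInterior(b, a);  // Read from b.  Write to a.
    averageInterior(a, b);  // Read from a.  Write to b.
  }

  // An odd count ends in a; an even count ends in b.  For an odd count
  // the loop's last write to b is one averaging too many and is ignored.
  return nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2];
}
```

For example, centralValue(5, 1) averages the single 1000.0 entry over nine
cells, and centralValue(5, 0) returns the initial value unchanged because
no iterations run.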