Patch: Recent Manual Changes
Jeffrey Oldham
oldham at codesourcery.com
Fri Jan 4 10:44:00 UTC 2002
This patch mainly adds (mostly finished) chapters on understanding and
using data-parallel operators and templates to the R2 manual that is
being written.
2002-Jan-04 Jeffrey D. Oldham <oldham at codesourcery.com>
* bibliography.xml: New file containing bibliographic information.
* concepts.xml: Clarify containers that map indices to values.
* glossary.xml: Add entries for compilation time, compile time,
conformable containers, conformable domains, execution time,
instantiation, programming time, run time, template instantiation,
trait, traits class, Turing complete.
* introduction.xml: Many minor changes mainly involving formatting
and word choice. Add sections discussing program execution speed
and open-source software.
* manual.xml: Add several new entity definitions. Add unfinished
chapter discussing writing programs using templates. Add unfinished
data-parallel operator chapter. Many other minor changes. Move
bibliography to separate file.
* tutorial.xml: Minor wordsmithing changes.
* figures/box-macros.mp: New file containing macros to create
boxes in illustrations.
* figures/data-parallel.mp: New file illustrating data-parallel
operations.
* figures/doof2d.mp: Replace definitions with inclusion of
grid-macros.mp.
* figures/grid-macros.mp: New file containing macros to create
grids.
* figures/introduction.mp: Use box-macros.mp.
* programs/Doof2d-Array-distributed-annotated.patch: Moved to
different directory.
* programs/Doof2d-Array-element-annotated.patch: Likewise.
* programs/Doof2d-Array-parallel-annotated.patch: Likewise.
* programs/Doof2d-Array-stencil-annotated.patch: Likewise.
* programs/Doof2d-C-element-annotated.patch: Likewise.
* programs/Doof2d-Field-distributed-annotated.patch: Likewise.
* programs/Doof2d-Field-parallel-annotated.patch: Likewise.
* programs/makefile: Likewise.
Applied to mainline.
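
As a rough illustration of two of the new glossary terms (template
instantiation and traits class), here is a minimal C++ sketch.  It is
not taken from the manual or from the POOMA sources: foo mirrors the
example declaration in the glossary entry, while ElementTraits, sum,
and check_glossary_sketch are hypothetical names made up for this note.

```cpp
#include <cassert>

// A class template.  Writing foo<double,3> instantiates it: the
// template is "applied" to the type double and the constant 3, much
// as a function is applied to its arguments, and the result is a type.
template <typename T, int n>
class foo {
public:
  typedef T element_type;          // the element type, recorded for later use
  static constexpr int size = n;   // the element count, known at compile time
  T data[n];
};

// A traits class (hypothetical): it collects characteristics of a type.
// The primary template gives the default answer ...
template <typename T>
struct ElementTraits {
  static constexpr bool is_floating = false;
};

// ... and a specialization overrides it for one particular type.
template <>
struct ElementTraits<double> {
  static constexpr bool is_floating = true;
};

// A function template that works for every instantiation of foo.
template <typename T, int n>
T sum(const foo<T, n>& f) {
  T total = T();
  for (int i = 0; i < n; ++i)
    total += f.data[i];
  return total;
}

// Exercises the definitions above; the static_asserts are evaluated
// at compile time, the assert at run time.
inline void check_glossary_sketch() {
  foo<double, 3> f = {{1.0, 2.0, 3.5}};
  assert(sum(f) == 6.5);
  static_assert(foo<double, 3>::size == 3, "size comes from the template argument");
  static_assert(ElementTraits<foo<double, 3>::element_type>::is_floating,
                "double is described as a floating-point type");
  static_assert(!ElementTraits<int>::is_floating, "int uses the default trait");
}
```

The traits lookups are resolved entirely at compile time, so a program
that branches on them pays nothing at run time.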
Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com
-------------- next part --------------
Index: bibliography.xml
===================================================================
RCS file: bibliography.xml
diff -N bibliography.xml
*** /dev/null Fri Mar 23 21:37:44 2001
--- bibliography.xml Fri Jan 4 10:14:05 2002
***************
*** 0 ****
--- 1,277 ----
+ <!-- Bibliography -->
+
+ <bibliography id="bibliography">
+ <title>Bibliography</title>
+
+ <para>FIXME: How do I process these entries?</para>
+
+ <biblioentry>
+ <abbrev>mpi99</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>William</firstname><surname>Gropp</surname>
+ </author>
+ <author>
+ <firstname>Ewing</firstname><surname>Lusk</surname>
+ </author>
+ <author>
+ <firstname>Anthony</firstname><surname>Skjellum</surname>
+ </author>
+ </authorgroup>
+ <copyright>
+ <year>1999</year>
+ <holder>Massachusetts Institute of Technology</holder>
+ </copyright>
+ <isbn>0-262-57132-3</isbn>
+ <publisher>
+ <publishername>The MIT Press</publishername>
+ <address>Cambridge, MA</address>
+ </publisher>
+ <title>Using MPI</title>
+ <subtitle>Portable Parallel Programming with the Message-Passing Interface</subtitle>
+ <edition>second edition</edition>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>pooma95</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
+ <affiliation>
+ <orgname>Dakota Software Systems, Inc.</orgname>
+ <address><city>Rapid City</city><state>SD</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Susan</firstname><othername role="mi">R.</othername><surname>Atlas</surname>
+ <affiliation>
+ <orgname>Parallel Solutions, Inc.</orgname>
+ <address><city>Santa Fe</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Subhankar</firstname><surname>Banerjee</surname>
+ <affiliation>
+ <orgname>New Mexico State University</orgname>
+ <address><city>Las Cruces</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
+ <affiliation>
+ <orgname>University of Illinois at Urbana-Champaign</orgname>
+ <address><city>Urbana-Champaign</city><state>IL</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
+ <affiliation>
+ <orgname>California Institute of Technology</orgname>
+ <address><city>Pasadena</city><state>CA</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Katarzyna</firstname><surname>Keahey</surname>
+ <affiliation>
+ <orgname>Indiana University</orgname>
+ <address><city>Bloomington</city><state>IN</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Marydell</firstname><surname>Tholburn</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ </authorgroup>
+ <title>&pooma;</title>
+ <subtitle>A Framework for Scientific Simulation on Parallel Architectures</subtitle>
+ <releaseinfo>unpublished</releaseinfo>
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>pooma-sc95</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>Susan</firstname><surname>Atlas</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Subhankar</firstname><surname>Banerjee</surname>
+ <affiliation>
+ <orgname>New Mexico State University</orgname>
+ <address><city>Las Cruces</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
+ <affiliation>
+ <orgname>Advanced Computing Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>M.</firstname><surname>Srikant</surname>
+ <affiliation>
+ <orgname>New Mexico State University</orgname>
+ <address><city>Las Cruces</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Marydell</firstname><surname>Tholburn</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ </authorgroup>
+ <title>&pooma;</title>
+ <subtitle>A High Performance Distributed Simulation Environment for
+ Scientific Applications</subtitle>
+ <!-- FIXME: Where list Supercomputing 1995? -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>pooma-siam98</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>James</firstname><othername role="mi">A.</othername><surname>Crotinger</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Scott</firstname><othername role="mi">W.</othername><surname>Haney</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Stephen</firstname><othername role="mi">A.</othername><surname>Smith</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ <author>
+ <firstname>Timothy</firstname><othername role="mi">J.</othername><surname>Williams</surname>
+ <affiliation>
+ <orgname>Los Alamos National Laboratory</orgname>
+ <address><city>Los Alamos</city><state>NM</state></address>
+ </affiliation>
+ </author>
+ </authorgroup>
+ <title>Rapid Application Development and Enhanced Code
+ Interoperability using the &pooma; Framework</title>
+ <!-- FIXME: Where list SIAM Workshop ... 1998? -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>pete-99</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>Scott</firstname><surname>Haney</surname>
+ </author>
+ <author>
+ <firstname>James</firstname><surname>Crotinger</surname>
+ </author>
+ <author>
+ <firstname>Steve</firstname><surname>Karmesin</surname>
+ </author>
+ <author>
+ <firstname>Stephen</firstname><surname>Smith</surname>
+ </author>
+ </authorgroup>
+ <title>&pete;: The Portable Expression Template Engine. 1999 October,
+ \emph{Dr. Dobb's Journal}, vol.24, nu.10, pp.88--95</title>
+ <!-- FIXME: Fix the tagging. -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>veldhuizen-95</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>Todd</firstname><surname>Veldhuizen</surname>
+ </author>
+ </authorgroup>
+ <title>Expression Templates. 1995 June, \emph{&cc; Report}, vol.7,
+ nu.5, pp.26--31. Also available at http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html</title>
+ <!-- FIXME: Fix the tagging. -->
+ </biblioentry>
+
+ <biblioentry>
+ <abbrev>vandevoorde-95</abbrev>
+ <authorgroup>
+ <author>
+ <firstname>David</firstname><surname>Vandevoorde</surname>
+ </author>
+ </authorgroup>
+ <title>\texttt{valarray&lt;Troy&gt;}: An Implementation of a Numerical
+ Array. 1995. unpublished. Available at ftp://ftp.cs.rpi.edu/pub/vandevod/Valarray/Documents/valarray.ps.</title>
+ <!-- FIXME: Fix the tagging. -->
+ </biblioentry>
+
+
+ </bibliography>
Index: concepts.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v
retrieving revision 1.3
diff -c -p -r1.3 concepts.xml
*** concepts.xml 2001/12/17 17:27:41 1.3
--- concepts.xml 2002/01/04 17:14:05
***************
*** 343,349 ****
<imagedata fileref="figures/concepts.101" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>maps from indices to values</phrase>
</textobject>
</mediaobject>
</figure>
--- 343,349 ----
<imagedata fileref="figures/concepts.101" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>&array;s and &field;s map from indices to values.</phrase>
</textobject>
</mediaobject>
</figure>
Index: glossary.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v
retrieving revision 1.4
diff -c -p -r1.4 glossary.xml
*** glossary.xml 2001/12/17 17:27:41 1.4
--- glossary.xml 2002/01/04 17:14:06
***************
*** 91,96 ****
--- 91,112 ----
</glossdef>
</glossentry>
+ <glossentry id="glossary-compilation_time">
+ <glossterm>compilation time</glossterm>
+ <glosssee otherterm="glossary-compile_time"></glosssee>
+ </glossentry>
+
+ <glossentry id="glossary-compile_time">
+ <glossterm>compile time</glossterm>
+ <glossdef>
+ <para>the stage, between writing a program and executing it, at
+ which a compiler translates the program. This is also called
+ <firstterm>compilation time</firstterm>.</para>
+ <glossseealso otherterm="glossary-programming_time">programming time</glossseealso>
+ <glossseealso otherterm="glossary-run_time">run time</glossseealso>
+ </glossdef>
+ </glossentry>
+
<glossentry id="glossary-computing_environment">
<glossterm>computing environment</glossterm>
<glossdef>
***************
*** 102,107 ****
--- 118,145 ----
</glossdef>
</glossentry>
+ <glossentry id="glossary-conformable_containers">
+ <glossterm>conformable containers</glossterm>
+ <glossdef>
+ <para>containers with conformable domains.</para>
+ <glossseealso otherterm="glossary-conformable_domains">conformable domains</glossseealso>
+ <glossseealso otherterm="glossary-data_parallel">data parallel</glossseealso>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="glossary-conformable_domains">
+ <glossterm>conformable domains</glossterm>
+ <glossdef>
+ <para>domains with the <quote>same shape</quote> so that
+ corresponding dimensions have the same number of elements.
+ Scalars, deemed conformable with any domain, get
+ <quote>expanded</quote> to the domain's shape. Binary operators
+ can operate on containers with conformable domains.</para>
+ <glossseealso otherterm="glossary-conformable_containers">conformable containers</glossseealso>
+ <glossseealso otherterm="glossary-data_parallel">data parallel</glossseealso>
+ </glossdef>
+ </glossentry>
+
<glossentry id="glossary-container">
<glossterm>container</glossterm>
<glossdef>
***************
*** 240,245 ****
--- 278,288 ----
</glossdef>
</glossentry>
+ <glossentry id="glossary-execution_time">
+ <glossterm>execution time</glossterm>
+ <glosssee otherterm="glossary-run_time"></glosssee>
+ </glossentry>
+
<glossentry id="glossary-external_guard_layer">
<glossterm>external guard layer</glossterm>
<glossdef>
***************
*** 297,311 ****
<glossdef>
<para>domain surrounding each patch of a container's domain. It
contains read-only values. <link
! linkend="glossary-external_guard_layer">External guard
layer</link>s ease programming, while <link
! linkend="glossary-internal_guard_layer">internal guard
layer</link>s permit each patch's computation to be occur without
! copying values from adjacent patches. They are optimizations,
! not required for program correctness.</para>
! <glossseealso otherterm="glossary-external_guard_layer">external guard layer</glossseealso>
! <glossseealso otherterm="glossary-internal_guard_layer">internal guard layer</glossseealso>
! <glossseealso otherterm="glossary-partition">partition</glossseealso>
<glossseealso otherterm="glossary-patch">patch</glossseealso>
<glossseealso otherterm="glossary-domain">domain</glossseealso>
</glossdef>
--- 340,356 ----
<glossdef>
<para>domain surrounding each patch of a container's domain. It
contains read-only values. <link
! linkend="glossary-external_guard_layer">External guard
layer</link>s ease programming, while <link
! linkend="glossary-internal_guard_layer">internal guard
layer</link>s permit each patch's computation to be occur without
! copying values from adjacent patches. They are optimizations, not
! required for program correctness.</para> <glossseealso
! otherterm="glossary-external_guard_layer">external guard
! layer</glossseealso> <glossseealso
! otherterm="glossary-internal_guard_layer">internal guard
! layer</glossseealso> <glossseealso
! otherterm="glossary-partition">partition</glossseealso>
<glossseealso otherterm="glossary-patch">patch</glossseealso>
<glossseealso otherterm="glossary-domain">domain</glossseealso>
</glossdef>
***************
*** 319,331 ****
<glossterm>index</glossterm>
<glossdef>
<para>a position in a <link
! linkend="glossary-domain">domain</link> usually denoted by an
ordered tuple. More than one index are called <link
! linkend="glossary-indices">indices</link>.</para>
! <glossseealso otherterm="glossary-domain">domain</glossseealso>
</glossdef>
</glossentry>
<glossentry id="glossary-indices">
<glossterm>indices</glossterm>
<glossdef>
--- 364,381 ----
<glossterm>index</glossterm>
<glossdef>
<para>a position in a <link
! linkend="glossary-domain">domain</link> usually denoted by an
ordered tuple. More than one index are called <link
! linkend="glossary-indices">indices</link>.</para> <glossseealso
! otherterm="glossary-domain">domain</glossseealso>
</glossdef>
</glossentry>
+ <glossentry id="glossary-instantiation">
+ <glossterm>instantiation</glossterm>
+ <glosssee>template instantiation</glosssee>
+ </glossentry>
+
<glossentry id="glossary-indices">
<glossterm>indices</glossterm>
<glossdef>
***************
*** 439,444 ****
--- 489,504 ----
<glossseealso otherterm="glossary-index">index</glossseealso>
</glossdef>
</glossentry>
+
+ <glossentry id="glossary-programming_time">
+ <glossterm>programming time</glossterm>
+ <glossdef>
+ <para>the stage, before compilation and execution, at which a
+ programmer writes the program.</para>
+ <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
+ <glossseealso otherterm="glossary-run_time">run time</glossseealso>
+ </glossdef>
+ </glossentry>
</glossdiv>
<glossdiv id="glossary-r">
***************
*** 480,485 ****
--- 540,556 ----
<glossseealso otherterm="glossary-stencil">stencil</glossseealso>
</glossdef>
</glossentry>
+
+ <glossentry id="glossary-run_time">
+ <glossterm>run time</glossterm>
+ <glossdef>
+ <para>the stage, after writing and compiling a program, at which
+ the program is executed. This is also called
+ <firstterm>execution time</firstterm>.</para>
+ <glossseealso otherterm="glossary-compile_time">compile time</glossseealso>
+ <glossseealso otherterm="glossary-programming_time">programming time</glossseealso>
+ </glossdef>
+ </glossentry>
</glossdiv>
<glossdiv id="glossary-s">
***************
*** 541,546 ****
--- 612,629 ----
<glossdiv id="glossary-t">
<title>T</title>
+ <glossentry id="glossary-template_instantiation">
+ <glossterm>template instantiation</glossterm>
+ <glossdef>
+ <para>applying a template class to template parameters to create a
+ type. For example, <statement>foo&lt;double,3&gt;</statement>
+ instantiates <statement>template &lt;typename T, int n&gt; class
+ foo</statement> with the type &double; and the constant
+ integer 3. Template instantiation is analogous to applying a
+ function to function arguments.</para>
+ </glossdef>
+ </glossentry>
+
<glossentry id="glossary-tensor">
<glossterm>&tensor;</glossterm>
<glossdef>
***************
*** 558,563 ****
--- 641,673 ----
mathematical matrices as first-class objects.</para>
<glossseealso otherterm="glossary-tensor">&tensor;</glossseealso>
<glossseealso otherterm="glossary-vector">&vector;</glossseealso>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="glossary-trait">
+ <glossterm>trait</glossterm>
+ <glossdef>
+ <para>a characteristic of a type.</para>
+ <glossseealso otherterm="glossary-traits_class">traits class</glossseealso>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="glossary-traits_class">
+ <glossterm>traits class</glossterm>
+ <glossdef>
+ <para>a class containing one or more traits all describing a
+ particular type's characteristics.</para>
+ <glossseealso otherterm="glossary-trait">trait</glossseealso>
+ </glossdef>
+ </glossentry>
+
+ <glossentry id="glossary-Turing_complete">
+ <glossterm>Turing complete</glossterm>
+ <glossdef>
+ <para>describes a language that can compute anything that can be
+ computed. That is, the language for computation is as powerful as
+ it can be. Most widespread programming languages are
+ Turing complete, including &cc;, &c;, and &fortran;.</para>
</glossdef>
</glossentry>
</glossdiv>
Index: introduction.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/introduction.xml,v
retrieving revision 1.1
diff -c -p -r1.1 introduction.xml
*** introduction.xml 2001/12/17 17:27:41 1.1
--- introduction.xml 2002/01/04 17:14:06
***************
*** 2,21 ****
<title>Introduction</title>
<para>The Parallel Object-Oriented Methods and Applications
! <acronym>POOMA</acronym> &toolkitcap; is a &cc; &toolkit; for
! writing high-performance scientific programs for sequential and
! distributed computation. The &toolkit; provides a variety of
! tools:
<itemizedlist spacing="compact">
<listitem>
<para>containers and other abstractions suitable for scientific
computation,</para>
</listitem>
<listitem>
- <para>several container storage classes to reduce a program's
- storage requirements,</para>
- </listitem>
- <listitem>
<para>support for a variety of computation modes including
data-parallel expressions, stencil-based computations, and lazy
evaluation,</para>
--- 2,16 ----
<title>Introduction</title>
<para>The Parallel Object-Oriented Methods and Applications
! (<acronym>POOMA</acronym>) &toolkitcap; is a &cc; &toolkit; for
! writing high-performance scientific programs. The &toolkit; provides
! a variety of tools:
<itemizedlist spacing="compact">
<listitem>
<para>containers and other abstractions suitable for scientific
computation,</para>
</listitem>
<listitem>
<para>support for a variety of computation modes including
data-parallel expressions, stencil-based computations, and lazy
evaluation,</para>
***************
*** 25,31 ****
</listitem>
<listitem>
<para>automatic creation of all interprocessor communication for
! parallel and distributed programs, and</para>
</listitem>
<listitem>
<para>automatic out-of-order execution and loop rearrangement
--- 20,30 ----
</listitem>
<listitem>
<para>automatic creation of all interprocessor communication for
! parallel and distributed programs,</para>
! </listitem>
! <listitem>
! <para>several container storage classes to reduce a program's
! storage requirements, and</para>
</listitem>
<listitem>
<para>automatic out-of-order execution and loop rearrangement
***************
*** 34,53 ****
</itemizedlist>
Since the &toolkit; provides high-level abstractions, &pooma;
programs are much shorter than corresponding &fortran; or &c;
! programs, requiring less time to write and less time to debug.
! Using these high-level abstractions, the same code runs on a wide
! variety of computers almost as fast as carefully crafted
! machine-specific hand-written programs. The &toolkit; is freely
! available, open-source software compatible with any modern &cc;
! compiler.</para>
! <formalpara><title>&pooma; Goals.</title>
<para>The goals for the &poomatoolkit; have remained unchanged
! since its inception in 1994:
<orderedlist>
<listitem>
<para>Code portability across serial, distributed, and parallel
! architectures with no change to source code.</para>
</listitem>
<listitem>
<para>Development of reusable, cross-problem-domain components
--- 33,55 ----
</itemizedlist>
Since the &toolkit; provides high-level abstractions, &pooma;
programs are much shorter than corresponding &fortran; or &c;
! programs and require less time to write and less time to debug.
Using these high-level abstractions, the same code runs on
sequential, parallel, and distributed computers. It runs almost as
! fast as carefully crafted machine-specific hand-written programs.
! The &toolkit; is freely available, open-source software compatible
! with any modern &cc; compiler.</para>
!
! <section id="introduction-goals">
! <title>&pooma; Goals</title>
!
<para>The goals for the &poomatoolkit; have remained unchanged
! since its conception in 1994:
<orderedlist>
<listitem>
<para>Code portability across serial, distributed, and parallel
! architectures without any change to the source code.</para>
</listitem>
<listitem>
<para>Development of reusable, cross-problem-domain components
***************
*** 58,66 ****
scientific simulation.</para>
</listitem>
<listitem>
! <para>[&toolkitcap;] design and development driven by
! applications from a diverse set of scientific problem
! domains.</para>
</listitem>
<listitem>
<para>Shorter time from problem inception to working parallel
--- 60,67 ----
scientific simulation.</para>
</listitem>
<listitem>
! <para>&toolkitcap; design and development driven by applications
! from a diverse set of scientific problem domains.</para>
</listitem>
<listitem>
<para>Shorter time from problem inception to working parallel
***************
*** 68,296 ****
<!-- FIXME: Add citation to pooma95, p. 3 -->
</listitem>
</orderedlist>
! </para>
! </formalpara>
- <formalpara><title>Code Portability for Sequential and Distributed Programs.</title>
- <para>&pooma; programs run on sequential, distributed, and parallel
- computers with no change in source code. The programmer writes two
- or three lines specifying how each container's domain should be
- distributed among available processors. Using these directives and
- run-time information about the computer's configuration, the
- &toolkit; automatically distributes pieces of the container
- domains, called <firstterm>patch</firstterm>es, among the available
- processors. If a computation needs values from another patch,
- &pooma; automatically passes the value to the place it is needed.
- The same program, and even the same executable, works regardless of
- the number of the available processors and the size of the
- containers' domains. A programmer interested in only sequential
- execution can omit the two or three lines specifying how the
- domains are to be distributed.</para>
- </formalpara>
-
- <figure float="1" id="introduction-science_algorithms">
- <title>Science, Algorithms, Engineering, and &pooma;</title>
- <mediaobject>
- <imageobject>
- <imagedata fileref="figures/introduction.101" format="EPS" align="center"></imagedata>
- </imageobject>
- <textobject>
- <phrase>how &pooma; helps translate algorithms into programs</phrase>
- </textobject>
- <caption>
- <para>In the translation from theoretical science and math to
- computational science and math to computer programs, &pooma;
- containers eases the translation of algorithms to computer
- programs.</para>
- </caption>
- </mediaobject>
- </figure>
-
- <formalpara><title>Rapid Application Development.</title>
- <para>The &poomatoolkit; is designed to enable rapid development of
- scientific and distributed applications. For example, its vector,
- matrix, and tensor classes model the corresponding mathematical
- concepts. Its &array; and &field; classes model the discrete
- spaces and mathematical arrays frequently found in computational
- science and math. See <xref
- linkend="introduction-science_algorithms"></xref>. The left column
- illustrates theoretical science and math, the middle column
- computational science and math, and the right column computer
- science implementations. For example, theoretical physics
- frequently uses continuous fields in three-dimension space, while
- algorithms for the corresponding computational physics problem
- usually uses discrete fields. &pooma; containers, classes, and
- functions ease the engineering to map these algorithms to computer
- programs. For example, the &pooma; &field; container models
- discrete fields; both map locations in discrete space to values and
- permit computations of spatial distances and values. The &pooma;
- &array; container models the mathematical concept of an array, used
- in numerical analysis.</para>
- </formalpara>
-
- <para>&pooma; containers support a variety of computation modes,
- easing transition of algorithms into code. For example, many
- algorithms for solving partial differential equations use
- stencil-based computations. &pooma; supports stencil-based
- computations on &array;s and &field;s. It also supports
- data-parallel computation. For computations where one &field;'s
- values is a function of several other &field;'s values, the
- programmer can specify a relation. Relations are lazily evaluated;
- whenever the dependent &field;'s values are needed and it is
- related to a &field; whose values have changed, the former
- &field;'s values are computed. Lazy evaluation also assists
- correctness by eliminating the (frequently forgotten) need for a
- programmer to ensure a &field;'s values are up-to-date before being
- used.</para>
-
- <formalpara><title>Efficient Code.</title>
- <para>&pooma; incorporates a variety of techniques to ensure it
- produces code that executes as quickly as special-case,
- hand-written code.
- <!-- FIXME: Do I present execution numbers here? -->
- These techniques include extensive use of templates, out-of-order
- evaluation to permit communication and computation to overlap,
- availability of guard layers to reduce processors' synchronicity,
- and use of &pete; to produce fast inner loops.</para>
- </formalpara>
-
- <para>Using templates permits the expressiveness of using pointers
- and function arguments but ensures as much as work as possible
- occurs at compile time, not run time. Also, more code is exposed
- to the compiler's optimizer, further speeding execution. For
- example, use of template parameters to define the &pooma; &array;
- container permits the use of specialized data storage classes
- called engines, fast creation of views of a portion of an &array;,
- and polymorphic indexing. An &array;'s engine template parameter
- specifies how data is stored and indexed. Some &array;s expect
- almost all values to be used, while others might be mostly empty.
- In the latter case, using a specialized engine storing the few
- nonzero values would greatly reduce space requirements. Using
- engines also permits fast creation of container views, known as
- <firstterm>array sections</firstterm> in Fortran 90. A view's
- engine is the same as the original container's engine, while the
- view object maps its restricted domain to the original domain.
- Space requirements and execution time are minimal. Using templates
- also permits containers to support polymorphic indexing, e.g.,
- indexing both by integers and by three-dimensional coordinates.
- For example, a container defers returning values to its engine
- using a templatized index operator. The engine can define indexing
- functions with different function arguments, without the need to
- add corresponding container functions. Some of these features can
- be expressed without using templates, but doing so increases
- execution time. For example, a container could have a pointer to
- an engine object, but this requires a pointer dereference for each
- operation. Implementing polymorphic indexing without templates
- would require adding virtual function corresponding to each of the
- indexing functions.</para>
-
- <!-- FIXME: Are the claims concerning out-of-order evaluation I make true? -->
-
- <para>To ensure multiprocessor &pooma; programs execute quickly, it
- is important that interprocessor communication overlaps with
- intraprocessor computation as much as possible and communication is
- minimized. Asynchronous communication, out-of-order evaluation, and
- use of guard layers all help achieve this. &pooma; uses the
- asynchronous communication facilities of the &cheetah; communication
- library. When a processor needs data stored or computed by another
- processor, a message is sent between the two. For synchronous
- communication, the sender must issue an explicit send, and the
- recipient must issue an explicit receive. This synchronizes them.
- &cheetah; permits the sender to put and get data without the
- intervention of the remote site and also invoke functions at the
- remote site to ensure the data is up-to-date. Thus, out-of-order
- evaluation must be supported. Out-of-order evaluation has another
- benefit: only computations directly or indirectly related to values
- that are printed need occur.</para>
-
- <para>Using guard layers also helps overlap communication and
- computation. For distributed computation, each container's domain is
- split into pieces distributed among the available processors.
- Frequently, computing a container value is local, involving just the
- value itself and a few neighbors. Computing a value near the edge of
- a processor's domain may require knowing a few values from a
- neighboring domain. Guard layers permit these values to be copied
- locally so they need not be repeatedly communicated.</para>
-
- <para>&pooma; uses &pete; technology to ensure inner loops using
- &pooma;'s object-oriented containers run as quickly as hand-coded
- <!-- FIXME: Add a citation to Dr. Dobb's Journal article
- pete-99. --> loops. &pete; (the Portable Expression Template
- Engine) uses expression-template technology to convert
- data-parallel statements frequently found in the inner loops of
- programs into efficient loops without any intermediate
- computations. For example, consider evaluating the <statement>A +=
- -B + 2 * C;</statement> statement where <varname>A</varname> and
- <varname>C</varname> are <type>vector<double></type>s and
- <varname>B</varname> is a <type>vector<int></type>s.
- Ordinary evaluation might introduce intermediaries for
- <statement>-B</statement>, <statement>2*C</statement>, and their
- sum. The presence of these intermediaries in inner loops can
- measurably slow evaluation. To produce a loop without
- intermediaries, &pete; stores each expression as a parse tree. The
- resulting parse trees can be combined into a larger parse tree.
- Using its templates, the parse tree is converted, at compile time,
- to an outer loop with contents corresponding to evaluating each
- component of the result. Thus, no intermediate values are computed
- or stored. For example, the code corresponding to <statement>A +=
- -B + 2 * C;</statement> is
- <programlisting>
- vector<double>::iterator iterA = A.begin();
- vector<int>::const_iterator iterB = B.begin();
- vector<double>::const_iterator iterC = C.begin();
- while (iterA != A.end()) {
- *iterA += -*iterB + 2 * *iterC;
- ++iterA; ++iterB; ++iterC;
- }
- </programlisting>
- Furthermore, since the code is available at compile-, not run-, time,
- it can be further optimized, e.g., moving any loop-invariant code out
- of the loop.</para>
-
- <formalpara><title>Used for Diverse Set of Scientific Problems.</title>
- <para>&pooma; has been used to solve a wide variety of scientific
- problems. Most recently, physicists at Los Alamos National
- Laboratory implemented an entire library of hydrodynamics codes as
- part of the U.S. government's Science-based Stockpile Stewardship
- (<acronym>SBSS</acronym>) program to simulate nuclear weapons.
- Other applications include a matrix solver, an accelerator code
- simulating the dynamics of high-intensity charged particle beams in
- linear accelerators, and a Monte Carlo neutron transport
- code.</para>
- </formalpara>
-
- <formalpara><title>Easy Implementation.</title>
- <para>&pooma;'s tools greatly reduce the time to implement
- applications. As we noted above, &pooma;'s containers and
- expression syntax model the computational models and algorithms
- most frequently found in scientific programs. Using these
- high-level tools which are known to be correct reduce the time
- needed to debug programs. Programmers can write and test programs
- using their one or two-processor personal computers. With no
- additional work, the same program runs on computers with hundreds
- of processors; the code is exactly the same, and the &toolkit;
- automatically handles distribution of the data, all data
- communication, and all synchronization. Using all these tools
- greatly reduces programming time. For example, a team of two
- physicists and two support people at Los Alamos National Laboratory
- implemented a suite of hydrodynamics kernels in six months. Their
- work replaced the previous suite of less-powerful kernels which had
- taken sixteen people several years to implement and debug. Despite
- not previously implementing any of the kernels, they averaged one
- new kernel every three days, including the time to read the
- corresponding scientific papers!</para>
- </formalpara>
<section id="introduction-pooma_history">
<title>History of &pooma;</title>
! <para>The &poomatoolkit; developed at Los Alamos National
Laboratory to assist nuclear fusion and fission research.
! In 1994, the &toolkit; grew out of the Object-Oriented
! Particle Simulation (OOPS) class library developed for
! particle-in-cell simulations. The goals of the Framework, as it
! was called at the time, were driven by the Numerical Tokamak's
! <quote>Parallel Platform Paradox</quote>:
<blockquote>
<para>The average time required to implement a moderate-sized
application on a parallel computer architecture is equivalent to
--- 69,335 ----
<!-- FIXME: Add citation to pooma95, p. 3 -->
</listitem>
</orderedlist>
! Below, we discuss how &pooma; achieves these goals.
! </para>
!
! <bridgehead id="introduction-goals-portability" renderas="sect2">Code Portability for Sequential and Distributed Programs</bridgehead>
!
! <para>The same &pooma; programs run on sequential, distributed, and
! parallel computers. No change in source code is required. Two or
! three lines specify how each container's domain should be
! distributed among available processors. Using these directives and
! run-time information about the computer's configuration, the
! &toolkit; automatically distributes pieces of the container domains,
! called <link
! linkend="glossary-patch"><firstterm>patches</firstterm></link>,
! among the available processors. If a computation needs values from
! another patch, &pooma; automatically passes the value to the patch
! where it is needed. The same program, and even the same executable,
! works regardless of the number of the available processors and the
! size of the containers' domains. A programmer interested in only
! sequential execution can omit the two or three lines specifying how
! the domains are to be distributed.</para>
!
! <bridgehead id="introduction-goals-rapid_development" renderas="sect2">Rapid Application Development</bridgehead>
!
! <para>The &poomatoolkit; is designed to enable rapid development of
! scientific and distributed applications. For example, its vector,
! matrix, and tensor classes model the corresponding mathematical
! concepts. Its &array; and &field; classes model the discrete spaces
! and mathematical arrays frequently found in computational science and
! math. See <xref linkend="introduction-science_algorithms"></xref>.
! The left column indicates theoretical science and math concepts, the
! middle column computational science and math concepts, and the right
! column computer science implementations. For example, theoretical
! physics frequently uses continuous fields in three-dimensional space,
! while algorithms for a corresponding computational physics problem
! usually use discrete fields. &pooma; containers, classes, and
! functions ease engineering computer programs for these algorithms.
! For example, the &pooma; &field; container models discrete fields;
! both map locations in discrete space to values and permit
! computations of spatial distances and values. The &pooma; &array;
! container models the mathematical concept of an array, used in
! numerical analysis.</para>
!
! <figure float="1" id="introduction-science_algorithms">
! <title>How &pooma; Fits Into the Scientific Process</title>
! <mediaobject>
! <imageobject>
! <imagedata fileref="figures/introduction.101" format="EPS" align="center"></imagedata>
! </imageobject>
! <textobject>
! <phrase>&pooma; helps translate algorithms into programs.</phrase>
! </textobject>
! <caption>
! <para>In the translation from theoretical science and math to
! computational science and math to computer programs, &pooma; eases
! the implementation of algorithms as computer programs.</para>
! </caption>
! </mediaobject>
! </figure>
!
! <para>&pooma; containers support a variety of computation modes,
! easing translation of algorithms into code. For example, many
! algorithms for solving partial differential equations use
! stencil-based computations. &pooma; supports stencil-based
! computations on &array;s and &field;s. It also supports
! data-parallel computation similar to Fortran 90 syntax. For
! computations where one &field;'s values are a function of several
! other &field;s' values, the programmer can specify a relation.
! Relations are lazily evaluated: whenever the dependent &field;'s
! values are needed and it is dependent on a &field; whose values have
! changed, its values are computed. Lazy evaluation also assists
! correctness by eliminating the frequently forgotten need for a
! programmer to ensure a &field;'s values are up-to-date before being
! used.</para>
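The lazy evaluation of relations described above can be sketched in a few lines of C++. This is an illustration only, not the toolkit's relation interface: `Source`, `Dependent`, and the version counter are hypothetical names invented here, and a single scalar stands in for a field's values.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical source value that records every modification.
struct Source {
  double value = 0.0;
  unsigned version = 0;
  void set(double v) { value = v; ++version; }
};

// Hypothetical dependent value: recomputed on read only if the source
// changed since the last computation.
class Dependent {
 public:
  Dependent(const Source& s, std::function<double(double)> f)
      : source_(s), f_(std::move(f)) {
    assert(f_);  // a relation needs a function to apply
  }

  double get() {
    if (!valid_ || seen_ != source_.version) {
      cached_ = f_(source_.value);
      seen_ = source_.version;
      valid_ = true;
      ++evaluations;
    }
    return cached_;
  }

  int evaluations = 0;  // how often f was actually applied

 private:
  const Source& source_;
  std::function<double(double)> f_;
  double cached_ = 0.0;
  unsigned seen_ = 0;
  bool valid_ = false;
};
```

Reading the dependent value twice without changing the source applies the function once; the caller never has to remember to refresh it.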
!
! <bridgehead id="introduction-goals-efficient" renderas="sect2">Efficient Code</bridgehead>
!
! <para>&pooma; incorporates a variety of techniques to ensure it
! produces code that executes as quickly as special-case,
! hand-written code.
! <!-- FIXME: Do I present execution numbers here? -->
! These techniques include extensive use of templates, out-of-order
! evaluation, use of guard layers, and production of fast inner loops.</para>
!
! <para>&pooma;'s use of &cc; templates permits the expressiveness
! of using pointers and function arguments but ensures as much
! work as possible occurs at compile time, not run time. This speeds
! programs' execution. Since more code is produced at compile time,
! more code is available to the compiler's optimizer, further speeding
! execution. The &pooma; &array; container benefits from the use of
! template parameters. Their use permits the use of specialized data
! storage classes called <link
! linkend="glossary-engine"><firstterm>engines</firstterm></link>. An
! &array;'s engine template parameter specifies how data is stored and
! indexed. Some &array;s expect almost all values to be used, while
! others might be mostly vacant. In the latter case, using a
! specialized engine storing the few nonzero values greatly reduces
! space requirements. Using engines also permits fast creation of
! container views, known as <firstterm>array sections</firstterm> in
! Fortran 90. A view's engine is the same as the original
! container's engine, but the view object maps its restricted domain to
! the original domain. Space requirements and execution time to use
! views are minimal. Using templates also permits containers to
! support polymorphic indexing, e.g., indexing both by integers and by
! three-dimensional coordinates. A container defers indexing
! operations to its engine's templatized index operator. Since it uses
! templates, the engine can define indexing functions with different
! function arguments, without the need to add corresponding container
! functions. Some of these benefits of using templates can be
! expressed without them, but doing so increases execution time. For
! example, a container could have a pointer to an engine object, but
! this requires a pointer dereference for each operation. Implementing
! polymorphic indexing without templates would require adding virtual
! functions corresponding to each of the indexing functions.</para>
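A minimal sketch of the engine idea follows, assuming nothing about the toolkit's real classes: `DenseEngine`, `SparseEngine`, and `Container` are hypothetical stand-ins, and the containers are one-dimensional for brevity. The container holds its engine by value and forwards indexing through member templates, so no pointer dereference or virtual call is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical dense engine: every element occupies memory.
class DenseEngine {
 public:
  explicit DenseEngine(std::size_t n) : data_(n, 0.0) {}
  double read(std::size_t i) const {
    assert(i < data_.size());
    return data_[i];
  }
  void write(std::size_t i, double x) {
    assert(i < data_.size());
    data_[i] = x;
  }
  std::size_t stored() const { return data_.size(); }

 private:
  std::vector<double> data_;
};

// Hypothetical sparse engine: only nonzero elements occupy memory.
class SparseEngine {
 public:
  explicit SparseEngine(std::size_t) {}
  double read(std::size_t i) const {
    auto it = data_.find(i);
    return it == data_.end() ? 0.0 : it->second;
  }
  void write(std::size_t i, double x) {
    if (x != 0.0) data_[i] = x; else data_.erase(i);
  }
  std::size_t stored() const { return data_.size(); }

 private:
  std::map<std::size_t, double> data_;
};

// The engine is a template parameter, so calls resolve at compile time.
template <class Engine>
class Container {
 public:
  explicit Container(std::size_t n) : engine_(n) {}

  // Templatized forwarding: any index type the engine accepts works here
  // without adding a matching container function.
  template <class Index>
  double read(const Index& i) const { return engine_.read(i); }

  template <class Index>
  void write(const Index& i, double x) { engine_.write(i, x); }

  std::size_t stored() const { return engine_.stored(); }

 private:
  Engine engine_;
};
```

With 1000 elements and one nonzero value, `Container<DenseEngine>` stores 1000 values while `Container<SparseEngine>` stores one, yet both are used through the same interface.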
!
! <!-- FIXME: Are the claims concerning out-of-order evaluation I make true? -->
!
! <para>To ensure multiprocessor &pooma; programs execute quickly, it
! is important that interprocessor communication overlaps with
! intraprocessor computations as much as possible and that
! communication is minimized. Asynchronous communication, out-of-order
! evaluation, and use of guard layers all help achieve these goals.
! &pooma; uses the asynchronous communication facilities of the
! &cheetah; communication library. When a processor needs data that is
! stored or computed by another processor, a message is sent between
! the two. If synchronous communication were used, the sender would
! have to issue an explicit send, and the recipient an explicit
! receive, synchronizing the two processors. &cheetah; permits the
! sender to put and get data without synchronizing with the recipient
! processor, and it also permits invoking functions at remote sites to
! ensure desired data is up-to-date. Thus, out-of-order evaluation
! must be supported. Out-of-order evaluation has another benefit:
! only computations directly or indirectly related to values that are
! printed need occur.</para>
!
! <para>Surrounding a patch with <link
! linkend="glossary-guard_layer"><firstterm>guard
! layers</firstterm></link> can help reduce interprocessor
! communication. For distributed computation, each container's domain
! is split into pieces distributed among the available processors.
! Frequently, computing a container value is local, involving just the
! value itself and a few neighbors, but computing a value near the edge
! of a processor's domain may require knowing a few values from a
! neighboring domain. Guard layers permit these values to be copied
! locally so they need not be repeatedly communicated.</para>
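To make the guard-layer idea concrete, here is a small sketch under simplifying assumptions: a one-dimensional domain, two patches, and a single guard cell on each side. `Patch`, `exchangeGuards`, and `smooth` are hypothetical names, and a plain copy stands in for the interprocessor message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical patch: n interior values plus one guard cell on each
// side (indices 0 and n + 1).
struct Patch {
  std::vector<double> cells;
  explicit Patch(std::size_t n) : cells(n + 2, 0.0) {}
  std::size_t interior() const { return cells.size() - 2; }
};

// One exchange per step refreshes the guard cells of two adjacent
// patches; in a distributed run this copy would be a message.
void exchangeGuards(Patch& left, Patch& right) {
  assert(left.interior() > 0 && right.interior() > 0);
  left.cells[left.interior() + 1] = right.cells[1];
  right.cells[0] = left.cells[left.interior()];
}

// Three-point average. Edge values read the guard cells, so the loop
// never touches another patch's storage.
std::vector<double> smooth(const Patch& p) {
  std::vector<double> out(p.interior());
  for (std::size_t i = 1; i <= p.interior(); ++i)
    out[i - 1] = (p.cells[i - 1] + p.cells[i] + p.cells[i + 1]) / 3.0;
  return out;
}
```

After one call to `exchangeGuards`, each patch can be smoothed independently; the neighbor's edge value is already present locally.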
!
! <para>&pooma; uses &pete; technology to ensure inner loops involving
! &pooma;'s object-oriented containers run as quickly as hand-coded
! <!-- FIXME: Add a citation to Dr. Dobb's Journal article pete-99. -->
! loops. &pete; (the Portable Expression Template Engine) uses
! expression-template technology to convert data-parallel statements
! in the inner loops of programs into efficient loops
! without any intermediate computations. For example, consider
! evaluating the statement
! <programlisting>
! A += -B + 2 * C;</programlisting>
! where <varname>A</varname> and <varname>C</varname> are
! <type>vector<double></type>s and <varname>B</varname> is a
! <type>vector<int></type>. Naive evaluation might introduce
! intermediaries for <statement>-B</statement>,
! <statement>2*C</statement>, and their sum. The presence of these
! intermediaries in inner loops can measurably slow evaluation. To
! produce a loop without intermediaries, &pete; stores each expression
! as a parse tree. The resulting parse trees can be combined into a
! larger parse tree. Using its templates, the parse tree is converted,
! at compile time, to a loop evaluating each component of the result.
! Thus, no intermediate values are computed or stored. For example,
! the code corresponding to the statement above is
! <programlisting>
! vector<double>::iterator iterA = A.begin();
! vector<int>::const_iterator iterB = B.begin();
! vector<double>::const_iterator iterC = C.begin();
! while (iterA != A.end()) {
! *iterA += -*iterB + 2 * *iterC;
! ++iterA; ++iterB; ++iterC;
! }</programlisting>
! Furthermore, since the code is available at compile time, not run time,
! it can be further optimized, e.g., moving any loop-invariant code out
! of the loop.</para>
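The parse-tree mechanism can be illustrated with a deliberately tiny expression-template sketch. It is not PETE's interface: `Leaf`, `Neg`, `Scale`, `Add`, `Expr`, `leaf`, and `plusAssign` are hypothetical names, `std::vector` stands in for the containers, a named function replaces the overloaded `+=`, and C++14 is assumed for deduced return types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Leaf of the parse tree: refers to an existing vector without copying.
template <class T>
struct Leaf {
  const std::vector<T>& v;
  auto operator[](std::size_t i) const { return v[i]; }
  std::size_t size() const { return v.size(); }
};

// Interior nodes: each computes a single element on demand.
template <class E>
struct Neg {
  E e;
  auto operator[](std::size_t i) const { return -e[i]; }
  std::size_t size() const { return e.size(); }
};

template <class E>
struct Scale {
  double s;
  E e;
  auto operator[](std::size_t i) const { return s * e[i]; }
  std::size_t size() const { return e.size(); }
};

template <class L, class R>
struct Add {
  L l;
  R r;
  auto operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }  // operands assumed equal length
};

// Wrapper so the operators below apply only to parse trees.
template <class E>
struct Expr {
  E e;
};

template <class T>
Expr<Leaf<T>> leaf(const std::vector<T>& v) {
  return Expr<Leaf<T>>{Leaf<T>{v}};
}

template <class E>
Expr<Neg<E>> operator-(Expr<E> x) {
  return Expr<Neg<E>>{Neg<E>{x.e}};
}

template <class E>
Expr<Scale<E>> operator*(double s, Expr<E> x) {
  return Expr<Scale<E>>{Scale<E>{s, x.e}};
}

template <class L, class R>
Expr<Add<L, R>> operator+(Expr<L> l, Expr<R> r) {
  return Expr<Add<L, R>>{Add<L, R>{l.e, r.e}};
}

// The only loop: the whole tree is evaluated element by element, so no
// temporary vectors are created.
template <class E>
void plusAssign(std::vector<double>& a, Expr<E> x) {
  assert(x.e.size() == a.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] += x.e[i];
}
```

A call such as `plusAssign(A, -leaf(B) + 2 * leaf(C))` builds the type `Add<Neg<Leaf<int>>, Scale<Leaf<double>>>` at compile time, and the loop body reduces to one arithmetic expression per element.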
!
! <bridgehead id="introduction-goals-scientific" renderas="sect2">Used for Diverse Set of Scientific Problems</bridgehead>
!
! <para>&pooma; has been used to solve a wide variety of scientific
! problems. Most recently, physicists at Los Alamos National
! Laboratory implemented an entire library of hydrodynamics codes as
! part of the U.S. government's science-based Stockpile Stewardship
! Program to simulate nuclear weapons. Other applications include a
! matrix solver, an accelerator code simulating the dynamics of
! high-intensity charged particle beams in linear accelerators, and a
! Monte Carlo neutron transport code.</para>
!
! <bridgehead id="introduction-goals-easy_implementation" renderas="sect2">Easy Implementation</bridgehead>
!
! <para>&pooma;'s tools greatly reduce the time to implement
! applications. As we noted above, &pooma;'s containers and expression
! syntax model the computational models and algorithms most frequently
! found in scientific programs. These high-level tools are known to be
! correct and reduce the time to debug programs. Since the same
! programs run on one processor and multiple processors, programmers
! can write and test programs using their one- or two-processor personal
! computers. With no additional work, the same program runs on
! computers with hundreds of processors; the code is exactly the same,
! and the &toolkit; automatically handles distribution of the data, all
! data communication, and all synchronization. The net result is a
! significant reduction in programming time. For example, a team of
! two physicists and two support people at Los Alamos National
! Laboratory implemented a suite of hydrodynamics kernels in six
! months. Their work replaced a previous suite of less-powerful
! kernels which had taken sixteen people several years to implement and
! debug. Despite not having previously implemented any of the kernels,
! they implemented one new kernel every three days, including the time
! to read the corresponding scientific papers!</para>
! </section><!-- introduction-goals -->
!
!
! <section id="introduction-performance">
! <title>&pooma; Produces Fast Programs</title>
!
! <para>almost as fast as &c;. wide variety of configurations: one
! processor, many processors, give performance data for at least two
! different programs
! HERE</para>
!
! <para>describe &doof2d; here
!
! &doof2d; is a two-dimensional diffusion simulation program.
! Initially, all values in the square two-dimensional grid are zero
! except for the central value.
!
! HERE</para>
!
! </section>
!
! <!-- HERE -->
!
! <section id="introduction-open_source">
! <title>&pooma; is Free, Open-Source Software</title>
!
! <para>The &poomatoolkit; is open-source software. Anyone may
! download, read, redistribute, and modify the &pooma; source code.
! If an application requires a specialized container, any programmer
! may add it. Any programmer can extend it to solve problems in
! previously unsupported domains. Companies using the &toolkit; can
! read the source code to ensure it has no hidden back doors or
! security holes. It may be downloaded for free and used in
! perpetuity. There are no annual licenses and no on-going costs. By
! keeping their own copies, companies are guaranteed the software will
! never disappear. In summary, the &poomatoolkit; is free, low-risk
! software.</para>
! </section>
<section id="introduction-pooma_history">
<title>History of &pooma;</title>
! <para>The &poomatoolkit; was developed at Los Alamos National
Laboratory to assist nuclear fusion and fission research.
! In 1994, the &toolkit; grew out of the <application
! class='software'>Object-Oriented Particle Simulation</application>
! class library developed for particle-in-cell simulations. The goals
! of the Framework, as it was called at the time, were driven by the
! Numerical Tokamak's <quote>Parallel Platform Paradox</quote>:
<blockquote>
<para>The average time required to implement a moderate-sized
application on a parallel computer architecture is equivalent to
***************
*** 298,304 ****
</blockquote>
The framework's goal of being able to quickly write efficient
scientific code that could be run on a wide variety of platforms
! remains unchanged today. Development, driven mainly by the
Advanced Computing Laboratory at Los Alamos, proceeded rapidly.
A matrix solver application was written using the framework.
<!-- FIXME: Add citation to pooma-sc95. -->
--- 337,343 ----
</blockquote>
The framework's goal of being able to quickly write efficient
scientific code that could be run on a wide variety of platforms
! remains unchanged today. Development, mainly at the
Advanced Computing Laboratory at Los Alamos, proceeded rapidly.
A matrix solver application was written using the framework.
<!-- FIXME: Add citation to pooma-sc95. -->
***************
*** 307,321 ****
<para>By 1998, &pooma; was part of the U.S. Department of
Energy's Accelerated Strategic Computing Initiative
! (<acronym>ASCI</acronym>). The Comprehensive Test Ban Treaty
! forbid nuclear weapons testing so they were instead simulated.
! <acronym>ASCI</acronym>'s goal was to radically advance the state
! of the art in high-performance computing and numerical simulations
! so the nuclear weapon simulations could use 100-teraflop
! computers. A linear accelerator code <application
class='software'>linac</application> and a Monte Carlo neutron
! transport code <application class='software'>MC++</application>
! were written.
<!-- FIXME: Add citation to pooma-siam98. -->
</para>
--- 346,360 ----
<para>By 1998, &pooma; was part of the U.S. Department of
Energy's Accelerated Strategic Computing Initiative
! (<acronym>ASCI</acronym>). The Comprehensive Test Ban Treaty forbade
! nuclear weapons testing so the weapons were instead simulated using
! computers. <acronym>ASCI</acronym>'s goal was to radically advance
! the state of the art in high-performance computing and numerical
! simulations so the nuclear weapon simulations could use 100-teraflop
! parallel computers. A linear accelerator code <application
class='software'>linac</application> and a Monte Carlo neutron
! transport code <application class='software'>MC++</application> were
! among the codes written.
<!-- FIXME: Add citation to pooma-siam98. -->
</para>
***************
*** 332,348 ****
engines were added. Release 2.1.0 included &field;s with
their spatial extent and &dynamicarray;s with the ability to
dynamically change its domain size. Support for particles and
! their interaction with &field;s was added. The &pooma; messaging
implementation was revised in release 2.3.0. Use of the
&cheetah; Library separated &pooma; from the actual messaging
! library used. Support for applications running on clusters of
! computers was added. During the past two years, the &field;
abstraction and implementation was improved to increase its
flexibility, add support for multiple values and materials in the
same cell, and permit lazy evaluation. Simultaneously, the
! execution speed of the inner loops was greatly increased. The
! particle code has not yet been ported to the new &field;
! abstraction.</para>
</section>
</chapter>
--- 371,389 ----
engines were added. Release 2.1.0 included &field;s with
their spatial extent and &dynamicarray;s with the ability to
dynamically change its domain size. Support for particles and
! their interaction with &field;s was added. The &pooma; messaging
implementation was revised in release 2.3.0. Use of the
&cheetah; Library separated &pooma; from the actual messaging
! library used, and support for applications running on clusters of
! computers was added. <ulink
! url="http://www.codesourcery.com">CodeSourcery, LLC</ulink>, and
! <ulink url="www.proximation.com">Proximation, LLC</ulink>, took
! over &pooma; development from Los Alamos National Laboratory.
! During the past two years, the &field;
abstraction and implementation was improved to increase its
flexibility, add support for multiple values and materials in the
same cell, and permit lazy evaluation. Simultaneously, the
! execution speed of the inner loops was greatly increased.</para>
</section>
</chapter>
Index: manual.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v
retrieving revision 1.4
diff -c -p -r1.4 manual.xml
*** manual.xml 2001/12/17 17:27:41 1.4
--- manual.xml 2002/01/04 17:14:10
***************
*** 26,31 ****
--- 26,33 ----
<!-- Modify this to the desired formatting. -->
<!ENTITY cheetah "<application class='software'>Cheetah</application>" >
<!-- Produce a notation for the Cheetah Library. -->
+ <!ENTITY closeclose "> >" >
+ <!-- Produce a notation for ">>", which frequently occurs with templates. Without this, TeX produces a shift symbol. -->
<!ENTITY dashdash "- -" >
<!-- Produce a notation for a double dash. Without this, TeX produces an en-hyphen. -->
<!ENTITY doof2d "<command>Doof2d</command>" >
***************
*** 38,47 ****
<!-- Produce a notation for the MM Library. -->
<!ENTITY mpi "<application class='software'>MPI</application>">
<!-- Produce a notation for the MPI package. -->
<!ENTITY pdt "<application class='software'>PDToolkit</application>">
<!-- Produce a notation for the PDT software package. -->
<!ENTITY pete "<application class='software'>PETE</application>">
! <!-- Produce a notation for the PETE library. -->
<!ENTITY pooma "<application class='software'>POOMA</application>">
<!-- Produce a notation for Pooma software. -->
<!ENTITY poomatoolkit "<application class='software'>POOMA &toolkitcap;</application>">
--- 40,51 ----
<!-- Produce a notation for the MM Library. -->
<!ENTITY mpi "<application class='software'>MPI</application>">
<!-- Produce a notation for the MPI package. -->
+ <!ENTITY openopen "< <" >
+ <!-- Produce a notation for "<<", which frequently occurs with output. Without this, TeX produces a shift symbol. -->
<!ENTITY pdt "<application class='software'>PDToolkit</application>">
<!-- Produce a notation for the PDT software package. -->
<!ENTITY pete "<application class='software'>PETE</application>">
! <!-- Produce a notation for the PETE framework. -->
<!ENTITY pooma "<application class='software'>POOMA</application>">
<!-- Produce a notation for Pooma software. -->
<!ENTITY poomatoolkit "<application class='software'>POOMA &toolkitcap;</application>">
***************
*** 87,92 ****
--- 91,98 ----
<!-- The "Field" type. -->
<!ENTITY inform "<type>Inform</type>">
<!-- The "Inform" output type. -->
+ <!ENTITY int "<type>int</type>">
+ <!-- The C "int" type. -->
<!ENTITY interval "<type>Interval</type>">
<!-- The "Interval" type. -->
<!ENTITY layout "<type>Layout</type>">
***************
*** 155,162 ****
--- 161,172 ----
<!-- spelling: nonzero, not non-zero -->
<!-- External Chapters -->
+ <!ENTITY bibliography-chapter SYSTEM "bibliography.xml">
+ <!-- bibliography -->
<!ENTITY concepts-chapter SYSTEM "concepts.xml">
<!-- Pooma concepts chapter -->
+ <!ENTITY data-parallel-chapter SYSTEM "data-parallel.xml">
+ <!-- data-parallel expressions chapter -->
<!ENTITY glossary-chapter SYSTEM "glossary.xml">
<!-- glossary -->
<!ENTITY introductory-chapter SYSTEM "introduction.xml">
***************
*** 183,189 ****
<!-- Sequential Programs -->
<!ENTITY initialize-finalize SYSTEM "./programs/examples/Sequential/initialize-finalize-annotated.cpp">
! <!-- illustrate initialize() and finalize() -->
]>
<book>
--- 193,205 ----
<!-- Sequential Programs -->
<!ENTITY initialize-finalize SYSTEM "./programs/examples/Sequential/initialize-finalize-annotated.cpp">
! <!-- Illustrate initialize() and finalize(). -->
!
! <!-- Template Programs -->
! <!ENTITY pairs-untemplated SYSTEM "./programs/examples/Templates/pairs-untemplated-annotated.cpp">
! <!-- Illustrate defining classes with pairs of values of the same type. -->
! <!ENTITY pairs-templated SYSTEM "./programs/examples/Templates/pairs-templated-annotated.cpp">
! <!-- Illustrate defining a template class with pairs of values of the same type. -->
]>
<book>
***************
*** 205,211 ****
<revhistory>
<revision>
<revnumber>0.01</revnumber>
! <date>2001 Nov 26</date>
<authorinitials>jdo</authorinitials>
<revremark>first draft</revremark>
</revision>
--- 221,227 ----
<revhistory>
<revision>
<revnumber>0.01</revnumber>
! <date>2001 Dec 18</date>
<authorinitials>jdo</authorinitials>
<revremark>first draft</revremark>
</revision>
***************
*** 280,292 ****
<title>Programming with &pooma;</title>
<!-- FIXME: Add a partintro to the part above? -->
! &introductory-chapter;
&tutorial-chapter;
&concepts-chapter;
<chapter id="sequential">
<title>Writing Sequential Programs</title>
--- 296,1819 ----
<title>Programming with &pooma;</title>
<!-- FIXME: Add a partintro to the part above? -->
+
+ &introductory-chapter;
+
+
+ <chapter id="template_programming">
+ <title>Programming with Templates</title>
+
+ <para>&pooma; extensively uses &cc; templates to support type
+ polymorphism without any run-time cost. In this chapter, we
+ briefly introduce using templates in &cc; programs by relating them
+ to <quote>ordinary</quote> &cc; constructs such as values, objects,
+ and classes. The two main concepts underlying &cc; templates will
+ occur repeatedly:
+ <itemizedlist>
+ <listitem>
+ <para>Template programming occurs at compile time, not run
+ time. That is, template operations occur inside the compiler,
+ not when a program runs.</para>
+ </listitem>
+ <listitem>
+ <para>Templates permit declaring families of classes with a
+ single declaration. For example, the &array; template
+ declaration permits using arrays with many different element
+ types, e.g., arrays of integers, arrays of floating point
+ numbers, and arrays of arrays.</para>
+ </listitem>
+ </itemizedlist>
+ For those interested in the implementation of &pooma;, we close
+ with a discussion of some template programming concepts used in the
+ implementation but not likely to be used by &pooma; users.</para>
+
+ <section id="template_programming-compile_time">
+ <title>Templates Occur at Compile Time</title>
+
+ <para>&pooma; uses templates to support type polymorphism without
+ incurring any run-time cost as a program executes. All template
+ operations are performed at compile time by the compiler.</para>
+
+ <para>Prior to the introduction of templates, almost all of a
+ program's interesting computation occurred when it was executed.
+ When writing the program, the programmer, at <glossterm
+ linkend="glossary-programming_time"><firstterm>programming
+ time</firstterm></glossterm>, would specify which statements and
+ expressions would occur and which types to use. At <glossterm
+ linkend="glossary-compile_time"><firstterm>compile
+ time</firstterm></glossterm>, the compiler converts the program's
+ source code into an executable program. Even though the compiler
+ uses the types to produce the executable, no interesting
+ computation occurs. At <glossterm
+ linkend="glossary-run_time"><firstterm>run
+ time</firstterm></glossterm>, the resulting executable program
+ actually performs the operations.</para>
+
+ <para>The introduction of templates permits interesting
+ computation to occur while the compiler produces the executable.
+ Most interesting is template instantiation, which produces a type
+ at compile time. For example, the &array; <quote>type</quote>
+ definition requires template parameters <varname>Dim</varname>,
+ <varname>T</varname>, and <varname>EngineTag</varname>, specifying
+ its dimension, the type of its elements, and its engine type. To
+ use this, a programmer specifies values for the template
+ parameters:
+ <statement><type>Array<2,double,Brick></type></statement>.
+ At compile time, the compiler creates a type definition by
+ substituting the values for the template parameters in the
+ template definition. The substitution is analogous to the
+ run-time application of a function to specific values.</para>
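Template instantiation can be seen in a small sketch. `ToyArray`, `Brick` as an empty tag, and the nested names `Element_t` and `Engine_t` are stand-ins invented for illustration; they are not the toolkit's declarations. The compiler substitutes the arguments and produces a distinct type for each argument list, and the `static_assert` lines are checked during compilation, before any program runs.

```cpp
#include <type_traits>

struct Brick {};  // hypothetical engine tag

// Hypothetical template with a dimension, an element type, and an
// engine tag as parameters.
template <int Dim, class T, class EngineTag>
struct ToyArray {
  static constexpr int dimensions = Dim;  // value parameter, fixed at compile time
  using Element_t = T;                    // nested type names
  using Engine_t = EngineTag;
};

// Each argument list yields a separate type, created by the compiler.
using Grid = ToyArray<2, double, Brick>;
using Line = ToyArray<1, int, Brick>;

static_assert(Grid::dimensions == 2, "Dim was substituted");
static_assert(std::is_same<Grid::Element_t, double>::value, "T was substituted");
static_assert(!std::is_same<Grid, Line>::value, "different arguments, different types");
```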
+
+ <para>All computation not involving run-time input or output can
+ occur at programming time, compile time, or run time, whichever is
+ more convenient. At programming time, a programmer can perform
+ computations by hand rather than writing code to compute them. &cc;
+ templates are Turing-complete so they can compute anything computable.
+ Unfortunately, syntax for compile-time computation is more
+ difficult than for run-time computation, and current compilers
+ perform such computation more slowly than executables do. Run-time
+ &cc; constructs are also Turing-complete so using templates is never
+ strictly necessary. Thus, we shift computation to the time which
+ best trades off the ease of expressing syntax with the speed of
+ computation by programmer, compiler, or computer chip. For example,
+ &pooma; uses expression template technology to speed run-time
+ execution of data-parallel statements. The &pooma; developers decided
+ to shift some of the computation from run time to compile time using template
+ computations. The resulting run-time code runs more quickly, but
+ compiling the code takes longer. Also, programming time for the
+ &pooma; developers increased significantly, but, since most users
+ are most concerned about decreasing run times, they made this
+ choice.</para>
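The trade-off described above can be illustrated with a small sketch that performs the same computation once at compile time and once at run time. The names Factorial and factorial are hypothetical and are not part of &pooma;; the sketch assumes a non-negative argument.

```cpp
// Sketch only: Factorial and factorial are hypothetical names,
// not part of POOMA.  Both assume a non-negative argument.

// Compile-time version: the compiler computes the value while
// instantiating the template.
template <int N>
struct Factorial {
  // Recursive case: instantiating Factorial<N> instantiates Factorial<N-1>.
  enum { value = N * Factorial<N - 1>::value };
};

// A specialization ends the recursion.
template <>
struct Factorial<0> {
  enum { value = 1 };
};

// Run-time version: the executable computes the value when called.
inline int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }
```

Using Factorial<5>::value costs nothing at run time but lengthens compilation and is harder to write, while factorial(5) is easy to write but is computed each time the program runs.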
+
+ </section>
+
+
+ <section id="template_programming-template_use">
+ <title>Template Programming for &pooma; Users</title>
+
+ <para>Most &pooma; users need only understand a subset of
+ available tools for template programming. These tools include
+ <itemizedlist>
+ <listitem>
+ <para>reading template declarations and understanding template
+ parameters, which are used in this book.</para>
+ </listitem>
+ <listitem>
+ <para>template instantiation, specifying a particular type by
+ specifying values for template parameters.</para>
+ </listitem>
+ <listitem>
+ <para>nested type names, which are types specified within a
+ class definition.</para>
+ </listitem>
+ </itemizedlist>
+ We discuss these below.</para>
! <example id="template_programming-template_use-untemplated_pair_example">
! <title>Classes Storing Pairs of Values</title>
! &pairs-untemplated;
! </example>
!
! <para>Templates generalize writing class declarations by
! permitting class declarations dependent on other types. For
! example, consider writing a class storing a pair of integers and a
! class storing a pair of doubles. See <xref
! linkend="template_programming-template_use-untemplated_pair_example"></xref>.
! Almost all of the code for the two definitions is the same. Both
! of these definitions define a class that has a constructor and stores
! two values named <varname>left</varname> and
! <varname>right</varname> having the same type. Only the classes'
! names and their use of types differ.</para>
!
! <example id="template_programming-template_use-templated_pair_example">
! <title>Templated Class Storing Pairs of Values</title>
! &pairs-templated;
! <calloutlist>
! <callout
! arearefs="template_programming-template_use-templated_pair_program-template_declaration">
! <para>Template parameters are written before, not after, a
! class name.</para>
! </callout>
! <callout
! arearefs="template_programming-template_use-templated_pair_program-constructor">
! <para>The constructor has two parameters with the type <varname>T</varname>.</para>
! </callout>
! <callout
! arearefs="template_programming-template_use-templated_pair_program-members">
! <para>An object stores two values having type <varname>T</varname>.</para>
! </callout>
! <callout
! arearefs="template_programming-template_use-templated_pair_program-use">
! <para>To use a templated class, specify the template
! parameter's argument after the class's name and surrounded by
! angle brackets (<statement><></statement>).</para>
! </callout>
! </calloutlist>
! </example>
!
! <para>Using templates, we can use a template parameter to
! represent their different uses of types and write one templated
! class definition. See <xref
! linkend="template_programming-template_use-templated_pair_example"></xref>.
! The templated class definition is a copy of the common portions of
! the two preceding definitions. Because the two definitions differ
! only in their use of the &int; and &double; types, we replace
! these concrete types with a template
! parameter <varname>T</varname>. We
! <emphasis>precede</emphasis>, not follow, the class definition
! with <statement>template <typename T></statement>. The
! constructor's parameters' types are changed
! to <varname>T</varname> as are the data members'
! types.</para>
!
! <para>To use a template class definition, template arguments
! follow the class name surrounded by angle
! brackets (<statement><></statement>). For example,
! <type>pair<int></type> <glossterm
! linkend="glossary-template_instantiation"><firstterm>instantiates</firstterm></glossterm>
! the <classname>pair</classname> template class definition with
! <varname>T</varname> equal to &int;. That is, the compiler
! creates a definition for <type>pair<int></type> by copying
! <classname>pair</classname>'s template definition and substituting
! &int; for each occurrence of <varname>T</varname>. The copy
! omits the template parameter declaration <statement>template
! <typename T></statement> at the beginning of its definition.
! The result is a definition exactly the same as
! <classname>pairOfInts</classname>.</para>
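The substitution described above can be sketched as follows. The manual's actual listings are included from separate files, so this is an assumed reconstruction based only on the description in the text; the enclosing namespace sketch and the constructor parameter names are hypothetical.

```cpp
// Assumed reconstruction: the manual's own listings are not reproduced
// here.  The namespace "sketch" is hypothetical and only avoids
// clashing with std::pair.
namespace sketch {

// Untemplated class storing a pair of integers.
class pairOfInts {
public:
  pairOfInts(const int& l, const int& r) : left(l), right(r) {}
  int left;
  int right;
};

// Templated class: the template parameter declaration precedes the class,
// and T replaces each use of the concrete type.
template <typename T>
class pair {
public:
  pair(const T& l, const T& r) : left(l), right(r) {}
  T left;
  T right;
};

} // namespace sketch
```

Writing sketch::pair<int> asks the compiler to produce a class whose members match those of sketch::pairOfInts.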
+ <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ orient="port" pgwide="0"
+ id="template_programming-template_use-correspondence_table">
+ <title>Correspondences Between Run-Time and Compile-Time
+ Programming Constructs</title>
+
+ <tgroup cols="3" align="left">
+ <thead>
+ <row>
+ <entry></entry>
+ <entry>run time</entry>
+ <entry>compile time</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>values</entry>
+ <entry>integers, strings, objects, functions, …</entry>
+ <entry>types, …</entry>
+ </row>
+ <row>
+ <entry>create a value to store multiple values</entry>
+ <entry>object creation</entry>
+ <entry>class definition</entry>
+ </row>
+ <row>
+ <entry>values stored in a collection</entry>
+ <entry>data member, member function</entry>
+ <entry>nested type name, nested class, static member function,
+ constant integral values</entry>
+ </row>
+ <row>
+ <entry>placeholder for <quote>any particular value</quote></entry>
+ <entry>variable, e.g., <quote>any int</quote></entry>
+ <entry>template parameter, e.g., <quote>any type</quote></entry>
+ </row>
+ <row>
+ <entry>repeated operations</entry>
+ <entry>A function generalizes a particular operation applied to
+ different values. The function parameters are placeholders
+ for particular values.</entry>
+ <entry>A template class generalizes a particular class
+ definition using different types. The template parameters are
+ placeholders for particular values.</entry>
+ </row>
+ <row>
+ <entry>application</entry>
+ <entry>Use a function by appending function arguments
+ surrounded by parentheses.</entry>
+ <entry>Use a template class by appending template arguments
+ surrounded by angle brackets (<>).</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>As we mentioned above, template instantiation is analogous
+ to function application. A template class is analogous to a
+ function. The analogy between compile-time and run-time
+ programming constructs can be extended. At run time, values used
+ consist of things such as integers, floating point numbers,
+ pointers, functions, and objects. Programs compute by operating
+ on these values at run time. At compile time, the values used
+ include types. Compile-time operations use these types. &cc;
+ defines default sets of values that all conforming compilers must
+ support. Object creation extends the set of run-time values,
+ while a class definition extends the set of compile-time types.</para>
+
+ <para>Functions generalize similar run-time operations, while
+ template classes generalize similar class definitions. For
+ example, consider repeatedly printing the larger of two numbers:
+ <programlisting>
+ std::cout << (3 > 4 ? 3 : 4) << std::endl;
+ std::cout << (4 > -13 ? 4 : -13) << std::endl;
+ std::cout << (23 > 4 ? 23 : 4) << std::endl;
+ std::cout << (0 > 3 ? 0 : 3) << std::endl;
+ </programlisting> Each statement is exactly the same except for its
+ two values. Thus, we can generalize these statements by writing a function.
+ <programlisting>
+ void maxOut(int a, int b)
+ { std::cout &openopen; (a > b ? a : b) &openopen; std::endl; }
+ </programlisting> The function's body consists of a statement with
+ variables substituted for the two particular values. Each parameter
+ is a placeholder that, when used, holds one particular value among the
+ set of possible integral values. The function must be named to permit
+ its use, and declarations for its two parameters follow. Using the
+ function simplifies the code:
+ <programlisting>
+ maxOut(3, 4);
+ maxOut(4, -13);
+ maxOut(23, 4);
+ maxOut(0, 3);
+ </programlisting> To use a function, the function's name precedes
+ parentheses surrounding specific values for its parameters. The
+ function's return value does not appear.</para>
+
+ <para>A template class definition generalizes similar class
+ definitions. If two class definitions differ only in a few types,
+ template parameters can be substituted. Each parameter is a
+ placeholder that, when used, holds one particular value, i.e.,
+ type, among the set of possible values. The class definition is
+ named to permit its use, and declarations for its parameters
+ precede it. The example found in the previous section illustrates
+ this transformation. Compare the original, untemplated classes in
+ <xref
+ linkend="template_programming-template_use-untemplated_pair_example"></xref>
+ with the templated class in <xref
+ linkend="template_programming-template_use-templated_pair_example"></xref>.
+ Note the notation for the template class parameters.
+ <statement>template <typename T></statement>
+ <emphasis>precedes</emphasis> the class definition. The keyword
+ <keywordname>typename</keywordname> indicates the template
+ parameter is a type. <varname>T</varname> is the template
+ parameter's name. Note that using
+ <keywordname>class</keywordname> is equivalent to using
+ <keywordname>typename</keywordname> so <statement>template
+ <class T></statement> is equivalent to <statement>template
+ <typename T></statement>. Using a templated class requires
+ postfix, not prefix, notation. The class's name precedes angle
+ brackets (<>) surrounding specific values (types) for
+ its parameters. As we showed above,
+ <statement>pair<int></statement> <glossterm
+ linkend="glossary-template_instantiation">instantiates</glossterm>
+ the template class <classname>pair</classname> with &int; for its
+ type parameter <varname>T</varname>.</para>
+
+ <para>In template programming, nested type names store
+ compile-time data that can be used within template classes. Since
+ compile-time class definitions are analogous to run-time objects
+ and the latter store named values, nested type names are values,
+ i.e., types, stored within class definitions. For example, the
+ template class &array; has a nested type name for the type of its
+ domain:
+ <programlisting>
+ typedef typename Engine_t::Domain_t Domain_t;
+ </programlisting> This <keywordname>typedef</keywordname>, i.e., type
+ definition, defines the type <type>Domain_t</type> as equivalent
+ to <type>Engine_t::Domain_t</type>. The <statement>::</statement>
+ operator selects the <type>Domain_t</type> nested type from inside
+ the <type>Engine_t</type> type. This illustrates how to access
+ &array;'s <type>Domain_t</type> when not within &array;'s scope:
+ <type>Array<Dim, T, EngineTag>::Domain_t</type>. The
+ analogy between object members and nested type names suggests
+ their usefulness. Just as run-time object members store information
+ for later use, nested type names store type information for later
+ use at compile time. Using nested type names has no impact on the
+ speed of executing programs.</para>
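A minimal sketch of nested type names follows. The classes IntervalDomain, BrickEngine, and Container and the function numberOfIndices are hypothetical stand-ins chosen for illustration; they are not the &pooma; classes.

```cpp
// Hypothetical stand-ins; these are not the POOMA classes.
struct IntervalDomain {
  int first, last;                     // inclusive bounds
};

struct BrickEngine {
  typedef IntervalDomain Domain_t;     // nested type name
};

template <typename Engine>
struct Container {
  typedef Engine Engine_t;
  // "typename" tells the compiler that Engine_t::Domain_t names a type.
  typedef typename Engine_t::Domain_t Domain_t;
};

// Outside the class's scope, "::" selects the nested type.
inline int numberOfIndices(const Container<BrickEngine>::Domain_t& d)
{ return d.last - d.first + 1; }
```

The typedefs exist only during compilation, so looking up Container<BrickEngine>::Domain_t adds no run-time cost.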
+ </section>
+
+
+ <section id="template_programming-pooma_implementation">
+ <title>Template Programming Used to Write &pooma;</title>
+
+ <para>The preceding section presented template programming tools
+ needed to read this &book; and write programs using the
+ &poomatoolkit;. In this section, we present template programming
+ techniques used to implement &pooma;. We extend the
+ correspondence between compile-time template programming
+ constructs and run-time constructs. Reading this section is not
+ necessary unless you wish to understand how &pooma; works.</para>
+
+ <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ orient="port" pgwide="0"
+ id="template_programming-pooma_implementation-correspondence_table">
+ <title>More Correspondences Between Compile-Time and Run-Time
+ Programming Constructs</title>
+
+ <tgroup cols="3" align="left">
+ <thead>
+ <row>
+ <entry></entry>
+ <entry>run time</entry>
+ <entry>compile time</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>values</entry>
+ <entry>integers, strings, objects, functions, …</entry>
+ <entry>types, constant integers and enumerations, …</entry>
+ </row>
+ <row>
+ <entry>control flow to choose among operations</entry>
+ <entry><keywordname>if</keywordname>, <keywordname>while</keywordname>, <keywordname>goto</keywordname>, …</entry>
+ <entry>template class specializations with pattern matching</entry>
+ </row>
+ <row>
+ <entry>values stored in a collection</entry>
+ <entry>An object stores values.</entry>
+ <entry>A <glossterm linkend="glossary-traits_class">traits
+ class</glossterm> contains values describing a type.</entry>
+ </row>
+ <row>
+ <entry></entry>
+ <entry></entry>
+ <entry></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
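The table's rows on template class specializations and traits classes can be sketched together. ElementTraits is a hypothetical traits class written for illustration; it is not taken from the &pooma; sources.

```cpp
// Hypothetical traits class; the names are illustrative, not POOMA's.

// Primary template: chosen when no specialization matches,
// much like the final "else" of a run-time "if" chain.
template <typename T>
struct ElementTraits {
  enum { isFloatingPoint = 0, isPointer = 0 };
};

// Full specialization: chosen only for double.
template <>
struct ElementTraits<double> {
  enum { isFloatingPoint = 1, isPointer = 0 };
};

// Partial specialization: pattern matching on any pointer type T*.
template <typename T>
struct ElementTraits<T*> {
  enum { isFloatingPoint = 0, isPointer = 1 };
};
```

The compiler selects among the three definitions by matching the template argument against each pattern, so the choice is made entirely at compile time.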
+
+
+ <para>
+
+ HERE</para>
+ </section>
+
+ <!-- HERE -->
+
+ </chapter>
+
+
&tutorial-chapter;
&concepts-chapter;
+ <!-- FIXME: Revert to &data-parallel-chapter; -->
+
+ <chapter id="data_parallel">
+ <title>Data-Parallel Expressions</title>
+
+ <para>In the previous sections, we accessed container values one at
+ a time. Accessing more than one value in a container required
+ writing an explicit loop. Scientists and engineers commonly
+ operate on sets of values, treated as an aggregate. For example, a
+ vector is a one-dimensional collection of data, and two vectors can be
+ added together. A matrix is a two-dimensional collection of data,
+ and a scalar and a matrix can be multiplied. A <glossterm
+ linkend="glossary-data_parallel"><firstterm>data-parallel
+ expression</firstterm></glossterm> simultaneously uses multiple
+ container values. &pooma; supports data-parallel syntax.</para>
+
+ <para>After introducing data-parallel expressions and statements,
+ we present the corresponding &pooma; syntax. Then we present its
+ implementation, which uses expression-template technology. A naive
+ data-parallel implementation might generate temporary variables,
+ cluttering a program's inner loops and slowing its execution.
+ Instead, &pooma; uses &pete;, the Portable Expression Template
+ Engine. Using expression templates, it constructs a parse tree of
+ expressions and corresponding types, which is then quickly
+ evaluated without the need for temporary variables.</para>
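The expression-template idea can be sketched in a few lines. This is a greatly simplified illustration under stated assumptions, not the &pete; implementation: the namespace sketch and the classes Vec and Sum are hypothetical, and only addition of equally sized vectors is handled.

```cpp
#include <cstddef>
#include <vector>

// Greatly simplified illustration; not the PETE implementation.
// All names in namespace "sketch" are hypothetical.
namespace sketch {

// Parse-tree node representing "left + right".  It stores references
// and computes nothing until one element is requested.
template <typename L, typename R>
struct Sum {
  const L& left;
  const R& right;
  Sum(const L& l, const R& r) : left(l), right(r) {}
  double operator[](std::size_t i) const { return left[i] + right[i]; }
};

struct Vec {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
  double operator[](std::size_t i) const { return data[i]; }

  // Assigning an expression evaluates the whole parse tree in one loop,
  // without creating temporary vectors for subexpressions.
  template <typename E>
  Vec& operator=(const E& e) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    return *this;
  }
};

// Adding operands only builds a type such as Sum<Sum<Vec, Vec>, Vec>.
template <typename L, typename R>
Sum<L, R> operator+(const L& l, const R& r) { return Sum<L, R>(l, r); }

} // namespace sketch
```

With these definitions, a statement such as r = a + b + c; builds a small tree of Sum objects and then fills r in a single pass. The tree holds references to temporaries, so it must be consumed within the same statement that builds it.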
+
+
+ <section id="data_parallel-multiple_values">
+ <title>Expressions with More Than One Container Value</title>
+
+ <para>Science and math are filled with aggregated values. A vector
+ contains several components, and a matrix is a two-dimensional
+ object. Operations on individual values are frequently extended
+ to operations on these aggregated values. For example, two
+ vectors having the same length are added by adding corresponding
+ components. The product of two matrices is defined in terms of
+ sums and products on its components. The sine of an array is an
+ array containing the sine of every value in it.</para>
+
+ <para>A <glossterm
+ linkend="glossary-data_parallel"><firstterm>data-parallel
+ expression</firstterm></glossterm> simultaneously refers to
+ multiple container values. Data-parallel statements, i.e.,
+ statements using data-parallel expressions, frequently occur in
+ scientific programs. For example, the sum of two vectors v and w
+ is written as v+w. Algorithms frequently use data-parallel
+ syntax. Consider, for example, computing the total energy E
+ as the sum of kinetic energy K and potential energy U.
+ For a simple particle subject to the earth's gravity, the kinetic
+ energy K equals mv<superscript>2</superscript>/2, and the
+ potential energy U equals mgh. These formulae apply to both
+ an individual particle with a particular mass m and
+ height h and to an entire field of particles with
+ masses m and heights h. Our algorithm works with
+ data-parallel syntax, and we would like to write the corresponding
+ computer program using data-parallel syntax as well.</para>
+ </section>
+
+
+ <section id="data_parallel-use">
+ <title>Their Use</title>
+
+ <para>&pooma; containers can be used in data-parallel expressions
+ and statements. The basic guidelines are simple:
+ <itemizedlist>
+ <listitem>
+ <para>The &cc; built-in and mathematical operators operate on
+ an entire container by operating element-wise on its values.</para>
+ </listitem>
+ <listitem>
+ <para>Binary operators operate only on containers with the same
+ domain types by combining values with the same indices. If the
+ result is a container, it has a domain equal to the left operand's
+ domain.</para>
+ </listitem>
+ <listitem>
+ <para>For assignment operators, the domains of the left
+ operand and the right operand must have the same type and
+ be conformable, i.e., have the <quote>same shape</quote>.</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>The operators operate element-wise on containers' values.
+ For example, if <varname>A</varname> is a one-dimensional array,
+ <statement>-<varname>A</varname></statement> is a one-dimensional
+ array with the same size such that the value at the
+ i<superscript>th</superscript> position equals -A(i). If
+ <varname>A</varname> and <varname>B</varname> are two-dimensional
+ &array;s on the same domain,
+ <statement><varname>A</varname>+<varname>B</varname></statement>
+ is an array on the same domain with values equaling the sum of
+ corresponding values in <varname>A</varname> and
+ <varname>B</varname>.</para>
+
+ <figure float="1" id="data_parallel-use-addition_example">
+ <title>Adding &array;s with Different Domains</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="figures/data-parallel.212" format="EPS" align="center"></imagedata>
+ </imageobject>
+ <textobject>
+ <phrase>Adding two arrays with different domains adds values
+ with the same indices.</phrase>
+ </textobject>
+ <caption>
+ <para>Adding &array;s with different domains is supported.
+ Solid lines indicate the domains' extent. Values with the same
+ indices are added.</para>
+ </caption>
+ </mediaobject>
+ </figure>
+
+ <para>Binary operators operate on containers with the same domain
+ types. The domains' indices need not be the same, but the result
+ will have a domain equal to the left operand's domain. For example, the
+ sum of an &array; <varname>A</varname> with a one-dimensional
+ interval [0,3] and an &array; <varname>B</varname> with
+ a one-dimensional interval [1,2] is well-defined because both
+ domains are one-dimensional intervals. The result is an &array;
+ with a one-dimensional interval [0,3]. Its first and last
+ entries equal <varname>A</varname>'s first and last entries, while
+ its middle two entries are the sums
+ <statement>A(1)+B(1)</statement> and
+ <statement>A(2)+B(2)</statement>. We assume zero is the
+ default value for the type of values stored
+ in <varname>B</varname>. A more complicated example of
+ adding two &array;s with different domains is illustrated in <xref
+ linkend="data_parallel-use-addition_example"></xref>. Code for
+ these &array;s could be
+ <programlisting>
+ Interval<1> H(0,2), I(1,3), J(2,4);
+ Array<2, double, Brick> A(I,I), B(J,H);
+ // ... fill A and B with values ...
+ ... = A + B;
+ </programlisting>Both <varname>A</varname> and
+ <varname>B</varname> have domains of two-dimensional intervals so
+ they may be added, but their domains' extents differ, as indicated
+ by the solid lines in the figure. The sum has domain equal to the
+ left operand's domain. Values with the same indices are added. For
+ example, <statement>A(2,2)</statement> and
+ <statement>B(2,2)</statement> are added. <varname>B</varname>'s
+ domain does not include index (1,1) so, when adding
+ <statement>A(1,1)</statement> and <statement>B(1,1)</statement>,
+ the default value for <varname>B</varname>'s value type is used.
+ Usually this is 0. Thus, <statement>A(1,1) +
+ B(1,1)</statement> equals <statement>9 + 0</statement>.</para>
+
+ <para>Operations with &array;s and scalar values are supported.
+ Conceptually, a scalar value can be thought of as an &array; with
+ any desired domain and having the same value everywhere. For
+ example, consider
+ <programlisting>
+ Array<1, double, Brick> D(Interval<1>(7,10));
+ D += 2*D + 7;
+ </programlisting><statement>2*D</statement> obeys the guidelines
+ because the scalar <statement>2</statement> can be thought of as
+ an array with the same domain as <varname>D</varname>. It has the
+ same value <statement>2</statement> everywhere. Likewise the
+ conceptual domain for the scalar <statement>7</statement> is the
+ same as <statement>2*D</statement>'s domain. Thus,
+ <statement>2*D(i) + 7</statement> is added to
+ <statement>D(i)</statement> wherever index i is in
+ <varname>D</varname>'s domain. In practice, the &toolkit; does
+ not first convert scalar values to arrays but instead uses them
+ directly in expressions.</para>
+
+ <para>Assignment to containers is also supported. The domain
+ types of the assignment's left-hand side and its right-hand side
+ must be the same. Their indices need not be the same, but they
+ must correspond. That is, the domains must be <glossterm
+ linkend="glossary-conformable_domains"><firstterm>conformable
+ domains</firstterm></glossterm>, or have the <quote>same
+ shape</quote>, i.e., have the same number of indices for each
+ dimension. For example, the one-dimensional interval [0,3] is
+ conformable to the one-dimensional interval [1,4] because they
+ both have the same number of indices in each dimension. The
+ domains of <varname>A</varname> and <varname>B</varname>, as
+ declared
+ <programlisting>
+ Interval<1> H(0,2), I(1,3), J(2,4), K(0,4);
+ Array<2, double, Brick> A(I,I), B(H,J), C(I,K);
+ </programlisting> are conformable because each dimension has the same
+ number of indices. <varname>A</varname> and <varname>C</varname>
+ are not conformable because, while their first dimensions are
+ conformable, their second dimensions are not conformable.
+ <varname>A</varname>'s second dimension has three indices while
+ <varname>C</varname>'s has five. We define <glossterm
+ linkend="glossary-conformable_containers"><firstterm>conformable
+ containers</firstterm></glossterm> to be containers with
+ conformable domains.</para>
+
+ <para>When assigning to a container, corresponding container
+ values are assigned. (Since the left-hand side and the right-hand
+ side are conformable, corresponding values exist.) In this code
+ fragment,
+ <programlisting>
+ Array<1, double, Brick> A(Interval<1>(0,1));
+ Array<1, double, Brick> B(Interval<1>(1,2));
+ A = B;
+ </programlisting> <statement>A(0)</statement> is assigned
+ <statement>B(1)</statement> and <statement>A(1)</statement> is
+ assigned <statement>B(2)</statement>.</para>
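The conformability rule and the assignment of corresponding values can be sketched for the one-dimensional case. Interval1, Array1, conformable, and assign are hypothetical stand-ins written for illustration; they are not the &pooma; classes, which perform this work through the assignment operator.

```cpp
#include <vector>

// Hypothetical one-dimensional stand-ins for POOMA's Interval and Array.
struct Interval1 {
  int first, last;                       // inclusive bounds
  int length() const { return last - first + 1; }
};

// Two one-dimensional domains are conformable if they contain the
// same number of indices.
inline bool conformable(const Interval1& a, const Interval1& b)
{ return a.length() == b.length(); }

struct Array1 {
  Interval1 domain;
  std::vector<double> values;
  explicit Array1(const Interval1& d) : domain(d), values(d.length(), 0.0) {}
  // Index using the domain's own indices, not zero-based offsets.
  double& operator()(int i) { return values[i - domain.first]; }
  double operator()(int i) const { return values[i - domain.first]; }
};

// Assign corresponding values.  Returns false, leaving lhs unchanged,
// if the domains are not conformable.
inline bool assign(Array1& lhs, const Array1& rhs) {
  if (!conformable(lhs.domain, rhs.domain)) return false;
  for (int k = 0; k < lhs.domain.length(); ++k)
    lhs(lhs.domain.first + k) = rhs(rhs.domain.first + k);
  return true;
}
```

For domains [0,1] and [1,2], assign copies the value at index 1 to index 0 and the value at index 2 to index 1, matching the fragment above.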
+
+ <para>Assigning a scalar value to an &array; also is supported,
+ but assigning an &array; to a scalar is not. A scalar value is
+ conformable to any domain because, conceptually, it can be viewed
+ as an &array; with any desired domain and having the same value
+ everywhere. Thus, the assignment <statement>B = 3</statement>
+ ensures every value in <varname>B</varname> equals 3. Even
+ though a scalar value is conformable to any &array;, it is not an
+ l-value so it cannot appear on the left-hand side of an
+ assignment.</para>
+
+ <para>Data-parallel expressions can involve typical mathematical
+ functions and output operations. For example,
+ <statement>sin(A)</statement> yields an &array; with values equal
+ to the sine of each of &array; <varname>A</varname>'s values.
+ <statement>dot(A,B)</statement> has values equaling the dot
+ product of corresponding values in &array;s <varname>A</varname>
+ and <varname>B</varname>. The contents of an entire &array; can
+ be easily printed to standard output. For example, the program
+ <programlisting>
+ Array<1, double, Brick> A(Interval<1>(0,2));
+ Array<1, double, Brick> B(Interval<1>(1,3));
+ A = 1.0;
+ B = 2.0;
+ std::cout << A-B << std::endl;
+ </programlisting> yields
+ <computeroutput>
+ (000:002:001) = 1 -1 -1</computeroutput>. The initial
+ <computeroutput>(000:002:001)</computeroutput> indicates the
+ &array;'s domain ranges from 0 to 2 with a stride of 1. The
+ three values in <statement>A-B</statement> follow.</para>
+
+ <para>So far, all of the above examples illustrating data-parallel
+ expressions and statements operate on all of a container's values.
+ Frequently, operating on a subset is useful. In &pooma;, a subset
+ of a container's values is called a view. Combining views and
+ data-parallel expressions will enable us to more succinctly and more
+ easily write the diffusion program. Views are discussed in the
+ next chapter.</para>
+
+ <!-- HERE -->
+
+ <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ orient="port" pgwide="0">
+ <title>Operators Permissible for Data-Parallel Expressions</title>
+
+ <tgroup cols="2" align="left">
+ <thead>
+ <row>
+ <entry></entry>
+ <entry>supported operators</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>unary operators</entry>
+ <entry>+, -, ~, !
+ HERE</entry>
+ </row>
+ <row>
+ <entry>binary operators</entry>
+ <entry>+, -, *, /, %, &, |, ^
+ HERE</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ orient="port" pgwide="0">
+ <title>Mathematical Operators Permissible for Data-Parallel Expressions</title>
+
+ <tgroup cols="2" align="left">
+ <thead>
+ <row>
+ <entry>function</entry>
+ <entry>effect</entry>
+ </row>
+ </thead>
+ <tfoot>
+ <row>
+ <entry>Every effort has been made to present accurate
+ information, but restrictions caused by the underlying
+ functions may further restrict the data-parallel
+ functions.</entry>
+ </row>
+ </tfoot>
+ <tbody>
+ <row>
+ <entry><statement>Array<T1> peteCast (const T1&, const Array<T>& A)</statement></entry>
+ <entry>Returns the casting of the array's values to type <type>T1</type>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> ldexp (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Multiplies <varname>A</varname>'s values by the
+ corresponding integral power of two in <varname>B</varname>.</entry>
+ </row>
+ <!-- HERE Reorder the above to be more sensible and add headings. -->
+ <row rowsep="1">
+ <entry>Trigonometric and Hyperbolic Operators</entry>
+ <entry><statement>#include <math.h></statement></entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> cos (const Array<T>& A)</statement></entry>
+ <entry>Returns the cosines of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> sin (const Array<T>& A)</statement></entry>
+ <entry>Returns the sines of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> tan (const Array<T>& A)</statement></entry>
+ <entry>Returns the tangents of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> acos (const Array<T1>& A)</statement></entry>
+ <entry>Returns the arc cosines of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> asin (const Array<T1>& A)</statement></entry>
+ <entry>Returns the arc sines of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> atan (const Array<T1>& A)</statement></entry>
+ <entry>Returns the arc tangents of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> cosh (const Array<T>& A)</statement></entry>
+ <entry>Returns the hyperbolic cosines of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> sinh (const Array<T>& A)</statement></entry>
+ <entry>Returns the hyperbolic sines of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> tanh (const Array<T>& A)</statement></entry>
+ <entry>Returns the hyperbolic tangents of the array's values.</entry>
+ </row>
+ <row rowsep="1">
+ <entry>Absolute Value and Rounding Operators</entry>
+ <entry><statement>#include <math.h></statement></entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> fabs (const Array<T1>& A)</statement></entry>
+ <entry>Returns the absolute values of the floating point
+ numbers in the array.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> ceil (const Array<T1>& A)</statement></entry>
+ <entry>For each of the array's values, returns the smallest integer
+ greater than or equal to it (as a floating point number).</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> floor (const Array<T1>& A)</statement></entry>
+ <entry>For each of the array's values, returns the largest integer
+ less than or equal to it (as a floating point number).</entry>
+ </row>
+ <row rowsep="1">
+ <entry>Powers, Exponentiation, and Logarithmic Operators</entry>
+ <entry><statement>#include <math.h></statement></entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> PETE_identity (const Array<T>& A)</statement></entry>
+ <entry>Returns the array. That is, it applies the identity operation.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> sqrt (const Array<T>& A)</statement></entry>
+ <entry>Returns the square roots of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> pow2 (const Array<T>& A)</statement></entry>
+ <entry>Returns the squares of <varname>A</varname>'s values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> pow3 (const Array<T>& A)</statement></entry>
+ <entry>Returns the cubes of <varname>A</varname>'s values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> pow4 (const Array<T>& A)</statement></entry>
+ <entry>Returns the fourth powers of <varname>A</varname>'s values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> exp (const Array<T>& A)</statement></entry>
+ <entry>Returns e raised to the power of each of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> log (const Array<T>& A)</statement></entry>
+ <entry>Returns the natural logarithms of the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> log10 (const Array<T>& A)</statement></entry>
+ <entry>Returns the base-10 logarithms of the array's values.</entry>
+ </row>
+ <row rowsep="1">
+ <entry>Operators Involving Complex Numbers</entry>
+ <entry><statement>#include <complex></statement></entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> real (const Array<complex<T&closeclose;& A)</statement></entry>
+ <entry>Returns the real parts of <varname>A</varname>'s complex numbers.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> imag (const Array<complex<T&closeclose;& A)</statement></entry>
+ <entry>Returns the imaginary parts of <varname>A</varname>'s complex numbers.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> abs (const Array<complex<T&closeclose;& A)</statement></entry>
+ <entry>Returns the absolute values (magnitudes) of
+ <varname>A</varname>'s complex numbers.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> abs (const Array<T>& A)</statement></entry>
+ <entry>Returns the absolute values of <varname>A</varname>'s values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> arg (const Array<complex<T&closeclose;& A)</statement></entry>
+ <entry>Returns the phase angles (in radians) of the
+ polar representations of <varname>A</varname>'s complex
+ numbers.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> norm (const Array<complex<T&closeclose;& A)</statement></entry>
+ <entry>Returns the squared absolute values of
+ <varname>A</varname>'s complex numbers.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<complex<T&closeclose; conj (const Array<complex<T&closeclose;& A)</statement></entry>
+ <entry>Returns the complex conjugates of
+ <varname>A</varname>'s complex numbers.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<complex<T&closeclose; polar (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns the complex numbers created from polar
+ coordinates (magnitudes and phase angles) in corresponding
+ arrays.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<complex<T&closeclose; polar (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns the complex numbers created from polar
+ coordinates with magnitude <varname>l</varname> and
+ phase angles in the array.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<complex<T&closeclose; polar (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns the complex numbers created from polar
+ coordinates with magnitudes in the array and phase
+ angle <varname>r</varname>.</entry>
+ </row>
+ <row rowsep="1">
+ <entry>Operators Involving Matrices and Tensors</entry>
+ <entry><statement>#include "Pooma/Tiny.h"</statement></entry>
+ </row>
+ <row>
+ <entry><statement>T trace (const Array<T>& A)</statement></entry>
+ <entry>Returns the sum of <varname>A</varname>'s diagonal
+ entries, viewed as a matrix.</entry>
+ </row>
+ <row>
+ <entry><statement>T det (const Array<T>& A)</statement></entry>
+ <entry>Returns the determinant of <varname>A</varname>, viewed as a matrix.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> transpose (const Array<T>& A)</statement></entry>
+ <entry>Returns the transpose of <varname>A</varname>, viewed as a matrix.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> symmetrize (const Array<T>& A)</statement></entry>
+ <entry>Returns the tensors of <varname>A</varname> with the
+ requested output symmetry.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> dot (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns the dot products of values in the two arrays.
+ Value type <type>T</type> equals the type of the
+ <function>dot</function> operating on <type>T1</type>
+ and <type>T2</type>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> dot (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns the dot products of values in the array
+ with <varname>r</varname>.
+ Value type <type>T</type> equals the type of the
+ <function>dot</function> operating on <type>T1</type>
+ and <type>T2</type>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> dot (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns the dot products of <varname>l</varname> with
+ values in the array. Value type <type>T</type> equals the type of the
+ <function>dot</function> operating on <type>T1</type>
+ and <type>T2</type>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> dot (const Array<T1>& A, const T2& B)</statement></entry>
+ <entry>Returns the dot products of values in the array
+ with <varname>B</varname>. Value type <type>T</type> equals the type of the
+ <function>dot</function> operating on <type>T1</type>
+ and <type>T2</type>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<Tensor<T&closeclose; outerProduct (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns tensors created by computing the outer product
+ of corresponding vectors in the two arrays. Value
+ type <type>T</type> equals the type of the product of
+ <type>T1</type> and <type>T2</type>. The vectors
+ must have the same length.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<Tensor<T&closeclose; outerProduct (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns tensors created by computing the outer product
+ of <varname>l</varname> with the vectors in the array. Value
+ type <type>T</type> equals the type of the product of
+ <type>T1</type> and <type>T2</type>. The vectors
+ must have the same length.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<Tensor<T&closeclose; outerProduct (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns tensors created by computing the outer product
+ of vectors in the array with <varname>r</varname>. Value
+ type <type>T</type> equals the type of the product of
+ <type>T1</type> and <type>T2</type>. The vectors
+ must have the same length.</entry>
+ </row>
+ <row>
+ <entry><statement>TinyMatrix<T> outerProductAsTinyMatrix (const Array<T1>& A, const
+ Array<T2>& B)</statement></entry>
+ <entry>Returns matrices created by computing the outer product
+ of corresponding vectors in the two arrays. Value
+ type <type>T</type> equals the type of the product of
+ <type>T1</type> and <type>T2</type>. The vectors must have
+ the same length.</entry>
+ </row>
+ <row>
+ <entry><statement>TinyMatrix<T> outerProductAsTinyMatrix (const T1& l, const
+ Array<T2>& A)</statement></entry>
+ <entry>Returns matrices created by computing the outer
+ product of <varname>l</varname> with the vectors in the array. Value
+ type <type>T</type> equals the type of the product of
+ <type>T1</type> and <type>T2</type>. The vectors must
+ have the same length.</entry>
+ </row>
+ <row>
+ <entry><statement>TinyMatrix<T> outerProductAsTinyMatrix (const
+ Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns matrices created by computing the outer
+ product of the vectors in the array
+ with <varname>r</varname>. Value
+ type <type>T</type> equals the type of the product of
+ <type>T1</type> and <type>T2</type>. The vectors must
+ have the same length.</entry>
+ </row>
+ <row rowsep="1">
+ <entry>Comparison Operators</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> max (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns the maximums of corresponding array values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> max (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns the maximums of <varname>l</varname> with the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> max (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns the maximums of the array's values with <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> min (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns the minimums of corresponding array values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> min (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns the minimums of <varname>l</varname> with the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<T> min (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns the minimums of the array's values with <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> LT (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns booleans from using the less-than
+ operator < to compare corresponding array values in
+ <varname>A</varname> and <varname>B</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> LT (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns booleans from using the less-than
+ operator < to compare <varname>l</varname> with the array's
+ values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> LT (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns booleans from using the less-than
+ operator < to compare the array's
+ values with <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> LE (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns booleans from using the less-than-or-equal
+ operator ≤ to compare array values in
+ <varname>A</varname> and <varname>B</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> LE (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns booleans from using the less-than-or-equal
+ operator ≤ to compare <varname>l</varname> with the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> LE (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns booleans from using the less-than-or-equal
+ operator ≤ to compare the array's values
+ with <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> GE (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns booleans from using the greater-than-or-equal
+ operator ≥ to compare array values in
+ <varname>A</varname> and <varname>B</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> GE (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns booleans from using the greater-than-or-equal
+ operator ≥ to compare <varname>l</varname> with the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> GE (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns booleans from using the greater-than-or-equal
+ operator ≥ to compare the array's values with <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> GT (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns booleans from using the greater-than
+ operator > to compare array values in
+ <varname>A</varname> and <varname>B</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> GT (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns booleans from using the greater-than
+ operator > to compare <varname>l</varname> with the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> GT (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns booleans from using the greater-than
+ operator > to compare the array's values with <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> EQ (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns booleans from determining whether
+ corresponding array values in <varname>A</varname> and
+ <varname>B</varname> are equal.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> EQ (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns booleans from determining whether
+ <varname>l</varname> equals the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> EQ (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns booleans from determining whether the array's values equal <varname>r</varname>.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> NE (const Array<T1>& A, const Array<T2>& B)</statement></entry>
+ <entry>Returns booleans from determining whether
+ corresponding array values in <varname>A</varname> and
+ <varname>B</varname> are not equal.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> NE (const T1& l, const Array<T2>& A)</statement></entry>
+ <entry>Returns booleans from determining whether
+ <varname>l</varname> does not equal the array's values.</entry>
+ </row>
+ <row>
+ <entry><statement>Array<bool> NE (const Array<T1>& A, const T2& r)</statement></entry>
+ <entry>Returns booleans from determining whether the
+ array's values are not equal to <varname>r</varname>.</entry>
+ </row>
+ <!-- FIXME: Add dotdot from src/Array/PoomaArrayOperators.h if it is defined. -->
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>We need to explain that proper types must be chosen. For
+ example, cos on complex and double works but ceil on complex does not.
+ HERE</para>
+
+
+ <!-- HERE -->
+
+ </section>
+
+
+ <section id="data_parallel-implementation">
+ <title>Implementation of Data-Parallel Statements</title>
+
+ <para>Data-parallel statements involving containers occur
+ frequently in the inner loops of scientific programs so their
+ efficient execution is important. A naive implementation for
+ these statements may create and destroy containers holding
+ intermediate values, slowing execution considerably.
+ In 1995, Todd <!-- FIXME: Add citations to vandevoorde-95 and
+ veldhuizen-95. --> Veldhuizen and David Vandevoorde developed an
+ expression-template technique to transform arithmetic expressions
+ involving array-like containers into efficient loops without using
+ temporaries. Despite its perceived complexity, &pooma;
+ incorporated the technology. The resulting framework, &pete;, the
+ <application>Portable Expression Template Engine</application>,
+ is also available separately from &pooma; at
+ <ulink url="http://www.acl.lanl.gov/pete/"></ulink>.</para>
+
+ <para>In this section, we first describe how a naive
+ implementation may slow execution. Then, we describe &pete;'s
+ faster implementation. Rather than being evaluated immediately, a
+ data-parallel statement is first converted into a parse tree. The
+ parse tree has two representations. Its run-time representation
+ holds run-time values. Its compile-time representation records
+ the types of the tree's values. After a parse tree for the entire
+ statement is constructed, it is evaluated. Since it is a
+ data-parallel statement, this evaluation involves at least one
+ loop. At run time, each loop iteration computes and assigns one
+ container value. At compile time, when
+ the code for the loop iteration is produced, the parse tree's
+ types are traversed and code is produced without the need for any
+ intermediate values. We present the implementation in <xref
+ linkend="data_parallel-implementation-pete"></xref>, but first we
+ explain the difficulties caused by the naive implementation.</para>
+
+ <section id="data_parallel-implementation-naive">
+ <title>Naive Implementation</title>
+
+ <para>A conventional implementation to evaluate data-parallel
+ expressions might overload arithmetic operator functions.
+ Consider this program fragment:
+ <programlisting>
+ Interval<1> I(0,3);
+ Array<1, double, Brick> A(I), B(I);
+ A = 1.0;
+ B = 2.0;
+ A += -A + 2*B;
+ std::cout << A << std::endl;
+ </programlisting> Our goal is to transform the data-parallel
+ statement <statement>A += -A + 2*B</statement> into a single
+ loop, preferably without intermediary containers. To simplify
+ notation, let <type>Ar</type> abbreviate the type
+ <type>Array<1, double, Brick></type>.</para>
+
+ <para>Using overloaded arithmetic operators would require using
+ intermediate containers to evaluate the statement. For example,
+ <!-- FIXME: What is the proper tag for an inline function
+ prototype? --> the sum's left operand <statement>-A</statement>
+ would be computed by the overloaded unary operator <statement>Ar
+ operator-(const Ar&)</statement>, which would produce an
+ intermediate &array;. <statement>Ar operator*(double,
+ const Ar&)</statement> would produce another intermediate
+ &array; holding <statement>2*B</statement>. Yet another
+ intermediate container would hold their sum, all before
+ performing the assignment. Thus, three intermediate containers
+ would be created and destroyed. Below, we show these are
+ unnecessary.</para>
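+ <para>As a rough illustration of this cost, the following sketch
+ counts the containers built while evaluating the statement with
+ ordinary overloaded operators. <type>NaiveArray</type> and
+ <function>runNaiveExample</function> are hypothetical names invented
+ for this example; they are not part of &pooma;, and only
+ constructions through the sizing constructor are counted.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a one-dimensional container of doubles.
// It is not the POOMA Array class; it exists only to count constructions.
struct NaiveArray {
  static int constructed;  // containers built through the sizing constructor
  std::vector<double> values;

  NaiveArray(std::size_t n, double v) : values(n, v) { ++constructed; }
};
int NaiveArray::constructed = 0;

// Each overloaded operator eagerly builds and returns a complete result.
NaiveArray operator-(const NaiveArray& a) {
  NaiveArray r(a.values.size(), 0.0);
  for (std::size_t i = 0; i < a.values.size(); ++i) r.values[i] = -a.values[i];
  return r;
}

NaiveArray operator*(double s, const NaiveArray& a) {
  NaiveArray r(a.values.size(), 0.0);
  for (std::size_t i = 0; i < a.values.size(); ++i) r.values[i] = s * a.values[i];
  return r;
}

NaiveArray operator+(const NaiveArray& a, const NaiveArray& b) {
  assert(a.values.size() == b.values.size());
  NaiveArray r(a.values.size(), 0.0);
  for (std::size_t i = 0; i < a.values.size(); ++i)
    r.values[i] = a.values[i] + b.values[i];
  return r;
}

NaiveArray& operator+=(NaiveArray& a, const NaiveArray& b) {
  assert(a.values.size() == b.values.size());
  for (std::size_t i = 0; i < a.values.size(); ++i) a.values[i] += b.values[i];
  return a;
}

// Evaluates the statement from the text on four-element containers.
NaiveArray runNaiveExample() {
  NaiveArray A(4, 1.0), B(4, 2.0);
  A += -A + 2 * B;  // three intermediates: -A, 2*B, and their sum
  return A;
}
```

+ <para>Calling <function>runNaiveExample</function> raises the
+ counter by five: once each for <varname>A</varname> and
+ <varname>B</varname>, and once for each of the three intermediates.
+ Each intermediate also costs a full pass over its elements.</para>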
+ </section>
+
+ <section id="data_parallel-implementation-pete">
+ <title>Portable Expression Template Engine</title>
+
+ <para>&pooma; uses &pete;, the <application>Portable Expression
+ Template Engine</application> framework, to evaluate
+ data-parallel statements using efficient loops without
+ intermediate values. &pete; uses expression-template technology.
+ Rather than eagerly evaluating a data-parallel statement's
+ subexpressions, it defers evaluation, building a parse
+ tree of the required computations. The parse tree's type records
+ the types of each of its subtrees. Then, the parse tree is
+ evaluated using an evaluator determined by the left-hand side's
+ type. This container type determines how to loop through its
+ domain. In each loop iteration, the corresponding value of the
+ right-hand side is evaluated. No intermediate loops or temporary
+ values are needed.</para>
+
+ <figure float="1" id="data_parallel-implementation-pete-tree_figure">
+ <title>Annotated Parse Tree for <statement>-A + 2*B</statement></title>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="figures/data-parallel.101" format="EPS" align="center"></imagedata>
+ </imageobject>
+ <textobject>
+ <phrase>A parse tree for the statement is produced.</phrase>
+ </textobject>
+ <caption>
+ <para>The parse tree for <statement>-A + 2*B</statement> with
+ type annotations. The complete type of a node equals the
+ concatenation of the preorder traversal of annotated types.</para>
+ </caption>
+ </mediaobject>
+ </figure>
+
+ <para>Before explaining the implementation, let us illustrate
+ using our example statement <statement>A += -A + 2*B</statement>.
+ Evaluating the right-hand side creates a parse tree similar to
+ the one in <xref
+ linkend="data_parallel-implementation-pete-tree_figure"></xref>.
+ For example, the overloaded unary minus operator yields a tree
+ node representing <statement>-A</statement>, having a unary-minus
+ function object, and having type
+ <type>Expression<UnaryNode<OpMinus,Ar&closeclose;</type>.
+ The binary nodes continue the construction process yielding a
+ parse tree object for the entire right-hand side and having type
+ <type>Expression<BinaryNode<OpAdd, UnaryNode<OpMinus,
+ Ar>,
+ BinaryNode<OpMultiply, Scalar<int>, Ar&closeclose; ></type>.
+ Evaluating the left-hand side yields an object
+ representing <varname>A</varname>.</para>
+
+ <para>Finally, the assignment operator <statement>+=</statement>
+ calls the <function>evaluate</function> function corresponding to
+ the left-hand side's type. At compile time, it produces the code
+ for the computation. Since this templated function is
+ specialized on the type of the left-hand side, it generates a
+ loop through the left-hand side's container. In the loop body,
+ the <function>forEach</function> function produces code for the
+ right-hand side expression at a specific position using a
+ post-order parse-tree traversal. At a leaf, this evaluation
+ queries the leaf's container for a specified value or extracts a
+ scalar value. At an interior node, its children's results are
+ combined using its function operator. One loop performs the
+ entire assignment. It is important to note that the type of the
+ entire right-hand side is known at compile time. Thus, all the
+ calls to <function>evaluate</function>,
+ <function>forEach</function>, and the function objects' operators
+ can be inlined at compile time to yield simple code without
+ any temporary containers and, one hopes, as fast as hand-written
+ loops!</para>
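+ <para>The scheme can be sketched in miniature. The following is a
+ simplified illustration, not &pete;'s actual code: the names
+ <type>Vec</type>, <type>VecRef</type>, <type>ScalarLeaf</type>,
+ <type>UnaryExpr</type>, <type>BinaryExpr</type>, <type>Expr</type>,
+ and <type>OpNegate</type> are hypothetical, all values are assumed
+ to be &double;s, and each node evaluates itself through a
+ <function>read</function> member rather than through a separate
+ <function>forEach</function> traversal.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of the expression-template idea. None of these
// classes are the PETE or POOMA definitions, and all values are doubles.

// Function objects naming the operations; they hold no data.
struct OpNegate   { double operator()(double a) const { return -a; } };
struct OpAdd      { double operator()(double a, double b) const { return a + b; } };
struct OpMultiply { double operator()(double a, double b) const { return a * b; } };

struct Vec;  // the container, defined below

// Leaves: a scalar held by value and a container held by reference.
struct ScalarLeaf {
  double value;
  explicit ScalarLeaf(double v) : value(v) {}
  double read(std::size_t) const { return value; }
};
struct VecRef {
  const Vec* vec;
  explicit VecRef(const Vec& v) : vec(&v) {}
  double read(std::size_t i) const;  // defined once Vec is complete
};

// Interior nodes: the operation is a type, the children are stored.
template <class Op, class Child>
struct UnaryExpr {
  Child child;
  explicit UnaryExpr(const Child& c) : child(c) {}
  double read(std::size_t i) const { return Op()(child.read(i)); }
};
template <class Op, class Left, class Right>
struct BinaryExpr {
  Left left;
  Right right;
  BinaryExpr(const Left& l, const Right& r) : left(l), right(r) {}
  double read(std::size_t i) const { return Op()(left.read(i), right.read(i)); }
};

// Wrapper marking a complete parse tree, so operators can be overloaded on it.
template <class Tree>
struct Expr {
  Tree tree;
  explicit Expr(const Tree& t) : tree(t) {}
  double read(std::size_t i) const { return tree.read(i); }
};

struct Vec {
  std::vector<double> values;
  Vec(std::size_t n, double v) : values(n, v) {}

  // One loop evaluates the whole right-hand side, element by element.
  template <class Tree>
  Vec& operator+=(const Expr<Tree>& rhs) {
    for (std::size_t i = 0; i < values.size(); ++i) values[i] += rhs.read(i);
    return *this;
  }
};

inline double VecRef::read(std::size_t i) const {
  assert(i < vec->values.size());
  return vec->values[i];
}

// Overloaded operators build tree nodes instead of computing values.
inline Expr<UnaryExpr<OpNegate, VecRef> > operator-(const Vec& a) {
  typedef UnaryExpr<OpNegate, VecRef> Tree_t;
  return Expr<Tree_t>(Tree_t(VecRef(a)));
}
inline Expr<BinaryExpr<OpMultiply, ScalarLeaf, VecRef> >
operator*(double s, const Vec& a) {
  typedef BinaryExpr<OpMultiply, ScalarLeaf, VecRef> Tree_t;
  return Expr<Tree_t>(Tree_t(ScalarLeaf(s), VecRef(a)));
}
template <class L, class R>
inline Expr<BinaryExpr<OpAdd, L, R> >
operator+(const Expr<L>& l, const Expr<R>& r) {
  typedef BinaryExpr<OpAdd, L, R> Tree_t;
  return Expr<Tree_t>(Tree_t(l.tree, r.tree));
}
```

+ <para>With these definitions, <statement>A += -A + 2 * B</statement>
+ on four-element <type>Vec</type>s holding 1.0 and 2.0 leaves every
+ element of <varname>A</varname> equal to 4.0. The statement runs as
+ a single loop and constructs no additional <type>Vec</type>.</para>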
+
+ <para>To implement this scheme, we need &pooma; code both to
+ create the parse tree and to evaluate it. We describe parse tree
+ creation first. Parse trees consist of leaves,
+ <type>UnaryNode</type>s, <type>BinaryNode</type>s, and
+ <type>TrinaryNode</type>s. Since <type>TrinaryNode</type>s are
+ similar to <type>BinaryNode</type>s, we do not describe them. A
+ <type>BinaryNode</type>'s three template parameters specify its
+ operation and the types of its two children:
+ <variablelist>
+ <varlistentry>
+ <term><statement>Op</statement></term>
+ <listitem>
+ <para>the type of the node's operation. For example, the
+ <type>OpAdd</type> type represents adding two operands
+ together.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><statement>Left</statement></term>
+ <listitem>
+ <para>the type of the left child.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><statement>Right</statement></term>
+ <listitem>
+ <para>the type of the right child.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ The node stores its left and right children.</para>
+
+ <para><type>BinaryNode</type> does not need to store any
+ representation of the node's operation. Instead the
+ <type>Op</type> type is an empty structure declaring a function
+ object. For example, <type>OpAdd</type>'s function object is
+ declared as
+ <programlisting>
+ template<class T1, class T2>
+ inline typename BinaryReturn<T1, T2, OpAdd>::Type_t
+ operator()(const T1 &a, const T2 &b) const
+ {
+ return (a + b);
+ }
+ </programlisting> Since it has two template arguments, it can be
+ applied to operands of any type. Because of &cc; type
+ conversions, the type of the result is determined using the
+ <type>BinaryReturn</type> traits class. Consider adding an &int;
+ and a &double;. <type>BinaryReturn<int, double,
+ OpAdd>::Type_t</type> equals &double;. Inlining the function
+ ensures all this syntax is eliminated, leaving behind just an
+ addition.</para>
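+ <para>A traits class of this kind can be sketched as follows.
+ <type>SketchBinaryReturn</type> and <type>SketchOpAdd</type> are
+ hypothetical simplifications written for this example and are not
+ &pete;'s definitions. The sketch assumes that only mixed &int; and
+ &double; operands need promotion.</para>

```cpp
#include <cassert>

struct SketchOpAdd;  // declared early so the traits class can name it

// Primary template: by default the result has the left operand's type.
template <class T1, class T2, class Op>
struct SketchBinaryReturn {
  typedef T1 Type_t;
};

// Mixed int/double arithmetic promotes to double, whatever the operation.
template <class Op>
struct SketchBinaryReturn<int, double, Op> {
  typedef double Type_t;
};
template <class Op>
struct SketchBinaryReturn<double, int, Op> {
  typedef double Type_t;
};

// Empty function object: the return type is looked up in the traits class.
struct SketchOpAdd {
  template <class T1, class T2>
  typename SketchBinaryReturn<T1, T2, SketchOpAdd>::Type_t
  operator()(const T1& a, const T2& b) const {
    return a + b;
  }
};
```

+ <para>Here <statement>SketchOpAdd()(1, 2.5)</statement> returns the
+ &double; 3.5. Without the specializations, the primary template
+ would select &int; and the fractional part would be lost.</para>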
+
+ <para><type>UnaryNode</type>s are similar but have only two
+ template parameters and store only one child.</para>
+
+ <para>Parse tree leaves are created by the
+ <type>CreateLeaf</type> class and its specializations. The
+ default leaf is a scalar so it has the most general definition:
+ <programlisting>
+ template<class T>
+ struct CreateLeaf
+ {
+ typedef Scalar<T> Leaf_t;
+
+ inline static
+ Leaf_t make(const T &a)
+ {
+ return Scalar<T>(a);
+ }
+ };
+ </programlisting> The <type>Scalar</type> class stores the scalar
+ value. <type>CreateLeaf</type>'s <type>Leaf_t</type> member type
+ names the type of the leaf it creates. The <statement>static</statement>
+ <function>make</function> function is invoked by an overloaded
+ operator function when creating the new node's children.</para>
+
+ <para>The <type>CreateLeaf</type> class is specialized for &array;s:
+ <programlisting>
+ template<int Dim, class T, class EngineTag>
+ struct CreateLeaf<Array<Dim, T, EngineTag> >
+ {
+ typedef Array<Dim, T, EngineTag> Input_t;
+ typedef Reference<Input_t> Leaf_t;
+ typedef Leaf_t Return_t;
+ inline static
+ Return_t make(const Input_t &a)
+ {
+ return Leaf_t(a);
+ }
+ };
+ </programlisting> The &array; object is stored as a
+ <type>Reference</type>, rather than by value as scalars are.</para>
+
+ <para>To simplify the next step of overloading arithmetic
+ operators, a parse tree's topmost type is an
+ <type>Expression</type>.</para>
+
+ <para>Now that we have defined the node classes, the &cc;
+ arithmetic operators must be overloaded to return the appropriate
+ parse tree. For example, the unary minus operator
+ <function>operator-</function>, overloaded to accept an &array;
+ argument, should create a <type>UnaryNode</type> having an &array;
+ as its child, which will be a leaf:
+ <programlisting>
+ template<int D1,class T1,class E1>
+ inline typename MakeReturn<UnaryNode<OpUnaryMinus,
+ typename CreateLeaf<Array<D1,T1,E1> >::Leaf_t> >::Expression_t
+ operator-(const Array<D1,T1,E1> & l)
+ {
+ typedef UnaryNode<OpUnaryMinus,
+ typename CreateLeaf<Array<D1,T1,E1> >::Leaf_t> Tree_t;
+ return MakeReturn<Tree_t>::make(Tree_t(
+ CreateLeaf<Array<D1,T1,E1> >::make(l)));
+ }
+ </programlisting> <type>Tree_t</type> specifies the node's unique
+ type. Constructing the object first involves creating a leaf
+ containing the &array; reference through the call to
+ <function>CreateLeaf<Array<D1,T1,E1>
+ >::make</function>. The call to
+ <function>MakeReturn<Tree_t>::make</function> permits
+ programmers to store trees in different formats. The &pooma;
+ implementation stores them as <type>Expression</type>s. The
+ function's return type names the same tree type as the
+ <statement>return</statement> statement, except that it extracts the
+ resulting type from <type>MakeReturn</type>'s internal
+ <type>Expression_t</type> type.</para>
+
+ <para>Specializing all the operators for &array;s using such
+ complicated code is likely to be error-prone, so &pete; provides a way
+ to automate it. Using its <command>MakeOperators</command>
+ command with this input:
+ <programlisting>
+ classes
+ -----
+ ARG = "int D[n],class T[n],class E[n]"
+ CLASS = "Array<D[n],T[n],E[n]>"
+ </programlisting> automatically generates code for all the needed operators.
+ The <quote>[n]</quote> strings are used to number arguments for binary
+ and ternary operators.</para>
+
+ <para>Assignment operators must also be specialized for &array;.
+ Inside the &array; class definition, each such operator just
+ invokes the <function>assign</function> function with a corresponding
+ function object. For example, <function>operator+=</function>
+ invokes <statement>assign(*this, rhs, OpAddAssign())</statement>.
+ <varname>rhs</varname> is the parse tree object for the right-hand
+ side. Calling this function invokes
+ <function>evaluate</function>, which begins the evaluation.</para>
+
+ <para>Before we explain the evaluation, let us summarize the
+ effect of the code so far described. If we are considering run
+ time, parse trees for the left-hand and right-hand sides have been
+ constructed. If we are considering compile time, the types of
+ these parse trees are known. At compile time, the
+ <function>evaluate</function> function described below will
+ generate a loop through the left-hand side container's domain.
+ The loop's body will have code computing a container's value. At
+ run time, this code will read values from containers, but the
+ run-time parse tree object itself will not be traversed!</para>
+
+ <para>We now explore the evaluation, concentrating on compile
+ time, not run time. <function>evaluate</function> is an
+ overloaded function specialized on the type of the left-hand side.
+ In our example, the left-hand side is a one-dimensional &array;,
+ so <function>evaluate(const Ar& a, const Op& op, const
+ RHS& rhs)</function> is inlined into a loop like
+ <programlisting>
+ int end = a's domain[0].first() + a's domain[0].length();
+ for (int i = a's domain[0].first(); i < end; ++i)
+ op(a(i), rhs.read(i));
+ </programlisting> <varname>a</varname> is the array,
+ <varname>op</varname> is a function object representing the
+ assignment operation, and <varname>rhs</varname> is the right-hand
+ side's parse tree.</para>
+
+ <para>Evaluating <statement>rhs.read(i)</statement> inlines into a
+ call to the <function>forEach</function> function. This function
+ performs a <emphasis>compile-time</emphasis> post-order parse-tree
+ traversal. Its general form is
+ <programlisting>
+ forEach(const Expression& e, const LeafTag& f, const CombineTag& c).
+ </programlisting> That is, it traverses the nodes of the
+ <type>Expression</type> object <varname>e</varname>. At
+ leaves, it applies the operation specified by
+ <type>LeafTag</type> <varname>f</varname>. At interior
+ nodes, it combines the results using the <type>CombineTag</type>
+ operator <varname>c</varname>. It inlines into a call to
+ <programlisting>
+ ForEach<Expression, LeafTag, CombineTag>::apply(e, f, c).
+ </programlisting> The <function>apply</function> function continues
+ the traversal through the tree. For our example,
+ <type>LeafTag</type> equals <type>EvalLeaf<1></type>, and
+ <type>CombineTag</type> equals <type>OpCombine</type>. The former
+ indicates that, when reaching a leaf, the leaf should be a
+ one-dimensional container which should be evaluated
+ at the position stored in the <type>EvalLeaf</type> object. The
+ <type>OpCombine</type> class applies an interior node's
+ <type>Op</type> to the results of its children.</para>
+
+ <para><type>ForEach</type> structures are specialized for the
+ various node types. For example, the specialization for
+ <type>UnaryNode</type> is
+ <programlisting>
+ template<class Op, class A, class FTag, class CTag>
+ struct ForEach<UnaryNode<Op, A>, FTag, CTag>
+ {
+ typedef typename ForEach<A, FTag, CTag>::Type_t TypeA_t;
+ typedef typename Combine1<TypeA_t, Op, CTag>::Type_t Type_t;
+ inline static
+ Type_t apply(const UnaryNode<Op, A> &expr, const FTag &f,
+ const CTag &c)
+ {
+ return Combine1<TypeA_t, Op, CTag>::
+ combine(ForEach<A, FTag, CTag>::apply(expr.child(), f, c), c);
+ }
+ };
+ </programlisting> Since this structure is specialized for
+ <type>UnaryNode</type>s, the first parameter of its
+ <statement>static</statement> <function>apply</function> function
+ is a <type>UnaryNode</type>. After recursively traversing its child,
+ it invokes the combination function indicated by the
+ <type>Combine1</type> traits class. In our example, the
+ <varname>c</varname> function object should be applied. Other
+ combiners have different roles. For example, using the
+ <type>NullCombine</type> tag indicates the child's result should
+ not be combined; the traversal occurs just for its side effects.</para>
+
+ <para>Leaves are handled by the default, unspecialized
+ <type>ForEach</type> definition:
+ <programlisting>
+ template<class Expr, class FTag, class CTag>
+ struct ForEach
+ {
+ typedef typename LeafFunctor<Expr, FTag>::Type_t Type_t;
+ inline static
+ Type_t apply(const Expr &expr, const FTag &f, const CTag &)
+ {
+ return LeafFunctor<Expr, FTag>::apply(expr, f);
+ }
+ };
+ </programlisting> Thus, <type>LeafFunctor</type>'s
+ <function>apply</function> member is called. <type>Expr</type>
+ represents the expression type, e.g., an &array;, and
+ <type>FTag</type> is the <type>LeafTag</type>, e.g.,
+ <type>EvalLeaf</type>. The <type>LeafFunctor</type> specialization
+ for &array; passes the index stored by the <type>EvalLeaf</type>
+ object to the &array;'s engine, which returns the corresponding
+ value.</para>
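+ <para>The traversal can be sketched in a simplified, self-contained
+ form. The code below is a miniature placed in a hypothetical
+ namespace <type>sketch</type>, not the &pete; sources. It assumes
+ that every result is a &double;, that leaves are scalars or
+ references to <type>std::vector</type>s, and that only the
+ <type>OpCombine</type> combiner is needed. The tag
+ <type>EvalLeaf1</type> and the helper
+ <function>evalExampleAt</function> are invented for
+ illustration.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical miniature, not the PETE sources

// Function objects naming the operations; they hold no data.
struct OpUnaryMinus { double operator()(double a) const { return -a; } };
struct OpAdd        { double operator()(double a, double b) const { return a + b; } };
struct OpMultiply   { double operator()(double a, double b) const { return a * b; } };

// Leaves: a scalar stored by value and a container stored by reference.
template <class T>
struct Scalar {
  T value;
  explicit Scalar(const T& v) : value(v) {}
};
struct VectorRef {
  const std::vector<double>* data;
  explicit VectorRef(const std::vector<double>& v) : data(&v) {}
};

// Interior nodes store their children; the operation lives only in the type.
template <class Op, class Child>
struct UnaryNode {
  Child child_;
  explicit UnaryNode(const Child& c) : child_(c) {}
  const Child& child() const { return child_; }
};
template <class Op, class Left, class Right>
struct BinaryNode {
  Left left_;
  Right right_;
  BinaryNode(const Left& l, const Right& r) : left_(l), right_(r) {}
  const Left& left() const { return left_; }
  const Right& right() const { return right_; }
};

struct EvalLeaf1 {  // leaf tag carrying the position to evaluate
  std::size_t index;
  explicit EvalLeaf1(std::size_t i) : index(i) {}
};
struct OpCombine {};  // combine tag: apply the node's own Op

// LeafFunctor says how each leaf type answers a leaf tag.
template <class Leaf, class FTag> struct LeafFunctor;
template <class T>
struct LeafFunctor<Scalar<T>, EvalLeaf1> {
  static double apply(const Scalar<T>& s, const EvalLeaf1&) { return s.value; }
};
template <>
struct LeafFunctor<VectorRef, EvalLeaf1> {
  static double apply(const VectorRef& v, const EvalLeaf1& f) {
    assert(f.index < v.data->size());
    return (*v.data)[f.index];
  }
};

// Primary template: anything that is not a node is treated as a leaf.
template <class Expr, class FTag, class CTag>
struct ForEach {
  static double apply(const Expr& e, const FTag& f, const CTag&) {
    return LeafFunctor<Expr, FTag>::apply(e, f);
  }
};
// Nodes: visit the children first, then apply the node's operation.
template <class Op, class A, class FTag>
struct ForEach<UnaryNode<Op, A>, FTag, OpCombine> {
  static double apply(const UnaryNode<Op, A>& e, const FTag& f, const OpCombine& c) {
    return Op()(ForEach<A, FTag, OpCombine>::apply(e.child(), f, c));
  }
};
template <class Op, class A, class B, class FTag>
struct ForEach<BinaryNode<Op, A, B>, FTag, OpCombine> {
  static double apply(const BinaryNode<Op, A, B>& e, const FTag& f, const OpCombine& c) {
    return Op()(ForEach<A, FTag, OpCombine>::apply(e.left(), f, c),
                ForEach<B, FTag, OpCombine>::apply(e.right(), f, c));
  }
};

// Builds the tree for -a + 2*b by hand and evaluates it at position i.
inline double evalExampleAt(const std::vector<double>& a,
                            const std::vector<double>& b, std::size_t i) {
  typedef UnaryNode<OpUnaryMinus, VectorRef> Neg_t;
  typedef BinaryNode<OpMultiply, Scalar<int>, VectorRef> Mul_t;
  typedef BinaryNode<OpAdd, Neg_t, Mul_t> Sum_t;
  VectorRef la(a), lb(b);
  Neg_t neg(la);
  Mul_t mul(Scalar<int>(2), lb);
  Sum_t sum(neg, mul);
  return ForEach<Sum_t, EvalLeaf1, OpCombine>::apply(sum, EvalLeaf1(i), OpCombine());
}

}  // namespace sketch
```

+ <para>For four-element vectors <varname>a</varname> and
+ <varname>b</varname> holding 1.0 and 2.0,
+ <statement>sketch::evalExampleAt(a, b, i)</statement> yields 3.0 at
+ each position <varname>i</varname>, the value of
+ <statement>-a + 2*b</statement> there. Which specialization is used
+ at each node is decided entirely by the tree's type.</para>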
+
+ <para>If one uses an aggressively optimizing compiler, code
+ resulting from the <function>evaluate</function> function
+ corresponds to this pseudocode:
+ <programlisting>
+ int end = A.domain[0].first() + A.domain[0].length();
+ for (int i = A.domain[0].first(); i < end; ++i)
+ A.engine(i) += -A.engine.read(i) + 2 * B.engine.read(i);
+ </programlisting> The loop iterates through <varname>A</varname>'s
+ domain, using &array;'s engines to obtain values and assigning
+ values. Notice there is no use of the run-time parse tree so the
+ optimizer can eliminate the code to construct it. All the work to
+ construct the parse tree by overloading operators is unimportant
+ at run time, but it certainly helped the compiler produce improved
+ code.</para>
+
+ <para>&pete;'s expression template technology may be complicated,
+ using parse trees and their types, but the code it produces is
+ not. Using the technology is also easy. All data-parallel
+ statements are automatically converted. In the next chapter, we
+ explore views of containers, permitting use of container subsets
+ and making data-parallel expressions even more useful.</para>
+ </section>
+
+ </section>
+
+ </chapter>
+
+
<chapter id="sequential">
<title>Writing Sequential Programs</title>
*************** HERE</para>
*** 297,303 ****
<para>FIXME: Explain the format of each section.
HERE</para>
! <para>FIXME: Explain the order of the sections.
HERE</para>
<para>Proposed order. Basically follow the order in the proposed
--- 1824,1830 ----
<para>FIXME: Explain the format of each section.
HERE</para>
! <para>FIXME: Explain the order of the sections.
HERE</para>
<para>Proposed order. Basically follow the order in the proposed
*************** HERE</para>
*** 475,490 ****
<function>finalize</function>. These functions respectively
prepare and shut down &pooma;'s run-time structures.</para>
! <section id="sequential-begin_end-files">
! <title>Files</title>
<programlisting>
#include "Pooma/Pooma.h" // or "Pooma/Arrays.h" or "Pooma/Fields.h" or ...
</programlisting>
- </section>
! <section id="sequential-begin_end-declarations">
! <title>Declarations</title>
<funcsynopsis>
<funcprototype>
--- 2002,2014 ----
<function>finalize</function>. These functions respectively
prepare and shut down &pooma;'s run-time structures.</para>
! <bridgehead id="sequential-begin_end-files" renderas="sect2">Files</bridgehead>
<programlisting>
#include "Pooma/Pooma.h" // or "Pooma/Arrays.h" or "Pooma/Fields.h" or ...
</programlisting>
! <bridgehead id="sequential-begin_end-declarations" renderas="sect2">Declarations</bridgehead>
<funcsynopsis>
<funcprototype>
*************** HERE</para>
*** 520,529 ****
</paramdef>
</funcprototype>
</funcsynopsis>
- </section>
! <section id="sequential-begin_end-description">
! <title>Description</title>
<para>Before its use, the &poomatoolkit; must be initialized by a
call to <function>initialize</function>. This usually occurs in
--- 2044,2051 ----
</paramdef>
</funcprototype>
</funcsynopsis>
! <bridgehead id="sequential-begin_end-description" renderas="sect2">Description</bridgehead>
<para>Before its use, the &poomatoolkit; must be initialized by a
call to <function>initialize</function>. This usually occurs in
*************** HERE</para>
*** 572,581 ****
<para>Including almost any &pooma; header file, rather than just
<filename class="headerfile">Pooma/Pooma.h</filename> suffices
since most other &pooma; header files include it.</para>
- </section>
! <section id="sequential-begin_end-example">
! <title>Example Program</title>
<para>Since every &pooma; program must call
<function>initialize</function> and
--- 2094,2101 ----
<para>Including almost any &pooma; header file, rather than just
<filename class="headerfile">Pooma/Pooma.h</filename> suffices
since most other &pooma; header files include it.</para>
! <bridgehead id="sequential-begin_end-example" renderas="sect2">Example Program</bridgehead>
<para>Since every &pooma; program must call
<function>initialize</function> and
*************** HERE</para>
*** 584,599 ****
use.</para>
&initialize-finalize;
- </section>
</section><!-- end sequential-begin_end -->
<section id="sequential-options">
<title>&pooma; Command-line Options</title>
<para>Every &pooma; program accepts a set of &pooma;-specific
command-line options to set values at run-time.</para>
<section id="sequential-options-list">
<title>Options Summary</title>
--- 2104,2163 ----
use.</para>
&initialize-finalize;
</section><!-- end sequential-begin_end -->
+
+ <section id="sequential-global">
+ <title>Global Variables</title>
+
+ <para>&pooma; makes a few global variables available after
+ initialization.</para>
+
+ <table frame="none" colsep="0" rowsep="0" tocentry="1"
+ orient="port" pgwide="0">
+ <title>&pooma; Global Variables</title>
+
+ <tgroup cols="2" align="left">
+ <thead>
+ <row>
+ <entry>variable</entry>
+ <entry>description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>&inform; <varname>pinfo</varname></entry>
+ <entry>output stream used to print informative messages to the
+ user while the program executes. The stream accepts a
+ superset of standard output operations.</entry>
+ </row>
+ <row>
+ <entry>&inform; <varname>pwarn</varname></entry>
+ <entry>HERE output stream used to print warning messages to the
+ user while the program executes. The stream accepts a
+ superset of standard output operations.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ </section>
+
+ <!-- HERE -->
+
<section id="sequential-options">
<title>&pooma; Command-line Options</title>
<para>Every &pooma; program accepts a set of &pooma;-specific
command-line options to set values at run-time.</para>
+ <para>QUESTION: Should I defer documenting &options; to the
+ reference manual, instead just listing commonly used options in
+ the previous section?
+
+ UNFINISHED</para>
+
<section id="sequential-options-list">
<title>Options Summary</title>
*************** HERE</para>
*** 601,614 ****
<varlistentry>
<term><parameter class="option">&dashdash;pooma-info</parameter></term>
<listitem>
! <para>
! HERE Who uses this?</para>
</listitem>
</varlistentry>
<!-- HERE -->
</variablelist>
<para>FIXME: Be sure to list default values.</para>
<!-- HERE -->
--- 2165,2181 ----
<varlistentry>
<term><parameter class="option">&dashdash;pooma-info</parameter></term>
<listitem>
! <para>enable use of the <varname>pinfo</varname> stream, used to
! print informative messages to the user while the program
! executes.</para>
</listitem>
</varlistentry>
<!-- HERE -->
</variablelist>
<para>FIXME: Be sure to list default values.</para>
+ <!-- HERE: need to describe the pinfo, pwarn, and perr streams somewhere. To do so requires describing informs.-->
+ <!-- HERE: Which streams are buffered and which are not? -->
<!-- HERE -->
*************** HERE Who uses this?</para>
*** 616,627 ****
<!-- HERE -->
- <para>QUESTION: Should I defer documenting &options; to the
- reference manual, instead just listing commonly used options in
- the previous section?
-
- UNFINISHED</para>
-
</section><!-- end sequential-options -->
<section>
--- 2183,2188 ----
*************** UNFINISHED</para>
*** 740,746 ****
code. An Array maps a fairly arbitrary input domain to an
arbitrary range of outputs. When used by itself, an &array;
object <varname>A</varname> refers to all of the values in its
! domain. Element-wise mathematical operations or functions can be
applied to an array using straightforward notation, like A + B
or sin(A). Expressions involving Array objects are themselves
Arrays. The operation A(d), where d is a domain object that
--- 2301,2307 ----
code. An Array maps a fairly arbitrary input domain to an
arbitrary range of outputs. When used by itself, an &array;
object <varname>A</varname> refers to all of the values in its
! domain. Element-wise mathematical operations or functions can be
applied to an array using straightforward notation, like A + B
or sin(A). Expressions involving Array objects are themselves
Arrays. The operation A(d), where d is a domain object that
*************** UNFINISHED</para>
*** 1188,1195 ****
class="libraryfile">.cmpl.cpp</filename>, <filename
class="libraryfile">.mk</filename>, <filename
class="libraryfile">.conf</filename>. Should we also explain use
! of <literal>inline</literal> even when necessary and the template
! model, <!-- FIXME: s/literal/keyword/ --> e.g., including <filename
class="libraryfile">.cpp</filename> files.</para>
<para>QUESTION: What are the key concepts around which to organize
--- 2749,2756 ----
class="libraryfile">.cmpl.cpp</filename>, <filename
class="libraryfile">.mk</filename>, <filename
class="libraryfile">.conf</filename>. Should we also explain use
! of <keywordname>inline</keywordname> even when necessary and the template
! model, e.g., including <filename
class="libraryfile">.cpp</filename> files.</para>
<para>QUESTION: What are the key concepts around which to organize
*************** UNFINISHED</para>
*** 1420,1426 ****
<entry><para>dimension</para></entry>
</row>
<row>
! <entry><varname>T</varname></entry>
<entry><para>array element type</para></entry>
</row>
<row>
--- 2981,2987 ----
<entry><para>dimension</para></entry>
</row>
<row>
! <entry><type>T</type></entry>
<entry><para>array element type</para></entry>
</row>
<row>
*************** UNFINISHED</para>
*** 3014,3021 ****
class="headerfile">src/Utilities/DerefIterator.h</filename>:
<type>DerefIterator<T></type> and
<type>ConstDerefIterator<T></type> automatically
! dereference themselves to maintain <literal>const</literal>
! correctness. <!-- FIXME: s/literal/keyword/ --></para>
</listitem>
<listitem>
--- 4575,4582 ----
class="headerfile">src/Utilities/DerefIterator.h</filename>:
<type>DerefIterator<T></type> and
<type>ConstDerefIterator<T></type> automatically
! dereference themselves to maintain <keywordname>const</keywordname>
! correctness.</para>
</listitem>
<listitem>
*************** UNFINISHED</para>
*** 3042,3048 ****
<listitem>
<para>Discuss &options; and related material. Add developer
command-line options listed in <filename
! class="library">Utilities/Options.cmpl.cpp</filename> and also
possibly <parameter class="option">&dashdash;pooma-threads
<replaceable>n</replaceable></parameter>.</para>
</listitem>
--- 4603,4609 ----
<listitem>
<para>Discuss &options; and related material. Add developer
command-line options listed in <filename
! class="libraryfile">Utilities/Options.cmpl.cpp</filename> and also
possibly <parameter class="option">&dashdash;pooma-threads
<replaceable>n</replaceable></parameter>.</para>
</listitem>
*************** UNFINISHED</para>
*** 3600,3859 ****
</appendix>
-
- <!-- Bibliography -->
-
- <bibliography id="bibliography">
- <title>Bibliography</title>
-
- <para>FIXME: How do I process these entries?</para>
-
- <biblioentry>
- <abbrev>mpi99</abbrev>
- <authorgroup>
- <author>
- <firstname>William</firstname><surname>Gropp</surname>
- </author>
- <author>
- <firstname>Ewing</firstname><surname>Lusk</surname>
- </author>
- <author>
- <firstname>Anthony</firstname><surname>Skjellum</surname>
- </author>
- </authorgroup>
- <copyright>
- <year>1999</year>
- <holder>Massachusetts Institute of Technology</holder>
- </copyright>
- <isbn>0-262-57132-3</isbn>
- <publisher>
- <publishername>The MIT Press</publishername>
- <address>Cambridge, MA</address>
- </publisher>
- <title>Using MPI</title>
- <subtitle>Portable Parallel Programming with the Message-Passing Interface</subtitle>
- <edition>second edition</edition>
- </biblioentry>
-
- <biblioentry>
- <abbrev>pooma95</abbrev>
- <authorgroup>
- <author>
- <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
- <affiliation>
- <orgname>Dakota Software Systems, Inc.</orgname>
- <address><city>Rapid City</city><state>SD</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Susan</firstname><othername role="mi">R.</othername><surname>Atlas</surname>
- <affiliation>
- <orgname>Parallel Solutions, Inc.</orgname>
- <address><city>Santa Fe</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Subhankar</firstname><surname>Banerjee</surname>
- <affiliation>
- <orgname>New Mexico State University</orgname>
- <address><city>Las Cruces</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
- <affiliation>
- <orgname>University of Illinois at Urbana-Champaign</orgname>
- <address><city>Urbana-Champaign</city><state>IL</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
- <affiliation>
- <orgname>California Institute of Technology</orgname>
- <address><city>Pasadena</city><state>CA</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Katarzyna</firstname><surname>Keahey</surname>
- <affiliation>
- <orgname>Indiana University</orgname>
- <address><city>Bloomington</city><state>IN</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Marydell</firstname><surname>Tholburn</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- </authorgroup>
- <title>&pooma;</title>
- <subtitle>A Framework for Scientific Simulation on Parallel Architectures</subtitle>
- <releaseinfo>unpublished</releaseinfo>
- </biblioentry>
-
- <biblioentry>
- <abbrev>pooma-sc95</abbrev>
- <authorgroup>
- <author>
- <firstname>Susan</firstname><surname>Atlas</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Subhankar</firstname><surname>Banerjee</surname>
- <affiliation>
- <orgname>New Mexico State University</orgname>
- <address><city>Las Cruces</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Paul</firstname><othername role="mi">J.</othername><surname>Hinker</surname>
- <affiliation>
- <orgname>Advanced Computing Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>M.</firstname><surname>Srikant</surname>
- <affiliation>
- <orgname>New Mexico State University</orgname>
- <address><city>Las Cruces</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Marydell</firstname><surname>Tholburn</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- </authorgroup>
- <title>&pooma;</title>
- <subtitle>A High Performance Distributed Simulation Environment for
- Scientific Applications</subtitle>
- <!-- FIXME: Where list Supercomputing 1995? -->
- </biblioentry>
-
- <biblioentry>
- <abbrev>pooma-siam98</abbrev>
- <authorgroup>
- <author>
- <firstname>Julian</firstname><othername role="mi">C.</othername><surname>Cummings</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>James</firstname><othername role="mi">A.</othername><surname>Crotinger</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Scott</firstname><othername role="mi">W.</othername><surname>Haney</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>William</firstname><othername role="mi">F.</othername><surname>Humphrey</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Steve</firstname><othername role="mi">R.</othername><surname>Karmesin</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>John</firstname><othername role="mi">V. W.</othername><surname>Reynders</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Stephen</firstname><othername role="mi">A.</othername><surname>Smith</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- <author>
- <firstname>Timothy</firstname><othername role="mi">J.</othername><surname>Williams</surname>
- <affiliation>
- <orgname>Los Alamos National Laboratory</orgname>
- <address><city>Los Alamos</city><state>NM</state></address>
- </affiliation>
- </author>
- </authorgroup>
- <title>Raid Application Development and Enhanced Code
- Interoperability using the &pooma; Framework</title>
- <!-- FIXME: Where list SIAM Workshop ... 1998? -->
- </biblioentry>
-
- <biblioentry>
- <!-- FIXME: Change the year when we learn it. -->
- <abbrev>pete-99</abbrev>
- <authorgroup>
- <author>
- <firstname>Scott</firstname><surname>Haney</surname>
- </author>
- <author>
- <firstname>James</firstname><surname>Crotinger</surname>
- </author>
- <author>
- <firstname>Steve</firstname><surname>Karmesin</surname>
- </author>
- <author>
- <firstname>Stephen</firstname><surname>Smith</surname>
- </author>
- </authorgroup>
- <title>Easy Expression Templates Using &pete;: The Portable
- Expression Template Engine</title>
- <!-- FIXME: When and where was this published? -->
- </biblioentry>
- </bibliography>
&glossary-chapter;
--- 5161,5168 ----
</appendix>
+ &bibliography-chapter;
&glossary-chapter;
Index: tutorial.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v
retrieving revision 1.3
diff -c -p -r1.3 tutorial.xml
*** tutorial.xml 2001/12/17 17:27:42 1.3
--- tutorial.xml 2002/01/04 17:14:11
***************
*** 54,60 ****
<imagedata fileref="figures/doof2d.201" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>The Initial Configuration</phrase>
</textobject>
</mediaobject>
<mediaobject>
--- 54,60 ----
<imagedata fileref="figures/doof2d.201" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>The Initial &doof2d; Configuration</phrase>
</textobject>
</mediaobject>
<mediaobject>
***************
*** 476,482 ****
<imagedata fileref="figures/doof2d.210" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>Adding two arrays with different domains.</phrase>
</textobject>
<caption>
<para>When adding arrays, values in corresponding positions are
--- 476,482 ----
<imagedata fileref="figures/doof2d.210" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>Adding two arrays with different domains is supported.</phrase>
</textobject>
<caption>
<para>When adding arrays, values in corresponding positions are
***************
*** 587,593 ****
<imagedata fileref="figures/doof2d.211" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>Apply a stencil to position (1,3) of an array.</phrase>
</textobject>
<caption>
<para>To compute the value associated with index position (1,3)
--- 587,593 ----
<imagedata fileref="figures/doof2d.211" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>Apply a stencil to position (1,3) of an &array;.</phrase>
</textobject>
<caption>
<para>To compute the value associated with index position (1,3)
***************
*** 692,698 ****
<imagedata fileref="figures/distributed.101" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>the &pooma; distributed computation model.</phrase>
</textobject>
<caption>
<para>The &pooma; distributed computation model combines
--- 692,698 ----
<imagedata fileref="figures/distributed.101" format="EPS" align="center"></imagedata>
</imageobject>
<textobject>
! <phrase>The &pooma; distributed computation model</phrase>
</textobject>
<caption>
<para>The &pooma; distributed computation model combines
Index: figures/box-macros.mp
===================================================================
RCS file: box-macros.mp
diff -N box-macros.mp
*** /dev/null Fri Mar 23 21:37:44 2001
--- box-macros.mp Fri Jan 4 10:14:11 2002
***************
*** 0 ****
--- 1,106 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec20
+ %% Pooma
+
+ %% Macros to Improve Boxes
+
+ %% Assumes 'input boxes;'
+
+ % Ensure a list of boxes all have the same width.
+ % input <- suffixes for the boxes;
+ % output-> all boxes have the same width (maximum picture width + 2*defaultdx)
+ vardef samewidth(suffix $)(text t) =
+ save p_; pair p_;
+ p_ = maxWidthAndHeight($)(t);
+ numericSetWidth(xpart(p_)+2defaultdx)($)(t);
+ enddef;
+
+ % Ensure a list of boxes all have the same height.
+ % input <- suffixes for the boxes;
+ % output-> all boxes have the same height (maximum picture height + 2*defaultdy)
+ vardef sameheight(suffix $)(text t) =
+ save p_; pair p_;
+ p_ = maxWidthAndHeight($)(t);
+ numericSetHeight(ypart(p_)+2defaultdy)($)(t);
+ enddef;
+
+ % Given a list of boxes, determine the maximum picture width and
+ % maximum picture height.
+ % input <- suffixes for the boxes
+ % output-> pair of maximum picture width and height
+ vardef maxWidthAndHeight(suffix f)(text t) =
+ save w_, h_; numeric w_, h_;
+ w_ = xpart((urcorner pic_.f - llcorner pic_.f));
+ h_ = ypart((urcorner pic_.f - llcorner pic_.f));
+ forsuffixes uu = t:
+ if xpart((urcorner pic_.uu - llcorner pic_.uu)) > w_ :
+ w_ := xpart((urcorner pic_.uu - llcorner pic_.uu));
+ fi
+ if ypart((urcorner pic_.uu - llcorner pic_.uu)) > h_ :
+ h_ := ypart((urcorner pic_.uu - llcorner pic_.uu));
+ fi
+ endfor
+ (w_, h_)
+ enddef;
+
+ % Given a width, ensure a box has the given width.
+ % input <- box width
+ % suffix for the one box
+ % output-> the box has the given width by setting its .dx
+ vardef numericSetWidthOne(expr width)(suffix f) =
+ f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
+ enddef;
+
+ % Given a width, ensure all boxes have the given width.
+ % input <- box width
+ % suffixes for the boxes
+ % output-> all boxes have the given width by setting their .dx
+ vardef numericSetWidth(expr width)(suffix f)(text t) =
+ f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
+ forsuffixes $ = t:
+ $.dx = 0.5(width - xpart(urcorner pic_.$ - llcorner pic_.$));
+ endfor
+ enddef;
+
+ % Given a height, ensure all boxes have the given height.
+ % input <- box height
+ % suffixes for the boxes
+ % output-> all boxes have the given height by setting their .dy
+ vardef numericSetHeight(expr height)(suffix f)(text t) =
+ f.dy = 0.5(height - ypart(urcorner pic_.f - llcorner pic_.f));
+ forsuffixes $ = t:
+ $.dy = 0.5(height - ypart(urcorner pic_.$ - llcorner pic_.$));
+ endfor
+ enddef;
+
+ % Ensure that all boxes and circles in a list have the same width, height,
+ % and diameter.
+ % input <- suffixes for the boxes and circles
+ % output-> all boxes have .dx and .dy set so they have the same width,
+ % height, and radius
+ % The boxes are squares and the circles are circular, not oval.
+ vardef sameWidthAndHeight(suffix f)(text t) =
+ save p_; pair p_;
+ p_ = maxWidthAndHeight(f)(t);
+ if (xpart(p_)+2defaultdx >= ypart(p_)+2defaultdy):
+ numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
+ numericSetHeight(xpart(p_)+2defaultdx)(f)(t);
+ else:
+ numericSetWidth(ypart(p_)+2defaultdy)(f)(t);
+ numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
+ fi
+ enddef;
+
+ % Ensure that all boxes and circles in a list have the same width and
+ % the same height. Unlike sameWidthAndHeight, the width and height
+ % can differ.
+ % input <- suffixes for the boxes and circles
+ % output-> all boxes have .dx and .dy set so they share one width and
+ % one height
+ % The boxes may be rectangles and the circles may be ovals.
+ vardef sameWidthSameHeight(suffix f)(text t) =
+ save p_; pair p_;
+ p_ = maxWidthAndHeight(f)(t);
+ numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
+ numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
+ enddef;
Index: figures/data-parallel.mp
===================================================================
RCS file: data-parallel.mp
diff -N data-parallel.mp
*** /dev/null Fri Mar 23 21:37:44 2001
--- data-parallel.mp Fri Jan 4 10:14:11 2002
***************
*** 0 ****
--- 1,157 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec20
+ %% Pooma
+
+ %% Illustrations for the Data-Parallel Chapter
+
+ %% Assumes TEX=latex.
+
+ input boxes;
+ input box-macros;
+ input grid-macros;
+
+ verbatimtex
+ \documentclass[10pt]{article}
+ \input{macros.ltx}
+ \begin{document}
+ etex
+
+ %% Parse Tree for Example Statement A += -A + 2*B
+ beginfig(101)
+ numeric unit; unit = 1.5cm;
+ numeric xunit; xunit = unit;
+ numeric yunit; yunit = unit;
+
+ %% Create the tree nodes.
+ circleit.b0(btex \statement{+=} etex);
+ circleit.b1(btex \varname{A} etex);
+ circleit.b2(btex \statement{+} etex);
+ circleit.b3(btex \statement{-} etex);
+ circleit.b4(btex \varname{A} etex);
+ circleit.b5(btex \statement{*} etex);
+ circleit.b6(btex \statement{2} etex);
+ circleit.b7(btex \varname{B} etex);
+ numeric nuBoxes; nuBoxes = 7;
+ sameWidthAndHeight(b0,b1,b2,b3,b4,b5,b6,b7);
+
+ %% Position the tree nodes.
+ b2.c = origin;
+ b0.c - 0.5[b1.c,b2.c] = (0,yunit);
+ b2.c - 0.5[b3.c,b5.c] = (0,yunit);
+ b3.c - 0.5[b4.c,b6.c] = (0,yunit);
+ b5.c - 0.5[b6.c,b7.c] = (0,yunit);
+ b1.c - b2.c = b3.c - b5.c = b4.c - b6.c = b6.c - b8.c = (-xunit,0);
+
+ %% Draw the tree.
+ for t = 2 upto 7:
+ drawboxed(b[t]);
+ endfor
+ vardef drawEdge(expr start, stop) =
+ draw b[start].c -- b[stop].c cutbefore bpath b[start] cutafter bpath b[stop];
+ enddef;
+ for t = (2,3), (2,5), (3,4), (5,6), (5,7):
+ drawEdge(xpart(t),ypart(t));
+ endfor
+
+ %% Label the nodes' types.
+ % TMP label.rt(btex \type{OpAddAssign} etex, b0.e);
+ % TMP label.rt(btex \type{Expression} etex, 0.5[b0.c,b2.c]);
+ label.top(btex \type{Expression} etex, b2.n);
+ % TMP label.lft(btex \type{Ar} etex, b1.w);
+ label.rt(btex \type{BinaryNode<OpAdd,} etex, b2.e);
+ label.lft(btex \type{UnaryNode<OpMinus,} etex, b3.w);
+ label.lft(btex \type{Ar} etex, b4.w);
+ label.rt(btex \type{BinaryNode<OpMultiply,} etex, b5.e);
+ label.bot(btex \type{Scalar<int>} etex, b6.s);
+ label.rt(btex \type{Ar} etex, b7.e);
+
+ endfig;
+
+
+ %% An illustration of the addition of arrays.
+ beginfig(212)
+ numeric unit; unit = 0.9cm; % width or height of an individual grid cell
+ numeric nuCells; nuCells = 5; % number of cells in each dimension
+ % This number should be odd.
+ numeric nuArrayCells; nuArrayCells = 3;
+ % number of cells in array in each dimension
+ numeric operatorWidth; operatorWidth = 1.5;
+ % horizontal space for an operator as
+ % a multiple of "unit"
+
+ %% Determine the locations of the arrays.
+ z0 = origin;
+ z1 = z0 + unit * (nuCells+operatorWidth,0);
+ z2 - z1 = z1 - z0;
+
+ %% Draw the grid cells and the operators.
+ for t = 0 upto 2:
+ drawGridDashed(nuCells, unit, z[t]);
+ endfor
+ for t = 0 upto 1:
+ drawGrid(nuArrayCells, unit, z[t]+unit*(1,1));
+ endfor
+ drawGrid(nuArrayCells, unit, z2+unit*(2,0));
+
+ label(btex = etex, z1 + unit*(-0.6operatorWidth, 0.5nuCells));
+ label(btex + etex, z2 + unit*(-0.6operatorWidth, 0.5nuCells));
+
+ %% Label the indices.
+ % Label b(I,J) grid indices.
+ for t = 0 upto 2:
+ labelCellBottom(btex \footnotesize 0 etex, (0,0), z[t]);
+ labelCellBottom(btex \footnotesize 1 etex, (1,0), z[t]);
+ labelCellBottom(btex \footnotesize 2 etex, (2,0), z[t]);
+ labelCellBottom(btex \footnotesize 3 etex, (3,0), z[t]);
+ labelCellBottom(btex \footnotesize 4 etex, (4,0), z[t]);
+ labelCellLeft(btex \footnotesize 0 etex, (0,0), z[t]);
+ labelCellLeft(btex \footnotesize 1 etex, (0,1), z[t]);
+ labelCellLeft(btex \footnotesize 2 etex, (0,2), z[t]);
+ labelCellLeft(btex \footnotesize 3 etex, (0,3), z[t]);
+ labelCellLeft(btex \footnotesize 4 etex, (0,4), z[t]);
+ endfor
+
+ %% Label the grid cells' values.
+ % Label b(I,J) grid values.
+ pair zShift;
+ zShift := z1 + unit*(1,1);
+ labelCell(btex \normalsize 9 etex, (0,0), zShift);
+ labelCell(btex \normalsize 11 etex, (1,0), zShift);
+ labelCell(btex \normalsize 13 etex, (2,0), zShift);
+ labelCell(btex \normalsize 17 etex, (0,1), zShift);
+ labelCell(btex \normalsize 19 etex, (1,1), zShift);
+ labelCell(btex \normalsize 21 etex, (2,1), zShift);
+ labelCell(btex \normalsize 25 etex, (0,2), zShift);
+ labelCell(btex \normalsize 27 etex, (1,2), zShift);
+ labelCell(btex \normalsize 29 etex, (2,2), zShift);
+ % Label b(I+1,J-1) grid values.
+ zShift := z2 + unit*(2,0);
+ labelCell(btex \normalsize 3 etex, (0,0), zShift);
+ labelCell(btex \normalsize 5 etex, (1,0), zShift);
+ labelCell(btex \normalsize 7 etex, (2,0), zShift);
+ labelCell(btex \normalsize 11 etex, (0,1), zShift);
+ labelCell(btex \normalsize 13 etex, (1,1), zShift);
+ labelCell(btex \normalsize 15 etex, (2,1), zShift);
+ labelCell(btex \normalsize 19 etex, (0,2), zShift);
+ labelCell(btex \normalsize 21 etex, (1,2), zShift);
+ labelCell(btex \normalsize 23 etex, (2,2), zShift);
+ % Label b(I,J)+b(I+1,J-1) grid values.
+ zShift := z0 + unit*(1,1);
+ labelCell(btex \normalsize 9 etex, (0,0), zShift);
+ labelCell(btex \normalsize 22 etex, (1,0), zShift);
+ labelCell(btex \normalsize 26 etex, (2,0), zShift);
+ labelCell(btex \normalsize 17 etex, (0,1), zShift);
+ labelCell(btex \normalsize 38 etex, (1,1), zShift);
+ labelCell(btex \normalsize 42 etex, (2,1), zShift);
+ labelCell(btex \normalsize 25 etex, (0,2), zShift);
+ labelCell(btex \normalsize 27 etex, (1,2), zShift);
+ labelCell(btex \normalsize 29 etex, (2,2), zShift);
+
+ %% Label the grids.
+ labelGrid(btex $A+B$ etex, nuCells, z0);
+ labelGrid(btex $A$ etex, nuCells, z1);
+ labelGrid(btex $B$ etex, nuCells, z2);
+ endfig;
+
+
+ bye
Index: figures/doof2d.mp
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/figures/doof2d.mp,v
retrieving revision 1.2
diff -c -p -r1.2 doof2d.mp
*** figures/doof2d.mp 2001/12/11 20:36:13 1.2
--- figures/doof2d.mp 2002/01/04 17:14:11
*************** verbatimtex
*** 12,46 ****
\begin{document}
etex
! % Draw a set of grid cells.
! vardef drawGrid(expr nuCells, unit, llCorner) =
! for i = 0 upto nuCells-1:
! for j = 0 upto nuCells-1:
! draw unitsquare scaled unit shifted (llCorner + unit*(i,j));
! endfor
! endfor
! enddef;
!
! % Label the specified grid, grid cell, or its edge.
! % Place a value at the center of a grid cell.
! vardef labelCell(expr lbl, xy, llCorner) =
! label(lbl, llCorner + unit*(xy + 0.5*(1,1)));
! enddef;
!
! % Label the bottom of a grid cell.
! vardef labelCellBottom(expr lbl, xy, llCorner) =
! label.bot(lbl, llCorner + unit*(xy + 0.5*(1,0)));
! enddef;
!
! % Label the left side of a grid cell.
! vardef labelCellLeft(expr lbl, xy, llCorner) =
! label.lft(lbl, llCorner + unit*(xy + 0.5*(0,1)));
! enddef;
!
! % Label the top of a grid.
! vardef labelGrid(expr lbl, nuCells, llCorner) =
! label.top(lbl, llCorner + unit*(nuCells/2,nuCells));
! enddef;
%% Global Declarations
numeric unit; unit = 0.9cm; % width or height of an individual grid cell
--- 12,18 ----
\begin{document}
etex
! input grid-macros;
%% Global Declarations
numeric unit; unit = 0.9cm; % width or height of an individual grid cell
Index: figures/grid-macros.mp
===================================================================
RCS file: grid-macros.mp
diff -N grid-macros.mp
*** /dev/null Fri Mar 23 21:37:44 2001
--- grid-macros.mp Fri Jan 4 10:14:11 2002
***************
*** 0 ****
--- 1,45 ----
+ %% Oldham, Jeffrey D.
+ %% 2001Dec21
+ %% Pooma
+
+ %% Macros for Drawing Grids
+
+ % Draw a set of grid cells.
+ vardef drawGrid(expr nuCells, unit, llCorner) =
+ for i = 0 upto nuCells-1:
+ for j = 0 upto nuCells-1:
+ draw unitsquare scaled unit shifted (llCorner + unit*(i,j));
+ endfor
+ endfor
+ enddef;
+
+ % Draw a set of grid cells with dashed lines.
+ vardef drawGridDashed(expr nuCells, unit, llCorner) =
+ for i = 0 upto nuCells-1:
+ for j = 0 upto nuCells-1:
+ draw unitsquare scaled unit shifted (llCorner + unit*(i,j)) dashed evenly;
+ endfor
+ endfor
+ enddef;
+
+ % Label the specified grid, grid cell, or its edge.
+ % Place a value at the center of a grid cell.
+ vardef labelCell(expr lbl, xy, llCorner) =
+ label(lbl, llCorner + unit*(xy + 0.5*(1,1)));
+ enddef;
+
+ % Label the bottom of a grid cell.
+ vardef labelCellBottom(expr lbl, xy, llCorner) =
+ label.bot(lbl, llCorner + unit*(xy + 0.5*(1,0)));
+ enddef;
+
+ % Label the left side of a grid cell.
+ vardef labelCellLeft(expr lbl, xy, llCorner) =
+ label.lft(lbl, llCorner + unit*(xy + 0.5*(0,1)));
+ enddef;
+
+ % Label the top of a grid.
+ vardef labelGrid(expr lbl, nuCells, llCorner) =
+ label.top(lbl, llCorner + unit*(nuCells/2,nuCells));
+ enddef;
+
Index: figures/introduction.mp
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/figures/introduction.mp,v
retrieving revision 1.1
diff -c -p -r1.1 introduction.mp
*** figures/introduction.mp 2001/12/17 17:27:42 1.1
--- figures/introduction.mp 2002/01/04 17:14:11
***************
*** 7,12 ****
--- 7,13 ----
%% Assumes TEX=latex.
input boxes;
+ input box-macros;
verbatimtex
\documentclass[10pt]{article}
*************** beginfig(101)
*** 21,125 ****
numeric horizSpace; horizSpace = 8unit;
numeric vertSpace; vertSpace = unit;
numeric nuBoxes; % number of boxes
-
- % Ensure a list of boxes all have the same width.
- % input <- suffixes for the boxes;
- % output-> all boxes have the same width (maximum picture width + defaultdx)
- vardef samewidth(suffix $)(text t) =
- save p_; pair p_;
- p_ = maxWidthAndHeight($)(t);
- numericSetWidth(xpart(p_)+2defaultdx)($)(t);
- enddef;
-
- % Ensure a list of boxes all have the same height.
- % input <- suffixes for the boxes;
- % output-> all boxes have the same height (maximum picture height + defaultdy)
- vardef sameheight(suffix $)(text t) =
- save p_; pair p_;
- p_ = maxWidthAndHeight($)(t);
- numericSetWidth(ypart(p_)+2defaultdy)($)(t);
- enddef;
-
- % Given a list of boxes, determine the maximum picture width and
- % maximum picture height.
- % input <- suffixes for the boxes
- % output-> pair of maximum picture width and height
- vardef maxWidthAndHeight(suffix f)(text t) =
- save w_, h_; numeric w_, h_;
- w_ = xpart((urcorner pic_.f - llcorner pic_.f));
- h_ = ypart((urcorner pic_.f - llcorner pic_.f));
- forsuffixes uu = t:
- if xpart((urcorner pic_.uu - llcorner pic_.uu)) > w_ :
- w_ := xpart((urcorner pic_.uu - llcorner pic_.uu));
- fi
- if ypart((urcorner pic_.uu - llcorner pic_.uu)) > h_ :
- h_ := ypart((urcorner pic_.uu - llcorner pic_.uu));
- fi
- endfor
- (w_, h_)
- enddef;
-
- % Given a width, ensure a box has the given width.
- % input <- box width
- % suffix for the one box
- % output-> the box has the given width by setting its .dx
- vardef numericSetWidthOne(expr width)(suffix f) =
- f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
- enddef;
-
- % Given a width, ensure all boxes have the given width.
- % input <- box width
- % suffixes for the boxes
- % output-> all boxes have the given width by setting their .dx
- vardef numericSetWidth(expr width)(suffix f)(text t) =
- f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f));
- forsuffixes $ = t:
- $.dx = 0.5(width - xpart(urcorner pic_.$ - llcorner pic_.$));
- endfor
- enddef;
-
- % Given a height, ensure all boxes have the given height.
- % input <- box height
- % suffixes for the boxes
- % output-> all boxes have the given height by setting their .dx
- vardef numericSetHeight(expr height)(suffix f)(text t) =
- f.dy = 0.5(height - ypart(urcorner pic_.f - llcorner pic_.f));
- forsuffixes $ = t:
- $.dy = 0.5(height - ypart(urcorner pic_.$ - llcorner pic_.$));
- endfor
- enddef;
-
- % Ensure a list of boxes and circles all to have the same width, height,
- % and diameter.
- % input <- suffixes for the boxes and circles
- % output-> all boxes have .dx and .dy set so they have the same width,
- % height, and radius
- % The boxes are squares and the circles are circular, not oval.
- vardef sameWidthAndHeight(suffix f)(text t) =
- save p_; pair p_;
- p_ = maxWidthAndHeight(f)(t);
- if (xpart(p_)+2defaultdx >= ypart(p_)+2defaultdy):
- numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
- numericSetHeight(xpart(p_)+2defaultdx)(f)(t);
- else:
- numericSetWidth(ypart(p_)+2defaultdy)(f)(t);
- numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
- fi
- enddef;
-
- % Ensure a list of boxes and circles all to have the same width and
- % the same height. Unlike sameWidthAndHeight, the width and height
- % can differ.
- % input <- suffixes for the boxes and circles
- % output-> all boxes have .dx and .dy set so they have the same width,
- % height, and radius
- % The boxes are squares and the circles are circular, not oval.
- vardef sameWidthSameHeight(suffix f)(text t) =
- save p_; pair p_;
- p_ = maxWidthAndHeight(f)(t);
- numericSetWidth(xpart(p_)+2defaultdx)(f)(t);
- numericSetHeight(ypart(p_)+2defaultdy)(f)(t);
- enddef;
% Create the boxes.
boxit.b0(btex \textsl{science / math} etex);
--- 22,27 ----
Index: programs/Doof2d-Array-distributed-annotated.patch
===================================================================
RCS file: Doof2d-Array-distributed-annotated.patch
diff -N Doof2d-Array-distributed-annotated.patch
*** /tmp/cvsKKb5AR Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,184 ****
- *** Doof2d-Array-distributed.cpp Wed Dec 5 14:04:36 2001
- --- Doof2d-Array-distributed-annotated.cpp Wed Dec 5 14:07:56 2001
- ***************
- *** 1,3 ****
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Arrays.h" // has Pooma's Array
-
- --- 1,5 ----
- ! <programlisting id="tutorial-array_distributed-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Arrays.h" // has Pooma's Array
-
- ***************
- *** 14,18 ****
- // (i,j). The "C" template parameter permits use of this stencil
- // operator with both Arrays and Fields.
- ! template <class C>
- inline
- typename C::Element_t
- --- 16,20 ----
- // (i,j). The "C" template parameter permits use of this stencil
- // operator with both Arrays and Fields.
- ! template <class C>
- inline
- typename C::Element_t
- ***************
- *** 42,46 ****
- // canot use standard input and output. Instead we use command-line
- // arguments, which are replicated, for input, and we use an Inform
- ! // stream for output.
- Inform output;
-
- --- 44,48 ----
- // canot use standard input and output. Instead we use command-line
- // arguments, which are replicated, for input, and we use an Inform
- ! // stream for output. <co id="tutorial-array_distributed-doof2d-io"></co>
- Inform output;
-
- ***************
- *** 48,52 ****
- if (argc != 4) {
- // Incorrect number of command-line arguments.
- ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
- return EXIT_FAILURE;
- }
- --- 50,54 ----
- if (argc != 4) {
- // Incorrect number of command-line arguments.
- ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
- return EXIT_FAILURE;
- }
- ***************
- *** 55,63 ****
- // Determine the number of processors.
- long nuProcessors;
- ! nuProcessors = strtol(argv[1], &tail, 0);
-
- // Determine the number of averagings.
- long nuAveragings, nuIterations;
- ! nuAveragings = strtol(argv[2], &tail, 0);
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- --- 57,65 ----
- // Determine the number of processors.
- long nuProcessors;
- ! nuProcessors = strtol(argv[1], &tail, 0);
-
- // Determine the number of averagings.
- long nuAveragings, nuIterations;
- ! nuAveragings = strtol(argv[2], &tail, 0);
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ***************
- *** 65,69 ****
- // the grid.
- long n;
- ! n = strtol(argv[3], &tail, 0);
- // The dimension must be a multiple of the number of processors
- // since we are using a UniformGridLayout.
- --- 67,71 ----
- // the grid.
- long n;
- ! n = strtol(argv[3], &tail, 0);
- // The dimension must be a multiple of the number of processors
- // since we are using a UniformGridLayout.
- ***************
- *** 71,80 ****
-
- // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<2> interiorDomain(I,I);
-
- // Create the distributed arrays.
- --- 73,82 ----
-
- // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<2> interiorDomain(I,I);
-
- // Create the distributed arrays.
- ***************
- *** 83,98 ****
- // dimension. Guard layers optimize communication between patches.
- // Internal guards surround each patch. External guards surround
- ! // the entire array domain.
- ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! GuardLayers<2>(1), // internal
- ! GuardLayers<2>(0)); // external
- ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-
- // The template parameters indicate 2 dimensions and a 'double'
- // element type. MultiPatch indicates multiple computation patches,
- // i.e., distributed computation. The UniformTag indicates the
- ! // patches should have the same size. Each patch has Brick type.
- ! Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
- ! Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);
-
- // Set up the initial conditions.
- --- 85,100 ----
- // dimension. Guard layers optimize communication between patches.
- // Internal guards surround each patch. External guards surround
- ! // the entire array domain. <co id="tutorial-array_distributed-doof2d-layout"></co>
- ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! GuardLayers<2>(1), // internal
- ! GuardLayers<2>(0)); // external
- ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-
- // The template parameters indicate 2 dimensions and a 'double'
- // element type. MultiPatch indicates multiple computation patches,
- // i.e., distributed computation. The UniformTag indicates the
- ! // patches should have the same size. Each patch has Brick type. <co id="tutorial-array_distributed-doof2d-remote"></co>
- ! Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
- ! Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);
-
- // Set up the initial conditions.
- ***************
- *** 104,112 ****
-
- // Create the stencil performing the computation.
- ! Stencil<DoofNinePt> stencil;
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a.
- a(interiorDomain) = stencil(b, interiorDomain);
-
- --- 106,114 ----
-
- // Create the stencil performing the computation.
- ! Stencil<DoofNinePt> stencil;
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a. <co id="tutorial-array_distributed-doof2d-first_write"></co>
- a(interiorDomain) = stencil(b, interiorDomain);
-
- ***************
- *** 117,121 ****
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The arrays are automatically deallocated.
- --- 119,123 ----
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The arrays are automatically deallocated.
- ***************
- *** 125,126 ****
- --- 127,129 ----
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Array-element-annotated.patch
===================================================================
RCS file: Doof2d-Array-element-annotated.patch
diff -N Doof2d-Array-element-annotated.patch
*** /tmp/cvslmAiwW Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,143 ****
- *** Doof2d-Array-element.cpp Tue Dec 4 12:02:10 2001
- --- Doof2d-Array-element-annotated.cpp Tue Dec 4 12:24:25 2001
- ***************
- *** 1,5 ****
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- ! #include "Pooma/Arrays.h" // has Pooma's Array
-
- // Doof2d: Pooma Arrays, element-wise implementation
- --- 1,6 ----
- ! <programlisting id="tutorial-array_elementwise-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- ! #include "Pooma/Arrays.h" // has Pooma's Array <co id="tutorial-array_elementwise-doof2d-header"></co>
-
- // Doof2d: Pooma Arrays, element-wise implementation
- ***************
- *** 7,17 ****
- int main(int argc, char *argv[])
- {
- ! // Prepare the Pooma library for execution.
- Pooma::initialize(argc,argv);
-
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- --- 8,18 ----
- int main(int argc, char *argv[])
- {
- ! // Prepare the Pooma library for execution. <co id="tutorial-array_elementwise-doof2d-pooma_initialize"></co>
- Pooma::initialize(argc,argv);
-
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ***************
- *** 19,37 ****
- // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- ! // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- ! // Create the arrays.
- // The template parameters indicate 2 dimensions, a 'double' element
- // type, and ordinary 'Brick' storage.
- ! Array<2, double, Brick> a(vertDomain);
- ! Array<2, double, Brick> b(vertDomain);
-
- // Set up the initial conditions.
- ! // All grid values should be zero except for the central value.
- for (int j = 1; j < n-1; j++)
- for (int i = 1; i < n-1; i++)
- --- 20,38 ----
- // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- ! // Specify the arrays' domains [0,n) x [0,n). <co id="tutorial-array_elementwise-doof2d-domain"></co>
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- ! // Create the arrays. <co id="tutorial-array_elementwise-doof2d-array_creation"></co>
- // The template parameters indicate 2 dimensions, a 'double' element
- // type, and ordinary 'Brick' storage.
- ! Array<2, double, Brick> a(vertDomain);
- ! Array<2, double, Brick> b(vertDomain);
-
- // Set up the initial conditions.
- ! // All grid values should be zero except for the central value. <co id="tutorial-array_elementwise-doof2d-initialization"></co>
- for (int j = 1; j < n-1; j++)
- for (int i = 1; i < n-1; i++)
- ***************
- *** 43,51 ****
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- // Read from b. Write to a.
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- ! a(i,j) = weight *
- (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) +
- b(i ,j+1) + b(i ,j ) + b(i ,j-1) +
- --- 44,52 ----
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- // Read from b. Write to a.
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- ! a(i,j) = weight * <co id="tutorial-array_elementwise-doof2d-first_write"></co>
- (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) +
- b(i ,j+1) + b(i ,j ) + b(i ,j-1) +
- ***************
- *** 53,58 ****
-
- // Read from a. Write to b.
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- b(i,j) = weight *
- (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) +
- --- 54,59 ----
-
- // Read from a. Write to b.
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- b(i,j) = weight *
- (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) +
- ***************
- *** 62,71 ****
-
- // Print out the final central value.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- ! // The arrays are automatically deallocated.
-
- ! // Tell the Pooma library execution has finished.
- Pooma::finalize();
- return EXIT_SUCCESS;
- }
- --- 63,74 ----
-
- // Print out the final central value.
- ! Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- ! // The arrays are automatically deallocated. <co id="tutorial-array_elementwise-doof2d-deallocation"></co>
-
- ! // Tell the Pooma library execution has finished. <co id="tutorial-array_elementwise-doof2d-pooma_finish"></co>
- Pooma::finalize();
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Array-parallel-annotated.patch
===================================================================
RCS file: Doof2d-Array-parallel-annotated.patch
diff -N Doof2d-Array-parallel-annotated.patch
*** /tmp/cvsuReKr3 Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,116 ****
- *** Doof2d-Array-parallel.cpp Tue Dec 4 11:49:43 2001
- --- Doof2d-Array-parallel-annotated.cpp Tue Dec 4 12:24:36 2001
- ***************
- *** 1,4 ****
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Arrays.h" // has Pooma's Array
-
- --- 1,5 ----
- ! <programlisting id="tutorial-array_parallel-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Arrays.h" // has Pooma's Array
-
- ***************
- *** 12,17 ****
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- --- 13,18 ----
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ***************
- *** 19,43 ****
- // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- ! // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<1> J(1,n-2);
-
- // Create the arrays.
- // The template parameters indicate 2 dimensions, a 'double' element
- // type, and ordinary 'Brick' storage.
- ! Array<2, double, Brick> a(vertDomain);
- ! Array<2, double, Brick> b(vertDomain);
-
- // Set up the initial conditions.
- // All grid values should be zero except for the central value.
- a = b = 0.0;
- ! // Ensure all data-parallel computation finishes before accessing a value.
- Pooma::blockAndEvaluate();
- b(n/2,n/2) = 1000.0;
- --- 20,44 ----
- // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- ! // Set up interior domains [1,n-1) x [1,n-1) for computation. <co id="tutorial-array_parallel-doof2d-innerdomain"></co>
- ! Interval<1> I(1,n-2);
- ! Interval<1> J(1,n-2);
-
- // Create the arrays.
- // The template parameters indicate 2 dimensions, a 'double' element
- // type, and ordinary 'Brick' storage.
- ! Array<2, double, Brick> a(vertDomain);
- ! Array<2, double, Brick> b(vertDomain);
-
- // Set up the initial conditions.
- // All grid values should be zero except for the central value.
- a = b = 0.0;
- ! // Ensure all data-parallel computation finishes before accessing a value. <co id="tutorial-array_parallel-doof2d-blockAndEvaluate"></co>
- Pooma::blockAndEvaluate();
- b(n/2,n/2) = 1000.0;
- ***************
- *** 47,52 ****
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a.
- a(I,J) = weight *
- (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) +
- --- 48,53 ----
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a. <co id="tutorial-array_parallel-doof2d-first_write"></co>
- a(I,J) = weight *
- (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) +
- ***************
- *** 63,67 ****
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The arrays are automatically deallocated.
- --- 64,68 ----
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The arrays are automatically deallocated.
- ***************
- *** 71,72 ****
- --- 72,74 ----
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Array-stencil-annotated.patch
===================================================================
RCS file: Doof2d-Array-stencil-annotated.patch
diff -N Doof2d-Array-stencil-annotated.patch
*** /tmp/cvsLwSPO9 Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,152 ****
- *** Doof2d-Array-stencil.cpp Tue Dec 4 11:49:39 2001
- --- Doof2d-Array-stencil-annotated.cpp Tue Dec 4 12:26:46 2001
- ***************
- *** 1,9 ****
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Arrays.h" // has Pooma's Array
-
- // Doof2d: Pooma Arrays, stencil implementation
-
- ! // Define the stencil class performing the computation.
- class DoofNinePt
- {
- --- 1,10 ----
- ! <programlisting id="tutorial-array_stencil-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Arrays.h" // has Pooma's Array
-
- // Doof2d: Pooma Arrays, stencil implementation
-
- ! // Define the stencil class performing the computation. <co id="tutorial-array_stencil-doof2d-stencil"></co>
- class DoofNinePt
- {
- ***************
- *** 14,19 ****
- // This stencil operator is applied to each interior domain position
- // (i,j). The "C" template parameter permits use of this stencil
- ! // operator with both Arrays and Fields.
- ! template <class C>
- inline
- typename C::Element_t
- --- 15,20 ----
- // This stencil operator is applied to each interior domain position
- // (i,j). The "C" template parameter permits use of this stencil
- ! // operator with both Arrays and Fields. <co id="tutorial-array_stencil-doof2d-stencil_operator"></co>
- ! template <class C>
- inline
- typename C::Element_t
- ***************
- *** 26,30 ****
- }
-
- ! inline int lowerExtent(int) const { return 1; }
- inline int upperExtent(int) const { return 1; }
-
- --- 27,31 ----
- }
-
- ! inline int lowerExtent(int) const { return 1; } <co id="tutorial-array_stencil-doof2d-stencil_extent"></co>
- inline int upperExtent(int) const { return 1; }
-
- ***************
- *** 42,47 ****
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- --- 43,48 ----
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ***************
- *** 49,68 ****
- // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<2> interiorDomain(I,I);
-
- // Create the arrays.
- // The template parameters indicate 2 dimensions, a 'double' element
- // type, and ordinary 'Brick' storage.
- ! Array<2, double, Brick> a(vertDomain);
- ! Array<2, double, Brick> b(vertDomain);
-
- // Set up the initial conditions.
- --- 50,69 ----
- // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- // Specify the arrays' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<2> interiorDomain(I,I);
-
- // Create the arrays.
- // The template parameters indicate 2 dimensions, a 'double' element
- // type, and ordinary 'Brick' storage.
- ! Array<2, double, Brick> a(vertDomain);
- ! Array<2, double, Brick> b(vertDomain);
-
- // Set up the initial conditions.
- ***************
- *** 73,82 ****
- b(n/2,n/2) = 1000.0;
-
- ! // Create the stencil performing the computation.
- ! Stencil<DoofNinePt> stencil;
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a.
- a(interiorDomain) = stencil(b, interiorDomain);
-
- --- 74,83 ----
- b(n/2,n/2) = 1000.0;
-
- ! // Create the stencil performing the computation. <co id="tutorial-array_stencil-doof2d-stencil_creation"></co>
- ! Stencil<DoofNinePt> stencil;
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a. <co id="tutorial-array_stencil-doof2d-first_write"></co>
- a(interiorDomain) = stencil(b, interiorDomain);
-
- ***************
- *** 87,91 ****
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The arrays are automatically deallocated.
- --- 88,92 ----
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The arrays are automatically deallocated.
- ***************
- *** 95,96 ****
- --- 96,98 ----
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-C-element-annotated.patch
===================================================================
RCS file: Doof2d-C-element-annotated.patch
diff -N Doof2d-C-element-annotated.patch
*** /tmp/cvs2hDHVf Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,150 ****
- *** Doof2d-C-element.cpp Tue Nov 27 08:36:38 2001
- --- Doof2d-C-element-annotated.cpp Tue Nov 27 12:08:03 2001
- ***************
- *** 1,4 ****
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
-
- // Doof2d: C-like, element-wise implementation
- --- 1,5 ----
- ! <programlisting id="tutorial-hand_coded-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
-
- // Doof2d: C-like, element-wise implementation
- ***************
- *** 6,30 ****
- int main()
- {
- ! // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ! // Use two-dimensional grids of values.
- double **a;
- double **b;
-
- // Ask the user for the number n of elements along one dimension of
- ! // the grid.
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- ! // Allocate the arrays.
- typedef double* doublePtr;
- a = new doublePtr[n];
- b = new doublePtr[n];
- ! for (int i = 0; i < n; i++) {
- a[i] = new double[n];
- b[i] = new double[n];
- --- 7,31 ----
- int main()
- {
- ! // Ask the user for the number of averagings. <co id="tutorial-hand_coded-doof2d-nuaveragings"></co>
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ! // Use two-dimensional grids of values. <co id="tutorial-hand_coded-doof2d-array_storage"></co>
- double **a;
- double **b;
-
- // Ask the user for the number n of elements along one dimension of
- ! // the grid. <co id="tutorial-hand_coded-doof2d-grid_size"></co>
- long n;
- ! std::cout << "Please enter the array size: ";
- ! std::cin >> n;
-
- ! // Allocate the arrays. <co id="tutorial-hand_coded-doof2d-allocation"></co>
- typedef double* doublePtr;
- a = new doublePtr[n];
- b = new doublePtr[n];
- ! for (int i = 0; i < n; i++) {
- a[i] = new double[n];
- b[i] = new double[n];
- ***************
- *** 32,49 ****
-
- // Set up the initial conditions.
- ! // All grid values should be zero except for the central value.
- ! for (int j = 0; j < n; j++)
- ! for (int i = 0; i < n; i++)
- a[i][j] = b[i][j] = 0.0;
- b[n/2][n/2] = 1000.0;
-
- ! // In the average, weight elements with this value.
- const double weight = 1.0/9.0;
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a.
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- a[i][j] = weight *
- (b[i+1][j+1] + b[i+1][j ] + b[i+1][j-1] +
- --- 33,50 ----
-
- // Set up the initial conditions.
- ! // All grid values should be zero except for the central value. <co id="tutorial-hand_coded-doof2d-initialization"></co>
- ! for (int j = 0; j < n; j++)
- ! for (int i = 0; i < n; i++)
- a[i][j] = b[i][j] = 0.0;
- b[n/2][n/2] = 1000.0;
-
- ! // In the average, weight elements with this value. <co id="tutorial-hand_coded-doof2d-constants"></co>
- const double weight = 1.0/9.0;
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a. <co id="tutorial-hand_coded-doof2d-first_write"></co>
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- a[i][j] = weight *
- (b[i+1][j+1] + b[i+1][j ] + b[i+1][j-1] +
- ***************
- *** 51,57 ****
- b[i-1][j+1] + b[i-1][j ] + b[i-1][j-1]);
-
- ! // Read from a. Write to b.
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- b[i][j] = weight *
- (a[i+1][j+1] + a[i+1][j ] + a[i+1][j-1] +
- --- 52,58 ----
- b[i-1][j+1] + b[i-1][j ] + b[i-1][j-1]);
-
- ! // Read from a. Write to b. <co id="tutorial-hand_coded-doof2d-second_write"></co>
- ! for (int j = 1; j < n-1; j++)
- ! for (int i = 1; i < n-1; i++)
- b[i][j] = weight *
- (a[i+1][j+1] + a[i+1][j ] + a[i+1][j-1] +
- ***************
- *** 60,68 ****
- }
-
- ! // Print out the final central value.
- ! std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl;
-
- ! // Deallocate the arrays.
- ! for (int i = 0; i < n; i++) {
- delete [] a[i];
- delete [] b[i];
- --- 61,69 ----
- }
-
- ! // Print out the final central value. <co id="tutorial-hand_coded-doof2d-answer"></co>
- ! std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl;
-
- ! // Deallocate the arrays. <co id="tutorial-hand_coded-doof2d-deallocation"></co>
- ! for (int i = 0; i < n; i++) {
- delete [] a[i];
- delete [] b[i];
- ***************
- *** 73,74 ****
- --- 74,76 ----
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Field-distributed-annotated.patch
===================================================================
RCS file: Doof2d-Field-distributed-annotated.patch
diff -N Doof2d-Field-distributed-annotated.patch
*** /tmp/cvsF2z45n Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,176 ****
- *** Doof2d-Field-distributed.cpp Wed Dec 5 14:05:10 2001
- --- Doof2d-Field-distributed-annotated.cpp Wed Dec 5 14:41:24 2001
- ***************
- *** 1,3 ****
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Fields.h" // has Pooma's Field
-
- --- 1,4 ----
- ! <programlisting id="tutorial-field_distributed-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include <stdlib.h> // has EXIT_SUCCESS
- #include "Pooma/Fields.h" // has Pooma's Field
-
- ***************
- *** 12,16 ****
- // canot use standard input and output. Instead we use command-line
- // arguments, which are replicated, for input, and we use an Inform
- ! // stream for output.
- Inform output;
-
- --- 13,17 ----
- // canot use standard input and output. Instead we use command-line
- // arguments, which are replicated, for input, and we use an Inform
- ! // stream for output. <co id="tutorial-field_distributed-doof2d-io"></co>
- Inform output;
-
- ***************
- *** 18,22 ****
- if (argc != 4) {
- // Incorrect number of command-line arguments.
- ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl;
- return EXIT_FAILURE;
- }
- --- 19,23 ----
- if (argc != 4) {
- // Incorrect number of command-line arguments.
- ! output &lt;&lt; argv[0] &lt;&lt; ": number-of-processors number-of-averagings number-of-values" &lt;&lt; std::endl;
- return EXIT_FAILURE;
- }
- ***************
- *** 25,33 ****
- // Determine the number of processors.
- long nuProcessors;
- ! nuProcessors = strtol(argv[1], &tail, 0);
-
- // Determine the number of averagings.
- long nuAveragings, nuIterations;
- ! nuAveragings = strtol(argv[2], &tail, 0);
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- --- 26,34 ----
- // Determine the number of processors.
- long nuProcessors;
- ! nuProcessors = strtol(argv[1], &amp;tail, 0);
-
- // Determine the number of averagings.
- long nuAveragings, nuIterations;
- ! nuAveragings = strtol(argv[2], &amp;tail, 0);
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ***************
- *** 35,39 ****
- // the grid.
- long n;
- ! n = strtol(argv[3], &tail, 0);
- // The dimension must be a multiple of the number of processors
- // since we are using a UniformGridLayout.
- --- 36,40 ----
- // the grid.
- long n;
- ! n = strtol(argv[3], &amp;tail, 0);
- // The dimension must be a multiple of the number of processors
- // since we are using a UniformGridLayout.
- ***************
- *** 41,50 ****
-
- // Specify the fields' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<1> J(1,n-2);
-
- // Partition the fields' domains uniformly, i.e., each patch has the
- --- 42,51 ----
-
- // Specify the fields' domains [0,n) x [0,n).
- ! Interval&lt;1&gt; N(0, n-1);
- ! Interval&lt;2&gt; vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval&lt;1&gt; I(1,n-2);
- ! Interval&lt;1&gt; J(1,n-2);
-
- // Partition the fields' domains uniformly, i.e., each patch has the
- ***************
- *** 52,74 ****
- // dimension. Guard layers optimize communication between patches.
- // Internal guards surround each patch. External guards surround
- ! // the entire field domain.
- ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
- ! GuardLayers<2>(1), // internal
- ! GuardLayers<2>(0)); // external
- ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-
- // Specify the fields' mesh, i.e., its spatial extent, and its
- ! // centering type.
- ! UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
- ! Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);
-
- // The template parameters indicate a mesh and a 'double'
- // element type. MultiPatch indicates multiple computation patches,
- // i.e., distributed computation. The UniformTag indicates the
- ! // patches should have the same size. Each patch has Brick type.
- ! Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag,
- ! Remote<Brick> > > a(cell, layout, mesh);
- ! Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag,
- ! Remote<Brick> > > b(cell, layout, mesh);
-
- // Set up the initial conditions.
- --- 53,75 ----
- // dimension. Guard layers optimize communication between patches.
- // Internal guards surround each patch. External guards surround
- ! // the entire field domain. <co id="tutorial-field_distributed-doof2d-layout"></co>
- ! UniformGridPartition&lt;2&gt; partition(Loc&lt;2&gt;(nuProcessors, nuProcessors),
- ! GuardLayers&lt;2&gt;(1), // internal
- ! GuardLayers&lt;2&gt;(0)); // external
- ! UniformGridLayout&lt;2&gt; layout(vertDomain, partition, DistributedTag());
-
- // Specify the fields' mesh, i.e., its spatial extent, and its
- ! // centering type. <co id="tutorial-field_distributed-doof2d-mesh"></co>
- ! UniformRectilinearMesh&lt;2&gt; mesh(layout, Vector&lt;2&gt;(0.0), Vector&lt;2&gt;(1.0, 1.0));
- ! Centering&lt;2&gt; cell = canonicalCentering&lt;2&gt;(CellType, Continuous, AllDim);
-
- // The template parameters indicate a mesh and a 'double'
- // element type. MultiPatch indicates multiple computation patches,
- // i.e., distributed computation. The UniformTag indicates the
- ! // patches should have the same size. Each patch has Brick type. <co id="tutorial-field_distributed-doof2d-remote"></co>
- ! Field&lt;UniformRectilinearMesh&lt;2&gt;, double, MultiPatch&lt;UniformTag,
- ! Remote&lt;Brick&gt; &gt; &gt; a(cell, layout, mesh);
- ! Field&lt;UniformRectilinearMesh&lt;2&gt;, double, MultiPatch&lt;UniformTag,
- ! Remote&lt;Brick&gt; &gt; &gt; b(cell, layout, mesh);
-
- // Set up the initial conditions.
- ***************
- *** 83,87 ****
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- // Read from b. Write to a.
- a(I,J) = weight *
- --- 84,88 ----
-
- // Perform the simulation.
- ! for (int k = 0; k &lt; nuIterations; ++k) {
- // Read from b. Write to a.
- a(I,J) = weight *
- ***************
- *** 99,103 ****
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The fields are automatically deallocated.
- --- 100,104 ----
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! output &lt;&lt; (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) &lt;&lt; std::endl;
-
- // The fields are automatically deallocated.
- ***************
- *** 107,108 ****
- --- 108,110 ----
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/Doof2d-Field-parallel-annotated.patch
===================================================================
RCS file: Doof2d-Field-parallel-annotated.patch
diff -N Doof2d-Field-parallel-annotated.patch
*** /tmp/cvswOFpSv Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,120 ****
- *** Doof2d-Field-parallel.cpp Tue Dec 4 10:01:28 2001
- --- Doof2d-Field-parallel-annotated.cpp Tue Dec 4 11:04:26 2001
- ***************
- *** 1,5 ****
- ! #include <iostream> // has std::cout, ...
- ! #include <stdlib.h> // has EXIT_SUCCESS
- ! #include "Pooma/Fields.h" // has Pooma's Field
-
- // Doof2d: Pooma Fields, data-parallel implementation
- --- 1,6 ----
- ! <programlisting id="tutorial-field_parallel-doof2d-program" linenumbering="numbered" format="linespecific">
- ! #include &lt;iostream&gt; // has std::cout, ...
- ! #include &lt;stdlib.h&gt; // has EXIT_SUCCESS
- ! #include "Pooma/Fields.h" // has Pooma's Field <co id="tutorial-field_parallel-doof2d-header"></co>
-
- // Doof2d: Pooma Fields, data-parallel implementation
- ***************
- *** 12,17 ****
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout << "Please enter the number of averagings: ";
- ! std::cin >> nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- --- 13,18 ----
- // Ask the user for the number of averagings.
- long nuAveragings, nuIterations;
- ! std::cout &lt;&lt; "Please enter the number of averagings: ";
- ! std::cin &gt;&gt; nuAveragings;
- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.
-
- ***************
- *** 19,44 ****
- // the grid.
- long n;
- ! std::cout << "Please enter the field size: ";
- ! std::cin >> n;
-
- // Specify the fields' domains [0,n) x [0,n).
- ! Interval<1> N(0, n-1);
- ! Interval<2> vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval<1> I(1,n-2);
- ! Interval<1> J(1,n-2);
-
- // Specify the fields' mesh, i.e., its spatial extent, and its
- ! // centering type.
- ! DomainLayout<2> layout(vertDomain);
- ! UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
- ! Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);
-
- // Create the fields.
- // The template parameters indicate a mesh, a 'double' element
- ! // type, and ordinary 'Brick' storage.
- ! Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh);
- ! Field<UniformRectilinearMesh<2>, double, Brick> b(cell, layout, mesh);
-
- // Set up the initial conditions.
- --- 20,45 ----
- // the grid.
- long n;
- ! std::cout &lt;&lt; "Please enter the field size: ";
- ! std::cin &gt;&gt; n;
-
- // Specify the fields' domains [0,n) x [0,n).
- ! Interval&lt;1&gt; N(0, n-1);
- ! Interval&lt;2&gt; vertDomain(N, N);
-
- // Set up interior domains [1,n-1) x [1,n-1) for computation.
- ! Interval&lt;1&gt; I(1,n-2);
- ! Interval&lt;1&gt; J(1,n-2);
-
- // Specify the fields' mesh, i.e., its spatial extent, and its
- ! // centering type. <co id="tutorial-field_parallel-doof2d-mesh"></co>
- ! DomainLayout&lt;2&gt; layout(vertDomain);
- ! UniformRectilinearMesh&lt;2&gt; mesh(layout, Vector&lt;2&gt;(0.0), Vector&lt;2&gt;(1.0, 1.0));
- ! Centering&lt;2&gt; cell = canonicalCentering&lt;2&gt;(CellType, Continuous, AllDim);
-
- // Create the fields.
- // The template parameters indicate a mesh, a 'double' element
- ! // type, and ordinary 'Brick' storage. <co id="tutorial-field_parallel-doof2d-field_creation"></co>
- ! Field&lt;UniformRectilinearMesh&lt;2&gt;, double, Brick&gt; a(cell, layout, mesh);
- ! Field&lt;UniformRectilinearMesh&lt;2&gt;, double, Brick&gt; b(cell, layout, mesh);
-
- // Set up the initial conditions.
- ***************
- *** 51,56 ****
-
- // Perform the simulation.
- ! for (int k = 0; k < nuIterations; ++k) {
- ! // Read from b. Write to a.
- a(I,J) = weight *
- (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) +
- --- 52,57 ----
-
- // Perform the simulation.
- ! for (int k = 0; k &lt; nuIterations; ++k) {
- ! // Read from b. Write to a. <co id="tutorial-field_parallel-doof2d-first_write"></co>
- a(I,J) = weight *
- (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) +
- ***************
- *** 67,71 ****
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;
-
- // The fields are automatically deallocated.
- --- 68,72 ----
- // Print out the final central value.
- Pooma::blockAndEvaluate(); // Ensure all computation has finished.
- ! std::cout &lt;&lt; (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) &lt;&lt; std::endl;
-
- // The fields are automatically deallocated.
- ***************
- *** 75,76 ****
- --- 76,78 ----
- return EXIT_SUCCESS;
- }
- + </programlisting>
--- 0 ----
Index: programs/makefile
===================================================================
RCS file: makefile
diff -N makefile
*** /tmp/cvsfaiLlD Fri Jan 4 10:14:11 2002
--- /dev/null Fri Mar 23 21:37:44 2001
***************
*** 1,12 ****
- ### Oldham, Jeffrey D.
- ### 2001Nov27
- ### Pooma
- ###
- ### Produce Annotated Source Code
-
- all: Doof2d-C-element-annotated.cpp Doof2d-Array-element-annotated.cpp \
- Doof2d-Array-parallel-annotated.cpp Doof2d-Array-stencil-annotated.cpp \
- Doof2d-Array-distributed-annotated.cpp
-
- %-annotated.cpp: %-annotated.patch %.cpp
- patch -o $@ < $<
--- 0 ----