From oldham at codesourcery.com  Mon Dec  3 20:39:43 2001
From: oldham at codesourcery.com (Jeffrey Oldham)
Date: Mon, 3 Dec 2001 12:39:43 -0800
Subject: Patch: Fix Typos and Add Missing Semicolons
Message-ID: <20011203123943.A16508@codesourcery.com>

This patch mainly fixes typographical errors I found in the Pooma
source code.  Additionally, there are two more significant revisions:

1) Ended a statement with a semicolon in src/Tulip/CollectFromContexts.h.
   I do not understand why this was not previously found.

2) Discovered a "hidden" paragraph in docs/Layout.html.* Should this
   paragraph really be present?  Presumably, this documentation will
   soon disappear, but it would be nice to get it correct in case it
   does survive.

2001-11-29  Jeffrey D. Oldham

	* benchmarks/Doof2d/Doof2d.h: Fix typo in comment.
	(*::initializeStorage): Generalize comment to work for any container.
	* docs/Layout.html: Fix numerous typographical and spelling errors.
	Add missing quotation mark which caused entire paragraph describing
	tags to not be displayed.
	* src/Engine/IndexFunctionEngine.h (Engine >): Fix typo in
	introductory comment.
	* src/Engine/NotifyEngineWrite.h: Refill comment.
	* src/Engine/UserFunction.h: Use correct article in comment.
	* src/Field/Mesh/UniformRectilinearMesh.h: End sentence with a
	period in comment.
	* src/Layout/UniformGridLayout.cpp
	(UniformGridLayoutData::repartition): Fix typo in introductory
	comment.
	* src/Layout/UniformGridLayout.h (UniformGridLayout): Remove extra
	comment characters.
	* src/Partition/DistributedMapper.h (DistributedMapper::map):
	Capitalize beginning of sentence in comment.
	* src/Partition/GridPartition.h (GridPartition): Add spaces between
	comment symbols and text.  Fix typos and spelling mistakes.
	* src/Partition/UniformGridPartition.h (UniformGridPartition):
	Properly indent the code.
	* src/Tulip/CollectFromContexts.h (Serialize >::size): Add missing
	semicolon at statement's end.
	* src/Utilities/Benchmark.h: Fix typo in introductory comment.
	* src/Utilities/Conform.h: Change word in introductory comment.
	* src/Utilities/ModelElement.h: Fix incorrect introductory comment.
	* src/Utilities/Statistics.h: Fix incorrect introductory comment.
	* src/Utilities/WrappedInt.h: Fix spelling mistake in introductory
	comment.

Tested on Linux with g++ 3.1 by compiling the Pooma library and the
Doof2d benchmark
Approved by Mark Mitchell
Applied to mainline

Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com

-------------- next part --------------
Index: benchmarks/Doof2d/Doof2d.h =================================================================== RCS file: /home/pooma/Repository/r2/benchmarks/Doof2d/Doof2d.h,v retrieving revision 1.3 diff -c -p -r1.3 Doof2d.h *** benchmarks/Doof2d/Doof2d.h 2001/10/16 18:26:09 1.3 --- benchmarks/Doof2d/Doof2d.h 2001/11/30 04:21:07 *************** *** 28,34 **** //----------------------------------------------------------------------------- // Classes Doof2dCppTran, Doof2dP2, DoofNinePt, Doof2dOpt ! // Implementation Classes Doof2dStorage, Doof2dBase, //----------------------------------------------------------------------------- #ifndef POOMA_BENCHMARKS_DOOF2D_H --- 28,34 ---- //----------------------------------------------------------------------------- // Classes Doof2dCppTran, Doof2dP2, DoofNinePt, Doof2dOpt ! // Implementation Classes Doof2dStorage, Doof2dBase //----------------------------------------------------------------------------- #ifndef POOMA_BENCHMARKS_DOOF2D_H *************** public: *** 119,125 **** void initializeStorage(int &n, int np, int ng) { !
// Get new array domain, including "guards". Interval<1> N(1, n); Interval<2> vertDomain(N, N); --- 119,125 ---- void initializeStorage(int &n, int np, int ng) { ! // Create the domain, including "guards". Interval<1> N(1, n); Interval<2> vertDomain(N, N); *************** public: *** 140,146 **** void initializeStorage(int &n, int np, int ng) { ! // Get new array domain, including "guards". Interval<1> N(1, n); Interval<2> vertDomain(N, N); --- 140,146 ---- void initializeStorage(int &n, int np, int ng) { ! // Create the domain, including "guards". Interval<1> N(1, n); Interval<2> vertDomain(N, N); *************** public: *** 169,175 **** { n = (n / np) * np; ! // Get new array domain. Interval<1> N(1, n); Interval<2> newDomain(N, N); --- 169,175 ---- { n = (n / np) * np; ! // Create the domain. Interval<1> N(1, n); Interval<2> newDomain(N, N); *************** public: *** 208,214 **** { n = (n / np) * np; ! // Get new array domain. Interval<1> N(1, n); Interval<2> newDomain(N, N); --- 208,214 ---- { n = (n / np) * np; ! // Create the domain. Interval<1> N(1, n); Interval<2> newDomain(N, N); Index: docs/Layout.html =================================================================== RCS file: /home/pooma/Repository/r2/docs/Layout.html,v retrieving revision 1.1 diff -c -p -r1.1 Layout.html *** docs/Layout.html 2001/03/19 16:11:13 1.1 --- docs/Layout.html 2001/11/30 04:22:44 *************** guard layer thicknesses. External guard *** 59,80 **** specified by the internal guard layer for those patches that make up the edges of the data object.

There are two types of GuardLayers used by a MultiPatchEngine based ! Pooma data object. These are internal guard Layers, and external guard ! layers. Each data patch that makes up POOMA data object has it's domain ! extended by the number of data elements specified by the GuardLayers object.  ! Internal guard layers are layers of elements or cells added to the upper and/or lower end of the sub-domain of the PatchEngine.  These cells are to be filled with the data values of the adjacent patches, in order to minimize cross context data dependency of execution of expressions for each PatchEngine. In general it is an error to specify a internal guard layer that is larger than the patch dimension. This is especially important to remember when dealing with SparseTileLayout and GridLayout. !

External guard layers are on the edge of the entire data object. Used for external boundary conditions. In this diagram, the external guard layers are indicated by 'egl'. External guard layers data allocation exists only for those PatchEngines that are on the edges of the full domain of the Pooma data object. The data elements that make up the external guard layers ! are generaly used for external boundary conditions.

In this example, the GuardLayers specification is symmetric, and the size of the internal and external guard layers are both 2. In general, GuardLayers may be specified asymmetrically, for instance: --- 59,80 ---- specified by the internal guard layer for those patches that make up the edges of the data object.

There are two types of GuardLayers used by a MultiPatchEngine based ! Pooma data object. These are internal guard layers and external guard ! layers. Each data patch of a POOMA data object has its domain ! extended by the number of data elements specified by the GuardLayers object. !

Internal guard layers are layers of elements or cells added to the upper and/or lower end of the sub-domain of the PatchEngine.  These cells are to be filled with the data values of the adjacent patches, in order to minimize cross context data dependency of execution of expressions for each PatchEngine. In general it is an error to specify a internal guard layer that is larger than the patch dimension. This is especially important to remember when dealing with SparseTileLayout and GridLayout. !

External guard layers are on the edge of the entire data object and are used for external boundary conditions. In this diagram, the external guard layers are indicated by 'egl'. External guard layers data allocation exists only for those PatchEngines that are on the edges of the full domain of the Pooma data object. The data elements that make up the external guard layers ! are generally used for external boundary conditions.

In this example, the GuardLayers specification is symmetric, and the size of the internal and external guard layers are both 2. In general, GuardLayers may be specified asymmetrically, for instance: *************** regions for a UniformGridLayout with asy *** 101,107 ****
Interval<2> dom(Interval<1>(0,14),Interval<1>(0,9));
UniformGridPartition<2>  asypart(Loc<2>(3,2),igl,egl);
Array<2,double,MultiPatchEngine<UniformTag,CompressibleBrick> ! >     Aarray(UniformGridLayout<2>(dom,asypart,ReplicatedTag()); 2x3 patch asymetric guard cell
 

--- 101,107 ----
Interval<2> dom(Interval<1>(0,14),Interval<1>(0,9));
UniformGridPartition<2>  asypart(Loc<2>(3,2),igl,egl);
Array<2,double,MultiPatchEngine<UniformTag,CompressibleBrick> ! >     Array(UniformGridLayout<2>(dom,asypart,ReplicatedTag()); 2x3 patch asymetric guard cell
 

*************** or overlap onto the data space of patche *** 129,151 ****
 

DynamicLayout !
An inherently 1 dimensional Layout, that allows the patches to be resized.


 

DomainLayout<Dim>
A single patch domain defined by a single Interval.
  !

! Partitioners:


Layouts have Partitioners that are invoked to generate the !
patch-subdomains, taking into account internal and external GuardLayers.

The currently available Partitioners are

UniformGridPartitioner<Dim> !
Used to generate patches for a UniformGirdLayout.
  !

GirdPartitioner<Dim> !
Used to generate patches for a GridLayout

TilePartition<Dim>
Generates patches from a provided list of domains.

SpatialPartition<ReferenceLayout> --- 129,152 ----
 

DynamicLayout !
An inherently 1-dimensional Layout, that allows the patches to be resized.


 

DomainLayout<Dim>
A single patch domain defined by a single Interval.
  !

UniformGridLayout, GridLayout and SparseTileLayout

The UniformGridLayout, GridLayout, and SparseTileLayout all have a trailing tag argument on most of their constructors that specifies how the data is ContextMappered. These tags are ReplicatedTag and DistributedTag. If ReplicatedTag is specified, then LocalMapper is used, while if DistributedTag is specified, then the DistributedMapper is used. All of the aforementioned layouts have the default constructor that doesn't require the use of the ReplicatedTag or DistributedTag. A constructor having the form (Domain,Partitioner, ContextMapper) doesn't require the use of the trailing tags since it explicitly specifies a ContextMapper.
! !

Partitioners:


Layouts have Partitioners that are invoked to generate the ! patch-subdomains, taking into account internal and external GuardLayers.

The currently available Partitioners are

UniformGridPartitioner<Dim> !
Used to generate patches for a UniformGridLayout.
  !

GridPartitioner<Dim> !
Used to generate patches for a GridLayout.

TilePartition<Dim>
Generates patches from a provided list of domains.

SpatialPartition<ReferenceLayout> *************** contexts with an approximately minimum s *** 181,189 ****
 
 

ContiguousMapper !
Assigns patches to a context in a modified Fortran storage order: as the index gets to an boundary, the lowest axis index decrements, rather ! than going from LowIndex = IndexMax to LowIndex = 0; See the figure for an illustration of this mapper.

contiguous mapper
  --- 182,190 ----
 
 

ContiguousMapper !
Assigns patches to a context in a modified Fortran storage order: As the index gets to an boundary, the lowest axis index decrements, rather ! than going from LowIndex = IndexMax to LowIndex = 0; see the figure for an illustration of this mapper.

contiguous mapper
  *************** patch engines must not be distributed. *** 207,212 ****

Similarly, MultiPatchEngine may not be constructed with the PatchEngine tag specified as Remote<PatchEngine> if LocalMapper is specified in the Layout used to construct the MultiPatchEngine. Either of the aforementioned ! combinations will generate a Pinsist error inside MultiPatchEngine. --- 208,213 ----

Similarly, MultiPatchEngine may not be constructed with the PatchEngine tag specified as Remote<PatchEngine> if LocalMapper is specified in the Layout used to construct the MultiPatchEngine. Either of the aforementioned ! combinations will generate a PInsist error inside MultiPatchEngine. Index: src/Engine/IndexFunctionEngine.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Engine/IndexFunctionEngine.h,v retrieving revision 1.23 diff -c -p -r1.23 IndexFunctionEngine.h *** src/Engine/IndexFunctionEngine.h 2001/08/30 01:15:04 1.23 --- src/Engine/IndexFunctionEngine.h 2001/11/30 04:22:46 *************** struct IndexFunctionView *** 87,93 **** // // Typedefs for the tag, element types, domain and dimensions. // Operator() with integers to evaluate elements quickly. ! // Operator() with a doman to subset. // Accessor for the domain. //----------------------------------------------------------------------------- --- 87,93 ---- // // Typedefs for the tag, element types, domain and dimensions. // Operator() with integers to evaluate elements quickly. ! // Operator() with a domain to subset. // Accessor for the domain. //----------------------------------------------------------------------------- Index: src/Engine/NotifyEngineWrite.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Engine/NotifyEngineWrite.h,v retrieving revision 1.6 diff -c -p -r1.6 NotifyEngineWrite.h *** src/Engine/NotifyEngineWrite.h 2000/05/10 05:07:37 1.6 --- src/Engine/NotifyEngineWrite.h 2001/11/30 04:22:46 *************** *** 42,49 **** //----------------------------------------------------------------------------- // Overview: // ! // NotifyEngineWrite is a general wrapper class the is used to tell an engine ! // that we're going to write to it. //----------------------------------------------------------------------------- //----------------------------------------------------------------------------- --- 42,49 ---- //----------------------------------------------------------------------------- // Overview: // ! // NotifyEngineWrite is a general wrapper class that is used to tell ! // an engine that we're going to write to it. //----------------------------------------------------------------------------- //----------------------------------------------------------------------------- Index: src/Engine/UserFunction.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Engine/UserFunction.h,v retrieving revision 1.28 diff -c -p -r1.28 UserFunction.h *** src/Engine/UserFunction.h 2001/09/14 22:37:57 1.28 --- src/Engine/UserFunction.h 2001/11/30 04:22:46 *************** template class *** 100,106 **** // inherits from UserFunction below. // // Expression: The type of the expression to which the function ! // is being applied. This should be a Array. // //----------------------------------------------------------------------------- --- 100,106 ---- // inherits from UserFunction below. // // Expression: The type of the expression to which the function ! // is being applied. This should be an Array. 
// //----------------------------------------------------------------------------- Index: src/Field/Mesh/UniformRectilinearMesh.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Field/Mesh/UniformRectilinearMesh.h,v retrieving revision 1.2 diff -c -p -r1.2 UniformRectilinearMesh.h *** src/Field/Mesh/UniformRectilinearMesh.h 2001/09/20 22:07:32 1.2 --- src/Field/Mesh/UniformRectilinearMesh.h 2001/11/30 04:22:46 *************** *** 47,57 **** #include "Field/FieldEngine/FieldEnginePatch.h" // Used in ctors #include "Field/Mesh/NoMesh.h" // Base class #include "Field/FieldCentering.h" // Centering inline ! #include "Tiny/Vector.h" // Class member //----------------------------------------------------------------------------- // Holds the data for a uniform rectilinear mesh. That class has a ref-counted ! // instance of this class //----------------------------------------------------------------------------- template --- 47,57 ---- #include "Field/FieldEngine/FieldEnginePatch.h" // Used in ctors #include "Field/Mesh/NoMesh.h" // Base class #include "Field/FieldCentering.h" // Centering inline ! #include "Tiny/Vector.h" // Class member //----------------------------------------------------------------------------- // Holds the data for a uniform rectilinear mesh. That class has a ref-counted ! // instance of this class. //----------------------------------------------------------------------------- template Index: src/Layout/UniformGridLayout.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Layout/UniformGridLayout.cpp,v retrieving revision 1.36 diff -c -p -r1.36 UniformGridLayout.cpp *** src/Layout/UniformGridLayout.cpp 2001/04/18 02:19:09 1.36 --- src/Layout/UniformGridLayout.cpp 2001/11/30 04:22:47 *************** void UniformGridLayoutData::calcGCF *** 435,441 **** // Repartition the layout using a new Partitioner scheme. The initial // domain lists are cleared out, the partitioner is invoked, and then // all the observers are notified. This can only be done with a ! // GridParition partitioner. // //----------------------------------------------------------------------------- --- 435,441 ---- // Repartition the layout using a new Partitioner scheme. The initial // domain lists are cleared out, the partitioner is invoked, and then // all the observers are notified. This can only be done with a ! // GridPartition partitioner. // //----------------------------------------------------------------------------- Index: src/Layout/UniformGridLayout.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Layout/UniformGridLayout.h,v retrieving revision 1.83 diff -c -p -r1.83 UniformGridLayout.h *** src/Layout/UniformGridLayout.h 2001/06/05 18:42:12 1.83 --- src/Layout/UniformGridLayout.h 2001/11/30 04:22:47 *************** public: *** 1030,1037 **** typedef UniformGridLayout This_t; // for convenience typedef Observable Observable_t; ! // // Iterator through nodes. Basically the same as the vector iterator ! // // except it dereferences automatically. typedef DerefIterator iterator; typedef ConstDerefIterator const_iterator; --- 1030,1037 ---- typedef UniformGridLayout This_t; // for convenience typedef Observable Observable_t; ! // Iterator through nodes. Basically the same as the vector iterator ! // except it dereferences automatically. 
typedef DerefIterator iterator; typedef ConstDerefIterator const_iterator; Index: src/Partition/DistributedMapper.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Partition/DistributedMapper.h,v retrieving revision 1.7 diff -c -p -r1.7 DistributedMapper.h *** src/Partition/DistributedMapper.h 2001/10/15 17:34:31 1.7 --- src/Partition/DistributedMapper.h 2001/11/30 04:22:48 *************** public: *** 63,69 **** { int ncontexts = Pooma::contexts(); int npc = templist.size()/ncontexts; ! // if there are more contexts then patches, assign one // patch per context for as many patches as there are. if(ncontexts> templist.size()) { --- 63,69 ---- { int ncontexts = Pooma::contexts(); int npc = templist.size()/ncontexts; ! // If there are more contexts than patches, assign one // patch per context for as many patches as there are. if(ncontexts> templist.size()) { Index: src/Partition/GridPartition.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Partition/GridPartition.h,v retrieving revision 1.28 diff -c -p -r1.28 GridPartition.h *** src/Partition/GridPartition.h 2001/06/28 19:08:11 1.28 --- src/Partition/GridPartition.h 2001/11/30 04:22:48 *************** class UniformGridPartition; *** 88,126 **** // sub-domain specifications along each axis, or any of the specifiers used // for the UniformGridPartition. // ! // GridPartition inherets from UniformGridPartition // // A GridPartition object is constructed with the following information: // GridPartition() // Creates one partition, with no guard cells // // GridPartition(const Loc &n, int p=-1) ! //Creates n[i] blocks along each i'th dimension // // GridPartition(const Loc &n, // const GuardLayers &gcs) ! //Same as above, with internal and external guard cell sizes set to gcs. // // // GridPartition(const Loc &n, // const GuardLayers &igcs, // const GuardLayers &egcs) ! //Same as above, with internal and external guard cell sizes specified ! //independently. // // GridPartition(const Grid &g) ! //Partitions according to the Grid object. // // GridPartition(const Grid &g, // const GuardLayers &gcs) ! //Same as above, with internal and external guard cell sizes set to gcs. // // // GridPartition(const Grid &g, // const GuardLayers &igcs, // const GuardLayers &egcs) ! //Same as above, with internal and external guard cell sizes specified ! //independently. //------------------------------------------------------------------------- template --- 88,126 ---- // sub-domain specifications along each axis, or any of the specifiers used // for the UniformGridPartition. // ! // GridPartition inherits from UniformGridPartition. // // A GridPartition object is constructed with the following information: // GridPartition() // Creates one partition, with no guard cells // // GridPartition(const Loc &n, int p=-1) ! // Creates n[i] blocks along each i'th dimension // // GridPartition(const Loc &n, // const GuardLayers &gcs) ! // Same as above, with internal and external guard cell sizes set to gcs. // // // GridPartition(const Loc &n, // const GuardLayers &igcs, // const GuardLayers &egcs) ! // Same as above, with internal and external guard cell sizes specified ! // independently. // // GridPartition(const Grid &g) ! // Partitions according to the Grid object. // // GridPartition(const Grid &g, // const GuardLayers &gcs) ! // Same as above, with internal and external guard cell sizes set to gcs. 
// // // GridPartition(const Grid &g, // const GuardLayers &igcs, // const GuardLayers &egcs) ! // Same as above, with internal and external guard cell sizes specified ! // independently. //------------------------------------------------------------------------- template Index: src/Partition/UniformGridPartition.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Partition/UniformGridPartition.h,v retrieving revision 1.26 diff -c -p -r1.26 UniformGridPartition.h *** src/Partition/UniformGridPartition.h 2001/10/10 00:16:03 1.26 --- src/Partition/UniformGridPartition.h 2001/11/30 04:22:48 *************** public: *** 256,468 **** return hasGuards_m; } ! bool hasInternalGuards() const ! { ! return hasGuards_m && internalGuards_m != 0; ! } ! bool hasExternalGuards() const ! { ! return hasGuards_m && externalGuards_m != 0; ! } ! const GuardLayers &internalGuards() const ! { ! return internalGuards_m; ! } ! const GuardLayers &externalGuards() const ! { ! return externalGuards_m; ! } ! //============================================================ ! // Partition methods ! //============================================================ ! // For the given global domain, partition it into subdomains and put ! // the results in the provided layout object by calling ! // 'layoutData.addDomainList(List_t &templist)'. Return the ! // total number of subdomains added. ! template ! int partition(const D &domain, ! List_t & all, ! const ContextMapper& cmapper) const ! { ! // The type info for domain we should be creating for the layout. ! typedef typename DomainTraits::Element_t Element_t; ! // Make sure we have the right dimensionality. ! CTAssert(Dim == DomainTraits::dimensions); ! CTAssert(Dim == DomainTraits::dimensions); ! // This will only work with UnitStride domains ! CTAssert(DomainTraits::unitStride == 1); ! CTAssert(DomainTraits::unitStride == 1); ! // make sure the list is empty ! PAssert(all.size() == 0); ! // Cache the origin of the domain and make sure the domain is ! // properly sized. Also, build a domain corresponding to the ! // number of blocks in each direction for iterating over below. ! Element_t origin[Dim]; ! Element_t sizes[Dim]; ! Interval bdomain = Pooma::NoInit(); // dummy initializer ! int i; ! for (i = 0; i < Dim; ++i) ! { ! if (!domain.empty()) ! { ! int gcwidth = ! (internalGuards_m.lower(i) > internalGuards_m.upper(i)) ? ! internalGuards_m.lower(i) : internalGuards_m.upper(i); ! ! PInsist((domain[i].length() % blocks()[i].first()) == 0, ! "All the blocks in a grid must be the same size."); ! ! origin[i] = domain[i].first(); ! sizes[i] = domain[i].length() / blocks()[i].first(); ! ! PInsist(sizes[i] >= gcwidth, ! "Block sizes too small for guard layer specification."); ! } ! bdomain[i] = Interval<1>(blocks()[i].first()); ! } ! ! // Loop over all the blocks, creating new domains. ! ! typename Interval::const_iterator it = bdomain.begin(); ! while (it != bdomain.end()) ! { ! // Start with an initially empty domain and empty guard cells. ! Domain_t owned; ! GuardLayers iguards(0); ! GuardLayers eguards(0); ! ! // Calculate the subdomain, if the global domain is not empty. ! // If it is, we just sue the empty domain. ! ! if (!domain.empty()) ! { ! Loc pos = *it; ! for (i = 0; i < Dim; ++i) ! { ! int position = pos[i].first(); ! Element_t a = origin[i] + sizes[i]*position; ! Element_t b = a + sizes[i] - 1; ! typedef typename ! DomainTraits::OneDomain_t OneDomain_t; ! owned[i] = OneDomain_t(a, b); ! } ! 
// Calculate the internal and external guard layer specifications ! // for this domain. ! if (hasGuards_m) ! { ! iguards = internalGuards_m; ! // Check if we're at an edge, and if so use the ! // external specfication for that edge. ! for (int d = 0; d < Dim; ++d) ! { ! int position = pos[d].first(); ! if ( position == bdomain[d].first() ) ! { ! eguards.lower(d) = externalGuards_m.lower(d); ! iguards.lower(d) = 0; ! } ! if ( position == bdomain[d].last() ) ! { ! eguards.upper(d) = externalGuards_m.upper(d); ! iguards.upper(d) = 0; ! } ! } ! } ! } ! typename Value_t::ID_t gid = all.size(); ! typename Value_t::ID_t lid = (-1); ! // Create a new Node object to store the subdomain data. ! GuardLayers::addGuardLayers(owned,eguards); ! Domain_t allocated = owned; ! GuardLayers::addGuardLayers(allocated,iguards); ! Value_t *node = new Value_t(owned, allocated, -1, gid, lid); ! all.push_back(node); ! // Increment our counters and iterators. ! ++it; ! } ! cmapper.map(all); ! // At the end, return # of domains created. ! return num_m; ! } ! template ! int partition(const D &domain, List_t & list) const ! { ! return partition(domain,list,DefaultMapper_t(*this)); ! } protected: ! // The number of blocks along each dimension. ! Loc blocks_m; ! // Do we have guard layers? ! bool hasGuards_m; ! // Are the external guards different from the internal? ! bool hasCustomEdgeGuards_m; ! // Specification of internal guard layers. ! GuardLayers internalGuards_m; ! // Specification of external guard layers. ! GuardLayers externalGuards_m; ! // The total number of blocks to create. ! int num_m; ! // Calculate num_m from blocks_m: ! void calcNum() ! { ! num_m = blocks_m[0].first(); ! for (int d = 1; d < Dim; ++d) ! { ! num_m *= blocks_m[d].first(); ! } ! } }; --- 256,468 ---- return hasGuards_m; } ! bool hasInternalGuards() const ! { ! return hasGuards_m && internalGuards_m != 0; ! } ! bool hasExternalGuards() const ! { ! return hasGuards_m && externalGuards_m != 0; ! } ! const GuardLayers &internalGuards() const ! { ! return internalGuards_m; ! } ! const GuardLayers &externalGuards() const ! { ! return externalGuards_m; ! } ! //============================================================ ! // Partition methods ! //============================================================ ! // For the given global domain, partition it into subdomains and put ! // the results in the provided layout object by calling ! // 'layoutData.addDomainList(List_t &templist)'. Return the ! // total number of subdomains added. ! template ! int partition(const D &domain, ! List_t & all, ! const ContextMapper& cmapper) const ! { ! // The type info for domain we should be creating for the layout. ! typedef typename DomainTraits::Element_t Element_t; ! // Make sure we have the right dimensionality. ! CTAssert(Dim == DomainTraits::dimensions); ! CTAssert(Dim == DomainTraits::dimensions); ! // This will only work with UnitStride domains ! CTAssert(DomainTraits::unitStride == 1); ! CTAssert(DomainTraits::unitStride == 1); ! // make sure the list is empty ! PAssert(all.size() == 0); ! // Cache the origin of the domain and make sure the domain is ! // properly sized. Also, build a domain corresponding to the ! // number of blocks in each direction for iterating over below. ! Element_t origin[Dim]; ! Element_t sizes[Dim]; ! Interval bdomain = Pooma::NoInit(); // dummy initializer ! int i; ! for (i = 0; i < Dim; ++i) ! { ! if (!domain.empty()) ! { ! int gcwidth = ! (internalGuards_m.lower(i) > internalGuards_m.upper(i)) ? ! 
internalGuards_m.lower(i) : internalGuards_m.upper(i); ! ! PInsist((domain[i].length() % blocks()[i].first()) == 0, ! "All the blocks in a grid must be the same size."); ! ! origin[i] = domain[i].first(); ! sizes[i] = domain[i].length() / blocks()[i].first(); ! ! PInsist(sizes[i] >= gcwidth, ! "Block sizes too small for guard layer specification."); ! } ! bdomain[i] = Interval<1>(blocks()[i].first()); ! } ! ! // Loop over all the blocks, creating new domains. ! ! typename Interval::const_iterator it = bdomain.begin(); ! while (it != bdomain.end()) ! { ! // Start with an initially empty domain and empty guard cells. ! Domain_t owned; ! GuardLayers iguards(0); ! GuardLayers eguards(0); ! ! // Calculate the subdomain, if the global domain is not empty. ! // If it is, we just use the empty domain. ! ! if (!domain.empty()) ! { ! Loc pos = *it; ! for (i = 0; i < Dim; ++i) ! { ! int position = pos[i].first(); ! Element_t a = origin[i] + sizes[i]*position; ! Element_t b = a + sizes[i] - 1; ! typedef typename ! DomainTraits::OneDomain_t OneDomain_t; ! owned[i] = OneDomain_t(a, b); ! } ! // Calculate the internal and external guard layer specifications ! // for this domain. ! if (hasGuards_m) ! { ! iguards = internalGuards_m; ! // Check if we're at an edge, and if so use the ! // external specfication for that edge. ! for (int d = 0; d < Dim; ++d) ! { ! int position = pos[d].first(); ! if ( position == bdomain[d].first() ) ! { ! eguards.lower(d) = externalGuards_m.lower(d); ! iguards.lower(d) = 0; ! } ! if ( position == bdomain[d].last() ) ! { ! eguards.upper(d) = externalGuards_m.upper(d); ! iguards.upper(d) = 0; ! } ! } ! } ! } ! typename Value_t::ID_t gid = all.size(); ! typename Value_t::ID_t lid = (-1); ! // Create a new Node object to store the subdomain data. ! GuardLayers::addGuardLayers(owned,eguards); ! Domain_t allocated = owned; ! GuardLayers::addGuardLayers(allocated,iguards); ! Value_t *node = new Value_t(owned, allocated, -1, gid, lid); ! all.push_back(node); ! // Increment our counters and iterators. ! ++it; ! } ! cmapper.map(all); ! // At the end, return # of domains created. ! return num_m; ! } ! template ! int partition(const D &domain, List_t & list) const ! { ! return partition(domain,list,DefaultMapper_t(*this)); ! } protected: ! // The number of blocks along each dimension. ! Loc blocks_m; ! // Do we have guard layers? ! bool hasGuards_m; ! // Are the external guards different from the internal? ! bool hasCustomEdgeGuards_m; ! // Specification of internal guard layers. ! GuardLayers internalGuards_m; ! // Specification of external guard layers. ! GuardLayers externalGuards_m; ! // The total number of blocks to create. ! int num_m; ! // Calculate num_m from blocks_m: ! void calcNum() ! { ! num_m = blocks_m[0].first(); ! for (int d = 1; d < Dim; ++d) ! { ! num_m *= blocks_m[d].first(); ! } ! } }; Index: src/Tulip/CollectFromContexts.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Tulip/CollectFromContexts.h,v retrieving revision 1.1 diff -c -p -r1.1 CollectFromContexts.h *** src/Tulip/CollectFromContexts.h 2001/09/13 20:40:54 1.1 --- src/Tulip/CollectFromContexts.h 2001/11/30 04:22:48 *************** public: *** 126,132 **** static inline int size(const CollectionValue &v) { int nBytes = Serialize::size(v.valid()); ! nBytes += Serialize::size(v.context()) if (v.valid()) nBytes += Serialize::size(v.value()); --- 126,132 ---- static inline int size(const CollectionValue &v) { int nBytes = Serialize::size(v.valid()); ! 
nBytes += Serialize::size(v.context()); if (v.valid()) nBytes += Serialize::size(v.value()); Index: src/Utilities/Benchmark.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Utilities/Benchmark.h,v retrieving revision 1.28 diff -c -p -r1.28 Benchmark.h *** src/Utilities/Benchmark.h 2001/07/25 16:04:12 1.28 --- src/Utilities/Benchmark.h 2001/11/30 04:22:48 *************** *** 43,49 **** //----------------------------------------------------------------------------- ! // Implementaion provides a framework for implementing a benchmark in a // specific way. It is a virtual base class. Users must override almost // all of the member functions. //----------------------------------------------------------------------------- --- 43,49 ---- //----------------------------------------------------------------------------- ! // Implementation provides a framework for implementing a benchmark in a // specific way. It is a virtual base class. Users must override almost // all of the member functions. //----------------------------------------------------------------------------- Index: src/Utilities/Conform.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Utilities/Conform.h,v retrieving revision 1.2 diff -c -p -r1.2 Conform.h *** src/Utilities/Conform.h 2000/03/07 13:18:23 1.2 --- src/Utilities/Conform.h 2001/11/30 04:22:48 *************** *** 28,34 **** //----------------------------------------------------------------------------- // Overview: // A tag for checking whether the terms in an expression have ! // conformant domains. //----------------------------------------------------------------------------- #ifndef POOMA_UTILITIES_CONFORM_H --- 28,34 ---- //----------------------------------------------------------------------------- // Overview: // A tag for checking whether the terms in an expression have ! // conforming domains. //----------------------------------------------------------------------------- #ifndef POOMA_UTILITIES_CONFORM_H Index: src/Utilities/ModelElement.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Utilities/ModelElement.h,v retrieving revision 1.2 diff -c -p -r1.2 ModelElement.h *** src/Utilities/ModelElement.h 2000/03/07 13:18:25 1.2 --- src/Utilities/ModelElement.h 2001/11/30 04:22:48 *************** *** 37,43 **** //----------------------------------------------------------------------------- // Overview: // ! // ConstField : A read-only version of Field. //----------------------------------------------------------------------------- template --- 37,44 ---- //----------------------------------------------------------------------------- // Overview: // ! // ModelElement ! // A wrapper class used to differentiate overloaded functions. //----------------------------------------------------------------------------- template Index: src/Utilities/Statistics.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Utilities/Statistics.h,v retrieving revision 1.1 diff -c -p -r1.1 Statistics.h *** src/Utilities/Statistics.h 2000/04/12 00:30:06 1.1 --- src/Utilities/Statistics.h 2001/11/30 04:22:48 *************** *** 31,38 **** //----------------------------------------------------------------------------- // Classes: ! // Implementation ! 
// Benchmark //----------------------------------------------------------------------------- #include --- 31,38 ---- //----------------------------------------------------------------------------- // Classes: ! // Statistics ! // StatisticsData: helper class //----------------------------------------------------------------------------- #include Index: src/Utilities/WrappedInt.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Utilities/WrappedInt.h,v retrieving revision 1.10 diff -c -p -r1.10 WrappedInt.h *** src/Utilities/WrappedInt.h 2000/05/23 23:18:44 1.10 --- src/Utilities/WrappedInt.h 2001/11/30 04:22:48 *************** *** 40,46 **** // Helper class: WrappedInt // // A tag class templated on an integer. This class is intended to be ! // used to let you specialize a funtion on a compile time number. // // For example, if you have an object of type T which you want to pass // to a subroutine foo, but you want to specialize that subroutine based on --- 40,46 ---- // Helper class: WrappedInt // // A tag class templated on an integer. This class is intended to be ! // used to let you specialize a function on a compile time number. // // For example, if you have an object of type T which you want to pass // to a subroutine foo, but you want to specialize that subroutine based on

From oldham at codesourcery.com  Tue Dec  4 01:08:27 2001
From: oldham at codesourcery.com (Jeffrey Oldham)
Date: Mon, 3 Dec 2001 17:08:27 -0800
Subject: Patch: Preliminary Pooma Manual
Message-ID: <20011203170827.A17307@codesourcery.com>

I have been working on a manual for Pooma 2.4.  These files will be
part of the manual, mostly written using DocBook.  A portion of the
tutorial introduction chapter and some instructions regarding
downloading and compiling have been written.  The rest is an outline
that also contains notes to myself.  Although these files are a work
in progress, I wish to add them to the Pooma source tree so others can
look at them and so they will be backed up outside of Mountain View.

2001-Dec-03  Jeffrey D. Oldham

	* docs/manual/collateindex.pl: New file copied from DocBook
	distribution.
	* docs/manual/makefile: New file used to create PostScript and HTML
	versions of the manual.
	* docs/manual/outline.xml: New file containing the manual, written
	in DocBook.
	* docs/manual/figures/distributed.mp: New file containing MetaPost
	illustration showing the distributed computing concepts.
	* docs/manual/figures/doof2d.mp: New file containing MetaPost
	illustrations of the Doof2d tutorial implementations.
	* docs/manual/figures/makefile: New file used to create EPS figures
	from the MetaPost files.
	* docs/manual/programs/Doof2d-Array-distributed-annotated.patch:
	New file containing a patch adding callout annotations to
	Doof2d-Array-distributed.cpp.
	* docs/manual/programs/Doof2d-Array-element-annotated.patch: Analogous.
	* docs/manual/programs/Doof2d-Array-parallel-annotated.patch: Analogous.
	* docs/manual/programs/Doof2d-Array-stencil-annotated.patch: Analogous.
	* docs/manual/programs/Doof2d-C-element-annotated.patch: Analogous.
	* docs/manual/programs/makefile: New file containing instructions
	to create DocBook versions of programs.
	* examples/Manual/Doof2d/Doof2d-Array-distributed.cpp: New source
	code file containing a simple Doof2d implementation using Arrays
	and distributed computation.
	* examples/Manual/Doof2d/Doof2d-Array-element.cpp: Analogous.
	* examples/Manual/Doof2d/Doof2d-Array-parallel.cpp: Analogous.
* examples/Manual/Doof2d/Doof2d-Array-stencil.cpp: Analogous. * examples/Manual/Doof2d/Doof2d-C-element.cpp: Analogous. * examples/Manual/Doof2d/Doof2d-Field-distributed.cpp: Analogous. * examples/Manual/Doof2d/Doof2d-Field-parallel.cpp: Analogous. * examples/Manual/Doof2d/include.mk: New file needed for compilation. * examples/Manual/Doof2d/makefile: New file containing instructions to compile simple Doof2d implementations. Thanks, Jeffrey D. Oldham oldham at codesourcery.com -------------- next part -------------- Index: docs/manual/collateindex.pl =================================================================== RCS file: collateindex.pl diff -N collateindex.pl *** /dev/null Fri Mar 23 21:37:44 2001 --- collateindex.pl Mon Dec 3 14:01:51 2001 *************** *** 0 **** --- 1,596 ---- + # -*- Perl -*- + # + # $Id: collateindex.pl,v 1.12 2000/01/27 15:07:15 nwalsh Exp $ + + use Getopt::Std; + + $usage = "Usage: $0 file + Where are: + -p Link to points in the document. The default is to link + to the closest containing section. + -g Group terms with IndexDiv based on the first letter + of the term (or its sortas attribute). + (This probably doesn't handle i10n particularly well) + -s name Name the IndexDiv that contains symbols. The default + is 'Symbols'. Meaningless if -g is not used. + -t name Title for the index. + -P file Read a preamble from file. The content of file will + be inserted before the tag. + -i id The ID for the tag. + -o file Output to file. Defaults to stdout. + -S scope Scope of the index, must be 'all', 'local', or 'global'. + If unspecified, 'all' is assumed. + -I scope The implied scope, must be 'all', 'local', or 'global'. + IndexTerms which do not specify a scope will have the + implied scope. If unspecified, 'all' is assumed. + -x Make a SetIndex. + -f Force the output file to be written, even if it appears + to have been edited by hand. + -N New index (generates an empty index file). + file The file containing index data generated by Jade + with the DocBook HTML Stylesheet.\n"; + + die $usage if ! getopts('Dfgi:NpP:s:o:S:I:t:x'); + + $linkpoints = $opt_p; + $lettergroups = $opt_g; + $symbolsname = $opt_s || "Symbols"; + $title = $opt_t; + $preamble = $opt_P; + $outfile = $opt_o || '-'; + $indexid = $opt_i; + $scope = uc($opt_S) || 'ALL'; + $impliedscope = uc($opt_I) || 'ALL'; + $setindex = $opt_x; + $forceoutput = $opt_f; + $newindex = $opt_N; + $debug = $opt_D; + + $indextag = $setindex ? 'setindex' : 'index'; + + if ($newindex) { + safe_open(*OUT, $outfile); + if ($indexid) { + print OUT "<$indextag id='$indexid'>\n\n"; + } else { + print OUT "<$indextag>\n\n"; + } + + print OUT "\n"; + print OUT "\n"; + + print OUT "\n"; + exit 0; + } + + $dat = shift @ARGV || die $usage; + die "$0: cannot find $dat.\n" if ! -f $dat; + + %legal_scopes = ('ALL' => 1, 'LOCAL' => 1, 'GLOBAL' => 1); + if ($scope && !$legal_scopes{$scope}) { + die "Invalid scope.\n$usage\n"; + } + if ($impliedscope && !$legal_scopes{$impliedscope}) { + die "Invalid implied scope.\n$usage\n"; + } + + @term = (); + %id = (); + + $termcount = 0; + + print STDERR "Processing $dat...\n"; + + # Read the index file, creating an array of objects. 
Each object + # represents and indexterm and has fields for the content of the + # indexterm + + open (F, $dat); + while () { + chop; + + if (/^\/indexterm/i) { + push (@term, $idx); + next; + } + + if (/^indexterm (.*)$/i) { + $termcount++; + $idx = {}; + $idx->{'zone'} = {}; + $idx->{'href'} = $1; + $idx->{'count'} = $termcount; + $idx->{'scope'} = $impliedscope; + next; + } + + if (/^indexpoint (.*)$/i) { + $idx->{'hrefpoint'} = $1; + next; + } + + if (/^title (.*)$/i) { + $idx->{'title'} = $1; + next; + } + + if (/^primary[\[ ](.*)$/i) { + if (/^primary\[(.*?)\] (.*)$/i) { + $idx->{'psortas'} = $1; + $idx->{'primary'} = $2; + } else { + $idx->{'psortas'} = $1; + $idx->{'primary'} = $1; + } + next; + } + + if (/^secondary[\[ ](.*)$/i) { + if (/^secondary\[(.*?)\] (.*)$/i) { + $idx->{'ssortas'} = $1; + $idx->{'secondary'} = $2; + } else { + $idx->{'ssortas'} = $1; + $idx->{'secondary'} = $1; + } + next; + } + + if (/^tertiary[\[ ](.*)$/i) { + if (/^tertiary\[(.*?)\] (.*)$/i) { + $idx->{'tsortas'} = $1; + $idx->{'tertiary'} = $2; + } else { + $idx->{'tsortas'} = $1; + $idx->{'tertiary'} = $1; + } + next; + } + + if (/^see (.*)$/i) { + $idx->{'see'} = $1; + next; + } + + if (/^seealso (.*)$/i) { + $idx->{'seealso'} = $1; + next; + } + + if (/^significance (.*)$/i) { + $idx->{'significance'} = $1; + next; + } + + if (/^class (.*)$/i) { + $idx->{'class'} = $1; + next; + } + + if (/^scope (.*)$/i) { + $idx->{'scope'} = uc($1); + next; + } + + if (/^startref (.*)$/i) { + $idx->{'startref'} = $1; + next; + } + + if (/^id (.*)$/i) { + $idx->{'id'} = $1; + $id{$1} = $idx; + next; + } + + if (/^zone (.*)$/i) { + my($href) = $1; + $_ = scalar(); + chop; + die "Bad zone: $_\n" if !/^title (.*)$/i; + $idx->{'zone'}->{$href} = $1; + next; + } + + die "Unrecognized: $_\n"; + } + close (F); + + print STDERR "$termcount entries loaded...\n"; + + # Fixup the startrefs... + # In DocBook, STARTREF is a #CONREF attribute; support this by copying + # all of the fields from the indexterm with the id specified by STARTREF + # to the indexterm that has the STARTREF. + foreach $idx (@term) { + my($ididx, $field); + if ($idx->{'startref'}) { + $ididx = $id{$idx->{'startref'}}; + foreach $field ('primary', 'secondary', 'tertiary', 'see', 'seealso', + 'psortas', 'ssortas', 'tsortas', 'significance', + 'class', 'scope') { + $idx->{$field} = $ididx->{$field}; + } + } + } + + # Sort the index terms + @term = sort termsort @term; + + # Move all of the non-alphabetic entries to the front of the index. + @term = sortsymbols(@term); + + safe_open(*OUT, $outfile); + + # Write the index... + if ($indexid) { + print OUT "<$indextag id='$indexid'>\n\n"; + } else { + print OUT "<$indextag>\n\n"; + } + + print OUT "\n"; + print OUT "\n"; + + print OUT "\n\n"; + + print OUT "$title\n\n" if $title; + + $last = {}; # the last indexterm we processed + $first = 1; # this is the first one + $group = ""; # we're not in a group yet + $lastout = ""; # we've not put anything out yet + + foreach $idx (@term) { + next if $idx->{'startref'}; # no way to represent spans... + next if ($idx->{'scope'} eq 'LOCAL') && ($scope eq 'GLOBAL'); + next if ($idx->{'scope'} eq 'GLOBAL') && ($scope eq 'LOCAL'); + next if &same($idx, $last); # suppress duplicates + + $termcount--; + + # If primary changes, output a whole new index term, otherwise just + # output another secondary or tertiary, as appropriate. We know from + # sorting that the terms will always be in the right order. 
+ if (!&tsame($last, $idx, 'primary')) { + print "DIFF PRIM\n" if $debug; + &end_entry() if not $first; + + if ($lettergroups) { + # If we're grouping, make the right indexdivs + $letter = $idx->{'psortas'}; + $letter = $idx->{'primary'} if !$letter; + $letter = uc(substr($letter, 0, 1)); + + # symbols are a special case + if (($letter lt 'A') || ($letter gt 'Z')) { + if (($group eq '') + || (($group ge 'A') && ($group le 'Z'))) { + print OUT "\n" if !$first; + print OUT "$symbolsname\n\n"; + $group = $letter; + } + } elsif (($group eq '') || ($group ne $letter)) { + print OUT "\n" if !$first; + print OUT "$letter\n\n"; + $group = $letter; + } + } + + $first = 0; # there can only be on first ;-) + + print OUT "\n"; + print OUT " ", $idx->{'primary'}; + $lastout = "primaryie"; + + if ($idx->{'secondary'}) { + print OUT "\n \n"; + print OUT " ", $idx->{'secondary'}; + $lastout = "secondaryie"; + }; + + if ($idx->{'tertiary'}) { + print OUT "\n \n"; + print OUT " ", $idx->{'tertiary'}; + $lastout = "tertiaryie"; + } + } elsif (!&tsame($last, $idx, 'secondary')) { + print "DIFF SEC\n" if $debug; + + print OUT "\n \n" if $lastout; + + print OUT " ", $idx->{'secondary'}; + $lastout = "secondaryie"; + if ($idx->{'tertiary'}) { + print OUT "\n \n"; + print OUT " ", $idx->{'tertiary'}; + $lastout = "tertiaryie"; + } + } elsif (!&tsame($last, $idx, 'tertiary')) { + print "DIFF TERT\n" if $debug; + + print OUT "\n \n" if $lastout; + + if ($idx->{'tertiary'}) { + print OUT " ", $idx->{'tertiary'}; + $lastout = "tertiaryie"; + } + } + + &print_term($idx); + + $last = $idx; + } + + # Termcount is > 0 iff some entries were skipped. + print STDERR "$termcount entries ignored...\n"; + + &end_entry(); + + print OUT "\n" if $lettergroups; + print OUT "\n"; + + close (OUT); + + print STDERR "Done.\n"; + + sub same { + my($a) = shift; + my($b) = shift; + + my($aP) = $a->{'psortas'} || $a->{'primary'}; + my($aS) = $a->{'ssortas'} || $a->{'secondary'}; + my($aT) = $a->{'tsortas'} || $a->{'tertiary'}; + + my($bP) = $b->{'psortas'} || $b->{'primary'}; + my($bS) = $b->{'ssortas'} || $b->{'secondary'}; + my($bT) = $b->{'tsortas'} || $b->{'tertiary'}; + + my($same); + + $aP =~ s/^\s*//; $aP =~ s/\s*$//; $aP = uc($aP); + $aS =~ s/^\s*//; $aS =~ s/\s*$//; $aS = uc($aS); + $aT =~ s/^\s*//; $aT =~ s/\s*$//; $aT = uc($aT); + $bP =~ s/^\s*//; $bP =~ s/\s*$//; $bP = uc($bP); + $bS =~ s/^\s*//; $bS =~ s/\s*$//; $bS = uc($bS); + $bT =~ s/^\s*//; $bT =~ s/\s*$//; $bT = uc($bT); + + # print "[$aP]=[$bP]\n"; + # print "[$aS]=[$bS]\n"; + # print "[$aT]=[$bT]\n"; + + # Two index terms are the same if: + # 1. the primary, secondary, and tertiary entries are the same + # (or have the same SORTAS) + # AND + # 2. They occur in the same titled section + # AND + # 3. They point to the same place + # + # Notes: Scope is used to suppress some entries, but can't be used + # for comparing duplicates. + # Interpretation of "the same place" depends on whether or + # not $linkpoints is true. + + $same = (($aP eq $bP) + && ($aS eq $bS) + && ($aT eq $bT) + && ($a->{'title'} eq $b->{'title'}) + && ($a->{'href'} eq $b->{'href'})); + + # If we're linking to points, they're only the same if they link + # to exactly the same spot. (surely this is redundant?) + $same = $same && ($a->{'hrefpoint'} eq $b->{'hrefpoint'}) + if $linkpoints; + + $same; + } + + sub tsame { + # Unlike same(), tsame only compares a single term + my($a) = shift; + my($b) = shift; + my($term) = shift; + my($sterm) = substr($term, 0, 1) . 
"sortas"; + my($A, $B); + + $A = $a->{$sterm} || $a->{$term}; + $B = $b->{$sterm} || $b->{$term}; + + $A =~ s/^\s*//; $A =~ s/\s*$//; $A = uc($A); + $B =~ s/^\s*//; $B =~ s/\s*$//; $B = uc($B); + + return $A eq $B; + } + + sub end_entry { + # End any open elements... + print OUT "\n \n" if $lastout; + print OUT "\n\n"; + $lastout = ""; + } + + sub print_term { + # Print out the links for an indexterm. There can be more than + # one if the term has a ZONE that points to more than one place. + # (do we do the right thing in that case?) + my($idx) = shift; + my($key, $indent, @hrefs); + my(%href) = (); + my(%phref) = (); + + $indent = " "; + + if ($idx->{'see'}) { + # it'd be nice to make this a link... + if ($lastout) { + print OUT "\n \n"; + $lastout = ""; + } + print OUT $indent, "", $idx->{'see'}, "\n"; + return; + } + + if ($idx->{'seealso'}) { + # it'd be nice to make this a link... + if ($lastout) { + print OUT "\n \n"; + $lastout = ""; + } + print OUT $indent, "", $idx->{'seealso'}, "\n"; + return; + } + + if (keys %{$idx->{'zone'}}) { + foreach $key (keys %{$idx->{'zone'}}) { + $href{$key} = $idx->{'zone'}->{$key}; + $phref{$key} = $idx->{'zone'}->{$key}; + } + } else { + $href{$idx->{'href'}} = $idx->{'title'}; + $phref{$idx->{'href'}} = $idx->{'hrefpoint'}; + } + + # We can't use because we don't know the ID of the term in the + # original source (and, in fact, it might not have one). + print OUT ",\n"; + @hrefs = keys %href; + while (@hrefs) { + my($linkend) = ""; + my($role) = ""; + $key = shift @hrefs; + if ($linkpoints) { + $linkend = $phref{$key}; + } else { + $linkend = $key; + } + + $role = $linkend; + $role = $1 if $role =~ /\#(.*)$/; + + print OUT $indent; + print OUT ""; + print OUT "" if ($idx->{'significance'} eq 'PREFERRED'); + print OUT $href{$key}; + print OUT "" if ($idx->{'significance'} eq 'PREFERRED'); + print OUT ""; + } + } + + sub termsort { + my($aP) = $a->{'psortas'} || $a->{'primary'}; + my($aS) = $a->{'ssortas'} || $a->{'secondary'}; + my($aT) = $a->{'tsortas'} || $a->{'tertiary'}; + my($ap) = $a->{'count'}; + + my($bP) = $b->{'psortas'} || $b->{'primary'}; + my($bS) = $b->{'ssortas'} || $b->{'secondary'}; + my($bT) = $b->{'tsortas'} || $b->{'tertiary'}; + my($bp) = $b->{'count'}; + + $aP =~ s/^\s*//; $aP =~ s/\s*$//; $aP = uc($aP); + $aS =~ s/^\s*//; $aS =~ s/\s*$//; $aS = uc($aS); + $aT =~ s/^\s*//; $aT =~ s/\s*$//; $aT = uc($aT); + $bP =~ s/^\s*//; $bP =~ s/\s*$//; $bP = uc($bP); + $bS =~ s/^\s*//; $bS =~ s/\s*$//; $bS = uc($bS); + $bT =~ s/^\s*//; $bT =~ s/\s*$//; $bT = uc($bT); + + if ($aP eq $bP) { + if ($aS eq $bS) { + if ($aT eq $bT) { + # make sure seealso's always sort to the bottom + return 1 if ($a->{'seealso'}); + return -1 if ($b->{'seealso'}); + # if everything else is the same, keep these elements + # in document order (so the index links are in the right + # order) + return $ap <=> $bp; + } else { + return $aT cmp $bT; + } + } else { + return $aS cmp $bS; + } + } else { + return $aP cmp $bP; + } + } + + sub sortsymbols { + my(@term) = @_; + my(@new) = (); + my(@sym) = (); + my($letter); + my($idx); + + # Move the non-letter things to the front. Should digits be thier + # own group? Maybe... 
+ foreach $idx (@term) { + $letter = $idx->{'psortas'}; + $letter = $idx->{'primary'} if !$letter; + $letter = uc(substr($letter, 0, 1)); + + if (($letter lt 'A') || ($letter gt 'Z')) { + push (@sym, $idx); + } else { + push (@new, $idx); + } + } + + return (@sym, @new); + } + + sub safe_open { + local(*OUT) = shift; + local(*F, $_); + + if (($outfile ne '-') && (!$forceoutput)) { + my($handedit) = 1; + if (open (OUT, $outfile)) { + while () { + if (//){ + $handedit = 0; + last; + } + } + close (OUT); + } else { + $handedit = 0; + } + + if ($handedit) { + print "\n$outfile appears to have been edited by hand; use -f or\n"; + print " change the output file.\n"; + exit 1; + } + } + + open (OUT, ">$outfile") || die "$usage\nCannot write to $outfile.\n"; + + if ($preamble) { + # Copy the preamble + if (open(F, $preamble)) { + while () { + print OUT $_; + } + close(F); + } else { + warn "$0: cannot open preamble $preamble.\n"; + } + } + } Index: docs/manual/makefile =================================================================== RCS file: makefile diff -N makefile *** /dev/null Fri Mar 23 21:37:44 2001 --- makefile Mon Dec 3 14:01:51 2001 *************** *** 0 **** --- 1,69 ---- + ### Oldham, Jeffrey D. + ### 1997 Dec 26 + ### misc + ### + ### LaTeX -> PostScript/PDF/WWW + ### XML -> TeX/DVI/PS/PDF + + # Definitions for PostScript and WWW Creation + TEX= latex + WWWHOMEDIR= /u/oldham/www + LATEX2HTML= latex2html + BASICLATEX2HTMLOPTIONS= -info "" -no_footnode -no_math -html_version 3.2,math + #LATEX2HTMLOPTIONS= -local_icons -split +1 $(BASICLATEX2HTMLOPTIONS) + LATEX2HTMLOPTIONS= -no_navigation -split 0 $(BASICLATEX2HTMLOPTIONS) + MPOST= mpost + + # Definitions for Jade. + JADEDIR= /usr/lib/sgml/stylesheets/docbook + PRINTDOCBOOKDSL= print/docbook.dsl + HTMLDOCBOOKDSL= html/docbook.dsl + XML= dtds/decls/xml.dcl + INDEXOPTIONS= -t 'Index' -i 'index' -g -p + + CXXFLAGS= -g -Wall -pedantic -W -Wstrict-prototypes -Wpointer-arith -Wbad-function-cast -Wcast-align -Wconversion -Wnested-externs -Wundef -Winline -static + + all: outline.ps + + %.all: %.ps %.pdf %.html + chmod 644 $*.ps $*.pdf + mv $*.ps $*.pdf $* + + %.dvi: %.ltx + $(TEX) $< + # bibtex $* + # $(TEX) $< + $(TEX) $< + + %.ps: %.dvi + dvips -t letter $< -o + + %.pdf.ltx: %.ltx + sed -e 's/^%\\usepackage{times}/\\usepackage{times}/' $< > $@ + + %.pdf: %.pdf.ps + ps2pdf $< $@ + + # This rule assumes index creation. 
+ %.dvi: %.xml genindex.sgm + jade -D$(JADEDIR) -t sgml -d $(HTMLDOCBOOKDSL) -V html-index $(XML) $< + perl collateindex.pl $(INDEXOPTIONS) -o genindex.sgm HTML.index + jade -D$(JADEDIR) -t tex -d $(PRINTDOCBOOKDSL) $(XML) $< && jadetex $*.tex && jadetex $*.tex && jadetex $*.tex + + genindex.sgm: + perl collateindex.pl $(INDEXOPTIONS) -N -o $@ + + %.html: %.xml + jade -D$(JADEDIR) -t sgml -d $(HTMLDOCBOOKDSL) $(XML) $< + + %.pdf: %.xml + jade -D$(JADEDIR) -t tex -d $(PRINTDOCBOOKDSL) $(XML) $< && pdfjadetex $*.tex && pdfjadetex $*.tex + + mproof-%.ps: %.mp + declare -x TEX=latex && $(MPOST) $< && tex mproof.tex $*.[0-9]* && dvips mproof.dvi -o $@ + + %.txt: %.ltx + detex $< > $@ + + clean: + rm -f *.dvi *.aux *.log *.toc *.bak *.blg *.bbl *.glo *.idx *.lof *.lot *.htm *.mpx mpxerr.tex HTML.index outline.tex Index: docs/manual/outline.xml =================================================================== RCS file: outline.xml diff -N outline.xml *** /dev/null Fri Mar 23 21:37:44 2001 --- outline.xml Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,4287 ---- + + + + + + + + + + + + + + + + + + + C"> + + + C++"> + + + Cheetah" > + + Doof2d" > + + Make"> + + MM"> + + MPI"> + + PDToolkit"> + + PETE"> + + POOMA"> + + POOMA Toolkit"> + + Purify"> + + Smarts"> + + + STL"> + + Tau"> + + + + + Array"> + + Benchmark"> + + Brick"> + + CompressibleBrick"> + + DistributedTag"> + + Domain"> + + double"> + + DynamicArray"> + + Engine"> + + Field"> + + Interval"> + + Layout"> + + LeafFunctor"> + + MultiPatch"> + + ReplicatedTag"> + + Stencil"> + + Vector"> + + + + + + + + + g++"> + + KCC"> + + Linux"> + + + + + http://pooma.codesourcery.com/pooma/download'> + + + http://www.pooma.com/'> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ]> + + + + &pooma; + A &cc; Toolkit for High-Performance Parallel Scientific Computing + JeffreyD.Oldham + + CodeSourcery, LLC + + + 2001CodeSourcery, LLC () + Los Alamos National Laboratory + + + All rights reserved. This document may not be redistributed in any form without the express permission of the author. + + + + 0.01 + 2001 Nov 26 + jdo + first draft + + + + + + + + + Preface + + FINISH: Describe the target audience for &pooma; programs and + for this manual: &cc; programmers writing scientific code, possibly + parallel execution. + + Assume familiarity with &cc; template programming and the + standard template library. FIXME: Remove this index + entry.Oldham, + Jeffrey D. + +

+ Notation + + UNFINISHED +
+ + +
+ How to Read This &Book; + + FINISH: Write this section in a style similar to Lamport's + LaTeX section 1.2. FINISH: Fix the book title and the section + number. +
+ + +
+ Obtaining &pooma; and Sample Programs + + Available for free from what WWW site? Include what portions + of LICENSE? Be sure to + include CVS instructions as well. + + Which additional packages are necessary and when? + +
+ + +
+ Using and Modifying &pooma; + + &pooma; is available under open source license. It can be + used and modified by anyone, anywhere. Can it be sold? Include + LICENSE. + + QUESTION: How do developers contribute code? + +
+ + + + + + Programming with &pooma; + + + Introduction + + QUESTION: Add a partintro to the part above? + + &pooma; abbreviates Parallel Object-Oriented Methods + and Application. + + This document is an introduction to &pooma; v2.1, a &cc; + toolkit for high-performance scientific computation. &pooma; + runs efficiently on single-processor desktop machines, + shared-memory multiprocessors, and parallel supercomputers + containing dozens or hundreds of processors. What's more, by making + extensive use of the advanced features of the ANSI/ISO &cc; + standard—particularly templates—&pooma; presents a + compact, easy-to-read interface to its users. + + From Section  of + papers/iscope98.pdf: + + Scientific software developers have struggled with the need + to express mathematical abstractions in an elegant and maintainable + way without sacrificing performance. The &pooma; (Parallel + Object-Oriented Methods and Applications) framework, written in + ANSI/ISO &cc;, has + demonstrated both high expressiveness and high performance for + large-scale scientific applications on platforms ranging from + workstations to massively parallel supercomputers. &pooma; provides + high-level abstractions for multidimensional arrays, physical + meshes, mathematical fields, and sets of particles. &pooma; also + exploits techniques such as expression templates to optimize serial + performance while encapsulating the details of parallel + communication and supporting block-based data compression. + Consequently, scientists can quickly assemble parallel simulation + codes by focusing directly on the physical abstractions relevant to + the system under study and not the technical difficulties of + parallel communication and machine-specific optimization. + + ADD: diagram of science and &pooma;. See the diagram that + Mark and I wrote. + + +
+ Evolution of &pooma; + + QUESTION: Is this interesting? Even if it is, it should be + short. + + The file papers/SCPaper-95.html + describes ?&pooma;1? and its abstraction layers. + + The "Introduction" of + papers/Siam0098.ps describes the DoE's + funding motivation for &pooma;: Accelerated Strategic Computing + Initiative (ASCI) and Science-based Stockpile Stewardship (SBSS), + pp. 1–2. + + See list of developers on p. 1 of + papers/pooma.ps. + + See list of developers on p. 1 of + papers/pooma.ps. See history and motivation + on p. 3 of papers/pooma.ps. + + Use README for + information. + +
+ introduction.html + + &pooma; was designed and implemented by scientists working + at the Los Alamos National Laboratory's Advanced Computing + Laboratory. Between them, these scientists have written and tuned + large applications on almost every commercial and experimental + supercomputer built in the last two decades. As the technology + used in those machines migrates down into departmental computing + servers and desktop multiprocessors, &pooma; is a vehicle for its + designers' experience to migrate as well. In particular, + &pooma;'s authors understand how to get good performance out of + modern architectures, with their many processors and multi-level + memory hierarchies, and how to handle the subtly complex problems + that arise in real-world applications. +
+ +
+ +
+ + + + A Tutorial Introduction + + UPDATE: In the following paragraph, fix the cross-reference + to the actual section. + + &pooma; provides different containers and processor + configurations and supports different implementation styles, as + described in . In this + chapter, we present several different implementations of the + &doof2d; two-dimensional diffusion simulation program: + + + a C-style implementation omitting any use of &pooma; + computing each array element individually, + + + a &pooma; &array; implementation computing each array + element individually, + + + a &pooma; &array; implementation using data-parallel + statements, + + + a &pooma; &array; implementation using stencils, which + support local computations, + + + a stencil-based &pooma; &array; implementation supporting + computation on multiple processors + + + a &pooma; &field; implementation using data-parallel + statements, and + + + a data-parallel &pooma; &field; implementation for + multi-processor execution. + + + + These illustrate the &array;, &field;, &engine;, layout, + mesh, and domain data types. They also illustrate various + immediate computation styles (element-wise accesses, data-parallel + expressions, and stencil computation) and various processor + configurations (one sequential processor and multiple + processors). + +
+ &doof2d; Averagings + + + + + + The Initial Configuration + + + + + + + + After the First Averaging + + + + + + + + After the Second Averaging + + +
+ 
+ The &doof2d; diffusion program starts with a two-dimensional
+ grid of values. To model an initial density, all grid values are
+ zero except for one nonzero value in the center. During each
+ averaging, each grid element, except the outermost ones, updates
+ its value by averaging its value and those of its eight neighbors.
+ To avoid overwriting grid values before all their uses occur, we
+ use two arrays, reading the first and writing the second and then
+ reversing their roles within each iteration.
+ 
+ Figure 
+ illustrates the averagings. Initially, only the center element has
+ a nonzero value. To form the first averaging, each element's new
+ value equals the average of its and its neighbors' previous values.
+ Thus, the initial nonzero value spreads to a three-by-three grid.
+ The averaging continues, spreading to a five-by-five grid of
+ nonzero values. Values in outermost grid cells are always
+ zero.
+ 
+ Before presenting various implementations of &doof2d;, we
+ explain how to install the &poomaToolkit;.
+ 
+ REMOVE: &doof2d; algorithm and code are illustrated in
+ Section 4.1 of
+ pooma-publications/pooma.ps. It includes a
+ figure illustrating parallel communication of data.
+ 
+ Installing &pooma; + + ADD: How does one install &pooma; using Windows or Mac? + + UPDATE: Make a more recent &pooma; source code file + available on &poomaDownloadPage;. For example, + LINUXgcc.conf is not available. + + In this section, we describe how to obtain, build, and + install the &poomaToolkit;. We focus on installing under the + Unix operating system. Instructions for installing on computers + running Microsoft Windows or MacOS, as well as more extensive + instructions for Unix, appear in . + + Obtain the &pooma; source code &poomaSourceFile; + from the &pooma; download page (&poomaDownloadPage;) available off + the &pooma; home page (&poomaHomePage;). The tgz + indicates this is a compressed tar archive file. To extract the + source files, use tar xzvf &poomaSourceFile;. + Move into the source code directory &poomaSource; directory; e.g., + cd &poomaSource;. + + Configuring the source code prepares the necessary paths for + compilation. First, determine a configuration file in + corresponding to your operating system and compiler in the + config/arch/ directory. + For example, LINUXgcc.conf supports compiling + under a &linux; operating system with &gcc; and SGI64KCC.conf supports compiling + under a 64-bit SGI Unix operating + system with &kcc;. Then, configure the source code: + ./configure --arch LINUXgcc --opt --suite + LINUXgcc-opt. The architecture argument to the + --arch option is the name of the corresponding + configuration file, omitting its .conf suffix. The + --opt indicates the &poomaToolkit; will + contain optimized source code, which makes the code run more + quickly but may impede debugging. Alternatively, the + --debug option supports debugging. The + suite name + can be any arbitrary string. We chose + LINUXgcc-opt to remind us of the architecture + and optimization choice. configure creates subdirectories + named by the suite name LINUXgcc-opt for use when + compiling the source files. Comments at the beginning of + lib/suiteName/PoomaConfiguration.h + record the configuration arguments. + + To compile the source code, set the + POOMASUITE environment variable to the suite name + and then type make. To set the environment + variable for the bash shell use + export + POOMASUITE=suiteName, + substituting the suite name's + suiteName. For the + csh shell, use setenv + POOMASUITE LINUXgcc-opt. Issuing the + make command compiles the &pooma; source code + files to create the &pooma; library. The &pooma; makefiles assume + the GNU &make; so substitute the proper + command if necessary. The &pooma; library can be found in, e.g., + lib/LINUXgcc-opt/libpooma-gcc.a. +
+ +
+ Hand-Coded Implementation
+ 
+ Before implementing &doof2d; using the &poomaToolkit;, we
+ present a hand-coded implementation of &doof2d;. See . After querying the
+ user for the number of averagings, the arrays' memory is
+ allocated. Since the arrays' size is not known at compile time,
+ the arrays are accessed via pointers to allocated dynamic memory.
+ This memory is deallocated at the program's end to avoid memory
+ leaks. The arrays are initialized with initial conditions. For
+ the b array, all values are zero except the
+ central one, which is nonzero. Only the outermost values of the
+ a array need be initialized to zero, but we
+ instead initialize them all using the loop used by
+ b.
+ 
+ The simulation's kernel consists of triply nested loops.
+ The outermost loop controls the number of iterations. The inner
+ nested loops iterate through the arrays' elements, excepting the
+ outermost elements; note the loop indices range from 1 to n-2
+ while the array indices range from 0 to n-1. Each
+ a value is assigned the average of its
+ corresponding value in b and the latter's
+ neighbors. Values in the two-dimensional grids are accessed using
+ two sets of brackets, e.g., a[i][j]. After
+ assigning values to a, a second averaging reads
+ values in a, writing values in
+ b.
+ 
+ After the kernel finishes, the final central value is
+ printed. If the desired number of averagings is even, the value
+ in b is printed; otherwise, the value in
+ a is used. Finally, the dynamically-allocated
+ memory must be freed to avoid memory leaks.
+ 
+ 
+ Hand-Coded Implementation of &doof2d;
+ &doof2d-c-element;
+ 
+ 
+ The user specifies the desired number of averagings.
+ 
+ 
+ These variables point to the two-dimensional,
+ dynamically-allocated grids so we use a pointer to a pointer to
+ a &double;.
+ 
+ 
+ The user enters the desired grid size. The grid will be
+ a square with n by n grid cells.
+ 
+ 
+ Memory for the arrays is allocated. By default, the
+ array indices are zero-based.
+ 
+ 
+ Initially, all grid values are zero except for the one
+ nonzero value at the center of the second array. Array
+ positions are indicated using two brackets, e.g.,
+ a[i][j]. A better implementation might
+ initialize only the outermost values of the
+ a array.
+ 
+ 
+ These constants indicate the number of iterations and
+ the average weighting.
+ 
+ 
+ Each a value, except an outermost one,
+ is assigned the average of its analogous b
+ value and that value's neighbors. Note the loop indices ensure
+ the outermost values are not changed. The
+ weight's value ensures the computation is an
+ average.
+ 
+ 
+ The second averaging computes b's
+ values using values stored in a.
+ 
+ 
+ After the averagings finish, the central value is printed.
+ 
+ 
+ The dynamically-allocated memory must be deallocated to
+ avoid memory leaks.
+ 
+ 
+ 
+ 
+ To compile the executable, change directories to the &pooma;
+ &poomaExampleDirectory;/Doof2d
+ directory. Ensure the POOMASUITE environment
+ variable specifies the desired suite name
+ suiteName, as we did when compiling
+ &pooma; in the previous section . Issuing the
+ make Doof2d-C-element command creates the
+ executable
+ suiteName/Doof2d-C-element.
+ 
+ When running the executable, specify the desired
+ nonnegative number of averagings and the nonnegative number of
+ grid cells along any dimension. The resulting grid has the same
+ number of cells along each dimension. After the executable
+ finishes, the resulting value of the central element is
+ printed. 
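
The &doof2d-c-element; listing itself is not reproduced in this draft. The
following minimal sketch suggests the shape of the kernel described above;
the names a, b, n, and weight follow the text, the numeric values are
illustrative, and, unlike the real listing, both averagings are performed
inside each loop iteration, so the result always ends up in b.

  #include <cstdio>

  int main()
  {
    // Illustrative values; the real program reads these from the user.
    const long nuAveragings = 10;    // number of averagings
    const long n = 101;              // grid extent in each dimension
    const double weight = 1.0/9.0;   // nine-point average weighting

    // Two n-by-n grids, accessed through pointers to pointers to double.
    double **a = new double*[n];
    double **b = new double*[n];
    for (long i = 0; i < n; ++i) {
      a[i] = new double[n];
      b[i] = new double[n];
    }

    // Initial conditions: everything zero except b's central element.
    for (long i = 0; i < n; ++i)
      for (long j = 0; j < n; ++j)
        a[i][j] = b[i][j] = 0.0;
    b[n/2][n/2] = 1000.0;

    // Kernel: loop indices run from 1 to n-2 so the outermost
    // elements are never written.
    for (long k = 0; k < nuAveragings; ++k) {
      for (long i = 1; i < n-1; ++i)
        for (long j = 1; j < n-1; ++j)
          a[i][j] = weight *
            (b[i-1][j-1] + b[i-1][j] + b[i-1][j+1] +
             b[i  ][j-1] + b[i  ][j] + b[i  ][j+1] +
             b[i+1][j-1] + b[i+1][j] + b[i+1][j+1]);
      // Second averaging: read a, write b.
      for (long i = 1; i < n-1; ++i)
        for (long j = 1; j < n-1; ++j)
          b[i][j] = weight *
            (a[i-1][j-1] + a[i-1][j] + a[i-1][j+1] +
             a[i  ][j-1] + a[i  ][j] + a[i  ][j+1] +
             a[i+1][j-1] + a[i+1][j] + a[i+1][j+1]);
    }

    std::printf("central value: %g\n", b[n/2][n/2]);

    // Deallocate to avoid memory leaks.
    for (long i = 0; i < n; ++i) { delete [] a[i]; delete [] b[i]; }
    delete [] a;
    delete [] b;
    return 0;
  }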
+ + +
+ Element-wise &array; Implementation + + The simplest way to use the &poomaToolkit; is to + use the &pooma; &array; class instead of &c; arrays. &array;s + automatically handle memory allocation and deallocation, support a + wider variety of assignments, and can be used in expressions. + + implements &doof2d; using &array;s and element-wise accesses. + Since the same algorithm is used as , we will concentrate + on the differences. + + + Element-wise &array; Implementation of &doof2d; + &doof2d-array-element; + + + To use &pooma; &array;s, the Pooma/Arrays.h must be included. + + + The &poomaToolkit; structures must be constructed before + their use. + + + Before creating an &array;, its domain must be specified. + The N interval represents the + one-dimensional integral set {0, 1, 2, …, n-1}. An + Interval<2> object represents the entire + two-dimensional index domain. + + + An &array;'s template parameters indicate its dimension, + its value type, and how the values will be stored or computed. + The &brick; &engine; type indicates values will be directly + stored. It is responsible for allocating and deallocating + storage so new and + delete statements are not necessary. + The vertDomain specifies the array index + domain. + + + The first statement initializes all &array; values to the + same scalar value. This is possible because each &array; + knows its domain. The second statement + illustrates &array; element access. Indices, separated by + commas, are surrounded by parentheses rather than surrounded by + square brackets ([]). + + + &array; element access uses parentheses, rather than + square brackets + + + &pooma; may reorder computation of statements. Calling + Pooma::blockAndEvaluate ensures all + computation finishes before accessing a particular array + element. + + + Since &array;s are first-class objects, they + automatically deallocate any memory they require, eliminating + memory leaks. + + + The &poomaToolkit; structures must be destructed after + their use. + + + + + We describe the use of &array; and the &poomaToolkit; in + . + &array;s, declared in the Pooma/Arrays.h, are first-class + objects. They know their index domain, can be used + in expressions, can be assigned scalar and array values, and + handle their own memory allocation and deallocation. + + The creation of the a and + b &array;s requires an object specifying their + index domains. Since these are two-dimensional arrays, their + index domains are also two dimensional. The two-dimensional + Interval<2> object is the Cartesian product of + two one-dimensional Interval<1> objects, each + specifying the integral set {0, 1, 2, …, n-1}. + + An &array;'s template parameters indicate its dimension, the + type of its values, and how the values are stored. Both + a and b are two-dimension + arrays storing &double;s so their dimension + is 2 and its element type is &double;. An &engine; stores an + &array;'s values. For example, a &brick; &engine; explicitly + stores all values. A &compressiblebrick; &engine; also explicitly + stores values if more than value is present, but, if all values + are the same, storage for just that value is required. Since an + engine can store its values any way it desires, it might instead + compute its values using a function or compute the values stored + in separate engines. In practice, most explicitly specified + &engine;s are either &brick; or &compressiblebrick;. + + &array;s support both element-wise access and scalar + assignment. 
Element-wise access uses parentheses, not square
+ brackets. For example, b(n/2,n/2)
+ specifies the central element. The scalar assignment b
+ = 0.0 assigns the same 0.0 value to all array
+ elements. This is possible because the array knows the extent of
+ its domain.
+ 
+ After the kernel finishes, the central value is printed out.
+ Just prior to this &array; access, a call to
+ Pooma::blockAndEvaluate() ensures all
+ computation has finished. &pooma; may reorder computations or
+ distribute them among various processors. Before reading an
+ individual &array; value, blockAndEvaluate
+ ensures that value is up to date. Calling this function is
+ necessary only when accessing individual array elements because
+ &pooma; cannot determine when to call the function itself. For
+ example, before printing an array, &pooma; will call
+ blockAndEvaluate itself.
+ 
+ Any program using the &poomaToolkit; must initialize the
+ toolkit's data structures using
+ Pooma::initialize(argc,argv). This
+ extracts &pooma;-specific command-line options from the
+ command-line arguments in argv and initializes
+ the inter-processor communication and other data structures. When
+ finished, Pooma::finalize() ensures all
+ computation has finished and the communication and other data
+ structures are destructed.
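
The &doof2d-array-element; listing is not reproduced in this draft. The
sketch below assembles only the constructs named in this section
(Pooma/Arrays.h, Interval, Array with a Brick engine, Pooma::initialize,
Pooma::blockAndEvaluate, Pooma::finalize). The sizes and initial value are
illustrative, the Interval constructor forms are assumptions based on the
domains described in the text, and a single averaging is shown.

  #include "Pooma/Arrays.h"
  #include <iostream>

  int main(int argc, char *argv[])
  {
    Pooma::initialize(argc, argv);   // set up the toolkit's structures

    const int n = 101;               // grid extent (read from the user in the real program)
    Interval<1> N(0, n-1);           // the index set {0, 1, ..., n-1}
    Interval<2> vertDomain(N, N);    // the two-dimensional index domain

    // Two-dimensional Arrays of doubles whose Brick engines store the
    // values directly; no new/delete statements are needed.
    Array<2, double, Brick> a(vertDomain);
    Array<2, double, Brick> b(vertDomain);

    a = 0.0;                         // scalar assignment to every element
    b = 0.0;
    b(n/2, n/2) = 1000.0;            // element access uses parentheses

    // One element-wise averaging, reading b and writing a.
    const double weight = 1.0/9.0;
    for (int i = 1; i < n-1; ++i)
      for (int j = 1; j < n-1; ++j)
        a(i,j) = weight *
          (b(i-1,j-1) + b(i-1,j) + b(i-1,j+1) +
           b(i  ,j-1) + b(i  ,j) + b(i  ,j+1) +
           b(i+1,j-1) + b(i+1,j) + b(i+1,j+1));

    // Finish any reordered computation before reading a single element.
    Pooma::blockAndEvaluate();
    std::cout << a(n/2, n/2) << std::endl;

    Pooma::finalize();               // tear down the toolkit's structures
    return 0;
  }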
+ + +
+ Data-Parallel &array; Implementation + + &pooma; supports data-parallel &array; accesses. Many + algorithms are more easily expressed using data-parallel + expressions. Also, the &poomaToolkit; might be able to reorder + the data-parallel computations to be more efficient or distribute + them among various processors. In this section, we concentrate + the differences between the data-parallel implementation of + &doof2d; listed in and the + element-wise implementation listed in the previous section . + + + Data-Parallel &array; Implementation of &doof2d; + &doof2d-array-parallel; + + + These variables specify one-dimensional domains {1, 2, + …, n-2}. Their Cartesian product specifies the domain + of the array values that are modified. + + + Data-parallel expressions replace nested loops and array + element accesses. For example, a(I,J) + represents the subset of the a array having + a domain equal to the Cartesian product of I + and J. Intervals can shifted by an additive + or multiplicative constant. + + + + + Data-parallel expressions apply domain objects to containers + to indicate a set of parallel expressions. For example, in the + program listed above, a(I,J) specifies all + of a array excepting the outermost elements. + The array's vertDomain domain consists of the + Cartesian product of {0, 1, 2, …, n-1} and itself, while + I and J each specify {1, 2, + …, n-2}. Thus, a(I,J) is the subset + with a domain of the Cartesian product of {1, 2, …, n-2} + and itself. It is called a view of an + array. It is itself an array, with a domain and supporting + element access, but its storage is the same as + a's. Changing a value in + a(I,J) also changes the same value in + a. Changing a value in the latter also changes + the former if the value is not one of a's + outermost elements. The expression + b(I+1,J+1) indicates the subset of + b with a domain consisting of the Cartesian + product of {2, 3, …, n-1}, i.e., the same domain as + a(I,J) but shifted up one unit and to the + right one unit. Only an &interval;'s value, not its name, is + important. Thus, all uses of J in this program + could be replaced by I without changing the + semantics. + +
+ Adding &array;s + + + + + + Adding two arrays with different domains. + + + When adding arrays, values in corresponding positions are + added even if they have different indices, indicated by the + small numbers adjacent to the arrays. + + +
+ + The statement assigning to a(I,J) + illustrates that &array;s may participate in expressions. Each + addend is a view of an array, which is itself an array. Each view + has the same domain size so their sum can be formed by + corresponding elements of each array. For example, the lower, + left element of the result equals the sum of the lower, left + elements of the addend arrays. For the computation, indices are + ignored; only the relative positions within each domain are used. + + illustrates adding two arrays with different domain indices. The + indices are indicated by the small numbers to the left and the + bottom of the arrays. Even though 9 and 3 have different indices + (1,1) and (2,0), they are added to each other because they have + the same relative positions within the addends. +
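
As a concrete illustration of the discussion above, here is a sketch of the
data-parallel kernel; it assumes the arrays a and b, the extent n, and the
constant weight from the previous sketches.

  // The set {1, 2, ..., n-2}: the indices whose values are recomputed.
  Interval<1> I(1, n-2);
  Interval<1> J(1, n-2);

  // One data-parallel statement replaces the doubly nested element loop.
  // Each addend such as b(I+1,J+1) is a view of b, shifted by one unit,
  // with the same domain size as a(I,J).
  a(I,J) = weight *
    (b(I-1,J-1) + b(I-1,J) + b(I-1,J+1) +
     b(I  ,J-1) + b(I  ,J) + b(I  ,J+1) +
     b(I+1,J-1) + b(I+1,J) + b(I+1,J+1));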
+ + +
+ Stencil &array; Implementation + + Many computations are local, computing a &array;'s value by + using close-by &array; values. Encapsulating this computation in + a stencil can yield faster code because the compiler can determine + all accesses come from the same array. Each stencil consists of a + function object and an indication of the stencil's extent. + + + Stencil &array; Implementation of &doof2d; + &doof2d-array-stencil; + + + A stencil is a function object implementing a local + operation on an &array;. + + + &pooma; applies this function call + operator() to the interior domain of an + &array;. Although not strictly necessary, the function's + template parameter C permits using this + stencil with &array;s and other containers. The + read &array; member function supports only + reading values, not writing values, thus possibly permitting + faster access. + + + These two functions indicate the stencil's size. For + each dimension, the stencil extends one cell to the left of (or + below) its center and also one call to the right (or above) its + center. + + + Create the stencil. + + + Applying stencil to the + b array and a subset + interiorDomain of its domain yields an + array, which is assigned to a subset of a. + The stencil's function object is applied to each position in + the specified subset of b. + + + + + Before we describe how to create a stencil, we describe how + to apply a stencil to an array, yielding values. To compute the + value associated with index position (1,3), the stencil's center + is placed at (1,3). The stencil's + upperExtent and + lowerExtent functions indicate which &array; + values the stencil's function will use. See . + Applying the stencil's function call + operator() yields the computed value. To + compute multiple &array; values, apply a stencil to the array and + a domain object: stencil(b, + interiorDomain). This applies the stencil to each + position in the domain. The user must ensure that applying the + stencil does not access nonexistent &array; values. + +
+ Applying a Stencil to an &array; + + + + + + Apply a stencil to position (1,3) of an array. + + + To compute the value associated with index position (1,3) + of an array, place the stencil's center, indicated with dashed + lines, at the position. The computation involves the array + values covered by the array and delineated by + upperExtent and + lowerExtent. + + +
+ 
+ To create a stencil object, apply the &stencil; type to a
+ function object class. For example,
+ Stencil<DoofNinePt> stencil declares
+ the stencil object. The function object class
+ must define a function call operator() with a
+ container parameter and index parameters. The number of index
+ parameters, indicating the stencil's center, must equal the
+ container's dimension. For example, DoofNinePt
+ defines operator()(const C& c, int i, int
+ j). We templated the container type
+ C although this is not strictly necessary. The
+ two index parameters i and j
+ ensure the stencil works with two-dimensional containers. The
+ lowerExtent function indicates how far to the left
+ (or below) the stencil extends beyond its center. Its parameter
+ indicates a particular dimension. Index parameters
+ i and j are in dimensions 0
+ and 1. upperExtent serves an
+ analogous purpose. The &poomaToolkit; uses these functions when
+ distributing computation among various processors, but it does not
+ use these functions to ensure nonexistent &array; values are not
+ accessed. Caveat stencil user!
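
A sketch of such a function object, assembled from the members named above
(operator(), lowerExtent, and upperExtent); it is not the actual
&doof2d-array-stencil; listing, and the fixed double return type is a
simplification of whatever the real listing uses.

  // A nine-point averaging stencil as a function object.
  class DoofNinePt
  {
  public:
    DoofNinePt() {}

    // POOMA applies this at each index (i,j) of the requested domain.
    // read() permits read-only, possibly faster, access to the container.
    template <class C>
    inline double operator()(const C& c, int i, int j) const
    {
      return (1.0/9.0) *
        (c.read(i-1,j-1) + c.read(i-1,j) + c.read(i-1,j+1) +
         c.read(i  ,j-1) + c.read(i  ,j) + c.read(i  ,j+1) +
         c.read(i+1,j-1) + c.read(i+1,j) + c.read(i+1,j+1));
    }

    // The stencil extends one cell below/left and one cell above/right
    // of its center in every dimension.
    inline int lowerExtent(int) const { return 1; }
    inline int upperExtent(int) const { return 1; }
  };

  // Creating and applying the stencil:
  Stencil<DoofNinePt> stencil;
  Interval<1> I(1, n-2);
  Interval<2> interiorDomain(I, I);
  a(interiorDomain) = stencil(b, interiorDomain);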
+ + +
+ Distributed &array; Implementation + + A &pooma; program can execute on one or multiple processors. + To convert a program designed for uniprocessor execution to a + program designed for multiprocessor execution, the programmer need + only specify how each container's domain should be split into + patches. The &poomaToolkit; automatically + distributes the data among the available processors and handles + any required communication between processors. + + + Distributed Stencil &array; Implementation of &doof2d; + &doof2d-array-distributed; + + + The number of processors executing a &pooma; program can + be specified at run-time. + + + The UniformGridPartition declaration + specifies how an array's domain will be partition, of split, + into patches. Guard layers are an optimization that can reduce + data communication between patches. The + UniformGridLayout declaration applies the + partition to the given domain, distributing the resulting + patches among various processors. + + + The MultiPatch &engine; distributes requests + for &array; values to the associated patch. Since a patch may + associated with a different processor, its + remote engine has type + Remote<Brick>. &pooma; automatically + distributes the patches among available memories and + processors. + + + The stencil computation, whether for one processor or + multiple processors, is the same. + + + + + Supporting distributed computation requires only minor code + changes. These changes specify how each container's domain is + distributed among the available processors. The rest of the + program, including all the computations, remains the same. When + running, the &pooma; executable interacts with the run-time + library to determine which processors are available, distributes + the containers' domains, and automatically handles all necessary + interprocessor communication. The same executable runs on one or + many processors. Thus, the programmer can write one program, + debugging it on a uniprocessor computer and running it on a + supercomputer. + +
+ The &pooma; Distributed Computation Model + + + + + + the &pooma; distributed computation model. + + + The &pooma; distributed computation model combines + partitioning containers' domains and the computer configuration + to create a layout. + + +
+ + &pooma;'s distributed computing model separates container + domain concepts from computer configuration concepts. See . + The program indicates how each container's domain will be + partitioned. This process is represented in the upper left corner + of the figure. A user-specified partition specifies how to split + the domain into pieces. For example, the illustrated partition + splits the domain into three equal-sized pieces along the + x-dimension and two equal-sized pieces along the y-dimension. + Thus, the domain is split into patches. + The partition also specifies external and internal guard layers. + A guard layer is a domain surrounding a + patch. A patch's computation only reads but does not write these + values. An external guard layer + conceptually surrounds the entire container domain with boundary + values whose presence permits all domain computations to be + performed the same way even for values along the domain's edge. + An internal guard layer duplicates values + from adjacent patches so communication need not occur during a + patch's computation. The use of guard layers is an optimization; + using external guard layers eases programming and using internal + guard layers reduces communication between processor. Their use + is not required. + + The computer configuration of shared memory and processors + is determined by the run-time system. See the upper right portion + of . + A context is a collection of shared memory + and processors that can execute a program or a portion of a + program. For example, a two-processor desktop computer might have + memory accessible to both processors so it is a context. A + supercomputer consisting of desktop computers networked together + might have as many contexts as computers. The run-time system, + e.g., the Message Passing Interface (&mpi;) Communications Library + (FIXME: xref linkend="mpi99", ) or the &mm; + Shared Memory Library (), communicates + the available contexts to the executable. &pooma; must be + configured for the particular run-time system. See . + + A layout combines patches with + contexts so the program can be executed. If &distributedtag; is + specified, the patches are distributed among the available + contexts. If &replicatedtag; is specified, each set of patches is + replicated among each context. Regardless, the containers' + domains are now distributed among the contexts so the program can + run. When a patch needs data from another patch, the &pooma; + toolkit sends messages to the desired patch uses a message-passing + library. All such communication is automatically performed by the + toolkit with no need for programmer or user input. + + FIXME: The two previous paragraphs demonstrate confusion + between run-time system and message-passing + library. + + Incorporating &pooma;'s distributed computation model into a + program requires writing very few lines of code. illustrates + this. The partition declaration creates a + UniformGridPartition splitting each dimension of a + container's domain into equally-sized + nuProcessors pieces. The first + GuardLayers argument specifies each patch will have + copy of adjacent patches' outermost values. This may speed + computation because a patch need not synchronize its computation + with other patches' processors. Since each value's computation + requires knowing its surrounding neighbors, the internal guard + layer is one layer deep. The second GuardLayers + argument specifies no external guard layer. 
External guard layers + simplify computing values along the edges of domains. Since the + program already uses only the interior domain for computation, we + do not use this feature. + + The layout declaration creates a + UniformGridLayout layout. As illustrates, + it needs to know a container's domain, a partition, the computer's + contexts, and a &distributedtag; or &replicatedtag;. These + comprise layout's three parameters; the + contexts are implicitly supplied by the run-time system. + + To create a distributed &array;, it should be created using + a &layout; object and have a &multipatch; engine. Prior + implementations designed for uniprocessors constructed the + container using a &domain; object. A distributed implementation + uses a &layout; object, which conceptually specifies a &domain; + object and its distribution throughout the computer. A + &multipatch; engine supports computations using multiple patches. + The UniformTag indicates the patches all have the + same size. Since patches may reside on different contexts, the + second template parameter is Remote. Its + Brick template parameter specifies the engine for a + particular patch on a particular context. Most distributed + programs use MultiPatch<UniformTag, Remote<Brick> + > or MultiPatch<UniformTag, + Remote<CompressibleBrick> > engines. + + The computations for a distributed implementation are + exactly the same as for a sequential implementation. The &pooma; + Toolkit and a message-passing library automatically perform all + computation. + + The command to run the programs is dependent on the run-time + system. To use &mpi; with the Irix 6.5 operating system, one + can use the mpirun command. For example, + mpirun -np 9 Doof2d-Array-distributed -mpi + --num-patches 3 invokes the &mpi; run-time system with + nine processors. The -mpi argument tells + the &pooma; executable Doof2d-Array-distributed + to use the &mpi; Library. + + HERE + + The command Doof2d-Array-distributed -shmem -np 2 + --num-patches 2 + + To run Doof2d-Array-distributed with the &mm; + Shared Memory Library, use + + HERE + + + + COMMENT: See background.html for a partial + explanation. A context is a distinct + region of memory in some computer. Execution thread is associated + with each context. One or more different processors can be + associated with the same context. + + QUESTION: How do &pooma; parallel concepts compare with + Fortran D or high-performance Fortran FINISH CITE: + {koelbel94:_high_perfor_fortr_handb}? + + QUESTION: What does Cheetah do for us? Must configure with + --messaging and Cheetah library must be available. When running + Doof2d benchmark, use --num-patches N. On LinuxKCC, use + '--num-patches p --run-impls 14 --sim-params N 0 1'. Runtime + system must also provide some support. How do I write about this? + What is an example? How does one install Cheetah? + + +
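
Gathering the declarations named above into one place, a distributed
version of the arrays might be set up as follows. The sketch assumes the
domain vertDomain from the earlier sketches and a run-time patch count
nuPatches; the exact constructor arguments, in particular the Loc<2>
patch-count argument, are assumptions since the full listing is not
reproduced here.

  // Split each dimension into nuPatches equal pieces, with a one-cell
  // internal guard layer and no external guard layer.
  UniformGridPartition<2> partition(Loc<2>(nuPatches, nuPatches),
                                    GuardLayers<2>(1),   // internal
                                    GuardLayers<2>(0));  // external

  // Apply the partition to the domain and distribute the resulting
  // patches among the available contexts.
  UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());

  // Distributed Arrays: patches may live on other contexts, so each
  // patch engine is Remote<Brick>.
  Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
  Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);

  // The stencil computation itself is unchanged:
  a(interiorDomain) = stencil(b, interiorDomain);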
+ + +
+ Relations + + UNFINISHED + +
+ +
+ + + + Overview of &pooma; Concepts + + Describe the software application layers similar to + papers/SCPaper-95.html and "Short Tour of + &pooma;" in papers/SiamOO98_paper.ps. + Section 2.2, "Why a Framework?," of + papers/pooma.ps argues why a layered approach + eases use. Section 3.1, "Framework Layer Description," + describes the five layers. + + FINISH: Write short glossary entries for each of these. + + FINISH: Look through the source code to ensure all main + concepts are listed. + + Here are (preliminary) &pooma; equations: + + + &pooma; <quote>Equations</quote> + + + + + field = data + materials + centering + layout + mesh + + + map from space to values + + + array = data + layout + + + map from indices to values + + + mesh = layout + origin + spacings + + + distribute domain through physical space + + + layout = domain + partition + layout_tag (distributed/replicated) + + + distribute domain's blocks among processors/contexts + + + partition = blocks + guard layers + + + split domain into blocks + + + domain = newDomain + + + space of permissible indices + + + +
+ + + FINISH: Following is a first try at describing the &pooma; + abstraction layers. See also paper illustration. + + + &pooma; Abstraction Layers + + + + + application program + + + &array; &field; (should have + FieldEngine under it) + + + &engine; + + + evaluators + + + +
+ + FINISH: How does parallel execution fit in? + + FINISH: Should we also name and describe each layer? + +
+ Domains + +
+ Section 4 "Future Improvements in + &pooma; II" of + papers/SiamOO98_paper.ps + + A &domain; is a set of discrete points in some space.… + &domain;s provide all of the expected domain calculus + capabilities such as subsetting and intersection. + +
+ + Section 3, "Domains and Views," of + papers/iscope98.pdf describes five types of + domains +
+ + +
+ Layouts + + UNFINISHED + + Also describe partitions and guard cells within here. + +
+ + +
+ Meshes + + UNFINISHED +
+ + +
+ Data-Parallel Statements + + Can we use "An Overview of &pete;" from + papers/PETE_DDJ/ddj_article.html or is this + too low-level? + + Section 3.2.1 of papers/pooma.ps + gives a simple example of data-parallel expression. It also has a + paragraph introducing data-parallel operations and selecting + subsets of domains. Section 3.4 describes the Chained + Expression Object (CEO), apparently a precursor + of &pete;. Regardless, it provides some motivation and + introductory material. + + From Section 4 of + papers/SiamOO98_paper.ps: + + This version of &pete; reduces compile time of user codes + and utilizes compile-time knowledge of expression &domain;s for + better optimization. For example, more efficient loops for + evaluating an expression can be generated if &pete; knows that the + &domain; has unit stride in memory. + + Section 4, "Expressions and Evaluators", of + papers/iscope98.pdf has a good explanation of + &pooma; II's expression trees and expression engines. + + COMMENT: background.html has some related + &pete; material. +
+ +
+ Containers + +
+ &array; + +
+ Section 4 "Future Improvements in
+ &pooma; II" of
+ papers/SiamOO98_paper.ps
+ 
+ An &array; can be thought of as a map from one &domain; to
+ another.… &array;s depend only on the interface of
+ &domain;s. Thus, a subset or view of an &array; can be
+ manipulated in all the same ways as the original &array;.
+ &array;s can perform indirect addressing because the output
+ &domain; of one &array; can be used as the input &domain; of
+ another &array;. &array;s also provide individual element
+ access.
+ + + + (unformatted) From + papers/GenericProgramming_CSE/dubois.html: + The &pooma; &array; concept provides an example of how these + generic-programming features can lead to flexible and efficient + code. An Array maps a fairly arbitrary input domain to an + arbitrary range of outputs. When used by itself, an &array; + object A refers to all of the values in its + domain. Element-wise mathematical operations or functions can be + applied to an array using straightforward notation, like A + B + or sin(A). Expressions involving Array objects are themselves + Arrays. The operation A(d), where d is a domain object that + describes a subset of A's domain, creates a view of A that + refers to that subset of points. Like an array expression, a + view is also an Array. If d represents a single point in the + domain, this indexing operation returns a single value from the + range. Equivalently, one can index an N-dimensional Array by + specifying N indices, which collectively specify a single point + in the input domain: A(i1, i2, ..., iN). + + The &pooma; multi-dimensional Array concept is similar to + the Fortran 90 array facility, but extends it in several + ways. Both &pooma; and Fortran arrays can have up to seven + dimensions, and can serve as containers for arbitrary + types. Both support the notion of views of a portion of the + array, known as array sections in F90. The &pooma; Array concept + supports more complex domains, including bounded, continuous + (floating-point) domains. Furthermore, Array indexing in &pooma; + is polymorphic; that is, the indexing operation X(i1,i2) can + perform the mapping from domain to range in a variety of ways, + depending on the particular type of the Array being + indexed. + + Fortran arrays are dense and the elements are arranged + according to column-major conventions. Therefore, X(i1,i2) + refers to element number i1-1+(i2-1)*numberRowsInA. However, as + Fig. 1 shows, Fortran-style "Brick" storage is not the only + storage format of interest to scientific programmers. For + compatibility with C conventions, one might want to use an array + featuring dense, row-major storage (a C-style Brick). To save + memory, it might be advantageous to use an array that only + stores a single value if all its element values are the + same. Other sparse storage schemes that only store certain + values may also be desirable. To exploit parallelism, it is + convenient for an array's storage to be broken up into patches, + which can be processed independently by different CPUs. Finally, + one can imagine an array with no data at all. For example, the + values can be computed from an expression involving other + arrays, or analytically from the indices. + + + The &pooma; &array; Class Template + + Next we describe &pooma;'s model of the Array concept, the + Array class template. The three most important requirements from + the point of view of overall design are: (1) arbitrary domain, + (2) arbitrary range, and (3) polymorphic indexing. These express + themselves in the template parameters for the &pooma; Array + class. The template + + template <int Dim, class T = double, class EngineTag = Brick> + class Array; + + is a specification for creating a set of classes all named + Array. The template parameters Dim, T, and EngineTag determine + the precise type of the Array. Dim represents the dimension of + the array's domain. T gives the type of array elements, thereby + defining the output range of the array. 
EngineTag specifies
+ the manner of indexing and types of the indices.
+ 
+ End From
+ papers/GenericProgramming_CSE/dubois.html:
+ 
+ Section 2, "Arrays and Engines," of
+ papers/iscope98.pdf describes both &array;s
+ and &engine;s. This may or may not duplicate the material in
+ papers/GenericProgramming_CSE/dubois.html.
+ 
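
A brief sketch of the points in this excerpt, using the Array template
quoted above. The domain construction follows the tutorial chapter, and the
availability of element-wise mathematical functions such as sin for Arrays
is taken from the excerpt rather than verified here.

  const int n = 101;
  Interval<1> N(0, n-1);
  Interval<2> vertDomain(N, N);

  Array<2>                            a(vertDomain);  // T=double, EngineTag=Brick by default
  Array<2, double, CompressibleBrick> c(vertDomain);  // one stored value while all elements are equal

  a = 0.0;                  // an Array refers to all values in its domain
  double x = a(3, 4);       // indexing with Dim scalar indices yields one value

  Interval<1> I(1, n-2);
  // Subsetting with a domain object yields a view, itself an Array
  // sharing a's storage.
  c(I, I) = a(I, I) + sin(a(I, I));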
+ Views of &array;s + + Section 3, "Domains and Views," of + papers/iscope98.pdf motivates the need for + views: +
+ One of the primary uses of domains is to specify + subsections of &array; objects. Subarrays are a common + feature of array classes; however, it is often difficult to + make such subarrays behave like first-class objects. The + &pooma; II engine concept provides a clean solution to + this problem: subsetting an &array; with a domain object + creates a new &array; that has a view engine. +
+
+
+
+ +
+ &field; + + QUESTION: Do we include boundary conditions here? + + FINISH: Do we have an example that shows something not possible + with &array;? + + Describe and illustrate multi-material and + multivalue? + + ADD: description of meshes and guard layers. + +
+ + +
+ <type>TinyMatrix</type> + + Section 3.2.2 of + papers/pooma.ps describes &vector;s and + matrix classes. +
+
+ +
+ Engines + + (unformatted) From + papers/GenericProgramming_CSE/dubois.html: + + The Engine Concept + + To implement polymorphic indexing, the Array class defers + data storage and data lookup to an engine object. The requirements + that the Array template places on its engine provide the + definition for the Engine concept. We'll describe these by + examining a simplified version of the Array template, shown in + Fig. 2. + + First, the Array class determines and exports (makes + Engine_t part of Array's public interface) the type of the engine + class that it will use: + + typedef Engine<Dim, T, EngineTag> Engine_t; + + This statement declares Engine_t to be an alias for the type + Engine<Dim,T,EngineTag>. This is the first requirement + placed on engine classes: they must be specializations of a + general Engine template whose template parameters are identical to + those of Array. Next, the Array template determines the type of + scalar arguments (indices) to be used in operator(), the function + that implements &pooma;'s Fortran-style indexing syntax X(i1,i2): + + typedef typename Engine_t::Index_t Index_t; + + This statement defines another type alias: + Array<Dim,T,EngineTag>::Index_t is simply an alias for + Engine_t::Index_t. Engine_t::Index_t is a qualified name, which + means that the type Index_t is found in the class Engine_t. This + is the second requirement for the Engine concept: the class + Engine_t must define a public type called Index_t. This line will + not compile if that definition is not supplied. This indirection + is one of the ways that &pooma; supports polymorphic indexing. If + the Engine works with a discrete integer domain, it defines its + Index_t to be an integral type. If the Engine works in a + continuous domain, it defines its Index_t to be a floating-point + type. + + The data lookup is performed in the operator() function. We + see that Array simply passes the indices on to its engine + object. Thus, we have the third requirement for the Engine + concept: it must provide a version of operator() that takes Dim + values of type Index_t. + + Simply passing the indices on to the engine object may seem + odd. After all, engine(i,j) looks like we're just indexing another + array. There are several advantages to this extra level of + indirection. The Array class is as faithful a model of the Array + concept as possible, while the Engine class is a low-level + interface to a user-defined data source. As a result, Array has a + wide variety of constructors for user convenience, while engines + have but a few. Array supports a wide variety of overloaded + operator() functions for view creation and indexing. Engines + support indexing only. Array does not have direct access to the + data, which is managed by the engine object. Finally, Array has a + wide variety of overloaded mathematical operators and functions, + and works with the Portable Expression Template Engine (PETE) [4] + to provide efficient evaluation of Array expressions. Engines have + no such support. In general, Array is much more complex and + feature-laden than Engine. This is the prime advantage of the + separation of interface and implementation: Array only has to be + implemented once by the &pooma; developers. Engines are simple + enough to be written by users and plugged directly into the Array + framework. + + Figure 3 illustrates the "Brick" specialization of the + Engine template, which implements Fortran-style lookup into a + block of memory. 
First, there is the general Engine template, + which is empty as there is no default behavior for an unknown + EngineTag. The general template is therefore not a model for the + Engine concept and Array classes attempting to use it will not + compile. Next, there is the definition of the Brick class, a + policy tag whose sole purpose is to select a particular + specialization of the Engine template. Finally, there is the + partial specialization of the Engine template. Examining its body, + we see the required Index_t typedef and the required operator(), + which follows the Fortran prescription for generating an offset + into the data block based on the row, column, and the number of + rows. All of the requirements are met, so the Brick-Engine class + is a model of the Engine concept. + + End From + papers/GenericProgramming_CSE/dubois.html: + + (unformatted) From + papers/GenericProgramming_CSE/dubois.html: + + Compile-time Versus Run-Time Polymorphism + + Encapsulating the indexing in an Engine class has important + advantages, both in terms of flexibility and efficiency. To + illustrate this point, we introduce the PolarGaussian-Engine + specialization in Fig. 4. This is an analytic engine that + calculates its values directly from its inputs. Unlike the + Brick-Engine, this engine is "indexed" with data of the same type + as its output: it maps a set of T's to a single T. Therefore, the + Index_t typedef selects T as the index type, as opposed to the int + in the Brick-Engine specialization. The operator() function also + differs in that it computes the return value according to an + analytic formula. + + Both Engine<Dim,T,Brick> and + Engine<Dim,T,PolarGaussian> can be plugged in to an Array by + simply varying the Array's EngineTag. This is possible despite the + fact that the two classes exhibit dramatically different behavior + because they are both models of the Engine concept. + + Notice that we have achieved polymorphic indexing without + the use of inheritance or virtual functions. For instance, + consider the following code snippet: + + Array<2, double, Brick> a; + Array<2, double, PolarGaussian> b; + + double x = a(2, 3); // x = a.engine.data[2 + 3 * a.engine.numRows]; + double y = b(2.0, 3.0); // y = exp(-(2.0*2.0+3.0*3.0) / b.engine.delta); + + The data lookup functions for the two Arrays perform completely + different operations. Since this is accomplished using static + types, it is known as compile-time polymorphism. Moreover, + everything is known at compile time, so the functions are fully + inlined and optimized, thereby yielding code equivalent to that + shown in the comments above. + + The flexibility and efficiency of compile-time polymorphism + cannot be duplicated with a run-time implementation. To illustrate + this point, in Fig. 5, we re-implement our Array concept using the + classic Envelope-Letter pattern [5], with the array class, + RTArray, being the envelope and the run-time-engine, RTEngine, + being the letter. RTArray defers data lookup to the engine object + by invoking the engine's functions through a pointer to the + RTEngine base class. Figure 6 illustrates the RTEngine base class + and Fig. 7 illustrates two descendants: RTBrick and + RTPolarGaussian. + + The run-time implementation provides the same basic + functionality as the compile-time implementation, but it is not as + flexible or as efficient. It lacks flexibility in that the return + type of the indexing operation must be specified in the RTEngine + base class and in the RTArray class. 
Thus, in Figs. 5 and 6,we see + versions of RTArray::operator() and RTEngine::index functions that + take both int's and T's. If the programmer wants to add another + index-type option, these classes must be modified. This is a + violation of the open-closed principle proposed by Meyer + [6]. Also, since RTEngine descendants will usually only implement + one version of index, we cannot make RTEngine an abstract base + class. Instead, we have the default versions of index throw an + exception. Thus, compile-time error checking is + weakened. Furthermore, since indexing is done via a virtual + function call, it will almost never be inlined, which is not + acceptable in most scientific applications. + + There are advantages to the Envelope-Letter approach. First, + all RTArray objects have the same type, allowing them to be stored + in homogeneous collections. This can simplify the design of some + applications. Second, RTArray objects can change their engines at + runtime, and thus effectively change their types on the fly??this + is the primary reason for using the Envelope-Letter idiom, and can + be very important in some applications. + + For most scientific applications, however, these issues are + minor, and maximum performance for array indexing is of paramount + importance. Our compile-time approach achieves this performance + while providing the desired polymorphic indexing. + + From Section 4 of + papers/SiamOO98_paper.ps: + + The &array; class is templated on an &engine; type that + handles the actual implementation of the mapping from input to + output. Thus, the &array; interface features are completely + separate from the implementation, which could be a single &c; + array, a function of some kind or some other mechanism. This + flexibility allows an expression itself to be viewed through the + &array; interface. Thus, one can write something like + + foo(A*B+C); + where A, B and + C are &array;s and foo is + a function taking an &array; as an argument. The expression + A*B+C + will only be evaluated by the expression engine as needed by + foo. + + In fact, one can even write &engine;s which are wrappers + around external data structures created in non-&pooma; codes and + know to manipulate these structures. Once this is done, the + external entities have access to the entire &array; interface and + can utilize all of the powerful features of + &pooma; II. + + Section 2, "Arrays and Engines," of + papers/iscope98.pdf describes both &array;s + and &engine;s. This may or may not duplicate the material in + papers/GenericProgramming_CSE/dubois.html. + + Section 4, "Expressions and Evaluators", of + papers/iscope98.pdf has a good explanation of + &pooma; II's expression trees and expression engines. + + + MultiPatch Engine + From README: To actually use multiple + contexts effectively, you need to use the MultiPatch engine with + patch engines that are Remote engines. Then the data will be + distributed across multiple contexts instead of being copied on + every context. See the files in example/Doof2d for a simple + example that creates a MultiPatch array that can be distributed + across multiple contexts and performs a stencil computation on + that array. + + +
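
The figures referred to in this excerpt (Figs. 2-7) are not included in
this draft. The following self-contained paraphrase of the Fig. 3 Brick
engine follows the description in the text; it is not POOMA source code,
and the tag name DenseColumnMajor is invented here to avoid clashing with
POOMA's real Brick tag.

  // General template: empty, since there is no default behavior for an
  // unknown EngineTag; Arrays attempting to use it will not compile.
  template <int Dim, class T, class EngineTag>
  class Engine;

  // A policy tag whose sole purpose is to select a specialization.
  struct DenseColumnMajor {};

  // Partial specialization modeling the Engine concept in two dimensions.
  template <class T>
  class Engine<2, T, DenseColumnMajor>
  {
  public:
    typedef int Index_t;          // required: the type of the indices

    Engine(int rows, int cols)
      : numRows(rows), data(new T[rows * cols]) {}
    ~Engine() { delete [] data; }

    // Required: operator() taking Dim values of type Index_t.
    // Fortran-style offset: row + column * number of rows.
    T& operator()(Index_t i, Index_t j) const
    { return data[i + j * numRows]; }

  private:
    int numRows;
    T  *data;                     // copy control omitted for brevity
  };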
+ + +
+ Relations + + UNFINISHED +
+ + +
+ Stencils + + Section 3.5.4, "Stencil Objects," of + papers/pooma.ps provides a few uses of + stencils. + + Section 5, "Performance," of + papers/iscope98.pdf motivates and explains + stencils. +
+ + +
+ Contexts + +
+ background.html + In order to be able to cope with the variations in machine + architecture noted above, &pooma;'s distributed execution model + is defined in terms of one or more contexts, each of which may + host one or more threads. A context is a distinct region of + memory in some computer. The threads associated with the context + can access data in that memory region and can run on the + processors associated with that context. Threads running in + different contexts cannot access memory in other contexts. + + A single context may include several physical processors, + or just one. Conversely, different contexts do not have to be on + separate computers—for example, a 32-node SMP computer could + have up to 32 separate contexts. This release of &pooma; only + supports a single context for each application, but can use + multiple threads in the context on supported platforms. Support + for multiple contexts will be added in an upcoming + release. +
+
+ + +
+ Utility Types: ???TITLE?? + +
+ &vector; + + Section 3.2.2 of + papers/pooma.ps describes &vector;s and + matrix classes. +
+ +
+
+ + + + Writing Sequential Programs + + UNFINISHED + +
+ &benchmark; Programs + + Define a &benchmark; program vs. an example or an + executable. Provide a short overview of how to run these + programs. Provide an overview of how to write these programs. + See src/Utilities/Benchmark.h. +
+ + +
+ Using <type>Inform</type>s for Output + + UNFINISHED +
+ +
+ + + + Writing Distributed Programs + + Discuss the distributed model and guard cells. See docs/parallelism.html. + + Does any of the parallel implementation described in + papers/SCPaper-95.html still apply? + + ?Tuning program for maximize parallel performance? + + external references to &mpi; and threads + + QUESTION: Are there interesting, short parallel programs in + any &mpi; book that we can convert to &pooma;? + +
+ Layouts + + An out-of-date description can be found in Section 3.3, + especially 3.3.2, of papers/pooma.ps + describes the global/local interactions and parallel abstraction + layers. +
+ +
+ Parallel Communication + + An out-of-date description can be found in + Section 3.3.3 of papers/pooma.ps +
+ +
+ Using Threads + + QUESTION: Where do threads fit into the manual? Do threads + even work? + + From Section 4, of + papers/SiamOO98_paper.ps + + &pooma; II will make use of a new parallel run-time + system called &smarts; that is under development at the ACL. + &smarts; supports lightweight threads, so the evaluator will be + able to farm out data communication tasks and the evaluation of + subsets of an expression to multiple threads, thus increasing the + overlap of communication and computation. Threads will also be + available at the user level for situations in which a + task-parallel approach is deemed appropriate. +
+ +
+ + + + Under the Hood: How &pooma; Works + + from point of view of &cc; interpreter + +
+ &pete; + + Use the material in + papers/PETE_DDJ/ddj_article.html, which gives + example code and descriptions of how the code works. + + See material in background.html's Expression + Templates. +
+ +
+ + + + Debugging and Profiling &pooma; Programs + + UNFINISHED + + + + + + Example Program: Jacobi Solver + + QUESTION: Is this chapter necessary? Do we have enough + existing source code to write this chapter? + + +
+ + + &pooma; Reference Manual + + + TMP: This Chapter Holds These Comments But Will Be Removed + + For each template parameter need to describe the constraints + on it. + + Remove this section when the following concerns have been + addressed. + + Add a partintro explaining file suffixes such as .h, .cpp, .cmpl.cpp, .mk, .conf. Should we also explain use + of inline even when necessary and the template + model, e.g., including .cpp files. + + QUESTION: What are the key concepts around which to organize + the manual? + + QUESTION: What format should the manual use? + +
+ Musser, Derge, and Saini, §20.0.
+ It is important to state the requirements on the components
+ as generally as possible. For example, instead of saying
+ class X must define a member function
+ operator++(), we say for any
+ object x of type X,
+ ++x is defined.
+
+ + + + A Typical &pooma; Class + + + Class Member Notation + + + *_t + + + + type within a class. QUESTION: What is the &cc; name for + this? + + + + + *_m + + + + data member + + + + + + &pooma; Class Vocabulary + + component + + one of several values packaged together. For example, a + three-dimensional vector has three components, i.e., three + values. + + + + element-wise + + applied to each element in the group, e.g., an array + + + + reduction + + repeated application of a binary operator to all elements, + yielding one value + + + + tag + + an enumerated value indicating inclusion in a particular + semantic class. The set of values need not be explicitly + declared. + + + + + + + + + Installing and Configuring &pooma; + + + + Installing &pooma;. + + + Requirements for configuration files. + + + + Include descriptions of using &smarts;, &cheetah;, τ, + &pdt;. + + QUESTION: Does it install on windows and on mac? If so, what + are the instructions? See also INSTALL.{mac,unix,windows}. + + README has some + information on &cheetah; and threads in the Message-Based + Parallelism section. + + Which additional packages are necessary and when? + + What configure options should we list? See configure. Be sure to list + debugging option and how its output relates to config/LINUXgcc.suite.mk. + + config/arch has files + for (OS, compiler) pairs. Explain how to modify a configuration + file. List requirements when making a new configuration file (low + priority). + + config/LINUXgcc.suite.mk has output + from configure. Useful to + relate to configuration files and configure's debugging output. + + + + + + Compilation and &make; Files + + We assume Gnu make. Do we know what assumptions are made? + + How do all these files interact with each other? Ala a make + interpreter, give an example of which files are read and + when. + + + config/Shared/README.make + This has short descriptions of many files, + especially in config/Shared. + + makefile + These appear throughout all directories. What are + the equivalences classes and what are their + parts? + + include.mk + What does this do? Occurs in many directories: + when? Template seems to be config/Shared/include2.mk. + + subdir.mk + list of subdirectories; occurs in several + directories: when? src/subdir.mk is a good + example. + + + objfile.mk + + list of object files to construct, presumably from + *.cmpl.cpp files. + src/Utilities/objfile.mk is an + example. + + + config/Shared/rules.mk + most compiler rules + + config/head.mk + read at beginning of each + makefile? + + config/Shared/tail.mk + read at end of each makefile? + + config/Shared/variables.mk + Is this used? + + config/Shared/compilerules.mk + table of origin and target suffixes and commands + for conversion + + + + + + + + + &array;s + + Include src/Pooma/Arrays.h to use &array;s. + The implementation source code is in src/Array. + + FINISH: Define an array. Introduce its parts. + + ADD: some mention of the maximum supported number of + dimensions somewhere. + +
+ The &array; Container + + + Template Parameters + + + + + Parameter + Interpretation + + + + + Dim + dimension + + + T + array element type + + + EngineTag + type of computation engine object + + + +
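+ As a minimal illustration of these parameters, the Doof2d example
+ programs included elsewhere in this patch declare a two-dimensional
+ array of doubles with ordinary Brick storage; the
+ domain size n is assumed to be defined elsewhere:
+ <programlisting>
+ #include "Pooma/Arrays.h"        // declares Array, Interval, Brick, ...
+
+ // Dim = 2, T = double, EngineTag = Brick (ordinary local storage).
+ Interval<1> N(0, n - 1);          // n assumed to be defined elsewhere
+ Interval<2> vertDomain(N, N);     // domain [0,n) x [0,n)
+ Array<2, double, Brick> a(vertDomain);
+ </programlisting>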
+ + QUESTION: How do I introduce class type definitions, when + they are used, i.e., compile-time or run-time, and when + programmers should use them? + + + Compile-Time Types and Values + + + + + Type or Value + Interpretation + + + + + This_t + the &array; object's type + + + Engine_t + the &array; object's engine's type + + + EngineTag_t + indication of engine's category + + + Element_t + the type of the array elements, i.e., T + + + ElementRef_t + the type of a reference to an array element, + i.e., T&. Equivalently, the type to write to a + single element. + + + Domain_t + the array's domain's type, i.e., the type of the + union of all array indices + + + Layout_t + unknown + + + dimensions + integer equalling the number of dimensions, i.e., + Dim + + + rank + integer equalling the number of dimensions, i.e., + Dim; a synonym for + dimensions + + + +
+ +
+ Constructors and Destructors + + + Constructors and Destructors + + + + + Function + Effect + + + + + + + Array + + + + Creates an array that will be resized + later. + + + + + Array + const Engine_t& + engine + + + Creates an array with an engine equivalent to + the engine. This array will have the + same values as engine. QUESTION: Why + would a user every want to use this + constructor? + + + + + Array + + const + Engine<Dim2, T2, EngineTag2>& + engine + + + const + Initializer& init + + + + What does this do? + + + ADD ALL CONSTRUCTORS AND DESTRUCTORS. + + + +
+
+ + +
+ Initializers + + Add a table. +
+ + +
+ Element Access + + + &array; Element Access + + + + + Function + Effect + + + + + + + Element_t read + + + + unknown: See line 1839. + + + + + Element_t read + + const + Sub1& s1 + + + const + Sub2& s2 + + + + How does the version with template parameters, + e.g., Sub1 differ from the int + version? + + + + + Element_t operator() + + const + Sub1& s1 + + + const + Sub2& s2 + + + + How does this differ from read(const + Sub1& s1, const Sub2& s2)? + + + ADD ALL reads and + operator()s. + + + +
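+ Until the table is complete, here is a sketch of single-element
+ access, assuming the two-dimensional array a declared
+ earlier; read is assumed to provide read-only access
+ while operator() may appear on the left-hand side of an
+ assignment:
+ <programlisting>
+ a = 0.0;                       // data-parallel scalar assignment
+ a(3, 4) = 2.5;                 // write one element via operator()
+
+ Pooma::blockAndEvaluate();     // ensure pending computation has finished,
+                                // as the Doof2d examples do before reading
+ double x = a.read(3, 4);       // read-only access to the same element
+ </programlisting>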
+
+ + +
+ Component Access + + When an array stores elements having components, e.g., an + array of vectors, tensors, or arrays, the + comp member function returns an array consisting of the + specified components. The original and component array share the + same engine, so changing the values in one affects values in the + other. + + For example, if an &n; × &n; array a + consists of three-dimensional real-valued vectors, + a.comp(1) returns an &n; × &n; + real-valued array of all the middle vector components. Assigning + to the component array will also modify the middle components of + the vectors in a. + + + &array; Component Access + + + + + Function + Effect + + + + + + + UNKNOWN compute this comp + + const + int& + i1 + + + + unknown: See line 1989. + + + ADD ALL comps. + + + +
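+ A sketch of the example above; the Vector element type
+ used below is an assumption, and the header providing it is not
+ shown here:
+ <programlisting>
+ // An n x n array of three-component real-valued vectors.
+ // (Vector is assumed to be available; its header is not shown here.)
+ Array<2, Vector<3, double>, Brick> a(vertDomain);
+
+ // a.comp(1) is a view of the middle component of every vector.
+ // It shares a's engine, so this assignment also changes a.
+ a.comp(1) = 0.0;
+ </programlisting>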
+
+ +
+ Accessors + + + &array; Accessor Methods + + + + + Function + Effect + + + + + + + int first + + int + d + + + + unknown: See line 2050 + + + ADD ALL other accessor methods, including + engine. + + + +
+
+ + +
+ Copying &array;s + + Explain how copied arrays and views of arrays share the + same underlying engine so changing values in one also affects the + other. This is called a shallow copy. +
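+ A sketch of the behavior to be explained, using
+ makeOwnCopy from the utility methods below; the
+ assumption here is that makeOwnCopy replaces the shared
+ engine with a private copy:
+ <programlisting>
+ Array<2, double, Brick> a(vertDomain);
+ a = 1.0;
+
+ Array<2, double, Brick> b(a);   // shallow copy: b shares a's engine
+ b(0, 0) = 7.0;                  // the change is visible through a as well
+
+ b.makeOwnCopy();                // give b its own storage
+ b(0, 0) = 9.0;                  // no longer affects a
+ </programlisting>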
+ + +
+ Utility Methods + + + &array; Utility Methods + + + + + Function + Effect + + + + + + + void makeOwnCopy + + + + unknown: See line 2044 + + + ADD ALL other utility methods. + + + +
+
+ + +
+ Implementation Details + + As a container, an &array;'s implementation is quite + simple. Its private data consists of + an engine, and it has no private + member functions. + + + &array; Implementation Data + + + + + Data Member + Meaning + + + + + + + private + Engine_t engine_m + + + engine computing the array's values + + + +
+ +
+
+ + +
+ &dynamicarray;s: Dynamically-Sized Domains + + A DynamicArray is a read-write array with extra + create/destroy methods. It can act just like a regular Array, but + can have a dynamically-changing domain. See src/DynamicArray/DynamicArray.h. + + ADD: Briefly describe what the class does and an example of + where it is used. + + ADD: Check that its interface is actually the same as for + &array;. + + ADD: Check that the operations on dynamic arrays are + actually the same as for &array;. See src/DynamicArray/DynamicArrayOperators.h, + src/DynamicArray/PoomaDynamicArrayOperators.h, + and src/DynamicArray/VectorDynamicArrayOperators.h. + + +
+ Implementation Details + + DynamicArray has no + protected or + private members. +
+
+ + +
+ Views of &array;s + + UNFINISHED +
+ + +
+ &array; Assignments + + &pooma; supports assigning other &array;s and scalar + values to &array;s. QUESTION: Is the following correct? For the + former, the right-hand side array's domain must be at least as + large as the left-hand side array's domain. Corresponding values + are copied. Assigning a scalar value to an array ensures all the + array elements have the same scalar value. + + UNFINISHED: Add a table containing assignment operators + found on lines 2097–2202. +
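+ Both forms appear in the Doof2d example programs included
+ elsewhere in this patch; a condensed sketch:
+ <programlisting>
+ Array<2, double, Brick> a(vertDomain);
+ Array<2, double, Brick> b(vertDomain);
+
+ a = b = 0.0;            // scalar assignment: every element becomes 0.0
+ b(n/2, n/2) = 1000.0;   // assignment to a single element
+ a = b;                  // array assignment: corresponding values are copied
+ </programlisting>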
+ + +
+ Printing &array;s + + &array;s support output to but not input from IO streams. + In particular, output to ostreams and file streams is + supported. + + Add a table, using src/Array/Array.h, lines + 2408–2421. See the implementation in src/Array/PrintArray.h. + + QUESTION: How does one print a &dynamicarray;? +
+ + +
+ Expressions Involving &array;s + + In &pooma;, expressions may contain entire &array;s. That + is, &array;s are first-class objects with respect to expressions. + For example, given &array;s a and + b, the expression a + b + is equivalent to an array containing the element-wise sum of the + two arrays. + + Any finite number of the operators listed below can be used + in an expression. The precedence and order of operation is the + same as with ordinary built-in types. + + QUESTION: Do &field;s also support the same set of + operations? + + QUESTION: Some operations in src/Field/FieldOperators.h use both + &array; and &field;. Do we list them here or in the &field; + section or both or somewhere else? + + In the table below, &array; supplants the exact return types + because they are complicated and rarely need to be explicitly + written down. + + + Operators on &array; + + + + + Operator + Value + + + + + + + + Array acos + const Array<Dim,T,EngineTag>& a + + + + an array containing the element-wise inverse + cosine of the array a + + + ADD ALL other operators appearing in src/Array/ArrayOperators.h, + src/Array/ArrayOperatorSpecializations.h, + src/Array/PoomaArrayOperators.h, + and src/Array/VectorArrayOperators.h. + + + +
+ + FINISH: Write one or two examples or refer to ones + previously in the text. +
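+ As a placeholder example, the following sketch combines whole-array
+ arithmetic with the element-wise acos listed in the table
+ above:
+ <programlisting>
+ Array<2, double, Brick> a(vertDomain), b(vertDomain), c(vertDomain);
+ a = 1.0;
+ b = 0.5;
+
+ // Element-wise expressions over entire arrays; no explicit loops.
+ c = a + 2.0 * b;        // c(i,j) = a(i,j) + 2.0 * b(i,j)
+ c = acos(b);            // element-wise inverse cosine
+ </programlisting>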
+ + +
+ Reducing All &array; Elements to One Value + + These reduction functions repeatedly apply a binary + operation to all array elements to yield a value. These functions + are similar to the Standard Template Library's + accumulate function. For example, + sum repeatedly applies the binary plus + operator to all array elements, yielding the sum of all array + elements. + + FINISH: What order of operation, if any, is + guaranteed? + + FINISH: Add a table of the functions in src/Array/Reductions.h. + + How does one use one's own binary function? See src/Engine/Reduction.h. +
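+ Until the table is added, a sketch using sum, which is
+ assumed to be declared in src/Array/Reductions.h as described
+ above:
+ <programlisting>
+ Array<2, double, Brick> a(vertDomain);
+ a = 2.0;
+
+ // Repeatedly applies binary plus to all elements of a.
+ double total = sum(a);
+ </programlisting>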
+ + +
+ Utility Functions + +
+ Compressed Data + + Add a table containing + elementsCompressed, + compressed, compress, + and uncompress. +
+ + +
+ Centering Sizes and Number of Materials + + ADD: a description of numMaterials and + centeringSize found in src/Field/Field.h. These functions + are meaningless for &array; but are provided for consistency with + &field;. +
+ +
+ Obtaining Subfields + + ADD: a description of subField found + in src/Field/Field.h. + This function, meaningless for &array;, is provided for + consistency with &field;. +
+
+ + +
+ TMP: What do we do with these …? Remove this + section. + + QUESTION: Do we describe the &leaffunctor;s specialized for + &array;s in src/Array/Array.h or in the &pete; + reference section? What about the functions in src/Array/CreateLeaf.h? + + QUESTION: What is an EngineFunctor? We + probably should describe it in an analogous way as for + &leaffunctor;s. + + QUESTION: Where do we write about + ExpressionTraits for &array;s? + + QUESTION: Do we describe the ElementProperties + specialization at this place or in its section? + + QUESTION: Do we describe the Patch + specialization for &array;s (src/Array/Array.h:1300) in this + place or in a section for patches? +
+
+ + + + &field;s + + An &array; is a set of values indexed by + coordinates, one value per coordinate. It models the computer + science idea of an array. Similarly, a &field; is a set of values + indexed by coordinate. It models the mathematical and physical + idea of a field represented by a grid of rectangular cells, each + having at least one value. A &field;'s functionality is a superset + of an &array;'s functionality because: + + + A &field; is distributed through space so one can compute + the distances between cells. + + + Each cell can hold multiple values. For example, a + rectangular cell can have one value on each of its faces. + + + Multiple materials can share the same cell. For example, + different values can be stored in the same cell for carbon, + oxygen, and nitrogen. + + + Also, &field;s' values can be related by relations. Thus, if one + field's values change, a dependent field's values can be + automatically computed when needed. FIXME: See also the unfinished + works chapter's entry concerning relations and arrays. + + QUESTION: Should we add a picture comparing and contrasting + an array and a field? + + QUESTION: How much structure can be copied from the &array; + chapter? + + QUESTION: Where is NewMeshTag, defined in + src/Field/Field.h, + used? + + QUESTION: Do we describe the &leaffunctor;s specialized for + &field;s in src/Field/Field.h or in the &pete; + reference section? Use the same decision for &array;s. + + QUESTION: What do the structure and functions in src/Field/Mesh/PositionFunctions.h + do? + + +
+ The &field; Container + + ADD: table of template parameters and table of compile-time + types and values. + + +
+ Constructors and Destructors + + ADD: this section similar to &array;s's constructor and + destructor section. +
+ +
+ Initializers + + Add a table. +
+ + +
+ Element Access + + ADD: a table ala &array;. Be sure to include + all. +
+ + +
+ Component Access + + ADD: a table ala &array;. +
+ + +
+ Obtaining Subfields + + ADD: discussion and a table listing ways to obtain + subfields. Although the implementation may treat subfield views + and other field views similarly (?Is this true?), they are + conceptually different ideas so we present them + separately. + + See src/Field/Field.h's + operator[], + subField, …, + material. +
+ + +
+ Supporting Relations + + ADD: a table with the member functions including + addRelation, + removeRelations, + applyRelations, and + setDirty. +
+ + +
+ Accessors + + ADD: a table using lines like src/Field/Field.h:1243–1333. +
+ + +
+ Utility Methods + + ADD: a table including + makeOwnCopy. +
+ + +
+ Implementation Details + + ADD: a table similar to &array;'s. + +
+ +
+ + +
+ Views of &field;s + + Be sure to relate to &array; views. Note only three + dimensions are supported. + + Be sure to describe f[i]. Does this + refer to a particular material or a particular value within a + cell? I do not remember. See SubFieldView in + src/Field/Field.h. +
+ + +
+ &field; Assignments + + ADD: Describe supported assignments, relating to &array;'s + assignments. + + UNFINISHED: Add a table containing assignment operators + found on src/Field/Field.h:2097–2202 + and 1512–1611. +
+ + +
+ Printing &field;s + + QUESTION: How similar is this to printing &array;s? + + &field;s support output to but not input from IO streams. + In particular, output to ostreams and file streams is + supported. + + Add a table, using src/Field/Field.h, lines + 1996–2009. See the implementation in src/Field/PrintField.h. +
+ + +
+ Combining &field; Elements + + Like &array;s, &field;s support reduction of all elements to + one value. Additionally, the latter supports computing a field's + values using field stencils. QUESTION: How do I describe this + with a minimum of jargon? + + ADD: something similar to &array; reductions. + + FINISH: Add a table of the functions in src/Field/FieldReductions.h. + + FINISH: Add a table of the functions in src/Field/DiffOps/FieldOffsetReductions.h. + QUESTION: Why is only sum defined? +
+ + +
+ Expressions Involving &field;s + + Do something similar to &array;'s section. See the + operations defined in src/Field/FieldOperators.h, + src/Field/FieldOperatorSpecializations.h, + src/Field/PoomaFieldOperators.h, and + src/Field/VectorFieldOperators.h. + + Some operations involve both &array; and &field; + parameters. Where do we list them? +
+ + +
+ &field; Stencils: Faster, Local Computations + + ADD: a description of a stencil. Why is it needed? How + does a user use it? How does a user know when to use one? Add + documentation of the material from src/Field/DiffOps/FieldStencil.h. + + How is FieldShiftEngine used by &field; + stencils? Should it be described here or in the &engine; section? + See the code in src/Field/DiffOps/FieldShiftEngine.h. +
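+ Until this section is written, the following sketch, modeled on the
+ DoofNinePt stencil functor used by the Doof2d example
+ programs elsewhere in this patch, suggests the shape of a stencil
+ functor; the exact requirements (in particular the
+ lowerExtent and upperExtent members) should be
+ checked against src/Field/DiffOps/FieldStencil.h:
+ <programlisting>
+ // A five-point averaging functor in the style of DoofNinePt.
+ class FivePt
+ {
+ public:
+   // The "C" template parameter permits use with both Arrays and Fields.
+   template <class C>
+   inline typename C::Element_t
+   operator()(const C &c, int i, int j) const
+   {
+     return 0.2 * (c.read(i, j) +
+                   c.read(i - 1, j) + c.read(i + 1, j) +
+                   c.read(i, j - 1) + c.read(i, j + 1));
+   }
+
+   // How far the stencil reaches below and above the evaluation point.
+   inline int lowerExtent(int) const { return 1; }
+   inline int upperExtent(int) const { return 1; }
+ };
+
+ // Used as in Doof2d:
+ //   Stencil<FivePt> stencil;
+ //   a(interiorDomain) = stencil(b, interiorDomain);
+ </programlisting>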
+ + +
+ Cell Volumes, Face Areas, Edge Lengths, Normals + + ADD: a description of these functions. See src/Field/Mesh/MeshFunctions.h. + These are initialized in, e.g., src/Field/Mesh/UniformRectilinearMesh.h. + Note that these do not work for NoMesh. +
+ + +
+ Divergence Operators + + ADD: a table having divergence operators, explaining the + current restrictions imposed by what is implemented. See + src/Field/DiffOps/Div.h + and src/Field/DiffOps/Div.UR.h. What + restrictions does UR (mesh) connote? +
+ + +
+ Utility Functions + +
+ Compressed Data + + Add a table containing + elementsCompressed, + compressed, compress, + and uncompress. +
+ + +
+ Centering Sizes and Number of Materials + + ADD: a description of numMaterials and + centeringSize found in src/Field/Field.h. + + QUESTION: How do these relate to any method functions? +
+ + +
+ Obtaining Subfields + + ADD: a description of subField found + in src/Field/Field.h. +
+ +
+ + +
+ &field; Centerings + + DO: Describe the purpose of a centering and its definition. + Describe the ability to obtain canonical centerings. Explain how + to construct a unique centering. See src/Field/FieldDentering.h. +
+ + +
+ Relative &field; Positions + + Permit specifying field positions relative to a field + location. Describe FieldOffset and + FieldOffsetList. See src/Field/FieldOffset.h +
+ + +
+ Computing Close-by Field Positions + + Given a field location, return the set of field locations + that are closest using ?Manhattan? distance. See src/Field/NearestNeighbors.h. +
+ + +
+ Mesh ??? + + Unlike &array;s, &field;s are distributed throughout space + so distances between values within the &field can be computed. A + &field;'s mesh stores this spatial distribution. + + QUESTION: What do we need to write about meshes? What is + unimportant implementation and what should be described in this + reference section? + + QUESTION: Where in here should emphasize vertex, not cell, + positions? VERTEX appears repeatedly in src/Field/Mesh/NoMesh.h. + + + Mesh Types + + + + + Mesh Type + Description + + + + + NoMesh<Dim> + no physical spacing, causing a &field; to mimic + an &array; with multiple engines. + + + UniformRectilinearMesh<Dim,T> + physical spacing formed by the Cartesian product + of ????. + + + +
+ + +
+ Mesh Accessors + + ADD: a table listing accessors, explaining the difference + between (physical and total) and (cell and vertex) domains. See + src/Field/Mesh/NoMesh.h. + Also, include spacings and + origin in src/Field/Mesh/UniformRectilinearMesh.h. + Note NoMesh does not provide the latter two. +
+ +
+ + +
+ TMP: What do we do with these …? Remove this + section. + + QUESTION: Do we describe the Patch + specialization for &field; at this place or in some common place? + Follow &array;'s lead. + + QUESTION: Where do we describe CreateLeaf and + MakeFieldReturn in src/Field/FieldCreateLeaf.h and + src/Field/FieldMakeReturn.h. + + QUESTION: What do we do with FieldEnginePatch + in src/Field/FieldEngine/FieldEnginePatch.h. +
+
+ + + + &engine;s + + From a user's point of view, a container makes data available + for reading and writing. In fact, the container's &engine; stores + the data or, if the data is computed, performs a computation to + yield the data. + + FINISH: Introduce the various types of engines. Add a table + with a short description of each engine type. + + FINISH: First, we specify a generic &engine;'s interface. + Then, we present &engine; specializations. + + + Types of &engine;s + + + + + Engine Type + Engine Tag + Description + + + + + Brick + Brick + Explicitly store all elements in, e.g., a &cc; + array. + + + Compressible + CompressibleBrick + If all values are the same, use constant storage + for that single value. Otherwise, explicitly store all + elements. + + + Constant + ConstantFunction + Returns the same constant value for all + indices. + + + Dynamic + Dynamic + Manages a contiguous, local, one-dimensional, + dynamically resizable block of data. + + + Component Forwarding + CompFwd<EngineTag, + Components> + Returns the specified components from + EngineTag's engine. Components are + pieces of multi-value elements such as vectors + and tensors. + + + Expression + ExpressionTag<Expr> + Returns the value of the specified &pete; + expression. + + + Index Function + IndexFunction<Functor> + Makes the function + Functoraccepting indices mimic an + array. + + + MultiPatch + MultiPatch<LayoutTag,PatchTag> + Support distributed computation using several + processors (???contexts???). LayoutTag + indicates how the entire array is distributed among the + processors. Each processor uses a PatchTag + engine. + + + Remote + Remote<EngineTag> + unknown + + + Remote Dynamic + Remote<Dynamic> + unknown: specialization + + + Stencil + StencilEngine<Function, + Expression> + Returns values computed by applying the + user-specified function to sets of contiguous values in the + given engine or container. Compare with user function + engines. + + + User Function + UserFunctionEngine<UserFunction,Expression> + Returns values computed by applying the + user-specified function to the given engine or container. + QUESTION: Is the following claim correct? For each returned + value, only one value from the engine or container is + used. + + + +
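+ As a concrete illustration of selecting an engine through the tag
+ parameter, the Doof2d example programs in this patch declare the same
+ logical array with two different engines; the domain and layout
+ objects are assumed to be constructed as in
+ Doof2d-Array-distributed.cpp:
+ <programlisting>
+ // Ordinary local storage.
+ Array<2, double, Brick> local(vertDomain);
+
+ // Distributed storage: uniformly sized patches, each held by a
+ // Remote<Brick> engine.
+ Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > dist(layout);
+ </programlisting>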
+ + QUESTION: Where do we describe views? + + QUESTION: What does NewEngine do? Should it be + described when describing views? Should it be omitted as an + implementation detail? + + QUESTION: Where do we describe &engine; patches found in + src/Engine/EnginePatch.h? + All patch data in a separate chapter or engine-specific pieces in + this chapter? + + QUESTION: What is notifyEngineWrite? + See also src/Engine/NotifyEngineWrite.h. + + QUESTION: What aspect of MultiPatch uses IsValid in + src/Engine/IsValidLocation.h? + + QUESTION: Who uses intersections? Where should this be + described? See src/Engine/Intersector.h, src/Engine/IntersectEngine.h, and + src/Engine/ViewEngine.h. + +
+ &engine; Compile-Time Interface + + ADD: a table of template parameters ala &array;. ADD: + compile-time types and values. +
+ + +
+ Constructors and Destructors + + ADD: a table of constructors and destructors ala + &array;'s. +
+ + +
+ Element Access + + ADD: a table with read and + operator(). +
+ + +
+ Accessors + + ADD: a table of accessors. +
+ + +
+ &engine; Assignments + + similar to &array;'s assignments. shallow copies. ADD: a + table with one entry +
+ + +
+ Utility Methods + + ADD: a table including + makeOwnCopy. + + QUESTION: What are dataObject, + isShared, and related methods? +
+ + +
+ Implementation Details + + ADD: this section. Explain that + dataBlock_m and data_m point + to the same place. The latter speeds access, but what is the + purpose of the former? +
+ + +
+ Brick and BrickView Engines + + ADD: description of what a brick means. ADD: whatever + specializations the class has, e.g., + offset. + + QUESTION: What does DoubleSliceHelper do? +
+ + +
+ Compressible Brick and BrickView Engines + + ADD this. +
+ + +
+ Dynamic and DynamicView Engines: + + ADD this. Manages a contiguous, local, resizable, 1D block + of data. +
+ + +
+ Component Engines + + I believe these implement array component-forwarding. See + src/Engine/ForwardingEngine.h. +
+ + +
+ Expression Engines + + Should this be described in the &pete; section? Unlikely. + See src/Engine/ExpressionEngine.h. +
+ + +
+ &engine; Functors + + QUESTION: What is an EngineFunctor? Should it + have its own section? See src/Engine/EngineFunctor.h. +
+ + +
+ <type>FieldEngine</type>: A Hierarchy of &engine;s + + A &field; consists of a hierarchy of materials and + centerings. These are implemented using a hierarchy of engines. + See src/Field/FieldEngine/FieldEngine.h + and src/Field/FieldEngine/FieldEngine.ExprEngine.h. +
+
+ + + + &benchmark; Programs + + Explain how to use &benchmark; programs, especially the + options. Explain how to write a &benchmark; program. See also + src/Utilities/Benchmark.h + and src/Utilities/Benchmark.cmpl.cpp. + + + + + + Layouts and Partitions: Distribute Computation Among + Contexts + + QUESTION: What is the difference between + ReplicatedTag and DistributedTag? + + + + + + &pete;: Evaluating Parallel Expressions + +
+ UNKNOWN + +
+ Leaf Tag Classes + + NotifyPreReadTag indicates a term is about to + be read. Why is this needed? Defined in src/Utilities/NotifyPreRead.h. +
+
+ +
+ + + + Views + + QUESTION: Should this have its own chapter or be part of a + container chapter? + + Describe View0, View1, …, + View7 and View1Implementation. + + QUESTION: What causes the need for AltView0 and + AltComponentView? + + Be sure to describe ComponentView in the same + place. This is specialized for &array;s in src/Array/Array.h:1323–1382. + +
+ <type>ViewIndexer<Dim,Dim2></type> + + Defined in src/Utilities/ViewIndexer.h, this + type translates indices between a domain and a view of it. +
+
+ + + Threads + + Perhaps include information in src/Engine/DataObject.h. + + &pooma; options include UNFINISHED + + + + + + Utility Types + + TMP: What is a good order? + +
+ <type>Options</type>: Varying Run-Time Execution + + Each &pooma; executable has a Options object, + created by Pooma::initialize, storing + run-time configurable values found in argv. + Default options are found in + Options::usage. + + See src/Utilities/Options.h and + src/Utilities/Options.cmpl.cpp. + + Scatter the specific options to other parts of the + manual. +
+ +
+ Check Correctness: <type>CTAssert</type>, + <type>PAssert</type>, <type>PInsist</type>, + <type>SameType</type> + + Assertions ensure program invariants are obeyed. + CTAssert, checked at compile time, incurs no run-time + cost. PAssert and PInsist are checked + at run time, the latter producing an explanatory message if the + assertion fails. Compiling with NOCTAssert and + NOPTAssert disables these checks. Compiling with just + NOPTAssert disables only the run-time checks. + + SameType ensures, at compile time, that two types + are the same. + + These are implemented in src/Utilities/PAssert.h and + src/Utilities/PAssert.cmpl.cpp. +
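+ A sketch of typical use; the macro names follow the text above, and
+ the exact signatures should be verified against
+ src/Utilities/PAssert.h:
+ <programlisting>
+ #include "Utilities/PAssert.h"
+
+ template <int Dim>
+ void illustrate(int n, const double *p)
+ {
+   CTAssert(Dim > 0);                        // checked at compile time
+   PAssert(n >= 0);                          // checked at run time
+   PInsist(p != 0, "null pointer passed");   // run-time check with a message
+ }
+ </programlisting>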
+ +
+ <type>Clock</type>: Measuring a Program's Execution Time + + See src/Utilities/Clock.h. +
+ + +
+ Smart Pointers: <type>RefCountedPtr</type>, + <type>RefCountedBlockPtr</type>, and + <type>DataBlockPtr</type> + + See src/Utilities/{RefCountedPtr,RefCountedBlockPtr,DataBlockPtr}.h. + src/Utilities/RefCounted.h + helps implement it. DataBlockPtr uses + &smarts;. +
+ +
+ <type>Inform</type>: Formatted Output for Multi-context + Execution + + See src/Utilities/Inform.h and src/Utilities/Inform.cmpl.cpp. +
+ +
+ <type>Statistics</type>: Report &pooma; Execution Statistics + + Collect and print execution statistics. Defined in + src/Utilities/Statistics.h. +
+ +
+ Random Numbers: <type>Unique</type> + + See src/Utilities/Unique.h. +
+
+ + + + Types for Implementing &pooma; + + TMP: What is a good order? + + Describe types defined to implement &pooma; but that users do + not directly use. This chapter has lower priority than other + chapters since users (hopefully) do not need to know about these + classes. + +
+ <type>Tester</type>: Check Implementation Correctness + + &pooma; implementation test programs frequently consist of a + series of operations followed by correctness checks. The + Tester object supports these tests, returning a + boolean indicating whether all the correctness checks yielded true. Under + verbose output, messages are printed for each test. See src/Utilities/Tester.h. +
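+ A sketch of a typical test program; the member function names
+ check and results used here are assumptions and
+ should be verified against src/Utilities/Tester.h:
+ <programlisting>
+ #include "Pooma/Pooma.h"          // assumed header for initialize/finalize
+ #include "Utilities/Tester.h"
+
+ int main(int argc, char *argv[])
+ {
+   Pooma::initialize(argc, argv);
+   Pooma::Tester tester(argc, argv);
+
+   tester.check("two plus two", 2 + 2 == 4);  // record one correctness check
+
+   int ret = tester.results("arithmetic");    // summarizes all checks
+   Pooma::finalize();
+   return ret;
+ }
+ </programlisting>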
+ +
+ <type>ElementProperties<T></type>: Properties a Type + Supports + + This traits class permits optimizations in other templated + classes. See src/Utilities/ElementProperties.h. + +
+ +
+ <type>TypeInfo<T></type>: Print a String Describing + the Type + + Print a string describing the type. Defined in src/Utilities/TypeInfo.h. It is + specialized for other types in other files, e.g., src/Engine/EngineTypeInfo.h and + src/Field/FieldTypeInfo.h. + Is this a compile-time version of RTTI? +
+ +
+ <type>LoopUtils</type>: Loop Computations at Compile Time + + At compile time, LoopUtils supports copying + between arrays and computing the dot product of arrays. See + src/Utilities/MetaProg.h. +
+ +
+ <type>ModelElement<T></type>: Wrap a Type + + A wrapper class used to differentiate overloaded functions. + Defined in src/Utilities/ModelElement.h. Used + only by &array; and DynamicArray. +
+ +
+ <type>WrappedInt<int></type>: Wrap a Number + + A wrapper class used to differentiate overloaded functions + among different integers. Defined in src/Utilities/WrappedInt.h. Is this + class deprecated? Is it even necessary? +
+ +
+ Supporting Empty Classes + + The NoInit tag class indicates certain + initializations should be skipped. Defined in src/Utilities/NoInit.h. + + FIXME: Should be macro, not function. + POOMA_PURIFY_CONSTRUCTORS generates an empty + constructor, copy constructor, and destructor to avoid &purify; + warnings. Defined in src/Utilities/PurifyConstructors.h. + +
+ +
+ <type>Pooled<T></type>: Fast Memory Allocation of + Small Blocks + + Pooled<T> speeds allocation and + deallocation of memory blocks for small objects with + type T. Defined in src/Utilities/Pooled.h, it is + implemented in src/Utilities/Pool.h and src/Utilities/Pool.cmpl.cpp. + src/Utilities/StaticPool.h + no longer seems to be used. +
+ +
+ <type>UninitializedVector<T,Dim></type>: Create + Without Initializing + + This class optimizes creation of an array of objects by + avoiding running the default constructors. Later initialization + can occur, perhaps using a loop that can be unrolled. Defined in + src/Utilities/UninitializedVector.h, + this is used only by DomainTraits. +
+
+ + + Algorithms for Implementing &pooma; + + In src/Utilities/algorithms.h, + copy, delete_back, and + delete_shiftup provide additional algorithms + using iterators. + + + + + TMP: Where do we describe these files? + + + + src/Utilities/Conform.h: tag for + checking whether terms in expression have conforming + domains + + + + src/Utilities/DerefIterator.h: + DerefIterator<T> and + ConstDerefIterator<T> automatically + dereference themselves to maintain const + correctness. + + + + src/Utilities/Observable.h, + src/Utilities/Observer.h, + and src/Utilities/ObserverEvent.h: + Observable<T>, + SingleObserveable<T>, + Observer<T>, and ObserverEvent + implement the observer pattern. What is the observer pattern? + Where is this used in the code? + + + + + +
+ + + + Future Development + +
+ Particles + + docs/ParticlesDoc.txt has + out-of-date information. + + See Section 3.2.3 of + papers/pooma.ps for an out-of-date + description. + + papers/jvwr.ps concerns mainly + particles. papers/8thSIAMPOOMAParticles.pdf, + by Julian Cummings and Bill Humphrey, concerns parallel particle + simulations. papers/iscope98linac.pdf + describes a particle beam simulation using &pooma;; it mainly + concerns particles. + +
+ Particles + + Do we want to include such a section? + + Section 3, "Sample Applications" of + papers/SiamOO98_paper.ps describes porting a + particle program written using High-Performance Fortran to + &pooma; and presumably why particles were added to &pooma;. It + also describes MC++, a Monte Carlo + neutron transport code. + +
+ +
+ + +
+ Composition of &engine;s + + The i,j-th element of the composition + ab of two arrays + a and b equals a(b(i,j)). + The composition engine tagged IndirectionTag<Array1, + Array2>, defined in src/Engine/IndirectionEngine.h is + unfinished. +
+ + +
+ Improving Consistency of Container Interfaces + +
+ Relations for &array;s + + Do &array;s currently support relations? If not, why not? + Should they be added? +
+ +
+ Supporting the Same Number of Dimensions + + &array; and &field; should support the same maximum number + of dimensions. Currently, &array;s support seven dimensions and + &field;s support only three. By definition, &dynamicarray; + supports only one dimension. + + Relations for &array;s. + + External guards for &array;s. +
+ +
+ + +
+ <function>where</function> Proxies + + QUESTION: Do we even discuss this broken + feature? Where is it used? Some related code is in + src/Array/Array.h:2511–2520. +
+ + +
+ Very Long Term Development Ideas + + Describe how to write a new configuration file. +
+ +
+ + + + Obtaining and Installing &pooma; + + ADD: Write this section, including extensive instructions + for Unix, MS Windows, and MacOS. List the configuration options. + Be sure to describe configuring for parallel execution. + +
+ Supporting Distributed Computation + + Using multiple processors with &pooma; requires installing + the &cheetah; messaging library and an underlying messaging library + such as the Message Passing Interface (&mpi;) Communications + Library or the &mm; Shared Memory Library. In this section, we + first describe how to install &mm;. Read that section only if using + &mm;, not &mpi;. Then we describe how to install &cheetah; and + configure &pooma; to use it. +
+ Obtaining and Installing the &mm; Shared Memory Library + + &cheetah;, and thus &pooma;, can use Ralf Engelschall's &mm; + Shared Memory Library to pass messages between processors. For + example, the &author; uses this library on a two-processor + computer running &linux;. The library, available at + http://www.engelschall.com/sw/mm/, is free and has + been successfully tested on a variety of Unix platforms. + + We describe how to download and install the &mm; library. + + + Download the library from the &pooma; Download page + available off the &pooma; home page (&poomaHomePage;). + + + Extract the source code using tar xzvf + mm-1.1.3.tar.gz. Move into the resulting source + code directory mm-1.1.3. + + + Prepare to compile the source code by configuring it + using the configure command. To change + the default installation directory /usr/local, specify the + --prefix=directory + option. The other configuration options can be listed by + specifying the --help option. Since the + &author; prefers to keep all &pooma;-related code in his + pooma subdirectory, he + uses ./configure + --prefix=${HOME}/pooma/mm-1.1.3. + + + Create the library by issuing the make + command. This compiles the source code using a &c; compiler. To + use a different compiler than the &mm; configuration chooses, set + the CC environment variable to that compiler before configuring. + + + Optionally test the library by issuing the make + test command. If successful, the penultimate line + should be OK - ALL TESTS SUCCESSFULLY + PASSED. + + + Install the &mm; Library by issuing the make + install command. This copies the library files to the + installation directory. The mm-1.1.3 directory containing the + source code may now be removed. + + + +
+ + +
+ Obtaining and Installing the &cheetah; Messaging Library + + The &cheetah; Library decouples communication from + synchronization. Using asynchronous messaging rather than + synchronous messaging permits a message sender to operate without + the cooperation of the message recipient. Thus, implementing + message sending is simpler and processing is more efficiently + overlapped with it. Remote method invocation is also supported. + The library was developed at the Los Alamos National Laboratory's + Advanced Computing Laboratory. + + &cheetah;'s messaging is implemented using an underlying + messaging library such as the Message Passing Interface (&mpi;) + Communications Library (FIXME: xref linkend="mpi99", ) or the &mm; + Shared Memory Library. &mpi; works on a wide variety of platforms + and has achieved widespread usage. &mm; works under Unix on any + computer with shared memory. Both libraries are available for + free. The instructions below work for whichever library you + choose. + + We describe how to download and install &cheetah;. + + + Download the library from the &pooma; Download page + available off the &pooma; home page (&poomaHomePage;). + + + Extract the source code using tar xzvf + cheetah-1.0.tgz. Move into the resulting source code + directory cheetah-1.0. + + + Edit a configuration file corresponding to your operating + system and compiler. These .conf files are located in the + config directory. For + example, to use &gcc; with the &linux; operating system, use + config/LINUXGCC.conf. + + The configuration file usually does not need + modification. However, if you are using &mm;, ensure + shmem_default_dir specifies its location. + For example, the &author; modified the value to + "/home/oldham/pooma/mm-1.1.3". + + + Prepare to compile the source code by configuring it + using the configure command. Specify the + configuration file using the --arch option. + Its argument should be the configuration file's name, omitting + its .conf suffix. For + example, --arch LINUXGCC. Some other + options include + + + --help + + lists all the available options + + + + --shmem --nompi + + indicates use of &mm;, not &mpi; + + + + --mpi --noshmem + + indicates use of &mpi;, not &mm; + + + + --opt + + causes the compiler to produce optimized source code + + + + --noex + + prevents use of &cc; exceptions + + + + --prefix directory + + specifies the installation directory where the + library will be copied rather than the default. + + + + For example, the &author; uses ./configure --arch + LINUXGCC --shmem --nompi --noex --prefix + ${HOME}/pooma/cheetah-1.0 --opt. The + --arch LINUXGCC indicates use of &gcc; + under a &linux; operating system. The &mm; library is used, + but &cc; exceptions are not. The latter choice matches + &pooma;'s default choice. The library will be installed in + the ${HOME}/pooma/cheetah-1.0. + Finally, the library code will be optimized, hopefully running + faster than unoptimized code. + + + Follow the directions printed by + configure: Change directories to the + lib subdirectory named + by the --arch argument and then type + make to compile the source code and create + the library. + + + Optionally ensure the library works correctly by issuing + the make tests command. + + + Install the library by issuing the make + install command. This copies the library files to + the installation directory. The cheetah-1.0 directory containing + the source code may now be removed. + + + +
+ +
+ Configuring &pooma; When Using &cheetah; + + To use &pooma; with &cheetah;, one must tell &pooma; the + location of the &cheetah; library using the + --messaging configuration option. To do this, + + + Set the &cheetah; directory environment variable + CHEETAHDIR to the directory containing the + installed &cheetah; library. For + example, declare -x + CHEETAHDIR=${HOME}/pooma/cheetah-1.0 specifies the + installation directory used in the previous section. + + + When configuring &pooma;, specify the + --messaging option. For example, + ./configure --arch LINUXgcc --opt + --messaging configures for &linux;, &gcc;, and an + optimized library using &cheetah;. + + + +
+ + +
+
+ + + + Dealing with Compilation Errors + + Base this low-priority section on errors.html. QUESTION: Where is + errors.html? + + + + + + TMP: Notes to Myself + +
+ Miscellaneous + + + + QUESTION: How do I know when to use a type name versus just + the concept? For example, when do I use array + versus &array;? + + + + Krylov solvers are described in Section 3.5.2 of + papers/pooma.ps. + + + + Section 5, "The Polygon Overlay Problem," describes + porting an ANSI &c; program to &pooma;. + + + + A good example book: STL Tutorial and Reference + Guide: &cc; Programming with the Standard Template + Library, second edition, by David R. Musser, + Gillmer J. Derge, and Atul Saini, ISBN 0-201-37923-6, + QA76.73.C153.M87 2001. + + + + One STL reference book listed functions in margin notes, + making material easier to find. Do this. + + + + QUESTION: Does Berna Massingill at Trinity University have + any interest in or access to any parallel computers? + + +
+ + +
+ Existing HTML Tutorials + + All these tutorials are out-of-date, but the ideas and text + may still be relevant. + + + index.html + list of all tutorials. No useful + material. + + introduction.html + data-parallel Laplace solver using Jacobi + iteration ala Doof2d + + background.html + short, indirect introduction to &pete;; parallel + execution model; &cc;; templates; &stl;; expression + templates + + tut-01.html + UNFINISHED + + Layout.html + UNFINISHED + + parallelism.html + UNFINISHED + + self-test.html + UNFINISHED + + threading.html + UNFINISHED + + tut-03.html + UNFINISHED + + tut-04.html + UNFINISHED + + tut-05.html + UNFINISHED + + tut-06.html + UNFINISHED + + tut-07.html + UNFINISHED + + tut-08.html + UNFINISHED + + tut-09.html + UNFINISHED + + tut-10.html + UNFINISHED + + tut-11.html + UNFINISHED + + tut-12.html + UNFINISHED + + tut-13.html + UNFINISHED + + + +
+ +
+ + + + + + Bibliography + + FIXME: How do I process these entries? + + + mpi99 + + + WilliamGropp + + + EwingLusk + + + AnthonySkjellum + + + + 1999 + Massachusetts Institute of Technology + + 0-262-57132-3 + + The MIT Press +
Cambridge, MA
+
+ Using MPI + Portable Parallel Programming with the Message-Passing Interface + second edition +
+
+ + + + + + Glossary + + ADD: Make sure all entries are indexed and perhaps point back + to their first use. WARNING: This is constructed by hand so it is + likely to be full of inconsistencies and errors. + + + S + + + Suite Name + + An arbitrary string denoting a particular toolkit + configuration. For example, the string + SUNKCC-debug might indicate a configuration for + the Sun Solaris + operating system and the &kcc; &cc; compiler with debugging + support. By default, the suite name it is equal to the + configuration's architecture name. + + + + + + + + + &genindex.sgm; + + Index: docs/manual/figures/distributed.mp =================================================================== RCS file: distributed.mp diff -N distributed.mp *** /dev/null Fri Mar 23 21:37:44 2001 --- distributed.mp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,195 ---- + %% Oldham, Jeffrey D. + %% 2001Nov28 + %% Pooma + + %% Illustrations for Distributed Computing + + %% Assumes TEX=latex. + + input boxes; + + verbatimtex + \documentclass[10pt]{article} + \begin{document} + etex + + %% Parts of Distributed Computation + beginfig(101) + numeric unit; unit = 0.9cm; + + %% Create the Container Storage Partition subfigure. + numeric arrayWidth; arrayWidth = 2; % as multiple of unit + numeric arrayHeight; arrayHeight = 4; % as multiple of unit + numeric guardWidth; guardWidth = 0.1; % as multiple of unit + numeric patchWidth; patchWidth = arrayWidth/3; % as multiple of unit + numeric patchHeight; patchHeight = arrayHeight/2; % as multiple of unit + numeric xPatchDistance; xPatchDistance = 0.9patchWidth; % as multiple of unit + numeric yPatchDistance; yPatchDistance = 0.5patchWidth; % as multiple of unit + numeric arrayPartitionDistance; arrayPartitionDistance = arrayWidth; + % distance between array and partition + numeric arrayLayoutDistance; arrayLayoutDistance = 0.5arrayHeight; + % distance between array and layout + numeric arrowLayoutDistance; arrowLayoutDistance = 0.5arrayLayoutDistance; + % distance between arrow and top of layout, not its label + numeric iota; iota = labeloffset; + numeric storageBoundaryWidth; storageBoundaryWidth = 1; % as multiple of unit + % gap between storage box and its internals + + % Create the Array. Use box "a". + boxit.a(); + a.ne - a.sw = unit*(arrayWidth,arrayHeight); + + % Create the partition. Use boxes "p[]". + for t = 0 upto 5: + boxit.p[t](); + p[t].ne - p[t].sw = unit*(1,1); + endfor; + for t = 0 upto 2: + p[t].sw=p[t+3].nw; + p[t].se=p[t+3].ne; + if t < 2: + p[t].ne = p[t+1].nw; + p[t+3].ne = p[t+4].nw; + fi + endfor; + boxit.pt(btex \begin{tabular}{c} + \\ external guard layers \\ + \\ internal guard layers \end{tabular} etex); + pt.n = p[4].s; + + % Create the layout patches "l[]" and their guard layers "g[]". + for t = 0 upto 5: + boxit.l[t](); + boxit.g[t](); + l[t].ne - l[t].sw = unit*(patchWidth, patchHeight); + g[t].ne - l[t].ne = -(g[t].sw - l[t].sw) = unit*guardWidth*(1,1); + endfor + for t = 0 upto 2: + if t < 2: + g[t+1].nw - g[t].ne = unit*(xPatchDistance,0); + fi + g[t].sw - g[t+3].nw = unit*(0,yPatchDistance); + endfor; + + % Create the storage equation boxes. + boxit.containerPlus(btex + etex); + boxit.containerArrow(btex $\Big\Downarrow$ etex); + + % Position the storage pieces. 
+ p[0].nw - a.ne = unit*(arrayPartitionDistance,0); + containerPlus.c = (xpart(0.5[a.ne,p[0].nw]), ypart(a.c)); + containerArrow.c = (xpart(containerPlus.c), ypart(l[1].n) + unit*arrowLayoutDistance); + ypart(a.s - l[1].n) = unit*arrayLayoutDistance; + xpart(containerPlus.c - l[1].n) = 0; + + % Create a boundary box around storage partition. + boxit.storageBoundary(); + ypart(storageBoundary.n - a.n) = + ypart(l[4].s - storageBoundary.s) = unit*2storageBoundaryWidth; + xpart(a.w - storageBoundary.w) = unit*storageBoundaryWidth; + xpart(storageBoundary.e - pt.e) = unit*storageBoundaryWidth; + + %% Create the Computer Configuration subfigure. + numeric configurationBoundaryWidth; configurationBoundaryWidth = storageBoundaryWidth; + % gap between computer configuration box and its internals + + for t = 0 upto 2: + circleit.c[t](); + c[t].n - c[t].s = 1.3(0,ypart(g[0].ne - g[3].sw)); + c[t].e - c[t].w = 1.5(xpart(g[0].ne - g[3].sw),0); + endfor + c[2].c - c[1].c = c[1].c - c[0].c = g[1].c - g[0].c; + + boxit.configurationBoundary(); + ypart(configurationBoundary.n - configurationBoundary.s) = + ypart(storageBoundary.n - storageBoundary.s); + xpart(configurationBoundary.e - configurationBoundary.w) = + xpart(c[2].e - c[0].w)+2*unit*configurationBoundaryWidth; + configurationBoundary.c = c[1].c; + + %% Create the Computation Configuration subfigure. + % Create the patches. + for t = 0 upto 5: + boxit.L[t](); + boxit.G[t](); + L[t].ne - L[t].sw = unit*(patchWidth, patchHeight); + G[t].ne - L[t].ne = -(G[t].sw - L[t].sw) = unit*guardWidth*(1,1); + endfor + for t = 0 upto 2: + if t < 2: + G[t+1].nw - G[t].ne = unit*(xPatchDistance,0); + fi + G[t].sw - G[t+3].nw = unit*(0,yPatchDistance); + endfor; + + % Create the contexts. + for t = 0 upto 2: + circleit.C[t](); + C[t].n - C[t].s = 1.3(0,ypart(G[0].ne - G[3].sw)); + C[t].e - C[t].w = 1.5(xpart(G[0].ne - G[3].sw),0); + endfor + C[2].c - C[1].c = C[1].c - C[0].c = G[1].c - G[0].c; + C[0].c = 0.5[G[0].c,G[3].c]; + + %% Relate the subfigures. + numeric containerConfigurationDistance; + containerConfigurationDistance = arrayPartitionDistance; + % distance between container storage and computer configuration subfigures + numeric containerComputationDistance; containerComputationDistance = arrayLayoutDistance; + % distance between container storage subfigure and computation configuration subfigure + numeric arrowComputationDistance; arrowComputationDistance = arrowLayoutDistance; + % distance between arrow and top of computation configuration, not its label + + boxit.figurePlus(btex + etex); + boxit.figureArrow(btex $\Big\Downarrow$ etex); + + configurationBoundary.w - storageBoundary.e = + unit*(containerConfigurationDistance,0); %% HERE + figurePlus.c = 0.5[configurationBoundary.w, storageBoundary.e]; + figureArrow.c = (xpart(0.5[configurationBoundary.e,storageBoundary.w]), + ypart(C[1].n) + unit*arrowComputationDistance); + + 0.5[configurationBoundary.se,storageBoundary.sw] - C[1].n = + unit*(0,containerComputationDistance); + + %% Draw the Container Domain Partitioning structures. 
+ drawboxed(a); label.top(btex \begin{tabular}{c} container's\\ domain \end{tabular} etex, a.n); + for t = 0 upto 5: + drawboxed(p[t]); + endfor + drawunboxed(pt); + label.top(btex partition etex, p[1].n); + for t = 0 upto 5: + drawboxed(l[t],g[t]); + endfor + label.top(btex patches etex, g[1].n); + z0 = g[2].e + unit*(1,0); + drawarrow z0 -- (g[2].e+(iota,0)); + label.rt(btex \begin{tabular}{l} patch with\\guard cells \end{tabular} etex, z0); + drawunboxed(containerPlus,containerArrow); + drawboxed(storageBoundary); + label.top(btex Partition Container's Domain etex, storageBoundary.n); + + %% Draw the Computer Configuration structures. + for t = 0 upto 2: + drawboxed(c[t]); + endfor + label.top(btex contexts etex, c[1].n); + label.bot(btex \begin{tabular}{c} Each context has memory and\\ processors to execute a program. \end{tabular} etex, c[1].s); + drawboxed(configurationBoundary); + label.top(btex Computer Configuration etex, configurationBoundary.n); + + %% Draw the Computer Computation structures. + for t = 0 upto 5: + drawboxed(L[t],G[t]); + endfor + for t = 0 upto 2: + drawboxed(C[t]); + endfor + label.top(btex Layout etex, C[1].n); + label.bot(btex Each context can contain several patches. etex, C[1].s); + + %% Draw the subfigure relations structures. + drawunboxed(figurePlus,figureArrow); + label.rt(btex DistributedTag etex, figureArrow.e); + endfig; + + bye Index: docs/manual/figures/doof2d.mp =================================================================== RCS file: doof2d.mp diff -N doof2d.mp *** /dev/null Fri Mar 23 21:37:44 2001 --- doof2d.mp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,257 ---- + %% Oldham, Jeffrey D. + %% 2001Nov26 + %% Pooma + + %% Illustrations for the Tutorial Chapter (Chapter 2) + + %% Assumes TEX=latex. + + verbatimtex + \documentclass[10pt]{article} + \begin{document} + etex + + % Draw a set of grid cells. + vardef drawGrid(expr nuCells, unit, llCorner) = + for i = 0 upto nuCells-1: + for j = 0 upto nuCells-1: + draw unitsquare scaled unit shifted (llCorner + unit*(i,j)); + endfor + endfor + enddef; + + % Label the specified grid, grid cell, or its edge. + % Place a value at the center of a grid cell. + vardef labelCell(expr lbl, xy, llCorner) = + label(lbl, llCorner + unit*(xy + 0.5*(1,1))); + enddef; + + % Label the bottom of a grid cell. + vardef labelCellBottom(expr lbl, xy, llCorner) = + label.bot(lbl, llCorner + unit*(xy + 0.5*(1,0))); + enddef; + + % Label the left side of a grid cell. + vardef labelCellLeft(expr lbl, xy, llCorner) = + label.lft(lbl, llCorner + unit*(xy + 0.5*(0,1))); + enddef; + + % Label the top of a grid. + vardef labelGrid(expr lbl, nuCells, llCorner) = + label.top(lbl, llCorner + unit*(nuCells/2,nuCells)); + enddef; + + %% Global Declarations + numeric unit; unit = 0.9cm; % width or height of an individual grid cell + + + %% Initial Configuration. + beginfig(201) + numeric nuCells; nuCells = 7; % number of cells in each dimension + % This number should be odd. + % Draw the grid cells. + drawGrid(nuCells, unit, origin); + + % Label the grid cells' values. + for i = 0 upto nuCells-1: + for j = 0 upto nuCells-1: + if ((i = nuCells div 2) and (j = nuCells div 2)): + labelCell(btex \footnotesize 1000.0 etex, (i,j), origin); + else: + labelCell(btex \footnotesize 0.0 etex, (i,j), origin); + fi + endfor + endfor + + % Label the grid. + labelGrid(btex Array \texttt{b}: Initial Configuration etex, nuCells, origin); + endfig; + + + %% After the first averaging. 
+ beginfig(202) + numeric nuCells; nuCells = 7; % number of cells in each dimension + % This number should be odd. + % Draw the grid cells. + drawGrid(nuCells, unit, origin); + + % Label the grid cells' values. + for i = 0, 1, nuCells-2, nuCells-1: + for j = 0 upto nuCells-1: + labelCell(btex \footnotesize 0.0 etex, (i,j), origin); + endfor + endfor + for j = 0, 1, nuCells-2, nuCells-1: + for i = 0 upto nuCells-1: + labelCell(btex \footnotesize 0.0 etex, (i,j), origin); + endfor + endfor + for i = (nuCells div 2)-1 upto (nuCells div 2)+1: + for j = (nuCells div 2)-1 upto (nuCells div 2)+1: + labelCell(btex \footnotesize 111.1 etex, (i, j), origin); + endfor + endfor + + % Label the grid. + labelGrid(btex Array \texttt{a}: After the first averaging etex, nuCells, origin); + endfig; + + + %% After the second averaging. + beginfig(203) + numeric nuCells; nuCells = 7; % number of cells in each dimension + % This number should be odd. + % Draw the grid cells. + drawGrid(nuCells, unit, origin); + + % Label the grid cells' values. + for i = 0, nuCells-1: + for j = 0 upto nuCells-1: + labelCell(btex \footnotesize 0.0 etex, (i,j), origin); + endfor + endfor + for j = 0, nuCells-1: + for i = 0 upto nuCells-1: + labelCell(btex \footnotesize 0.0 etex, (i,j), origin); + endfor + endfor + labelCell(btex \footnotesize 111.1 etex, (3,3), origin); + for t = (3,2), (4,3), (3,4), (2,3): + labelCell(btex \footnotesize 74.1 etex, t, origin); + endfor + for t = (2,2), (2,4), (4,4), (4,2): + labelCell(btex \footnotesize 49.4 etex, t, origin); + endfor + for t = (3,1), (5,3), (3,5), (1,3): + labelCell(btex \footnotesize 37.0 etex, t, origin); + endfor + for t = (1,2), (2,1), (4,1), (5,2), (5,4), (4,5), (2,5), (1,4): + labelCell(btex \footnotesize 24.7 etex, t, origin); + endfor + for t = (1,1), (5,1), (5,5), (1,5): + labelCell(btex \footnotesize 12.3 etex, t, origin); + endfor + + % Label the grid. + labelGrid(btex Array \texttt{b}: After the second averaging etex, nuCells, origin); + endfig; + + + %% Illustrate addition of arrays. + beginfig(210) + numeric nuCells; nuCells = 3; % number of cells in each dimension + % This number should be odd. + numeric operatorWidth; operatorWidth = 1.5; + % horizontal space for an operator as + % a multiple of "unit" + + %% Determine the locations of the arrays. + z0 = origin; + z1 = z0 + unit * (nuCells+operatorWidth,0); + z2 - z1 = z1 - z0; + + %% Draw the grid cells and the operators. + for t = 0 upto 2: + drawGrid(nuCells, unit, z[t]); + endfor + label(btex = etex, z1 + unit*(-0.9operatorWidth, 0.5nuCells)); + label(btex + etex, z2 + unit*(-0.9operatorWidth, 0.5nuCells)); + + %% Label the grid cells' values. + % Label b(I,J) grid values. + labelCell(btex \normalsize 9 etex, (0,0), z1); + labelCell(btex \normalsize 11 etex, (1,0), z1); + labelCell(btex \normalsize 13 etex, (2,0), z1); + labelCell(btex \normalsize 17 etex, (0,1), z1); + labelCell(btex \normalsize 19 etex, (1,1), z1); + labelCell(btex \normalsize 21 etex, (2,1), z1); + labelCell(btex \normalsize 25 etex, (0,2), z1); + labelCell(btex \normalsize 27 etex, (1,2), z1); + labelCell(btex \normalsize 29 etex, (2,2), z1); + % Label b(I+1,J-1) grid values. 
+ labelCell(btex \normalsize 3 etex, (0,0), z2); + labelCell(btex \normalsize 5 etex, (1,0), z2); + labelCell(btex \normalsize 7 etex, (2,0), z2); + labelCell(btex \normalsize 11 etex, (0,1), z2); + labelCell(btex \normalsize 13 etex, (1,1), z2); + labelCell(btex \normalsize 15 etex, (2,1), z2); + labelCell(btex \normalsize 19 etex, (0,2), z2); + labelCell(btex \normalsize 21 etex, (1,2), z2); + labelCell(btex \normalsize 23 etex, (2,2), z2); + % Label b(I,J)+b(I+1,J-1) grid values. + labelCell(btex \normalsize 12 etex, (0,0), z0); + labelCell(btex \normalsize 16 etex, (1,0), z0); + labelCell(btex \normalsize 20 etex, (2,0), z0); + labelCell(btex \normalsize 28 etex, (0,1), z0); + labelCell(btex \normalsize 32 etex, (1,1), z0); + labelCell(btex \normalsize 36 etex, (2,1), z0); + labelCell(btex \normalsize 34 etex, (0,2), z0); + labelCell(btex \normalsize 38 etex, (1,2), z0); + labelCell(btex \normalsize 42 etex, (2,2), z0); + + %% Label the indices. + % Label b(I,J) grid indices. + labelCellBottom(btex \footnotesize 1 etex, (0,0), z1); + labelCellBottom(btex \footnotesize 2 etex, (1,0), z1); + labelCellBottom(btex \footnotesize 3 etex, (2,0), z1); + labelCellLeft(btex \footnotesize 1 etex, (0,0), z1); + labelCellLeft(btex \footnotesize 2 etex, (0,1), z1); + labelCellLeft(btex \footnotesize 3 etex, (0,2), z1); + % Label b(I+1,J-1) grid indices. + labelCellBottom(btex \footnotesize 2 etex, (0,0), z2); + labelCellBottom(btex \footnotesize 3 etex, (1,0), z2); + labelCellBottom(btex \footnotesize 4 etex, (2,0), z2); + labelCellLeft(btex \footnotesize 0 etex, (0,0), z2); + labelCellLeft(btex \footnotesize 1 etex, (0,1), z2); + labelCellLeft(btex \footnotesize 2 etex, (0,2), z2); + + %% Label the grids. + labelGrid(btex $b(I,J)+b(I+1,J-1)$ etex, nuCells, z0); + labelGrid(btex $b(I,J)$ etex, nuCells, z1); + labelGrid(btex $b(I+1,J-1)$ etex, nuCells, z2); + endfig; + + + %% Illustrate application of a stencil. + beginfig(211) + numeric nuCells; nuCells = 5; % number of cells in each dimension + numeric nuStencilCells; nuStencilCells = 3; + % number of stencil cells in each dimension + numeric stencilMultiple; stencilMultiple = 0.1; + % small multiple to make it visible + + % Draw the grid cells. + drawGrid(nuCells, unit, origin); + + % Draw the stencil. + draw unitsquare scaled ((nuStencilCells-2stencilMultiple) * unit) shifted (unit*(stencilMultiple*(1,1)+(0,2))); + draw (unitsquare scaled ((1-stencilMultiple) * unit) shifted (unit*(0.5*stencilMultiple*(1,1)+(1,3)))) dashed evenly; + + % Label the extents. + picture lbl; + ahlength := 0.4unit; + drawarrow unit*(2,4) -- unit*(3,5); + lbl = thelabel.lrt(btex \scriptsize upperExtent etex, unit*0.5[(2,4),(3,5)]); + unfill bbox lbl; draw lbl; + drawarrow unit*(1,3) -- unit*(0,2); + lbl := thelabel.lrt(btex \scriptsize lowerExtent etex, unit*0.5[(1,3),(0,2)]); + unfill bbox lbl; draw lbl; + + % Label the indices. + labelCellBottom(btex \footnotesize 0 etex, (0,0), origin); + labelCellBottom(btex \footnotesize 1 etex, (1,0), origin); + labelCellBottom(btex \footnotesize 2 etex, (2,0), origin); + labelCellBottom(btex \footnotesize 3 etex, (3,0), origin); + labelCellBottom(btex \footnotesize 4 etex, (4,0), origin); + labelCellLeft(btex \footnotesize 0 etex, (0,0), origin); + labelCellLeft(btex \footnotesize 1 etex, (0,1), origin); + labelCellLeft(btex \footnotesize 2 etex, (0,2), origin); + labelCellLeft(btex \footnotesize 3 etex, (0,3), origin); + labelCellLeft(btex \footnotesize 4 etex, (0,4), origin); + + % Label the grid. 
+ labelGrid(btex Applying a Stencil to Position (1,3) etex, nuCells, origin); + + endfig; + + bye Index: docs/manual/figures/makefile =================================================================== RCS file: makefile diff -N makefile *** /dev/null Fri Mar 23 21:37:44 2001 --- makefile Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,69 ---- + ### Oldham, Jeffrey D. + ### 1997 Dec 26 + ### misc + ### + ### LaTeX -> PostScript/PDF/WWW + ### XML -> TeX/DVI/PS/PDF + + # Definitions for PostScript and WWW Creation + TEX= latex + WWWHOMEDIR= /u/oldham/www + LATEX2HTML= latex2html + BASICLATEX2HTMLOPTIONS= -info "" -no_footnode -no_math -html_version 3.2,math + #LATEX2HTMLOPTIONS= -local_icons -split +1 $(BASICLATEX2HTMLOPTIONS) + LATEX2HTMLOPTIONS= -no_navigation -split 0 $(BASICLATEX2HTMLOPTIONS) + MPOST= mpost + + # Definitions for Jade. + JADEDIR= /usr/lib/sgml/stylesheets/docbook + PRINTDOCBOOKDSL= print/docbook.dsl + HTMLDOCBOOKDSL= html/docbook.dsl + XML= dtds/decls/xml.dcl + INDEXOPTIONS= -t 'Index' -i 'index' -g -p + + CXXFLAGS= -g -Wall -pedantic -W -Wstrict-prototypes -Wpointer-arith -Wbad-function-cast -Wcast-align -Wconversion -Wnested-externs -Wundef -Winline -static + + all: outline.ps + + %.all: %.ps %.pdf %.html + chmod 644 $*.ps $*.pdf + mv $*.ps $*.pdf $* + + %.dvi: %.ltx + $(TEX) $< + # bibtex $* + # $(TEX) $< + $(TEX) $< + + %.ps: %.dvi + dvips -t letter $< -o + + %.pdf.ltx: %.ltx + sed -e 's/^%\\usepackage{times}/\\usepackage{times}/' $< > $@ + + %.pdf: %.pdf.ps + ps2pdf $< $@ + + # This rule assumes index creation. + %.dvi: %.xml genindex.sgm + jade -D$(JADEDIR) -t sgml -d $(HTMLDOCBOOKDSL) -V html-index $(XML) $< + perl collateindex.pl $(INDEXOPTIONS) -o genindex.sgm HTML.index + jade -D$(JADEDIR) -t tex -d $(PRINTDOCBOOKDSL) $(XML) $< && jadetex $*.tex && jadetex $*.tex && jadetex $*.tex + + genindex.sgm: + perl collateindex.pl $(INDEXOPTIONS) -N -o $@ + + %.html: %.xml + jade -D$(JADEDIR) -t sgml -d $(HTMLDOCBOOKDSL) $(XML) $< + + %.pdf: %.xml + jade -D$(JADEDIR) -t tex -d $(PRINTDOCBOOKDSL) $(XML) $< && pdfjadetex $*.tex && pdfjadetex $*.tex + + mproof-%.ps: %.mp + declare -x TEX=latex && $(MPOST) $< && tex mproof.tex $*.[0-9]* && dvips mproof.dvi -o $@ + + %.txt: %.ltx + detex $< > $@ + + clean: + rm -f *.dvi *.aux *.log *.toc *.bak *.blg *.bbl *.glo *.idx *.lof *.lot *.htm *.mpx mpxerr.tex HTML.index outline.tex Index: docs/manual/programs/Doof2d-Array-distributed-annotated.patch =================================================================== RCS file: Doof2d-Array-distributed-annotated.patch diff -N Doof2d-Array-distributed-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-distributed-annotated.patch Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,162 ---- + *** Doof2d-Array-distributed.cpp Wed Nov 28 07:46:56 2001 + --- Doof2d-Array-distributed-annotated.cpp Wed Nov 28 07:53:31 2001 + *************** + *** 1,4 **** + ! #include // has std::cout, ... + ! #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + --- 1,5 ---- + ! + ! #include <iostream> // has std::cout, ... + ! #include <stdlib.h> // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + *************** + *** 15,19 **** + // (i,j). The "C" template parameter permits use of this stencil + // operator with both Arrays and Fields. + ! template + inline + typename C::Element_t + --- 16,20 ---- + // (i,j). The "C" template parameter permits use of this stencil + // operator with both Arrays and Fields. + ! 
template <class C> + inline + typename C::Element_t + *************** + *** 40,52 **** + Pooma::initialize(argc,argv); + + ! // Ask the user for the number of processors. + long nuProcessors; + ! std::cout << "Please enter the number of processors: "; + ! std::cin >> nuProcessors; + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + --- 41,53 ---- + Pooma::initialize(argc,argv); + + ! // Ask the user for the number of processors. + long nuProcessors; + ! std::cout << "Please enter the number of processors: "; + ! std::cin >> nuProcessors; + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + *************** + *** 54,67 **** + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<2> interiorDomain(I,I); + + // Create the distributed arrays. + --- 55,68 ---- + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<2> interiorDomain(I,I); + + // Create the distributed arrays. + *************** + *** 70,85 **** + // dimension. Guard layers optimize communication between patches. + // Internal guards surround each patch. External guards surround + ! // the entire array domain. + ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors), + ! GuardLayers<2>(1), // internal + ! GuardLayers<2>(0)); // external + ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag()); + + // The template parameters indicate 2 dimensions and a 'double' + // element type. MultiPatch indicates multiple computation patches, + // i.e., distributed computation. The UniformTag indicates the + ! // patches should have the same size. Each patch has Brick type. + ! Array<2, double, MultiPatch > > a(layout); + ! Array<2, double, MultiPatch > > b(layout); + + // Set up the initial conditions. + --- 71,86 ---- + // dimension. Guard layers optimize communication between patches. + // Internal guards surround each patch. External guards surround + ! // the entire array domain. + ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors), + ! GuardLayers<2>(1), // internal + ! GuardLayers<2>(0)); // external + ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag()); + + // The template parameters indicate 2 dimensions and a 'double' + // element type. MultiPatch indicates multiple computation patches, + // i.e., distributed computation. The UniformTag indicates the + ! // patches should have the same size. Each patch has Brick type. + ! Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout); + ! Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout); + + // Set up the initial conditions. 
+ *************** + *** 89,97 **** + + // Create the stencil performing the computation. + ! Stencil stencil; + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(interiorDomain) = stencil(b, interiorDomain); + + --- 90,98 ---- + + // Create the stencil performing the computation. + ! Stencil<DoofNinePt> stencil; + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(interiorDomain) = stencil(b, interiorDomain); + + *************** + *** 102,106 **** + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + --- 103,107 ---- + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + *************** + *** 110,111 **** + --- 111,113 ---- + return EXIT_SUCCESS; + } + + Index: docs/manual/programs/Doof2d-Array-element-annotated.patch =================================================================== RCS file: Doof2d-Array-element-annotated.patch diff -N Doof2d-Array-element-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-element-annotated.patch Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,144 ---- + *** Doof2d-Array-element.cpp Tue Nov 27 11:04:04 2001 + --- Doof2d-Array-element-annotated.cpp Tue Nov 27 12:06:32 2001 + *************** + *** 1,5 **** + ! #include // has std::cout, ... + ! #include // has EXIT_SUCCESS + ! #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, element-wise implementation + --- 1,6 ---- + ! + ! #include <iostream> // has std::cout, ... + ! #include <stdlib.h> // has EXIT_SUCCESS + ! #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, element-wise implementation + *************** + *** 7,17 **** + int main(int argc, char *argv[]) + { + ! // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + --- 8,18 ---- + int main(int argc, char *argv[]) + { + ! // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + *************** + *** 19,37 **** + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + ! // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + ! // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + ! Array<2, double, Brick> a(vertDomain); + ! Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + ! // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + --- 20,38 ---- + // the grid. + long n; + ! 
std::cout << "Please enter the array size: "; + ! std::cin >> n; + + ! // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + ! // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + ! Array<2, double, Brick> a(vertDomain); + ! Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + ! // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + *************** + *** 41,49 **** + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + ! a(i,j) = weight * + (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + + --- 42,50 ---- + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + ! a(i,j) = weight * + (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + + *************** + *** 51,56 **** + + // Read from a. Write to b. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + b(i,j) = weight * + (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + + --- 52,57 ---- + + // Read from a. Write to b. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + b(i,j) = weight * + (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + + *************** + *** 60,70 **** + + // Print out the final central value. + ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + ! // The arrays are automatically deallocated. + + ! // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } + --- 61,72 ---- + + // Print out the final central value. + ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + ! // The arrays are automatically deallocated. + + ! // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } + + Index: docs/manual/programs/Doof2d-Array-parallel-annotated.patch =================================================================== RCS file: Doof2d-Array-parallel-annotated.patch diff -N Doof2d-Array-parallel-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-parallel-annotated.patch Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,106 ---- + *** Doof2d-Array-parallel.cpp Tue Nov 27 13:00:09 2001 + --- Doof2d-Array-parallel-annotated.cpp Tue Nov 27 14:07:07 2001 + *************** + *** 1,4 **** + ! #include // has std::cout, ... + ! #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + --- 1,5 ---- + ! + ! #include <iostream> // has std::cout, ... + ! #include <stdlib.h> // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + *************** + *** 12,17 **** + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + --- 13,18 ---- + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! 
std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + *************** + *** 19,38 **** + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + ! // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<1> J(1,n-2); + + // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + ! Array<2, double, Brick> a(vertDomain); + ! Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + --- 20,39 ---- + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + ! // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<1> J(1,n-2); + + // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + ! Array<2, double, Brick> a(vertDomain); + ! Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + *************** + *** 45,50 **** + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + --- 46,51 ---- + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + *************** + *** 61,65 **** + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + --- 62,66 ---- + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + *************** + *** 69,70 **** + --- 70,72 ---- + return EXIT_SUCCESS; + } + + Index: docs/manual/programs/Doof2d-Array-stencil-annotated.patch =================================================================== RCS file: Doof2d-Array-stencil-annotated.patch diff -N Doof2d-Array-stencil-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-stencil-annotated.patch Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,152 ---- + *** Doof2d-Array-stencil.cpp Tue Nov 27 17:23:41 2001 + --- Doof2d-Array-stencil-annotated.cpp Tue Nov 27 17:36:13 2001 + *************** + *** 1,9 **** + ! #include // has std::cout, ... + ! #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, stencil implementation + + ! // Define the stencil class performing the computation. + class DoofNinePt + { + --- 1,10 ---- + ! + ! #include <iostream> // has std::cout, ... + ! #include <stdlib.h> // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, stencil implementation + + ! // Define the stencil class performing the computation. + class DoofNinePt + { + *************** + *** 14,19 **** + // This stencil operator is applied to each interior domain position + // (i,j). 
The "C" template parameter permits use of this stencil + ! // operator with both Arrays and Fields. + ! template + inline + typename C::Element_t + --- 15,20 ---- + // This stencil operator is applied to each interior domain position + // (i,j). The "C" template parameter permits use of this stencil + ! // operator with both Arrays and Fields. + ! template <class C> + inline + typename C::Element_t + *************** + *** 26,30 **** + } + + ! inline int lowerExtent(int) const { return 1; } + inline int upperExtent(int) const { return 1; } + + --- 27,31 ---- + } + + ! inline int lowerExtent(int) const { return 1; } + inline int upperExtent(int) const { return 1; } + + *************** + *** 42,47 **** + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + --- 43,48 ---- + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + *************** + *** 49,68 **** + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<2> interiorDomain(I,I); + + // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + ! Array<2, double, Brick> a(vertDomain); + ! Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + --- 50,69 ---- + // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<2> interiorDomain(I,I); + + // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + ! Array<2, double, Brick> a(vertDomain); + ! Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + *************** + *** 71,80 **** + b(n/2,n/2) = 1000.0; + + ! // Create the stencil performing the computation. + ! Stencil stencil; + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(interiorDomain) = stencil(b, interiorDomain); + + --- 72,81 ---- + b(n/2,n/2) = 1000.0; + + ! // Create the stencil performing the computation. + ! Stencil<DoofNinePt> stencil; + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(interiorDomain) = stencil(b, interiorDomain); + + *************** + *** 85,89 **** + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + --- 86,90 ---- + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? 
a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + *************** + *** 93,94 **** + --- 94,96 ---- + return EXIT_SUCCESS; + } + + Index: docs/manual/programs/Doof2d-C-element-annotated.patch =================================================================== RCS file: Doof2d-C-element-annotated.patch diff -N Doof2d-C-element-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-C-element-annotated.patch Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,150 ---- + *** Doof2d-C-element.cpp Tue Nov 27 08:36:38 2001 + --- Doof2d-C-element-annotated.cpp Tue Nov 27 12:08:03 2001 + *************** + *** 1,4 **** + ! #include // has std::cout, ... + ! #include // has EXIT_SUCCESS + + // Doof2d: C-like, element-wise implementation + --- 1,5 ---- + ! + ! #include <iostream> // has std::cout, ... + ! #include <stdlib.h> // has EXIT_SUCCESS + + // Doof2d: C-like, element-wise implementation + *************** + *** 6,30 **** + int main() + { + ! // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + ! // Use two-dimensional grids of values. + double **a; + double **b; + + // Ask the user for the number n of elements along one dimension of + ! // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + ! // Allocate the arrays. + typedef double* doublePtr; + a = new doublePtr[n]; + b = new doublePtr[n]; + ! for (int i = 0; i < n; i++) { + a[i] = new double[n]; + b[i] = new double[n]; + --- 7,31 ---- + int main() + { + ! // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + ! // Use two-dimensional grids of values. + double **a; + double **b; + + // Ask the user for the number n of elements along one dimension of + ! // the grid. + long n; + ! std::cout << "Please enter the array size: "; + ! std::cin >> n; + + ! // Allocate the arrays. + typedef double* doublePtr; + a = new doublePtr[n]; + b = new doublePtr[n]; + ! for (int i = 0; i < n; i++) { + a[i] = new double[n]; + b[i] = new double[n]; + *************** + *** 32,49 **** + + // Set up the initial conditions. + ! // All grid values should be zero except for the central value. + ! for (int j = 0; j < n; j++) + ! for (int i = 0; i < n; i++) + a[i][j] = b[i][j] = 0.0; + b[n/2][n/2] = 1000.0; + + ! // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + a[i][j] = weight * + (b[i+1][j+1] + b[i+1][j ] + b[i+1][j-1] + + --- 33,50 ---- + + // Set up the initial conditions. + ! // All grid values should be zero except for the central value. + ! for (int j = 0; j < n; j++) + ! for (int i = 0; i < n; i++) + a[i][j] = b[i][j] = 0.0; + b[n/2][n/2] = 1000.0; + + ! // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + ! for (int j = 1; j < n-1; j++) + ! 
for (int i = 1; i < n-1; i++) + a[i][j] = weight * + (b[i+1][j+1] + b[i+1][j ] + b[i+1][j-1] + + *************** + *** 51,57 **** + b[i-1][j+1] + b[i-1][j ] + b[i-1][j-1]); + + ! // Read from a. Write to b. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + b[i][j] = weight * + (a[i+1][j+1] + a[i+1][j ] + a[i+1][j-1] + + --- 52,58 ---- + b[i-1][j+1] + b[i-1][j ] + b[i-1][j-1]); + + ! // Read from a. Write to b. + ! for (int j = 1; j < n-1; j++) + ! for (int i = 1; i < n-1; i++) + b[i][j] = weight * + (a[i+1][j+1] + a[i+1][j ] + a[i+1][j-1] + + *************** + *** 60,68 **** + } + + ! // Print out the final central value. + ! std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl; + + ! // Deallocate the arrays. + ! for (int i = 0; i < n; i++) { + delete [] a[i]; + delete [] b[i]; + --- 61,69 ---- + } + + ! // Print out the final central value. + ! std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl; + + ! // Deallocate the arrays. + ! for (int i = 0; i < n; i++) { + delete [] a[i]; + delete [] b[i]; + *************** + *** 73,74 **** + --- 74,76 ---- + return EXIT_SUCCESS; + } + + Index: docs/manual/programs/makefile =================================================================== RCS file: makefile diff -N makefile *** /dev/null Fri Mar 23 21:37:44 2001 --- makefile Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,12 ---- + ### Oldham, Jeffrey D. + ### 2001Nov27 + ### Pooma + ### + ### Produce Annotated Source Code + + all: Doof2d-C-element-annotated.cpp Doof2d-Array-element-annotated.cpp \ + Doof2d-Array-parallel-annotated.cpp Doof2d-Array-stencil-annotated.cpp \ + Doof2d-Array-distributed-annotated.cpp + + %-annotated.cpp: %-annotated.patch %.cpp + patch -o $@ < $< Index: examples/Manual/Doof2d/Doof2d-Array-distributed.cpp =================================================================== RCS file: Doof2d-Array-distributed.cpp diff -N Doof2d-Array-distributed.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-distributed.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,111 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, stencil, multiple processor implementation + + // Define the stencil class performing the computation. + class DoofNinePt + { + public: + // Initialize the constant average weighting. + DoofNinePt() : weight(1.0/9.0) {} + + // This stencil operator is applied to each interior domain position + // (i,j). The "C" template parameter permits use of this stencil + // operator with both Arrays and Fields. + template + inline + typename C::Element_t + operator()(const C& x, int i, int j) const + { + return ( weight * + ( x.read(i+1,j+1) + x.read(i+1,j ) + x.read(i+1,j-1) + + x.read(i ,j+1) + x.read(i ,j ) + x.read(i ,j-1) + + x.read(i-1,j+1) + x.read(i-1,j ) + x.read(i-1,j-1) ) ); + } + + inline int lowerExtent(int) const { return 1; } + inline int upperExtent(int) const { return 1; } + + private: + + // In the average, weight element with this value. + const double weight; + }; + + int main(int argc, char *argv[]) + { + // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // Ask the user for the number of processors. + long nuProcessors; + std::cout << "Please enter the number of processors: "; + std::cin >> nuProcessors; + + // Ask the user for the number of averagings. 
+ long nuAveragings, nuIterations; + std::cout << "Please enter the number of averagings: "; + std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + Interval<1> I(1,n-2); + Interval<2> interiorDomain(I,I); + + // Create the distributed arrays. + // Partition the arrays' domains uniformly, i.e., each patch has the + // same size. The first parameter tells how many patches for each + // dimension. Guard layers optimize communication between patches. + // Internal guards surround each patch. External guards surround + // the entire array domain. + UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors), + GuardLayers<2>(1), // internal + GuardLayers<2>(0)); // external + UniformGridLayout<2> layout(vertDomain, partition, DistributedTag()); + + // The template parameters indicate 2 dimensions and a 'double' + // element type. MultiPatch indicates multiple computation patches, + // i.e., distributed computation. The UniformTag indicates the + // patches should have the same size. Each patch has Brick type. + Array<2, double, MultiPatch > > a(layout); + Array<2, double, MultiPatch > > b(layout); + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + + // Create the stencil performing the computation. + Stencil stencil; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(interiorDomain) = stencil(b, interiorDomain); + + // Read from a. Write to b. + b(interiorDomain) = stencil(a, interiorDomain); + } + + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + + // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/Doof2d-Array-element.cpp =================================================================== RCS file: Doof2d-Array-element.cpp diff -N Doof2d-Array-element.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-element.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,70 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, element-wise implementation + + int main(int argc, char *argv[]) + { + // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + std::cout << "Please enter the number of averagings: "; + std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + + // Create the arrays. 
+ // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + Array<2, double, Brick> a(vertDomain); + Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + + // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + for (int j = 1; j < n-1; j++) + for (int i = 1; i < n-1; i++) + a(i,j) = weight * + (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + + b(i-1,j+1) + b(i-1,j ) + b(i-1,j-1)); + + // Read from a. Write to b. + for (int j = 1; j < n-1; j++) + for (int i = 1; i < n-1; i++) + b(i,j) = weight * + (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + + a(i ,j+1) + a(i ,j ) + a(i ,j-1) + + a(i-1,j+1) + a(i-1,j ) + a(i-1,j-1)); + } + + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + + // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/Doof2d-Array-parallel.cpp =================================================================== RCS file: Doof2d-Array-parallel.cpp diff -N Doof2d-Array-parallel.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-parallel.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,70 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, data-parallel implementation + + int main(int argc, char *argv[]) + { + // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + std::cout << "Please enter the number of averagings: "; + std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + Interval<1> I(1,n-2); + Interval<1> J(1,n-2); + + // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + Array<2, double, Brick> a(vertDomain); + Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + + // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + b(I ,J+1) + b(I ,J ) + b(I ,J-1) + + b(I-1,J+1) + b(I-1,J ) + b(I-1,J-1)); + + // Read from a. Write to b. + b(I,J) = weight * + (a(I+1,J+1) + a(I+1,J ) + a(I+1,J-1) + + a(I ,J+1) + a(I ,J ) + a(I ,J-1) + + a(I-1,J+1) + a(I-1,J ) + a(I-1,J-1)); + } + + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. 
+ std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + + // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/Doof2d-Array-stencil.cpp =================================================================== RCS file: Doof2d-Array-stencil.cpp diff -N Doof2d-Array-stencil.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Array-stencil.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,94 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + #include "Pooma/Arrays.h" // has Pooma's Array + + // Doof2d: Pooma Arrays, stencil implementation + + // Define the stencil class performing the computation. + class DoofNinePt + { + public: + // Initialize the constant average weighting. + DoofNinePt() : weight(1.0/9.0) {} + + // This stencil operator is applied to each interior domain position + // (i,j). The "C" template parameter permits use of this stencil + // operator with both Arrays and Fields. + template + inline + typename C::Element_t + operator()(const C& c, int i, int j) const + { + return ( weight * + ( c.read(i+1,j+1) + c.read(i+1,j ) + c.read(i+1,j-1) + + c.read(i ,j+1) + c.read(i ,j ) + c.read(i ,j-1) + + c.read(i-1,j+1) + c.read(i-1,j ) + c.read(i-1,j-1) ) ); + } + + inline int lowerExtent(int) const { return 1; } + inline int upperExtent(int) const { return 1; } + + private: + + // In the average, weight element with this value. + const double weight; + }; + + int main(int argc, char *argv[]) + { + // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + std::cout << "Please enter the number of averagings: "; + std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Specify the arrays' domains [0,n) x [0,n). + Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + Interval<1> I(1,n-2); + Interval<2> interiorDomain(I,I); + + // Create the arrays. + // The template parameters indicate 2 dimensions, a 'double' element + // type, and ordinary 'Brick' storage. + Array<2, double, Brick> a(vertDomain); + Array<2, double, Brick> b(vertDomain); + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + + // Create the stencil performing the computation. + Stencil stencil; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(interiorDomain) = stencil(b, interiorDomain); + + // Read from a. Write to b. + b(interiorDomain) = stencil(a, interiorDomain); + } + + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The arrays are automatically deallocated. + + // Tell the Pooma library execution has finished. 
+ Pooma::finalize(); + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/Doof2d-C-element.cpp =================================================================== RCS file: Doof2d-C-element.cpp diff -N Doof2d-C-element.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-C-element.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,74 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + + // Doof2d: C-like, element-wise implementation + + int main() + { + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + std::cout << "Please enter the number of averagings: "; + std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + // Use two-dimensional grids of values. + double **a; + double **b; + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Allocate the arrays. + typedef double* doublePtr; + a = new doublePtr[n]; + b = new doublePtr[n]; + for (int i = 0; i < n; i++) { + a[i] = new double[n]; + b[i] = new double[n]; + } + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + for (int j = 0; j < n; j++) + for (int i = 0; i < n; i++) + a[i][j] = b[i][j] = 0.0; + b[n/2][n/2] = 1000.0; + + // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + for (int j = 1; j < n-1; j++) + for (int i = 1; i < n-1; i++) + a[i][j] = weight * + (b[i+1][j+1] + b[i+1][j ] + b[i+1][j-1] + + b[i ][j+1] + b[i ][j ] + b[i ][j-1] + + b[i-1][j+1] + b[i-1][j ] + b[i-1][j-1]); + + // Read from a. Write to b. + for (int j = 1; j < n-1; j++) + for (int i = 1; i < n-1; i++) + b[i][j] = weight * + (a[i+1][j+1] + a[i+1][j ] + a[i+1][j-1] + + a[i ][j+1] + a[i ][j ] + a[i ][j-1] + + a[i-1][j+1] + a[i-1][j ] + a[i-1][j-1]); + } + + // Print out the final central value. + std::cout << (nuAveragings % 2 ? a[n/2][n/2] : b[n/2][n/2]) << std::endl; + + // Deallocate the arrays. + for (int i = 0; i < n; i++) { + delete [] a[i]; + delete [] b[i]; + } + delete [] a; + delete [] b; + + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/Doof2d-Field-distributed.cpp =================================================================== RCS file: Doof2d-Field-distributed.cpp diff -N Doof2d-Field-distributed.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Field-distributed.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,87 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + #include "Pooma/Fields.h" // has Pooma's Field + + // Doof2d: Pooma Fields, data-parallel, multiple processor implementation + + int main(int argc, char *argv[]) + { + // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // nuIterations is the number of simulation iterations. + const int nuIterations = 10/2; + + // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // nuProcessors is the number of processors along one dimension. + const int nuProcessors = 2; + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Specify the fields' domains [0,n) x [0,n). 
+ Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + Interval<1> I(1,n-2); + Interval<1> J(1,n-2); + + // Partition the fields' domains uniformly, i.e., each patch has the + // same size. The first parameter tells how many patches for each + // dimension. Guard layers optimize communication between patches. + // Internal guards surround each patch. External guards surround + // the entire field domain. + UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors), + GuardLayers<2>(1), // internal + GuardLayers<2>(0)); // external + UniformGridLayout<2> layout(vertDomain, partition, DistributedTag()); + + // Specify the fields' mesh, i.e., its spatial extent, and its + // centering type. + UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0)); + Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim); + + // The template parameters indicate a mesh and a 'double' + // element type. MultiPatch indicates multiple computation patches, + // i.e., distributed computation. The UniformTag indicates the + // patches should have the same size. Each patch has Brick type. + Field, double, MultiPatch > > a(cell, layout, mesh); + Field, double, MultiPatch > > b(cell, layout, mesh); + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + b(I ,J+1) + b(I ,J ) + b(I ,J-1) + + b(I-1,J+1) + b(I-1,J ) + b(I-1,J-1)); + + // Read from a. Write to b. + b(I,J) = weight * + (a(I+1,J+1) + a(I+1,J ) + a(I+1,J-1) + + a(I ,J+1) + a(I ,J ) + a(I ,J-1) + + a(I-1,J+1) + a(I-1,J ) + a(I-1,J-1)); + } + + // Print out the final central value. + std::cout << b(n/2,n/2) << std::endl; + + // The fields are automatically deallocated. + + // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/Doof2d-Field-parallel.cpp =================================================================== RCS file: Doof2d-Field-parallel.cpp diff -N Doof2d-Field-parallel.cpp *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Field-parallel.cpp Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,72 ---- + #include // has std::cout, ... + #include // has EXIT_SUCCESS + #include "Pooma/Fields.h" // has Pooma's Field + + // Doof2d: Pooma Fields, data-parallel implementation + + int main(int argc, char *argv[]) + { + // Prepare the Pooma library for execution. + Pooma::initialize(argc,argv); + + // nuIterations is the number of simulation iterations. + const int nuIterations = 10/2; + + // In the average, weight element with this value. + const double weight = 1.0/9.0; + + // Ask the user for the number n of elements along one dimension of + // the grid. + long n; + std::cout << "Please enter the array size: "; + std::cin >> n; + + // Specify the fields' domains [0,n) x [0,n). + Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + Interval<1> I(1,n-2); + Interval<1> J(1,n-2); + + // Specify the fields' mesh, i.e., its spatial extent, and its + // centering type. 
+ DomainLayout<2> layout(vertDomain); + UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0)); + Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim); + + // Create the fields. + // The template parameters indicate a mesh, a 'double' element + // type, and ordinary 'Brick' storage. + Field, double, Brick> a(cell, layout, mesh); + Field, double, Brick> b(cell, layout, mesh); + + // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + b(n/2,n/2) = 1000.0; + + // Perform the simulation. + for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + b(I ,J+1) + b(I ,J ) + b(I ,J-1) + + b(I-1,J+1) + b(I-1,J ) + b(I-1,J-1)); + + // Read from a. Write to b. + b(I,J) = weight * + (a(I+1,J+1) + a(I+1,J ) + a(I+1,J-1) + + a(I ,J+1) + a(I ,J ) + a(I ,J-1) + + a(I-1,J+1) + a(I-1,J ) + a(I-1,J-1)); + } + + // Print out the final central value. + std::cout << b(n/2,n/2) << std::endl; + + // The fields are automatically deallocated. + + // Tell the Pooma library execution has finished. + Pooma::finalize(); + return EXIT_SUCCESS; + } Index: examples/Manual/Doof2d/include.mk =================================================================== RCS file: include.mk diff -N include.mk *** /dev/null Fri Mar 23 21:37:44 2001 --- include.mk Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,59 ---- + # Generated by mm.pl: Mon Mar 9 13:58:39 MST 1998 + # ACL:license + # ---------------------------------------------------------------------- + # This software and ancillary information (herein called "SOFTWARE") + # called POOMA (Parallel Object-Oriented Methods and Applications) is + # made available under the terms described here. The SOFTWARE has been + # approved for release with associated LA-CC Number LA-CC-98-65. + # + # Unless otherwise indicated, this SOFTWARE has been authored by an + # employee or employees of the University of California, operator of the + # Los Alamos National Laboratory under Contract No. W-7405-ENG-36 with + # the U.S. Department of Energy. The U.S. Government has rights to use, + # reproduce, and distribute this SOFTWARE. The public may copy, distribute, + # prepare derivative works and publicly display this SOFTWARE without + # charge, provided that this Notice and any statement of authorship are + # reproduced on all copies. Neither the Government nor the University + # makes any warranty, express or implied, or assumes any liability or + # responsibility for the use of this SOFTWARE. + # + # If SOFTWARE is modified to produce derivative works, such modified + # SOFTWARE should be clearly marked, so as not to confuse it with the + # version available from LANL. + # + # For more information about POOMA, send e-mail to pooma at acl.lanl.gov, + # or visit the POOMA web page at http://www.acl.lanl.gov/pooma/. + # ---------------------------------------------------------------------- + # ACL:license + + + # Wrap make components from SHARED_ROOT and the current directory in the + # proper order so that variables like ODIR have the correct directory-specific + # value at the right moment. See the included files for details of what they + # are doing. This file should NOT be manually edited. 
+ + # Set NEXTDIR, THISDIR and DIR_LIST + include $(SHARED_ROOT)/include1.mk + + # Include list of subdirectories to process + -include $(THISDIR)/subdir.mk + + # Set ODIR, PROJECT_INCLUDES, UNIQUE + include $(SHARED_ROOT)/include2.mk + + # Set list of object files, relative to ODIR + -include $(THISDIR)/objfile.mk + + # Set rules for the ODIR directory + include $(SHARED_ROOT)/compilerules.mk + + # Remove current dir from DIR_LIST + DIR_LIST :=$(filter-out $(firstword $(DIR_LIST)), $(DIR_LIST)) + + + # ACL:rcsinfo + # ---------------------------------------------------------------------- + # $RCSfile: include.mk,v $ $Author: swhaney $ + # $Revision: 1.3 $ $Date: 2000/03/07 13:14:47 $ + # ---------------------------------------------------------------------- + # ACL:rcsinfo Index: examples/Manual/Doof2d/makefile =================================================================== RCS file: makefile diff -N makefile *** /dev/null Fri Mar 23 21:37:44 2001 --- makefile Mon Dec 3 14:01:55 2001 *************** *** 0 **** --- 1,96 ---- + # Generated by mm.pl: Mon Mar 9 13:58:39 MST 1998 + # ACL:license + # ---------------------------------------------------------------------- + # This software and ancillary information (herein called "SOFTWARE") + # called POOMA (Parallel Object-Oriented Methods and Applications) is + # made available under the terms described here. The SOFTWARE has been + # approved for release with associated LA-CC Number LA-CC-98-65. + # + # Unless otherwise indicated, this SOFTWARE has been authored by an + # employee or employees of the University of California, operator of the + # Los Alamos National Laboratory under Contract No. W-7405-ENG-36 with + # the U.S. Department of Energy. The U.S. Government has rights to use, + # reproduce, and distribute this SOFTWARE. The public may copy, distribute, + # prepare derivative works and publicly display this SOFTWARE without + # charge, provided that this Notice and any statement of authorship are + # reproduced on all copies. Neither the Government nor the University + # makes any warranty, express or implied, or assumes any liability or + # responsibility for the use of this SOFTWARE. + # + # If SOFTWARE is modified to produce derivative works, such modified + # SOFTWARE should be clearly marked, so as not to confuse it with the + # version available from LANL. + # + # For more information about POOMA, send e-mail to pooma at acl.lanl.gov, + # or visit the POOMA web page at http://www.acl.lanl.gov/pooma/. 
+ # ---------------------------------------------------------------------- + # ACL:license + + # This file is user-editable + + PROJECT_ROOT = $(shell cd ../../..; pwd) + include $(PROJECT_ROOT)/config/head.mk + + PASS=APP + + default:: Doof2d-C-element Doof2d-Array-element Doof2d-Array-parallel \ + Doof2d-Array-stencil Doof2d-Array-distributed \ + Doof2d-Field-parallel Doof2d-Field-distributed + + .PHONY: Doof2d-C-element + + Doof2d-C-element:: $(ODIR)/Doof2d-C-element + + $(ODIR)/Doof2d-C-element: $(ODIR)/Doof2d-C-element.o + $(LinkToSuite) + + .PHONY: Doof2d-Array-element + + Doof2d-Array-element:: $(ODIR)/Doof2d-Array-element + + $(ODIR)/Doof2d-Array-element: $(ODIR)/Doof2d-Array-element.o + $(LinkToSuite) + + .PHONY: Doof2d-Array-parallel + + Doof2d-Array-parallel:: $(ODIR)/Doof2d-Array-parallel + + $(ODIR)/Doof2d-Array-parallel: $(ODIR)/Doof2d-Array-parallel.o + $(LinkToSuite) + + .PHONY: Doof2d-Array-stencil + + Doof2d-Array-stencil:: $(ODIR)/Doof2d-Array-stencil + + $(ODIR)/Doof2d-Array-stencil: $(ODIR)/Doof2d-Array-stencil.o + $(LinkToSuite) + + .PHONY: Doof2d-Array-distributed + + Doof2d-Array-distributed:: $(ODIR)/Doof2d-Array-distributed + + $(ODIR)/Doof2d-Array-distributed: $(ODIR)/Doof2d-Array-distributed.o + $(LinkToSuite) + + .PHONY: Doof2d-Field-parallel + + Doof2d-Field-parallel:: $(ODIR)/Doof2d-Field-parallel + + $(ODIR)/Doof2d-Field-parallel: $(ODIR)/Doof2d-Field-parallel.o + $(LinkToSuite) + + .PHONY: Doof2d-Field-distributed + + Doof2d-Field-distributed:: $(ODIR)/Doof2d-Field-distributed + + $(ODIR)/Doof2d-Field-distributed: $(ODIR)/Doof2d-Field-distributed.o + $(LinkToSuite) + + include $(SHARED_ROOT)/tail.mk + + # ACL:rcsinfo + # ---------------------------------------------------------------------- + # $RCSfile: makefile,v $ $Author: oldham $ + # $Revision: 1.1 $ $Date: 2000/07/21 21:34:44 $ + # ---------------------------------------------------------------------- + # ACL:rcsinfo From Bert.Tijskens at agr.kuleuven.ac.be Tue Dec 4 07:10:39 2001 From: Bert.Tijskens at agr.kuleuven.ac.be (Tijskens, Bert) Date: Tue, 4 Dec 2001 08:10:39 +0100 Subject: pooma performance Message-ID: Hi, looking for support for POOMA++ 2.3.0 I found this e-mail adress somewhere on the internet. I wrote the following benchmark program which computes something of the form y=3Dax+b, where y and x are (DynamicArrays of) vectors and a and b are (DynamicArrays of) scalars. I was surprised and dissappointed to find out that the simple c version of this loop is 4-5 times faster than the POOMA version? I suppose I must have overlooked something. Can you help? the program is at the bottom of this message the tests were run on a PC using the Intel C++ compiler many thanks in advance, bert Dr. 
Engelbert TIJSKENS
Laboratory for Agro-Machinery and -Processing
Department of Agro-Engineering and -Economy
KULeuven
Kasteelpark Arenberg 30
B-3001 LEUVEN
BELGIUM
tel: ++(32) 16 32 8557
fax: ++(32) 16 32 8590
e-mail: engelbert.tijskens at agr.kuleuven.ac.be

Here's the program

#include "Pooma/Particles.h"
#include "Pooma/DynamicArrays.h"
#include "Tiny/Vector.h"
#include "Utilities/Inform.h"
#include
#include
#include

#if POOMA_CHEETAH
typedef MultiPatch< DynamicTag, Remote > AttributeEngineTag_t;
#else
typedef MultiPatch< DynamicTag, Dynamic > AttributeEngineTag_t;
#endif

template struct PC_UniformLayout_traits
{
  typedef AttributeEngineTag_t AttributeEngineTag_t;
  typedef Layout_t ParticleLayout_t;
};

// The particle traits class and layout type for this application
typedef PC_UniformLayout_traits PC_UniformLayout_t;

// Dimensionality of this problem
static const int nsd = 3;
static const int NumPart = 10000;  // Number of particles in simulation
static const int nLoops = 100;     // Number of loops

// Particles subclass with position and velocity
class PC : public Particles
{
public:
  // Typedefs
  typedef Particles Base_t;
  typedef Base_t::AttributeEngineTag_t AttributeEngineTag_t;
  typedef Base_t::ParticleLayout_t ParticleLayout_t;
  typedef double AxisType_t;
  typedef Vector PointType_t;

  // Constructor: set up layouts, register attributes
  PC(const ParticleLayout_t &pl) : Particles(pl)
  {
    addAttribute(y);
    addAttribute(x);
    addAttribute(a);
    addAttribute(b);
  }

  // Position and velocity attributes (as public members)
  DynamicArray x,y;
  DynamicArray a,b;
  double x_[NumPart][nsd], y_[NumPart][nsd];
  double a_[NumPart], b_[NumPart];
};

// Number of patches to distribute particles across.
// Typically one would use one patch per processor.
const int numPatches = 1;

// Main simulation routine
int main(int argc, char *argv[])
{
  // Initialize POOMA and output stream
  Pooma::initialize(argc,argv);
  Inform out(argv[0]);

  out << "Begin Bounce example code" << std::endl;
  out << "-------------------------" << std::endl;

  // Create a particle layout object for our use
  PC_UniformLayout_t::ParticleLayout_t particleLayout(numPatches);

  // Create the Particles subclass object
  PC pc(particleLayout);

  // Create some particles, recompute the global domain, and initialize
  // the attributes randomly.
  pc.globalCreate(NumPart);
  srand(12345U);
  typedef PC::AxisType_t Coordinate_t;
  Coordinate_t recranmax = 1.0/static_cast<Coordinate_t>(RAND_MAX);
  for (int i = 0; i < NumPart; ++i) {
    for (int d = 0; d < nsd; ++d) {
      pc.x(i)(d) = rand() * recranmax;
      pc.x_[i][d] = pc.x(i)(d);
    }
    pc.a_[i] = pc.a(i) = rand() * recranmax;
    pc.b_[i] = pc.b(i) = rand() * recranmax;
  }

  // reference using ordinary arrays : y = ax+b
  Timer t_array("ordinary arrays",cout); // starts a timer
  for (int it=1; it <= nLoops; ++it) {
    for (int i = 0; i < NumPart; ++i)
      for (int d = 0; d < nsd; ++d)
        pc.y_[i][d] = pc.a_[i]*pc.x_[i][d] + pc.b_[i];
  }
  t_array.stop();
  t_array.print();

  // using pooma attributes: y = ax+b
  Timer t_PoomaAttributes("pooma attributes",cout); // starts a timer
  for (int it=1; it <= nLoops; ++it) {
    pc.y = pc.a*pc.x + pc.b;
  }
  t_PoomaAttributes.stop();
  t_PoomaAttributes.print();

  // Shut down POOMA and exit
  Pooma::finalize();
  return 0;
}

From oldham at codesourcery.com Tue Dec 4 20:43:13 2001
From: oldham at codesourcery.com (Jeffrey Oldham)
Date: Tue, 4 Dec 2001 12:43:13 -0800
Subject: Explanation of blockAndEvaluate()
Message-ID: <20011204124313.A6159@codesourcery.com>

Mark requested that Stephen Smith's explanation be posted to the
pooma-dev mailing list so it is archived for posterity.

Jeffrey's complaint:

> When I run the attached Pooma program (from examples/Manual/Doof2d/)
> for one-processor, it works fine, returning 55.0221 for 4 averagings
> and an array size of 20. When I run it with Pooma configured with
> --messaging and use the MM Shared Memory Library, it returns 0. Just
> before the blockAndEvaluate() call, the "b" array has the proper value
> but afterwards it has changed to zero. Why? Why is it ever dangerous
> to call blockAndEvaluate()? How do I explain when to call
> blockAndEvaluate()? The program is attached.

Stephen Smith's (stephens at proximation.com) reply:

> This code is missing a blockAndEvaluate; it should look
> like:
>
> a = b = 0;
> Pooma::blockAndEvaluate();
> b(n/2,n/2) = 1000.0;
>
> Currently the default is that all code is dangerous, which may
> not be a good thing. To ensure correctness you either need
> to run with --poomaBlockingExpressions or add blockAndEvaluate()
> in all the necessary places.
>
> Here's the basic issue:
>
> 1: a = b;
> 2: c = a;
> 3: e = c;
> 4: c(5) = 7;
> 5: d = c + e;
> 6: cout << d(5) << d(3) << endl;
>
> For this code to work correctly, the data-parallel expression
> writing to c must be done before statement 4 is run, and the
> data-parallel expression writing to d must be done before the
> line that prints values from d. Using blockingExpressions()
> ensures correctness by inserting blockAndEvaluate() after EVERY
> data-parallel statement:
>
> 1: a = b;
> blockAndEvaluate();
> 2: c = a;
> blockAndEvaluate();
> 3: e = c;
> blockAndEvaluate();
> 4: c(5) = 7;
> 5: d = c + e;
> blockAndEvaluate();
> 6: cout << d(5) << d(3) << endl;
>
> This may not be very efficient when the arrays are decomposed
> into patches, because all the patches in statement 1 must execute
> before any from statement 2. It would be a lot more cache efficient
> to perform (a = b; c = a; e = c;) on one patch, then move to the next
> patch.
>
> In the past, my recommendation to users was to add blockAndEvaluate
> immediately before any serial code:
>
> 1: a = b;
> 2: c = a;
> 3: e = c;
> blockAndEvaluate();
> 4: c(5) = 7;
> 5: d = c + e;
> blockAndEvaluate();
> 6: cout << d(5) << d(3) << endl;
>
> This approach is guaranteed to ensure correctness.
> There was no way for us to implement this automatically. We know inside
> POOMA every time a data-parallel expression occurs, but we don't know
> what the next statement is going to be. There's no simple way to check
> for serial access without slowing the code down incredibly. All the
> inner loops which get run by SMARTS also access elements through
> operator(), so we would have to put an if test for every element access
> that would say "Are we running inside the evaluator, or back in the
> user's code?"
>
> So the use of blockAndEvaluate is an optimization. Perhaps it would be
> better to make --blockingExpressions the default, and if users want more
> efficient code they can add the necessary blockAndEvaluates and run
> --withoutBlockingExpressions. Note that if they really understand
> the parallelism issues, they could get trickier:
>
> 1: a = b;
> 2: c = a;
> blockAndEvaluate();
> 3: e = c;
> 4: c(5) = 7;
> 5: d = c + e;
> blockAndEvaluate();
> 6: cout << d(5) << d(3) << endl;
>
> is also correct because we've guaranteed that c has been computed. Note
> that blockAndEvaluate() causes EVERY expression to finally be computed.
> We had at one point thought about a more specific syntax:
>
> blockOnEvaluation(c);
> c(5) = 7;
>
> This syntax would ensure that all the expressions relating to a given
> array are finished. (That would allow the main branch of the code to
> continue while some computations are still going.)
>
> This idea is a ways off from even being prototyped, though.

Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com
-------------- next part --------------
#include <iostream>       // has std::cout, ...
#include <stdlib.h>       // has EXIT_SUCCESS
#include "Pooma/Arrays.h" // has Pooma's Array

// Doof2d: Pooma Arrays, element-wise implementation

int main(int argc, char *argv[])
{
  // Prepare the Pooma library for execution.
  Pooma::initialize(argc,argv);

  // Ask the user for the number of averagings.
  long nuAveragings, nuIterations;
  std::cout << "Please enter the number of averagings: ";
  std::cin >> nuAveragings;
  nuIterations = (nuAveragings+1)/2;  // Each iteration performs two averagings.

  // Ask the user for the number n of elements along one dimension of
  // the grid.
  long n;
  std::cout << "Please enter the array size: ";
  std::cin >> n;

  // Specify the arrays' domains [0,n) x [0,n).
  Interval<1> N(0, n-1);
  Interval<2> vertDomain(N, N);

  // Create the arrays.
  // The template parameters indicate 2 dimensions, a 'double' element
  // type, and ordinary 'Brick' storage.
  Array<2, double, Brick> a(vertDomain);
  Array<2, double, Brick> b(vertDomain);

  // Set up the initial conditions.
  // All grid values should be zero except for the central value.
  a = b = 0.0;
  b(n/2,n/2) = 1000.0;

  // In the average, weight element with this value.
  const double weight = 1.0/9.0;

  // Perform the simulation.
  for (int k = 0; k < nuIterations; ++k) {
    // Read from b. Write to a.
    for (int j = 1; j < n-1; j++)
      for (int i = 1; i < n-1; i++)
        a(i,j) = weight *
          (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) +
           b(i ,j+1) + b(i ,j ) + b(i ,j-1) +
           b(i-1,j+1) + b(i-1,j ) + b(i-1,j-1));

    // Read from a. Write to b.
    for (int j = 1; j < n-1; j++)
      for (int i = 1; i < n-1; i++)
        b(i,j) = weight *
          (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) +
           a(i ,j+1) + a(i ,j ) + a(i ,j-1) +
           a(i-1,j+1) + a(i-1,j ) + a(i-1,j-1));
  }

  // Print out the final central value.
  std::cout << "before: " << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // TMP
  Pooma::blockAndEvaluate();  // Ensure all computation has finished.
  std::cout << (nuAveragings % 2 ?
a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. // Tell the Pooma library execution has finished. Pooma::finalize(); return EXIT_SUCCESS; } From stephens at proximation.com Tue Dec 4 20:01:04 2001 From: stephens at proximation.com (Stephen Smith) Date: Tue, 4 Dec 2001 13:01:04 -0700 Subject: [pooma-dev] Explanation of blockAndEvaluate() Message-ID: Actually, Jim pointed out a problem with one of the examples. I was trying to come up with an example where code that doesn't need to be evaluated can occur between blockAndEvaluate and some serial code. You want everything that reads as well as writes to a given array to be done before you change it, and I missed one of the dependencies. -----Original Message----- From: Jeffrey Oldham [mailto:oldham at codesourcery.com] Sent: Tuesday, December 04, 2001 1:43 PM To: pooma-dev at pooma.codesourcery.com Subject: [pooma-dev] Explanation of blockAndEvaluate() > 1: a = b; > 2: c = a; > blockAndEvaluate(); > 3: e = c; > 4: c(5) = 7; > 5: d = c + e; > blockAndEvaluate(); > 6: cout << d(5) << d(3) << endl; this isn't quite correct because statement 3 reads from c and statement 4 could occur before statement 3. The following example is correct: 1: a = b; 2: c = a; blockAndEvaluate(); 3: e = b; 4: c(5) = 7; 5: d = c + e; blockAndEvaluate(); 6: cout << d(5) << d(3) << endl; It's tricky to figure out, which is why it's best to just put blockAndEvaluate() immediately before all blocks of serial access, if you want code that is pretty close to optimal and guaranteed to be correct. Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From oldham at codesourcery.com Tue Dec 11 20:25:01 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Tue, 11 Dec 2001 12:25:01 -0800 Subject: Patch: Revise Manual Example Codes Message-ID: <20011211122501.A28891@codesourcery.com> The example (tutorial) programs changed to become more consistent, easier to use, and, most importantly, more correct. These are stored in examples/Manual/Doof2d/. 2001-Dec-11 Jeffrey D. Oldham * Doof2d-Array-distributed.cpp: Remove , which is not used. (DoofNinePt): Fix typo in comment. (main): Revise to use command-line arguments and Informs, not standard IO. Modify to ensure domain size is a multiple of the number of processors. Add blockAndEvaluate(). * Doof2d-Array-element.cpp (main): Replace data-parallel initialization with loops. Fix typo in comments. Remove unnecessary blockAndEvaluate(). * Doof2d-Array-parallel.cpp (main): Add blockAndEvaluate(). Fix typo in comment. * Doof2d-Array-stencil.cpp (DoofNinePt): Fix typo in comment. (main): Add blockAndEvaluate(). * Doof2d-C-element.cpp (main): Fix typo in comment. * Doof2d-Field-distributed.cpp: Remove , which is not used. (main): Revise to use command-line arguments and Informs, not standard IO. Modify to ensure domain size is a multiple of the number of processors. Add blockAndEvaluate(). Fix typo in comment. * Doof2d-Field-parallel.cpp (main): Revise input to use cin, not hard-coded constants. Add blockAndEvaluate(). Fix typo in comment. Applied to mainline Thanks, Jeffrey D. 
Oldham oldham at codesourcery.com -------------- next part -------------- Index: Doof2d-Array-distributed.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-Array-distributed.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-distributed.cpp *** Doof2d-Array-distributed.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-Array-distributed.cpp 2001/12/11 19:01:38 *************** *** 1,4 **** - #include // has std::cout, ... #include // has EXIT_SUCCESS #include "Pooma/Arrays.h" // has Pooma's Array --- 1,3 ---- *************** public: *** 30,36 **** private: ! // In the average, weight element with this value. const double weight; }; --- 29,35 ---- private: ! // In the average, weight elements with this value. const double weight; }; *************** int main(int argc, char *argv[]) *** 38,60 **** { // Prepare the Pooma library for execution. Pooma::initialize(argc,argv); ! ! // Ask the user for the number of processors. long nuProcessors; ! std::cout << "Please enter the number of processors: "; ! std::cin >> nuProcessors; ! // Ask the user for the number of averagings. long nuAveragings, nuIterations; ! std::cout << "Please enter the number of averagings: "; ! std::cin >> nuAveragings; nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. // Ask the user for the number n of elements along one dimension of // the grid. long n; ! std::cout << "Please enter the array size: "; ! std::cin >> n; // Specify the arrays' domains [0,n) x [0,n). Interval<1> N(0, n-1); --- 37,73 ---- { // Prepare the Pooma library for execution. Pooma::initialize(argc,argv); ! ! // Since multiple copies of this program may simultaneously run, we ! // canot use standard input and output. Instead we use command-line ! // arguments, which are replicated, for input, and we use an Inform ! // stream for output. ! Inform output; ! ! // Read the program input from the command-line arguments. ! if (argc != 4) { ! // Incorrect number of command-line arguments. ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl; ! return EXIT_FAILURE; ! } ! char *tail; ! ! // Determine the number of processors. long nuProcessors; ! nuProcessors = strtol(argv[1], &tail, 0); ! // Determine the number of averagings. long nuAveragings, nuIterations; ! nuAveragings = strtol(argv[2], &tail, 0); nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. // Ask the user for the number n of elements along one dimension of // the grid. long n; ! n = strtol(argv[3], &tail, 0); ! // The dimension must be a multiple of the number of processors ! // since we are using a UniformGridLayout. ! n = ((n+nuProcessors-1) / nuProcessors) * nuProcessors; // Specify the arrays' domains [0,n) x [0,n). Interval<1> N(0, n-1); *************** int main(int argc, char *argv[]) *** 85,90 **** --- 98,105 ---- // Set up the initial conditions. // All grid values should be zero except for the central value. a = b = 0.0; + // Ensure all data-parallel computation finishes before accessing a value. + Pooma::blockAndEvaluate(); b(n/2,n/2) = 1000.0; // Create the stencil performing the computation. *************** int main(int argc, char *argv[]) *** 101,107 **** // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. 
--- 116,122 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. Index: Doof2d-Array-element.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-Array-element.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-element.cpp *** Doof2d-Array-element.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-Array-element.cpp 2001/12/11 19:01:38 *************** int main(int argc, char *argv[]) *** 33,42 **** // Set up the initial conditions. // All grid values should be zero except for the central value. ! a = b = 0.0; b(n/2,n/2) = 1000.0; ! // In the average, weight element with this value. const double weight = 1.0/9.0; // Perform the simulation. --- 33,44 ---- // Set up the initial conditions. // All grid values should be zero except for the central value. ! for (int j = 1; j < n-1; j++) ! for (int i = 1; i < n-1; i++) ! a(i,j) = b(i,j) = 0.0; b(n/2,n/2) = 1000.0; ! // In the average, weight elements with this value. const double weight = 1.0/9.0; // Perform the simulation. *************** int main(int argc, char *argv[]) *** 59,65 **** } // Print out the final central value. - Pooma::blockAndEvaluate(); // Ensure all computation has finished. std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. --- 61,66 ---- Index: Doof2d-Array-parallel.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-Array-parallel.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-parallel.cpp *** Doof2d-Array-parallel.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-Array-parallel.cpp 2001/12/11 19:01:38 *************** int main(int argc, char *argv[]) *** 38,46 **** // Set up the initial conditions. // All grid values should be zero except for the central value. a = b = 0.0; b(n/2,n/2) = 1000.0; ! // In the average, weight element with this value. const double weight = 1.0/9.0; // Perform the simulation. --- 38,48 ---- // Set up the initial conditions. // All grid values should be zero except for the central value. a = b = 0.0; + // Ensure all data-parallel computation finishes before accessing a value. + Pooma::blockAndEvaluate(); b(n/2,n/2) = 1000.0; ! // In the average, weight elements with this value. const double weight = 1.0/9.0; // Perform the simulation. Index: Doof2d-Array-stencil.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-Array-stencil.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-stencil.cpp *** Doof2d-Array-stencil.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-Array-stencil.cpp 2001/12/11 19:01:38 *************** public: *** 30,36 **** private: ! // In the average, weight element with this value. const double weight; }; --- 30,36 ---- private: ! // In the average, weight elements with this value. const double weight; }; *************** int main(int argc, char *argv[]) *** 68,73 **** --- 68,75 ---- // Set up the initial conditions. // All grid values should be zero except for the central value. a = b = 0.0; + // Ensure all data-parallel computation finishes before accessing a value. + Pooma::blockAndEvaluate(); b(n/2,n/2) = 1000.0; // Create the stencil performing the computation. 
Index: Doof2d-C-element.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-C-element.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-C-element.cpp *** Doof2d-C-element.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-C-element.cpp 2001/12/11 19:01:38 *************** int main() *** 37,43 **** a[i][j] = b[i][j] = 0.0; b[n/2][n/2] = 1000.0; ! // In the average, weight element with this value. const double weight = 1.0/9.0; // Perform the simulation. --- 37,43 ---- a[i][j] = b[i][j] = 0.0; b[n/2][n/2] = 1000.0; ! // In the average, weight elements with this value. const double weight = 1.0/9.0; // Perform the simulation. Index: Doof2d-Field-distributed.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-Field-distributed.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Field-distributed.cpp *** Doof2d-Field-distributed.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-Field-distributed.cpp 2001/12/11 19:01:38 *************** *** 1,4 **** - #include // has std::cout, ... #include // has EXIT_SUCCESS #include "Pooma/Fields.h" // has Pooma's Field --- 1,3 ---- *************** int main(int argc, char *argv[]) *** 9,28 **** // Prepare the Pooma library for execution. Pooma::initialize(argc,argv); ! // nuIterations is the number of simulation iterations. ! const int nuIterations = 10/2; ! ! // In the average, weight element with this value. ! const double weight = 1.0/9.0; ! // nuProcessors is the number of processors along one dimension. ! const int nuProcessors = 2; // Ask the user for the number n of elements along one dimension of // the grid. long n; ! std::cout << "Please enter the array size: "; ! std::cin >> n; // Specify the fields' domains [0,n) x [0,n). Interval<1> N(0, n-1); --- 8,43 ---- // Prepare the Pooma library for execution. Pooma::initialize(argc,argv); ! // Since multiple copies of this program may simultaneously run, we ! // canot use standard input and output. Instead we use command-line ! // arguments, which are replicated, for input, and we use an Inform ! // stream for output. ! Inform output; ! ! // Read the program input from the command-line arguments. ! if (argc != 4) { ! // Incorrect number of command-line arguments. ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl; ! return EXIT_FAILURE; ! } ! char *tail; ! // Determine the number of processors. ! long nuProcessors; ! nuProcessors = strtol(argv[1], &tail, 0); ! ! // Determine the number of averagings. ! long nuAveragings, nuIterations; ! nuAveragings = strtol(argv[2], &tail, 0); ! nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. // Ask the user for the number n of elements along one dimension of // the grid. long n; ! n = strtol(argv[3], &tail, 0); ! // The dimension must be a multiple of the number of processors ! // since we are using a UniformGridLayout. ! n = ((n+nuProcessors-1) / nuProcessors) * nuProcessors; // Specify the fields' domains [0,n) x [0,n). Interval<1> N(0, n-1); *************** int main(int argc, char *argv[]) *** 59,66 **** --- 74,86 ---- // Set up the initial conditions. // All grid values should be zero except for the central value. a = b = 0.0; + // Ensure all data-parallel computation finishes before accessing a value. + Pooma::blockAndEvaluate(); b(n/2,n/2) = 1000.0; + // In the average, weight elements with this value. 
+ const double weight = 1.0/9.0; + // Perform the simulation. for (int k = 0; k < nuIterations; ++k) { // Read from b. Write to a. *************** int main(int argc, char *argv[]) *** 77,83 **** } // Print out the final central value. ! std::cout << b(n/2,n/2) << std::endl; // The fields are automatically deallocated. --- 97,104 ---- } // Print out the final central value. ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The fields are automatically deallocated. Index: Doof2d-Field-parallel.cpp =================================================================== RCS file: /home/pooma/Repository/r2/examples/Manual/Doof2d/Doof2d-Field-parallel.cpp,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Field-parallel.cpp *** Doof2d-Field-parallel.cpp 2001/12/04 00:07:00 1.1 --- Doof2d-Field-parallel.cpp 2001/12/11 19:01:38 *************** int main(int argc, char *argv[]) *** 9,24 **** // Prepare the Pooma library for execution. Pooma::initialize(argc,argv); ! // nuIterations is the number of simulation iterations. ! const int nuIterations = 10/2; - // In the average, weight element with this value. - const double weight = 1.0/9.0; - // Ask the user for the number n of elements along one dimension of // the grid. long n; ! std::cout << "Please enter the array size: "; std::cin >> n; // Specify the fields' domains [0,n) x [0,n). --- 9,24 ---- // Prepare the Pooma library for execution. Pooma::initialize(argc,argv); ! // Ask the user for the number of averagings. ! long nuAveragings, nuIterations; ! std::cout << "Please enter the number of averagings: "; ! std::cin >> nuAveragings; ! nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. // Ask the user for the number n of elements along one dimension of // the grid. long n; ! std::cout << "Please enter the field size: "; std::cin >> n; // Specify the fields' domains [0,n) x [0,n). *************** int main(int argc, char *argv[]) *** 44,51 **** --- 44,56 ---- // Set up the initial conditions. // All grid values should be zero except for the central value. a = b = 0.0; + // Ensure all data-parallel computation finishes before accessing a value. + Pooma::blockAndEvaluate(); b(n/2,n/2) = 1000.0; + // In the average, weight elements with this value. + const double weight = 1.0/9.0; + // Perform the simulation. for (int k = 0; k < nuIterations; ++k) { // Read from b. Write to a. *************** int main(int argc, char *argv[]) *** 62,68 **** } // Print out the final central value. ! std::cout << b(n/2,n/2) << std::endl; // The fields are automatically deallocated. --- 67,74 ---- } // Print out the final central value. ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The fields are automatically deallocated. From oldham at codesourcery.com Tue Dec 11 21:40:31 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Tue, 11 Dec 2001 13:40:31 -0800 Subject: Patch: Manual: Tutorial Chapter and Part of Concepts Chapter Message-ID: <20011211134031.A29015@codesourcery.com> This patch contains continuing work toward a Pooma manual. Most of these changes fail into these two categories: 1) continuing work on a tutorial chapter using Doof2d implementations to introduce Pooma concepts 2) moving material among files to yield smaller files. 2001-Dec-11 Jeffrey D. Oldham * glossary.xml: New file containing the glossary entries. 
It was previously part of outline.xml. * makefile: Revise to create manual.ps, not outline.ps. * manual.xml: New file containing most of the manual. It replaces outline.xml. Too many changes were made to be listed here. Most of the changes have occurred in the tutorial chapter and the concept chapter. * outline.xml: Removed in favor of manual.xml and the files the latter includes. * tutorial.xml: New file containing the tutorial chapter. This was created from outline.xml. * figures/concepts.mp: New MetaPost file to create illustrations for Array and Field's purposes and of the conceptual dependences when declaring containers. * figures/distributed.mp: Revise to make it smaller, move macros out of the source code, and fix formatting mistake. * figures/doof2d.mp: Revise to use macros. * figures/macros.ltx: New file containing LaTeX macros common to MetaPost files. * programs/Doof2d-Array-distributed-annotated.patch: Revised to reflect changes to programs/Doof2d-Array-distributed.cpp. * programs/Doof2d-Array-distributed-annotated.patch: Similar. * programs/Doof2d-Array-parallel-annotated.patch: Similar. * programs/Doof2d-Array-stencil-annotated.patch: Similar. * programs/Doof2d-C-element-annotated.patch: Similar. * programs/Doof2d-Field-distributed-annotated.patch: New file with an annotated version of Doof2d-Field-distributed.cpp. * programs/Doof2d-Field-parallel-annotated.patch: Similar. Applied to mainline Thanks, Jeffrey D. Oldham oldham at codesourcery.com -------------- next part -------------- Index: glossary.xml =================================================================== RCS file: glossary.xml diff -N glossary.xml *** /dev/null Fri Mar 23 21:37:44 2001 --- glossary.xml Tue Dec 11 13:31:06 2001 *************** *** 0 **** --- 1,514 ---- + + + + Glossary + + + + ADD: Make sure all entries are indexed and perhaps point back + to their first use. WARNING: This is constructed by hand so it is + likely to be full of inconsistencies and errors. + + + A + + &array; + + a &pooma; container generalizing &c; arrays and mapping + indices to values. Constant-time access to values is supported, + ignoring the time to compute the values if applicable. &array;s + are first-class + objects. &dynamicarray;s and &field;s generalize &array;. + &dynamicarray; + &field; + + + + + + B + + brick engine + + an engine explicitly storing each of its values. Its space + requirements are at least the size of the engine's domain. + engine + + + + + + C + + + cell + + a domain element of + a &field;. Both &array; and &field; domain + elements are denoted by indices, but a cell exists in space. For + example, it might be a rectangle or rectangular + parallelepiped. + &field; + + + + + cell size + + specifies a &field; cell's dimensions e.g., its + width, height, and depth, in &space;. This is frequently + used to specify a mesh. + mesh + corner position + + + + + computing environment + + computer. More precisely, a computer with its arrangement + of processors and associated memory, possibly shared among + processors. + sequential computing environment + distributed computing environment + + + + + container + + an object that stores other objects, controlling their + allocation, deallocation, and access. Similar to &cc; containers, + the most important &pooma; containers are &array;s and &field;s. + &array; + &dynamicarray; + &field; + &tensor; + &matrix; + &vector; + + + + + context + + a collection of shared memory and processors that can execute + a program of a portion of a program. 
It can have one or more + processors, but all these processors must access the same shared + memory. Usually the computer and its operating system, not the + programmer, determine the available contexts. + distributed computing environment + layout + + + + + context mapper + + indicates how a container's patches are mapped to + processors and shared memory. Two common choices are + distribution among the various processors and replication. + patch + + + + + corner position + + specifies the &space; point corresponding to a &field; + domain's lower, left corner. + mesh + cell size + + + + + + D + + + distributed computing environment + + computing environment with one or more processors each + having associated memory, possibly shared. In some contexts, it + refers to strictly multi-processor computation. + computing environment + sequential computing environment + + + + + domain + + a set of points on which a container can define values. + For example, a set of discrete integral &n;-tuples in + &n;-dimensional space frequently serve as container domains. + container + interval + stride + range + region + + + + + &dynamicarray; + + a &pooma; container generalizing one-dimensional &array;s by supporting domain + resizing at run-time. It maps indices to values in constant-time + access, ignoring the time to compute the values if applicable. + &dynamicarray;s are first-class objects. + &array; + &field; + + + + + + E + + + engine + + stores and, if necessary, computes a container's values. These + can be specialized, e.g., to minimize storage when a domain has + few distinct values. Separating a container and its storage also + permits views of a container. + &engine; + view of a container + + + + + external guard layer + + guard layer + surrounding a container's domain used to ease computation along + the domain's edges by permitting the same computations as for + more internal computations. It is an optimization, not required + for program correctness. + guard layer + internal guard layer + patch + + + + + + F + + + &field; + + a &pooma; container representing an &array; with spatial + extent. It also supports multiple values and multiple materials + indexed by the same value. It maps indices to values in constant + time, ignoring the time to compute the values if applicable. It + also supports geometric computations such as the distance between + two cells and normals to a + cell. &field;s are first-class objects. + &array; + &dynamicarray; + + + + + first-class type + + a type of object with all the capabilities of the built-in + type with the most capabilities. For example, char + and int are first-class types in &cc; because they + may be declared anywhere, stored in automatic variables, accessed + anywhere, copied, and passed by both value and reference. + &pooma; &array; and &field; are first-class + types. + + + + + + G + + + guard layer + + domain surrounding each patch of a container's domain. It + contains read-only values. External guard + layers ease programming, while internal guard + layers permit each patch's computation to be occur without + copying values from adjacent patches. They are optimizations, + not required for program correctness. + external guard layer + internal guard layer + partition + patch + domain + + + + + + I + + + index + + a position in a domain usually denoted by an + ordered tuple. More than one index are called indices. + domain + + + + + indices + + More than one index. 
+ index + + + + + internal guard layer + + guard layer + containing copies of adjacent patches' values. These copies can + permit an individual patch's computation to occur without asking + adjacent patches for values. This can speed computation but are + not required for program correctness. + guard layer + external guard layer + patch + + + + + interval + + a set of integral points between two values. This domain + is frequently represented using mathematical interval notation + [a,b] even though it contains only the integral points, e.g., a, + a+1, a+2, …, b. It is also generalized to the tensor + product of one-dimensional intervals. Many containers' domains + consist of these sets of ordered tuples. + domain + stride + range + + + + + + + L + + + layout + + a map from an index to processor(s) and memory used to + compute the container's associated value. For a uniprocessor + implementation, a container's layout always consists of its + domain, the processor, and its memory. For a multi-processor + implementation, the layout maps portions of the domain to + (possibly different) processors and memory. + container + domain + + + + + + M + + + mesh + + a &field;'s map from indices to geometric values such as + cell size, edge length, and cell normals. In other words, it + specifies a &field;'s spatial extent. + &field; + layout + + + + + + P + + + partition + + a specification how to divide a container's domain into + patches for distributed computation. It can be independent of + the domain's size. For example, it divide each domain into + halves, yielding a total of eight patches in three dimensions. + See + for an illustration. + guard layer + patch + domain + + + + + patch + + + ERE + partition + guard layer + domain + + + + + point + + a location in multi-dimensional space &space;. + In contrast, indices specify positions in container domains. + &field; + mesh + index + + + + + + R + + + range + + an &n;-dimensional domain formed by the tensor product of + &n; strides. + stride + interval + domain + + + + + region + + a domain consisting of a subset of &n;-dimensional + continuous space. + domain + interval + + + + + + S + + + sequential computing environment + + a computing environment with one processor and associated + memory. Only one processor executes a program even if the + conmputer itself has multiple processors. + computing environment + distributed computing environment + + + + + stride + + a subset of regularly-spaced points in an integral + interval. For example, the set of points a, a+2, a+4, …, + b-2, b is specified by [a,b] with stride 2. It is a + domain. + range + interval + domain + + + + + suite name + + an arbitrary string denoting a particular toolkit + configuration. For example, the string + SUNKCC-debug might indicate a configuration for + the Sun Solaris + operating system and the &kcc; &cc; compiler with debugging + support. By default, the suite name it is equal to the + configuration's architecture name. + + + + + + T + + + &tensor; + + a &pooma; container implementing multi-dimensional + mathematical tensors as first-class objects. + &matrix; + &vector; + + + + + &matrix; + + a &pooma; container implementing two-dimensional + mathematical matrices as first-class objects. + &tensor; + &vector; + + + + + + V + + + &vector; + + a &pooma; container implementing multi-dimensional + mathematical vectors, i.e., an ordered tuple of components, as + first-class objects. + &tensor; + &matrix; + + + + + view of a container + + a container derived from another. 
The former's domain is a + subset of the latter's, but, where the domains intersect, + accessing a value through the view is the same as accessing it + through the original container. Only &array;s, &dynamicarray;s, + and &field;s support views. + container + + + + + Index: makefile =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/makefile,v retrieving revision 1.1 diff -c -p -r1.1 makefile *** makefile 2001/12/04 00:07:00 1.1 --- makefile 2001/12/11 20:31:06 *************** INDEXOPTIONS= -t 'Index' -i 'index' -g *** 23,29 **** CXXFLAGS= -g -Wall -pedantic -W -Wstrict-prototypes -Wpointer-arith -Wbad-function-cast -Wcast-align -Wconversion -Wnested-externs -Wundef -Winline -static ! all: outline.ps %.all: %.ps %.pdf %.html chmod 644 $*.ps $*.pdf --- 23,29 ---- CXXFLAGS= -g -Wall -pedantic -W -Wstrict-prototypes -Wpointer-arith -Wbad-function-cast -Wcast-align -Wconversion -Wnested-externs -Wundef -Winline -static ! all: manual.ps %.all: %.ps %.pdf %.html chmod 644 $*.ps $*.pdf *************** mproof-%.ps: %.mp *** 66,69 **** detex $< > $@ clean: ! rm -f *.dvi *.aux *.log *.toc *.bak *.blg *.bbl *.glo *.idx *.lof *.lot *.htm *.mpx mpxerr.tex HTML.index outline.tex --- 66,69 ---- detex $< > $@ clean: ! rm -f *.dvi *.aux *.log *.toc *.bak *.blg *.bbl *.glo *.idx *.lof *.lot *.htm *.mpx mpxerr.tex HTML.index manual.tex Index: manual.xml =================================================================== RCS file: manual.xml diff -N manual.xml *** /dev/null Fri Mar 23 21:37:44 2001 --- manual.xml Tue Dec 11 13:31:09 2001 *************** *** 0 **** --- 1,4034 ---- + + + + + + + + + + + + + + + + + + + C"> + + + C++"> + + + Cheetah" > + + + + Doof2d" > + + Make"> + + MM"> + + MPI"> + + PDToolkit"> + + PETE"> + + POOMA"> + + POOMA Toolkit"> + + Purify"> + + Smarts"> + + + STL"> + + Tau"> + + + + + Array"> + + Benchmark"> + + Brick"> + + CompressibleBrick"> + + DistributedTag"> + + Domain"> + + double"> + + DynamicArray"> + + Engine"> + + Field"> + + Inform"> + + Interval"> + + Layout"> + + LeafFunctor"> + + TinyMatrix"> + + MultiPatch"> + + ReplicatedTag"> + + Stencil"> + + Tensor"> + + Vector"> + + + + + + + + + d"> + + + + g++"> + + KCC"> + + Linux"> + + + + + http://pooma.codesourcery.com/pooma/download'> + + + http://www.pooma.com/'> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ]> + + + + &pooma; + A &cc; Toolkit for High-Performance Parallel Scientific Computing + JeffreyD.Oldham + + CodeSourcery, LLC + + + 2001CodeSourcery, LLC () + Los Alamos National Laboratory + + + All rights reserved. This document may not be redistributed in any form without the express permission of the author. + + + + 0.01 + 2001 Nov 26 + jdo + first draft + + + + + + + + + Preface + + FINISH: Describe the target audience for &pooma; programs and + for this manual: &cc; programmers writing scientific code, possibly + parallel execution. + + Assume familiarity with &cc; template programming and the + standard template library. FIXME: Remove this index + entry.Oldham, + Jeffrey D. + +
+ Notation + + UNFINISHED +
+ + +
+ How to Read This &Book; + + FINISH: Write this section in a style similar to Lamport's + LaTeX section 1.2. FINISH: Fix the book title and the section + number. +
+ + +
+ Obtaining &pooma; and Sample Programs + + Available for free from what WWW site? Include what portions + of LICENSE? Be sure to + include CVS instructions as well. + + Which additional packages are necessary and when? + +
+ + +
+ Using and Modifying &pooma; + + &pooma; is available under an open source license. It can be + used and modified by anyone, anywhere. Can it be sold? Include + LICENSE. + + QUESTION: How do developers contribute code? + +
+ + +
+ Acknowledgements + + Mark Mitchell, Stephen Smith +
+ +
+ + + + Programming with &pooma; + + + Introduction + + QUESTION: Add a partintro to the part above? + + &pooma; abbreviates Parallel Object-Oriented Methods + and Application. + + This document is an introduction to &pooma; v2.1, a &cc; + toolkit for high-performance scientific computation. &pooma; + runs efficiently on single-processor desktop machines, + shared-memory multiprocessors, and parallel supercomputers + containing dozens or hundreds of processors. What's more, by making + extensive use of the advanced features of the ANSI/ISO &cc; + standard—particularly templates—&pooma; presents a + compact, easy-to-read interface to its users. + + From Section  of + papers/iscope98.pdf: + + Scientific software developers have struggled with the need + to express mathematical abstractions in an elegant and maintainable + way without sacrificing performance. The &pooma; (Parallel + Object-Oriented Methods and Applications) framework, written in + ANSI/ISO &cc;, has + demonstrated both high expressiveness and high performance for + large-scale scientific applications on platforms ranging from + workstations to massively parallel supercomputers. &pooma; provides + high-level abstractions for multidimensional arrays, physical + meshes, mathematical fields, and sets of particles. &pooma; also + exploits techniques such as expression templates to optimize serial + performance while encapsulating the details of parallel + communication and supporting block-based data compression. + Consequently, scientists can quickly assemble parallel simulation + codes by focusing directly on the physical abstractions relevant to + the system under study and not the technical difficulties of + parallel communication and machine-specific optimization. + + ADD: diagram of science and &pooma;. See the diagram that + Mark and I wrote. + + Mention efficient evaluation of &pooma; expressions. See + pooma-publications/iscope98.pdf, + Section 4. + +
+ Evolution of &pooma; + + QUESTION: Is this interesting? Even if it is, it should be + short. + + The file papers/SCPaper-95.html + describes ?&pooma;1? and its abstraction layers. + + The "Introduction" of + papers/Siam0098.ps describes the DoE's + funding motivation for &pooma;: Accelerated Strategic Computing + Initiative (ASCI) and Science-based Stockpile Stewardship (SBSS), + pp. 1–2. + + See list of developers on p. 1 of + papers/pooma.ps. + + See list of developers on p. 1 of + papers/pooma.ps. See history and motivation + on p. 3 of papers/pooma.ps. + + Use README for + information. + +
+ introduction.html + + &pooma; was designed and implemented by scientists working + at the Los Alamos National Laboratory's Advanced Computing + Laboratory. Between them, these scientists have written and tuned + large applications on almost every commercial and experimental + supercomputer built in the last two decades. As the technology + used in those machines migrates down into departmental computing + servers and desktop multiprocessors, &pooma; is a vehicle for its + designers' experience to migrate as well. In particular, + &pooma;'s authors understand how to get good performance out of + modern architectures, with their many processors and multi-level + memory hierarchies, and how to handle the subtly complex problems + that arise in real-world applications. +
+ +
+ +
+ + &tutorial-chapter; + + + + Overview of &pooma; Concepts + + FIXME: How does multi-threaded computation fit into the + model? + + In the previous chapter, we presented several different + implementations of the &doof2d; simulation program. The + implementations illustrate the various containers, computation + syntaxes, and computation environments that &pooma; supports. In + this chapter, we describe the concepts associated with each of + these three categories. Specific details needed by programmers are + deferred to later chapters. + + + &pooma; Implementation Concepts + + + + Container + Computation Syntax + Computation Environment + + + + + &array; + element-wise + sequential + + + &dynamicarray; + data-parallel + distributed + + + &field; + stencil-based + + + + &tensor; + relational + + + + &matrix; + + + + + &vector; + + + + + +
+ + The most important &pooma; concepts can be grouped into three + separate categories: + + + container + + data structure holding one or more values and addressed + by indices + + + + computation syntax + + styles of expressing computations + + + + computation environment + + description of resources for computing, e.g., single + processor or multi-processor + + + + See . Many &pooma; programs + select one possibility from each column. For example, used a &array; + container and stencils for sequential computation, while used a &field; + container and data-parallel statements with distributed + computation. A program may use multiple containers and various + computation syntax, but the computation environment either has + distributed processors or not. + + In the rest of this chapter, we explore these three + categories. First, we describe &pooma; containers, illustrating + the purposes of each, and explaining the concepts needed to declare + them. Then, we describe the different computation syntaxes and + finally distributed computation concepts. + + +
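As a concrete illustration of the computation syntax column: the same update can be written element-wise or data-parallel over an &array;. The sketch below only rearranges code that already appears in the Doof2d examples in this archive; a, b, and n are assumed to be declared as in those examples, after #include "Pooma/Arrays.h" and Pooma::initialize().

    // Element-wise syntax: explicit loops over the interior of the domain.
    for (int j = 1; j < n-1; j++)
      for (int i = 1; i < n-1; i++)
        a(i,j) = b(i,j);

    // Data-parallel syntax: one statement updates the whole domain.
    a = b;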
+ &pooma; Containers + + Most &pooma; programs use containers + to store groups of values. &pooma; containers are objects that + store other objects. They control allocation and deallocation of + and access to these objects. They are a generalization of &c; + arrays, but &pooma; containers are first-class objects so they can + be used directly in expressions. They are similar to &cc; + containers such as vector, list, and + stack. See for a summary of the + containers. + + This chapter describes many concepts, not all of which are + needed to begin programming with the &pooma; Toolkit. Below we + introduce the different categories of concepts. After that, we + introduce the different &pooma;'s containers and describe how to + choose the appropriate one for a particular task. + indicates which concepts must be understood when declaring a + particular container. All of these concepts are described in + and + . + Use this figure to decide which concepts in the former are + relevant. Reading the latter section is necessary only if + computing using multiple processors. The programs in the previous + chapter illustrate many of these concepts. + + + &pooma; Container Summary + + + + &array; + container mapping indices to values and that may be + used in expressions + + + &dynamicarray; + one-dimensional &array; whose domain can be dynamically + resized + + + &field; + container mapping indices to one or more values and + residing in multi-dimensional space + + + &tensor; + multi-dimensional mathematical tensor + + + &matrix; + two-dimensional mathematical matrix + + + &vector; + multi-dimensional mathematical vector + + + +
+ + + + A &pooma; &array;, generalizing a &c; + array, maps indices to values. Given an index or position in an + &array;'s domain, it returns the associated value, either by + returning a stored value or by computing it. The use of indices, + which are usually ordered tuples, permits constant-time access + although computing a particular value may require significant + time. In addition to the functionality provided by &c; arrays, + the &array; class automatically handles memory allocation and + deallocation, supports a wider variety of assignments, and can be + used in expressions. For example, the addition of two arrays can + be assigned to an array and the product of a scalar element and an + array is permissible. + + + + A &pooma; &dynamicarray; extends + &array; capabilities to support a dynamically-changing domain but + is restricted to only one dimension. When the &dynamicarray; is + resized, its values are preserved. + + + + A &pooma; &field; is an &array; with + spatial extent. Each domain consists of cells + in one-, two-, or three-dimensional space. Although indexed + similarly to &array;s, each cell may contain multiple values and + multiple materials. A &field;'s mesh stores its spatial + characteristics and can yield, e.g., a point contained in a + cell, the distance between two cells, and a cell's normals. A + &field; should be used whenever geometric or spatial computations + are needed, multiple values per index are desired, or a + computation involves more than one material. + + + + A &tensor; + implements a multi-dimensional mathematical tensor. Since it is a + first-class type, it can be used in expressions such as + adding two &tensor;s. + + + + A &matrix; + implements a two-dimensional mathematical matrix. Since it is a + first-class type, it can be used in expressions such as + multiplying matrices and assignments to matrices. + + + + A &vector; + implements a multi-dimensional mathematical vector, which is an + ordered tuple of components. Since it is a first-class type, it + can be used in expressions such as adding two &vector;s and + multiplying a &matrix; and a &vector;. + + The data of an &array;, &dynamicarray;, or &field; can be + viewed using more than one container by taking a view. A + view of + an existing container &container; is a container whose domain + is a subset of &container;. The subset can equal the + original domain. A view acts like a reference in that changing + any of the view's values also changes the original container's and + vice versa. While users sometimes explicitly create views, they + are perhaps more frequently created as temporaries in expressions. + For example, if A is an &array; and + I is a domain, A(I) - + A(I-1) forms the difference between adjacent + values. + + +
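For instance, the adjacent-difference expression mentioned above might be spelled out as follows. This is only an illustrative sketch built from the Interval and Array forms used elsewhere in this archive; the names A, diff, and the size 20 are made up.

    const int n = 20;                     // any size
    Interval<1> D(0, n-1);                // domain [0,n)
    Array<1, double, Brick> A(D), diff(D);
    Interval<1> I(1, n-1);                // interior points, so I-1 stays inside D
    diff(I) = A(I) - A(I-1);              // view expression: differences of adjacent values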
+ Choosing a Container + + The two most commonly used &pooma; containers are &array;s + and &field;s. contains a + decision tree describing how to choose an appropriate + container. + + + Choosing a &pooma; Container + + + + If modeling mathematical entities, + use a &vector;, &matrix;, or &tensor;. + + + If indices and values reside in multi-dimensional space + &space;, + use a &field;. + + + If there are multiple values per index, + use a &field;. + + + If there are multiple materials participating in the same computation, + use a &field;. + + + If the domain's size dynamically changes and is one-dimensional, + use a &dynamicarray;. + + + Otherwise + use an &array;. + + +
+ +
+ + +
+ Declaring Sequential Containers + +
+ Figure: Concepts For Declaring Containers + (illustration omitted; it depicts the concepts involved in declaring containers) +
+ + In the previous sections, we introduced the &pooma; + containers and described how to choose one appropriate for a + given task. In this section, we describe the concepts involved + in declaring them. Concepts specific to distributed computation + are described in the next section. + + + illustrates the containers and the concepts involved in their + declarations. The containers are listed in the top row. Lines + connect these containers to the components necessary for their + declarations. For example, an &array; declaration requires an + engine and a layout. These, in turn, depend on other &pooma; + concepts. Declarations necessary only for distributed, or + multiprocessor, computation are surrounded by dashed lines. You + can use these dependences to indicate the concepts needed for a + particular container. + + An engine + stores and, if necessary, computes a container's values. A + container has one or more engines. The separation of a container + and its storage permits optimizing a program's space + requirements. For example, a container returning the same value + for all indices can use a constant engine, which need only store + one value for the entire domain. A &compressiblebrick; engine + reduces its space requirements to a constant whenever all its + values are the same. The separation also permits taking views of containers without + copying storage. + +
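Because the engine is chosen by the third template parameter of &array;, switching storage strategies is a declaration-level change. A minimal sketch, assuming the &compressiblebrick; entity above names the corresponding engine tag (that spelling is an assumption), with vertDomain taken from the Doof2d examples:

    Array<2, double, Brick>             dense(vertDomain);   // stores every value explicitly
    Array<2, double, CompressibleBrick> frugal(vertDomain);  // collapses to one stored value
                                                             // while all values are equal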
+ Figure: &array; and &field; Mathematical and Computational Concepts + (illustration omitted; it depicts maps from indices to values) +
+ + A layout + maps domain indices to the + processors and computer memory used by a container's engines. + See . + A computer computes a container's values using a processor and + memory. The layout specifies the processor(s) and memory to use + for each particular index. A container's layout for a + uniprocessor implementation consists of its domain, the + processor, and its memory. For a multi-processor implementation, + the layout maps portions of the domain to (possibly different) + processors and memory. + + A &field;'s mesh + maps domain indices to + geometric values in &space; such as distance between cells, edge + lengths, and normals to cells. In other words, it provides a + &field;'s spatial extent. See also . + Different mesh types may support different geometric + values. + + A mesh's corner + position specifies the point in &space; corresponding to + the lower, left corner of its domain. Combining this, the + domain, and the cell size fully specifies the mesh's map from + indices to &space;. + + A mesh's cell + size specifies the spatial dimensions of + a &field; cell, e.g., its + width, height, and depth, in &space;. Combining this, the + domain, and the corner position fully specifies the mesh's map + from indices to &space;. + + A domain + is a set of points on which a container can define values. An + interval + consists of all integral points between two values. It is + frequently represented using mathematical interval notation [a,b] + even though it contains only the integral points, e.g., a, a+1, + a+2, …, b. The concept is generalized to multiple + dimensions by forming tensor product of intervals, i.e., all the + integral tuples in an &n;-dimensional space. For example, the + two-dimensional containers in the previous chapter are defined on + a two-dimensional domain with the both dimensions' spanning the + interval [0,n). A stride + is a subset of an interval consisting of regularly-spaced + points. A range + is a subset of an interval formed by the tensor product of strides. + A region + represents a continuous &n;-dimensional domain. +
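The Doof2d programs earlier in this archive build their domains in exactly this way; a short recap follows, with the strided Range line marked as an assumption because no Range declaration actually appears in these messages (n is assumed as in those programs):

    Interval<1> N(0, n-1);         // the integral points 0, 1, ..., n-1
    Interval<2> vertDomain(N, N);  // tensor product: [0,n) x [0,n)
    // Range<1> evens(0, n-1, 2);  // assumed spelling of a strided domain (0, 2, 4, ...)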
+ + +
+ Declaring Distributed Containers + + In the previous section, we introduced the concepts + important when declaring containers for use on uniprocessor + computers. When using multi-processor computers, we augment + these concepts with those for distributed computation. Reading + this section is important only for running the same program on + multiple processors. Many of these concepts were introduced in + and . + illustrates the &pooma; distributed computation model. In this + section, we concentrate on the concepts necessary to declare a + distributed container. + + As we noted in , a &pooma; + programmer must specify how each container's domain should be + distributed among the available processors and memory spaces. + Using this information, the Toolkit automatically distributes the + data among the available processors and handles any required + communication among them. The three concepts necessary for + declaring distributed containers are a partition, a guard layer, + and a context mapper tag. + + A partition + specifies how to divide a container's domain into distributed + pieces. For example, the partition illustrated in + would divide a two-dimensional domain into three equally-sized + pieces along the x-dimension and two equally-sized pieces along + the y-dimension. Partitions can be independent of the size of + the container's domain. The example partition will work on any + domain as long as the size of its x-dimension is a multiple of + three. A domain is separated into disjoint patches. + + A guard + layer is extra domain + surrounding each patch. This region has read-only values. An + external guard + layer specifies values surrounding the + domain. Its presence eases computation along the domain's edges + by permitting the same computations as for more internal + computations. An internal guard + layer duplicates values from adjacent + patches so communication with adjacent patches need not occur + during a patch's computation. The use of guard layers is an + optimization; using external guard layers eases programming and + using internal guard layers reduces communication among + processors. Their use is not required. + + A context + mapper indicates how a container's + patches are mapped to processors and shared memory. For example, + the &distributedtag; indicates that the patches should be + distributed among the processors so each patch occurs once in the + entire computation. The &replicatedtag; indicates that the + patches should be replicated among the processors so each + processing unit has its own copy of all the patches. While it + could be wasteful to have different processors perform the same + computation, replicating a container can reduce possibly more + expensive communication costs. +
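Putting the three pieces together, a distributed declaration might look roughly like the sketch below. The class names UniformGridLayout, UniformGridPartition, &multipatch;, and &distributedtag; are taken from this archive, but the Loc and GuardLayers types, the engine spelling, and every constructor signature here are illustrative assumptions, not checked against the toolkit headers:

    // N is a one-dimensional Interval as in the examples above.
    Interval<2> vertDomain(N, N);
    UniformGridPartition<2> partition(Loc<2>(3, 2),       // 3 x 2 patches (assumed)
                                      GuardLayers<2>(1)); // one internal guard cell (assumed)
    UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
    Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);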
+ + +
+ ????Computation Syntax???? + + UNFINISHED +
+ + +
+ Computation Environment + + A &pooma; program can execute on a wide variety of + computers. The default sequential computing + environment consists of one processor and + associated memory, as found on a personal computer. In contrast, + a distributed computing + environment may have multiple processors + and multiple distributed or shared memories. For example, some + desktop computers have dual processors and shared memory. A + large supercomputer may have thousands of processors, perhaps + with groups of eight sharing the same memory. + + Using distributed computation requires three things: the + programmer must declare how container domains will be + distributed, &pooma; must be configured to use a communications + library, and the &pooma; executable must be run using the + library. All of these were illustrated in and . + illustrates the &pooma; distributed computation model. + described how to declare containers with distributed domains. + Detailed instructions how to configure &pooma; for distributed + computation appear in . More + detailed instructions how to run distributed &pooma; executables + appear in . Here we present + three concepts for distributed computation: context, layout, and + a communication library. + + A context + is a collection of shared memory and processors that can execute + a program of a portion of a program. It can have one or more + processors, but all these processors must access the same shared + memory. Usually the computer and its operating system, not the + programmer, determine the available contexts. + + + HERE + + +
+ +
+ +
+ Extraneous Material + + Describe the software application layers similar to + papers/SCPaper-95.html and "Short Tour of + &pooma;" in papers/SiamOO98_paper.ps. + Section 2.2, "Why a Framework?," of + papers/pooma.ps argues why a layered approach + eases use. Section 3.1, "Framework Layer Description," + describes the five layers. + + FINISH: Write short glossary entries for each of these. + + FINISH: Look through the source code to ensure all main + concepts are listed. + + Here are (preliminary) &pooma; equations: + + + &pooma; <quote>Equations</quote> + + + + + field = data + materials + centering + layout + mesh + + + map from space to values + + + array = data + layout + + + map from indices to values + + + mesh = layout + origin + spacings + + + distribute domain through physical space + + + layout = domain + partition + context_mapper_tag (distributed/replicated) + + + distribute domain's blocks among processors/contexts + + + partition = blocks + guard layers + + + split domain into blocks + + + domain = newDomain + + + space of permissible indices + + + +
+ + + FINISH: Following is a first try at describing the &pooma; + abstraction layers. See also paper illustration. + + + &pooma; Abstraction Layers + + + + + application program + + + &array; &field; (should have + FieldEngine under it) + + + &engine; + + + evaluators + + + +
+ + FINISH: How does parallel execution fit in? + + FINISH: Should we also name and describe each layer? +
+ + +
+ Data-Parallel Statements + + Can we use "An Overview of &pete;" from + papers/PETE_DDJ/ddj_article.html or is this + too low-level? + + Section 3.2.1 of papers/pooma.ps + gives a simple example of data-parallel expression. It also has a + paragraph introducing data-parallel operations and selecting + subsets of domains. Section 3.4 describes the Chained + Expression Object (CEO), apparently a precursor + of &pete;. Regardless, it provides some motivation and + introductory material. + + From Section 4 of + papers/SiamOO98_paper.ps: + + This version of &pete; reduces compile time of user codes + and utilizes compile-time knowledge of expression &domain;s for + better optimization. For example, more efficient loops for + evaluating an expression can be generated if &pete; knows that the + &domain; has unit stride in memory. + + Section 4, "Expressions and Evaluators", of + papers/iscope98.pdf has a good explanation of + &pooma; II's expression trees and expression engines. + + COMMENT: background.html has some related + &pete; material. +
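As a small, hedged example of what &pete; evaluates: a data-parallel statement such as the one below is turned into an expression tree at compile time and evaluated in one fused loop over the domain, with no temporaries for the intermediate products or sums. The declarations are illustrative, in the style of the y = ax + b benchmark earlier in this archive:

    const int n = 1000;
    Interval<1> D(0, n-1);
    Array<1, double, Brick> x(D), y(D), a(D), b(D);
    y = a * x + b;   // a single fused loop over D; no temporary array for a*x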
+ +
+ Containers + +
+ &array; + +
+ Section 4 "Future Improvements in + &pooma; II" of + papers/SiamOO98_paper.ps + + An &array; can be thought of as a map from one &domain; to + another.… &array;s depend only on the interface of + &domain;s. Thus, a subset or view of an &array; can be + manipulated in all the same ways as the original &array;. + &array;s can perform indirect addressing because the output + &domain; of one &array; can be used as the input &domain; of + another &array;. &array;s also provide individual element + access. +
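The indirect addressing described in the quotation can at least be pictured with the ordinary element indexing used throughout this archive; whether the composed form is written directly as C = A(B) is not shown in these messages, so this sketch spells it out with a loop, and all names are hypothetical:

    const int n = 100;
    Interval<1> D(0, n-1);
    Array<1, int,    Brick> B(D);   // B's values are indices into A's domain
    Array<1, double, Brick> A(D), C(D);
    for (int i = 0; i < n; ++i)
      C(i) = A(B(i));               // one Array's output feeds another Array's input domain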
+ + + + (unformatted) From + papers/GenericProgramming_CSE/dubois.html: + The &pooma; &array; concept provides an example of how these + generic-programming features can lead to flexible and efficient + code. An Array maps a fairly arbitrary input domain to an + arbitrary range of outputs. When used by itself, an &array; + object A refers to all of the values in its + domain. Element-wise mathematical operations or functions can be + applied to an array using straightforward notation, like A + B + or sin(A). Expressions involving Array objects are themselves + Arrays. The operation A(d), where d is a domain object that + describes a subset of A's domain, creates a view of A that + refers to that subset of points. Like an array expression, a + view is also an Array. If d represents a single point in the + domain, this indexing operation returns a single value from the + range. Equivalently, one can index an N-dimensional Array by + specifying N indices, which collectively specify a single point + in the input domain: A(i1, i2, ..., iN). + + The &pooma; multi-dimensional Array concept is similar to + the Fortran 90 array facility, but extends it in several + ways. Both &pooma; and Fortran arrays can have up to seven + dimensions, and can serve as containers for arbitrary + types. Both support the notion of views of a portion of the + array, known as array sections in F90. The &pooma; Array concept + supports more complex domains, including bounded, continuous + (floating-point) domains. Furthermore, Array indexing in &pooma; + is polymorphic; that is, the indexing operation X(i1,i2) can + perform the mapping from domain to range in a variety of ways, + depending on the particular type of the Array being + indexed. + + Fortran arrays are dense and the elements are arranged + according to column-major conventions. Therefore, X(i1,i2) + refers to element number i1-1+(i2-1)*numberRowsInA. However, as + Fig. 1 shows, Fortran-style "Brick" storage is not the only + storage format of interest to scientific programmers. For + compatibility with C conventions, one might want to use an array + featuring dense, row-major storage (a C-style Brick). To save + memory, it might be advantageous to use an array that only + stores a single value if all its element values are the + same. Other sparse storage schemes that only store certain + values may also be desirable. To exploit parallelism, it is + convenient for an array's storage to be broken up into patches, + which can be processed independently by different CPUs. Finally, + one can imagine an array with no data at all. For example, the + values can be computed from an expression involving other + arrays, or analytically from the indices. + + + The &pooma; &array; Class Template + + Next we describe &pooma;'s model of the Array concept, the + Array class template. The three most important requirements from + the point of view of overall design are: (1) arbitrary domain, + (2) arbitrary range, and (3) polymorphic indexing. These express + themselves in the template parameters for the &pooma; Array + class. The template + + template <int Dim, class T = double, class EngineTag = Brick> + class Array; + + is a specification for creating a set of classes all named + Array. The template parameters Dim, T, and EngineTag determine + the precise type of the Array. Dim represents the dimension of + the array's domain. T gives the type of array elements, thereby + defining the output range of the array. 
EngineTag specifies the + manner of indexing and types of the indices. + + End From + papers/GenericProgramming_CSE/dubois.html: + + Section 2, "Arrays and Engines," of + papers/iscope98.pdf describes both &array;s + and &engine;s. This may or may not duplicate the material in + papers/GenericProgramming_CSE/dubois.html. + +
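Given the defaults in the template declaration quoted above, the following declarations all request the same two-dimensional, double, Brick-engine &array;; only the domain object vertDomain is assumed from the earlier examples:

    Array<2>                a1(vertDomain);  // T = double, EngineTag = Brick by default
    Array<2, double>        a2(vertDomain);  // EngineTag = Brick by default
    Array<2, double, Brick> a3(vertDomain);  // everything spelled out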
+ Views of &array;s + + Section 3, "Domains and Views," of + papers/iscope98.pdf motivates the need for + views: +
+ One of the primary uses of domains is to specify + subsections of &array; objects. Subarrays are a common + feature of array classes; however, it is often difficult to + make such subarrays behave like first-class objects. The + &pooma; II engine concept provides a clean solution to + this problem: subsetting an &array; with a domain object + creates a new &array; that has a view engine. +
+
+
+
+ +
+ &field; + + QUESTION: Do we include boundary conditions here? + + FINISH: Do we have an example that shows something not possible + with &array;? + + Describe and illustrate multi-material and + multivalue? + + ADD: description of meshes and guard layers. + +
+ + +
+ +
+ Engines + + (unformatted) From + papers/GenericProgramming_CSE/dubois.html: + + The Engine Concept + + To implement polymorphic indexing, the Array class defers + data storage and data lookup to an engine object. The requirements + that the Array template places on its engine provide the + definition for the Engine concept. We'll describe these by + examining a simplified version of the Array template, shown in + Fig. 2. + + First, the Array class determines and exports (makes + Engine_t part of Array's public interface) the type of the engine + class that it will use: + + typedef Engine<Dim, T, EngineTag> Engine_t; + + This statement declares Engine_t to be an alias for the type + Engine<Dim,T,EngineTag>. This is the first requirement + placed on engine classes: they must be specializations of a + general Engine template whose template parameters are identical to + those of Array. Next, the Array template determines the type of + scalar arguments (indices) to be used in operator(), the function + that implements &pooma;'s Fortran-style indexing syntax X(i1,i2): + + typedef typename Engine_t::Index_t Index_t; + + This statement defines another type alias: + Array<Dim,T,EngineTag>::Index_t is simply an alias for + Engine_t::Index_t. Engine_t::Index_t is a qualified name, which + means that the type Index_t is found in the class Engine_t. This + is the second requirement for the Engine concept: the class + Engine_t must define a public type called Index_t. This line will + not compile if that definition is not supplied. This indirection + is one of the ways that &pooma; supports polymorphic indexing. If + the Engine works with a discrete integer domain, it defines its + Index_t to be an integral type. If the Engine works in a + continuous domain, it defines its Index_t to be a floating-point + type. + + The data lookup is performed in the operator() function. We + see that Array simply passes the indices on to its engine + object. Thus, we have the third requirement for the Engine + concept: it must provide a version of operator() that takes Dim + values of type Index_t. + + Simply passing the indices on to the engine object may seem + odd. After all, engine(i,j) looks like we're just indexing another + array. There are several advantages to this extra level of + indirection. The Array class is as faithful a model of the Array + concept as possible, while the Engine class is a low-level + interface to a user-defined data source. As a result, Array has a + wide variety of constructors for user convenience, while engines + have but a few. Array supports a wide variety of overloaded + operator() functions for view creation and indexing. Engines + support indexing only. Array does not have direct access to the + data, which is managed by the engine object. Finally, Array has a + wide variety of overloaded mathematical operators and functions, + and works with the Portable Expression Template Engine (PETE) [4] + to provide efficient evaluation of Array expressions. Engines have + no such support. In general, Array is much more complex and + feature-laden than Engine. This is the prime advantage of the + separation of interface and implementation: Array only has to be + implemented once by the &pooma; developers. Engines are simple + enough to be written by users and plugged directly into the Array + framework. + + Figure 3 illustrates the "Brick" specialization of the + Engine template, which implements Fortran-style lookup into a + block of memory. 
First, there is the general Engine template, + which is empty as there is no default behavior for an unknown + EngineTag. The general template is therefore not a model for the + Engine concept and Array classes attempting to use it will not + compile. Next, there is the definition of the Brick class, a + policy tag whose sole purpose is to select a particular + specialization of the Engine template. Finally, there is the + partial specialization of the Engine template. Examining its body, + we see the required Index_t typedef and the required operator(), + which follows the Fortran prescription for generating an offset + into the data block based on the row, column, and the number of + rows. All of the requirements are met, so the Brick-Engine class + is a model of the Engine concept. + + End From + papers/GenericProgramming_CSE/dubois.html: + + (unformatted) From + papers/GenericProgramming_CSE/dubois.html: + + Compile-time Versus Run-Time Polymorphism + + Encapsulating the indexing in an Engine class has important + advantages, both in terms of flexibility and efficiency. To + illustrate this point, we introduce the PolarGaussian-Engine + specialization in Fig. 4. This is an analytic engine that + calculates its values directly from its inputs. Unlike the + Brick-Engine, this engine is "indexed" with data of the same type + as its output: it maps a set of T's to a single T. Therefore, the + Index_t typedef selects T as the index type, as opposed to the int + in the Brick-Engine specialization. The operator() function also + differs in that it computes the return value according to an + analytic formula. + + Both Engine<Dim,T,Brick> and + Engine<Dim,T,PolarGaussian> can be plugged in to an Array by + simply varying the Array's EngineTag. This is possible despite the + fact that the two classes exhibit dramatically different behavior + because they are both models of the Engine concept. + + Notice that we have achieved polymorphic indexing without + the use of inheritance or virtual functions. For instance, + consider the following code snippet: + + Array<2, double, Brick> a; + Array<2, double, PolarGaussian> b; + + double x = a(2, 3); // x = a.engine.data[2 + 3 * a.engine.numRows]; + double y = b(2.0, 3.0); // y = exp(-(2.0*2.0+3.0*3.0) / b.engine.delta); + + The data lookup functions for the two Arrays perform completely + different operations. Since this is accomplished using static + types, it is known as compile-time polymorphism. Moreover, + everything is known at compile time, so the functions are fully + inlined and optimized, thereby yielding code equivalent to that + shown in the comments above. + + The flexibility and efficiency of compile-time polymorphism + cannot be duplicated with a run-time implementation. To illustrate + this point, in Fig. 5, we re-implement our Array concept using the + classic Envelope-Letter pattern [5], with the array class, + RTArray, being the envelope and the run-time-engine, RTEngine, + being the letter. RTArray defers data lookup to the engine object + by invoking the engine's functions through a pointer to the + RTEngine base class. Figure 6 illustrates the RTEngine base class + and Fig. 7 illustrates two descendants: RTBrick and + RTPolarGaussian. + + The run-time implementation provides the same basic + functionality as the compile-time implementation, but it is not as + flexible or as efficient. It lacks flexibility in that the return + type of the indexing operation must be specified in the RTEngine + base class and in the RTArray class. 
Thus, in Figs. 5 and 6, we see
+ versions of RTArray::operator() and RTEngine::index functions that
+ take both int's and T's. If the programmer wants to add another
+ index-type option, these classes must be modified. This is a
+ violation of the open-closed principle proposed by Meyer
+ [6]. Also, since RTEngine descendants will usually only implement
+ one version of index, we cannot make RTEngine an abstract base
+ class. Instead, we have the default versions of index throw an
+ exception. Thus, compile-time error checking is
+ weakened. Furthermore, since indexing is done via a virtual
+ function call, it will almost never be inlined, which is not
+ acceptable in most scientific applications.
+ 
+ There are advantages to the Envelope-Letter approach. First,
+ all RTArray objects have the same type, allowing them to be stored
+ in homogeneous collections. This can simplify the design of some
+ applications. Second, RTArray objects can change their engines at
+ runtime, and thus effectively change their types on the fly; this
+ is the primary reason for using the Envelope-Letter idiom, and can
+ be very important in some applications.
+ 
+ For most scientific applications, however, these issues are
+ minor, and maximum performance for array indexing is of paramount
+ importance. Our compile-time approach achieves this performance
+ while providing the desired polymorphic indexing.
+ 
+ From Section 4 of
+ papers/SiamOO98_paper.ps:
+ 
+ The &array; class is templated on an &engine; type that
+ handles the actual implementation of the mapping from input to
+ output. Thus, the &array; interface features are completely
+ separate from the implementation, which could be a single &c;
+ array, a function of some kind, or some other mechanism. This
+ flexibility allows an expression itself to be viewed through the
+ &array; interface. Thus, one can write something like
+ 
+ foo(A*B+C);
+ where A, B, and
+ C are &array;s and foo is
+ a function taking an &array; as an argument. The expression
+ A*B+C
+ will only be evaluated by the expression engine as needed by
+ foo.
+ 
+ In fact, one can even write &engine;s which are wrappers
+ around external data structures created in non-&pooma; codes and
+ know how to manipulate these structures. Once this is done, the
+ external entities have access to the entire &array; interface and
+ can utilize all of the powerful features of
+ &pooma; II.
+ 
+ Section 2, "Arrays and Engines," of
+ papers/iscope98.pdf describes both &array;s
+ and &engine;s. This may or may not duplicate the material in
+ papers/GenericProgramming_CSE/dubois.html.
+ 
+ Section 4, "Expressions and Evaluators," of
+ papers/iscope98.pdf has a good explanation of
+ &pooma; II's expression trees and expression engines.
+ 
+ 
+ MultiPatch Engine
+ From README: To actually use multiple
+ contexts effectively, you need to use the MultiPatch engine with
+ patch engines that are Remote engines. Then the data will be
+ distributed across multiple contexts instead of being copied on
+ every context. See the files in example/Doof2d for a simple
+ example that creates a MultiPatch array that can be distributed
+ across multiple contexts and performs a stencil computation on
+ that array.
+ 
+ 
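+ 
+ DRAFT EXAMPLE (the discussion of the Brick specialization and
+ Fig. 3 above, reconstructed as code; this is an unverified sketch,
+ not the actual &pooma; source, and the member names data_m and
+ numRows_m are illustrative):
+ 
+ template <int Dim, class T, class EngineTag>
+ class Engine {};                // general template: empty, no default behavior
+ 
+ struct Brick {};                // policy tag selecting the specialization below
+ 
+ template <class T>
+ class Engine<2, T, Brick>       // two-dimensional Brick specialization
+ {
+ public:
+   typedef int Index_t;          // discrete, integral domain
+   T operator()(Index_t i, Index_t j) const
+   { return data_m[i + j * numRows_m]; }   // Fortran-style column-major offset
+ private:
+   T *data_m;                    // block of memory holding the elements
+   int numRows_m;                // number of rows, needed to compute the offset
+ };
+ 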
+ + +
+ Relations + + UNFINISHED +
+ + +
+ Stencils + + Section 3.5.4, "Stencil Objects," of + papers/pooma.ps provides a few uses of + stencils. + + Section 5, "Performance," of + papers/iscope98.pdf motivates and explains + stencils. +
+ + +
+ Contexts + +
+ background.html + In order to be able to cope with the variations in machine + architecture noted above, &pooma;'s distributed execution model + is defined in terms of one or more contexts, each of which may + host one or more threads. A context is a distinct region of + memory in some computer. The threads associated with the context + can access data in that memory region and can run on the + processors associated with that context. Threads running in + different contexts cannot access memory in other contexts. + + A single context may include several physical processors, + or just one. Conversely, different contexts do not have to be on + separate computers—for example, a 32-node SMP computer could + have up to 32 separate contexts. This release of &pooma; only + supports a single context for each application, but can use + multiple threads in the context on supported platforms. Support + for multiple contexts will be added in an upcoming + release. +
+
+ +
+ 
+ 
+ 
+ Writing Sequential Programs
+ 
+ &pooma; can reorder computations to permit more efficient
+ computation. When running a sequential program, reordering may
+ permit omission of unneeded computations. For example, if only
+ values from a particular field are printed, only computations
+ involving the field and containers dependent on it need to occur.
+ When running a distributed program, reordering may permit
+ computation and communication among processors to overlap. &pooma;
+ automatically tracks dependences between data-parallel expressions,
+ ensuring correct ordering. It does not track statements accessing
+ particular &array; and &field; values, so the programmer must
+ precede these statements with calls to
+ Pooma::blockAndEvaluate(). Each call forces
+ the executable to wait until all computation has completed. Thus,
+ the desired values are known to be available. In practice, some
+ calls to Pooma::blockAndEvaluate may not be
+ necessary, but omitting them requires knowledge of &pooma;'s
+ dependence computations, so the &author; recommends calling
+ Pooma::blockAndEvaluate before each access to
+ a particular value in an &array; or &field;. Omitting a necessary
+ call may lead to a race condition. See for
+ instructions on how to diagnose and eliminate these race conditions.
+ 
+ Section 3, "Domains and Views," of
+ papers/iscope98.pdf describes five types of
+ domains.
+ 
+ UNFINISHED
+ 
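+ 
+ DRAFT EXAMPLE (an unverified sketch of the
+ Pooma::blockAndEvaluate() usage described
+ above; the array declaration and extents are illustrative):
+ 
+ int n = 100;
+ Array<2> a(n, n);
+ a = a + 1.0;                 // data-parallel statement; evaluation may be deferred
+ Pooma::blockAndEvaluate();   // wait until all pending computation has completed
+ double x = a(0, 0);          // now safe: the individual value is available
+ 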
+ &benchmark; Programs + + Define a &benchmark; program vs. an example or an + executable. Provide a short overview of how to run these + programs. Provide an overview of how to write these programs. + See src/Utilities/Benchmark.h. +
+ + +
+ Using <type>Inform</type>s for Output + + UNFINISHED +
+ + +
+ Miscellaneous + + Section 3, "Domains and Views," of + papers/iscope98.pdf describes five types of + domains. +
+
+ 
+ 
+ 
+ Writing Distributed Programs
+ 
+ Discuss the distributed model and guard cells. See docs/parallelism.html.
+ 
+ Does any of the parallel implementation described in
+ papers/SCPaper-95.html still apply?
+ 
+ ?Tuning programs to maximize parallel performance?
+ 
+ external references to &mpi; and threads
+ 
+ QUESTION: Are there interesting, short parallel programs in
+ any &mpi; book that we can convert to &pooma;?
+ 
+ Layouts
+ 
+ An out-of-date description of the global/local interactions
+ and parallel abstraction layers can be found in Section 3.3,
+ especially 3.3.2, of papers/pooma.ps.
+ 
+ +
+ Parallel Communication
+ 
+ An out-of-date description can be found in
+ Section 3.3.3 of papers/pooma.ps.
+ 
+ +
+ Using Threads
+ 
+ QUESTION: Where do threads fit into the manual? Do threads
+ even work?
+ 
+ From Section 4 of
+ papers/SiamOO98_paper.ps:
+ 
+ &pooma; II will make use of a new parallel run-time
+ system called &smarts; that is under development at the ACL.
+ &smarts; supports lightweight threads, so the evaluator will be
+ able to farm out data communication tasks and the evaluation of
+ subsets of an expression to multiple threads, thus increasing the
+ overlap of communication and computation. Threads will also be
+ available at the user level for situations in which a
+ task-parallel approach is deemed appropriate.
+ 
+ +
+ + + + Debugging and Profiling &pooma; Programs + + Consider &dashdash;pooma-debug number. + See also other &pooma; options in src/Utilities/Options.h. + + UNFINISHED +
+ Finding Race Conditions From Missing + <function>blockAndEvaluate</function> Calls + + &pooma; may reorder computations so calls to + Pooma::blockAndEvaluate() are necessary + before accessing particular &array; and &field; values. + Omission of necessary calls can lead to race conditions where + the ordering of reads and writes to particular values is + incorrect. To help diagnose if calls to + Pooma::blockAndEvaluate are missing, invoke + a &pooma; executable with the + &dashdash;pooma-blocking-expressions option. + This automatically causes + Pooma::blockAndEvaluate to be called after + each statement. Doing so ensures program correctness, but it + may increase running times, particularly if multiple processors + are used, because computation and communication may not overlap + as much as possible. Of course, program correctness is more + important than execution speed. + + If using + &dashdash;pooma-blocking-expressions changes a + program's output, it is missing one or more calls to + Pooma::blockAndEvaluate. To narrow the + region with a missing call, surround the region in question with + calls to Pooma::blockingExpressions(true) + and Pooma::blockingExpressions(false), + but do not use the + &dashdash;pooma-blocking-expressions option. + Within the region, Pooma::blockAndEvaluate + will be invoked after each statement. Repeatedly reducing the + region's size should reveal where calls are missing. +
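+ 
+ DRAFT EXAMPLE (a sketch of the narrowing technique described above;
+ the statements inside the region are placeholders):
+ 
+ Pooma::blockingExpressions(true);   // from here on, block after every statement
+ // ... the statements suspected of containing the race condition ...
+ Pooma::blockingExpressions(false);  // resume normal, possibly reordered, evaluation
+ 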
+
+ +
+ 
+ 
+ &pooma; Reference Manual
+ 
+ 
+ TMP: This Chapter Holds These Comments But Will Be Removed
+ 
+ For each template parameter, we need to describe the constraints
+ on it.
+ 
+ Remove this section when the following concerns have been
+ addressed.
+ 
+ Add a partintro explaining file suffixes such as .h, .cpp, .cmpl.cpp, .mk, .conf. Should we also explain the use
+ of inline even when necessary and the template
+ model, e.g., including .cpp files?
+ 
+ QUESTION: What are the key concepts around which to organize
+ the manual?
+ 
+ QUESTION: What format should the manual use?
+ 
+ 
+ Musser, Derge, and Saini, §20.0.
+ It is important to state the requirements on the components
+ as generally as possible. For example, instead of saying
+ class X must define a member function
+ operator++(), we say for any
+ object x of type X,
+ ++x is defined.
+ 
+
+ + + + A Typical &pooma; Class + + + Class Member Notation + + + *_t + + + + type within a class. QUESTION: What is the &cc; name for + this? + + + + + *_m + + + + data member + + + + + + &pooma; Class Vocabulary + + component + + one of several values packaged together. For example, a + three-dimensional vector has three components, i.e., three + values. + + + + element-wise + + applied to each element in the group, e.g., an array + + + + reduction + + repeated application of a binary operator to all elements, + yielding one value + + + + tag + + an enumerated value indicating inclusion in a particular + semantic class. The set of values need not be explicitly + declared. + + + + + + + + + Installing and Configuring &pooma; + + + + Installing &pooma;. + + + Requirements for configuration files. + + + + Include descriptions of using &smarts;, &cheetah;, τ, + &pdt;. + + QUESTION: Does it install on windows and on mac? If so, what + are the instructions? See also INSTALL.{mac,unix,windows}. + + README has some + information on &cheetah; and threads in the Message-Based + Parallelism section. + + Which additional packages are necessary and when? + + What configure options should we list? See configure. Be sure to list + debugging option and how its output relates to config/LINUXgcc.suite.mk. + + config/arch has files + for (OS, compiler) pairs. Explain how to modify a configuration + file. List requirements when making a new configuration file (low + priority). + + config/LINUXgcc.suite.mk has output + from configure. Useful to + relate to configuration files and configure's debugging output. + + + + + + Compilation and &make; Files + + We assume Gnu make. Do we know what assumptions are made? + + How do all these files interact with each other? Ala a make + interpreter, give an example of which files are read and + when. + + + config/Shared/README.make + This has short descriptions of many files, + especially in config/Shared. + + makefile + These appear throughout all directories. What are + the equivalences classes and what are their + parts? + + include.mk + What does this do? Occurs in many directories: + when? Template seems to be config/Shared/include2.mk. + + subdir.mk + list of subdirectories; occurs in several + directories: when? src/subdir.mk is a good + example. + + + objfile.mk + + list of object files to construct, presumably from + *.cmpl.cpp files. + src/Utilities/objfile.mk is an + example. + + + config/Shared/rules.mk + most compiler rules + + config/head.mk + read at beginning of each + makefile? + + config/Shared/tail.mk + read at end of each makefile? + + config/Shared/variables.mk + Is this used? + + config/Shared/compilerules.mk + table of origin and target suffixes and commands + for conversion + + + + + + + + + &array;s + + Include src/Pooma/Arrays.h to use &array;s. + The implementation source code is in src/Array. + + FINISH: Define an array. Introduce its parts. + + ADD: some mention of the maximum supported number of + dimensions somewhere. + +
+ The &array; Container + + + Template Parameters + + + + + Parameter + Interpretation + + + + + Dim + dimension + + + T + array element type + + + EngineTag + type of computation engine object + + + +
+ + QUESTION: How do I introduce class type definitions, when + they are used, i.e., compile-time or run-time, and when + programmers should use them? + + + Compile-Time Types and Values + + + + + Type or Value + Interpretation + + + + + This_t + the &array; object's type + + + Engine_t + the &array; object's engine's type + + + EngineTag_t + indication of engine's category + + + Element_t + the type of the array elements, i.e., T + + + ElementRef_t + the type of a reference to an array element, + i.e., T&. Equivalently, the type to write to a + single element. + + + Domain_t + the array's domain's type, i.e., the type of the + union of all array indices + + + Layout_t + unknown + + + dimensions + integer equalling the number of dimensions, i.e., + Dim + + + rank + integer equalling the number of dimensions, i.e., + Dim; a synonym for + dimensions + + + +
+ +
+ Constructors and Destructors
+ 
+ 
+ Constructors and Destructors
+ 
+ 
+ 
+ 
+ Function
+ Effect
+ 
+ 
+ 
+ 
+ 
+ 
+ Array
+ 
+ 
+ 
+ Creates an array that will be resized
+ later.
+ 
+ 
+ 
+ 
+ Array
+ const Engine_t&
+ engine
+ 
+ 
+ Creates an array with an engine equivalent to
+ the engine. This array will have the
+ same values as engine. QUESTION: Why
+ would a user ever want to use this
+ constructor?
+ 
+ 
+ 
+ 
+ Array
+ 
+ const
+ Engine<Dim2, T2, EngineTag2>&
+ engine
+ 
+ 
+ const
+ Initializer& init
+ 
+ 
+ 
+ What does this do?
+ 
+ 
+ ADD ALL CONSTRUCTORS AND DESTRUCTORS.
+ 
+ 
+ 
+
+ + +
+ Initializers + + Add a table. +
+ + +
+ Element Access + + + &array; Element Access + + + + + Function + Effect + + + + + + + Element_t read + + + + unknown: See line 1839. + + + + + Element_t read + + const + Sub1& s1 + + + const + Sub2& s2 + + + + How does the version with template parameters, + e.g., Sub1 differ from the int + version? + + + + + Element_t operator() + + const + Sub1& s1 + + + const + Sub2& s2 + + + + How does this differ from read(const + Sub1& s1, const Sub2& s2)? + + + ADD ALL reads and + operator()s. + + + +
+
+ + +
+ Component Access
+ 
+ When an array stores elements having components, e.g., an
+ array of vectors, tensors, or arrays, the
+ comp member function returns an array consisting of the
+ specified components. The original and component array share the
+ same engine, so changing the values in one affects values in the
+ other.
+ 
+ For example, if an &n; × &n; array a
+ consists of three-dimensional real-valued vectors,
+ a.comp(1) returns an &n; × &n;
+ real-valued array of all the middle vector components. Assigning
+ to the component array will also modify the middle components of
+ the vectors in a.
+ 
+ 
+ &array; Component Access
+ 
+ 
+ 
+ 
+ Function
+ Effect
+ 
+ 
+ 
+ 
+ 
+ 
+ UNKNOWN compute this comp
+ 
+ const
+ int&
+ i1
+ 
+ 
+ 
+ unknown: See line 1989.
+ 
+ 
+ ADD ALL comps.
+ 
+ 
+ 
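+ 
+ DRAFT EXAMPLE (an unverified sketch; the element type name
+ Vector<3> and the extent-taking constructors are assumptions):
+ 
+ Array<2, Vector<3> > a(n, n);   // an n x n array of three-component vectors
+ a.comp(1) = 0.0;                // sets the middle component of every vector in a to zero
+ Array<2> middle(n, n);
+ middle = a.comp(1);             // copies the middle components into a real-valued array
+ 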
+
+ +
+ Accessors + + + &array; Accessor Methods + + + + + Function + Effect + + + + + + + int first + + int + d + + + + unknown: See line 2050 + + + ADD ALL other accessor methods, including + engine. + + + +
+
+ + +
+ Copying &array;s + + Explain how copied arrays and views of arrays share the + same underlying engine so changing values in one also affects the + other. This is called a shallow copy. +
+ + +
+ Utility Methods + + + &array; Utility Methods + + + + + Function + Effect + + + + + + + void makeOwnCopy + + + + unknown: See line 2044 + + + ADD ALL other utility methods. + + + +
+
+ + +
+ Implementation Details
+ 
+ As a container, an &array;'s implementation is quite
+ simple. Its private data consists of
+ an engine, and it has no private
+ functions.
+ 
+ 
+ &array; Implementation Data
+ 
+ 
+ 
+ 
+ Data Member
+ Meaning
+ 
+ 
+ 
+ 
+ 
+ 
+ private
+ Engine_t engine_m
+ 
+ 
+ engine computing the array's values
+ 
+ 
+ 
+ +
+
+ + +
+ &dynamicarray;s: Dynamically-Sized Domains + + A DynamicArray is a read-write array with extra + create/destroy methods. It can act just like a regular Array, but + can have a dynamically-changing domain. See src/DynamicArray/DynamicArray.h. + + ADD: Briefly describe what the class does and an example of + where it is used. + + ADD: Check that its interface is actually the same as for + &array;. + + ADD: Check that the operations on dynamic arrays are + actually the same as for &array;. See src/DynamicArray/DynamicArrayOperators.h, + src/DynamicArray/PoomaDynamicArrayOperators.h, + and src/DynamicArray/VectorDynamicArrayOperators.h. + + +
+ Implementation Details + + DynamicArray has no + protected or + private members. +
+
+ + +
+ Views of &array;s + + UNFINISHED +
+ + +
+ &array; Assignments
+ 
+ &pooma; supports assigning other &array;s and scalar
+ values to &array;s. QUESTION: Is the following correct? For the
+ former, the right-hand side array's domain must be at least as
+ large as the left-hand side array's domain. Corresponding values
+ are copied. Assigning a scalar value to an array sets all the
+ array elements to that scalar value.
+ 
+ UNFINISHED: Add a table containing assignment operators
+ found on lines 2097–2202.
+ 
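+ 
+ DRAFT EXAMPLE (an unverified sketch of the two kinds of assignment
+ described above; the declarations are illustrative):
+ 
+ Array<2> a(n, n), b(n, n);
+ a = 0.0;   // scalar assignment: every element of a becomes 0.0
+ a = b;     // array assignment: corresponding values of b are copied into a
+ 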
+ + +
+ Printing &array;s
+ 
+ &array;s support output to but not input from IO streams.
+ In particular, output to ostreams and file streams is
+ supported.
+ 
+ Add a table, using src/Array/Array.h, lines
+ 2408–2421. See the implementation in src/Array/PrintArray.h.
+ 
+ QUESTION: How does one print a &dynamicarray;?
+ 
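+ 
+ DRAFT EXAMPLE (assuming the ostream output described above for an
+ existing &array; a):
+ 
+ std::cout << a << std::endl;   // requires <iostream>; writes a's elements to standard output
+ 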
+ + +
+ Expressions Involving &array;s + + In &pooma;, expressions may contain entire &array;s. That + is, &array;s are first-class objects with respect to expressions. + For example, given &array;s a and + b, the expression a + b + is equivalent to an array containing the element-wise sum of the + two arrays. + + Any finite number of the operators listed below can be used + in an expression. The precedence and order of operation is the + same as with ordinary built-in types. + + QUESTION: Do &field;s also support the same set of + operations? + + QUESTION: Some operations in src/Field/FieldOperators.h use both + &array; and &field;. Do we list them here or in the &field; + section or both or somewhere else? + + In the table below, &array; supplants the exact return types + because they are complicated and rarely need to be explicitly + written down. + + + Operators on &array; + + + + + Operator + Value + + + + + + + + Array acos + const Array<Dim,T,EngineTag>& a + + + + an array containing the element-wise inverse + cosine of the array a + + + ADD ALL other operators appearing in src/Array/ArrayOperators.h, + src/Array/ArrayOperatorSpecializations.h, + src/Array/PoomaArrayOperators.h, + and src/Array/VectorArrayOperators.h. + + + +
+ + FINISH: Write one or two examples or refer to ones + previously in the text. +
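+ 
+ DRAFT EXAMPLE (an unverified sketch; a candidate for the FINISH item
+ above):
+ 
+ Array<2> a(n, n), b(n, n), c(n, n);
+ c = acos(a) + 2.0 * b - a;   // element-wise; precedence matches the built-in operators
+ 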
+ + +
+ Reducing All &array; Elements to One Value + + These reduction functions repeatedly apply a binary + operation to all array elements to yield a value. These functions + are similar to the Standard Template Library's + accumulate function. For example, + sum repeatedly applies the binary plus + operator to all array elements, yielding the sum of all array + elements. + + FINISH: What order of operation, if any, is + guaranteed? + + FINISH: Add a table of the functions in src/Array/Reductions.h. + + How does one use one's own binary function? See src/Engine/Reduction.h. +
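+ 
+ DRAFT EXAMPLE (an unverified sketch using the
+ sum reduction mentioned above):
+ 
+ Array<2> a(n, n);
+ a = 1.0;
+ double total = sum(a);   // repeatedly applies binary + to every element; here total == n * n
+ 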
+ + +
+ Utility Functions + +
+ Compressed Data + + Add a table containing + elementsCompressed, + compressed, compress, + and uncompress. +
+ + +
+ Centering Sizes and Number of Materials + + ADD: a description of numMaterials and + centeringSize found in src/Field/Field.h. These functions + are meaningless for &array; but are provided for consistency with + &field;. +
+ +
+ Obtaining Subfields + + ADD: a description of subField found + in src/Field/Field.h. + This function, meaningless for &array;, is provided for + consistency with &field;. +
+
+ + +
+ TMP: What do we do with these …? Remove this + section. + + QUESTION: Do we describe the &leaffunctor;s specialized for + &array;s in src/Array/Array.h or in the &pete; + reference section? What about the functions in src/Array/CreateLeaf.h? + + QUESTION: What is an EngineFunctor? We + probably should describe it in an analogous way as for + &leaffunctor;s. + + QUESTION: Where do we write about + ExpressionTraits for &array;s? + + QUESTION: Do we describe the ElementProperties + specialization at this place or in its section? + + QUESTION: Do we describe the Patch + specialization for &array;s (src/Array/Array.h:1300) in this + place or in a section for patches? +
+
+ + + + &field;s + + An &array; is a set of values indexed by + coordinates, one value per coordinate. It models the computer + science idea of an array. Similarly, a &field; is a set of values + indexed by coordinate. It models the mathematical and physical + idea of a field represented by a grid of rectangular cells, each + having at least one value. A &field;'s functionality is a superset + of an &array;'s functionality because: + + + A &field; is distributed through space so one can compute + the distances between cells. + + + Each cell can hold multiple values. For example, a + rectangular cell can have one value on each of its faces. + + + Multiple materials can share the same cell. For example, + different values can be stored in the same cell for carbon, + oxygen, and nitrogen. + + + Also, &field;s' values can be related by relations. Thus, if one + field's values change, a dependent field's values can be + automatically computed when needed. FIXME: See also the unfinished + works chapter's entry concerning relations and arrays. + + QUESTION: Should we add a picture comparing and contrasting + an array and a field? + + QUESTION: How much structure can be copied from the &array; + chapter? + + QUESTION: Where is NewMeshTag, defined in + src/Field/Field.h, + used? + + QUESTION: Do we describe the &leaffunctor;s specialized for + &field;s in src/Field/Field.h or in the &pete; + reference section? Use the same decision for &array;s. + + QUESTION: What do the structure and functions in src/Field/Mesh/PositionFunctions.h + do? + + +
+ The &field; Container + + ADD: table of template parameters and table of compile-time + types and values. + + +
+ Constructors and Destructors + + ADD: this section similar to &array;s's constructor and + destructor section. +
+ +
+ Initializers + + Add a table. +
+ + +
+ Element Access + + ADD: a table ala &array;. Be sure to include + all. +
+ + +
+ Component Access + + ADD: a table ala &array;. +
+ + +
+ Obtaining Subfields + + ADD: discussion and a table listing ways to obtain + subfields. Although the implementation may treat subfield views + and other field views similarly (?Is this true?), they are + conceptually different ideas so we present them + separately. + + See src/Field/Field.h's + operator[], + subField, …, + material. +
+ + +
+ Supporting Relations + + ADD: a table with the member functions including + addRelation, + removeRelations, + applyRelations, and + setDirty. +
+ + +
+ Accessors + + ADD: a table using lines like src/Field/Field.h:1243–1333. +
+ + +
+ Utility Methods + + ADD: a table including + makeOwnCopy. +
+ + +
+ Implementation Details + + ADD: a table similar to &array;'s. + +
+ +
+ + +
+ Views of &field;s + + Be sure to relate to &array; views. Note only three + dimensions are supported. + + Be sure to describe f[i]. Does this + refer to a particular material or a particular value within a + cell? I do not remember. See SubFieldView in + src/Field/Field.h. +
+ + +
+ &field; Assignments + + ADD: Describe supported assignments, relating to &array;'s + assignments. + + UNFINISHED: Add a table containing assignment operators + found on src/Field/Field.h:2097–2202 + and 1512–1611. +
+ + +
+ Printing &field;s + + QUESTION: How similar is this to printing &array;s? + + &field;s support output to but not input from IO streams. + In particular, output to ostreams and file streams is + supported. + + Add a table, using src/Field/Field.h, lines + 1996–2009. See the implementation in src/Field/PrintField.h. +
+ + +
+ Combining &field; Elements + + Like &array;s, &field;s support reduction of all elements to + one value. Additionally, the latter supports computing a field's + values using field stencils. QUESTION: How do I describe this + with a minimum of jargon? + + ADD: something similar to &array; reductions. + + FINISH: Add a table of the functions in src/Field/FieldReductions.h. + + FINISH: Add a table of the functions in src/Field/DiffOps/FieldOffsetReductions.h. + QUESTION: Why is only sum defined? +
+ + +
+ Expressions Involving &field;s + + Do something similar to &array;'s section. See the + operations defined in src/Field/FieldOperators.h, + src/Field/FieldOperatorSpecializations.h, + src/Field/PoomaFieldOperators.h, and + src/Field/VectorFieldOperators.h. + + Some operations involve both &array; and &field; + parameters. Where do we list them? +
+ + +
+ &field; Stencils: Faster, Local Computations
+ 
+ ADD: a description of a stencil. Why is it needed? How
+ does a user use it? How does a user know when to use one? Add
+ documentation of the material from src/Field/DiffOps/FieldStencil.h.
+ 
+ How is FieldShiftEngine used by &field;
+ stencils? Should it be described here or in the &engine; section?
+ See the code in src/Field/DiffOps/FieldShiftEngine.h.
+ 
+ + +
+ Cell Volumes, Face Areas, Edge Lengths, Normals + + ADD: a description of these functions. See src/Field/Mesh/MeshFunctions.h. + These are initialized in, e.g., src/Field/Mesh/UniformRectilinearMesh.h. + Note that these do not work for NoMesh. +
+ + +
+ Divergence Operators + + ADD: a table having divergence operators, explaining the + current restrictions imposed by what is implemented. See + src/Field/DiffOps/Div.h + and src/Field/DiffOps/Div.UR.h. What + restrictions does UR (mesh) connote? +
+ + +
+ Utility Functions + +
+ Compressed Data + + Add a table containing + elementsCompressed, + compressed, compress, + and uncompress. +
+ + +
+ Centering Sizes and Number of Materials + + ADD: a description of numMaterials and + centeringSize found in src/Field/Field.h. + + QUESTION: How do these relate to any method functions? +
+ + +
+ Obtaining Subfields + + ADD: a description of subField found + in src/Field/Field.h. +
+ +
+ + +
+ &field; Centerings + + DO: Describe the purpose of a centering and its definition. + Describe the ability to obtain canonical centerings. Explain how + to construct a unique centering. See src/Field/FieldDentering.h. +
+ + +
+ Relative &field; Positions + + Permit specifying field positions relative to a field + location. Describe FieldOffset and + FieldOffsetList. See src/Field/FieldOffset.h +
+ + +
+ Computing Close-by Field Positions + + Given a field location, return the set of field locations + that are closest using ?Manhattan? distance. See src/Field/NearestNeighbors.h. +
+ + +
+ Mesh ???
+ 
+ Unlike &array;s, &field;s are distributed throughout space,
+ so distances between values within the &field; can be computed. A
+ &field;'s mesh stores this spatial distribution.
+ 
+ QUESTION: What do we need to write about meshes? What is
+ unimportant implementation and what should be described in this
+ reference section?
+ 
+ QUESTION: Where in here should we emphasize vertex, not cell,
+ positions? VERTEX appears repeatedly in src/Field/Mesh/NoMesh.h.
+ 
+ 
+ Mesh Types
+ 
+ 
+ 
+ 
+ Mesh Type
+ Description
+ 
+ 
+ 
+ 
+ NoMesh<Dim>
+ no physical spacing, causing a &field; to mimic
+ an &array; with multiple engines.
+ 
+ 
+ UniformRectilinearMesh<Dim,T>
+ physical spacing formed by the Cartesian product
+ of ????.
+ 
+ 
+ 
+ + +
+ Mesh Accessors + + ADD: a table listing accessors, explaining the difference + between (physical and total) and (cell and vertex) domains. See + src/Field/Mesh/NoMesh.h. + Also, include spacings and + origin in src/Field/Mesh/UniformRectilinearMesh.h. + Note NoMesh does not provide the latter two. +
+ +
+ + +
+ TMP: What do we do with these …? Remove this + section. + + QUESTION: Do we describe the Patch + specialization for &field; at this place or in some common place? + Follow &array;'s lead. + + QUESTION: Where do we describe CreateLeaf and + MakeFieldReturn in src/Field/FieldCreateLeaf.h and + src/Field/FieldMakeReturn.h. + + QUESTION: What do we do with FieldEnginePatch + in src/Field/FieldEngine/FieldEnginePatch.h. +
+
+ + + + &engine;s + + From a user's point of view, a container makes data available + for reading and writing. In fact, the container's &engine; stores + the data or, if the data is computed, performs a computation to + yield the data. + + FINISH: Introduce the various types of engines. Add a table + with a short description of each engine type. + + FINISH: First, we specify a generic &engine;'s interface. + Then, we present &engine; specializations. + + + Types of &engine;s + + + + + Engine Type + Engine Tag + Description + + + + + Brick + Brick + Explicitly store all elements in, e.g., a &cc; + array. + + + Compressible + CompressibleBrick + If all values are the same, use constant storage + for that single value. Otherwise, explicitly store all + elements. + + + Constant + ConstantFunction + Returns the same constant value for all + indices. + + + Dynamic + Dynamic + Manages a contiguous, local, one-dimensional, + dynamically resizable block of data. + + + Component Forwarding + CompFwd<EngineTag, + Components> + Returns the specified components from + EngineTag's engine. Components are + pieces of multi-value elements such as vectors + and tensors. + + + Expression + ExpressionTag<Expr> + Returns the value of the specified &pete; + expression. + + + Index Function + IndexFunction<Functor> + Makes the function + Functoraccepting indices mimic an + array. + + + MultiPatch + MultiPatch<LayoutTag,PatchTag> + Support distributed computation using several + processors (???contexts???). LayoutTag + indicates how the entire array is distributed among the + processors. Each processor uses a PatchTag + engine. + + + Remote + Remote<EngineTag> + unknown + + + Remote Dynamic + Remote<Dynamic> + unknown: specialization + + + Stencil + StencilEngine<Function, + Expression> + Returns values computed by applying the + user-specified function to sets of contiguous values in the + given engine or container. Compare with user function + engines. + + + User Function + UserFunctionEngine<UserFunction,Expression> + Returns values computed by applying the + user-specified function to the given engine or container. + QUESTION: Is the following claim correct? For each returned + value, only one value from the engine or container is + used. + + + +
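+ 
+ DRAFT EXAMPLE (an unverified sketch showing that the
+ EngineTag template parameter selects the engine; the
+ default constructors create arrays to be resized later):
+ 
+ Array<2, double, Brick> a;              // explicit element storage
+ Array<2, double, CompressibleBrick> b;  // one stored value while all elements are equal
+ Array<2, double, ConstantFunction> c;   // the same constant value for every index
+ 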
+ + QUESTION: Where do we describe views? + + QUESTION: What does NewEngine do? Should it be + described when describing views? Should it be omitted as an + implementation detail? + + QUESTION: Where do we describe &engine; patches found in + src/Engine/EnginePatch.h? + All patch data in a separate chapter or engine-specific pieces in + this chapter? + + QUESTION: What is notifyEngineWrite? + See also src/Engine/NotifyEngineWrite.h. + + QUESTION: What aspect of MultiPatch uses IsValid in + src/Engine/IsValidLocation.h? + + QUESTION: Who uses intersections? Where should this be + described? See src/Engine/Intersector.h, src/Engine/IntersectEngine.h, and + src/Engine/ViewEngine.h. + +
+ &engine; Compile-Time Interface + + ADD: a table of template parameters ala &array;. ADD: + compile-time types and values. +
+ + +
+ Constructors and Destructors + + ADD: a table of constructors and destructors ala + &array;'s. +
+ + +
+ Element Access + + ADD: a table with read and + operator(). +
+ + +
+ Accessors + + ADD: a table of accessors. +
+ + +
+ &engine; Assignments + + similar to &array;'s assignments. shallow copies. ADD: a + table with one entry +
+ + +
+ Utility Methods + + ADD: a table including + makeOwnCopy. + + QUESTION: What are dataObject, + isShared, and related methods? +
+ + +
+ Implementation Details + + ADD: this section. Explain that + dataBlock_m and data_m point + to the same place. The latter speeds access, but what is the + purpose of the former? +
+ + +
+ Brick and BrickView Engines + + ADD: description of what a brick means. ADD: whatever + specializations the class has, e.g., + offset. + + QUESTION: What does DoubleSliceHelper do? +
+ + +
+ Compressible Brick and BrickView Engines + + ADD this. +
+ + +
+ Dynamic and DynamicView Engines: + + ADD this. Manages a contiguous, local, resizable, 1D block + of data. +
+ + +
+ Component Engines + + I believe these implement array component-forwarding. See + src/Engine/ForwardingEngine.h. +
+ + +
+ Expression Engines + + Should this be described in the &pete; section? Unlikely. + See src/Engine/ExpressionEngine.h. +
+ + +
+ &engine; Functors + + QUESTION: What is an EngineFunctor? Should it + have its own section? See src/Engine/EngineFunctor.h. +
+ + +
+ <type>FieldEngine</type>: A Hierarchy of &engine;s + + A &field; consists of a hierarchy of materials and + centerings. These are implemented using a hierarchy of engines. + See src/Field/FieldEngine/FieldEngine.h + and src/Field/FieldEngine/FieldEngine.ExprEngine.h. +
+
+ + + + &benchmark; Programs + + Explain how to use &benchmark; programs, especially the + options. Explain how to write a &benchmark; program. See also + src/Utilities/Benchmark.h + and src/Utilities/Benchmark.cmpl.cpp. + + + + + + Layouts and Partitions: Distribute Computation Among + Contexts + + What is the difference between ReplicatedTag and + DistributedTag? + + + + + + &pete;: Evaluating Parallel Expressions + +
+ UNKNOWN + +
+ Leaf Tag Classes + + NotifyPreReadTag indicates a term is about to + be read. Why is this needed? Defined in src/Utilities/NotifyPreRead.h. +
+
+ +
+ + + + Views + + QUESTION: Should this have its own chapter or be part of a + container chapter? + + Describe View0, View1, …, + View7 and View1Implementation. + + QUESTION: What causes the need for AltView0 and + AltComponentView? + + Be sure to describe ComponentView in the same + place. This is specialized for &array;s in src/Array/Array.h:1323–1382. + +
+ <type>ViewIndexer<Dim,Dim2></type> + + Defined in src/Utilities/ViewIndexer.h, this + type translates indices between a domain and a view of it. +
+
+ + + Threads + + Perhaps include information in src/Engine/DataObject.h. + + &pooma; options include UNFINISHED + + + + + + Utility Types + + TMP: What is a good order? + +
+ <type>Options</type>: Varying Run-Time Execution
+ 
+ Each &pooma; executable has an Options object,
+ created by Pooma::initialize, storing
+ run-time configurable values found in argv.
+ Default options are found in
+ Options::usage.
+ 
+ See src/Utilities/Options.h and
+ src/Utilities/Options.cmpl.cpp.
+ 
+ Scatter the specific options to other parts of the
+ manual.
+ 
+ +
+ Check Correctness: <type>CTAssert</type>,
+ <type>PAssert</type>, <type>PInsist</type>,
+ <type>SameType</type>
+ 
+ Assertions ensure program invariants are obeyed.
+ CTAssert, checked at compile time, incurs no run-time
+ cost. PAssert and PInsist are checked
+ at run time, the latter producing an explanatory message if the
+ assertion fails. Compiling with NOCTAssert and
+ NOPTAssert disables these checks. Compiling with just
+ NOPTAssert disables only the run-time checks.
+ 
+ SameType ensures, at compile time, that two types
+ are the same.
+ 
+ These are implemented in src/Utilities/PAssert.h and
+ src/Utilities/PAssert.cmpl.cpp.
+ 
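+ 
+ DRAFT EXAMPLE (an unverified sketch; the exact macro argument lists
+ should be checked against src/Utilities/PAssert.h):
+ 
+ PAssert(i >= 0 && i < size);                  // run-time check, disabled by NOPTAssert
+ PInsist(ptr != 0, "unexpected null pointer"); // run-time check with an explanatory message
+ 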
+ +
+ <type>Clock</type>: Measuring a Program's Execution Time + + See src/Utilities/Clock.h. +
+ + +
+ Smart Pointers: <type>RefCountedPtr</type>, + <type>RefCountedBlockPtr</type>, and + <type>DataBlockPtr</type> + + See src/Utilities/{RefCountedPtr,RefCountedBlockPtr,DataBlockPtr}.h. + src/Utilities/RefCounted.h + helps implement it. DataBlockPtr uses + &smarts;. +
+ +
+ <type>Inform</type>: Formatted Output for Multi-context + Execution + + See src/Utilities/Inform.h and src/Utilities/Inform.cmpl.cpp. +
+ +
+ <type>Statistics</type>: Report &pooma; Execution Statistics + + Collect and print execution statistics. Defined in + src/Utilities/Statistics.h. +
+ +
+ Random Numbers: <type>Unique</type> + + See src/Utilities/Unique.h. +
+
+ + + + Types for Implementing &pooma; + + TMP: What is a good order? + + Describe types defined to implement &pooma; but that users do + not directly use. This chapter has lower priority than other + chapters since users (hopefully) do not need to know about these + classes. + +
+ <type>Tester</type>: Check Implementation Correctness
+ 
+ &pooma; implementation test programs frequently consist of a
+ series of operations followed by correctness checks. The
+ Tester object supports these tests, returning a
+ boolean indicating whether all the correctness checks yield true. Under
+ verbose output, messages are printed for each test. See src/Utilities/Tester.h.
+ 
+ +
+ <type>ElementProperties<T></type>: Properties a Type + Supports + + This traits class permits optimizations in other templated + classes. See src/Utilities/ElementProperties.h. + +
+ +
+ <type>TypeInfo<T></type>: Print a String Describing + the Type + + Print a string describing the type. Defined in src/Utilities/TypeInfo.h. It is + specialized for other types in other files, e.g., src/Engine/EngineTypeInfo.h and + src/Field/FieldTypeInfo.h. + Is this a compile-time version of RTTI? +
+ +
+ <type>LoopUtils</type>: Loop Computations at Compile Time + + At compile time, LoopUtils supports copying + between arrays and computing the dot product of arrays. See + src/Utilities/MetaProg.h. +
+ +
+ <type>ModelElement<T></type>: Wrap a Type + + A wrapper class used to differentiate overloaded functions. + Defined in src/Utilities/ModelElement.h. Used + only by &array; and DynamicArray. +
+ +
+ <type>WrappedInt<int></type>: Wrap a Number + + A wrapper class used to differentiate overloaded functions + among different integers. Defined in src/Utilities/WrappedInt.h. Is this + class deprecated? Is it even necessary? +
+ +
+ Supporting Empty Classes + + The NoInit tag class indicates certain + initializations should be skipped. Defined in src/Utilities/NoInit.h. + + FIXME: Should be macro, not function. + POOMA_PURIFY_CONSTRUCTORS generates an empty + constructor, copy constructor, and destructor to avoid &purify; + warnings. Defined in src/Utilities/PurifyConstructors.h. + +
+ +
+ <type>Pooled<T></type>: Fast Memory Allocation of + Small Blocks + + Pooled<T> speeds allocation and + deallocation of memory blocks for small objects with + type T. Defined in src/Utilities/Pooled.h, it is + implemented in src/Utilities/Pool.h and src/Utilities/Pool.cmpl.cpp. + src/Utilities/StaticPool.h + no longer seems to be used. +
+ +
+ <type>UninitializedVector<T,Dim></type>: Create + Without Initializing + + This class optimizes creation of an array of objects by + avoiding running the default constructors. Later initialization + can occur, perhaps using a loop that can be unrolled. Defined in + src/Utilities/UninitializedVector.h, + this is used only by DomainTraits. +
+
+ + + Algorithms for Implementing &pooma; + + In src/Utilities/algorithms.h, + copy, delete_back, and + delete_shiftup provide additional algorithms + using iterators. + + + + + TMP: Where do we describe these files? + + + + src/Utilities/Conform.h: tag for + checking whether terms in expression have conforming + domains + + + + src/Utilities/DerefIterator.h: + DerefIterator<T> and + ConstDerefIterator<T> automatically + dereference themselves to maintain const + correctness. + + + + src/Utilities/Observable.h, + src/Utilities/Observer.h, + and src/Utilities/ObserverEvent.h: + Observable<T>, + SingleObserveable<T>, + Observer<T>, and ObserverEvent + implement the observer pattern. What is the observer pattern? + Where is this used in the code? + + + + + +
+ + + + Future Development + +
+ Particles + + docs/ParticlesDoc.txt has + out-of-date information. + + See Section 3.2.3 of + papers/pooma.ps for an out-of-date + description. + + papers/jvwr.ps concerns mainly + particles. papers/8thSIAMPOOMAParticles.pdf, + by Julian Cummings and Bill Humphrey, concerns parallel particle + simulations. papers/iscope98linac.pdf + describes a particle beam simulation using &pooma;; it mainly + concerns particles. + +
+ Particles + + Do we want to include such a section? + + Section 3, "Sample Applications" of + papers/SiamOO98_paper.ps describes porting a + particle program written using High-Performance Fortran to + &pooma; and presumably why particles were added to &pooma;. It + also describes MC++, a Monte Carlo + neutron transport code. + +
+ +
+ + +
+ Composition of &engine;s
+ 
+ The i,j-th element of the composition
+ ab of two arrays
+ a and b equals a(b(i,j)).
+ The composition engine tagged IndirectionTag<Array1,
+ Array2>, defined in src/Engine/IndirectionEngine.h, is
+ unfinished.
+ 
+ + +
+ Improving Consistency of Container Interfaces + +
+ Relations for &array;s + + Do &array;s currently support relations? If not, why not? + Should they be added? +
+ +
+ Supporting the Same Number of Dimensions + + &array; and &field; should support the same maximum number + of dimensions. Currently, &array;s support seven dimensions and + &field;s support only three. By definition, &dynamicarray; + supports only one dimension. + + Relations for &array;s. + + External guards for &array;s. + + QUESTION: What is tiny about &matrix;? Should + they be renamed? +
+ +
+ + +
+ <function>where</function> Proxies + + QUESTION: Do we even discuss this broken + feature? Where is it used? Some related code is in + src/Array/Array.h:2511–2520. +
+ + +
+ Easing Input for Distributed Programs
+ 
+ Currently, standard input to distributed programs is not
+ supported. Instead, input can be passed via command-line arguments,
+ which are replicated to each context. Support for input through
+ &inform; streams could be added. For context 0, standard input could be
+ used. Other contexts would use a RemoteProxy to
+ distribute the value to the other contexts. See src/Engine/RemoteEngine.h for example
+ uses of RemoteProxy.
+ 
+ + +
+ Improving Consistency Between &pooma; and &cheetah; + + Improve the consistency between &cheetah;'s and &pooma;'s + configurations. Currently, their defaults differ regarding + &cc; exceptions and static/shared libraries. +
+ + +
+ Very Long Term Development Ideas + + Describe how to write a new configuration file. +
+ +
+ + + + Obtaining and Installing &pooma; + + ADD: Write this section, including extensive instructions + for Unix, MS Windows, and MacOS. List the configuration options. + Be sure to describe configuring for parallel execution. + +
+ Supporting Distributed Computation
+ 
+ Using multiple processors with &pooma; requires installing
+ the &cheetah; messaging library and an underlying messaging library
+ such as the Message Passing Interface (&mpi;) Communications
+ Library or the &mm; Shared Memory Library. In this section, we
+ first describe how to install &mm;. Read that section only if using
+ &mm;, not &mpi;. Then we describe how to install &cheetah; and
+ configure &pooma; to use it.
+ 
+ Obtaining and Installing the &mm; Shared Memory Library
+ 
+ &cheetah;, and thus &pooma;, can use Ralf Engelschall's &mm;
+ Shared Memory Library to pass messages between processors. For
+ example, the &author; uses this library on a two-processor
+ computer running &linux;. The library, available at
+ http://www.engelschall.com/sw/mm/, is free and has
+ been successfully tested on a variety of Unix platforms.
+ 
+ We describe how to download and install the &mm; library.
+ 
+ 
+ Download the library from the &pooma; Download page
+ available off the &pooma; home page (&poomaHomePage;).
+ 
+ 
+ Extract the source code using tar xzvf
+ mm-1.1.3.tar.gz. Move into the resulting source
+ code directory mm-1.1.3.
+ 
+ 
+ Prepare to compile the source code by configuring it
+ using the configure command. To change
+ the default installation directory /usr/local, specify the
+ &dashdash;prefix=directory
+ option. The other configuration options can be listed by
+ specifying the &dashdash;help option. Since the
+ &author; prefers to keep all &pooma;-related code in his
+ pooma subdirectory, he
+ uses ./configure
+ &dashdash;prefix=${HOME}/pooma/mm-1.1.3.
+ 
+ 
+ Create the library by issuing the make
+ command. This compiles the source code using a &c; compiler. To
+ use a different compiler than the &mm; configuration chooses, set
+ CC to the compiler before configuring.
+ 
+ 
+ Optionally test the library by issuing the make
+ test command. If successful, the penultimate line
+ should be OK - ALL TESTS SUCCESSFULLY
+ PASSED.
+ 
+ 
+ Install the &mm; Library by issuing the make
+ install command. This copies the library files to the
+ installation directory. The mm-1.1.3 directory containing the
+ source code may now be removed.
+ 
+ 
+ + +
+ Obtaining and Installing the &cheetah; Messaging Library + + The &cheetah; Library decouples communication from + synchronization. Using asynchronous messaging rather than + synchronous messaging permits a message sender to operate without + the cooperation of the message recipient. Thus, implementing + message sending is simpler and processing is more efficiently + overlapped with it. Remote method invocation is also supported. + The library was developed at the Los Alamos National Laboratory's + Advanced Computing Laboratory. + + &cheetah;'s messaging is implemented using an underlying + messaging library such as the Message Passing Interface (&mpi;) + Communications Library (FIXME: xref linkend="mpi99", ) or the &mm; + Shared Memory Library. &mpi; works on a wide variety of platforms + and has achieved widespread usage. &mm; works under Unix on any + computer with shared memory. Both libraries are available for + free. The instructions below work for whichever library you + choose. + + We describe how to download and install &cheetah;. + + + Download the library from the &pooma; Download page + available off the &pooma; home page (&poomaHomePage;). + + + Extract the source code using tar xzvf + cheetah-1.0.tgz. Move into the resulting source code + directory cheetah-1.0. + + + Edit a configuration file corresponding to your operating + system and compiler. These .conf files are located in the + config directory. For + example, to use &gcc; with the &linux; operating system, use + config/LINUXGCC.conf. + + The configuration file usually does not need + modification. However, if you are using &mm;, ensure + shmem_default_dir specifies its location. + For example, the &author; modified the value to + "/home/oldham/pooma/mm-1.1.3". + + + Prepare to compile the source code by configuring it + using the configure command. Specify the + configuration file using the &dashdash;arch option. + Its argument should be the configuration file's name, omitting + its .conf suffix. For + example, &dashdash;arch LINUXGCC. Some other + options include + + + &dashdash;help + + lists all the available options + + + + &dashdash;shmem &dashdash;nompi + + indicates use of &mm;, not &mpi; + + + + &dashdash;mpi &dashdash;noshmem + + indicates use of &mpi;, not &mm; + + + + &dashdash;opt + + causes the compiler to produce optimized source code + + + + &dashdash;noex + + prevents use of &cc; exceptions + + + + &dashdash;static + + creates a static library, not a shared library + + + + &dashdash;shared + + creates a shared library, not a static library. This + is the default. + + + + &dashdash;prefix directory + + specifies the installation directory where the + library will be copied rather than the default. + + + + For example, the &author; uses ./configure &dashdash;arch + LINUXGCC &dashdash;shmem &dashdash;nompi &dashdash;noex &dashdash;static &dashdash;prefix + ${HOME}/pooma/cheetah-1.0 &dashdash;opt. The + &dashdash;arch LINUXGCC indicates use of &gcc; + under a &linux; operating system. The &mm; library is used, + but &cc; exceptions are not. The latter choice matches + &pooma;'s default choice. A static library, not a shared + library, is created. This is also &pooma;'s default choice. + The library will be installed in the ${HOME}/pooma/cheetah-1.0. + Finally, the library code will be optimized, hopefully running + faster than unoptimized code. 
+ + + Follow the directions printed by + configure: Change directories to the + lib subdirectory named + by the &dashdash;arch argument and then type + make to compile the source code and create + the library. + + + Optionally ensure the library works correctly by issuing + the make tests command. + + + Install the library by issuing the make + install command. This copies the library files to + the installation directory. The cheetah-1.0 directory containing + the source code may now be removed. + + + +
+ +
+ Configuring &pooma; When Using &cheetah; + + To use &pooma; with &cheetah;, one must tell &pooma; the + location of the &cheetah; library using the + &dashdash;messaging configuration option. To do this, + + + Set the &cheetah; directory environment variable + CHEETAHDIR to the directory containing the + installed &cheetah; library. For + example, declare -x + CHEETAHDIR=${HOME}/pooma/cheetah-1.0 specifies the + installation directory used in the previous section. + + + When configuring &pooma;, specify the + &dashdash;messaging option. For example, + ./configure &dashdash;arch LINUXgcc &dashdash;opt + &dashdash;messaging configures for &linux;, &gcc;, and an + optimized library using &cheetah;. + + + +
+
+
+ + + + Dealing with Compilation Errors + + Base this low-priority section on errors.html. QUESTION: Where is + errors.html? + + + + + + TMP: Notes to Myself + +
+ Miscellaneous + + + + If there is time, present another example program, e.g., a + Jacobi solver. + + + + If a reference manual for &pooma; implementors is written, + begin with a chapter Under the Hood: How &pooma; + Works, written from the point of view of &cc; + interpreter. For &pete;, use the material in + papers/PETE_DDJ/ddj_article.html, which + gives example code and descriptions of how the code works, and + see material in background.html's + Expression Templates. + + + + QUESTION: How do &pooma; parallel concepts compare with + Fortran D or high-performance Fortran FINISH CITE: + {koelbel94:_high_perfor_fortr_handb}? + + + + QUESTION: How do I know when to use a type name versus just + the concept? For example, when do I use array + versus &array;? + + + + Krylov solvers are described in Section 3.5.2 of + papers/pooma.ps. + + + + Section 5, "The Polygon Overlay Problem," describes + porting an ANSI &c; program to &pooma;. + + + + A good example book: STL Tutorial and Reference + Guide: &cc; Programming with the Standard Template + Library, second edition, by David R. Musser, + Gillmer J. Derge, and Atul Sanai, ISBN 0-201-37923-6, + QA76.73.C153.M87 2001. + + + + One STL reference book listed functions in margin notes, + easing finding material. Do this. + + + + QUESTION: Does Berna Massingill at Trinity University have + any interest ior access to any parallel computers? + + + +
+ + +
+ Existing HTML Tutorials + + All these tutorials are out-of-date, but the ideas and text + may still be relevant. + + + index.html + list of all tutorials. No useful + material. + + introduction.html + data-parallel Laplace solver using Jacobi + iteration ala Doof2d + + background.html + short, indirect introduction to &pete;; parallel + execution model; &cc;; templates; &stl;; expression + templates + + tut-01.html + UNFINISHED + + Layout.html + UNFINISHED + + parallelism.html + UNFINISHED + + self-test.html + UNFINISHED + + threading.html + UNFINISHED + + tut-03.html + UNFINISHED + + tut-04.html + UNFINISHED + + tut-05.html + UNFINISHED + + tut-06.html + UNFINISHED + + tut-07.html + UNFINISHED + + tut-08.html + UNFINISHED + + tut-09.html + UNFINISHED + + tut-10.html + UNFINISHED + + tut-11.html + UNFINISHED + + tut-12.html + UNFINISHED + + tut-13.html + UNFINISHED + + + +
+ +
+
+
+
+
+
+ Bibliography
+
+ FIXME: How do I process these entries?
+
+
+ mpi99
+
+ William Gropp, Ewing Lusk, and Anthony Skjellum.
+ Using MPI: Portable Parallel Programming with the
+ Message-Passing Interface, second edition.
+ The MIT Press, Cambridge, MA, 1999.
+ Copyright 1999 Massachusetts Institute of Technology.
+ ISBN 0-262-57132-3.
+
+ + + &glossary-chapter; + + + + &genindex.sgm; + +
Index: outline.xml =================================================================== RCS file: outline.xml diff -N outline.xml *** /tmp/cvs1ybeCp Tue Dec 11 13:31:11 2001 --- /dev/null Fri Mar 23 21:37:44 2001 *************** *** 1,4287 **** - - - - - - - - - - - - - - - - - - - C"> - - - C++"> - - - Cheetah" > - - Doof2d" > - - Make"> - - MM"> - - MPI"> - - PDToolkit"> - - PETE"> - - POOMA"> - - POOMA Toolkit"> - - Purify"> - - Smarts"> - - - STL"> - - Tau"> - - - - - Array"> - - Benchmark"> - - Brick"> - - CompressibleBrick"> - - DistributedTag"> - - Domain"> - - double"> - - DynamicArray"> - - Engine"> - - Field"> - - Interval"> - - Layout"> - - LeafFunctor"> - - MultiPatch"> - - ReplicatedTag"> - - Stencil"> - - Vector"> - - - - - - - - - g++"> - - KCC"> - - Linux"> - - - - - http://pooma.codesourcery.com/pooma/download'> - - - http://www.pooma.com/'> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]> - - - - &pooma; - A &cc; Toolkit for High-Performance Parallel Scientific Computing - JeffreyD.Oldham - - CodeSourcery, LLC - - - 2001CodeSourcery, LLC () - Los Alamos National Laboratory - - - All rights reserved. This document may not be redistributed in any form without the express permission of the author. - - - - 0.01 - 2001 Nov 26 - jdo - first draft - - - - - - - - - Preface - - FINISH: Describe the target audience for &pooma; programs and - for this manual: &cc; programmers writing scientific code, possibly - parallel execution. - - Assume familiarity with &cc; template programming and the - standard template library. FIXME: Remove this index - entry.Oldham, - Jeffrey D. - -
- Notation - - UNFINISHED -
- - -
- How to Read This &Book; - - FINISH: Write this section in a style similar to Lamport's - LaTeX section 1.2. FINISH: Fix the book title and the section - number. -
- - -
- Obtaining &pooma; and Sample Programs - - Available for free from what WWW site? Include what portions - of LICENSE? Be sure to - include CVS instructions as well. - - Which additional packages are necessary and when? - -
- - -
- Using and Modifying &pooma; - - &pooma; is available under open source license. It can be - used and modified by anyone, anywhere. Can it be sold? Include - LICENSE. - - QUESTION: How do developers contribute code? - -
- -
- - - - Programming with &pooma; - - - Introduction - - QUESTION: Add a partintro to the part above? - - &pooma; abbreviates Parallel Object-Oriented Methods - and Application. - - This document is an introduction to &pooma; v2.1, a &cc; - toolkit for high-performance scientific computation. &pooma; - runs efficiently on single-processor desktop machines, - shared-memory multiprocessors, and parallel supercomputers - containing dozens or hundreds of processors. What's more, by making - extensive use of the advanced features of the ANSI/ISO &cc; - standard—particularly templates—&pooma; presents a - compact, easy-to-read interface to its users. - - From Section  of - papers/iscope98.pdf: - - Scientific software developers have struggled with the need - to express mathematical abstractions in an elegant and maintainable - way without sacrificing performance. The &pooma; (Parallel - Object-Oriented Methods and Applications) framework, written in - ANSI/ISO &cc;, has - demonstrated both high expressiveness and high performance for - large-scale scientific applications on platforms ranging from - workstations to massively parallel supercomputers. &pooma; provides - high-level abstractions for multidimensional arrays, physical - meshes, mathematical fields, and sets of particles. &pooma; also - exploits techniques such as expression templates to optimize serial - performance while encapsulating the details of parallel - communication and supporting block-based data compression. - Consequently, scientists can quickly assemble parallel simulation - codes by focusing directly on the physical abstractions relevant to - the system under study and not the technical difficulties of - parallel communication and machine-specific optimization. - - ADD: diagram of science and &pooma;. See the diagram that - Mark and I wrote. - - -
- Evolution of &pooma;
-
- QUESTION: Is this interesting?  Even if it is, it should be
- short.
-
- The file papers/SCPaper-95.html
- describes &pooma; 1 and its abstraction layers.
-
- The "Introduction" of
- papers/Siam0098.ps describes the DoE's
- funding motivation for &pooma;: Accelerated Strategic Computing
- Initiative (ASCI) and Science-based Stockpile Stewardship (SBSS),
- pp. 1–2.
-
- See the list of developers on p. 1 of
- papers/pooma.ps.  See history and motivation
- on p. 3 of papers/pooma.ps.
-
- Use README for
- information.
-
- introduction.html - - &pooma; was designed and implemented by scientists working - at the Los Alamos National Laboratory's Advanced Computing - Laboratory. Between them, these scientists have written and tuned - large applications on almost every commercial and experimental - supercomputer built in the last two decades. As the technology - used in those machines migrates down into departmental computing - servers and desktop multiprocessors, &pooma; is a vehicle for its - designers' experience to migrate as well. In particular, - &pooma;'s authors understand how to get good performance out of - modern architectures, with their many processors and multi-level - memory hierarchies, and how to handle the subtly complex problems - that arise in real-world applications. -
- -
- -
- - - - A Tutorial Introduction - - UPDATE: In the following paragraph, fix the cross-reference - to the actual section. - - &pooma; provides different containers and processor - configurations and supports different implementation styles, as - described in . In this - chapter, we present several different implementations of the - &doof2d; two-dimensional diffusion simulation program: - - - a C-style implementation omitting any use of &pooma; - computing each array element individually, - - - a &pooma; &array; implementation computing each array - element individually, - - - a &pooma; &array; implementation using data-parallel - statements, - - - a &pooma; &array; implementation using stencils, which - support local computations, - - - a stencil-based &pooma; &array; implementation supporting - computation on multiple processors - - - a &pooma; &field; implementation using data-parallel - statements, and - - - a data-parallel &pooma; &field; implementation for - multi-processor execution. - - - - These illustrate the &array;, &field;, &engine;, layout, - mesh, and domain data types. They also illustrate various - immediate computation styles (element-wise accesses, data-parallel - expressions, and stencil computation) and various processor - configurations (one sequential processor and multiple - processors). - -
- &doof2d; Averagings - - - - - - The Initial Configuration - - - - - - - - After the First Averaging - - - - - - - - After the Second Averaging - - -
-
- The &doof2d; diffusion program starts with a two-dimensional
- grid of values.  To model an initial density, all grid values are
- zero except for one nonzero value in the center.  During each
- averaging, each grid element except the outermost ones updates its
- value to the average of its value and its eight neighbors' values.
- To avoid overwriting grid values before all their uses occur, we
- use two arrays, reading the first and writing the second and then
- reversing their roles within each iteration.
-
- Figure
- illustrates the averagings.  Initially, only the center element has
- a nonzero value.  To form the first averaging, each element's new
- value equals the average of its and its neighbors' previous values.
- Thus, the initial nonzero value spreads to a three-by-three grid.
- The averaging continues, spreading to a five-by-five grid of
- nonzero values.  Values in outermost grid cells are always
- zero.
-
- Before presenting various implementations of &doof2d;, we
- explain how to install the &poomaToolkit;.
-
- REMOVE: &doof2d; algorithm and code is illustrated in
- Section 4.1 of
- pooma-publications/pooma.ps.  It includes a
- figure illustrating parallel communication of data.
-
- Installing &pooma;
-
- ADD: How does one install &pooma; using Windows or Mac?
-
- UPDATE: Make a more recent &pooma; source code file
- available on &poomaDownloadPage;.  For example,
- LINUXgcc.conf is not available.
-
- In this section, we describe how to obtain, build, and
- install the &poomaToolkit;.  We focus on installing under the
- Unix operating system.  Instructions for installing on computers
- running Microsoft Windows or MacOS, as well as more extensive
- instructions for Unix, appear in .
-
- Obtain the &pooma; source code &poomaSourceFile;
- from the &pooma; download page (&poomaDownloadPage;) available off
- the &pooma; home page (&poomaHomePage;).  The tgz
- indicates this is a compressed tar archive file.  To extract the
- source files, use tar xzvf &poomaSourceFile;.
- Move into the resulting &poomaSource; source code directory; e.g.,
- cd &poomaSource;.
-
- Configuring the source code prepares the necessary paths for
- compilation.  First, determine a configuration file corresponding
- to your operating system and compiler in the
- config/arch/ directory.
- For example, LINUXgcc.conf supports compiling
- under a &linux; operating system with &gcc; and SGI64KCC.conf supports compiling
- under a 64-bit SGI Unix operating
- system with &kcc;.  Then, configure the source code:
- ./configure --arch LINUXgcc --opt --suite
- LINUXgcc-opt.  The architecture argument to the
- --arch option is the name of the corresponding
- configuration file, omitting its .conf suffix.  The
- --opt indicates the &poomaToolkit; will
- contain optimized code, which makes the code run more
- quickly but may impede debugging.  Alternatively, the
- --debug option supports debugging.  The
- suite name
- can be an arbitrary string.  We chose
- LINUXgcc-opt to remind us of the architecture
- and optimization choice.  configure creates subdirectories
- named by the suite name LINUXgcc-opt for use when
- compiling the source files.  Comments at the beginning of
- lib/suiteName/PoomaConfiguration.h
- record the configuration arguments.
-
- To compile the source code, set the
- POOMASUITE environment variable to the suite name
- and then type make.  To set the environment
- variable for the bash shell use
- export
- POOMASUITE=suiteName,
- substituting the suite name for
- suiteName.  For the
- csh shell, use setenv
- POOMASUITE LINUXgcc-opt.  Issuing the
- make command compiles the &pooma; source code
- files to create the &pooma; library.  The &pooma; makefiles assume
- the GNU &make; so substitute the proper
- command if necessary.  The &pooma; library can be found in, e.g.,
- lib/LINUXgcc-opt/libpooma-gcc.a.
-
- -
- Hand-Coded Implementation
-
- Before implementing &doof2d; using the &poomaToolkit;, we
- present a hand-coded implementation of &doof2d;.  See .  After querying the
- user for the number of averagings, the arrays' memory is
- allocated.  Since the arrays' size is not known at compile time,
- the arrays are accessed via pointers to allocated dynamic memory.
- This memory is deallocated at the program's end to avoid memory
- leaks.  The arrays are initialized with initial conditions.  For
- the b array, all values are zero except the
- central one, which is nonzero.  Only the outermost values of the
- a array need be initialized to zero, but we
- instead initialize them all using the loop used by
- b.
-
- The simulation's kernel consists of triply nested loops.
- The outermost loop controls the number of iterations.  The inner
- nested loops iterate through the arrays' elements, excepting the
- outermost elements; note the loop indices range from 1 to n-2
- while the array indices range from 0 to n-1.  Each
- a value is assigned the average of its
- corresponding value in b and the latter's
- neighbors.  Values in the two-dimensional grids are accessed using
- two sets of brackets, e.g., a[i][j].  After
- assigning values to a, a second averaging reads
- values in a, writing values in
- b.
-
- After the kernel finishes, the final central value is
- printed.  If the desired number of averagings is even, the value
- in b is printed; otherwise, the value in
- a is used.  Finally, the dynamically-allocated
- memory must be freed to avoid memory leaks.
-
-
- Hand-Coded Implementation of &doof2d;
- &doof2d-c-element;
-
-
- The user specifies the desired number of averagings.
-
-
- These variables point to the two-dimensional,
- dynamically-allocated grids so we use a pointer to a pointer to
- a &double;.
-
-
- The user enters the desired grid size.  The grid will be
- a square with n by n grid cells.
-
-
- Memory for the arrays is allocated.  By default, the
- array indices are zero-based.
-
-
- Initially, all grid values are zero except for the one
- nonzero value at the center of the second array.  Array
- positions are indicated using two brackets, e.g.,
- a[i][j].  A better implementation might
- initialize only the outermost values of the
- a array.
-
-
- These constants indicate the number of iterations and
- the average weighting.
-
-
- Each a value, except an outermost one,
- is assigned the average of its analogous b
- value and that value's neighbors.  Note the loop indices ensure
- the outermost values are not changed.  The
- weight's value ensures the computation is an
- average.
-
-
- The second averaging computes b's
- values using values stored in a.
-
-
- After the averagings finish, the central value is printed.
-
-
- The dynamically-allocated memory must be deallocated to
- avoid memory leaks.
-
-
-
-
- To compile the executable, change directories to the &pooma;
- &poomaExampleDirectory;/Doof2d
- directory.  Ensure the POOMASUITE environment
- variable specifies the desired suite name
- suiteName, as we did when compiling
- &pooma; in the previous section .  Issuing the
- make Doof2d-C-element command creates the
- executable
- suiteName/Doof2d-C-element.
-
- When running the executable, specify the desired
- nonnegative number of averagings and the nonnegative number of
- grid cells along any dimension.  The resulting grid has the same
- number of cells along each dimension.  After the executable
- finishes, the resulting value of the central element is
- printed.
-
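-
- The complete listing is referenced above as &doof2d-c-element;.
- The following is only a minimal sketch of the same kernel, written
- for this discussion rather than copied from the example; the
- initial central value (1000.0) and the I/O are illustrative.
-
- #include <iostream>
-
- int main()
- {
-   int nuAveragings, n;
-   std::cin >> nuAveragings >> n;        // number of averagings and grid size
-
-   // Dynamically allocated n-by-n grids, accessed through pointers.
-   double** a = new double*[n];
-   double** b = new double*[n];
-   for (int i = 0; i < n; ++i) {
-     a[i] = new double[n];
-     b[i] = new double[n];
-     for (int j = 0; j < n; ++j)
-       a[i][j] = b[i][j] = 0.0;
-   }
-   b[n/2][n/2] = 1000.0;                 // one nonzero value in the center
-
-   const double weight = 1.0 / 9.0;
-   for (int k = 0; k < nuAveragings; ++k) {
-     double** src = (k % 2 == 0) ? b : a;   // the arrays' roles alternate
-     double** dst = (k % 2 == 0) ? a : b;
-     for (int i = 1; i <= n-2; ++i)         // outermost elements unchanged
-       for (int j = 1; j <= n-2; ++j)
-         dst[i][j] = weight *
-           (src[i-1][j-1] + src[i-1][j] + src[i-1][j+1] +
-            src[i  ][j-1] + src[i  ][j] + src[i  ][j+1] +
-            src[i+1][j-1] + src[i+1][j] + src[i+1][j+1]);
-   }
-
-   // An even number of averagings leaves the result in b.
-   std::cout << ((nuAveragings % 2 == 0) ? b[n/2][n/2] : a[n/2][n/2])
-             << std::endl;
-
-   // Free the dynamically allocated memory.
-   for (int i = 0; i < n; ++i) { delete[] a[i]; delete[] b[i]; }
-   delete[] a;
-   delete[] b;
-   return 0;
- }
-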
- - -
- Element-wise &array; Implementation
-
- The simplest way to use the &poomaToolkit; is to
- use the &pooma; &array; class instead of &c; arrays.  &array;s
- automatically handle memory allocation and deallocation, support a
- wider variety of assignments, and can be used in expressions.
- implements &doof2d; using &array;s and element-wise accesses.
- Since the same algorithm is used as , we will concentrate
- on the differences.
-
-
- Element-wise &array; Implementation of &doof2d;
- &doof2d-array-element;
-
-
- To use &pooma; &array;s, the Pooma/Arrays.h header must be
- included.
-
-
- The &poomaToolkit; structures must be constructed before
- their use.
-
-
- Before creating an &array;, its domain must be specified.
- The N interval represents the
- one-dimensional integral set {0, 1, 2, …, n-1}.  An
- Interval<2> object represents the entire
- two-dimensional index domain.
-
-
- An &array;'s template parameters indicate its dimension,
- its value type, and how the values will be stored or computed.
- The &brick; &engine; type indicates values will be directly
- stored.  It is responsible for allocating and deallocating
- storage so new and
- delete statements are not necessary.
- The vertDomain specifies the array index
- domain.
-
-
- The first statement initializes all &array; values to the
- same scalar value.  This is possible because each &array;
- knows its domain.  The second statement
- illustrates &array; element access.  Indices, separated by
- commas, are surrounded by parentheses rather than surrounded by
- square brackets ([]).
-
-
- &array; element access uses parentheses, rather than
- square brackets
-
-
- &pooma; may reorder computation of statements.  Calling
- Pooma::blockAndEvaluate ensures all
- computation finishes before accessing a particular array
- element.
-
-
- Since &array;s are first-class objects, they
- automatically deallocate any memory they require, eliminating
- memory leaks.
-
-
- The &poomaToolkit; structures must be destructed after
- their use.
-
-
-
-
- We describe the use of &array; and the &poomaToolkit; in
- .
- &array;s, declared in the Pooma/Arrays.h header, are first-class
- objects.  They know their index domain, can be used
- in expressions, can be assigned scalar and array values, and
- handle their own memory allocation and deallocation.
-
- The creation of the a and
- b &array;s requires an object specifying their
- index domains.  Since these are two-dimensional arrays, their
- index domains are also two dimensional.  The two-dimensional
- Interval<2> object is the Cartesian product of
- two one-dimensional Interval<1> objects, each
- specifying the integral set {0, 1, 2, …, n-1}.
-
- An &array;'s template parameters indicate its dimension, the
- type of its values, and how the values are stored.  Both
- a and b are two-dimensional
- arrays storing &double;s, so their dimension
- is 2 and their element type is &double;.  An &engine; stores an
- &array;'s values.  For example, a &brick; &engine; explicitly
- stores all values.  A &compressiblebrick; &engine; also explicitly
- stores values if more than one value is present, but, if all values
- are the same, storage for just that value is required.  Since an
- engine can store its values any way it desires, it might instead
- compute its values using a function or compute the values stored
- in separate engines.  In practice, most explicitly specified
- &engine;s are either &brick; or &compressiblebrick;.
-
- &array;s support both element-wise access and scalar
- assignment.
Element-wise access uses parentheses, not square
- brackets.  For example, b(n/2,n/2)
- specifies the central element.  The scalar assignment b
- = 0.0 assigns the same 0.0 value to all array
- elements.  This is possible because the array knows the extent of
- its domain.
-
- After the kernel finishes, the central value is printed out.
- Just prior to this &array; access, a call to
- Pooma::blockAndEvaluate() ensures all
- computation has finished.  &pooma; may reorder computations or
- distribute them among various processors.  Before reading an
- individual &array; value, blockAndEvaluate
- ensures that value is correct.  Calling this function is
- necessary only when accessing individual array elements because
- &pooma; cannot determine when to call the function itself.  For
- example, before printing an array, &pooma; will call
- blockAndEvaluate itself.
-
- Any program using the &poomaToolkit; must initialize the
- toolkit's data structures using
- Pooma::initialize(argc,argv).  This
- extracts &pooma;-specific command-line options from the
- command-line arguments in argv and initializes
- the inter-processor communication and other data structures.  When
- finished, Pooma::finalize() ensures all
- computation has finished and the communication and other data
- structures are destructed.
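-
- The structure just described can be summarized in a short
- sketch.  This is not the &doof2d-array-element; listing itself:
- the grid size is hard-coded, the averaging loops are elided, and
- the constructor signatures follow the text above rather than the
- &pooma; headers.
-
- #include "Pooma/Arrays.h"
- #include <iostream>
-
- int main(int argc, char *argv[])
- {
-   Pooma::initialize(argc, argv);     // construct the toolkit structures
-
-   int n = 20;                        // grid cells per dimension (illustrative)
-   Interval<1> N(0, n - 1);           // the integral set {0, 1, ..., n-1}
-   Interval<2> vertDomain(N, N);      // the whole two-dimensional domain
-
-   // Brick engines store every value explicitly; the Arrays handle
-   // their own memory allocation and deallocation.
-   Array<2, double, Brick> a(vertDomain), b(vertDomain);
-
-   a = 0.0;                           // scalar assignment to every element
-   b = 0.0;
-   b(n/2, n/2) = 1000.0;              // element access uses parentheses
-
-   // ... element-wise averaging loops, as in the hand-coded version ...
-
-   Pooma::blockAndEvaluate();         // finish pending computation before
-   std::cout << b(n/2, n/2) << std::endl;   // reading a single element
-
-   Pooma::finalize();                 // destruct the toolkit structures
-   return 0;
- }
-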
- - -
- Data-Parallel &array; Implementation
-
- &pooma; supports data-parallel &array; accesses.  Many
- algorithms are more easily expressed using data-parallel
- expressions.  Also, the &poomaToolkit; might be able to reorder
- the data-parallel computations to be more efficient or distribute
- them among various processors.  In this section, we concentrate on
- the differences between the data-parallel implementation of
- &doof2d; listed in and the
- element-wise implementation listed in the previous section .
-
-
- Data-Parallel &array; Implementation of &doof2d;
- &doof2d-array-parallel;
-
-
- These variables specify one-dimensional domains {1, 2,
- …, n-2}.  Their Cartesian product specifies the domain
- of the array values that are modified.
-
-
- Data-parallel expressions replace nested loops and array
- element accesses.  For example, a(I,J)
- represents the subset of the a array having
- a domain equal to the Cartesian product of I
- and J.  Intervals can be shifted by an additive
- or multiplicative constant.
-
-
-
-
- Data-parallel expressions apply domain objects to containers
- to indicate a set of parallel expressions.  For example, in the
- program listed above, a(I,J) specifies all
- of the a array except the outermost elements.
- The array's vertDomain domain consists of the
- Cartesian product of {0, 1, 2, …, n-1} and itself, while
- I and J each specify {1, 2,
- …, n-2}.  Thus, a(I,J) is the subset
- with a domain of the Cartesian product of {1, 2, …, n-2}
- and itself.  It is called a view of an
- array.  It is itself an array, with a domain and supporting
- element access, but its storage is the same as
- a's.  Changing a value in
- a(I,J) also changes the same value in
- a.  Changing a value in the latter also changes
- the former if the value is not one of a's
- outermost elements.  The expression
- b(I+1,J+1) indicates the subset of
- b with a domain consisting of the Cartesian
- product of {2, 3, …, n-1}, i.e., the same domain as
- a(I,J) but shifted up one unit and to the
- right one unit.  Only an &interval;'s value, not its name, is
- important.  Thus, all uses of J in this program
- could be replaced by I without changing the
- semantics.
-
- Adding &array;s - - - - - - Adding two arrays with different domains. - - - When adding arrays, values in corresponding positions are - added even if they have different indices, indicated by the - small numbers adjacent to the arrays. - - -
- - The statement assigning to a(I,J) - illustrates that &array;s may participate in expressions. Each - addend is a view of an array, which is itself an array. Each view - has the same domain size so their sum can be formed by - corresponding elements of each array. For example, the lower, - left element of the result equals the sum of the lower, left - elements of the addend arrays. For the computation, indices are - ignored; only the relative positions within each domain are used. - - illustrates adding two arrays with different domain indices. The - indices are indicated by the small numbers to the left and the - bottom of the arrays. Even though 9 and 3 have different indices - (1,1) and (2,0), they are added to each other because they have - the same relative positions within the addends. -
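-
- A sketch of the data-parallel kernel these paragraphs describe,
- continuing the element-wise sketch above.  The loop structure and
- the name nuAveragings are illustrative; the weight matches the
- hand-coded version.
-
- Interval<1> I(1, n - 2), J(1, n - 2);  // interior domains {1, ..., n-2}
- const double weight = 1.0 / 9.0;
-
- for (int k = 0; k < nuAveragings; k += 2) {
-   // Each addend is a view of b shifted by -1, 0, or +1 in each
-   // dimension; corresponding elements are added position-wise.
-   a(I,J) = weight *
-     (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
-      b(I  ,J+1) + b(I  ,J  ) + b(I  ,J-1) +
-      b(I-1,J+1) + b(I-1,J  ) + b(I-1,J-1));
-   // The arrays' roles reverse for the second averaging.
-   b(I,J) = weight *
-     (a(I+1,J+1) + a(I+1,J  ) + a(I+1,J-1) +
-      a(I  ,J+1) + a(I  ,J  ) + a(I  ,J-1) +
-      a(I-1,J+1) + a(I-1,J  ) + a(I-1,J-1));
- }
-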
- - -
- Stencil &array; Implementation
-
- Many computations are local, computing an &array;'s value by
- using close-by &array; values.  Encapsulating this computation in
- a stencil can yield faster code because the compiler can determine
- all accesses come from the same array.  Each stencil consists of a
- function object and an indication of the stencil's extent.
-
-
- Stencil &array; Implementation of &doof2d;
- &doof2d-array-stencil;
-
-
- A stencil is a function object implementing a local
- operation on an &array;.
-
-
- &pooma; applies this function call
- operator() to the interior domain of an
- &array;.  Although not strictly necessary, the function's
- template parameter C permits using this
- stencil with &array;s and other containers.  The
- read &array; member function supports only
- reading values, not writing values, thus possibly permitting
- faster access.
-
-
- These two functions indicate the stencil's size.  For
- each dimension, the stencil extends one cell to the left of (or
- below) its center and also one cell to the right of (or above) its
- center.
-
-
- Create the stencil.
-
-
- Applying stencil to the
- b array and a subset
- interiorDomain of its domain yields an
- array, which is assigned to a subset of a.
- The stencil's function object is applied to each position in
- the specified subset of b.
-
-
-
-
- Before we describe how to create a stencil, we describe how
- to apply a stencil to an array, yielding values.  To compute the
- value associated with index position (1,3), the stencil's center
- is placed at (1,3).  The stencil's
- upperExtent and
- lowerExtent functions indicate which &array;
- values the stencil's function will use.  See .
- Applying the stencil's function call
- operator() yields the computed value.  To
- compute multiple &array; values, apply a stencil to the array and
- a domain object: stencil(b,
- interiorDomain).  This applies the stencil to each
- position in the domain.  The user must ensure that applying the
- stencil does not access nonexistent &array; values.
-
- Applying a Stencil to an &array;
-
-
-
-
-
- Apply a stencil to position (1,3) of an array.
-
-
- To compute the value associated with index position (1,3)
- of an array, place the stencil's center, indicated with dashed
- lines, at the position.  The computation involves the array
- values covered by the stencil and delineated by
- upperExtent and
- lowerExtent.
-
-
- To create a stencil object, apply the &stencil; type to a
- function object class.  For example,
- Stencil<DoofNinePt> stencil declares
- the stencil object.  The function object class
- must define a function call operator() with a
- container parameter and index parameters.  The number of index
- parameters, indicating the stencil's center, must equal the
- container's dimension.  For example, DoofNinePt
- defines operator()(const C& c, int i, int
- j).  We templated the container type
- C although this is not strictly necessary.  The
- two index parameters i and j
- ensure the stencil works with two-dimensional containers.  The
- lowerExtent indicates how far to the left
- (or below) the stencil extends beyond its center.  Its parameter
- indicates a particular dimension.  Index parameters
- i and j are in dimensions 0
- and 1.  upperExtent serves an
- analogous purpose.  The &poomaToolkit; uses these functions when
- distributing computation among various processors, but it does not
- use these functions to ensure nonexistent &array; values are not
- accessed.  Caveat stencil user!
-
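-
- A sketch of a nine-point averaging function object satisfying the
- requirements just listed, together with its use.  The body and the
- double return type are illustrative; see the
- &doof2d-array-stencil; listing for the real code.
-
- class DoofNinePt
- {
- public:
-   DoofNinePt() : weight_(1.0/9.0) {}
-
-   // Average the nine values centered at (i,j); read() only reads.
-   template <class C>
-   double operator()(const C& c, int i, int j) const
-   {
-     return weight_ *
-       (c.read(i-1,j-1) + c.read(i-1,j) + c.read(i-1,j+1) +
-        c.read(i  ,j-1) + c.read(i  ,j) + c.read(i  ,j+1) +
-        c.read(i+1,j-1) + c.read(i+1,j) + c.read(i+1,j+1));
-   }
-
-   // The stencil extends one cell beyond its center in each direction.
-   int lowerExtent(int) const { return 1; }
-   int upperExtent(int) const { return 1; }
-
- private:
-   double weight_;
- };
-
- Stencil<DoofNinePt> stencil;          // create the stencil object
- Interval<1> I(1, n - 2);
- Interval<2> interiorDomain(I, I);
- a(interiorDomain) = stencil(b, interiorDomain);
-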
- - -
- Distributed &array; Implementation
-
- A &pooma; program can execute on one or multiple processors.
- To convert a program designed for uniprocessor execution to a
- program designed for multiprocessor execution, the programmer need
- only specify how each container's domain should be split into
- patches.  The &poomaToolkit; automatically
- distributes the data among the available processors and handles
- any required communication between processors.
-
-
- Distributed Stencil &array; Implementation of &doof2d;
- &doof2d-array-distributed;
-
-
- The number of processors executing a &pooma; program can
- be specified at run-time.
-
-
- The UniformGridPartition declaration
- specifies how an array's domain will be partitioned, or split,
- into patches.  Guard layers are an optimization that can reduce
- data communication between patches.  The
- UniformGridLayout declaration applies the
- partition to the given domain, distributing the resulting
- patches among various processors.
-
-
- The MultiPatch &engine; distributes requests
- for &array; values to the associated patch.  Since a patch may
- be associated with a different processor, its
- remote engine has type
- Remote<Brick>.  &pooma; automatically
- distributes the patches among available memories and
- processors.
-
-
- The stencil computation, whether for one processor or
- multiple processors, is the same.
-
-
-
-
- Supporting distributed computation requires only minor code
- changes.  These changes specify how each container's domain is
- distributed among the available processors.  The rest of the
- program, including all the computations, remains the same.  When
- running, the &pooma; executable interacts with the run-time
- library to determine which processors are available, distributes
- the containers' domains, and automatically handles all necessary
- interprocessor communication.  The same executable runs on one or
- many processors.  Thus, the programmer can write one program,
- debugging it on a uniprocessor computer and running it on a
- supercomputer.
-
- The &pooma; Distributed Computation Model - - - - - - the &pooma; distributed computation model. - - - The &pooma; distributed computation model combines - partitioning containers' domains and the computer configuration - to create a layout. - - -
-
- &pooma;'s distributed computing model separates container
- domain concepts from computer configuration concepts.  See .
- The program indicates how each container's domain will be
- partitioned.  This process is represented in the upper left corner
- of the figure.  A user-specified partition specifies how to split
- the domain into pieces.  For example, the illustrated partition
- splits the domain into three equal-sized pieces along the
- x-dimension and two equal-sized pieces along the y-dimension.
- Thus, the domain is split into patches.
- The partition also specifies external and internal guard layers.
- A guard layer is a domain surrounding a
- patch.  A patch's computation only reads but does not write these
- values.  An external guard layer
- conceptually surrounds the entire container domain with boundary
- values whose presence permits all domain computations to be
- performed the same way even for values along the domain's edge.
- An internal guard layer duplicates values
- from adjacent patches so communication need not occur during a
- patch's computation.  The use of guard layers is an optimization;
- using external guard layers eases programming and using internal
- guard layers reduces communication between processors.  Their use
- is not required.
-
- The computer configuration of shared memory and processors
- is determined by the run-time system.  See the upper right portion
- of .
- A context is a collection of shared memory
- and processors that can execute a program or a portion of a
- program.  For example, a two-processor desktop computer might have
- memory accessible to both processors so it is a context.  A
- supercomputer consisting of desktop computers networked together
- might have as many contexts as computers.  The run-time system,
- e.g., the Message Passing Interface (&mpi;) Communications Library
- (FIXME: xref linkend="mpi99", ) or the &mm;
- Shared Memory Library (), communicates
- the available contexts to the executable.  &pooma; must be
- configured for the particular run-time system.  See .
-
- A layout combines patches with
- contexts so the program can be executed.  If &distributedtag; is
- specified, the patches are distributed among the available
- contexts.  If &replicatedtag; is specified, each set of patches is
- replicated among each context.  Regardless, the containers'
- domains are now distributed among the contexts so the program can
- run.  When a patch needs data from another patch, the &pooma;
- toolkit sends messages to the desired patch using a message-passing
- library.  All such communication is automatically performed by the
- toolkit with no need for programmer or user input.
-
- FIXME: The two previous paragraphs demonstrate confusion
- between run-time system and message-passing
- library.
-
- Incorporating &pooma;'s distributed computation model into a
- program requires writing very few lines of code.  illustrates
- this.  The partition declaration creates a
- UniformGridPartition splitting each dimension of a
- container's domain into equally-sized
- nuProcessors pieces.  The first
- GuardLayers argument specifies each patch will have a
- copy of adjacent patches' outermost values.  This may speed
- computation because a patch need not synchronize its computation
- with other patches' processors.  Since each value's computation
- requires knowing its surrounding neighbors, the internal guard
- layer is one layer deep.  The second GuardLayers
- argument specifies no external guard layer.
External guard layers
- simplify computing values along the edges of domains.  Since the
- program already uses only the interior domain for computation, we
- do not use this feature.
-
- The layout declaration creates a
- UniformGridLayout layout.  As illustrates,
- it needs to know a container's domain, a partition, the computer's
- contexts, and a &distributedtag; or &replicatedtag;.  These
- comprise layout's three parameters; the
- contexts are implicitly supplied by the run-time system.
-
- A distributed &array; should be created using
- a &layout; object and should have a &multipatch; engine.  Prior
- implementations designed for uniprocessors constructed the
- container using a &domain; object.  A distributed implementation
- uses a &layout; object, which conceptually specifies a &domain;
- object and its distribution throughout the computer.  A
- &multipatch; engine supports computations using multiple patches.
- The UniformTag indicates the patches all have the
- same size.  Since patches may reside on different contexts, the
- second template parameter is Remote.  Its
- Brick template parameter specifies the engine for a
- particular patch on a particular context.  Most distributed
- programs use MultiPatch<UniformTag, Remote<Brick>
- > or MultiPatch<UniformTag,
- Remote<CompressibleBrick> > engines.
-
- The computations for a distributed implementation are
- exactly the same as for a sequential implementation.  The &pooma;
- Toolkit and a message-passing library automatically perform all
- necessary communication.
-
- The command to run the programs is dependent on the run-time
- system.  To use &mpi; with the Irix 6.5 operating system, one
- can use the mpirun command.  For example,
- mpirun -np 9 Doof2d-Array-distributed -mpi
- --num-patches 3 invokes the &mpi; run-time system with
- nine processors.  The -mpi argument tells
- the &pooma; executable Doof2d-Array-distributed
- to use the &mpi; Library.
-
- HERE
-
- The command Doof2d-Array-distributed -shmem -np 2
- --num-patches 2
-
- To run Doof2d-Array-distributed with the &mm;
- Shared Memory Library, use
-
- HERE
-
-
-
- COMMENT: See background.html for a partial
- explanation.  A context is a distinct
- region of memory in some computer.  An execution thread is
- associated with each context.  One or more different processors
- can be associated with the same context.
-
- QUESTION: How do &pooma; parallel concepts compare with
- Fortran D or high-performance Fortran FINISH CITE:
- {koelbel94:_high_perfor_fortr_handb}?
-
- QUESTION: What does Cheetah do for us?  Must configure with
- --messaging and Cheetah library must be available.  When running
- Doof2d benchmark, use --num-patches N.  On LinuxKCC, use
- '--num-patches p --run-impls 14 --sim-params N 0 1'.  Runtime
- system must also provide some support.  How do I write about this?
- What is an example?  How does one install Cheetah?
-
-
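-
- A sketch of the declarations discussed in this section, gathered
- in one place.  The Loc<2> patch specification, the name nuPatches,
- and the exact constructor signatures are assumptions; the stencil
- computation itself is unchanged from the previous section.
-
- Interval<1> N(0, n - 1);
- Interval<2> vertDomain(N, N);
-
- // nuPatches patches per dimension, one internal guard layer,
- // and no external guard layer.
- UniformGridPartition<2> partition(Loc<2>(nuPatches, nuPatches),
-                                   GuardLayers<2>(1),    // internal
-                                   GuardLayers<2>(0));   // external
- UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());
-
- // Patches may reside on other contexts, hence Remote<Brick>.
- Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
- Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);
-
- // The computation is written exactly as before:
- a(interiorDomain) = stencil(b, interiorDomain);
-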
- - -
- Relations - - UNFINISHED - -
- -
-
-
-
- Overview of &pooma; Concepts
-
- Describe the software application layers similar to
- papers/SCPaper-95.html and "Short Tour of
- &pooma;" in papers/SiamOO98_paper.ps.
- Section 2.2, "Why a Framework?," of
- papers/pooma.ps argues why a layered approach
- eases use.  Section 3.1, "Framework Layer Description,"
- describes the five layers.
-
- FINISH: Write short glossary entries for each of these.
-
- FINISH: Look through the source code to ensure all main
- concepts are listed.
-
- Here are (preliminary) &pooma; equations:
-
-
- &pooma; <quote>Equations</quote>
-
- field = data + materials + centering + layout + mesh
-     (map from space to values)
-
- array = data + layout
-     (map from indices to values)
-
- mesh = layout + origin + spacings
-     (distribute domain through physical space)
-
- layout = domain + partition + layout_tag (distributed/replicated)
-     (distribute domain's blocks among processors/contexts)
-
- partition = blocks + guard layers
-     (split domain into blocks)
-
- domain = newDomain
-     (space of permissible indices)
-
-
- - - FINISH: Following is a first try at describing the &pooma; - abstraction layers. See also paper illustration. - - - &pooma; Abstraction Layers - - - - - application program - - - &array; &field; (should have - FieldEngine under it) - - - &engine; - - - evaluators - - - -
- - FINISH: How does parallel execution fit in? - - FINISH: Should we also name and describe each layer? - -
- Domains - -
- Section 4 "Future Improvements in - &pooma; II" of - papers/SiamOO98_paper.ps - - A &domain; is a set of discrete points in some space.… - &domain;s provide all of the expected domain calculus - capabilities such as subsetting and intersection. - -
- - Section 3, "Domains and Views," of - papers/iscope98.pdf describes five types of - domains -
- - -
- Layouts - - UNFINISHED - - Also describe partitions and guard cells within here. - -
- - -
- Meshes - - UNFINISHED -
- - -
- Data-Parallel Statements - - Can we use "An Overview of &pete;" from - papers/PETE_DDJ/ddj_article.html or is this - too low-level? - - Section 3.2.1 of papers/pooma.ps - gives a simple example of data-parallel expression. It also has a - paragraph introducing data-parallel operations and selecting - subsets of domains. Section 3.4 describes the Chained - Expression Object (CEO), apparently a precursor - of &pete;. Regardless, it provides some motivation and - introductory material. - - From Section 4 of - papers/SiamOO98_paper.ps: - - This version of &pete; reduces compile time of user codes - and utilizes compile-time knowledge of expression &domain;s for - better optimization. For example, more efficient loops for - evaluating an expression can be generated if &pete; knows that the - &domain; has unit stride in memory. - - Section 4, "Expressions and Evaluators", of - papers/iscope98.pdf has a good explanation of - &pooma; II's expression trees and expression engines. - - COMMENT: background.html has some related - &pete; material. -
- -
- Containers - -
- &array; - -
- Section 4 "Future Improvements in
- &pooma; II" of
- papers/SiamOO98_paper.ps
-
- An &array; can be thought of as a map from one &domain; to
- another.…  &array;s depend only on the interface of
- &domain;s.  Thus, a subset or view of an &array; can be
- manipulated in all the same ways as the original &array;.
- &array;s can perform indirect addressing because the output
- &domain; of one &array; can be used as the input &domain; of
- another &array;.  &array;s also provide individual element
- access.
-
- - - - (unformatted) From - papers/GenericProgramming_CSE/dubois.html: - The &pooma; &array; concept provides an example of how these - generic-programming features can lead to flexible and efficient - code. An Array maps a fairly arbitrary input domain to an - arbitrary range of outputs. When used by itself, an &array; - object A refers to all of the values in its - domain. Element-wise mathematical operations or functions can be - applied to an array using straightforward notation, like A + B - or sin(A). Expressions involving Array objects are themselves - Arrays. The operation A(d), where d is a domain object that - describes a subset of A's domain, creates a view of A that - refers to that subset of points. Like an array expression, a - view is also an Array. If d represents a single point in the - domain, this indexing operation returns a single value from the - range. Equivalently, one can index an N-dimensional Array by - specifying N indices, which collectively specify a single point - in the input domain: A(i1, i2, ..., iN). - - The &pooma; multi-dimensional Array concept is similar to - the Fortran 90 array facility, but extends it in several - ways. Both &pooma; and Fortran arrays can have up to seven - dimensions, and can serve as containers for arbitrary - types. Both support the notion of views of a portion of the - array, known as array sections in F90. The &pooma; Array concept - supports more complex domains, including bounded, continuous - (floating-point) domains. Furthermore, Array indexing in &pooma; - is polymorphic; that is, the indexing operation X(i1,i2) can - perform the mapping from domain to range in a variety of ways, - depending on the particular type of the Array being - indexed. - - Fortran arrays are dense and the elements are arranged - according to column-major conventions. Therefore, X(i1,i2) - refers to element number i1-1+(i2-1)*numberRowsInA. However, as - Fig. 1 shows, Fortran-style "Brick" storage is not the only - storage format of interest to scientific programmers. For - compatibility with C conventions, one might want to use an array - featuring dense, row-major storage (a C-style Brick). To save - memory, it might be advantageous to use an array that only - stores a single value if all its element values are the - same. Other sparse storage schemes that only store certain - values may also be desirable. To exploit parallelism, it is - convenient for an array's storage to be broken up into patches, - which can be processed independently by different CPUs. Finally, - one can imagine an array with no data at all. For example, the - values can be computed from an expression involving other - arrays, or analytically from the indices. - - - The &pooma; &array; Class Template - - Next we describe &pooma;'s model of the Array concept, the - Array class template. The three most important requirements from - the point of view of overall design are: (1) arbitrary domain, - (2) arbitrary range, and (3) polymorphic indexing. These express - themselves in the template parameters for the &pooma; Array - class. The template - - template <int Dim, class T = double, class EngineTag = Brick> - class Array; - - is a specification for creating a set of classes all named - Array. The template parameters Dim, T, and EngineTag determine - the precise type of the Array. Dim represents the dimension of - the array's domain. T gives the type of array elements, thereby - defining the output range of the array. 
EngineTag specifies
- the manner of indexing and the types of the indices.
-
- End From
- papers/GenericProgramming_CSE/dubois.html:
-
- Section 2, "Arrays and Engines," of
- papers/iscope98.pdf describes both &array;s
- and &engine;s.  This may or may not duplicate the material in
- papers/GenericProgramming_CSE/dubois.html.
-
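-
- A few illustrative declarations matching the description above;
- these are not taken from the &pooma; sources, and vertDomain and
- n are assumed to be defined as in the earlier examples.
-
- // The template's defaults make Array<2> a two-dimensional Brick
- // array of doubles; a CompressibleBrick engine stores one value
- // while all elements are equal.
- Array<2>                            a(vertDomain);
- Array<2, double, CompressibleBrick> c(vertDomain);
-
- Interval<1> I(1, n - 2);
- Interval<2> d(I, I);      // a domain object describing a subset
-
- a(d) = sin(a(d)) + c(d);  // a(d) is a view: itself an Array sharing a's storage
- double x = a(n/2, n/2);   // N indices select a single value from the range
-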
- Views of &array;s - - Section 3, "Domains and Views," of - papers/iscope98.pdf motivates the need for - views: -
- One of the primary uses of domains is to specify - subsections of &array; objects. Subarrays are a common - feature of array classes; however, it is often difficult to - make such subarrays behave like first-class objects. The - &pooma; II engine concept provides a clean solution to - this problem: subsetting an &array; with a domain object - creates a new &array; that has a view engine. -
-
-
-
- -
- &field; - - QUESTION: Do we include boundary conditions here? - - FINISH: Do we have an example that shows something not possible - with &array;? - - Describe and illustrate multi-material and - multivalue? - - ADD: description of meshes and guard layers. - -
- - -
- <type>TinyMatrix</type> - - Section 3.2.2 of - papers/pooma.ps describes &vector;s and - matrix classes. -
-
- -
- Engines - - (unformatted) From - papers/GenericProgramming_CSE/dubois.html: - - The Engine Concept - - To implement polymorphic indexing, the Array class defers - data storage and data lookup to an engine object. The requirements - that the Array template places on its engine provide the - definition for the Engine concept. We'll describe these by - examining a simplified version of the Array template, shown in - Fig. 2. - - First, the Array class determines and exports (makes - Engine_t part of Array's public interface) the type of the engine - class that it will use: - - typedef Engine<Dim, T, EngineTag> Engine_t; - - This statement declares Engine_t to be an alias for the type - Engine<Dim,T,EngineTag>. This is the first requirement - placed on engine classes: they must be specializations of a - general Engine template whose template parameters are identical to - those of Array. Next, the Array template determines the type of - scalar arguments (indices) to be used in operator(), the function - that implements &pooma;'s Fortran-style indexing syntax X(i1,i2): - - typedef typename Engine_t::Index_t Index_t; - - This statement defines another type alias: - Array<Dim,T,EngineTag>::Index_t is simply an alias for - Engine_t::Index_t. Engine_t::Index_t is a qualified name, which - means that the type Index_t is found in the class Engine_t. This - is the second requirement for the Engine concept: the class - Engine_t must define a public type called Index_t. This line will - not compile if that definition is not supplied. This indirection - is one of the ways that &pooma; supports polymorphic indexing. If - the Engine works with a discrete integer domain, it defines its - Index_t to be an integral type. If the Engine works in a - continuous domain, it defines its Index_t to be a floating-point - type. - - The data lookup is performed in the operator() function. We - see that Array simply passes the indices on to its engine - object. Thus, we have the third requirement for the Engine - concept: it must provide a version of operator() that takes Dim - values of type Index_t. - - Simply passing the indices on to the engine object may seem - odd. After all, engine(i,j) looks like we're just indexing another - array. There are several advantages to this extra level of - indirection. The Array class is as faithful a model of the Array - concept as possible, while the Engine class is a low-level - interface to a user-defined data source. As a result, Array has a - wide variety of constructors for user convenience, while engines - have but a few. Array supports a wide variety of overloaded - operator() functions for view creation and indexing. Engines - support indexing only. Array does not have direct access to the - data, which is managed by the engine object. Finally, Array has a - wide variety of overloaded mathematical operators and functions, - and works with the Portable Expression Template Engine (PETE) [4] - to provide efficient evaluation of Array expressions. Engines have - no such support. In general, Array is much more complex and - feature-laden than Engine. This is the prime advantage of the - separation of interface and implementation: Array only has to be - implemented once by the &pooma; developers. Engines are simple - enough to be written by users and plugged directly into the Array - framework. - - Figure 3 illustrates the "Brick" specialization of the - Engine template, which implements Fortran-style lookup into a - block of memory. 
First, there is the general Engine template, - which is empty as there is no default behavior for an unknown - EngineTag. The general template is therefore not a model for the - Engine concept and Array classes attempting to use it will not - compile. Next, there is the definition of the Brick class, a - policy tag whose sole purpose is to select a particular - specialization of the Engine template. Finally, there is the - partial specialization of the Engine template. Examining its body, - we see the required Index_t typedef and the required operator(), - which follows the Fortran prescription for generating an offset - into the data block based on the row, column, and the number of - rows. All of the requirements are met, so the Brick-Engine class - is a model of the Engine concept. - - End From - papers/GenericProgramming_CSE/dubois.html: - - (unformatted) From - papers/GenericProgramming_CSE/dubois.html: - - Compile-time Versus Run-Time Polymorphism - - Encapsulating the indexing in an Engine class has important - advantages, both in terms of flexibility and efficiency. To - illustrate this point, we introduce the PolarGaussian-Engine - specialization in Fig. 4. This is an analytic engine that - calculates its values directly from its inputs. Unlike the - Brick-Engine, this engine is "indexed" with data of the same type - as its output: it maps a set of T's to a single T. Therefore, the - Index_t typedef selects T as the index type, as opposed to the int - in the Brick-Engine specialization. The operator() function also - differs in that it computes the return value according to an - analytic formula. - - Both Engine<Dim,T,Brick> and - Engine<Dim,T,PolarGaussian> can be plugged in to an Array by - simply varying the Array's EngineTag. This is possible despite the - fact that the two classes exhibit dramatically different behavior - because they are both models of the Engine concept. - - Notice that we have achieved polymorphic indexing without - the use of inheritance or virtual functions. For instance, - consider the following code snippet: - - Array<2, double, Brick> a; - Array<2, double, PolarGaussian> b; - - double x = a(2, 3); // x = a.engine.data[2 + 3 * a.engine.numRows]; - double y = b(2.0, 3.0); // y = exp(-(2.0*2.0+3.0*3.0) / b.engine.delta); - - The data lookup functions for the two Arrays perform completely - different operations. Since this is accomplished using static - types, it is known as compile-time polymorphism. Moreover, - everything is known at compile time, so the functions are fully - inlined and optimized, thereby yielding code equivalent to that - shown in the comments above. - - The flexibility and efficiency of compile-time polymorphism - cannot be duplicated with a run-time implementation. To illustrate - this point, in Fig. 5, we re-implement our Array concept using the - classic Envelope-Letter pattern [5], with the array class, - RTArray, being the envelope and the run-time-engine, RTEngine, - being the letter. RTArray defers data lookup to the engine object - by invoking the engine's functions through a pointer to the - RTEngine base class. Figure 6 illustrates the RTEngine base class - and Fig. 7 illustrates two descendants: RTBrick and - RTPolarGaussian. - - The run-time implementation provides the same basic - functionality as the compile-time implementation, but it is not as - flexible or as efficient. It lacks flexibility in that the return - type of the indexing operation must be specified in the RTEngine - base class and in the RTArray class. 
Thus, in Figs. 5 and 6, we see
- versions of RTArray::operator() and RTEngine::index functions that
- take both int's and T's.  If the programmer wants to add another
- index-type option, these classes must be modified.  This is a
- violation of the open-closed principle proposed by Meyer
- [6].  Also, since RTEngine descendants will usually only implement
- one version of index, we cannot make RTEngine an abstract base
- class.  Instead, we have the default versions of index throw an
- exception.  Thus, compile-time error checking is
- weakened.  Furthermore, since indexing is done via a virtual
- function call, it will almost never be inlined, which is not
- acceptable in most scientific applications.
-
- There are advantages to the Envelope-Letter approach.  First,
- all RTArray objects have the same type, allowing them to be stored
- in homogeneous collections.  This can simplify the design of some
- applications.  Second, RTArray objects can change their engines at
- runtime, and thus effectively change their types on the fly; this
- is the primary reason for using the Envelope-Letter idiom, and can
- be very important in some applications.
-
- For most scientific applications, however, these issues are
- minor, and maximum performance for array indexing is of paramount
- importance.  Our compile-time approach achieves this performance
- while providing the desired polymorphic indexing.
-
- From Section 4 of
- papers/SiamOO98_paper.ps:
-
- The &array; class is templated on an &engine; type that
- handles the actual implementation of the mapping from input to
- output.  Thus, the &array; interface features are completely
- separate from the implementation, which could be a single &c;
- array, a function of some kind or some other mechanism.  This
- flexibility allows an expression itself to be viewed through the
- &array; interface.  Thus, one can write something like
-
- foo(A*B+C);
- where A, B and
- C are &array;s and foo is
- a function taking an &array; as an argument.  The expression
- A*B+C
- will only be evaluated by the expression engine as needed by
- foo.
-
- In fact, one can even write &engine;s which are wrappers
- around external data structures created in non-&pooma; codes and
- know how to manipulate these structures.  Once this is done, the
- external entities have access to the entire &array; interface and
- can utilize all of the powerful features of
- &pooma; II.
-
- Section 2, "Arrays and Engines," of
- papers/iscope98.pdf describes both &array;s
- and &engine;s.  This may or may not duplicate the material in
- papers/GenericProgramming_CSE/dubois.html.
-
- Section 4, "Expressions and Evaluators", of
- papers/iscope98.pdf has a good explanation of
- &pooma; II's expression trees and expression engines.
-
-
- MultiPatch Engine
- From README: To actually use multiple
- contexts effectively, you need to use the MultiPatch engine with
- patch engines that are Remote engines.  Then the data will be
- distributed across multiple contexts instead of being copied on
- every context.  See the files in example/Doof2d for a simple
- example that creates a MultiPatch array that can be distributed
- across multiple contexts and performs a stencil computation on
- that array.
-
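-
- Figures 2 and 3 from the dubois.html material are not reproduced
- in this archive.  The following is a reconstruction of the
- simplified Array/Engine pair from the requirements quoted above;
- member names such as data and numRows follow the quoted comments,
- and construction and storage management are omitted.
-
- // The general template is empty: there is no default behavior
- // for an unknown EngineTag.
- template <int Dim, class T, class EngineTag>
- class Engine;
-
- struct Brick {};   // policy tag selecting the specialization below
-
- template <int Dim, class T>
- class Engine<Dim, T, Brick>
- {
- public:
-   typedef int Index_t;           // a discrete, integer-indexed engine
-
-   // Fortran-style column-major lookup into a block of memory.
-   T operator()(Index_t i, Index_t j) const
-   { return data[i + j * numRows]; }
-
- private:
-   T*  data;
-   int numRows;
- };
-
- template <int Dim, class T = double, class EngineTag = Brick>
- class Array
- {
- public:
-   typedef Engine<Dim, T, EngineTag>  Engine_t;  // must be an Engine specialization
-   typedef typename Engine_t::Index_t Index_t;   // engine must export Index_t
-
-   // Indexing simply forwards to the engine object.
-   T operator()(Index_t i, Index_t j) const { return engine(i, j); }
-
- private:
-   Engine_t engine;
- };
-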
- - -
- Relations - - UNFINISHED -
- - -
- Stencils - - Section 3.5.4, "Stencil Objects," of - papers/pooma.ps provides a few uses of - stencils. - - Section 5, "Performance," of - papers/iscope98.pdf motivates and explains - stencils. -
- - -
- Contexts - -
- background.html - In order to be able to cope with the variations in machine - architecture noted above, &pooma;'s distributed execution model - is defined in terms of one or more contexts, each of which may - host one or more threads. A context is a distinct region of - memory in some computer. The threads associated with the context - can access data in that memory region and can run on the - processors associated with that context. Threads running in - different contexts cannot access memory in other contexts. - - A single context may include several physical processors, - or just one. Conversely, different contexts do not have to be on - separate computers—for example, a 32-node SMP computer could - have up to 32 separate contexts. This release of &pooma; only - supports a single context for each application, but can use - multiple threads in the context on supported platforms. Support - for multiple contexts will be added in an upcoming - release. -
-
- - -
- Utility Types: ???TITLE?? - -
- &vector; - - Section 3.2.2 of - papers/pooma.ps describes &vector;s and - matrix classes. -
- -
-
- - - - Writing Sequential Programs - - UNFINISHED - -
- &benchmark; Programs - - Define a &benchmark; program vs. an example or an - executable. Provide a short overview of how to run these - programs. Provide an overview of how to write these programs. - See src/Utilities/Benchmark.h. -
- - -
- Using <type>Inform</type>s for Output - - UNFINISHED -
- -
-
-
-
- Writing Distributed Programs
-
- Discuss the distributed model and guard cells. See docs/parallelism.html.
-
- Does any of the parallel implementation described in
- papers/SCPaper-95.html still apply?
-
- ?Tuning programs to maximize parallel performance?
-
- external references to &mpi; and threads
-
- QUESTION: Are there interesting, short parallel programs in
- any &mpi; book that we can convert to &pooma;?
-
-
- Layouts
-
- An out-of-date description can be found in Section 3.3,
- especially 3.3.2, of papers/pooma.ps, which
- describes the global/local interactions and parallel abstraction
- layers.
-
- -
- Parallel Communication
-
- An out-of-date description can be found in
- Section 3.3.3 of papers/pooma.ps.
-
- -
- Using Threads - - QUESTION: Where do threads fit into the manual? Do threads - even work? - - From Section 4, of - papers/SiamOO98_paper.ps - - &pooma; II will make use of a new parallel run-time - system called &smarts; that is under development at the ACL. - &smarts; supports lightweight threads, so the evaluator will be - able to farm out data communication tasks and the evaluation of - subsets of an expression to multiple threads, thus increasing the - overlap of communication and computation. Threads will also be - available at the user level for situations in which a - task-parallel approach is deemed appropriate. -
- -
- - - - Under the Hood: How &pooma; Works - - from point of view of &cc; interpreter - -
- &pete; - - Use the material in - papers/PETE_DDJ/ddj_article.html, which gives - example code and descriptions of how the code works. - - See material in background.html's Expression - Templates. -
- -
- - - - Debugging and Profiling &pooma; Programs - - UNFINISHED - - - - - - Example Program: Jacobi Solver - - QUESTION: Is this chapter necessary? Do we have enough - existing source code to write this chapter? - - -
-
-
-
- &pooma; Reference Manual
-
-
- TMP: This Chapter Holds These Comments But Will Be Removed
-
- For each template parameter, we need to describe the constraints
- on it.
-
- Remove this section when the following concerns have been
- addressed.
-
- Add a partintro explaining file suffixes such as .h, .cpp, .cmpl.cpp, .mk, .conf. Should we also explain the use
- of inline even when necessary and the template
- model, e.g., including .cpp files?
-
- QUESTION: What are the key concepts around which to organize
- the manual?
-
- QUESTION: What format should the manual use?
-
-
- Musser, Derge, and Saini, §20.0.
- It is important to state the requirements on the components
- as generally as possible. For example, instead of saying
- class X must define a member function
- operator++(), we say for any
- object x of type X,
- ++x is defined.
-
-
-
-
-
- A Typical &pooma; Class
-
-
- Class Member Notation
-
-
- *_t
-
-
-
- type within a class. QUESTION: What is the &cc; name for
- this?
-
-
-
-
- *_m
-
-
-
- data member
-
-
-
-
-
- &pooma; Class Vocabulary
-
- component
-
- one of several values packaged together. For example, a
- three-dimensional vector has three components, i.e., three
- values.
-
-
-
- element-wise
-
- applied to each element in the group, e.g., an array
-
-
-
- reduction
-
- repeated application of a binary operator to all elements,
- yielding one value
-
-
-
- tag
-
- an enumerated value indicating inclusion in a particular
- semantic class. The set of values need not be explicitly
- declared.
-
-
-
-
-
-
-
-
- Installing and Configuring &pooma;
-
-
-
- Installing &pooma;.
-
-
- Requirements for configuration files.
-
-
-
- Include descriptions of using &smarts;, &cheetah;, τ,
- &pdt;.
-
- QUESTION: Does it install on Windows and on Mac? If so, what
- are the instructions? See also INSTALL.{mac,unix,windows}.
-
- README has some
- information on &cheetah; and threads in the Message-Based
- Parallelism section.
-
- Which additional packages are necessary and when?
-
- What configure options should we list? See configure. Be sure to list
- the debugging option and how its output relates to config/LINUXgcc.suite.mk.
-
- config/arch has files
- for (OS, compiler) pairs. Explain how to modify a configuration
- file. List requirements when making a new configuration file (low
- priority).
-
- config/LINUXgcc.suite.mk has output
- from configure. Useful to
- relate to configuration files and configure's debugging output.
-
-
-
-
-
- Compilation and &make; Files
-
- We assume GNU make. Do we know what assumptions are made?
-
- How do all these files interact with each other? In the manner of a make
- interpreter, give an example of which files are read and
- when.
-
-
- config/Shared/README.make
- This has short descriptions of many files,
- especially in config/Shared.
-
- makefile
- These appear throughout all directories. What are
- the equivalence classes and what are their
- parts?
-
- include.mk
- What does this do? Occurs in many directories:
- when? Template seems to be config/Shared/include2.mk.
-
- subdir.mk
- list of subdirectories; occurs in several
- directories: when? src/subdir.mk is a good
- example.
-
-
- objfile.mk
-
- list of object files to construct, presumably from
- *.cmpl.cpp files.
- src/Utilities/objfile.mk is an
- example.
-
-
- config/Shared/rules.mk
- most compiler rules
-
- config/head.mk
- read at beginning of each
- makefile?
-
- config/Shared/tail.mk
- read at end of each makefile?
-
- config/Shared/variables.mk
- Is this used?
-
- config/Shared/compilerules.mk
- table of origin and target suffixes and commands
- for conversion
-
-
-
-
-
-
-
-
- &array;s
-
- Include src/Pooma/Arrays.h to use &array;s.
- The implementation source code is in src/Array.
-
- FINISH: Define an array. Introduce its parts.
-
- ADD: some mention of the maximum supported number of
- dimensions somewhere.
-
-
- The &array; Container - - - Template Parameters - - - - - Parameter - Interpretation - - - - - Dim - dimension - - - T - array element type - - - EngineTag - type of computation engine object - - - -
- - QUESTION: How do I introduce class type definitions, when - they are used, i.e., compile-time or run-time, and when - programmers should use them? - - - Compile-Time Types and Values - - - - - Type or Value - Interpretation - - - - - This_t - the &array; object's type - - - Engine_t - the &array; object's engine's type - - - EngineTag_t - indication of engine's category - - - Element_t - the type of the array elements, i.e., T - - - ElementRef_t - the type of a reference to an array element, - i.e., T&. Equivalently, the type to write to a - single element. - - - Domain_t - the array's domain's type, i.e., the type of the - union of all array indices - - - Layout_t - unknown - - - dimensions - integer equalling the number of dimensions, i.e., - Dim - - - rank - integer equalling the number of dimensions, i.e., - Dim; a synonym for - dimensions - - - -
- -
- Constructors and Destructors
-
-
- Constructors and Destructors
-
-
-
-
- Function
- Effect
-
-
-
-
-
-
- Array
-
-
-
- Creates an array that will be resized
- later.
-
-
-
-
- Array
- const Engine_t&
- engine
-
-
- Creates an array with an engine equivalent to
- the engine. This array will have the
- same values as engine. QUESTION: Why
- would a user ever want to use this
- constructor?
-
-
-
-
- Array
-
- const
- Engine<Dim2, T2, EngineTag2>&
- engine
-
-
- const
- Initializer& init
-
-
-
- What does this do?
-
-
- ADD ALL CONSTRUCTORS AND DESTRUCTORS.
-
-
-
-
-
- - -
- Initializers - - Add a table. -
- - -
- Element Access - - - &array; Element Access - - - - - Function - Effect - - - - - - - Element_t read - - - - unknown: See line 1839. - - - - - Element_t read - - const - Sub1& s1 - - - const - Sub2& s2 - - - - How does the version with template parameters, - e.g., Sub1 differ from the int - version? - - - - - Element_t operator() - - const - Sub1& s1 - - - const - Sub2& s2 - - - - How does this differ from read(const - Sub1& s1, const Sub2& s2)? - - - ADD ALL reads and - operator()s. - - - -
-
- - -
- Component Access
-
- When an array stores elements having components, e.g., an
- array of vectors, tensors, or arrays, the
- comp method returns an array consisting of the
- specified components. The original and component array share the
- same engine so changing the values in one affects values in the
- other.
-
- For example, if an &n; × &n; array a
- consists of three-dimensional real-valued vectors,
- a.comp(1) returns an &n; × &n;
- real-valued array of all the middle vector components. Assigning
- to the component array will also modify the middle components of
- the vectors in a.
-
-
- &array; Component Access
-
-
-
-
- Function
- Effect
-
-
-
-
-
-
- UNKNOWN compute this comp
-
- const
- int&
- i1
-
-
-
- unknown: See line 1989.
-
-
- ADD ALL comps.
-
-
-
-
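-
- A hypothetical usage sketch. The Vector element type, the
- Brick engine tag, and the constructor taking two extents are
- assumptions, and the exact return type of comp is omitted because
- it is complicated and rarely needs to be written down.
-
- const int n = 8;                             // illustrative size
- Array<2, Vector<3, double>, Brick> a(n, n);  // n-by-n array of 3-vectors
- a.comp(1) = 0.0;                             // zero every vector's middle component
- double m = a.comp(1)(3, 4);                  // read the middle component at (3,4)
- // Because the component view shares a's engine, the assignment above also
- // changed the vectors stored in a.
-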
-
- -
- Accessors - - - &array; Accessor Methods - - - - - Function - Effect - - - - - - - int first - - int - d - - - - unknown: See line 2050 - - - ADD ALL other accessor methods, including - engine. - - - -
-
- - -
- Copying &array;s - - Explain how copied arrays and views of arrays share the - same underlying engine so changing values in one also affects the - other. This is called a shallow copy. -
- - -
- Utility Methods - - - &array; Utility Methods - - - - - Function - Effect - - - - - - - void makeOwnCopy - - - - unknown: See line 2044 - - - ADD ALL other utility methods. - - - -
-
- - -
- Implementation Details
-
- As a container, an &array;'s implementation is quite
- simple. Its private data consists of
- an engine, and it has no private
- functions.
-
-
- &array; Implementation Data
-
-
-
-
- Data Member
- Meaning
-
-
-
-
-
-
- private
- Engine_t engine_m
-
-
- engine computing the array's values
-
-
-
-
- -
-
- - -
- &dynamicarray;s: Dynamically-Sized Domains - - A DynamicArray is a read-write array with extra - create/destroy methods. It can act just like a regular Array, but - can have a dynamically-changing domain. See src/DynamicArray/DynamicArray.h. - - ADD: Briefly describe what the class does and an example of - where it is used. - - ADD: Check that its interface is actually the same as for - &array;. - - ADD: Check that the operations on dynamic arrays are - actually the same as for &array;. See src/DynamicArray/DynamicArrayOperators.h, - src/DynamicArray/PoomaDynamicArrayOperators.h, - and src/DynamicArray/VectorDynamicArrayOperators.h. - - -
- Implementation Details - - DynamicArray has no - protected or - private members. -
-
- - -
- Views of &array;s - - UNFINISHED -
- - -
- &array; Assignments
-
- &pooma; supports assignments to &array;s of other &array;s
- and scalar values. QUESTION: Is the following correct? For the
- former, the right-hand side array's domain must be at least as
- large as the left-hand side array's domain. Corresponding values
- are copied. Assigning a scalar value to an array ensures all the
- array elements have the same scalar value.
-
- UNFINISHED: Add a table containing assignment operators
- found on lines 2097–2202.
-
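-
- A short sketch of the two assignment forms described above; the sizes and
- the Brick engine tag are illustrative.
-
- Array<2, double, Brick> a(8, 8), b(8, 8);
- a = 0.0;   // scalar assignment: every element of a becomes 0.0
- a = b;     // array assignment: b's values are copied element-wise into a
-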
- - -
- Printing &array;s
-
- &array;s support output to but not input from IO streams.
- In particular, output to ostreams and file streams is
- supported.
-
- Add a table, using src/Array/Array.h, lines
- 2408–2421. See the implementation in src/Array/PrintArray.h.
-
- QUESTION: How does one print a &dynamicarray;?
-
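-
- For example, assuming an &array; a as in the sketches above and
- the usual <iostream> and &pooma; headers, printing is a single ostream
- insertion; there is no matching extraction operator.
-
- std::cout << a << std::endl;   // writes all of a's elements to standard output
-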
- - -
- Expressions Involving &array;s - - In &pooma;, expressions may contain entire &array;s. That - is, &array;s are first-class objects with respect to expressions. - For example, given &array;s a and - b, the expression a + b - is equivalent to an array containing the element-wise sum of the - two arrays. - - Any finite number of the operators listed below can be used - in an expression. The precedence and order of operation is the - same as with ordinary built-in types. - - QUESTION: Do &field;s also support the same set of - operations? - - QUESTION: Some operations in src/Field/FieldOperators.h use both - &array; and &field;. Do we list them here or in the &field; - section or both or somewhere else? - - In the table below, &array; supplants the exact return types - because they are complicated and rarely need to be explicitly - written down. - - - Operators on &array; - - - - - Operator - Value - - - - - - - - Array acos - const Array<Dim,T,EngineTag>& a - - - - an array containing the element-wise inverse - cosine of the array a - - - ADD ALL other operators appearing in src/Array/ArrayOperators.h, - src/Array/ArrayOperatorSpecializations.h, - src/Array/PoomaArrayOperators.h, - and src/Array/VectorArrayOperators.h. - - - -
- - FINISH: Write one or two examples or refer to ones - previously in the text. -
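-
- A small sketch of element-wise expressions, using acos from the
- table above together with ordinary arithmetic operators; the sizes and the
- Brick engine tag are illustrative.
-
- Array<2, double, Brick> a(8, 8), b(8, 8), c(8, 8);
- c = a + 2.0 * b;   // element-wise sum with a scalar multiple
- c = acos(a);       // element-wise inverse cosine
-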
- - -
- Reducing All &array; Elements to One Value - - These reduction functions repeatedly apply a binary - operation to all array elements to yield a value. These functions - are similar to the Standard Template Library's - accumulate function. For example, - sum repeatedly applies the binary plus - operator to all array elements, yielding the sum of all array - elements. - - FINISH: What order of operation, if any, is - guaranteed? - - FINISH: Add a table of the functions in src/Array/Reductions.h. - - How does one use one's own binary function? See src/Engine/Reduction.h. -
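-
- A usage sketch, assuming sum and max are among
- the reductions declared in src/Array/Reductions.h and that
- a is an &array; as in the sketches above.
-
- double total   = sum(a);   // repeated binary + over all elements of a
- double largest = max(a);   // largest element of a
-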
- - -
- Utility Functions - -
- Compressed Data - - Add a table containing - elementsCompressed, - compressed, compress, - and uncompress. -
- - -
- Centering Sizes and Number of Materials - - ADD: a description of numMaterials and - centeringSize found in src/Field/Field.h. These functions - are meaningless for &array; but are provided for consistency with - &field;. -
- -
- Obtaining Subfields - - ADD: a description of subField found - in src/Field/Field.h. - This function, meaningless for &array;, is provided for - consistency with &field;. -
-
- - -
- TMP: What do we do with these …? Remove this - section. - - QUESTION: Do we describe the &leaffunctor;s specialized for - &array;s in src/Array/Array.h or in the &pete; - reference section? What about the functions in src/Array/CreateLeaf.h? - - QUESTION: What is an EngineFunctor? We - probably should describe it in an analogous way as for - &leaffunctor;s. - - QUESTION: Where do we write about - ExpressionTraits for &array;s? - - QUESTION: Do we describe the ElementProperties - specialization at this place or in its section? - - QUESTION: Do we describe the Patch - specialization for &array;s (src/Array/Array.h:1300) in this - place or in a section for patches? -
-
- - - - &field;s - - An &array; is a set of values indexed by - coordinates, one value per coordinate. It models the computer - science idea of an array. Similarly, a &field; is a set of values - indexed by coordinate. It models the mathematical and physical - idea of a field represented by a grid of rectangular cells, each - having at least one value. A &field;'s functionality is a superset - of an &array;'s functionality because: - - - A &field; is distributed through space so one can compute - the distances between cells. - - - Each cell can hold multiple values. For example, a - rectangular cell can have one value on each of its faces. - - - Multiple materials can share the same cell. For example, - different values can be stored in the same cell for carbon, - oxygen, and nitrogen. - - - Also, &field;s' values can be related by relations. Thus, if one - field's values change, a dependent field's values can be - automatically computed when needed. FIXME: See also the unfinished - works chapter's entry concerning relations and arrays. - - QUESTION: Should we add a picture comparing and contrasting - an array and a field? - - QUESTION: How much structure can be copied from the &array; - chapter? - - QUESTION: Where is NewMeshTag, defined in - src/Field/Field.h, - used? - - QUESTION: Do we describe the &leaffunctor;s specialized for - &field;s in src/Field/Field.h or in the &pete; - reference section? Use the same decision for &array;s. - - QUESTION: What do the structure and functions in src/Field/Mesh/PositionFunctions.h - do? - - -
- The &field; Container - - ADD: table of template parameters and table of compile-time - types and values. - - -
- Constructors and Destructors - - ADD: this section similar to &array;s's constructor and - destructor section. -
- -
- Initializers - - Add a table. -
- - -
- Element Access - - ADD: a table ala &array;. Be sure to include - all. -
- - -
- Component Access - - ADD: a table ala &array;. -
- - -
- Obtaining Subfields - - ADD: discussion and a table listing ways to obtain - subfields. Although the implementation may treat subfield views - and other field views similarly (?Is this true?), they are - conceptually different ideas so we present them - separately. - - See src/Field/Field.h's - operator[], - subField, …, - material. -
- - -
- Supporting Relations - - ADD: a table with the member functions including - addRelation, - removeRelations, - applyRelations, and - setDirty. -
- - -
- Accessors - - ADD: a table using lines like src/Field/Field.h:1243–1333. -
- - -
- Utility Methods - - ADD: a table including - makeOwnCopy. -
- - -
- Implementation Details - - ADD: a table similar to &array;'s. - -
- -
- - -
- Views of &field;s - - Be sure to relate to &array; views. Note only three - dimensions are supported. - - Be sure to describe f[i]. Does this - refer to a particular material or a particular value within a - cell? I do not remember. See SubFieldView in - src/Field/Field.h. -
- - -
- &field; Assignments - - ADD: Describe supported assignments, relating to &array;'s - assignments. - - UNFINISHED: Add a table containing assignment operators - found on src/Field/Field.h:2097–2202 - and 1512–1611. -
- - -
- Printing &field;s - - QUESTION: How similar is this to printing &array;s? - - &field;s support output to but not input from IO streams. - In particular, output to ostreams and file streams is - supported. - - Add a table, using src/Field/Field.h, lines - 1996–2009. See the implementation in src/Field/PrintField.h. -
- - -
- Combining &field; Elements - - Like &array;s, &field;s support reduction of all elements to - one value. Additionally, the latter supports computing a field's - values using field stencils. QUESTION: How do I describe this - with a minimum of jargon? - - ADD: something similar to &array; reductions. - - FINISH: Add a table of the functions in src/Field/FieldReductions.h. - - FINISH: Add a table of the functions in src/Field/DiffOps/FieldOffsetReductions.h. - QUESTION: Why is only sum defined? -
- - -
- Expressions Involving &field;s - - Do something similar to &array;'s section. See the - operations defined in src/Field/FieldOperators.h, - src/Field/FieldOperatorSpecializations.h, - src/Field/PoomaFieldOperators.h, and - src/Field/VectorFieldOperators.h. - - Some operations involve both &array; and &field; - parameters. Where do we list them? -
- - -
- &field; Stencils: Faster, Local Computations
-
- ADD: a description of a stencil. Why is it needed? How
- does a user use it? How does a user know when to use one? Add
- documentation of the material from src/Field/DiffOps/FieldStencil.h.
-
- How is FieldShiftEngine used by &field;
- stencils? Should it be described here or in the &engine; section?
- See the code in src/Field/DiffOps/FieldShiftEngine.h.
-
- - -
- Cell Volumes, Face Areas, Edge Lengths, Normals - - ADD: a description of these functions. See src/Field/Mesh/MeshFunctions.h. - These are initialized in, e.g., src/Field/Mesh/UniformRectilinearMesh.h. - Note that these do not work for NoMesh. -
- - -
- Divergence Operators - - ADD: a table having divergence operators, explaining the - current restrictions imposed by what is implemented. See - src/Field/DiffOps/Div.h - and src/Field/DiffOps/Div.UR.h. What - restrictions does UR (mesh) connote? -
- - -
- Utility Functions - -
- Compressed Data - - Add a table containing - elementsCompressed, - compressed, compress, - and uncompress. -
- - -
- Centering Sizes and Number of Materials - - ADD: a description of numMaterials and - centeringSize found in src/Field/Field.h. - - QUESTION: How do these relate to any method functions? -
- - -
- Obtaining Subfields - - ADD: a description of subField found - in src/Field/Field.h. -
- -
- - -
- &field; Centerings - - DO: Describe the purpose of a centering and its definition. - Describe the ability to obtain canonical centerings. Explain how - to construct a unique centering. See src/Field/FieldDentering.h. -
- - -
- Relative &field; Positions - - Permit specifying field positions relative to a field - location. Describe FieldOffset and - FieldOffsetList. See src/Field/FieldOffset.h -
- - -
- Computing Close-by Field Positions - - Given a field location, return the set of field locations - that are closest using ?Manhattan? distance. See src/Field/NearestNeighbors.h. -
- - -
- Mesh ???
-
- Unlike &array;s, &field;s are distributed throughout space
- so distances between values within the &field; can be computed. A
- &field;'s mesh stores this spatial distribution.
-
- QUESTION: What do we need to write about meshes? What is
- unimportant implementation and what should be described in this
- reference section?
-
- QUESTION: Where in here should we emphasize vertex, not cell,
- positions? VERTEX appears repeatedly in src/Field/Mesh/NoMesh.h.
-
-
- Mesh Types
-
-
-
-
- Mesh Type
- Description
-
-
-
-
- NoMesh<Dim>
- no physical spacing, causing a &field; to mimic
- an &array; with multiple engines.
-
-
- UniformRectilinearMesh<Dim,T>
- physical spacing formed by the Cartesian product
- of ????.
-
-
-
-
- - -
- Mesh Accessors - - ADD: a table listing accessors, explaining the difference - between (physical and total) and (cell and vertex) domains. See - src/Field/Mesh/NoMesh.h. - Also, include spacings and - origin in src/Field/Mesh/UniformRectilinearMesh.h. - Note NoMesh does not provide the latter two. -
- -
- - -
- TMP: What do we do with these …? Remove this - section. - - QUESTION: Do we describe the Patch - specialization for &field; at this place or in some common place? - Follow &array;'s lead. - - QUESTION: Where do we describe CreateLeaf and - MakeFieldReturn in src/Field/FieldCreateLeaf.h and - src/Field/FieldMakeReturn.h. - - QUESTION: What do we do with FieldEnginePatch - in src/Field/FieldEngine/FieldEnginePatch.h. -
-
-
-
-
- &engine;s
-
- From a user's point of view, a container makes data available
- for reading and writing. In fact, the container's &engine; stores
- the data or, if the data is computed, performs a computation to
- yield the data.
-
- FINISH: Introduce the various types of engines. Add a table
- with a short description of each engine type.
-
- FINISH: First, we specify a generic &engine;'s interface.
- Then, we present &engine; specializations.
-
-
- Types of &engine;s
-
-
-
-
- Engine Type
- Engine Tag
- Description
-
-
-
-
- Brick
- Brick
- Explicitly stores all elements in, e.g., a &cc;
- array.
-
-
- Compressible
- CompressibleBrick
- If all values are the same, uses constant storage
- for that single value. Otherwise, explicitly stores all
- elements.
-
-
- Constant
- ConstantFunction
- Returns the same constant value for all
- indices.
-
-
- Dynamic
- Dynamic
- Manages a contiguous, local, one-dimensional,
- dynamically resizable block of data.
-
-
- Component Forwarding
- CompFwd<EngineTag,
- Components>
- Returns the specified components from
- EngineTag's engine. Components are
- pieces of multi-value elements such as vectors
- and tensors.
-
-
- Expression
- ExpressionTag<Expr>
- Returns the value of the specified &pete;
- expression.
-
-
- Index Function
- IndexFunction<Functor>
- Makes the function
- Functor, accepting indices, mimic an
- array.
-
-
- MultiPatch
- MultiPatch<LayoutTag,PatchTag>
- Supports distributed computation using several
- processors (???contexts???). LayoutTag
- indicates how the entire array is distributed among the
- processors. Each processor uses a PatchTag
- engine.
-
-
- Remote
- Remote<EngineTag>
- unknown
-
-
- Remote Dynamic
- Remote<Dynamic>
- unknown: specialization
-
-
- Stencil
- StencilEngine<Function,
- Expression>
- Returns values computed by applying the
- user-specified function to sets of contiguous values in the
- given engine or container. Compare with user function
- engines.
-
-
- User Function
- UserFunctionEngine<UserFunction,Expression>
- Returns values computed by applying the
- user-specified function to the given engine or container.
- QUESTION: Is the following claim correct? For each returned
- value, only one value from the engine or container is
- used.
-
-
-
-
- - QUESTION: Where do we describe views? - - QUESTION: What does NewEngine do? Should it be - described when describing views? Should it be omitted as an - implementation detail? - - QUESTION: Where do we describe &engine; patches found in - src/Engine/EnginePatch.h? - All patch data in a separate chapter or engine-specific pieces in - this chapter? - - QUESTION: What is notifyEngineWrite? - See also src/Engine/NotifyEngineWrite.h. - - QUESTION: What aspect of MultiPatch uses IsValid in - src/Engine/IsValidLocation.h? - - QUESTION: Who uses intersections? Where should this be - described? See src/Engine/Intersector.h, src/Engine/IntersectEngine.h, and - src/Engine/ViewEngine.h. - -
- &engine; Compile-Time Interface - - ADD: a table of template parameters ala &array;. ADD: - compile-time types and values. -
- - -
- Constructors and Destructors - - ADD: a table of constructors and destructors ala - &array;'s. -
- - -
- Element Access - - ADD: a table with read and - operator(). -
- - -
- Accessors - - ADD: a table of accessors. -
- - -
- &engine; Assignments - - similar to &array;'s assignments. shallow copies. ADD: a - table with one entry -
- - -
- Utility Methods - - ADD: a table including - makeOwnCopy. - - QUESTION: What are dataObject, - isShared, and related methods? -
- - -
- Implementation Details - - ADD: this section. Explain that - dataBlock_m and data_m point - to the same place. The latter speeds access, but what is the - purpose of the former? -
- - -
- Brick and BrickView Engines - - ADD: description of what a brick means. ADD: whatever - specializations the class has, e.g., - offset. - - QUESTION: What does DoubleSliceHelper do? -
- - -
- Compressible Brick and BrickView Engines - - ADD this. -
- - -
- Dynamic and DynamicView Engines: - - ADD this. Manages a contiguous, local, resizable, 1D block - of data. -
- - -
- Component Engines - - I believe these implement array component-forwarding. See - src/Engine/ForwardingEngine.h. -
- - -
- Expression Engines - - Should this be described in the &pete; section? Unlikely. - See src/Engine/ExpressionEngine.h. -
- - -
- &engine; Functors - - QUESTION: What is an EngineFunctor? Should it - have its own section? See src/Engine/EngineFunctor.h. -
- - -
- <type>FieldEngine</type>: A Hierarchy of &engine;s - - A &field; consists of a hierarchy of materials and - centerings. These are implemented using a hierarchy of engines. - See src/Field/FieldEngine/FieldEngine.h - and src/Field/FieldEngine/FieldEngine.ExprEngine.h. -
-
- - - - &benchmark; Programs - - Explain how to use &benchmark; programs, especially the - options. Explain how to write a &benchmark; program. See also - src/Utilities/Benchmark.h - and src/Utilities/Benchmark.cmpl.cpp. - - - - - - Layouts and Partitions: Distribute Computation Among - Contexts - - QUESTION: What is the difference between - ReplicatedTag and DistributedTag? - - - - - - &pete;: Evaluating Parallel Expressions - -
- UNKNOWN - -
- Leaf Tag Classes - - NotifyPreReadTag indicates a term is about to - be read. Why is this needed? Defined in src/Utilities/NotifyPreRead.h. -
-
- -
- - - - Views - - QUESTION: Should this have its own chapter or be part of a - container chapter? - - Describe View0, View1, …, - View7 and View1Implementation. - - QUESTION: What causes the need for AltView0 and - AltComponentView? - - Be sure to describe ComponentView in the same - place. This is specialized for &array;s in src/Array/Array.h:1323–1382. - -
- <type>ViewIndexer<Dim,Dim2></type> - - Defined in src/Utilities/ViewIndexer.h, this - type translates indices between a domain and a view of it. -
-
- - - Threads - - Perhaps include information in src/Engine/DataObject.h. - - &pooma; options include UNFINISHED - - - - - - Utility Types - - TMP: What is a good order? - -
- <type>Options</type>: Varying Run-Time Execution
-
- Each &pooma; executable has an Options object,
- created by Pooma::initialize, storing
- run-time configurable values found in argv.
- Default options are found in
- Options::usage.
-
- See src/Utilities/Options.h and
- src/Utilities/Options.cmpl.cpp.
-
- Scatter the specific options to other parts of the
- manual.
-
- -
- Check Correctness: <type>CTAssert</type>,
- <type>PAssert</type>, <type>PInsist</type>,
- <type>SameType</type>
-
- Assertions ensure program invariants are obeyed.
- CTAssert, checked at compile time, incurs no run-time
- cost. PAssert and PInsist are checked
- at run time, the latter producing an explanatory message if the
- assertion fails. Compiling with NOCTAssert and
- NOPTAssert disables these checks. Compiling with just
- NOPTAssert disables only the run-time checks.
-
- SameType ensures, at compile time, that two types
- are the same.
-
- These are implemented in src/Utilities/PAssert.h and
- src/Utilities/PAssert.cmpl.cpp.
-
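-
- A hypothetical usage sketch; the exact macro spellings and argument lists
- are those defined in src/Utilities/PAssert.h.
-
- PAssert(n > 0);                           // run-time check
- PInsist(ptr != 0, "allocation failed");   // run-time check with an explanatory message
- CTAssert(Dim > 0);                        // compile-time check, no run-time cost
-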
- -
- <type>Clock</type>: Measuring a Program's Execution Time - - See src/Utilities/Clock.h. -
- - -
- Smart Pointers: <type>RefCountedPtr</type>, - <type>RefCountedBlockPtr</type>, and - <type>DataBlockPtr</type> - - See src/Utilities/{RefCountedPtr,RefCountedBlockPtr,DataBlockPtr}.h. - src/Utilities/RefCounted.h - helps implement it. DataBlockPtr uses - &smarts;. -
- -
- <type>Inform</type>: Formatted Output for Multi-context - Execution - - See src/Utilities/Inform.h and src/Utilities/Inform.cmpl.cpp. -
- -
- <type>Statistics</type>: Report &pooma; Execution Statistics - - Collect and print execution statistics. Defined in - src/Utilities/Statistics.h. -
- -
- Random Numbers: <type>Unique</type> - - See src/Utilities/Unique.h. -
-
- - - - Types for Implementing &pooma; - - TMP: What is a good order? - - Describe types defined to implement &pooma; but that users do - not directly use. This chapter has lower priority than other - chapters since users (hopefully) do not need to know about these - classes. - -
- <type>Tester</type>: Check Implementation Correctness
-
- &pooma; implementation test programs frequently consist of a
- series of operations followed by correctness checks. The
- Tester object supports these tests, returning a
- boolean indicating whether all the correctness checks yield true. Under
- verbose output, messages are printed for each test. See src/Utilities/Tester.h.
-
- -
- <type>ElementProperties<T></type>: Properties a Type - Supports - - This traits class permits optimizations in other templated - classes. See src/Utilities/ElementProperties.h. - -
- -
- <type>TypeInfo<T></type>: Print a String Describing - the Type - - Print a string describing the type. Defined in src/Utilities/TypeInfo.h. It is - specialized for other types in other files, e.g., src/Engine/EngineTypeInfo.h and - src/Field/FieldTypeInfo.h. - Is this a compile-time version of RTTI? -
- -
- <type>LoopUtils</type>: Loop Computations at Compile Time - - At compile time, LoopUtils supports copying - between arrays and computing the dot product of arrays. See - src/Utilities/MetaProg.h. -
- -
- <type>ModelElement<T></type>: Wrap a Type - - A wrapper class used to differentiate overloaded functions. - Defined in src/Utilities/ModelElement.h. Used - only by &array; and DynamicArray. -
- -
- <type>WrappedInt<int></type>: Wrap a Number - - A wrapper class used to differentiate overloaded functions - among different integers. Defined in src/Utilities/WrappedInt.h. Is this - class deprecated? Is it even necessary? -
- -
- Supporting Empty Classes - - The NoInit tag class indicates certain - initializations should be skipped. Defined in src/Utilities/NoInit.h. - - FIXME: Should be macro, not function. - POOMA_PURIFY_CONSTRUCTORS generates an empty - constructor, copy constructor, and destructor to avoid &purify; - warnings. Defined in src/Utilities/PurifyConstructors.h. - -
- -
- <type>Pooled<T></type>: Fast Memory Allocation of - Small Blocks - - Pooled<T> speeds allocation and - deallocation of memory blocks for small objects with - type T. Defined in src/Utilities/Pooled.h, it is - implemented in src/Utilities/Pool.h and src/Utilities/Pool.cmpl.cpp. - src/Utilities/StaticPool.h - no longer seems to be used. -
- -
- <type>UninitializedVector<T,Dim></type>: Create - Without Initializing - - This class optimizes creation of an array of objects by - avoiding running the default constructors. Later initialization - can occur, perhaps using a loop that can be unrolled. Defined in - src/Utilities/UninitializedVector.h, - this is used only by DomainTraits. -
-
- - - Algorithms for Implementing &pooma; - - In src/Utilities/algorithms.h, - copy, delete_back, and - delete_shiftup provide additional algorithms - using iterators. - - - - - TMP: Where do we describe these files? - - - - src/Utilities/Conform.h: tag for - checking whether terms in expression have conforming - domains - - - - src/Utilities/DerefIterator.h: - DerefIterator<T> and - ConstDerefIterator<T> automatically - dereference themselves to maintain const - correctness. - - - - src/Utilities/Observable.h, - src/Utilities/Observer.h, - and src/Utilities/ObserverEvent.h: - Observable<T>, - SingleObserveable<T>, - Observer<T>, and ObserverEvent - implement the observer pattern. What is the observer pattern? - Where is this used in the code? - - - - - -
- - - - Future Development - -
- Particles - - docs/ParticlesDoc.txt has - out-of-date information. - - See Section 3.2.3 of - papers/pooma.ps for an out-of-date - description. - - papers/jvwr.ps concerns mainly - particles. papers/8thSIAMPOOMAParticles.pdf, - by Julian Cummings and Bill Humphrey, concerns parallel particle - simulations. papers/iscope98linac.pdf - describes a particle beam simulation using &pooma;; it mainly - concerns particles. - -
- Particles - - Do we want to include such a section? - - Section 3, "Sample Applications" of - papers/SiamOO98_paper.ps describes porting a - particle program written using High-Performance Fortran to - &pooma; and presumably why particles were added to &pooma;. It - also describes MC++, a Monte Carlo - neutron transport code. - -
- -
- - -
- Composition of &engine;s
-
- The i,j-th element of the composition
- ab of two arrays
- a and b equals a(b(i,j)).
- The composition engine tagged IndirectionTag<Array1,
- Array2>, defined in src/Engine/IndirectionEngine.h, is
- unfinished.
-
- - -
- Improving Consistency of Container Interfaces - -
- Relations for &array;s - - Do &array;s currently support relations? If not, why not? - Should they be added? -
- -
- Supporting the Same Number of Dimensions - - &array; and &field; should support the same maximum number - of dimensions. Currently, &array;s support seven dimensions and - &field;s support only three. By definition, &dynamicarray; - supports only one dimension. - - Relations for &array;s. - - External guards for &array;s. -
- -
- - -
- <function>where</function> Proxies - - QUESTION: Do we even discuss this broken - feature? Where is it used? Some related code is in - src/Array/Array.h:2511–2520. -
- - -
- Very Long Term Development Ideas - - Describe how to write a new configuration file. -
- -
- - - - Obtaining and Installing &pooma; - - ADD: Write this section, including extensive instructions - for Unix, MS Windows, and MacOS. List the configuration options. - Be sure to describe configuring for parallel execution. - -
- Supporting Distributed Computation - - To use multiple processors with &pooma; requires installing - the &cheetah; messaging library and an underlying messaging library - such as the Message Passing Interface (&mpi;) Communications - Library or the &mm; Shared Memory Library. In this section, we - first describe how to install &mm;. Read the section only if using - &mm;, not &mpi;. Then we describe how to install &cheetah; and - configure &pooma; to use it. - -
- Obtaining and Installing the &mm; Shared Memory Library
-
- &cheetah;, and thus &pooma;, can use Ralf Engelschall's &mm;
- Shared Memory Library to pass messages between processors. For
- example, the &author; uses this library on a two-processor
- computer running &linux;. The library, available at
- http://www.engelschall.com/sw/mm/, is free and has
- been successfully tested on a variety of Unix platforms.
-
- We describe how to download and install the &mm; library.
-
-
- Download the library from the &pooma; Download page
- available off the &pooma; home page (&poomaHomePage;).
-
-
- Extract the source code using tar xzvf
- mm-1.1.3.tar.gz. Move into the resulting source
- code directory mm-1.1.3.
-
-
- Prepare to compile the source code by configuring it
- using the configure command. To change
- the default installation directory /usr/local, specify the
- --prefix=directory
- option. The other configuration options can be listed by
- specifying the --help option. Since the
- &author; prefers to keep all &pooma;-related code in his
- pooma subdirectory, he
- uses ./configure
- --prefix=${HOME}/pooma/mm-1.1.3.
-
-
- Create the library by issuing the make
- command. This compiles the source code using a &c; compiler. To
- use a different compiler than the &mm; configuration chooses, set
- the CC environment variable to the compiler before configuring.
-
-
- Optionally test the library by issuing the make
- test command. If successful, the penultimate line
- should be OK - ALL TESTS SUCCESSFULLY
- PASSED.
-
-
- Install the &mm; Library by issuing the make
- install command. This copies the library files to the
- installation directory. The mm-1.1.3 directory containing the
- source code may now be removed.
-
-
-
- - -
- Obtaining and Installing the &cheetah; Messaging Library - - The &cheetah; Library decouples communication from - synchronization. Using asynchronous messaging rather than - synchronous messaging permits a message sender to operate without - the cooperation of the message recipient. Thus, implementing - message sending is simpler and processing is more efficiently - overlapped with it. Remote method invocation is also supported. - The library was developed at the Los Alamos National Laboratory's - Advanced Computing Laboratory. - - &cheetah;'s messaging is implemented using an underlying - messaging library such as the Message Passing Interface (&mpi;) - Communications Library (FIXME: xref linkend="mpi99", ) or the &mm; - Shared Memory Library. &mpi; works on a wide variety of platforms - and has achieved widespread usage. &mm; works under Unix on any - computer with shared memory. Both libraries are available for - free. The instructions below work for whichever library you - choose. - - We describe how to download and install &cheetah;. - - - Download the library from the &pooma; Download page - available off the &pooma; home page (&poomaHomePage;). - - - Extract the source code using tar xzvf - cheetah-1.0.tgz. Move into the resulting source code - directory cheetah-1.0. - - - Edit a configuration file corresponding to your operating - system and compiler. These .conf files are located in the - config directory. For - example, to use &gcc; with the &linux; operating system, use - config/LINUXGCC.conf. - - The configuration file usually does not need - modification. However, if you are using &mm;, ensure - shmem_default_dir specifies its location. - For example, the &author; modified the value to - "/home/oldham/pooma/mm-1.1.3". - - - Prepare to compile the source code by configuring it - using the configure command. Specify the - configuration file using the --arch option. - Its argument should be the configuration file's name, omitting - its .conf suffix. For - example, --arch LINUXGCC. Some other - options include - - - --help - - lists all the available options - - - - --shmem --nompi - - indicates use of &mm;, not &mpi; - - - - --mpi --noshmem - - indicates use of &mpi;, not &mm; - - - - --opt - - causes the compiler to produce optimized source code - - - - --noex - - prevents use of &cc; exceptions - - - - --prefix directory - - specifies the installation directory where the - library will be copied rather than the default. - - - - For example, the &author; uses ./configure --arch - LINUXGCC --shmem --nompi --noex --prefix - ${HOME}/pooma/cheetah-1.0 --opt. The - --arch LINUXGCC indicates use of &gcc; - under a &linux; operating system. The &mm; library is used, - but &cc; exceptions are not. The latter choice matches - &pooma;'s default choice. The library will be installed in - the ${HOME}/pooma/cheetah-1.0. - Finally, the library code will be optimized, hopefully running - faster than unoptimized code. - - - Follow the directions printed by - configure: Change directories to the - lib subdirectory named - by the --arch argument and then type - make to compile the source code and create - the library. - - - Optionally ensure the library works correctly by issuing - the make tests command. - - - Install the library by issuing the make - install command. This copies the library files to - the installation directory. The cheetah-1.0 directory containing - the source code may now be removed. - - - -
- -
- Configuring &pooma; When Using &cheetah; - - To use &pooma; with &cheetah;, one must tell &pooma; the - location of the &cheetah; library using the - --messaging configuration option. To do this, - - - Set the &cheetah; directory environment variable - CHEETAHDIR to the directory containing the - installed &cheetah; library. For - example, declare -x - CHEETAHDIR=${HOME}/pooma/cheetah-1.0 specifies the - installation directory used in the previous section. - - - When configuring &pooma;, specify the - --messaging option. For example, - ./configure --arch LINUXgcc --opt - --messaging configures for &linux;, &gcc;, and an - optimized library using &cheetah;. - - - -
- - -
-
- - - - Dealing with Compilation Errors - - Base this low-priority section on errors.html. QUESTION: Where is - errors.html? - - - - - - TMP: Notes to Myself - -
- Miscellaneous - - - - QUESTION: How do I know when to use a type name versus just - the concept? For example, when do I use array - versus &array;? - - - - Krylov solvers are described in Section 3.5.2 of - papers/pooma.ps. - - - - Section 5, "The Polygon Overlay Problem," describes - porting an ANSI &c; program to &pooma;. - - - - A good example book: STL Tutorial and Reference - Guide: &cc; Programming with the Standard Template - Library, second edition, by David R. Musser, - Gillmer J. Derge, and Atul Sanai, ISBN 0-201-37923-6, - QA76.73.C153.M87 2001. - - - - One STL reference book listed functions in margin notes, - easing finding material. Do this. - - - - QUESTION: Does Berna Massingill at Trinity University have - any interest ior access to any parallel computers? - - - -
- - -
- Existing HTML Tutorials - - All these tutorials are out-of-date, but the ideas and text - may still be relevant. - - - index.html - list of all tutorials. No useful - material. - - introduction.html - data-parallel Laplace solver using Jacobi - iteration ala Doof2d - - background.html - short, indirect introduction to &pete;; parallel - execution model; &cc;; templates; &stl;; expression - templates - - tut-01.html - UNFINISHED - - Layout.html - UNFINISHED - - parallelism.html - UNFINISHED - - self-test.html - UNFINISHED - - threading.html - UNFINISHED - - tut-03.html - UNFINISHED - - tut-04.html - UNFINISHED - - tut-05.html - UNFINISHED - - tut-06.html - UNFINISHED - - tut-07.html - UNFINISHED - - tut-08.html - UNFINISHED - - tut-09.html - UNFINISHED - - tut-10.html - UNFINISHED - - tut-11.html - UNFINISHED - - tut-12.html - UNFINISHED - - tut-13.html - UNFINISHED - - - -
- -
- - - - - - Bibliography - - FIXME: How do I process these entries? - - - mpi99 - - - WilliamGropp - - - EwingLusk - - - AnthonySkjellum - - - - 1999 - Massachusetts Institute of Technology - - 0-262-57132-3 - - The MIT Press -
Cambridge, MA
-
- Using MPI - Portable Parallel Programming with the Message-Passing Interface - second edition -
-
-
-
-
-
- Glossary
-
- ADD: Make sure all entries are indexed and perhaps point back
- to their first use. WARNING: This is constructed by hand so it is
- likely to be full of inconsistencies and errors.
-
-
- S
-
-
- Suite Name
-
- An arbitrary string denoting a particular toolkit
- configuration. For example, the string
- SUNKCC-debug might indicate a configuration for
- the Sun Solaris
- operating system and the &kcc; &cc; compiler with debugging
- support. By default, the suite name is equal to the
- configuration's architecture name.
-
-
-
-
-
-
-
-
- &genindex.sgm;
-
-
--- 0 ---- Index: tutorial.xml =================================================================== RCS file: tutorial.xml diff -N tutorial.xml *** /dev/null Fri Mar 23 21:37:44 2001 --- tutorial.xml Tue Dec 11 13:31:10 2001 *************** *** 0 **** --- 1,1051 ---- + + A Tutorial Introduction + + UPDATE: In the following paragraph, fix the cross-reference + to the actual section. + + &pooma; provides different containers and processor + configurations and supports different implementation styles, as + described in . In this + chapter, we present several different implementations of the + &doof2d; two-dimensional diffusion simulation program: + + + a C-style implementation omitting any use of &pooma; + computing each array element individually, + + + a &pooma; &array; implementation computing each array + element individually, + + + a &pooma; &array; implementation using data-parallel + statements, + + + a &pooma; &array; implementation using stencils, which + support local computations, + + + a stencil-based &pooma; &array; implementation supporting + computation on multiple processors + + + a &pooma; &field; implementation using data-parallel + statements, and + + + a data-parallel &pooma; &field; implementation for + multi-processor execution. + + + + These illustrate the &array;, &field;, &engine;, layout, + mesh, and domain data types. They also illustrate various + immediate computation styles (element-wise accesses, data-parallel + expressions, and stencil computation) and various processor + configurations (one sequential processor and multiple + processors). + +
+ &doof2d; Averagings + + + + + + The Initial Configuration + + + + + + + + After the First Averaging + + + + + + + + After the Second Averaging + + +
+
+ The &doof2d; diffusion program starts with a two-dimensional
+ grid of values. To model an initial density, all grid values are
+ zero except for one nonzero value in the center. During each averaging,
+ each grid element except the outermost ones updates its value by
+ averaging its value and those of its eight neighbors. To avoid overwriting
+ grid values before all their uses occur, we use two arrays, reading
+ the first and writing the second and then reversing their roles
+ within each iteration.
+
+ Figure
+ illustrates the averagings. Initially, only the center element has
+ a nonzero value. To form the first averaging, each element's new
+ value equals the average of its and its neighbors' previous values.
+ Thus, the initial nonzero value spreads to a three-by-three grid.
+ The averaging continues, spreading to a five-by-five grid of
+ nonzero values. Values in outermost grid cells are always
+ zero.
+
+ Before presenting various implementations of &doof2d;, we
+ explain how to install the &poomaToolkit;.
+
+ REMOVE: The &doof2d; algorithm and code are illustrated in
+ Section 4.1 of
+ pooma-publications/pooma.ps. It includes a
+ figure illustrating parallel communication of data.
+
+ Installing &pooma;
+
+ ADD: How does one install &pooma; using Windows or Mac?
+
+ UPDATE: Make a more recent &pooma; source code file
+ available on &poomaDownloadPage;. For example,
+ LINUXgcc.conf is not available.
+
+ In this section, we describe how to obtain, build, and
+ install the &poomaToolkit;. We focus on installing under the
+ Unix operating system. Instructions for installing on computers
+ running Microsoft Windows or MacOS, as well as more extensive
+ instructions for Unix, appear in .
+
+ Obtain the &pooma; source code &poomaSourceFile;
+ from the &pooma; download page (&poomaDownloadPage;) available off
+ the &pooma; home page (&poomaHomePage;). The tgz
+ indicates this is a compressed tar archive file. To extract the
+ source files, use tar xzvf &poomaSourceFile;.
+ Move into the resulting source code directory &poomaSource;; e.g.,
+ cd &poomaSource;.
+
+ Configuring the source code prepares the necessary paths for
+ compilation. First, determine a configuration file
+ corresponding to your operating system and compiler in the
+ config/arch/ directory.
+ For example, LINUXgcc.conf supports compiling
+ under a &linux; operating system with &gcc;, and SGI64KCC.conf supports compiling
+ under a 64-bit SGI Unix operating
+ system with &kcc;. Then, configure the source code:
+ ./configure &dashdash;arch LINUXgcc &dashdash;opt &dashdash;suite
+ LINUXgcc-opt. The architecture argument to the
+ &dashdash;arch option is the name of the corresponding
+ configuration file, omitting its .conf suffix. The
+ &dashdash;opt indicates the &poomaToolkit; will
+ contain optimized source code, which makes the code run more
+ quickly but may impede debugging. Alternatively, the
+ &dashdash;debug option supports debugging. The
+ suite name
+ can be any arbitrary string. We chose
+ LINUXgcc-opt to remind us of the architecture
+ and optimization choice. configure creates subdirectories
+ named by the suite name LINUXgcc-opt for use when
+ compiling the source files. Comments at the beginning of
+ lib/suiteName/PoomaConfiguration.h
+ record the configuration arguments.
+
+ To compile the source code, set the
+ POOMASUITE environment variable to the suite name
+ and then type make. To set the environment
+ variable for the bash shell use
+ export
+ POOMASUITE=suiteName,
+ substituting your suite name for
+ suiteName. For the
+ csh shell, use setenv
+ POOMASUITE LINUXgcc-opt. Issuing the
+ make command compiles the &pooma; source code
+ files to create the &pooma; library. The &pooma; makefiles assume
+ GNU &make;, so substitute the proper
+ command if necessary. The &pooma; library can be found in, e.g.,
+ lib/LINUXgcc-opt/libpooma-gcc.a.
+
+ +
+ Hand-Coded Implementation
+
+ Before implementing &doof2d; using the &poomaToolkit;, we
+ present a hand-coded implementation of &doof2d;. See . After querying the
+ user for the number of averagings, the arrays' memory is
+ allocated. Since the arrays' size is not known at compile time,
+ the arrays are accessed via pointers to allocated dynamic memory.
+ This memory is deallocated at the program's end to avoid memory
+ leaks. The arrays are initialized with initial conditions. For
+ the b array, all values are zero except the central
+ one, which is nonzero. Only the outermost values of the
+ a array need be initialized to zero, but we
+ instead initialize them all using the loop used by
+ b.
+
+ The simulation's kernel consists of triply nested loops.
+ The outermost loop controls the number of iterations. The inner
+ nested loops iterate through the arrays' elements, excepting the
+ outermost elements; note the loop indices range from 1 to n-2
+ while the array indices range from 0 to n-1. Each
+ a value is assigned the average of its
+ corresponding value in b and the latter's
+ neighbors. Values in the two-dimensional grids are accessed using
+ two sets of brackets, e.g., a[i][j]. After
+ assigning values to a, a second averaging reads
+ values in a, writing values in
+ b.
+
+ After the kernel finishes, the final central value is
+ printed. If the desired number of averagings is even, the value
+ in b is printed; otherwise, the value in
+ a is used. Finally, the dynamically-allocated
+ memory must be freed to avoid memory leaks.
+
+
+ Hand-Coded Implementation of &doof2d;
+ &doof2d-c-element;
+
+
+ The user specifies the desired number of averagings.
+
+
+ These variables point to the two-dimensional,
+ dynamically-allocated grids so we use a pointer to a pointer to
+ a &double;.
+
+
+ The user enters the desired grid size. The grid will be
+ a square with n by n grid cells.
+
+
+ Memory for the arrays is allocated. By default, the
+ array indices are zero-based.
+
+
+ Initially, all grid values are zero except for the one
+ nonzero value at the center of the second array. Array
+ positions are indicated using two brackets, e.g.,
+ a[i][j]. A better implementation might
+ initialize only the outermost values of the
+ a array.
+
+
+ These constants indicate the number of iterations and
+ the average weighting.
+
+
+ Each a value, except an outermost one,
+ is assigned the average of its analogous b
+ value and that value's neighbors. Note the loop indices ensure
+ the outermost values are not changed. The
+ weight's value ensures the computation is an
+ average.
+
+
+ The second averaging computes b's
+ values using values stored in a.
+
+
+ After the averagings finish, the central value is printed.
+
+
+ The dynamically-allocated memory must be deallocated to
+ avoid memory leaks.
+
+
+
+
+ To compile the executable, change directories to the &pooma;
+ &poomaExampleDirectory;/Doof2d
+ directory. Ensure the POOMASUITE environment
+ variable specifies the desired suite name
+ suiteName, as we did when compiling
+ &pooma; in the previous section . Issuing the
+ make Doof2d-C-element command creates the
+ executable
+ suiteName/Doof2d-C-element.
+
+ When running the executable, specify the desired
+ nonnegative number of averagings and the nonnegative number of
+ grid cells along any dimension. The resulting grid has the same
+ number of cells along each dimension. After the executable
+ finishes, the resulting value of the central element is
+ printed.
+
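+
+ A condensed sketch of the kernel just described; the variable names are
+ assumptions mirroring the listing, and the second, symmetric averaging that
+ reads a and writes b is elided.
+
+ const double weight = 1.0 / 9.0;
+ for (int k = 0; k < nAverages; ++k) {
+   for (int i = 1; i <= n - 2; ++i)
+     for (int j = 1; j <= n - 2; ++j)
+       a[i][j] = weight *
+         (b[i-1][j-1] + b[i-1][j] + b[i-1][j+1] +
+          b[i  ][j-1] + b[i  ][j] + b[i  ][j+1] +
+          b[i+1][j-1] + b[i+1][j] + b[i+1][j+1]);
+   // ... second averaging: read a, write b ...
+ }
+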
+ + +
+ Element-wise &array; Implementation
+
+ The simplest way to use the &poomaToolkit; is to
+ use the &pooma; &array; class instead of &c; arrays. &array;s
+ automatically handle memory allocation and deallocation, support a
+ wider variety of assignments, and can be used in expressions.
+
+ implements &doof2d; using &array;s and element-wise accesses.
+ Since the same algorithm is used as , we will concentrate
+ on the differences.
+
+
+ Element-wise &array; Implementation of &doof2d;
+ &doof2d-array-element;
+
+
+ To use &pooma; &array;s, the Pooma/Arrays.h header must be included.
+
+
+ The &poomaToolkit; structures must be constructed before
+ their use.
+
+
+ Before creating an &array;, its domain must be specified.
+ The N interval represents the
+ one-dimensional integral set {0, 1, 2, …, n-1}. An
+ Interval<2> object represents the entire
+ two-dimensional index domain.
+
+
+ An &array;'s template parameters indicate its dimension,
+ its value type, and how the values will be stored or computed.
+ The &brick; &engine; type indicates values will be directly
+ stored. It is responsible for allocating and deallocating
+ storage so new and
+ delete statements are not necessary.
+ The vertDomain specifies the array index
+ domain.
+
+
+ The first statement initializes all &array; values to the
+ same scalar value. This is possible because each &array;
+ knows its domain. The second statement
+ illustrates &array; element access. Indices, separated by
+ commas, are surrounded by parentheses rather than surrounded by
+ square brackets ([]).
+
+
+ &array; element access uses parentheses, rather than
+ square brackets.
+
+
+ Since &array;s are first-class objects, they
+ automatically deallocate any memory they require, eliminating
+ memory leaks.
+
+
+ The &poomaToolkit; structures must be destructed after
+ their use.
+
+
+
+
+ We describe the use of &array; and the &poomaToolkit; in
+ .
+ &array;s, declared in Pooma/Arrays.h, are first-class
+ objects. They know their index domain, can be used
+ in expressions, can be assigned scalar and array values, and
+ handle their own memory allocation and deallocation.
+
+ The creation of the a and
+ b &array;s requires an object specifying their
+ index domains. Since these are two-dimensional arrays, their
+ index domains are also two dimensional. The two-dimensional
+ Interval<2> object is the Cartesian product of
+ two one-dimensional Interval<1> objects, each
+ specifying the integral set {0, 1, 2, …, n-1}.
+
+ An &array;'s template parameters indicate its dimension, the
+ type of its values, and how the values are stored. Both
+ a and b are two-dimensional
+ arrays storing &double;s so their dimension
+ is 2 and their element type is &double;. An &engine; stores an
+ &array;'s values. For example, a &brick; &engine; explicitly
+ stores all values. A &compressiblebrick; &engine; also explicitly
+ stores values if more than one value is present, but, if all values
+ are the same, storage for just that value is required. Since an
+ engine can store its values any way it desires, it might instead
+ compute its values using a function or compute the values stored
+ in separate engines. In practice, most explicitly specified
+ &engine;s are either &brick; or &compressiblebrick;.
+
+ &array;s support both element-wise access and scalar
+ assignment. Element-wise access uses parentheses, not square
+ brackets. For example, b(n/2,n/2)
+ specifies the central element. The scalar assignment b
+ = 0.0 assigns the same 0.0 value to all array
+ elements.
This is possible because the array knows the extent of its domain.

Any program using the &poomaToolkit; must initialize the toolkit's data structures using Pooma::initialize(argc,argv). This extracts &pooma;-specific command-line options from the command-line arguments in argv and initializes the inter-processor communication and other data structures. When finished, Pooma::finalize() ensures all computation has finished and the communication and other data structures are destructed.
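Collecting the pieces discussed above, the element-wise &array; version has roughly the following shape. This is a sketch assembled from the annotated listing, not the listing itself: the grid size n is hard-wired here rather than read from the user, and the averaging loops are elided.

    #include <stdlib.h>          // has EXIT_SUCCESS
    #include "Pooma/Arrays.h"    // has Pooma's Array

    int main(int argc, char *argv[])
    {
      Pooma::initialize(argc, argv);    // construct the toolkit's structures

      long n = 1000;                    // grid size (read from the user in the listing)
      Interval<1> N(0, n-1);            // the integral set {0, 1, ..., n-1}
      Interval<2> vertDomain(N, N);     // the two-dimensional index domain

      // Brick engines store the values directly; no new/delete is needed.
      Array<2, double, Brick> a(vertDomain);
      Array<2, double, Brick> b(vertDomain);

      a = b = 0.0;                      // scalar assignment to every element
      b(n/2, n/2) = 1000.0;             // element access uses parentheses
      // (the next section explains when Pooma::blockAndEvaluate() is
      // needed before such single-element accesses)

      // ... element-wise averaging loops, as in the hand-coded version ...

      Pooma::finalize();                // destruct the toolkit's structures
      return EXIT_SUCCESS;
    }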
Data-Parallel &array; Implementation

&pooma; supports data-parallel &array; accesses. Many algorithms are more easily expressed using data-parallel expressions. Also, the &poomaToolkit; might be able to reorder the data-parallel computations to be more efficient or to distribute them among various processors. In this section, we concentrate on the differences between the data-parallel implementation of &doof2d; listed in and the element-wise implementation listed in the previous section .

Data-Parallel &array; Implementation of &doof2d;
&doof2d-array-parallel;

&pooma; may reorder the computation of statements. Calling Pooma::blockAndEvaluate ensures all computation finishes before accessing a particular array element.

These variables specify one-dimensional domains {1, 2, …, n-2}. Their Cartesian product specifies the domain of the array values that are modified.

Data-parallel expressions replace nested loops and array element accesses. For example, a(I,J) represents the subset of the a array having a domain equal to the Cartesian product of I and J. Intervals can be shifted by an additive or multiplicative constant.

Data-parallel expressions apply domain objects to containers to indicate a set of parallel expressions. For example, in the program listed above, a(I,J) specifies all of the a array except the outermost elements. The array's vertDomain domain consists of the Cartesian product of {0, 1, 2, …, n-1} and itself, while I and J each specify {1, 2, …, n-2}. Thus, a(I,J) is the subset with a domain of the Cartesian product of {1, 2, …, n-2} and itself. It is called a view of an array. It is itself an array, with a domain and support for element access, but its storage is the same as a's. Changing a value in a(I,J) also changes the same value in a. Changing a value in the latter also changes the former if the value is not one of a's outermost elements. The expression b(I+1,J+1) indicates the subset of b with a domain consisting of the Cartesian product of {2, 3, …, n-1} and itself, i.e., the same domain as a(I,J) but shifted up one unit and to the right one unit. Only an &interval;'s value, not its name, is important. Thus, all uses of J in this program could be replaced by I without changing the semantics.
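As a concrete illustration of these views, the kernel of the data-parallel listing reads as follows; this fragment is reproduced here as a sketch, with I, J, weight, a, and b the names used above.

    // Interior domain {1, ..., n-2} in each dimension.
    Interval<1> I(1, n-2);
    Interval<1> J(1, n-2);

    // Read from b.  Write to a.  Each addend such as b(I+1,J+1) is a view
    // of b's interior shifted by one cell, so corresponding elements align.
    a(I,J) = weight *
      (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
       b(I  ,J+1) + b(I  ,J  ) + b(I  ,J-1) +
       b(I-1,J+1) + b(I-1,J  ) + b(I-1,J-1));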
Adding &array;s

Adding two arrays with different domains. When adding arrays, values in corresponding positions are added even if they have different indices, indicated by the small numbers adjacent to the arrays.
The statement assigning to a(I,J) illustrates that &array;s may participate in expressions. Each addend is a view of an array, which is itself an array. Each view has the same domain size, so their sum can be formed from corresponding elements of each array. For example, the lower, left element of the result equals the sum of the lower, left elements of the addend arrays. For the computation, indices are ignored; only the relative positions within each domain are used.

illustrates adding two arrays with different domain indices. The indices are indicated by the small numbers to the left of and below the arrays. Even though 9 and 3 have different indices (1,1) and (2,0), they are added to each other because they have the same relative positions within the addends.

Just before accessing individual &array; values, the code contains calls to Pooma::blockAndEvaluate. &pooma; may reorder computations or distribute them among various processors. Before reading an individual &array; value, calling the function ensures all computations affecting its value have finished, i.e., it has the correct value. Calling this function explicitly is necessary only when accessing individual array elements because &pooma; cannot otherwise determine when such an access occurs. For example, before printing an array, &pooma; calls blockAndEvaluate itself.
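The two places the listing needs an explicit call are its single-element accesses, reproduced here from the listing (printing requires <iostream>):

    // Setting one element after a data-parallel assignment.
    a = b = 0.0;
    Pooma::blockAndEvaluate();   // finish the scalar assignments first
    b(n/2, n/2) = 1000.0;

    // Reading one element after the simulation.
    Pooma::blockAndEvaluate();   // ensure all computation has finished
    std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;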
Stencil &array; Implementation

Many computations are local, computing an &array;'s value using nearby &array; values. Encapsulating this computation in a stencil can yield faster code because the compiler can determine that all accesses come from the same array. Each stencil consists of a function object and an indication of the stencil's extent.

Stencil &array; Implementation of &doof2d;
&doof2d-array-stencil;

A stencil is a function object implementing a local operation on an &array;.

&pooma; applies this function call operator() to the interior domain of an &array;. Although not strictly necessary, the function's template parameter C permits using this stencil with &array;s and other containers. The read &array; member function supports only reading values, not writing values, thus possibly permitting faster access.

These two functions indicate the stencil's size. For each dimension, the stencil extends one cell to the left of (or below) its center and also one cell to the right of (or above) its center.

Create the stencil.

Applying stencil to the b array and a subset interiorDomain of its domain yields an array, which is assigned to a subset of a. The stencil's function object is applied to each position in the specified subset of b.

Before we describe how to create a stencil, we describe how to apply a stencil to an array, yielding values. To compute the value associated with index position (1,3), the stencil's center is placed at (1,3). The stencil's upperExtent and lowerExtent functions indicate which &array; values the stencil's function will use. See . Applying the stencil's function call operator() yields the computed value. To compute multiple &array; values, apply a stencil to the array and a domain object: stencil(b, interiorDomain). This applies the stencil to each position in the domain. The user must ensure that applying the stencil does not access nonexistent &array; values.
Applying a Stencil to an &array;

Apply a stencil to position (1,3) of an array. To compute the value associated with index position (1,3) of an array, place the stencil's center, indicated with dashed lines, at that position. The computation involves the array values covered by the stencil, as delineated by upperExtent and lowerExtent.
To create a stencil object, apply the &stencil; type to a function object class. For example, Stencil<DoofNinePt> stencil declares the stencil object. The function object class must define a function call operator() with a container parameter and index parameters. The number of index parameters, indicating the stencil's center, must equal the container's dimension. For example, DoofNinePt defines operator()(const C& c, int i, int j). We templated the container type C although this is not strictly necessary. The two index parameters i and j ensure the stencil works with two-dimensional containers. The lowerExtent function indicates how far to the left of (or below) its center the stencil extends. Its parameter indicates a particular dimension; index parameters i and j correspond to dimensions 0 and 1. upperExtent serves an analogous purpose. The &poomaToolkit; uses these functions when distributing computation among various processors, but it does not use them to ensure nonexistent &array; values are not accessed. Caveat stencil user!
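Putting these pieces together, a nine-point averaging stencil along the lines described above might look like the following sketch. The member signatures follow the description and the fragments in the annotated listing, but details may differ from the actual DoofNinePt class.

    // Function object implementing the local nine-point average.
    class DoofNinePt
    {
    public:
      // Apply the stencil centered at (i,j).  The "C" template parameter
      // permits use of this operator with both Arrays and Fields.
      template <class C>
      inline typename C::Element_t
      operator()(const C& c, int i, int j) const
      {
        return (1.0/9.0) *
          (c.read(i+1,j+1) + c.read(i+1,j  ) + c.read(i+1,j-1) +
           c.read(i  ,j+1) + c.read(i  ,j  ) + c.read(i  ,j-1) +
           c.read(i-1,j+1) + c.read(i-1,j  ) + c.read(i-1,j-1));
      }

      // The stencil extends one cell below and one cell above its center
      // in every dimension.
      inline int lowerExtent(int) const { return 1; }
      inline int upperExtent(int) const { return 1; }
    };

    // Create the stencil and apply it to the interior of b, writing a.
    Stencil<DoofNinePt> stencil;
    a(interiorDomain) = stencil(b, interiorDomain);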
Distributed &array; Implementation

A &pooma; program can execute on one or multiple processors. To convert a program designed for uniprocessor execution into a program designed for multiprocessor execution, the programmer need only specify how each container's domain should be split into patches. The &poomaToolkit; automatically distributes the data among the available processors and handles any required communication among processors.

Distributed Stencil &array; Implementation of &doof2d;
&doof2d-array-distributed;

Multiple copies of a distributed program may run simultaneously, perhaps each having its own input and output. Thus, we use command-line arguments to pass input to the program. Using an &inform; object ensures only one copy produces output.

The UniformGridPartition declaration specifies how an array's domain will be partitioned, or split, into patches. Guard layers are an optimization that can reduce data communication between patches. The UniformGridLayout declaration applies the partition to the given domain, distributing the resulting patches among various processors.

The MultiPatch &engine; distributes requests for &array; values to the associated patch. Since a patch may be associated with a different processor, its remote engine has type Remote<Brick>. &pooma; automatically distributes the patches among the available memories and processors.

The stencil computation, whether for one processor or multiple processors, is the same.

Supporting distributed computation requires only minor code changes. These changes specify how each container's domain is distributed among the available processors and how input and output occur. The rest of the program, including all the computations, remains the same. When running, the &pooma; executable interacts with the run-time library to determine which processors are available, distributes the containers' domains, and automatically handles all necessary interprocessor communication. The same executable runs on one or many processors. Thus, the programmer can write one program, debugging it on a uniprocessor computer and running it on a supercomputer.
The &pooma; Distributed Computation Model

The &pooma; distributed computation model. The &pooma; distributed computation model combines partitioning containers' domains and the computer configuration to create a layout.
&pooma;'s distributed computing model separates container domain concepts from computer configuration concepts. See . The program indicates how each container's domain will be partitioned. This process is represented in the upper left corner of the figure. A user-specified partition specifies how to split the domain into pieces. For example, the illustrated partition splits the domain into three equal-sized pieces along the x-dimension and two equal-sized pieces along the y-dimension. Thus, the domain is split into patches. The partition also specifies external and internal guard layers. A guard layer is a domain surrounding a patch. A patch's computation only reads, but does not write, these guard values. An external guard layer conceptually surrounds the entire container domain with boundary values whose presence permits all domain computations to be performed the same way, even for values along the domain's edge. An internal guard layer duplicates values from adjacent patches so communication need not occur during a patch's computation. The use of guard layers is an optimization; using external guard layers eases programming and using internal guard layers reduces communication among processors. Their use is not required.

The computer configuration of shared memory and processors is determined by the run-time system. See the upper right portion of . A context is a collection of shared memory and processors that can execute a program or a portion of a program. For example, a two-processor desktop computer might have memory accessible to both processors, so it is one context. A supercomputer consisting of desktop computers networked together might have as many contexts as computers. The run-time system, e.g., the Message Passing Interface (&mpi;) Communications Library (FIXME: xref linkend="mpi99", ) or the &mm; Shared Memory Library (), communicates the available contexts to the executable. &pooma; must be configured for the particular run-time system. See .

A layout combines patches with contexts so the program can be executed. If &distributedtag; is specified, the patches are distributed among the available contexts. If &replicatedtag; is specified, the set of patches is replicated on each context. Regardless, the containers' domains are now distributed among the contexts so the program can run. When a patch needs data from another patch, the &pooma; toolkit sends messages to the desired patch using a message-passing library. All such communication is performed automatically by the toolkit with no need for programmer or user input.

FIXME: The two previous paragraphs demonstrate confusion between run-time system and message-passing library.

Incorporating &pooma;'s distributed computation model into a program requires writing very few lines of code. illustrates this. The partition declaration creates a UniformGridPartition splitting each dimension of a container's domain into nuProcessors equally-sized pieces. The first GuardLayers argument specifies that each patch will have a copy of adjacent patches' outermost values. This may speed computation because a patch need not synchronize its computation with other patches' processors. Since each value's computation requires knowing its surrounding neighbors, the internal guard layer is one layer deep. The second GuardLayers argument specifies no external guard layer.
External guard layers simplify computing values along the edges of domains. Since the program already uses only the interior domain for computation, we do not use this feature.

The layout declaration creates a UniformGridLayout layout. As illustrates, it needs to know a container's domain, a partition, the computer's contexts, and a &distributedtag; or &replicatedtag;. These comprise layout's three constructor arguments; the contexts are implicitly supplied by the run-time system.

A distributed &array; should be created using a &layout; object and should have a &multipatch; engine. Prior implementations designed for uniprocessors constructed the container using a &domain; object. A distributed implementation uses a &layout; object, which conceptually specifies a &domain; object and its distribution throughout the computer. A &multipatch; engine supports computations using multiple patches. The UniformTag indicates the patches all have the same size. Since patches may reside on different contexts, the second template parameter is Remote. Its Brick template parameter specifies the engine for a particular patch on a particular context. Most distributed programs use MultiPatch<UniformTag, Remote<Brick> > or MultiPatch<UniformTag, Remote<CompressibleBrick> > engines.

The computations for a distributed implementation are exactly the same as for a sequential implementation. The &pooma; Toolkit and a message-passing library automatically perform all necessary communication.

Input and output for distributed programs are different from those for sequential programs. Although the same instructions run on each context, each context may have its own input and output streams. To avoid dealing with multiple input streams, we pass the input via command-line arguments, which are replicated for each context. Using &inform; streams avoids having multiple output streams print. Any context can print to an &inform; stream, but only text sent from context 0 is printed. At the beginning of the program, we create an &inform; object. Throughout the rest of the program, we use it instead of std::cout and std::cerr.

The command to run the program depends on the run-time system. To use &mpi; with the Irix 6.5 operating system, one can use the mpirun command. For example, mpirun -np 4 Doof2d-Array-distributed -mpi 2 10 1000 invokes the &mpi; run-time system with four processors. The -mpi option tells the &pooma; executable Doof2d-Array-distributed to use the &mpi; Library. The remaining arguments specify the number of processors, the number of averagings, and the array size. The first and last values are used for each dimension. For example, if three processors are specified, then the x-dimension will have three processors and the y-dimension will have three processors, totalling nine processors. The command Doof2d-Array-distributed -shmem -np 4 2 10 1000 uses the &mm; Shared Memory Library (-shmem) and four processors. As for &mpi;, the remaining command-line arguments are specified on a per-dimension basis for the two-dimensional program.
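The few container-related declarations that actually change are collected below. This sketch follows the distributed listing, with vertDomain, interiorDomain, nuProcessors, and stencil as defined earlier.

    // Split each dimension into nuProcessors equal pieces, with one
    // internal guard layer and no external guard layer.
    UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
                                      GuardLayers<2>(1),   // internal
                                      GuardLayers<2>(0));  // external
    UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());

    // MultiPatch distributes requests among patches; Remote<Brick> lets a
    // patch's storage live on another context.
    Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
    Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);

    // The stencil computation itself is unchanged.
    a(interiorDomain) = stencil(b, interiorDomain);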
Data-Parallel &field; Implementation

&pooma; &array;s support many scientific computations, but many computations require values distributed throughout space, and &array;s have no spatial extent. &pooma; &field;s, supporting a superset of &array; functionality, model values distributed throughout space.

A &field; consists of a set of cells distributed through space. Like an &array; cell, each &field; cell is addressed via indices. Unlike an &array; cell, each &field; cell can hold multiple values. Like &array;s, &field;s can be accessed via data-parallel expressions and stencils and may be distributed across processors. Unlike &array; cells, &field; cells exist in a multi-dimensional volume so, e.g., distances between cells and normals to cells can be computed.

In this section, we implement the &doof2d; two-dimensional diffusion simulation program using &field;s. This simulation does not require any &field;-specific features, but we present this program rather than one using &field;-specific features to permit comparisons with the &array; versions, especially .

Data-Parallel &field; Implementation of &doof2d;
&doof2d-field-parallel;

To use &field;s, the Pooma/Fields.h header file must be included.

These statements specify the spacing and number of &field; values. First, a layout is explicitly created. Then a mesh, which specifies the spacing between cells, is created. The &field;'s centering specifies one cell-centered value per cell.

A &field;'s first template parameter specifies the type of mesh to use. The other template parameters are similar to an &array;'s. The constructor arguments specify the &field;'s centering, its domain of cells, and a mesh specifying the cells' spatial arrangement.

The computation for &field;s is the same as for &array;s because this example does not use any &field;-specific features.

As mentioned above, the fundamental difference between &array;s and &field;s is that the latter have cells and meshes. The &field; declarations reflect this. To declare a &field;, the Pooma/Fields.h header file must be included. A &field;'s domain consists of a set of cells, sometimes called positions when referring to &array;s. As for &array;s, a &field;'s domain and its layout must be specified. Since the above program is designed for uniprocessor computation, specifying the domain specifies the layout. A &field;'s mesh specifies its spatial extent. For example, one can ask the mesh for the distance between two cells or for the normals to a particular cell. Cells in a UniformRectilinearMesh all have the same size and are parallelepipeds. To create the mesh, one specifies the layout, the location of the spatial point corresponding to the lower, left domain location, and the size of a particular cell. Since this program does not use mesh computations, our choices do not much matter. We specify that the domain's lower, left corner is at spatial location (0.0, 0.0) and that each cell's width and height are 1. Thus, the middle of the cell at domain position (3,4) is (3.5, 4.5).

A &field; cell can contain one or more values, although every cell must have the same arrangement of values. For this simulation, we desire one value per cell, so we place that position at the cell's center, i.e., a cell centering. The canonicalCentering function returns such a centering. We defer discussion of the latter two arguments to .
A &field; declaration is analogous to an &array; declaration but must also specify a centering and a mesh. In , the &array; declaration specifies the array's dimensionality, the value type, the engine type, and a layout. A &field; declaration specifies the same values. Its first template parameter specifies the mesh's type, which includes an indication of its dimensionality. The second and third template parameters specify the value type and the engine type. Since a &field; has a centering and a mesh in addition to a layout, those constructor arguments are also necessary.

&field; operations are a superset of &array; operations, so the &doof2d; computations are the same as for . &field; accesses require parentheses, not square brackets, and accesses to particular values should be preceded by calls to Pooma::blockAndEvaluate.

To summarize, &field;s support multiple values per cell and have spatial extent. Thus, their declarations must specify a centering and a mesh. Otherwise, a &field; program is similar to one using &array;s.
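The declarations discussed above, taken from the data-parallel &field; listing, look roughly like this:

    #include "Pooma/Fields.h"    // has Pooma's Field

    // Domain and layout.
    Interval<1> N(0, n-1);
    Interval<2> vertDomain(N, N);
    DomainLayout<2> layout(vertDomain);

    // Mesh: lower, left corner at (0.0, 0.0); each cell is 1 by 1.
    UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));

    // One cell-centered value per cell.
    Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);

    // Template parameters: mesh type, value type, engine type.
    // Constructor arguments: centering, layout, mesh.
    Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh);
    Field<UniformRectilinearMesh<2>, double, Brick> b(cell, layout, mesh);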
Distributed &field; Implementation

A &pooma; program using &field;s can execute on one or more processors. In , we demonstrated how to modify a uniprocessor stencil &array; implementation to run on multiple processors. In this section, we demonstrate how the uniprocessor data-parallel &field; implementation of the previous section can be converted. Only the container declarations change; the computations do not. Since the changes are exactly analogous to those in , our exposition here will be shorter.

Distributed Data-Parallel &field; Implementation of &doof2d;
&doof2d-field-distributed;

Multiple copies of a distributed program may run simultaneously, perhaps each having its own input and output. Thus, we use command-line arguments to pass input to the program. Using an &inform; stream ensures only one copy produces output.

The UniformGridPartition declaration specifies how a field's domain will be partitioned, or split, into patches. Guard layers are an optimization that can reduce data communication between patches. The UniformGridLayout declaration applies the partition to the given domain, distributing the resulting patches among various processors.

The mesh and centering declarations are the same for uniprocessor and multiprocessor implementations.

The MultiPatch &engine; distributes requests for &field; values to the associated patch. Since a patch may be associated with a different processor, its remote engine has type Remote<Brick>. &pooma; automatically distributes the patches among the available memories and processors.

This program can be viewed as the combination of and the changes used to form the distributed stencil-based &array; program from the uniprocessor stencil-based &array; program.

Distributed programs may have multiple processes, each with its own input and output streams. To pass input to these processes, this program uses command-line arguments, which are replicated for each process. An &inform; stream accepts data from any context but prints only data from context 0.

A layout for a distributed program specifies a domain, a partition, and a context mapper. A &distributedtag; context mapper tag indicates that pieces of the domain should be distributed among the patches, while a &replicatedtag; context mapper tag indicates that the entire domain should be replicated to each patch.

A &multipatch; engine supports the use of multiple patches, while a remote engine supports computation distributed among various contexts. Both are usually necessary for distributed computation.

The computation for uniprocessor or distributed implementations remains the same. The &pooma; toolkit automatically handles all communication necessary to ensure up-to-date values are available when needed.

The command to invoke a distributed program is system-dependent. For example, the mpirun -np 4 Doof2d-Field-distributed -mpi 2 10 1000 command might use &mpi; communication. Doof2d-Field-distributed -shmem -np 4 2 10 1000 might use the &mm; Shared Memory Library.
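For comparison with the uniprocessor &field; declarations, the distributed versions, taken from the listing, are sketched below; only the layout and the engine change.

    // Partition and distributed layout, as for the distributed Array version.
    UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
                                      GuardLayers<2>(1),   // internal
                                      GuardLayers<2>(0));  // external
    UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());

    // The mesh and centering declarations are unchanged.
    UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
    Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);

    // A MultiPatch engine with Remote<Brick> patches replaces Brick.
    Field<UniformRectilinearMesh<2>, double,
          MultiPatch<UniformTag, Remote<Brick> > > a(cell, layout, mesh);
    Field<UniformRectilinearMesh<2>, double,
          MultiPatch<UniformTag, Remote<Brick> > > b(cell, layout, mesh);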
Index: figures/concepts.mp =================================================================== RCS file: concepts.mp diff -N concepts.mp *** /dev/null Fri Mar 23 21:37:44 2001 --- concepts.mp Tue Dec 11 13:31:10 2001 *************** *** 0 **** --- 1,207 ---- + %% Oldham, Jeffrey D. + %% 2001Dec04 + %% Pooma + + %% Illustrations for Pooma Concepts + + %% Assumes TEX=latex. + + input boxes; + + verbatimtex + \documentclass[10pt]{article} + \input{macros.ltx} + \begin{document} + etex + + + %% Container Declaration Concepts and Dependences + beginfig(111) + numeric unit; unit = 0.9cm; + numeric horizSpace; horizSpace = unit; + numeric vertSpace; vertSpace = unit; + + % Draw a line between two boxes. + vardef drawLine(expr start, stop) = + draw b[start].c -- b[stop].c cutbefore bpath b[start] cutafter bpath b[stop]; + enddef; + + % Create boxes for the concepts. + boxit.b0(btex \type{TinyMatrix} etex); + boxit.b1(btex \type{Vector} etex); + boxit.b2(btex \type{DynamicArray} etex); + boxit.b3(btex \type{Array} etex); + boxit.b4(btex \type{Field} etex); + boxit.b5(btex engine etex); + boxit.b6(btex mesh etex); + boxit.b7(btex centering etex); + boxit.b8(btex layout etex); + boxit.b9(btex corner position etex); + boxit.b10(btex cell size etex); + boxit.b11(btex domain etex); + boxit.b12(btex partition etex); + boxit.b13(btex context mapper tag etex); + boxit.b14(btex guard layer etex); + boxit.b15(btex empty etex); + boxit.b16(btex distributed only etex); % dashed box surrounding distributed computation only elements + boxit.b17(btex \type{Tensor} etex); + boxit.b18(btex view box etex); + for t = 0 upto 15: + fixsize(b[t]); + endfor + fixsize(b17); + + % Position the boxes. + b0.c = origin; + % horizontal positioning + for u = (0,3), (8,9), (11,12): + for t = xpart(u) upto ypart(u): + b[t+1].w - b[t].e = (horizSpace,0); + endfor + endfor + for u = (17,0), (5,15), (15,6), (6,7): + b[ypart(u)].w - b[xpart(u)].e = (horizSpace,0); + endfor + % vertical positioning + for u = (2,5), (6,9), (8,12), (12,14): + b[xpart(u)].s - b[ypart(u)].n = (0,vertSpace); + endfor + % distributed (dashed) box + b16.sw = b14.sw; + b16.ne = b13.ne; + % container view (dashed) box + b18.nw = b2.nw; + b18.se = b4.se; + + % Draw the boxes. + for t = 0 upto 18: + if unknown(b[t].c): + show t; + show b[t].c; + fi + endfor + for t = 0 upto 14: + drawunboxed(b[t]); + endfor + drawunboxed(b17); + + % Draw the dependences. + for t = 0 upto 4: % lines to engine + drawLine(t,5); + endfor + drawLine(17,5); + for t = 2 upto 3: % lines to layout + drawLine(t,8); + endfor + for t = 6 upto 7: % lines from field + drawLine(4,t); + endfor + for t = 8 upto 10: % lines from mesh + drawLine(6,t); + endfor + for t = 11 upto 13: % lines from layout + drawLine(8,t); + endfor + drawLine(12,14); % partition - guard layer + + % Draw the dashed box around the distributed dependences. + draw bpath b16 dashed evenly; + label.llft(btex \textsl{multiprocessor computation only} etex, b16.se); + + % Draw the dashed box around the container view box. + draw bpath b18 dashed evenly; + label.ulft(btex \textsl{support views} etex, b18.ne); + endfig; + + + %% Comparisons Between Mathematical Concept And Computational Implementation of Arrays and Fields + beginfig(101) + numeric unit; unit = 0.9cm; + numeric vertSpace; vertSpace = 2.6unit; % vertical space between sections + numeric horizSpace; horizSpace = 8unit; % horizontal space between sections + + % Create and layout boxes for computational Array and Field implementations. 
+ numeric interBoxSpace; interBoxSpace = unit; % space between boxes + numeric arrowAngle; arrowAngle = 20; % angle for arrow leaving index + path upperArrow[]; + for t = 0 upto 1: + boxit.ia[t](btex index etex); + boxit.la[t](btex layout etex); + boxit.pa[t](btex \begin{tabular}{c} processors \\ memory \end{tabular} etex); + boxit.ea[t](btex engine etex); + boxit.va[t](btex value etex); + + va[t].w - ea[t].e = (interBoxSpace,0); + ea[t].c - ia[t].c = 4(va[t].c - ea[t].c); + fixsize(ia[t],ea[t],va[t]); + endfor + + % Create and layout text boxes. + boxit.l1(btex \strut mathematical concept etex); + boxit.l2(btex \strut computational implementation etex); + boxit.l3(btex \strut \type{Array}: etex); + boxit.l4(btex \strut $\mbox{index} \mapsto \mbox{value}$ etex); + boxit.l6(btex \strut \type{Field}: etex); + boxit.l7(btex \strut $\mbox{index} \mapsto \mbox{value}$ etex); + boxit.l9(btex \strut \type{Field}: etex); + boxit.l10(btex \strut $\mbox{indices} \mapsto \mbox{geometric value}$ etex); + fixsize(l1,l2,l3,l4,l6,l7,l9,l10); + + ypart(l1.c - l2.c) = 0; + xpart(l2.c - 0.5[ia[0].w,va[0].e]) = 0; + l1.w - l3.w = l4.w - l7.w = (0,vertSpace); + l4.w - l3.e = l7.nw - l6.ne = (0,0); + for t = 0 upto 1: + xpart(ia[t].w - l[3+3t].e) = 0.65horizSpace; + ypart(ia[t].w - l[3+3t].c) = 0; + endfor + xpart(l10.w - l7.w) = 0; + ypart(l10.w - ia[2].w) = 0; + ypart(l9.w - l10.w) = 0; + xpart(l9.w - l6.w) = 0; + + % Create and layout the mesh boxes. + boxit.ia[2](btex indices etex); + boxit.ea[2](btex mesh etex); + boxit.va[2](btex geometric value etex); + fixsize(ia[2],ea[2],va[2]); + ia[1].w - ia[2].w = 0.6(ia[0].w - ia[1].w); + ypart(va[2].w - ea[2].e) = ypart(ea[2].w - ia[2].e) = 0; + xpart(va[2].e - va[1].e) = 0; + xpart(ea[2].c - 0.5[ia[2].c,va[2].c]) = 0; + + % Finish boxes on arrow for computational Array and Field implementations. + for t = 0 upto 1: + fixpos(ia[t],ea[t],va[t]); + upperArrow[t] = ia[t].c{dir arrowAngle} .. ea[t].c cutbefore bpath ia[t] cutafter bpath ea[t]; + la[t].c = point 1/3 of upperArrow[t]; + pa[t].c = point 2/3 of upperArrow[t]; + endfor + + %% Draw the boxes. + % Draw the computational Array and Field implementations. + for t = 0 upto 1: + % fixsize(ia,la,pa,ea,va); fixpos(ia,la,pa,ea,va); + draw ia[t].c{dir -arrowAngle} .. ea[t].c cutbefore bpath ia[t] cutafter bpath ea[t]; + drawboxed(la[t],ea[t]); + drawunboxed(ia[t],pa[t],va[t]); + draw (subpath (0,1/3) of upperArrow[t]) cutafter bpath la[t]; + drawarrow (subpath (1/3,2/3) of upperArrow[t]) cutbefore bpath la[t] cutafter bpath pa[t]; + draw (subpath (2/3,1) of upperArrow[t]) cutbefore bpath pa[t] cutafter bpath ea[t]; + drawarrow ea[t].e -- va[t].w; + endfor + + % Draw the mesh boxes. + drawunboxed(ia[2],va[2]); + drawboxed(ea[2]); + draw ia[2].e -- ea[2].w cutbefore bpath ia[2] cutafter bpath ea[2]; + drawarrow ea[2].e -- va[2].w cutbefore bpath ea[2] cutafter bpath va[2]; + + % Draw the text boxes. + for t = 1,2,3,4,6,7,9,10: + drawunboxed(l[t]); + endfor; + + endfig; + + bye Index: figures/distributed.mp =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/figures/distributed.mp,v retrieving revision 1.1 diff -c -p -r1.1 distributed.mp *** figures/distributed.mp 2001/12/04 00:07:00 1.1 --- figures/distributed.mp 2001/12/11 20:31:10 *************** input boxes; *** 10,21 **** verbatimtex \documentclass[10pt]{article} \begin{document} etex %% Parts of Distributed Computation beginfig(101) ! 
numeric unit; unit = 0.9cm; %% Create the Container Storage Partition subfigure. numeric arrayWidth; arrayWidth = 2; % as multiple of unit --- 10,22 ---- verbatimtex \documentclass[10pt]{article} + \input{macros.ltx} \begin{document} etex %% Parts of Distributed Computation beginfig(101) ! numeric unit; unit = 0.8cm; %% Create the Container Storage Partition subfigure. numeric arrayWidth; arrayWidth = 2; % as multiple of unit *************** beginfig(101) *** 189,195 **** %% Draw the subfigure relations structures. drawunboxed(figurePlus,figureArrow); ! label.rt(btex DistributedTag etex, figureArrow.e); endfig; bye --- 190,196 ---- %% Draw the subfigure relations structures. drawunboxed(figurePlus,figureArrow); ! label.rt(btex \type{DistributedTag} etex, figureArrow.e); endfig; bye Index: figures/doof2d.mp =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/figures/doof2d.mp,v retrieving revision 1.1 diff -c -p -r1.1 doof2d.mp *** figures/doof2d.mp 2001/12/04 00:07:00 1.1 --- figures/doof2d.mp 2001/12/11 20:31:10 *************** *** 8,13 **** --- 8,14 ---- verbatimtex \documentclass[10pt]{article} + \input{macros.ltx} \begin{document} etex *************** beginfig(201) *** 64,70 **** endfor % Label the grid. ! labelGrid(btex Array \texttt{b}: Initial Configuration etex, nuCells, origin); endfig; --- 65,71 ---- endfor % Label the grid. ! labelGrid(btex Array \varname{b}: Initial Configuration etex, nuCells, origin); endfig; *************** beginfig(202) *** 93,99 **** endfor % Label the grid. ! labelGrid(btex Array \texttt{a}: After the first averaging etex, nuCells, origin); endfig; --- 94,100 ---- endfor % Label the grid. ! labelGrid(btex Array \varname{a}: After the first averaging etex, nuCells, origin); endfig; *************** beginfig(203) *** 133,139 **** endfor % Label the grid. ! labelGrid(btex Array \texttt{b}: After the second averaging etex, nuCells, origin); endfig; --- 134,140 ---- endfor % Label the grid. ! labelGrid(btex Array \varname{b}: After the second averaging etex, nuCells, origin); endfig; Index: figures/macros.ltx =================================================================== RCS file: macros.ltx diff -N macros.ltx *** /dev/null Fri Mar 23 21:37:44 2001 --- macros.ltx Tue Dec 11 13:31:10 2001 *************** *** 0 **** --- 1,14 ---- + %% Oldham, Jeffrey D. + %% 2001Dec05 + %% Pooma + + %% Macros for Figures + + %% Consistency between these macros and the DocBook/Jade output is desired. + + \newcommand{\type}[1]{\texttt{#1}}% + % Produce a C++ (or other programming language) type. + % Requires: 1. the type's name. + \newcommand{\varname}[1]{\texttt{#1}}% + % Produce a C++ (or other programming language) variable. + % Requires: 1. the variable's name. Index: programs/Doof2d-Array-distributed-annotated.patch =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/programs/Doof2d-Array-distributed-annotated.patch,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-distributed-annotated.patch *** programs/Doof2d-Array-distributed-annotated.patch 2001/12/04 00:07:00 1.1 --- programs/Doof2d-Array-distributed-annotated.patch 2001/12/11 20:31:10 *************** *** 1,8 **** ! *** Doof2d-Array-distributed.cpp Wed Nov 28 07:46:56 2001 ! --- Doof2d-Array-distributed-annotated.cpp Wed Nov 28 07:53:31 2001 *************** ! *** 1,4 **** ! ! #include // has std::cout, ... ! 
#include // has EXIT_SUCCESS #include "Pooma/Arrays.h" // has Pooma's Array --- 1,7 ---- ! *** Doof2d-Array-distributed.cpp Wed Dec 5 14:04:36 2001 ! --- Doof2d-Array-distributed-annotated.cpp Wed Dec 5 14:07:56 2001 *************** ! *** 1,3 **** ! #include // has EXIT_SUCCESS #include "Pooma/Arrays.h" // has Pooma's Array *************** *** 13,19 **** #include "Pooma/Arrays.h" // has Pooma's Array *************** ! *** 15,19 **** // (i,j). The "C" template parameter permits use of this stencil // operator with both Arrays and Fields. ! template --- 12,18 ---- #include "Pooma/Arrays.h" // has Pooma's Array *************** ! *** 14,18 **** // (i,j). The "C" template parameter permits use of this stencil // operator with both Arrays and Fields. ! template *************** *** 26,65 **** inline typename C::Element_t *************** ! *** 40,52 **** ! Pooma::initialize(argc,argv); ! ! ! // Ask the user for the number of processors. long nuProcessors; ! ! std::cout << "Please enter the number of processors: "; ! ! std::cin >> nuProcessors; ! // Ask the user for the number of averagings. long nuAveragings, nuIterations; ! ! std::cout << "Please enter the number of averagings: "; ! ! std::cin >> nuAveragings; nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. ! --- 41,53 ---- ! Pooma::initialize(argc,argv); ! ! ! // Ask the user for the number of processors. long nuProcessors; ! ! std::cout << "Please enter the number of processors: "; ! ! std::cin >> nuProcessors; ! // Ask the user for the number of averagings. long nuAveragings, nuIterations; ! ! std::cout << "Please enter the number of averagings: "; ! ! std::cin >> nuAveragings; nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. *************** ! *** 54,67 **** // the grid. long n; ! ! std::cout << "Please enter the array size: "; ! ! std::cin >> n; // Specify the arrays' domains [0,n) x [0,n). ! Interval<1> N(0, n-1); --- 25,91 ---- inline typename C::Element_t *************** ! *** 42,46 **** ! // canot use standard input and output. Instead we use command-line ! // arguments, which are replicated, for input, and we use an Inform ! ! // stream for output. ! Inform output; ! ! --- 44,48 ---- ! // canot use standard input and output. Instead we use command-line ! // arguments, which are replicated, for input, and we use an Inform ! ! // stream for output. ! Inform output; ! ! *************** ! *** 48,52 **** ! if (argc != 4) { ! // Incorrect number of command-line arguments. ! ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl; ! return EXIT_FAILURE; ! } ! --- 50,54 ---- ! if (argc != 4) { ! // Incorrect number of command-line arguments. ! ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl; ! return EXIT_FAILURE; ! } ! *************** ! *** 55,63 **** ! // Determine the number of processors. long nuProcessors; ! ! nuProcessors = strtol(argv[1], &tail, 0); ! // Determine the number of averagings. long nuAveragings, nuIterations; ! ! nuAveragings = strtol(argv[2], &tail, 0); nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. ! --- 57,65 ---- ! // Determine the number of processors. long nuProcessors; ! ! nuProcessors = strtol(argv[1], &tail, 0); ! // Determine the number of averagings. long nuAveragings, nuIterations; ! ! nuAveragings = strtol(argv[2], &tail, 0); nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. *************** ! 
*** 65,69 **** ! // the grid. ! long n; ! ! n = strtol(argv[3], &tail, 0); ! // The dimension must be a multiple of the number of processors ! // since we are using a UniformGridLayout. ! --- 67,71 ---- // the grid. long n; ! ! n = strtol(argv[3], &tail, 0); ! // The dimension must be a multiple of the number of processors ! // since we are using a UniformGridLayout. ! *************** ! *** 71,80 **** // Specify the arrays' domains [0,n) x [0,n). ! Interval<1> N(0, n-1); *************** *** 70,80 **** ! Interval<2> interiorDomain(I,I); // Create the distributed arrays. ! --- 55,68 ---- ! // the grid. ! long n; ! ! std::cout << "Please enter the array size: "; ! ! std::cin >> n; // Specify the arrays' domains [0,n) x [0,n). ! Interval<1> N(0, n-1); --- 96,102 ---- ! Interval<2> interiorDomain(I,I); // Create the distributed arrays. ! --- 73,82 ---- // Specify the arrays' domains [0,n) x [0,n). ! Interval<1> N(0, n-1); *************** *** 86,92 **** // Create the distributed arrays. *************** ! *** 70,85 **** // dimension. Guard layers optimize communication between patches. // Internal guards surround each patch. External guards surround ! // the entire array domain. --- 108,114 ---- // Create the distributed arrays. *************** ! *** 83,98 **** // dimension. Guard layers optimize communication between patches. // Internal guards surround each patch. External guards surround ! // the entire array domain. *************** *** 103,109 **** ! Array<2, double, MultiPatch > > b(layout); // Set up the initial conditions. ! --- 71,86 ---- // dimension. Guard layers optimize communication between patches. // Internal guards surround each patch. External guards surround ! // the entire array domain. --- 125,131 ---- ! Array<2, double, MultiPatch > > b(layout); // Set up the initial conditions. ! --- 85,100 ---- // dimension. Guard layers optimize communication between patches. // Internal guards surround each patch. External guards surround ! // the entire array domain. *************** *** 121,127 **** // Set up the initial conditions. *************** ! *** 89,97 **** // Create the stencil performing the computation. ! Stencil stencil; --- 143,149 ---- // Set up the initial conditions. *************** ! *** 104,112 **** // Create the stencil performing the computation. ! Stencil stencil; *************** *** 131,137 **** ! // Read from b. Write to a. a(interiorDomain) = stencil(b, interiorDomain); ! --- 90,98 ---- // Create the stencil performing the computation. ! Stencil<DoofNinePt> stencil; --- 153,159 ---- ! // Read from b. Write to a. a(interiorDomain) = stencil(b, interiorDomain); ! --- 106,114 ---- // Create the stencil performing the computation. ! Stencil<DoofNinePt> stencil; *************** *** 142,162 **** a(interiorDomain) = stencil(b, interiorDomain); *************** ! *** 102,106 **** // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. ! --- 103,107 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. *************** ! *** 110,111 **** ! --- 111,113 ---- return EXIT_SUCCESS; } + --- 164,184 ---- a(interiorDomain) = stencil(b, interiorDomain); *************** ! *** 117,121 **** // Print out the final central value. 
Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. ! --- 119,123 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. *************** ! *** 125,126 **** ! --- 127,129 ---- return EXIT_SUCCESS; } + Index: programs/Doof2d-Array-element-annotated.patch =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/programs/Doof2d-Array-element-annotated.patch,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-element-annotated.patch *** programs/Doof2d-Array-element-annotated.patch 2001/12/04 00:07:00 1.1 --- programs/Doof2d-Array-element-annotated.patch 2001/12/11 20:31:10 *************** *** 1,5 **** ! *** Doof2d-Array-element.cpp Tue Nov 27 11:04:04 2001 ! --- Doof2d-Array-element-annotated.cpp Tue Nov 27 12:06:32 2001 *************** *** 1,5 **** ! #include // has std::cout, ... --- 1,5 ---- ! *** Doof2d-Array-element.cpp Tue Dec 4 12:02:10 2001 ! --- Doof2d-Array-element-annotated.cpp Tue Dec 4 12:24:25 2001 *************** *** 1,5 **** ! #include // has std::cout, ... *************** *** 58,65 **** // Set up the initial conditions. ! // All grid values should be zero except for the central value. ! a = b = 0.0; ! b(n/2,n/2) = 1000.0; --- 20,38 ---- // the grid. long n; --- 58,65 ---- // Set up the initial conditions. ! // All grid values should be zero except for the central value. ! for (int j = 1; j < n-1; j++) ! for (int i = 1; i < n-1; i++) --- 20,38 ---- // the grid. long n; *************** *** 78,87 **** // Set up the initial conditions. ! // All grid values should be zero except for the central value. ! a = b = 0.0; ! b(n/2,n/2) = 1000.0; *************** ! *** 41,49 **** // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { --- 78,87 ---- // Set up the initial conditions. ! // All grid values should be zero except for the central value. ! for (int j = 1; j < n-1; j++) ! for (int i = 1; i < n-1; i++) *************** ! *** 43,51 **** // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { *************** *** 91,97 **** ! a(i,j) = weight * (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + ! --- 42,50 ---- // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { --- 91,97 ---- ! a(i,j) = weight * (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + ! --- 44,52 ---- // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { *************** *** 102,115 **** (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + *************** ! *** 51,56 **** // Read from a. Write to b. ! for (int j = 1; j < n-1; j++) ! for (int i = 1; i < n-1; i++) b(i,j) = weight * (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + ! --- 52,57 ---- // Read from a. Write to b. ! for (int j = 1; j < n-1; j++) --- 102,115 ---- (b(i+1,j+1) + b(i+1,j ) + b(i+1,j-1) + b(i ,j+1) + b(i ,j ) + b(i ,j-1) + *************** ! *** 53,58 **** // Read from a. Write to b. ! for (int j = 1; j < n-1; j++) ! for (int i = 1; i < n-1; i++) b(i,j) = weight * (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + ! --- 54,59 ---- // Read from a. Write to b. ! 
for (int j = 1; j < n-1; j++) *************** *** 117,126 **** b(i,j) = weight * (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + *************** ! *** 60,70 **** // Print out the final central value. - ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; ! // The arrays are automatically deallocated. --- 117,125 ---- b(i,j) = weight * (a(i+1,j+1) + a(i+1,j ) + a(i+1,j-1) + *************** ! *** 62,71 **** // Print out the final central value. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; ! // The arrays are automatically deallocated. *************** *** 129,138 **** Pooma::finalize(); return EXIT_SUCCESS; } ! --- 61,72 ---- // Print out the final central value. ! ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; ! // The arrays are automatically deallocated. --- 128,137 ---- Pooma::finalize(); return EXIT_SUCCESS; } ! --- 63,74 ---- // Print out the final central value. ! ! Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; ! // The arrays are automatically deallocated. Index: programs/Doof2d-Array-parallel-annotated.patch =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/programs/Doof2d-Array-parallel-annotated.patch,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-parallel-annotated.patch *** programs/Doof2d-Array-parallel-annotated.patch 2001/12/04 00:07:00 1.1 --- programs/Doof2d-Array-parallel-annotated.patch 2001/12/11 20:31:10 *************** *** 1,5 **** ! *** Doof2d-Array-parallel.cpp Tue Nov 27 13:00:09 2001 ! --- Doof2d-Array-parallel-annotated.cpp Tue Nov 27 14:07:07 2001 *************** *** 1,4 **** ! #include // has std::cout, ... --- 1,5 ---- ! *** Doof2d-Array-parallel.cpp Tue Dec 4 11:49:43 2001 ! --- Doof2d-Array-parallel-annotated.cpp Tue Dec 4 12:24:36 2001 *************** *** 1,4 **** ! #include // has std::cout, ... *************** *** 28,34 **** nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. *************** ! *** 19,38 **** // the grid. long n; ! std::cout << "Please enter the array size: "; --- 28,34 ---- nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. *************** ! *** 19,43 **** // the grid. long n; ! std::cout << "Please enter the array size: "; *************** *** 49,55 **** ! Array<2, double, Brick> b(vertDomain); // Set up the initial conditions. ! --- 20,39 ---- // the grid. long n; ! std::cout << "Please enter the array size: "; --- 49,60 ---- ! Array<2, double, Brick> b(vertDomain); // Set up the initial conditions. ! // All grid values should be zero except for the central value. ! a = b = 0.0; ! ! // Ensure all data-parallel computation finishes before accessing a value. ! Pooma::blockAndEvaluate(); ! b(n/2,n/2) = 1000.0; ! --- 20,44 ---- // the grid. long n; ! std::cout << "Please enter the array size: "; *************** *** 70,84 **** ! Array<2, double, Brick> b(vertDomain); // Set up the initial conditions. *************** ! *** 45,50 **** // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { ! // Read from b. Write to a. a(I,J) = weight * (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + ! --- 46,51 ---- // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { --- 75,94 ---- ! 
Array<2, double, Brick> b(vertDomain); // Set up the initial conditions. + // All grid values should be zero except for the central value. + a = b = 0.0; + ! // Ensure all data-parallel computation finishes before accessing a value. + Pooma::blockAndEvaluate(); + b(n/2,n/2) = 1000.0; *************** ! *** 47,52 **** // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { ! // Read from b. Write to a. a(I,J) = weight * (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + ! --- 48,53 ---- // Perform the simulation. ! for (int k = 0; k < nuIterations; ++k) { *************** *** 86,106 **** a(I,J) = weight * (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + *************** ! *** 61,65 **** // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. ! --- 62,66 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. *************** ! *** 69,70 **** ! --- 70,72 ---- return EXIT_SUCCESS; } + --- 96,116 ---- a(I,J) = weight * (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + *************** ! *** 63,67 **** // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. ! --- 64,68 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. *************** ! *** 71,72 **** ! --- 72,74 ---- return EXIT_SUCCESS; } + Index: programs/Doof2d-Array-stencil-annotated.patch =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/programs/Doof2d-Array-stencil-annotated.patch,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-Array-stencil-annotated.patch *** programs/Doof2d-Array-stencil-annotated.patch 2001/12/04 00:07:00 1.1 --- programs/Doof2d-Array-stencil-annotated.patch 2001/12/11 20:31:10 *************** *** 1,5 **** ! *** Doof2d-Array-stencil.cpp Tue Nov 27 17:23:41 2001 ! --- Doof2d-Array-stencil-annotated.cpp Tue Nov 27 17:36:13 2001 *************** *** 1,9 **** ! #include // has std::cout, ... --- 1,5 ---- ! *** Doof2d-Array-stencil.cpp Tue Dec 4 11:49:39 2001 ! --- Doof2d-Array-stencil-annotated.cpp Tue Dec 4 12:26:46 2001 *************** *** 1,9 **** ! #include // has std::cout, ... *************** *** 109,115 **** // Set up the initial conditions. *************** ! *** 71,80 **** b(n/2,n/2) = 1000.0; ! // Create the stencil performing the computation. --- 109,115 ---- // Set up the initial conditions. *************** ! *** 73,82 **** b(n/2,n/2) = 1000.0; ! // Create the stencil performing the computation. *************** *** 120,126 **** ! // Read from b. Write to a. a(interiorDomain) = stencil(b, interiorDomain); ! --- 72,81 ---- b(n/2,n/2) = 1000.0; ! // Create the stencil performing the computation. --- 120,126 ---- ! // Read from b. Write to a. a(interiorDomain) = stencil(b, interiorDomain); ! --- 74,83 ---- b(n/2,n/2) = 1000.0; ! // Create the stencil performing the computation. *************** *** 132,152 **** a(interiorDomain) = stencil(b, interiorDomain); *************** ! 
*** 85,89 **** // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. ! --- 86,90 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. *************** ! *** 93,94 **** ! --- 94,96 ---- return EXIT_SUCCESS; } + --- 132,152 ---- a(interiorDomain) = stencil(b, interiorDomain); *************** ! *** 87,91 **** // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. ! --- 88,92 ---- // Print out the final central value. Pooma::blockAndEvaluate(); // Ensure all computation has finished. ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // The arrays are automatically deallocated. *************** ! *** 95,96 **** ! --- 96,98 ---- return EXIT_SUCCESS; } + Index: programs/Doof2d-C-element-annotated.patch =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/programs/Doof2d-C-element-annotated.patch,v retrieving revision 1.1 diff -c -p -r1.1 Doof2d-C-element-annotated.patch *** programs/Doof2d-C-element-annotated.patch 2001/12/04 00:07:00 1.1 --- programs/Doof2d-C-element-annotated.patch 2001/12/11 20:31:10 *************** *** 75,81 **** a[i][j] = b[i][j] = 0.0; b[n/2][n/2] = 1000.0; ! ! // In the average, weight element with this value. const double weight = 1.0/9.0; // Perform the simulation. --- 75,81 ---- a[i][j] = b[i][j] = 0.0; b[n/2][n/2] = 1000.0; ! ! // In the average, weight elements with this value. const double weight = 1.0/9.0; // Perform the simulation. *************** *** 94,100 **** a[i][j] = b[i][j] = 0.0; b[n/2][n/2] = 1000.0; ! ! // In the average, weight element with this value. const double weight = 1.0/9.0; // Perform the simulation. --- 94,100 ---- a[i][j] = b[i][j] = 0.0; b[n/2][n/2] = 1000.0; ! ! // In the average, weight elements with this value. const double weight = 1.0/9.0; // Perform the simulation. Index: programs/Doof2d-Field-distributed-annotated.patch =================================================================== RCS file: Doof2d-Field-distributed-annotated.patch diff -N Doof2d-Field-distributed-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Field-distributed-annotated.patch Tue Dec 11 13:31:10 2001 *************** *** 0 **** --- 1,176 ---- + *** Doof2d-Field-distributed.cpp Wed Dec 5 14:05:10 2001 + --- Doof2d-Field-distributed-annotated.cpp Wed Dec 5 14:41:24 2001 + *************** + *** 1,3 **** + ! #include // has EXIT_SUCCESS + #include "Pooma/Fields.h" // has Pooma's Field + + --- 1,4 ---- + ! + ! #include <stdlib.h> // has EXIT_SUCCESS + #include "Pooma/Fields.h" // has Pooma's Field + + *************** + *** 12,16 **** + // canot use standard input and output. Instead we use command-line + // arguments, which are replicated, for input, and we use an Inform + ! // stream for output. + Inform output; + + --- 13,17 ---- + // canot use standard input and output. Instead we use command-line + // arguments, which are replicated, for input, and we use an Inform + ! // stream for output. 
+ Inform output; + + *************** + *** 18,22 **** + if (argc != 4) { + // Incorrect number of command-line arguments. + ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl; + return EXIT_FAILURE; + } + --- 19,23 ---- + if (argc != 4) { + // Incorrect number of command-line arguments. + ! output << argv[0] << ": number-of-processors number-of-averagings number-of-values" << std::endl; + return EXIT_FAILURE; + } + *************** + *** 25,33 **** + // Determine the number of processors. + long nuProcessors; + ! nuProcessors = strtol(argv[1], &tail, 0); + + // Determine the number of averagings. + long nuAveragings, nuIterations; + ! nuAveragings = strtol(argv[2], &tail, 0); + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + --- 26,34 ---- + // Determine the number of processors. + long nuProcessors; + ! nuProcessors = strtol(argv[1], &tail, 0); + + // Determine the number of averagings. + long nuAveragings, nuIterations; + ! nuAveragings = strtol(argv[2], &tail, 0); + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + *************** + *** 35,39 **** + // the grid. + long n; + ! n = strtol(argv[3], &tail, 0); + // The dimension must be a multiple of the number of processors + // since we are using a UniformGridLayout. + --- 36,40 ---- + // the grid. + long n; + ! n = strtol(argv[3], &tail, 0); + // The dimension must be a multiple of the number of processors + // since we are using a UniformGridLayout. + *************** + *** 41,50 **** + + // Specify the fields' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<1> J(1,n-2); + + // Partition the fields' domains uniformly, i.e., each patch has the + --- 42,51 ---- + + // Specify the fields' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<1> J(1,n-2); + + // Partition the fields' domains uniformly, i.e., each patch has the + *************** + *** 52,74 **** + // dimension. Guard layers optimize communication between patches. + // Internal guards surround each patch. External guards surround + ! // the entire field domain. + ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors), + ! GuardLayers<2>(1), // internal + ! GuardLayers<2>(0)); // external + ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag()); + + // Specify the fields' mesh, i.e., its spatial extent, and its + ! // centering type. + ! UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0)); + ! Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim); + + // The template parameters indicate a mesh and a 'double' + // element type. MultiPatch indicates multiple computation patches, + // i.e., distributed computation. The UniformTag indicates the + ! // patches should have the same size. Each patch has Brick type. + ! Field, double, MultiPatch > > a(cell, layout, mesh); + ! Field, double, MultiPatch > > b(cell, layout, mesh); + + // Set up the initial conditions. + --- 53,75 ---- + // dimension. Guard layers optimize communication between patches. + // Internal guards surround each patch. External guards surround + ! // the entire field domain. + ! UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors), + ! 
GuardLayers<2>(1), // internal + ! GuardLayers<2>(0)); // external + ! UniformGridLayout<2> layout(vertDomain, partition, DistributedTag()); + + // Specify the fields' mesh, i.e., its spatial extent, and its + ! // centering type. + ! UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0)); + ! Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim); + + // The template parameters indicate a mesh and a 'double' + // element type. MultiPatch indicates multiple computation patches, + // i.e., distributed computation. The UniformTag indicates the + ! // patches should have the same size. Each patch has Brick type. + ! Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag, + ! Remote<Brick> > > a(cell, layout, mesh); + ! Field<UniformRectilinearMesh<2>, double, MultiPatch<UniformTag, + ! Remote<Brick> > > b(cell, layout, mesh); + + // Set up the initial conditions. + *************** + *** 83,87 **** + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(I,J) = weight * + --- 84,88 ---- + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + // Read from b. Write to a. + a(I,J) = weight * + *************** + *** 99,103 **** + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The fields are automatically deallocated. + --- 100,104 ---- + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! output << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The fields are automatically deallocated. + *************** + *** 107,108 **** + --- 108,110 ---- + return EXIT_SUCCESS; + } + + Index: programs/Doof2d-Field-parallel-annotated.patch =================================================================== RCS file: Doof2d-Field-parallel-annotated.patch diff -N Doof2d-Field-parallel-annotated.patch *** /dev/null Fri Mar 23 21:37:44 2001 --- Doof2d-Field-parallel-annotated.patch Tue Dec 11 13:31:10 2001 *************** *** 0 **** --- 1,120 ---- + *** Doof2d-Field-parallel.cpp Tue Dec 4 10:01:28 2001 + --- Doof2d-Field-parallel-annotated.cpp Tue Dec 4 11:04:26 2001 + *************** + *** 1,5 **** + ! #include // has std::cout, ... + ! #include // has EXIT_SUCCESS + ! #include "Pooma/Fields.h" // has Pooma's Field + + // Doof2d: Pooma Fields, data-parallel implementation + --- 1,6 ---- + ! + ! #include <iostream> // has std::cout, ... + ! #include <stdlib.h> // has EXIT_SUCCESS + ! #include "Pooma/Fields.h" // has Pooma's Field + + // Doof2d: Pooma Fields, data-parallel implementation + *************** + *** 12,17 **** + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + --- 13,18 ---- + // Ask the user for the number of averagings. + long nuAveragings, nuIterations; + ! std::cout << "Please enter the number of averagings: "; + ! std::cin >> nuAveragings; + nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings. + + *************** + *** 19,44 **** + // the grid. + long n; + ! std::cout << "Please enter the field size: "; + ! std::cin >> n; + + // Specify the fields' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! 
Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<1> J(1,n-2); + + // Specify the fields' mesh, i.e., its spatial extent, and its + ! // centering type. + ! DomainLayout<2> layout(vertDomain); + ! UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0)); + ! Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim); + + // Create the fields. + // The template parameters indicate a mesh, a 'double' element + ! // type, and ordinary 'Brick' storage. + ! Field, double, Brick> a(cell, layout, mesh); + ! Field, double, Brick> b(cell, layout, mesh); + + // Set up the initial conditions. + --- 20,45 ---- + // the grid. + long n; + ! std::cout << "Please enter the field size: "; + ! std::cin >> n; + + // Specify the fields' domains [0,n) x [0,n). + ! Interval<1> N(0, n-1); + ! Interval<2> vertDomain(N, N); + + // Set up interior domains [1,n-1) x [1,n-1) for computation. + ! Interval<1> I(1,n-2); + ! Interval<1> J(1,n-2); + + // Specify the fields' mesh, i.e., its spatial extent, and its + ! // centering type. + ! DomainLayout<2> layout(vertDomain); + ! UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0)); + ! Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim); + + // Create the fields. + // The template parameters indicate a mesh, a 'double' element + ! // type, and ordinary 'Brick' storage. + ! Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh); + ! Field<UniformRectilinearMesh<2>, double, Brick> b(cell, layout, mesh); + + // Set up the initial conditions. + *************** + *** 51,56 **** + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + --- 52,57 ---- + + // Perform the simulation. + ! for (int k = 0; k < nuIterations; ++k) { + ! // Read from b. Write to a. + a(I,J) = weight * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + *************** + *** 67,71 **** + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The fields are automatically deallocated. + --- 68,72 ---- + // Print out the final central value. + Pooma::blockAndEvaluate(); // Ensure all computation has finished. + ! std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; + + // The fields are automatically deallocated. + *************** + *** 75,76 **** + --- 76,78 ---- + return EXIT_SUCCESS; + } + + From oldham at codesourcery.com Tue Dec 11 21:44:14 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Tue, 11 Dec 2001 13:44:14 -0800 Subject: Patch: Fix Typos in Comments Message-ID: <20011211134414.B29015@codesourcery.com> This patch fixes some typographical errors in Pooma source code comments. 2001-Dec-11 Jeffrey D. Oldham * Field/Mesh/UniformRectilinearMesh.h (UniformRectilinearMeshData): Remove extraneous conjunctive in comment. (UniformRectilinearMeshData::~UniformRectilinearMeshData): Fix typo in introdutory comment. * Partition/UniformGridPartition.h (UniformGridPartition::partition): Fix typos in comment. * Utilities/Pooma.cmpl.cpp: Fix typo in comment. Not tested since the changes are only to comments. Applied to mainline Approved by Mark Mitchell Thanks, Jeffrey D. 
Oldham oldham at codesourcery.com -------------- next part -------------- Index: Field/Mesh/UniformRectilinearMesh.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Field/Mesh/UniformRectilinearMesh.h,v retrieving revision 1.3 diff -c -p -r1.3 UniformRectilinearMesh.h *** Field/Mesh/UniformRectilinearMesh.h 2001/12/03 19:38:33 1.3 --- Field/Mesh/UniformRectilinearMesh.h 2001/12/11 19:27:44 *************** public: *** 63,69 **** // Constructors. // We provide a default constructor that creates the object with empty ! // domains and. To be useful, this object must be replaced by another // version via assignment. UniformRectilinearMeshData() --- 63,69 ---- // Constructors. // We provide a default constructor that creates the object with empty ! // domains. To be useful, this object must be replaced by another // version via assignment. UniformRectilinearMeshData() *************** public: *** 150,156 **** //--------------------------------------------------------------------------- // Empty destructor is fine. Note, however, that NoMeshData does not have ! // a virtual destructor. We must be carefult to delete these puppies as // UniformRectilinearMeshData. ~UniformRectilinearMeshData() { } --- 150,156 ---- //--------------------------------------------------------------------------- // Empty destructor is fine. Note, however, that NoMeshData does not have ! // a virtual destructor. We must be careful to delete these puppies as // UniformRectilinearMeshData. ~UniformRectilinearMeshData() { } Index: Partition/UniformGridPartition.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Partition/UniformGridPartition.h,v retrieving revision 1.27 diff -c -p -r1.27 UniformGridPartition.h *** Partition/UniformGridPartition.h 2001/12/03 19:38:34 1.27 --- Partition/UniformGridPartition.h 2001/12/11 19:27:44 *************** public: *** 375,382 **** { iguards = internalGuards_m; ! // Check if we're at an edge, and if so use the ! // external specfication for that edge. for (int d = 0; d < Dim; ++d) { --- 375,382 ---- { iguards = internalGuards_m; ! // Check if we're at an edge, and, if so, use the ! // external specification for that edge. for (int d = 0; d < Dim; ++d) { Index: Pooma/Pooma.cmpl.cpp =================================================================== RCS file: /home/pooma/Repository/r2/src/Pooma/Pooma.cmpl.cpp,v retrieving revision 1.37 diff -c -p -r1.37 Pooma.cmpl.cpp *** Pooma/Pooma.cmpl.cpp 2001/04/11 21:39:28 1.37 --- Pooma/Pooma.cmpl.cpp 2001/12/11 19:27:45 *************** void lockThreads(bool on) *** 855,861 **** } //----------------------------------------------------------------------------- ! // Return whether threads hsould be locked to processors. //----------------------------------------------------------------------------- bool blockingExpressions() --- 855,861 ---- } //----------------------------------------------------------------------------- ! // Return whether threads should be locked to processors. //----------------------------------------------------------------------------- bool blockingExpressions() From oldham at codesourcery.com Thu Dec 13 05:05:31 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Wed, 12 Dec 2001 21:05:31 -0800 Subject: Manual Patch: Finish Concepts Chapter Message-ID: <20011212210531.A13125@codesourcery.com> 2001-Dec-12 Jeffrey D. Oldham This patch mainly finishes the first draft of the Pooma concepts chapter. 
* concepts.xml: New file containing the chapter describing the Pooma concepts. Some of this material was moved out of manual.xml. The "Computation Modes" and "Computation Environment" material is new. * glossary.xml: Added entries corresponding to concepts added to concepts.xml. * manual.xml: Concepts chapter moved into concepts.xml. Unused material moved to "Writing Sequential Programs" chapter. * figures/concepts.mp: Changed "geometric value" to "spatial value". Applied to mainline. Thanks, Jeffrey D. Oldham oldham at codesourcery.com -------------- next part -------------- Index: concepts.xml =================================================================== RCS file: concepts.xml diff -N concepts.xml *** /dev/null Fri Mar 23 21:37:44 2001 --- concepts.xml Wed Dec 12 21:02:05 2001 *************** *** 0 **** --- 1,630 ---- + + Overview of &pooma; Concepts + + FIXME: How does multi-threaded computation fit into the + model? + + In the previous chapter, we presented several different + implementations of the &doof2d; simulation program. The + implementations illustrate the various containers, computation + modes, and computation environments that &pooma; supports. In this + chapter, we describe the concepts associated with each of these + three categories. Specific details needed by programmers are + deferred to later chapters. + + + &pooma; Implementation Concepts + + + + Container + Computation Modes + Computation Environment + + + + + &array; + element-wise + sequential + + + &dynamicarray; + data-parallel + distributed + + + &field; + stencil-based + + + + &tensor; + relational + + + + &matrix; + + + + + &vector; + + + + + +
+ + The most important &pooma; concepts can be grouped into three + separate categories: + + + container + + data structure holding one or more values and addressed + by indices + + + + computation modes + + styles of expressing computations + + + + computation environment + + description of resources for computing, e.g., single + processor or multi-processor + + + + See . Many &pooma; programs + select one possibility from each column. For example, used a &array; + container and stencils for sequential computation, while used a &field; + container and data-parallel statements with distributed + computation. A program may use multiple containers and various + computation modes, but the computation environment either has + distributed processors or not. + + In the rest of this chapter, we explore these three + categories. First, we describe &pooma; containers, illustrating + the purposes of each, and explaining the concepts needed to declare + them. Then, we describe the different computation modes and + finally distributed computation concepts. + + +
+ &pooma; Containers + + Most &pooma; programs use containers + to store groups of values. &pooma; containers are objects that + store other objects. They control allocation and deallocation of + and access to these objects. They are a generalization of &c; + arrays, but &pooma; containers are first-class objects so they can + be used directly in expressions. They are similar to &cc; + containers such as vector, list, and + stack. See for a summary of the + containers. + + This chapter describes many concepts, not all of which are + needed to begin programming with the &pooma; Toolkit. Below we + introduce the different categories of concepts. After that, we + introduce the different &pooma;'s containers and describe how to + choose the appropriate one for a particular task. + indicates which concepts must be understood when declaring a + particular container. All of these concepts are described in + and + . + Use this figure to decide which concepts in the former are + relevant. Reading the latter section is necessary only if + computing using multiple processors. The programs in the previous + chapter illustrate many of these concepts. + + + &pooma; Container Summary + + + + &array; + container mapping indices to values and that may be + used in expressions + + + &dynamicarray; + one-dimensional &array; whose domain can be dynamically + resized + + + &field; + container mapping indices to one or more values and + residing in multi-dimensional space + + + &tensor; + multi-dimensional mathematical tensor + + + &matrix; + two-dimensional mathematical matrix + + + &vector; + multi-dimensional mathematical vector + + + +
+ + A &pooma; &array;, generalizing a &c; + array, maps indices to values. Given an index or position in an + &array;'s domain, it returns the associated value, either by + returning a stored value or by computing it. The use of indices, + which are usually ordered tuples, permits constant-time access + although computing a particular value may require significant + time. In addition to the functionality provided by &c; arrays, + the &array; class automatically handles memory allocation and + deallocation, supports a wider variety of assignments, and can be + used in expressions. For example, the addition of two arrays can + be assigned to an array and the product of a scalar element and an + array is permissible. + + + + A &pooma; &dynamicarray; extends + &array; capabilities to support a dynamically-changing domain but + is restricted to only one dimension. When the &dynamicarray; is + resized, its values are preserved. + + + + A &pooma; &field; is an &array; with + spatial extent. Each domain consists of cells + in one-, two-, or three-dimensional space. Although indexed + similarly to &array;s, each cell may contain multiple values and + multiple materials. A &field;'s mesh stores its spatial + characteristics and can yield, e.g., a point contained in a + cell, the distance between two cells, and a cell's normals. A + &field; should be used whenever geometric or spatial computations + are needed, multiple values per index are desired, or a + computation involves more than one material. + + + + A &tensor; + implements a multi-dimensional mathematical tensor. Since it is a + first-class type, it can be used in expressions such as + adding two &tensor;s. + + + + A &matrix; + implements a two-dimensional mathematical matrix. Since it is a + first-class type, it can be used in expressions such as + multiplying matrices and assignments to matrices. + + + + A &vector; + implements a multi-dimensional mathematical vector, which is an + ordered tuple of components. Since it is a first-class type, it + can be used in expressions such as adding two &vector;s and + multiplying a &matrix; and a &vector;. + + The data of an &array;, &dynamicarray;, or &field; can be + viewed using more than one container by taking a view. A + view of + an existing container &container; is a container whose domain + is a subset of &container;. The subset can equal the + original domain. A view acts like a reference in that changing + any of the view's values also changes the original container's and + vice versa. While users sometimes explicitly create views, they + are perhaps more frequently created as temporaries in expressions. + For example, if A is an &array; and + I is a domain, A(I) - + A(I-1) forms the difference between adjacent + values. + + +
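To make the view idea concrete, here is a minimal sketch in the style of the Doof2d listings. The header name Pooma/Arrays.h, the Pooma::initialize/finalize calls, and the array size are assumptions for illustration; only the view expression itself comes from the text above.

    #include "Pooma/Arrays.h"            // assumed header providing Array and Interval

    int main(int argc, char *argv[])
    {
      Pooma::initialize(argc, argv);     // assumed Toolkit start-up, as in the Doof2d programs

      const int n = 8;
      Interval<1> D(0, n-1);             // domain [0,n)
      Array<1, double, Brick> A(D);      // a first-class container
      A = 2.0;                           // assign every value at once

      // A(I) and A(I-1) are views of A: no values are copied, and writing
      // through a view writes into A's own storage.
      Interval<1> I(1, n-1);
      Array<1, double, Brick> diff(I);
      diff(I) = A(I) - A(I-1);           // difference between adjacent values

      Pooma::blockAndEvaluate();         // wait before reading individual values
      Pooma::finalize();                 // assumed Toolkit shut-down
      return 0;
    }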
+ Choosing a Container + + The two most commonly used &pooma; containers are &array;s + and &field;s. contains a + decision tree describing how to choose an appropriate + container. + + + Choosing a &pooma; Container + + + + If modeling mathematical entities, + use a &vector;, &matrix;, or &tensor;. + + + If indices and values reside in multi-dimensional space + &space;, + use a &field;. + + + If there are multiple values per index, + use a &field;. + + + If there are multiple materials participating in the same computation, + use a &field;. + + + If the domain's size dynamically changes and is one-dimensional, + use a &dynamicarray;. + + + Otherwise + use an &array;. + + + +
+ +
+ + +
+ Declaring Sequential Containers + +
+ Figure: Concepts For Declaring Containers (concepts involved in declaring containers)
+ + In the previous sections, we introduced the &pooma; + containers and described how to choose one appropriate for a + given task. In this section, we describe the concepts involved + in declaring them. Concepts specific to distributed computation + are described in the next section. + + + illustrates the containers and the concepts involved in their + declarations. The containers are listed in the top row. Lines + connect these containers to the components necessary for their + declarations. For example, an &array; declaration requires an + engine and a layout. These, in turn, depend on other &pooma; + concepts. Declarations necessary only for distributed, or + multiprocessor, computation are surrounded by dashed lines. You + can use these dependences to indicate the concepts needed for a + particular container. + + An engine + stores and, if necessary, computes a container's values. A + container has one or more engines. The separation of a container + and its storage permits optimizing a program's space + requirements. For example, a container returning the same value + for all indices can use a constant engine, which need only store + one value for the entire domain. A &compressiblebrick; engine + reduces its space requirements to a constant whenever all its + values are the same. The separation also permits taking views of containers without + copying storage. + +
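For readers who want to see the engine parameter in code, the fragment below declares two arrays that differ only in their engine tag. It is a sketch only: the Pooma/Arrays.h header name and the domain size are assumptions, and CompressibleBrick is chosen simply because it is the compressible engine mentioned above.

    #include "Pooma/Arrays.h"            // assumed header providing Array, Interval, engine tags

    void engineSketch()
    {
      Interval<1> N(0, 99);
      Interval<2> domain(N, N);

      // Brick stores every element explicitly.
      Array<2, double, Brick> dense(domain);

      // CompressibleBrick collapses its storage to a single value while all
      // elements are equal, expanding only when they diverge.
      Array<2, double, CompressibleBrick> compressed(domain);

      dense = 0.0;
      compressed = 0.0;                  // still one stored value for the whole domain
    }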
+ Figure: &array; and &field; Mathematical and Computational Concepts (maps from indices to values)
+ + A layout + maps domain indices to the + processors and computer memory used by a container's engines. + See . + A computer computes a container's values using a processor and + memory. The layout specifies the processor(s) and memory to use + for each particular index. A container's layout for a + uniprocessor implementation consists of its domain, the + processor, and its memory. For a multi-processor implementation, + the layout maps portions of the domain to (possibly different) + processors and memory. + + A &field;'s mesh + maps domain indices to + spatial values in &space; such as distance between cells, edge + lengths, and normals to cells. In other words, it provides a + &field;'s spatial extent. See also . + Different mesh types may support different spatial + values. + + A mesh's corner + position specifies the point in &space; corresponding to + the lower left corner of its domain. Combining this, the + domain, and the cell size fully specifies the mesh's map from + indices to &space;. + + A mesh's cell + size specifies the spatial dimensions of + a &field; cell, e.g., its + width, height, and depth, in &space;. Combining this, the + domain, and the corner position fully specifies the mesh's map + from indices to &space;. + + A domain + is a set of points on which a container can define values. An + interval + consists of all integral points between two values. It is + frequently represented using mathematical interval notation [a,b] + even though it contains only the integral points, e.g., a, a+1, + a+2, …, b. The concept is generalized to multiple + dimensions by forming the tensor product of intervals, i.e., all the + integral tuples in an &n;-dimensional space. For example, the + two-dimensional containers in the previous chapter are defined on + a two-dimensional domain with both dimensions spanning the + interval [0,n). A stride + is a subset of an interval consisting of regularly-spaced + points. A range + is a subset of an interval formed by the tensor product of strides. + A region + represents a continuous &n;-dimensional domain.
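The sequential Doof2d-Field listing earlier in the manual already exercises these concepts; the sketch below simply gathers its declarations in one place. It is an illustration under the assumption that the domain size n is supplied by the caller.

    #include "Pooma/Fields.h"

    void declareSequentialField(long n)
    {
      // Domain: the tensor product of two intervals, i.e., [0,n) x [0,n).
      Interval<1> N(0, n-1);
      Interval<2> vertDomain(N, N);

      // Layout: maps the domain's indices to (uniprocessor) memory.
      DomainLayout<2> layout(vertDomain);

      // Mesh: the Field's spatial extent; corner at the origin, 1.0 x 1.0 cells.
      UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));

      // Centering: one value per cell.
      Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);

      // The container: a cell-centered Field of doubles with Brick storage.
      Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh);
    }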
+ + +
+ Declaring Distributed Containers + + In the previous section, we introduced the concepts + important when declaring containers for use on uniprocessor + computers. When using multi-processor computers, we augment + these concepts with those for distributed computation. Reading + this section is important only for running the same program on + multiple processors. Many of these concepts were introduced in + and . + illustrates the &pooma; distributed computation model. In this + section, we concentrate on the concepts necessary to declare a + distributed container. + + As we noted in , a &pooma; + programmer must specify how each container's domain should be + distributed among the available processors and memory spaces. + Using this information, the Toolkit automatically distributes the + data among the available processors and handles any required + communication among them. The three concepts necessary for + declaring distributed containers are a partition, a guard layer, + and a context mapper tag. + + A partition + specifies how to divide a container's domain into distributed + pieces. For example, the partition illustrated in + would divide a two-dimensional domain into three equally-sized + pieces along the x-dimension and two equally-sized pieces along + the y-dimension. Partitions can be independent of the size of + the container's domain. The example partition will work on any + domain as long as the size of its x-dimension is a multiple of + three and the size of its y-dimension is a multiple of two. A + domain is separated into disjoint patches. + + A guard + layer is an extra region of domain + surrounding each patch. This region has read-only values. An + external guard + layer specifies values surrounding the + domain. Its presence eases computation along the domain's edges + by permitting the same computations as for more internal + computations. An internal guard + layer duplicates values from adjacent + patches so communication with adjacent patches need not occur + during a patch's computation. The use of guard layers is an + optimization; using external guard layers eases programming and + using internal guard layers reduces communication among + processors. Their use is not required. + + A context + mapper indicates how a container's + patches are mapped to processors and shared memory. For example, + the &distributedtag; indicates that the patches should be + distributed among the processors so each patch occurs once in the + entire computation. The &replicatedtag; indicates that the + patches should be replicated among the processors so each + processing unit has its own copy of all the patches. While it + could be wasteful to have different processors perform the same + computation, replicating a container can reduce possibly more + expensive communication costs.
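All three ingredients appear together in the Doof2d-Field-distributed listing; the sketch below isolates them. As in that program, the domain size is assumed to be a multiple of the number of processors in each dimension.

    #include "Pooma/Fields.h"

    void declareDistributedField(long n, long nuProcessors)
    {
      Interval<1> N(0, n-1);
      Interval<2> vertDomain(N, N);

      // Partition: split each dimension into nuProcessors equal patches, with
      // one internal guard layer and no external guard layer.
      UniformGridPartition<2> partition(Loc<2>(nuProcessors, nuProcessors),
                                        GuardLayers<2>(1),   // internal
                                        GuardLayers<2>(0));  // external

      // Context mapper tag: DistributedTag places each patch on one context.
      UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());

      UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0), Vector<2>(1.0, 1.0));
      Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);

      // MultiPatch engines with Remote<Brick> patches hold the distributed data.
      Field<UniformRectilinearMesh<2>, double,
            MultiPatch<UniformTag, Remote<Brick> > > a(cell, layout, mesh);
    }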
+
+ + +
+ Computation Modes + + &pooma; computations can be expressed using a variety of + modes. Most &pooma; computations involve &array; or &field; + containers, but how their values are accessed and the associated + algorithms using them vary. Element-wise computation involves + explicitly accessing values. A data-parallel computation uses + expressions to represent larger subsets of a container's values. + Stencil-based computation expresses a computation as the repeated + application of a local computation to each element of an array. A + relation among containers establishes a dependency between them so + the values of one container are updated whenever any other's + values change. A program may use any or all of these styles, + described below. + + Element-wise + computation accesses individual container values through explicit + notation. For example, values in a two-dimensional + container &container; might be referenced as + &container(3,4) or + &container(i,j+1). This is the usual + notation for languages without objects, such as &c;. + + Data-parallel + computation uses expressions to access subsets of a container's + values. For example, in , + a(I,J) represents the subset of &array; + a's values with coordinates in the domain + specified by the one-dimensional &interval;s I + and J. Using data-parallel expressions + frequently eliminates the need for writing explicit loops in + code. + + A stencil + computes a container's value using neighboring data values. Each + stencil consists of an indication of which neighboring values to + read and a function using those values. For example, an averaging + stencil may access all neighbors, averaging them. In &pooma;, we + represent a stencil using a function object having functions + indicating which neighboring values are used. Stencil + computations are frequently used in solving partial differential + equations, image processing, and geometric modeling. + + A relation + is a dependence among containers so the dependent container's + values are updated when its values are needed and any of its + related containers' values have changed. A relation is specified + by a dependent container, independent containers, and a function + computing the dependent container's values using the independent + containers' values. To avoid excess computation, the dependent + container's values are computed only when needed, e.g., for + printing or for computing the values of another dependent + container. Thus, this computation is sometimes called lazy + evaluation.
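The nine-point average from Doof2d can be written in the first two modes as shown below. The fragment is a sketch: the Array parameters and the Interval first()/last() accessors are assumptions, while the data-parallel statement is taken directly from the Doof2d listings; the stencil mode of the same update appears in those listings as a(interiorDomain) = stencil(b, interiorDomain).

    #include "Pooma/Arrays.h"            // assumed header providing Array and Interval

    // a and b cover [0,n) x [0,n); I and J are the interior intervals [1,n-2].
    // Either statement below performs one nine-point averaging of b into a.
    void averageOnce(Array<2, double, Brick> &a, Array<2, double, Brick> &b,
                     const Interval<1> &I, const Interval<1> &J, double weight)
    {
      // Make sure no pending data-parallel statement is still writing a or b
      // before touching individual values.
      Pooma::blockAndEvaluate();

      // Element-wise mode: explicit loops over individual values.
      for (int j = J.first(); j <= J.last(); ++j)
        for (int i = I.first(); i <= I.last(); ++i)
          a(i,j) = weight *
            (b(i+1,j+1) + b(i+1,j  ) + b(i+1,j-1) +
             b(i  ,j+1) + b(i  ,j  ) + b(i  ,j-1) +
             b(i-1,j+1) + b(i-1,j  ) + b(i-1,j-1));

      // Data-parallel mode: the same update as a single expression.
      a(I,J) = weight *
        (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
         b(I  ,J+1) + b(I  ,J  ) + b(I  ,J-1) +
         b(I-1,J+1) + b(I-1,J  ) + b(I-1,J-1));
    }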
+ + +
+ Computation Environment + + A &pooma; program can execute on a wide variety of + computers. The default sequential computing + environment consists of one processor and + associated memory, as found on a personal computer. In contrast, + a distributed computing + environment may have multiple processors + and multiple distributed or shared memories. For example, some + desktop computers have dual processors and shared memory, while a + large supercomputer may have thousands of processors, perhaps + with groups of eight sharing the same memory. + + Using distributed computation requires three things: + + + the programmer must declare how container domains will + be distributed, + + + &pooma; must be configured to use a communications + library, and + + + the &pooma; executable must be run using the + library. + + + All of these were illustrated in and . + illustrates the &pooma; distributed computation model. + described how to declare containers with distributed domains. + Detailed instructions on how to configure &pooma; for distributed + computation appear in . + Detailed instructions on how to run distributed &pooma; executables + appear in . Here we present + three concepts for distributed computation: patches, context, and + a communication library. + + A partition divides a container's domain into disjoint + patches. + For distributed computation, the patches are distributed among + various processors, which compute the associated values. As + illustrated in , + each patch can be surrounded by guard layers. + + A context + is a collection of shared memory and processors that can execute + a program or a portion of a program. It can have one or more + processors, but all these processors must access the same shared + memory. Usually the computer and its operating system, not the + programmer, determine the available contexts. + + A communication + library passes messages among contexts. + &pooma; uses the communication library to copy information among + contexts, all of which is hidden from both the programmer and the + user. &pooma; works with the Message Passing Interface (&mpi;) + Communications Library (FIXME: xref linkend="mpi99", ) and the &mm; + Shared Memory Library.
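One practical consequence shows up even in program text: a distributed program cannot use standard input and output, so the Doof2d-Field-distributed listing reads its input from replicated command-line arguments and writes through an Inform stream. A minimal sketch follows; the Pooma::initialize and Pooma::finalize calls are assumptions, while Inform and the includes follow that listing.

    #include <stdlib.h>                  // has EXIT_SUCCESS
    #include "Pooma/Fields.h"            // also provides Inform, as in Doof2d-Field-distributed

    int main(int argc, char *argv[])
    {
      Pooma::initialize(argc, argv);     // assumed Toolkit start-up

      // Command-line arguments are replicated on every context, so they are a
      // safe source of input; an Inform stream handles output across contexts.
      Inform output;
      output << "arguments supplied: " << argc - 1 << std::endl;

      Pooma::finalize();                 // assumed Toolkit shut-down
      return EXIT_SUCCESS;
    }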
+
Index: glossary.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v retrieving revision 1.1 diff -c -p -r1.1 glossary.xml *** glossary.xml 2001/12/11 20:36:13 1.1 --- glossary.xml 2001/12/13 04:02:05 *************** *** 72,77 **** --- 72,86 ---- + + communication library + + software library passing information among contexts, usually + using messages. + distributed computing environment + + + computing environment *************** *** 137,142 **** --- 146,165 ---- D + + data parallel + + describes an expression representing a subset of a + container's values. For example, + sin(&container;) is an expression + indicating that the sin is applied to each + value in container &container;. + element wise + relation + stencil + + + distributed computing environment *************** *** 166,176 **** &dynamicarray; a &pooma; container generalizing one-dimensional &array;s by supporting domain resizing at run-time. It maps indices to values in constant-time access, ignoring the time to compute the values if applicable. &dynamicarray;s are first-class objects. &array; &field; --- 189,199 ---- &dynamicarray; a &pooma; container generalizing one-dimensional &array;s by supporting domain resizing at run-time. It maps indices to values in constant-time access, ignoring the time to compute the values if applicable. &dynamicarray;s are first-class objects. &array; &field; *************** *** 180,195 **** E engine stores and, if necessary, computes a container's values. These ! can be specialized, e.g., to minimize storage when a domain has ! few distinct values. Separating a container and its storage also ! permits views of a container. ! &engine; ! view of a container --- 203,232 ---- E + + element wise + + describes accesses to individual values within a container. + For example, &container(i,j) represents one + particular value in the container &container;. + data parallel + relation + stencil + + + engine stores and, if necessary, computes a container's values. These can ! be specialized, e.g., to minimize storage when a domain has few ! distinct values. Separating a container and its storage also ! permits views of a ! container. &engine; ! view of a ! container *************** *** 221,227 **** also supports geometric computations such as the distance between two cells and normals to a cell. &field;s are first-class objects. &array; &dynamicarray; --- 258,264 ---- also supports geometric computations such as the distance between two cells and normals to a cell. &field;s are first-class objects. &array; &dynamicarray; *************** *** 236,242 **** may be declared anywhere, stored in automatic variables, accessed anywhere, copied, and passed by both value and reference. &pooma; &array; and &field; are first-class types. --- 273,279 ---- may be declared anywhere, stored in automatic variables, accessed anywhere, copied, and passed by both value and reference. &pooma; &array; and &field; are first-class types. *************** *** 373,380 **** patch ! ! ERE partition guard layer domain --- 410,418 ---- patch ! subset of a container's domain with values computed by a ! particular context. A partition splits a domain into patches. It ! may be surrounded by external and internal guard layers. partition guard layer domain *************** ERE *** 417,422 **** --- 455,475 ---- interval + + + relation + + dependence between a dependent container and one or more + independent containers and an associated function. 
If a dependent + container's values are needed and one or more of the independent + containers' values have changed, the dependent container's values + are computed using the function and the independent containers' + values. Relations implement lazy evaluation. + data parallel + element wise + stencil + + *************** ERE *** 427,435 **** a computing environment with one processor and associated memory. Only one processor executes a program even if the ! conmputer itself has multiple processors. computing environment distributed computing environment --- 480,503 ---- a computing environment with one processor and associated memory. Only one processor executes a program even if the ! computer itself has multiple processors. computing environment distributed computing environment + + + + + stencil + + set of values neighboring a container value and a function + using those values to compute it. For example, the stencil in a + two-dimensional Conway game of life consists of a value's eight + neighbors and a function that sets the value to + live if it is already live or it has exactly three + live neighbors. + data parallel + element wise + relation Index: manual.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v retrieving revision 1.1 diff -c -p -r1.1 manual.xml *** manual.xml 2001/12/11 20:36:13 1.1 --- manual.xml 2001/12/13 04:02:07 *************** *** 154,159 **** --- 154,161 ---- + + *************** *** 345,1008 **** - &tutorial-chapter; - - - - Overview of &pooma; Concepts - - FIXME: How does multi-threaded computation fit into the - model? - - In the previous chapter, we presented several different - implementations of the &doof2d; simulation program. The - implementations illustrate the various containers, computation - syntaxes, and computation environments that &pooma; supports. In - this chapter, we describe the concepts associated with each of - these three categories. Specific details needed by programmers are - deferred to later chapters. ! ! &pooma; Implementation Concepts ! ! ! ! Container ! Computation Syntax ! Computation Environment ! ! ! ! ! &array; ! element-wise ! sequential ! ! ! &dynamicarray; ! data-parallel ! distributed ! ! ! &field; ! stencil-based ! ! ! ! &tensor; ! relational ! ! ! ! &matrix; ! ! ! ! ! &vector; ! ! ! ! ! !
! ! The most important &pooma; concepts can be grouped into three ! separate categories: ! ! ! container ! ! data structure holding one or more values and addressed ! by indices ! ! ! ! computation syntax ! ! styles of expressing computations ! ! ! ! computation environment ! ! description of resources for computing, e.g., single ! processor or multi-processor ! ! ! ! See . Many &pooma; programs ! select one possibility from each column. For example, used a &array; ! container and stencils for sequential computation, while used a &field; ! container and data-parallel statements with distributed ! computation. A program may use multiple containers and various ! computation syntax, but the computation environment either has ! distributed processors or not. ! ! In the rest of this chapter, we explore these three ! categories. First, we describe &pooma; containers, illustrating ! the purposes of each, and explaining the concepts needed to declare ! them. Then, we describe the different computation syntaxes and ! finally distributed computation concepts. ! ! !
! &pooma; Containers ! ! Most &pooma; programs use containers ! to store groups of values. &pooma; containers are objects that ! store other objects. They control allocation and deallocation of ! and access to these objects. They are a generalization of &c; ! arrays, but &pooma; containers are first-class objects so they can ! be used directly in expressions. They are similar to &cc; ! containers such as vector, list, and ! stack. See for a summary of the ! containers. ! ! This chapter describes many concepts, not all of which are ! needed to begin programming with the &pooma; Toolkit. Below we ! introduce the different categories of concepts. After that, we ! introduce the different &pooma;'s containers and describe how to ! choose the appropriate one for a particular task. ! indicates which concepts must be understood when declaring a ! particular container. All of these concepts are described in ! and ! . ! Use this figure to decide which concepts in the former are ! relevant. Reading the latter section is necessary only if ! computing using multiple processors. The programs in the previous ! chapter illustrate many of these concepts. ! ! ! &pooma; Container Summary ! ! ! ! &array; ! container mapping indices to values and that may be ! used in expressions ! ! ! &dynamicarray; ! one-dimensional &array; whose domain can be dynamically ! resized ! ! ! &field; ! container mapping indices to one or more values and ! residing in multi-dimensional space ! ! ! &tensor; ! multi-dimensional mathematical tensor ! ! ! &matrix; ! two-dimensional mathematical matrix ! ! ! &vector; ! multi-dimensional mathematical vector ! ! ! !
! ! ! ! A &pooma; array;, generalizing a &c; ! array, maps indices to values. Given a index or position in an ! &array;'s domain, it returns the associated value, either by ! returning a stored value or by computing it. The use of indices, ! which are usually ordered tuples, permits constant-time access ! although computing a particular value may require significant ! time. In addition to the functionality provided by &c; arrays, ! the &array; class automatically handles memory allocation and ! deallocation, supports a wider variety of assignments, and can be ! used in expressions. For example, the addition of two arrays can ! be assigned to an array and the product of a scalar element and an ! array is permissible. ! ! ! ! A &pooma; &dynamicarray; extends ! &array; capabilities to support a dynamically-changing domain but ! is restricted to only one dimension. When the &dynamicarray; is ! resized, its values are preserved. ! ! ! ! A &pooma; &field; is an &array; with ! spatial extent. Each domain consists of cells ! in one-, two-, or three-dimensional space. Although indexed ! similarly to &array;s, each cell may contain multiple values and ! multiple materials. A &field;'s mesh stores its spatial ! characteristics and can map yield, e.g., a point contained in a ! cell, the distance between two cells, and a cell's normals. A ! &field; should be used whenever geometric or spatial computations ! are needed, multiple values per index are desired, or a ! computation involves more than one material. ! ! ! ! A &tensor; ! implements a multi-dimensional mathematical tensor. Since it is a ! first-class type, it can be used in expressions such as ! adding two &tensor;s. ! ! ! ! A &matrix; ! implements a two-dimensional mathematical matrix. Since it is a ! first-class type, it can be used in expressions such as ! multiplying matrices and assignments to matrices. ! ! ! ! A &vector; ! implements a multi-dimensional mathematical vector, which is an ! ordered tuple of components. Since it is a first-class type, it ! can be used in expressions such as adding two &vector;s and ! multiplying a &matrix; and a &vector;. ! ! The data of an &array;, &dynamicarray;, or &field; can be ! viewed using more than one container by taking a view. A ! view of ! an existing container &container; is a container whose domain ! is a subset of &container;. The subset can equal the ! original domain. A view acts like a reference in that changing ! any of the view's values also changes the original container's and ! vice versa. While users sometimes explicitly create views, they ! are perhaps more frequently created as temporaries in expressions. ! For example, if A is an &array; and ! I is a domain, A(I) - ! A(I-1) forms the difference between adjacent ! values. ! ! !
! Choosing a Container ! ! The two most commonly used &pooma; containers are &array;s ! and &field;s. contains a ! decision tree describing how to choose an appropriate ! container. ! ! ! Choosing a &pooma; Container ! ! ! ! If modeling mathematical entries, ! use a &vector;, &matrix;, or &tensor;. ! ! ! If indices and values reside in multi-dimensional space ! &space;, ! use a &field;. ! ! ! If there are multiple values per index, ! use a &field;. ! ! ! If there are multiple materials participating in the same computation, ! use a &field;. ! ! ! If the domain's size dynamically changes and is one-dimensional, ! use a &dynamicarray;. ! ! ! Otherwise ! use an &array;. ! ! ! !
! !
-
- Declaring Sequential Containers !
! Concepts For Declaring Containers ! ! ! ! ! ! concepts involved in declaring containers ! ! !
! ! In the previous sections, we introduced the &pooma; ! containers and described how to choose one appropriate for a ! given task. In this section, we describe the concepts involved ! in declaring them. Concepts specific to distributed computation ! are described in the next section. ! ! ! illustrates the containers and the concepts involved in their ! declarations. The containers are listed in the top row. Lines ! connect these containers to the components necessary for their ! declarations. For example, an &array; declaration requires an ! engine and a layout. These, in turn, depend on other &pooma; ! concepts. Declarations necessary only for distributed, or ! multiprocessor, computation are surrounded by dashed lines. You ! can use these dependences to indicate the concepts needed for a ! particular container. ! ! An engine ! stores and, if necessary, computes a container's values. A ! container has one or more engines. The separation of a container ! and its storage permits optimizing a program's space ! requirements. For example, a container returning the same value ! for all indices can use a constant engine, which need only store ! one value for the entire domain. A &compressiblebrick; engine ! reduces its space requirements to a constant whenever all its ! values are the same. The separation also permits taking views of containers without ! copying storage. ! !
! &array; and &field; Mathematical and Computational Concepts ! ! ! ! ! ! maps from indices to values ! ! !
! ! A layout ! maps domain indices to the ! processors and computer memory used by a container's engines. ! See . ! A computer computes a container's values using a processor and ! memory. The layout specifies the processor(s) and memory to use ! for each particular index. A container's layout for a ! uniprocessor implementation consists of its domain, the ! processor, and its memory. For a multi-processor implementation, ! the layout maps portions of the domain to (possibly different) ! processors and memory. ! ! A &field;'s mesh ! maps domain indices to ! geometric values in &space; such as distance between cells, edge ! lengths, and normals to cells. In other words, it provides a ! &field;'s spatial extent. See also . ! Different mesh types may support different geometric ! values. ! ! A mesh's corner ! position specifies the point in &space; corresponding to ! the lower, left corner of its domain. Combining this, the ! domain, and the cell size fully specifies the mesh's map from ! indices to &space;. ! ! A mesh's cell ! size specifies the spatial dimensions of ! a &field; cell, e.g., its ! width, height, and depth, in &space;. Combining this, the ! domain, and the corner position fully specifies the mesh's map ! from indices to &space;. ! ! A domain ! is a set of points on which a container can define values. An ! interval ! consists of all integral points between two values. It is ! frequently represented using mathematical interval notation [a,b] ! even though it contains only the integral points, e.g., a, a+1, ! a+2, …, b. The concept is generalized to multiple ! dimensions by forming tensor product of intervals, i.e., all the ! integral tuples in an &n;-dimensional space. For example, the ! two-dimensional containers in the previous chapter are defined on ! a two-dimensional domain with the both dimensions' spanning the ! interval [0,n). A stride ! is a subset of an interval consisting of regularly-spaced ! points. A range ! is a subset of an interval formed by the tensor product of strides. ! A region ! represents a continuous &n;-dimensional domain. !
! ! !
! Declaring Distributed Containers ! ! In the previous section, we introduced the concepts ! important when declaring containers for use on uniprocessor ! computers. When using multi-processor computers, we augment ! these concepts with those for distributed computation. Reading ! this section is important only for running the same program on ! multiple processors. Many of these concepts were introduced in ! and . ! illustrates the &pooma; distributed computation model. In this ! section, we concentrate on the concepts necessary to declare a ! distributed container. ! ! As we noted in , a &pooma; ! programmer must specify how each container's domain should be ! distributed among the available processors and memory spaces. ! Using this information, the Toolkit automatically distributes the ! data among the available processors and handles any required ! communication among them. The three concepts necessary for ! declaring distributed containers are a partition, a guard layer, ! and a context mapper tag. ! ! A partition ! specified how to divide a container's domain into distributed ! pieces. For example, the partition illustrated in ! would divide a two-dimensional domain into three equally-sized ! pieces along the x-dimension and two equally-sized pieces along ! the y-dimension. Partitions can be independent of the size of ! container's domain. The example partition will work on any ! domain as long as the size of its x-dimension is a multiple of ! three. A domain is separated into disjoint patches. ! ! A guard ! layer is extra domain ! surrounding each patch. This region has read-only values. An ! external guard ! layer specifies values surrounding the ! domain. Its presence eases computation along the domain's edges ! by permitting the same computations as for more internal ! computations. An internal guard ! layer duplicates values from adjacent ! patches so communication with adjacent patches need not occur ! during a patch's computation. The use of guard layers is an ! optimization; using external guard layers eases programming and ! using internal guard layers reduces communication among ! processors. Their use is not required. ! ! A context ! mapper indicates how a container's ! patches are mapped to processors and shared memory. For example, ! the &distributedtag; indicates that the patches should be ! distributed among the processors so each patch occurs once in the ! entire computation. The &replicatedtag; indicates that the ! patches should be replicated among the processors so each ! processing unit has its own copy of all the patches. While it ! could be wasteful to have different processors perform the same ! computation, replicating a container can reduce possibly more ! expensive communication costs. !
! ! !
! ????Computation Syntax???? ! ! UNFINISHED !
! ! !
! Computation Environment ! ! A &pooma; program can execute on a wide variety of ! computers. The default sequential computing ! environment consists of one processor and ! associated memory, as found on a personal computer. In contrast, ! a distributed computing ! environment may have multiple processors ! and multiple distributed or shared memories. For example, some ! desktop computers have dual processors and shared memory. A ! large supercomputer may have thousands of processors, perhaps ! with groups of eight sharing the same memory. ! ! Using distributed computation requires three things: the ! programmer must declare how container domains will be ! distributed, &pooma; must be configured to use a communications ! library, and the &pooma; executable must be run using the ! library. All of these were illustrated in and . ! illustrates the &pooma; distributed computation model. ! described how to declare containers with distributed domains. ! Detailed instructions how to configure &pooma; for distributed ! computation appear in . More ! detailed instructions how to run distributed &pooma; executables ! appear in . Here we present ! three concepts for distributed computation: context, layout, and ! a communication library. ! ! A context ! is a collection of shared memory and processors that can execute ! a program of a portion of a program. It can have one or more ! processors, but all these processors must access the same shared ! memory. Usually the computer and its operating system, not the ! programmer, determine the available contexts. ! ! HERE -
- -
- -
- Extraneous Material - - Describe the software application layers similar to - papers/SCPaper-95.html and "Short Tour of - &pooma;" in papers/SiamOO98_paper.ps. - Section 2.2, "Why a Framework?," of - papers/pooma.ps argues why a layered approach - eases use. Section 3.1, "Framework Layer Description," - describes the five layers. ! FINISH: Write short glossary entries for each of these. ! ! FINISH: Look through the source code to ensure all main ! concepts are listed. ! ! Here are (preliminary) &pooma; equations: ! ! &pooma; <quote>Equations</quote> ! ! ! ! field = data + materials + centering + layout + mesh ! ! ! map from space to values ! ! ! array = data + layout ! ! ! map from indices to values ! ! ! mesh = layout + origin + spacings ! ! ! distribute domain through physical space ! ! ! layout = domain + partition + context_mapper_tag (distributed/replicated) ! ! ! distribute domain's blocks among processors/contexts ! ! ! partition = blocks + guard layers ! ! ! split domain into blocks ! ! ! domain = newDomain ! ! ! space of permissible indices ! ! ! !
- FINISH: Following is a first try at describing the &pooma; - abstraction layers. See also paper illustration. ! ! &pooma; Abstraction Layers ! ! ! ! application program ! ! ! &array; &field; (should have ! FieldEngine under it) ! ! ! &engine; ! ! ! evaluators ! ! ! !
- FINISH: How does parallel execution fit in? ! FINISH: Should we also name and describe each layer? !
!
! Data-Parallel Statements ! Can we use "An Overview of &pete;" from papers/PETE_DDJ/ddj_article.html or is this too low-level? ! Section 3.2.1 of papers/pooma.ps gives a simple example of data-parallel expression. It also has a paragraph introducing data-parallel operations and selecting subsets of domains. Section 3.4 describes the Chained --- 347,424 ---- ! &tutorial-chapter; + &concepts-chapter; ! ! Writing Sequential Programs ! QUESTIONS: How do I arrange this section? What material do I ! include? What other books or models can I follow? ! &pooma; can reorder computations to permit more efficient ! computation. When running a sequential program, reordering may ! permit omission of unneeded computations. For example, if only ! values from a particular field are printed, only computations ! involving the field and containers dependent on it need to occur. ! When running a distributed program, reordering may permit ! computation and communication among processors to overlap. &pooma; ! automatically tracks dependences between data-parallel expressions, ! ensuring correct ordering. It does not track statements accessing ! particular &array; and &field; values so the programmer must ! precede these statements with calls to ! Pooma::blockAndEvaluate(). Each call forces ! the executable to wait until all computation has completed. Thus, ! the desired values are known to be available. In practice, some ! calls to Pooma::blockAndEvaluate may not be ! necessary, but omitting them requires knowledge of &pooma;'s ! dependence computations, so the &author; recommends calling ! Pooma::blockAndEvaluate before each access to ! a particular value in an &array; or &field;. Omitting a necessary ! call may lead to a race condition. See for ! instructions how to diagnose and eliminate these race conditions. ! UNFINISHED !
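A sketch of the recommended pattern follows. The Array type, header name, and sizes are assumptions; the Pooma::blockAndEvaluate() call before the single-value access mirrors the Doof2d listings.

    #include <iostream>                  // has std::cout
    #include "Pooma/Arrays.h"            // assumed header providing Array

    void printCentralValue(Array<2, double, Brick> &a, long n)
    {
      // A data-parallel statement; it may still be evaluating asynchronously.
      a = a + 1.0;

      // Wait for all pending computation before touching an individual value;
      // omitting this call risks the race condition described above.
      Pooma::blockAndEvaluate();
      std::cout << a(n/2, n/2) << std::endl;
    }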
! &benchmark; Programs + Define a &benchmark; program vs. an example or an + executable. Provide a short overview of how to run these + programs. Provide an overview of how to write these programs. + See src/Utilities/Benchmark.h. +
!
! Using <type>Inform</type>s for Output ! UNFINISHED !
!
! Miscellaneous + Section 3, "Domains and Views," of + papers/iscope98.pdf describes five types of + domains. !
! Data-Parallel Statements ! Can we use "An Overview of &pete;" from papers/PETE_DDJ/ddj_article.html or is this too low-level? ! Section 3.2.1 of papers/pooma.ps gives a simple example of data-parallel expression. It also has a paragraph introducing data-parallel operations and selecting subsets of domains. Section 3.4 describes the Chained *************** HERE *** 1013,1019 **** From Section 4 of papers/SiamOO98_paper.ps: ! This version of &pete; reduces compile time of user codes and utilizes compile-time knowledge of expression &domain;s for better optimization. For example, more efficient loops for evaluating an expression can be generated if &pete; knows that the --- 429,435 ---- From Section 4 of papers/SiamOO98_paper.ps: ! This version of &pete; reduces compile time of user codes and utilizes compile-time knowledge of expression &domain;s for better optimization. For example, more efficient loops for evaluating an expression can be generated if &pete; knows that the *************** HERE *** 1028,1044 **** &pete; material.
! Containers !
&array;
! Section 4 "Future Improvements in &pooma; II" of papers/SiamOO98_paper.ps ! An &array; can be thought of as a map from one &domain; to another.… &array;s depend only on the interface of &domain;s. Thus, a subset of view of an &array; can be --- 444,461 ---- &pete; material.
+
! Containers !
&array;
! Section 4 "Future Improvements in &pooma; II" of papers/SiamOO98_paper.ps ! An &array; can be thought of as a map from one &domain; to another.… &array;s depend only on the interface of &domain;s. Thus, a subset of view of an &array; can be *************** HERE *** 1058,1064 **** code. An Array maps a fairly arbitrary input domain to an arbitrary range of outputs. When used by itself, an &array; object A refers to all of the values in its ! domain. Element-wise mathematical operations or functions can be applied to an array using straightforward notation, like A + B or sin(A). Expressions involving Array objects are themselves Arrays. The operation A(d), where d is a domain object that --- 475,481 ---- code. An Array maps a fairly arbitrary input domain to an arbitrary range of outputs. When used by itself, an &array; object A refers to all of the values in its ! domain. Element-wise mathematical operations or functions can be applied to an array using straightforward notation, like A + B or sin(A). Expressions involving Array objects are themselves Arrays. The operation A(d), where d is a domain object that *************** HERE *** 1084,1090 **** indexed. Fortran arrays are dense and the elements are arranged ! according to column-major conventions. Therefore, X(i1,i2) refers to element number i1-1+(i2-1)*numberRowsInA. However, as Fig. 1 shows, Fortran-style "Brick" storage is not the only storage format of interest to scientific programmers. For --- 501,507 ---- indexed. Fortran arrays are dense and the elements are arranged ! according to column-major conventions. Therefore, X(i1,i2) refers to element number i1-1+(i2-1)*numberRowsInA. However, as Fig. 1 shows, Fortran-style "Brick" storage is not the only storage format of interest to scientific programmers. For *************** HERE *** 1103,1116 **** The &pooma; &array; Class Template ! Next we describe &pooma;'s model of the Array concept, the Array class template. The three most important requirements from the point of view of overall design are: (1) arbitrary domain, (2) arbitrary range, and (3) polymorphic indexing. These express themselves in the template parameters for the &pooma; Array class. The template ! template <int Dim, class T = double, class EngineTag = Brick> class Array; is a specification for creating a set of classes all named --- 520,533 ---- The &pooma; &array; Class Template ! Next we describe &pooma;'s model of the Array concept, the Array class template. The three most important requirements from the point of view of overall design are: (1) arbitrary domain, (2) arbitrary range, and (3) polymorphic indexing. These express themselves in the template parameters for the &pooma; Array class. The template ! template <int Dim, class T = double, class EngineTag = Brick> class Array; is a specification for creating a set of classes all named *************** HERE *** 1150,1186 ****
&field; ! QUESTION: Do we include boundary conditions here? FINISH: Do we have an example that shows something not possible with &array;? ! Describe and illustrate multi-material and multivalue? - - ADD: description of meshes and guard layers. !
! !
!
! Engines ! (unformatted) From papers/GenericProgramming_CSE/dubois.html: ! The Engine Concept ! To implement polymorphic indexing, the Array class defers data storage and data lookup to an engine object. The requirements that the Array template places on its engine provide the definition for the Engine concept. We'll describe these by examining a simplified version of the Array template, shown in Fig. 2. ! First, the Array class determines and exports (makes Engine_t part of Array's public interface) the type of the engine class that it will use: --- 567,601 ----
&field; ! QUESTION: Do we include boundary conditions here? FINISH: Do we have an example that shows something not possible with &array;? ! Describe and illustrate multi-material and multivalue? ! ADD: description of meshes and guard layers. !
!
!
! Engines ! (unformatted) From papers/GenericProgramming_CSE/dubois.html: ! The Engine Concept ! To implement polymorphic indexing, the Array class defers data storage and data lookup to an engine object. The requirements that the Array template places on its engine provide the definition for the Engine concept. We'll describe these by examining a simplified version of the Array template, shown in Fig. 2. ! First, the Array class determines and exports (makes Engine_t part of Array's public interface) the type of the engine class that it will use: *************** HERE *** 1215,1221 **** concept: it must provide a version of operator() that takes Dim values of type Index_t. ! Simply passing the indices on to the engine object may seem odd. After all, engine(i,j) looks like we're just indexing another array. There are several advantages to this extra level of indirection. The Array class is as faithful a model of the Array --- 630,636 ---- concept: it must provide a version of operator() that takes Dim values of type Index_t. ! Simply passing the indices on to the engine object may seem odd. After all, engine(i,j) looks like we're just indexing another array. There are several advantages to this extra level of indirection. The Array class is as faithful a model of the Array *************** HERE *** 1386,1497 ****
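The figure mentioned above is not reproduced here, but the idea of an array deferring storage and indexing to its engine can be sketched in standalone C++. This is an illustration of the concept only, not the actual POOMA source:

  #include <vector>

  // A toy engine: owns a flat buffer and provides operator() over two indices.
  class ToyBrickEngine
  {
  public:
    typedef int    Index_t;
    typedef double Element_t;

    ToyBrickEngine(int nx, int ny) : nx_(nx), data_(nx * ny, 0.0) {}

    // The requirement described above: operator() taking Dim indices.
    Element_t &operator()(Index_t i, Index_t j) { return data_[j * nx_ + i]; }

  private:
    int nx_;
    std::vector<Element_t> data_;
  };

  // A toy array: exports its engine type and forwards indexing to the engine.
  template <class Engine>
  class ToyArray
  {
  public:
    typedef Engine Engine_t;                        // exported engine type
    typedef typename Engine_t::Element_t Element_t;

    ToyArray(int nx, int ny) : engine_(nx, ny) {}
    Element_t &operator()(int i, int j) { return engine_(i, j); }

  private:
    Engine_t engine_;
  };

Swapping in a different engine type changes how values are stored or computed without changing the indexing interface, which is the point the text makes about polymorphic indexing.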
- -
- Relations - - UNFINISHED -
- - -
- Stencils - - Section 3.5.4, "Stencil Objects," of - papers/pooma.ps provides a few uses of - stencils. - - Section 5, "Performance," of - papers/iscope98.pdf motivates and explains - stencils. -
- - -
- Contexts - -
- background.html - In order to be able to cope with the variations in machine - architecture noted above, &pooma;'s distributed execution model - is defined in terms of one or more contexts, each of which may - host one or more threads. A context is a distinct region of - memory in some computer. The threads associated with the context - can access data in that memory region and can run on the - processors associated with that context. Threads running in - different contexts cannot access memory in other contexts. - - A single context may include several physical processors, - or just one. Conversely, different contexts do not have to be on - separate computers—for example, a 32-node SMP computer could - have up to 32 separate contexts. This release of &pooma; only - supports a single context for each application, but can use - multiple threads in the context on supported platforms. Support - for multiple contexts will be added in an upcoming - release. -
-
- - - - - - Writing Sequential Programs - - &pooma; can reorder computations to permit more efficient - computation. When running a sequential program, reordering may - permit omission of unneeded computations. For example, if only - values from a particular field are printed, only computations - involving the field and containers dependent on it need to occur. - When running a distributed program, reordering may permit - computation and communication among processors to overlap. &pooma; - automatically tracks dependences between data-parallel expressions, - ensuring correct ordering. It does not track statements accessing - particular &array; and &field; values so the programmer must - precede these statements with calls to - Pooma::blockAndEvaluate(). Each call forces - the executable to wait until all computation has completed. Thus, - the desired values are known to be available. In practice, some - calls to Pooma::blockAndEvaluate may not be - necessary, but omitting them requires knowledge of &pooma;'s - dependence computations, so the &author; recommends calling - Pooma::blockAndEvaluate before each access to - a particular value in an &array; or &field;. Omitting a necessary - call may lead to a race condition. See for - instructions how to diagnose and eliminate these race conditions. - - Section 3, "Domains and Views," of - papers/iscope98.pdf describes five types of - domains. - - UNFINISHED - -
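The Pooma::blockAndEvaluate() rule quoted above can be illustrated with a short sketch. The Array and Interval names are those used throughout these patches; the header name is an assumption.

  #include <iostream>
  #include "Pooma/Arrays.h"   // assumed header

  // Average neighboring values, then read one element safely.
  void averageAndPrint(Array<1, double, Brick> &a,
                       const Array<1, double, Brick> &b, int n)
  {
    Interval<1> I(1, n - 2);
    a(I) = 0.5 * (b(I + 1) + b(I - 1));  // data-parallel; may be evaluated later

    Pooma::blockAndEvaluate();           // wait for all pending computation
    std::cout << a(n / 2) << std::endl;  // single-element read is now safe
  }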
- &benchmark; Programs - - Define a &benchmark; program vs. an example or an - executable. Provide a short overview of how to run these - programs. Provide an overview of how to write these programs. - See src/Utilities/Benchmark.h. -
- - -
- Using <type>Inform</type>s for Output - - UNFINISHED -
- - -
- Miscellaneous - - Section 3, "Domains and Views," of - papers/iscope98.pdf describes five types of - domains.
--- 801,806 ---- *************** HERE *** 3579,3588 **** To use multiple processors with &pooma; requires installing the &cheetah; messaging library and an underlying messaging library such as the Message Passing Interface (&mpi;) Communications ! Library or the &mm; Shared Memory Library. In this section, we ! first describe how to install &mm;. Read the section only if using ! &mm;, not &mpi;. Then we describe how to install &cheetah; and ! configure &pooma; to use it.
Obtaining and Installing the &mm; Shared Memory Library --- 2888,2897 ---- To use multiple processors with &pooma; requires installing the &cheetah; messaging library and an underlying messaging library such as the Message Passing Interface (&mpi;) Communications ! Library or the &mm; Shared Memory Library. In the following ! section, we first describe how to install &mm;. Read it only if ! using &mm;, not &mpi;. Then we describe how to install &cheetah; ! and configure &pooma; to use it.
Obtaining and Installing the &mm; Shared Memory Library *************** HERE *** 3834,3839 **** --- 3143,3154 ---- Miscellaneous + + Section 5, "Performance," of + papers/iscope98.pdf motivates and explains + stencils. + + If there is time, present another example program, e.g., a Jacobi solver. Index: figures/concepts.mp =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/figures/concepts.mp,v retrieving revision 1.1 diff -c -p -r1.1 concepts.mp *** figures/concepts.mp 2001/12/11 20:36:13 1.1 --- figures/concepts.mp 2001/12/13 04:02:07 *************** beginfig(101) *** 144,150 **** boxit.l6(btex \strut \type{Field}: etex); boxit.l7(btex \strut $\mbox{index} \mapsto \mbox{value}$ etex); boxit.l9(btex \strut \type{Field}: etex); ! boxit.l10(btex \strut $\mbox{indices} \mapsto \mbox{geometric value}$ etex); fixsize(l1,l2,l3,l4,l6,l7,l9,l10); ypart(l1.c - l2.c) = 0; --- 144,150 ---- boxit.l6(btex \strut \type{Field}: etex); boxit.l7(btex \strut $\mbox{index} \mapsto \mbox{value}$ etex); boxit.l9(btex \strut \type{Field}: etex); ! boxit.l10(btex \strut $\mbox{indices} \mapsto \mbox{spatial value}$ etex); fixsize(l1,l2,l3,l4,l6,l7,l9,l10); ypart(l1.c - l2.c) = 0; *************** beginfig(101) *** 163,169 **** % Create and layout the mesh boxes. boxit.ia[2](btex indices etex); boxit.ea[2](btex mesh etex); ! boxit.va[2](btex geometric value etex); fixsize(ia[2],ea[2],va[2]); ia[1].w - ia[2].w = 0.6(ia[0].w - ia[1].w); ypart(va[2].w - ea[2].e) = ypart(ea[2].w - ia[2].e) = 0; --- 163,169 ---- % Create and layout the mesh boxes. boxit.ia[2](btex indices etex); boxit.ea[2](btex mesh etex); ! boxit.va[2](btex spatial value etex); fixsize(ia[2],ea[2],va[2]); ia[1].w - ia[2].w = 0.6(ia[0].w - ia[1].w); ypart(va[2].w - ea[2].e) = ypart(ea[2].w - ia[2].e) = 0; From oldham at codesourcery.com Fri Dec 14 05:19:26 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Thu, 13 Dec 2001 21:19:26 -0800 Subject: Manual Patch: Some Concepts Changes Message-ID: <20011213211926.A29012@codesourcery.com> 2001-Dec-13 Jeffrey D. Oldham These changes mainly represent wordsmithing of the concepts chapter and some preliminary work on the "Writing Sequential Programs" chapter. * concepts.xml: Wordsmithing and a little rearrangement. * glossary.xml (interval): Improve wording. * makefile (manual.dvi): Improve dependence information. * manual.xml: Add planning material for the "Writing Sequential Programs" chapter. * tutorial.xml: Fix an article. * figures/concepts.mp: Shrink the figure's horizontal extent. Applied to mainline Jeffrey D. Oldham oldham at codesourcery.com -------------- next part -------------- Index: concepts.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v retrieving revision 1.1 diff -c -p -r1.1 concepts.xml *** concepts.xml 2001/12/13 04:04:05 1.1 --- concepts.xml 2001/12/14 04:12:54 *************** *** 6,20 **** In the previous chapter, we presented several different implementations of the &doof2d; simulation program. The ! implementations illustrate the various containers, computation ! modes, and computation environments that &pooma; supports. In this ! chapter, we describe the concepts associated with each of these ! three categories. Specific details needed by programmers are ! deferred to later chapters. ! 
&pooma; Implementation Concepts --- 6,55 ---- In the previous chapter, we presented several different implementations of the &doof2d; simulation program. The ! implementations illustrate the various containers, computation modes, ! and computation environments that &pooma; supports. In this chapter, ! we describe the concepts associated with each of these three ! categories. Specific details needed for their use are deferred to ! later chapters. + The most important &pooma; concepts can be grouped into three + separate categories: + + + container + + data structure holding one or more values and addressed + by indices + + + + computation modes + + styles of expressing computations and accesses to container + values + + + + computation environment + + description of resources for computing, e.g., single + processor or multi-processor. + + + + See . Many &pooma; programs + select one possibility from each column. For example, used &array; + containers and stencils for sequential computation, while used &field; + containers and data-parallel statements with distributed + computation. A program may use multiple containers and various + computation modes, but the computation environment is either + distributed or not. +
! &pooma; Concepts *************** *** 57,134 ****
- - The most important &pooma; concepts can be grouped into three - separate categories: - - - container - - data structure holding one or more values and addressed - by indices - - - - computation modes - - styles of expressing computations - - - - computation environment - - description of resources for computing, e.g., single - processor or multi-processor - - - - See . Many &pooma; programs - select one possibility from each column. For example, used a &array; - container and stencils for sequential computation, while used a &field; - container and data-parallel statements with distributed - computation. A program may use multiple containers and various - computation modes, but the computation environment either has - distributed processors or not. ! In the rest of this chapter, we explore these three ! categories. First, we describe &pooma; containers, illustrating ! the purposes of each, and explaining the concepts needed to declare ! them. Then, we describe the different computation modes and ! finally distributed computation concepts.
&pooma; Containers ! Most &pooma; programs use containers ! to store groups of values. &pooma; containers are objects that ! store other objects. They control allocation and deallocation of ! and access to these objects. They are a generalization of &c; ! arrays, but &pooma; containers are first-class objects so they can ! be used directly in expressions. They are similar to &cc; ! containers such as vector, list, and ! stack. See for a summary of the containers. ! This chapter describes many concepts, not all of which are ! needed to begin programming with the &pooma; Toolkit. Below we ! introduce the different categories of concepts. After that, we ! introduce the different &pooma;'s containers and describe how to ! choose the appropriate one for a particular task. indicates which concepts must be understood when declaring a ! particular container. All of these concepts are described in ! and . ! Use this figure to decide which concepts in the former are ! relevant. Reading the latter section is necessary only if ! computing using multiple processors. The programs in the previous ! chapter illustrate many of these concepts. --- 92,133 ----
! In the rest of this chapter, we explore these three categories. ! First, we describe &pooma; containers, illustrating the purposes of ! each, and explaining the concepts needed to declare them. Then, we ! describe the different computation modes and distributed computation ! concepts.
&pooma; Containers ! Most &pooma; programs use containers to ! store groups of values. &pooma; containers are objects that store ! other objects such as numbers or vectors. They control allocation ! and deallocation of and access to these stored objects. They are a ! generalization of &c; arrays, but &pooma; containers are first-class ! objects so they can be used directly in expressions. They are ! similar to &cc; containers such as vector, ! list, and stack. See for a summary of the containers. ! This section describes many concepts, but one need not ! understand them all to begin programming with the &pooma; Toolkit. ! First, we introduce the different &pooma;'s containers and describe ! how to choose an appropriate one for a particular task. indicates which concepts must be understood when declaring a ! particular container. All of these concepts are described in and . Use ! this figure to decide which concepts in the former are relevant. ! Reading the latter section is necessary only if computing using ! multiple processors. The programs in the previous chapter ! illustrate many of these concepts. *************** *** 175,191 **** A &pooma; array;, generalizing a &c; ! array, maps indices to values. Given a index or position in an &array;'s domain, it returns the associated value, either by returning a stored value or by computing it. The use of indices, which are usually ordered tuples, permits constant-time access ! although computing a particular value may require significant ! time. In addition to the functionality provided by &c; arrays, ! the &array; class automatically handles memory allocation and ! deallocation, supports a wider variety of assignments, and can be ! used in expressions. For example, the addition of two arrays can ! be assigned to an array and the product of a scalar element and an array is permissible. --- 174,190 ---- A &pooma; &array; generalizes a &c; array ! and maps indices to values. Given an index or position in an &array;'s domain, it returns the associated value, either by returning a stored value or by computing it. The use of indices, which are usually ordered tuples, permits constant-time access ! although computing a particular value may require significant time. ! In addition to the functionality provided by &c; arrays, the &array; ! class automatically handles memory allocation and deallocation, ! supports a wider variety of assignments, and can be used in ! expressions. For example, the addition of two arrays can be ! assigned to an array and the product of a scalar element and an array is permissible. *************** *** 200,215 **** A &pooma; &field; is an &array; with spatial extent. Each domain consists of cells ! in one-, two-, or three-dimensional space. Although indexed ! similarly to &array;s, each cell may contain multiple values and ! multiple materials. A &field;'s mesh stores its spatial ! characteristics and can map yield, e.g., a point contained in a ! cell, the distance between two cells, and a cell's normals. A &field; should be used whenever geometric or spatial computations ! are needed, multiple values per index are desired, or a ! computation involves more than one material. --- 199,214 ---- A &pooma; &field; is an &array; with spatial extent. Each domain consists of cells in ! one-, two-, or three-dimensional space. Although indexed similarly ! to &array;s, each cell may contain multiple values and multiple ! materials. A &field;'s mesh stores its spatial ! characteristics and can map yield, e.g., the cell at a particular ! 
point, the distance between two cells, or a cell's normals. A &field; should be used whenever geometric or spatial computations ! are needed, multiple values per index are desired, or a computation ! involves more than one material. *************** *** 222,229 **** A &matrix; implements a two-dimensional mathematical matrix. Since it is a ! first-class type, it can be used in expressions such as ! multiplying matrices and assignments to matrices. --- 221,228 ---- A &matrix; implements a two-dimensional mathematical matrix. Since it is a ! first-class type, it can be used in expressions such as assignments ! to matrices and multiplying matrices. *************** *** 234,259 **** multiplying a &matrix; and a &vector;.The data of an &array;, &dynamicarray;, or &field; can be ! viewed using more than one container by taking a view. A ! view of an existing container &container; is a container whose domain ! is a subset of &container;. The subset can equal the ! original domain. A view acts like a reference in that changing ! any of the view's values also changes the original container's and ! vice versa. While users sometimes explicitly create views, they ! are perhaps more frequently created as temporaries in expressions. ! For example, if A is an &array; and ! I is a domain, A(I) - ! A(I-1) forms the difference between adjacent ! values.
Choosing a Container The two most commonly used &pooma; containers are &array;s ! and &field;s. contains a decision tree describing how to choose an appropriate container. --- 233,257 ---- multiplying a &matrix; and a &vector;. The data of an &array;, &dynamicarray;, or &field; can be ! viewed using more than one container by taking a view. A view of an existing container &container; is a container whose domain ! is a subset of &container;. The subset can equal the original ! domain. A view acts like a reference in that changing any of the ! view's values also changes the original container's and vice versa. ! While users sometimes explicitly create views, they are perhaps more ! frequently created as temporaries in expressions. For example, if ! A is an &array; and I is a ! domain, A(I) - A(I-1) uses two views to form ! the difference between adjacent values.
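A short sketch of the A(I) - A(I-1) example above, assuming the usual Array and Interval declarations:

  Interval<1> whole(0, 100);
  Interval<1> I(1, 100);                 // interior points 1, ..., 100
  Array<1, double, Brick> A(whole);
  Array<1, double, Brick> D(whole);

  A = 1.0;
  D(I) = A(I) - A(I - 1);                // A(I) and A(I-1) are temporary views of A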
Choosing a Container The two most commonly used &pooma; containers are &array;s ! and &field;s, while &vector;, &matrix;, or &tensor; frequently ! represent mathematical objects. contains a decision tree describing how to choose an appropriate container. *************** *** 298,303 **** --- 296,307 ----
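A possible usage sketch for the small mathematical objects mentioned above; element access through operator() and the dot() product function are assumptions to check against the Tiny headers:

  Vector<3, double> v;                   // a fixed-size 3-vector
  v(0) = 1.0;  v(1) = 2.0;  v(2) = 3.0;

  Tensor<3, double> t;                   // a 3 x 3 tensor
  t = 0.0;
  t(0, 0) = t(1, 1) = t(2, 2) = 1.0;     // identity

  Vector<3, double> w = dot(t, v);       // tensor-vector product (assumed name)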
Declaring Sequential Containers + In the previous sections, we introduced the &pooma; + containers and described how to choose one appropriate for a + given task. In this section, we describe the concepts involved + in declaring them. Concepts specific to distributed computation + are described in the next section. +
Concepts For Declaring Containers *************** *** 310,347 ****
- In the previous sections, we introduced the &pooma; - containers and described how to choose one appropriate for a - given task. In this section, we describe the concepts involved - in declaring them. Concepts specific to distributed computation - are described in the next section. - illustrates the containers and the concepts involved in their declarations. The containers are listed in the top row. Lines connect these containers to the components necessary for their declarations. For example, an &array; declaration requires an ! engine and a layout. These, in turn, depend on other &pooma; concepts. Declarations necessary only for distributed, or ! multiprocessor, computation are surrounded by dashed lines. You ! can use these dependences to indicate the concepts needed for a ! particular container. An engine stores and, if necessary, computes a container's values. A container has one or more engines. The separation of a container ! and its storage permits optimizing a program's space requirements. For example, a container returning the same value for all indices can use a constant engine, which need only store one value for the entire domain. A &compressiblebrick; engine reduces its space requirements to a constant whenever all its values are the same. The separation also permits taking views of containers without ! copying storage. !
&array; and &field; Mathematical and Computational Concepts --- 314,345 ----
illustrates the containers and the concepts involved in their declarations. The containers are listed in the top row. Lines connect these containers to the components necessary for their declarations. For example, an &array; declaration requires an ! engine and a layout. These, in turn, can depend on other &pooma; concepts. Declarations necessary only for distributed, or ! multiprocessor, computation are surrounded by dashed lines. These ! dependences indicate the concepts needed for a particular ! container. An engine stores and, if necessary, computes a container's values. A container has one or more engines. The separation of a container ! from its storage permits optimizing a program's space and time requirements. For example, a container returning the same value for all indices can use a constant engine, which need only store one value for the entire domain. A &compressiblebrick; engine reduces its space requirements to a constant whenever all its values are the same. The separation also permits taking views of containers without copying ! storage. !
&array; and &field; Mathematical and Computational Concepts *************** *** 356,387 **** A layout maps domain indices to the ! processors and computer memory used by a container's engines. ! See . ! A computer computes a container's values using a processor and ! memory. The layout specifies the processor(s) and memory to use ! for each particular index. A container's layout for a ! uniprocessor implementation consists of its domain, the ! processor, and its memory. For a multi-processor implementation, ! the layout maps portions of the domain to (possibly different) ! processors and memory. A &field;'s mesh maps domain indices to ! spatial values in &space; such as distance between cells, edge lengths, and normals to cells. In other words, it provides a &field;'s spatial extent. See also . ! Different mesh types may support different spatial ! values. A mesh's corner position specifies the point in &space; corresponding to ! the lower, left corner of its domain. Combining this, the domain, and the cell size fully specifies the mesh's map from indices to &space;. --- 354,408 ---- A layout maps domain indices to the ! processors and computer memory used by a container's engines. See ! . ! A program computes a container's values using a processor and ! memory. The layout specifies the processors and memory to use for ! each particular index. A container's layout for a uniprocessor ! implementation consists of its domain, the processor, and its ! memory. For a multi-processor implementation, the layout maps ! portions of the domain to (possibly different) processors and ! memory. + A domain + is a set of points on which a container can define values. There + are several different types of domains. An interval + consists of all integral points between two endpoints. It is + frequently represented using mathematical interval notation [a,b] + even though it contains only the integral points, e.g., a, a+1, + a+2, …, b. The concept is generalized to multiple + dimensions by forming tensor product of intervals, i.e., all the + integral tuples in an &n;-dimensional space. For example, the + two-dimensional containers in the previous chapter are defined on a + two-dimensional domain with the both dimensions' spanning the + interval [0,n). A domain need not contain all integral points + between its endpoints. A stride + is a subset of an interval consisting of regularly-spaced points. + A range + is a subset of an interval formed by the tensor product of strides. + A region + represents a continuous &n;-dimensional domain. + A &field;'s mesh maps domain indices to ! spatial values in &space; such as distances between cells, edge lengths, and normals to cells. In other words, it provides a &field;'s spatial extent. See also . ! Different mesh types may support different spatial values. A mesh's corner position specifies the point in &space; corresponding to ! the cell in the lower, left corner of its domain. Combining this, the domain, and the cell size fully specifies the mesh's map from indices to &space;. *************** *** 393,434 **** width, height, and depth, in &space;. Combining this, the domain, and the corner position fully specifies the mesh's map from indices to &space;. - - A domain - is a set of points on which a container can define values. An - interval - consists of all integral points between two values. It is - frequently represented using mathematical interval notation [a,b] - even though it contains only the integral points, e.g., a, a+1, - a+2, …, b. 
The concept is generalized to multiple - dimensions by forming tensor product of intervals, i.e., all the - integral tuples in an &n;-dimensional space. For example, the - two-dimensional containers in the previous chapter are defined on - a two-dimensional domain with the both dimensions' spanning the - interval [0,n). A stride - is a subset of an interval consisting of regularly-spaced - points. A range - is a subset of an interval formed by the tensor product of strides. - A region - represents a continuous &n;-dimensional domain.
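To make the domain vocabulary above concrete, a small sketch using the domain class names discussed in this section (the constructor argument orders are assumptions):

  Interval<1> I(0, 9);           // the integral points 0, 1, ..., 9
  Range<1>    R(0, 8, 2);        // every second point: 0, 2, 4, 6, 8
  Interval<2> D(I, I);           // tensor product: a 10 x 10 index domain

  Array<2, double, Brick> a(D);  // a container declared on that domain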
Declaring Distributed Containers ! In the previous section, we introduced the concepts ! important when declaring containers for use on uniprocessor ! computers. When using multi-processor computers, we augment ! these concepts with those for distributed computation. Reading ! this section is important only for running the same program on ! multiple processors. Many of these concepts were introduced in ! and . illustrates the &pooma; distributed computation model. In this --- 414,432 ---- width, height, and depth, in &space;. Combining this, the domain, and the corner position fully specifies the mesh's map from indices to &space;.
Declaring Distributed Containers ! In the previous section, we introduced the concepts important ! when declaring containers for use on uniprocessor computers. When ! using multi-processor computers, we augment these concepts with ! those for distributed computation. Reading this section is ! important only for running a program on multiple processors. Many ! of these concepts were introduced in and . illustrates the &pooma; distributed computation model. In this *************** *** 447,480 **** A partition ! specified how to divide a container's domain into distributed pieces. For example, the partition illustrated in would divide a two-dimensional domain into three equally-sized ! pieces along the x-dimension and two equally-sized pieces along ! the y-dimension. Partitions can be independent of the size of ! container's domain. The example partition will work on any ! domain as long as the size of its x-dimension is a multiple of ! three. A domain is separated into disjoint patches. ! A guard ! layer is extra domain ! surrounding each patch. This region has read-only values. An ! external guard layer specifies values surrounding the ! domain. Its presence eases computation along the domain's edges ! by permitting the same computations as for more internal computations. An internal guard layer duplicates values from adjacent patches so communication with adjacent patches need not occur during a patch's computation. The use of guard layers is an optimization; using external guard layers eases programming and ! using internal guard layers reduces communication among ! processors. Their use is not required. A context --- 445,476 ---- A partition ! specifies how to divide a container's domain into distributed pieces. For example, the partition illustrated in would divide a two-dimensional domain into three equally-sized ! pieces along the x-dimension and two equally-sized pieces along the ! y-dimension. Partitions can be independent of the size of ! container's domain. The example partition will work on any domain ! as long as the size of its x-dimension is a multiple of three. A ! domain is separated into disjoint patches. ! A guard ! layer surrounds each patch with read-only ! values. An external guard layer specifies values surrounding the ! entire domain. Its presence eases computation along the domain's ! edges by permitting the same computations as for more internal computations. An internal guard layer duplicates values from adjacent patches so communication with adjacent patches need not occur during a patch's computation. The use of guard layers is an optimization; using external guard layers eases programming and ! using internal guard layers reduces communication among processors. ! Their use is not required. A context *************** *** 496,512 **** Computation Modes &pooma; computations can be expressed using a variety of ! modes. Most of &pooma; computations involve &array; or &field; containers, but how their values are accessed and the associated ! algorithms using them varies. Element-wise computation involves ! explicitly accessing values. A data-parallel computation uses ! expressions to represent larger subsets of a container's values. ! Stencil-based computations write a computation as repeatedly ! applying a local computation to each element of an array. A ! relation among containers establishes a dependency between them so ! the values of one container are updated whenever any other's ! values change. A program may use any or all of these styles, ! described below. 
Element-wise --- 492,508 ---- Computation Modes &pooma; computations can be expressed using a variety of ! modes. Many &pooma; computations involve &array; or &field; containers, but how their values are accessed and the associated ! algorithms using them varies. For example, element-wise computation ! involves explicitly accessing a container's values. A data-parallel ! computation uses expressions to represent larger subsets of a ! container's values. Stencil-based computations express a ! computation as repeatedly applying a local computation to each ! element of an array. A relation among containers establishes a ! dependency among them so the values of one container are updated ! whenever any other's values change. A program may use any or all of ! these styles, which are described below. Element-wise *************** *** 515,521 **** container &container; might be referenced as &container(3,4) or &container(i,j+1). This is the usual ! notation for languages without objects such as &c;. Data-parallel --- 511,517 ---- container &container; might be referenced as &container(3,4) or &container(i,j+1). This is the usual ! notation for non-object-oriented languages such as &c;. Data-parallel *************** *** 523,529 **** values. For example, in , a(I,J) represents the subset of &array; ! a's values with coordinates in the domain specified by the one-dimensional &interval;s I and J. Using data-parallel expressions frequently eliminates the need for writing explicit loops in --- 519,525 ---- values. For example, in , a(I,J) represents the subset of &array; ! a's values having coordinates in the domain specified by the one-dimensional &interval;s I and J. Using data-parallel expressions frequently eliminates the need for writing explicit loops in *************** *** 532,555 **** A stencil computes a container's value using neighboring data values. Each ! stencil consists of an indication of which neighboring values to read and a function using those values. For example, an averaging ! stencil may access all neighbors, averaging them. In &pooma;, we ! represent a stencil using a function object having functions ! indicating which neighboring values are used. Stencil ! computations are frequently used in solving partial differential ! equations, image processing, and geometric modeling. A relation ! is a dependence among containers so the dependent container's ! values are updated when its values are needed and any of its ! related containers' values have changed. A relation is specified ! by a dependent container, independent containers, and a function computing the dependent container's values using the independent containers' values. To avoid excess computation, the dependent ! container's values are computed only when needed, e.g., for ! printing or for computing the values of another dependent container. Thus, this computation is sometimes called lazy evaluation.
--- 528,552 ---- A stencil computes a container's value using neighboring data values. Each ! stencil consists of a specification of which neighboring values to read and a function using those values. For example, an averaging ! stencil may access all its adjacent neighbors, averaging them. In ! &pooma;, we represent a stencil using a function object with ! additional functions indicating which neighboring values are used. ! Stencil computations are frequently used in solving partial ! differential equations, image processing, and geometric ! modeling. A relation ! is a dependence among containers such that the dependent container's ! values are updated when its values are needed and any of its related ! containers' values have changed. A relation is specified by a ! dependent container, independent containers, and a function computing the dependent container's values using the independent containers' values. To avoid excess computation, the dependent ! container's values are computed only when needed, e.g., for printing ! the container or for computing the values of another dependent container. Thus, this computation is sometimes called lazy evaluation.
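A sketch of such a stencil function object follows. The extent member functions mirror the pattern used in the POOMA stencil examples, but their exact names, and the Stencil wrapper shown in the final comment, are assumptions to verify against the Stencil headers:

  // A two-dimensional four-point averaging stencil, written as a function object.
  class Average4pt
  {
  public:
    template <class A>
    typename A::Element_t operator()(const A &x, int i, int j) const
    {
      return 0.25 * (x(i + 1, j) + x(i - 1, j) + x(i, j + 1) + x(i, j - 1));
    }

    // Report how far the stencil reaches in each direction.
    int lowerExtent(int) const { return 1; }
    int upperExtent(int) const { return 1; }
  };

  // Possible use (assumed): b(I, J) = Stencil<Average4pt>()(a, I, J);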
*************** *** 558,589 ****
Computation Environment ! A &pooma; program can execute on a wide variety of ! computers. The default sequential computing environment consists of one processor and ! associated memory, as found on a personal computer. In contrast, ! a distributed computing environment may have multiple processors and multiple distributed or shared memories. For example, some desktop computers have dual processors and shared memory, while a ! large supercomputer may have thousands of processors, perhaps ! with groups of eight sharing the same memory. Using distributed computation requires three things: ! the programmer must declare how container domains will ! be distributed, &pooma; must be configured to use a communications ! library, and ! the &pooma; executable must be run using the ! library. All of these were illustrated in Computation Environment ! A &pooma; program can execute on a wide variety of computers. ! The default sequential computing environment consists of one processor and ! its associated memory, as found on a personal computer. In ! contrast, a distributed computing environment may have multiple processors and multiple distributed or shared memories. For example, some desktop computers have dual processors and shared memory, while a ! large supercomputer may have thousands of processors, perhaps with ! groups of eight sharing the same memory. Using distributed computation requires three things: ! The program must declare how container domains will ! be distributed. &pooma; must be configured to use a communications ! library. ! The &pooma; executable must be run using the library. All of these were illustrated in illustrates the &pooma; distributed computation model. ! described how to declare containers with distributed domains. ! Detailed instructions how to configure &pooma; for distributed ! computation appear in . ! Detailed instructions how to run distributed &pooma; executables ! appear in . Here we present ! three concepts for distributed computation: patches, context, and ! a communication library. A partition divides a container's domain into disjoint illustrates the &pooma; distributed computation model. ! described how to declare containers with distributed domains. Here ! we present three concepts for distributed computation: patches, ! context, and a communication library. A partition divides a container's domain into disjoint A context ! is a collection of shared memory and processors that can execute ! a program of a portion of a program. It can have one or more processors, but all these processors must access the same shared memory. Usually the computer and its operating system, not the programmer, determine the available contexts. --- 603,610 ---- A context ! is a collection of shared memory and processors that can execute a ! program or a portion of a program. It can have one or more processors, but all these processors must access the same shared memory. Usually the computer and its operating system, not the programmer, determine the available contexts. *************** *** 625,630 **** user. &pooma; works with the Message Passing Interface (&mpi;) Communications Library (FIXME: xref linkend="mpi99", ) and the &mm; ! Shared Memory Library.
--- 617,623 ---- user. &pooma; works with the Message Passing Interface (&mpi;) Communications Library (FIXME: xref linkend="mpi99", ) and the &mm; ! Shared Memory Library. See for details.
Index: glossary.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v retrieving revision 1.2 diff -c -p -r1.2 glossary.xml *** glossary.xml 2001/12/13 04:04:05 1.2 --- glossary.xml 2001/12/14 04:12:54 *************** *** 341,347 **** interval ! a set of integral points between two values. This domain is frequently represented using mathematical interval notation [a,b] even though it contains only the integral points, e.g., a, a+1, a+2, …, b. It is also generalized to the tensor --- 341,347 ---- interval ! a set of integral points between two endpoints. This domain is frequently represented using mathematical interval notation [a,b] even though it contains only the integral points, e.g., a, a+1, a+2, …, b. It is also generalized to the tensor Index: makefile =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/makefile,v retrieving revision 1.2 diff -c -p -r1.2 makefile *** makefile 2001/12/11 20:36:13 1.2 --- makefile 2001/12/14 04:12:54 *************** CXXFLAGS= -g -Wall -pedantic -W -Wstrict *** 25,30 **** --- 25,32 ---- all: manual.ps + manual.dvi: manual.xml concepts.xml tutorial.xml + %.all: %.ps %.pdf %.html chmod 644 $*.ps $*.pdf mv $*.ps $*.pdf $* Index: manual.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v retrieving revision 1.2 diff -c -p -r1.2 manual.xml *** manual.xml 2001/12/13 04:04:05 1.2 --- manual.xml 2001/12/14 04:12:57 *************** *** 356,365 **** Writing Sequential Programs ! QUESTIONS: How do I arrange this section? What material do I ! include? What other books or models can I follow? &pooma; can reorder computations to permit more efficient computation. When running a sequential program, reordering may --- 356,529 ---- Writing Sequential Programs ! Proposed order. Basically follow the order in the proposed ! reference section. ! ! starting, stopping ! &array; ! &dynamicarray; ! &field; ! &vector; ! &matrix; ! &tensor; ! engine ! domain ! correctness, e.g., PAssert ! &pooma; command-line options ! ! Include views of containers in the appropriate sections. + + &c;: A Reference Manual uses this + structure for &c; libraries: + + + function declarations, separated by rules from rest of text + + + text explanation + + + table of structure members if appropriate + + + example + + + + + STL Tutorial and Reference Guide, second + edition, uses this structure for STL functions: + + + text description with declaration mixed in + + + example program mixed into text. It is an entire program, + not a program fragment. 
+ + + + + A tutorial chapter for containers has + + + explanation of template types + + + bulleted list of container types + + + example constructors + + + example programs + + + member and related functions with example programs + + + list of accessors and relation functions + + + + + The reference chapter for containers has + + + a section listing common members and types for all containers + + + a section listing common member functions for all containers + + + requirements for various container specialties + + + The section describing vectors contains + + + files (header files) + + + class declaration + + + description + + + type definitions + + + constructors, destructors, and related functions + + + comparison operators + + + element access member functions + + + insert and erase member functions + + + notes + + + + + Josuttis's The &cc; Standard Library: A Tutorial + and Reference uses this structure for its STL container + chapter: + + + short introduction + + + common container abilities + + + common container operations (with table) + + + vector abilities + + + vector operations: + + + create, copy, and destroy operations (mostly table) + + + nonmodifying operations (mostly table) + + + assignments (mostly table) + + + element access (mostly table) + + + iterator functions (mostly table) + + + inserting and removing elements (mostly table) + + + + + + using vectors as ordinary arrays + + + exception handling + + + example program + + + &pooma; can reorder computations to permit more efficient computation. When running a sequential program, reordering may Index: tutorial.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v retrieving revision 1.1 diff -c -p -r1.1 tutorial.xml *** tutorial.xml 2001/12/11 20:36:13 1.1 --- tutorial.xml 2001/12/14 04:12:58 *************** *** 518,524 ****
Stencil &array; Implementation ! Many computations are local, computing a &array;'s value by using close-by &array; values. Encapsulating this computation in a stencil can yield faster code because the compiler can determine all accesses come from the same array. Each stencil consists of a --- 518,524 ----
Stencil &array; Implementation ! Many computations are local, computing an &array;'s value by using close-by &array; values. Encapsulating this computation in a stencil can yield faster code because the compiler can determine all accesses come from the same array. Each stencil consists of a Index: figures/concepts.mp =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/figures/concepts.mp,v retrieving revision 1.2 diff -c -p -r1.2 concepts.mp *** figures/concepts.mp 2001/12/13 04:04:05 1.2 --- figures/concepts.mp 2001/12/14 04:12:58 *************** endfig; *** 116,122 **** %% Comparisons Between Mathematical Concept And Computational Implementation of Arrays and Fields beginfig(101) ! numeric unit; unit = 0.9cm; numeric vertSpace; vertSpace = 2.6unit; % vertical space between sections numeric horizSpace; horizSpace = 8unit; % horizontal space between sections --- 116,122 ---- %% Comparisons Between Mathematical Concept And Computational Implementation of Arrays and Fields beginfig(101) ! numeric unit; unit = 0.8cm; numeric vertSpace; vertSpace = 2.6unit; % vertical space between sections numeric horizSpace; horizSpace = 8unit; % horizontal space between sections *************** beginfig(101) *** 137,144 **** endfor % Create and layout text boxes. ! boxit.l1(btex \strut mathematical concept etex); ! boxit.l2(btex \strut computational implementation etex); boxit.l3(btex \strut \type{Array}: etex); boxit.l4(btex \strut $\mbox{index} \mapsto \mbox{value}$ etex); boxit.l6(btex \strut \type{Field}: etex); --- 137,144 ---- endfor % Create and layout text boxes. ! boxit.l1(btex \strut \underline{mathematical concept} etex); ! boxit.l2(btex \strut \underline{computational implementation} etex); boxit.l3(btex \strut \type{Array}: etex); boxit.l4(btex \strut $\mbox{index} \mapsto \mbox{value}$ etex); boxit.l6(btex \strut \type{Field}: etex); *************** beginfig(101) *** 152,158 **** l1.w - l3.w = l4.w - l7.w = (0,vertSpace); l4.w - l3.e = l7.nw - l6.ne = (0,0); for t = 0 upto 1: ! xpart(ia[t].w - l[3+3t].e) = 0.65horizSpace; ypart(ia[t].w - l[3+3t].c) = 0; endfor xpart(l10.w - l7.w) = 0; --- 152,158 ---- l1.w - l3.w = l4.w - l7.w = (0,vertSpace); l4.w - l3.e = l7.nw - l6.ne = (0,0); for t = 0 upto 1: ! xpart(ia[t].w - l[3+3t].e) = 0.6horizSpace; ypart(ia[t].w - l[3+3t].c) = 0; endfor xpart(l10.w - l7.w) = 0; From oldham at codesourcery.com Mon Dec 17 18:15:52 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Mon, 17 Dec 2001 10:15:52 -0800 Subject: Patch: More Typo Fixes Message-ID: <20011217101552.B505@codesourcery.com> More typo fixes in comments. 2001-Dec-17 Jeffrey D. Oldham * README: Fixed typos in 2.1.0 entry. * src/Evaluator/PatchKernel.h: Fix typo in overview comment. * src/Pooma/Pooma.h: Remove extraneous semicolon from comment. (initialize): Fix spelling mistake in comment. * src/Utilities/Options.h: Fix spelling mistake in "Utility functions" comment. Not tested: since only comments were changed Applied to mainline Approved by Jim Crotinger and Mark Mitchell Thanks, Jeffrey D. 
Oldham oldham at codesourcery.com -------------- next part -------------- Index: README =================================================================== RCS file: /home/pooma/Repository/r2/README,v retrieving revision 1.60 diff -c -p -r1.60 README *** README 2001/10/05 01:29:02 1.60 --- README 2001/12/17 16:44:26 *************** Pooma includes two models for a Geometry *** 472,478 **** o NoGeometry - only includes positions at which the field is defined ! DiscreteGoemetry depends on a Centering concept. Pooma II meshes (see below) do not know about centering - it enters at the geometry level. Centering is simply a mechanism to determine where the points of the Field are defined, relative to the mesh points. Pooma 2.1.0 --- 472,478 ---- o NoGeometry - only includes positions at which the field is defined ! DiscreteGeometry depends on a Centering concept. Pooma II meshes (see below) do not know about centering - it enters at the geometry level. Centering is simply a mechanism to determine where the points of the Field are defined, relative to the mesh points. Pooma 2.1.0 *************** POOMA 2.1.0 implements the following int *** 699,705 **** CHANGES TO TENSOR CLASS AND NEW TINYMATRIX CLASS ------------------------------------------------ ! The Tensor class now takes only one parameter to specify it's size; it represents a square (D x D) mathematical tensor. The class declaration is --- 699,705 ---- CHANGES TO TENSOR CLASS AND NEW TINYMATRIX CLASS ------------------------------------------------ ! The Tensor class now takes only one parameter to specify its size; it represents a square (D x D) mathematical tensor. The class declaration is Index: src/Evaluator/PatchKernel.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Evaluator/PatchKernel.h,v retrieving revision 1.18 diff -c -p -r1.18 PatchKernel.h *** src/Evaluator/PatchKernel.h 2000/06/08 22:16:13 1.18 --- src/Evaluator/PatchKernel.h 2001/12/17 16:44:26 *************** *** 38,44 **** //----------------------------------------------------------------------------- // Overview: ! // A PatchKernel encapsulates perfoming operations on a patch of an expression. //----------------------------------------------------------------------------- //----------------------------------------------------------------------------- --- 38,45 ---- //----------------------------------------------------------------------------- // Overview: ! // A PatchKernel encapsulates performing operations on a patch of an ! // expression. //----------------------------------------------------------------------------- //----------------------------------------------------------------------------- Index: src/Pooma/Pooma.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Pooma/Pooma.h,v retrieving revision 1.31 diff -c -p -r1.31 Pooma.h *** src/Pooma/Pooma.h 2001/11/05 23:46:29 1.31 --- src/Pooma/Pooma.h 2001/12/17 16:44:26 *************** *** 47,53 **** // // Pooma::printStats // Pooma::debugLevel ! // Pooma::infoMessages; // Pooma::warnMessages // Pooma::errorMessages // Pooma::logMessages --- 47,53 ---- // // Pooma::printStats // Pooma::debugLevel ! // Pooma::infoMessages // Pooma::warnMessages // Pooma::errorMessages // Pooma::logMessages *************** namespace Pooma { *** 294,300 **** // Initialize POOMA, using the given Options container instead of argc,argv. // If the 2nd argument is true, also initialize the run-time system. ! 
// If the 3rd argument is true, call arch-specific initalize(). // Return success. bool initialize(Pooma::Options &opts, bool initRTS = true, --- 294,300 ---- // Initialize POOMA, using the given Options container instead of argc,argv. // If the 2nd argument is true, also initialize the run-time system. ! // If the 3rd argument is true, call arch-specific initialize(). // Return success. bool initialize(Pooma::Options &opts, bool initRTS = true, Index: src/Utilities/Options.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Utilities/Options.h,v retrieving revision 1.4 diff -c -p -r1.4 Options.h *** src/Utilities/Options.h 2000/06/30 02:00:16 1.4 --- src/Utilities/Options.h 2001/12/17 16:44:26 *************** private: *** 285,291 **** // Utility functions. //============================================================ ! // These used to be private methodes in the Options class, but they // are generally useful for parsing options, so they're now in the Pooma // namespace. --- 285,291 ---- // Utility functions. //============================================================ ! // These used to be private methods in the Options class, but they // are generally useful for parsing options, so they're now in the Pooma // namespace. From oldham at codesourcery.com Mon Dec 17 18:28:52 2001 From: oldham at codesourcery.com (Jeffrey Oldham) Date: Mon, 17 Dec 2001 10:28:52 -0800 Subject: Manual Patch: New Introductory Chapter Message-ID: <20011217102852.C505@codesourcery.com> This patch mainly adds an introductory chapter and a very small part of the sequential program chapter. 2001-Dec-17 Jeffrey D. Oldham * concepts.xml: Minor wordsmithing fixes, e.g., removal of old temporary paragraphs, spelling changes, and better use of entities. * glossary.xml: s/multi-processor/multiprocessor/ (architecture): New entry. (first class): Refill. * introduction.xml: New introductory chapter. * makefile (manual.dvi): Add dependence on introduction.xml and glossary.xml. * manual.xml: Add a few new entity declarations and use them. Move introductory chapter material to introduction.xml. Begin writing sequential program chapter. Add a few bibliographic entries. * tutorial.xml: Add more uses of entities changed in manual.xml. * figures/introduction.mp: New figure illustrating role of Pooma in science/math process. Applied to mainline Approved by me! Thanks, Jeffrey D. Oldham oldham at codesourcery.com -------------- next part -------------- Index: concepts.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v retrieving revision 1.2 diff -c -p -r1.2 concepts.xml *** concepts.xml 2001/12/14 04:18:13 1.2 --- concepts.xml 2001/12/17 16:56:50 *************** *** 1,9 **** Overview of &pooma; Concepts - FIXME: How does multi-threaded computation fit into the - model? - In the previous chapter, we presented several different implementations of the &doof2d; simulation program. The implementations illustrate the various containers, computation modes, --- 1,6 ---- *************** *** 33,39 **** computation environment description of resources for computing, e.g., single ! processor or multi-processor. --- 30,36 ---- computation environment description of resources for computing, e.g., single ! processor or multiprocessor. *************** *** 115,121 **** containers. This section describes many concepts, but one need not ! understand them all to begin programming with the &pooma; Toolkit. 
First, we introduce the different &pooma;'s containers and describe how to choose an appropriate one for a particular task. --- 112,118 ---- containers. This section describes many concepts, but one need not ! understand them all to begin programming with the &poomatoolkit;. First, we introduce the different &pooma;'s containers and describe how to choose an appropriate one for a particular task. *************** *** 361,367 **** memory. The layout specifies the processors and memory to use for each particular index. A container's layout for a uniprocessor implementation consists of its domain, the processor, and its ! memory. For a multi-processor implementation, the layout maps portions of the domain to (possibly different) processors and memory. --- 358,364 ---- memory. The layout specifies the processors and memory to use for each particular index. A container's layout for a uniprocessor implementation consists of its domain, the processor, and its ! memory. For a multiprocessor implementation, the layout maps portions of the domain to (possibly different) processors and memory. *************** *** 422,428 **** In the previous section, we introduced the concepts important when declaring containers for use on uniprocessor computers. When ! using multi-processor computers, we augment these concepts with those for distributed computation. Reading this section is important only for running a program on multiple processors. Many of these concepts were introduced in In the previous section, we introduced the concepts important when declaring containers for use on uniprocessor computers. When ! using multiprocessor computers, we augment these concepts with those for distributed computation. Reading this section is important only for running a program on multiple processors. Many of these concepts were introduced in As we noted in , a &pooma; ! programmer must specify how each container's domain should be ! distributed among the available processors and memory spaces. ! Using this information, the Toolkit automatically distributes the ! data among the available processors and handles any required ! communication among them. The three concepts necessary for ! declaring distributed containers are a partition, a guard layer, ! and a context mapper tag. A partition --- 431,444 ---- distributed container. As we noted in , a &pooma; programmer ! must specify how each container's domain should be distributed ! among the available processors and memory spaces. Using this ! information, the &toolkit; automatically distributes the data among ! the available processors and handles any required communication ! among them. The three concepts necessary for declaring distributed ! containers are a partition, a guard layer, and a context mapper ! tag. A partition *************** *** 615,622 **** &pooma; uses the communication library to copy information among contexts, all of which is hidden from both the programmer and the user. &pooma; works with the Message Passing Interface (&mpi;) ! Communications Library (FIXME: xref linkend="mpi99", ) and the &mm; Shared Memory Library. See for details.
--- 612,620 ---- &pooma; uses the communication library to copy information among contexts, all of which is hidden from both the programmer and the user. &pooma; works with the Message Passing Interface (&mpi;) ! Communications Library ! ! () and the &mm; Shared Memory Library. See for details.
Index: glossary.xml
===================================================================
RCS file: /home/pooma/Repository/r2/docs/manual/glossary.xml,v
retrieving revision 1.3
diff -c -p -r1.3 glossary.xml
*** glossary.xml	2001/12/14 04:18:13	1.3
--- glossary.xml	2001/12/17 16:56:50
***************
*** 17,22 ****
--- 17,31 ----
  A
+ 
+     architecture
+ 
+       particular hardware (processor) interface.  Example
+       architectures include linux, sgin32,
+       sgi64, and sun.
+ 
+ 
  &array;
***************
*** 25,32 ****
  ignoring the time to compute the values if applicable.  &array;s
  are first-class objects.  &dynamicarray;s and &field;s generalize
  &array;.
&dynamicarray; &field;
--- 34,42 ---- ignoring the time to compute the values if applicable. &array;s are first-class objects. &dynamicarray;s and &field;s generalize ! &array;.
&dynamicarray; &field;
*************** *** 165,171 **** computing environment with one or more processors each having associated memory, possibly shared. In some contexts, it ! refers to strictly multi-processor computation. computing environment sequential computing environment --- 175,181 ---- computing environment with one or more processors each having associated memory, possibly shared. In some contexts, it ! refers to strictly multiprocessor computation. computing environment sequential computing environment *************** *** 364,370 **** a map from an index to processor(s) and memory used to compute the container's associated value. For a uniprocessor implementation, a container's layout always consists of its ! domain, the processor, and its memory. For a multi-processor implementation, the layout maps portions of the domain to (possibly different) processors and memory. container --- 374,380 ---- a map from an index to processor(s) and memory used to compute the container's associated value. For a uniprocessor implementation, a container's layout always consists of its ! domain, the processor, and its memory. For a multiprocessor implementation, the layout maps portions of the domain to (possibly different) processors and memory. container *************** *** 572,580 **** a container derived from another. The former's domain is a subset of the latter's, but, where the domains intersect, accessing a value through the view is the same as accessing it ! through the original container. Only &array;s, &dynamicarray;s, ! and &field;s support views. ! container
--- 582,591 ---- a container derived from another. The former's domain is a subset of the latter's, but, where the domains intersect, accessing a value through the view is the same as accessing it ! through the original container. In Fortran 90, these are ! called array sections. Only &array;s, &dynamicarray;s, and ! &field;s support views.container Index: introduction.xml =================================================================== RCS file: introduction.xml diff -N introduction.xml *** /dev/null Fri Mar 23 21:37:44 2001 --- introduction.xml Mon Dec 17 09:56:51 2001 *************** *** 0 **** --- 1,348 ---- + + Introduction + + The Parallel Object-Oriented Methods and Applications + POOMA &toolkitcap; is a &cc; &toolkit; for + writing high-performance scientific programs for sequential and + distributed computation. The &toolkit; provides a variety of + tools: + + + containers and other abstractions suitable for scientific + computation, + + + several container storage classes to reduce a program's + storage requirements, + + + support for a variety of computation modes including + data-parallel expressions, stencil-based computations, and lazy + evaluation, + + + support for writing parallel and distributed programs, + + + automatic creation of all interprocessor communication for + parallel and distributed programs, and + + + automatic out-of-order execution and loop rearrangement + for fast program execution. + + + Since the &toolkit; provides high-level abstractions, &pooma; + programs are much shorter than corresponding &fortran; or &c; + programs, requiring less time to write and less time to debug. + Using these high-level abstractions, the same code runs on a wide + variety of computers almost as fast as carefully crafted + machine-specific hand-written programs. The &toolkit; is freely + available, open-source software compatible with any modern &cc; + compiler. + + &pooma; Goals. + The goals for the &poomatoolkit; have remained unchanged + since its inception in 1994: + + + Code portability across serial, distributed, and parallel + architectures with no change to source code. + + + Development of reusable, cross-problem-domain components + to enable rapid application development. + + + Code efficiency for kernels and components relevant to + scientific simulation. + + + [&toolkitcap;] design and development driven by + applications from a diverse set of scientific problem + domains. + + + Shorter time from problem inception to working parallel + simulations. + + + + + + + Code Portability for Sequential and Distributed Programs. + &pooma; programs run on sequential, distributed, and parallel + computers with no change in source code. The programmer writes two + or three lines specifying how each container's domain should be + distributed among available processors. Using these directives and + run-time information about the computer's configuration, the + &toolkit; automatically distributes pieces of the container + domains, called patches, among the available + processors. If a computation needs values from another patch, + &pooma; automatically passes the value to the place it is needed. + The same program, and even the same executable, works regardless of + the number of the available processors and the size of the + containers' domains. A programmer interested in only sequential + execution can omit the two or three lines specifying how the + domains are to be distributed. + + +
+ Science, Algorithms, Engineering, and &pooma; + + + + + + how &pooma; helps translate algorithms into programs + +
+ 
+ 
+ 
+     Rapid Application Development.
+     The &poomatoolkit; is designed to enable rapid development of
+     scientific and distributed applications.  For example, its vector,
+     matrix, and tensor classes model the corresponding mathematical
+     concepts.  Its &array; and &field; classes model the discrete
+     spaces and mathematical arrays frequently found in computational
+     science and math.  See .  The left column
+     illustrates theoretical science and math, the middle column
+     computational science and math, and the right column computer
+     science implementations.  For example, theoretical physics
+     frequently uses continuous fields in three-dimensional space, while
+     algorithms for the corresponding computational physics problem
+     usually use discrete fields.  &pooma; containers, classes, and
+     functions ease the engineering needed to map these algorithms to
+     computer programs.  For example, the &pooma; &field; container
+     models discrete fields; both map locations in discrete space to
+     values and permit computations of spatial distances and values.
+     The &pooma; &array; container models the mathematical concept of an
+     array, used in numerical analysis.
+ 
+     &pooma; containers support a variety of computation modes,
+     easing the transition of algorithms into code.  For example, many
+     algorithms for solving partial differential equations use
+     stencil-based computations.  &pooma; supports stencil-based
+     computations on &array;s and &field;s.  It also supports
+     data-parallel computation.  For computations where one &field;'s
+     values are a function of several other &field;s' values, the
+     programmer can specify a relation.  Relations are lazily evaluated;
+     whenever the dependent &field;'s values are needed and it is
+     related to a &field; whose values have changed, the former
+     &field;'s values are computed.  Lazy evaluation also assists
+     correctness by eliminating the (frequently forgotten) need for a
+     programmer to ensure a &field;'s values are up-to-date before being
+     used.
+ 
+     Efficient Code.
+     &pooma; incorporates a variety of techniques to ensure it
+     produces code that executes as quickly as special-case,
+     hand-written code.
+     These techniques include extensive use of templates, out-of-order
+     evaluation to permit communication and computation to overlap,
+     availability of guard layers to reduce processors' synchronicity,
+     and use of &pete; to produce fast inner loops.
+ 
+     Using templates permits the expressiveness of using pointers
+     and function arguments but ensures as much work as possible
+     occurs at compile time, not run time.  Also, more code is exposed
+     to the compiler's optimizer, further speeding execution.  For
+     example, use of template parameters to define the &pooma; &array;
+     container permits the use of specialized data storage classes
+     called engines, fast creation of views of a portion of an &array;,
+     and polymorphic indexing.  An &array;'s engine template parameter
+     specifies how data is stored and indexed.  Some &array;s expect
+     almost all values to be used, while others might be mostly empty.
+     In the latter case, using a specialized engine storing the few
+     nonzero values would greatly reduce space requirements.  Using
+     engines also permits fast creation of container views, known as
+     array sections in Fortran 90.  A view's
+     engine is the same as the original container's engine, while the
+     view object maps its restricted domain to the original domain.
+     Space requirements and execution time are minimal.
Using templates + also permits containers to support polymorphic indexing, e.g., + indexing both by integers and by three-dimensional coordinates. + For example, a container defers returning values to its engine + using a templatized index operator. The engine can define indexing + functions with different function arguments, without the need to + add corresponding container functions. Some of these features can + be expressed without using templates, but doing so increases + execution time. For example, a container could have a pointer to + an engine object, but this requires a pointer dereference for each + operation. Implementing polymorphic indexing without templates + would require adding virtual function corresponding to each of the + indexing functions. + + + + To ensure multiprocessor &pooma; programs execute quickly, it + is important that interprocessor communication overlaps with + intraprocessor computation as much as possible and communication is + minimized. Asynchronous communication, out-of-order evaluation, and + use of guard layers all help achieve this. &pooma; uses the + asynchronous communication facilities of the &cheetah; communication + library. When a processor needs data stored or computed by another + processor, a message is sent between the two. For synchronous + communication, the sender must issue an explicit send, and the + recipient must issue an explicit receive. This synchronizes them. + &cheetah; permits the sender to put and get data without the + intervention of the remote site and also invoke functions at the + remote site to ensure the data is up-to-date. Thus, out-of-order + evaluation must be supported. Out-of-order evaluation has another + benefit: only computations directly or indirectly related to values + that are printed need occur. + + Using guard layers also helps overlap communication and + computation. For distributed computation, each container's domain is + split into pieces distributed among the available processors. + Frequently, computing a container value is local, involving just the + value itself and a few neighbors. Computing a value near the edge of + a processor's domain may require knowing a few values from a + neighboring domain. Guard layers permit these values to be copied + locally so they need not be repeatedly communicated. + + &pooma; uses &pete; technology to ensure inner loops using + &pooma;'s object-oriented containers run as quickly as hand-coded + loops. &pete; (the Portable Expression Template + Engine) uses expression-template technology to convert + data-parallel statements frequently found in the inner loops of + programs into efficient loops without any intermediate + computations. For example, consider evaluating the A += + -B + 2 * C; statement where A and + C are vector<double>s and + B is a vector<int>s. + Ordinary evaluation might introduce intermediaries for + -B, 2*C, and their + sum. The presence of these intermediaries in inner loops can + measurably slow evaluation. To produce a loop without + intermediaries, &pete; stores each expression as a parse tree. The + resulting parse trees can be combined into a larger parse tree. + Using its templates, the parse tree is converted, at compile time, + to an outer loop with contents corresponding to evaluating each + component of the result. Thus, no intermediate values are computed + or stored. 
For example, the code corresponding to A += + -B + 2 * C; is + + vector<double>::iterator iterA = A.begin(); + vector<int>::const_iterator iterB = B.begin(); + vector<double>::const_iterator iterC = C.begin(); + while (iterA != A.end()) { + *iterA += -*iterB + 2 * *iterC; + ++iterA; ++iterB; ++iterC; + } + + Furthermore, since the code is available at compile-, not run-, time, + it can be further optimized, e.g., moving any loop-invariant code out + of the loop. + + Used for Diverse Set of Scientific Problems. + &pooma; has been used to solve a wide variety of scientific + problems. Most recently, physicists at Los Alamos National + Laboratory implemented an entire library of hydrodynamics codes as + part of the U.S. government's Science-based Stockpile Stewardship + (SBSS) program to simulate nuclear weapons. + Other applications include a matrix solver, an accelerator code + simulating the dynamics of high-intensity charged particle beams in + linear accelerators, and a Monte Carlo neutron transport + code. + + + Easy Implementation. + &pooma;'s tools greatly reduce the time to implement + applications. As we noted above, &pooma;'s containers and + expression syntax model the computational models and algorithms + most frequently found in scientific programs. Using these + high-level tools which are known to be correct reduce the time + needed to debug programs. Programmers can write and test programs + using their one or two-processor personal computers. With no + additional work, the same program runs on computers with hundreds + of processors; the code is exactly the same, and the &toolkit; + automatically handles distribution of the data, all data + communication, and all synchronization. Using all these tools + greatly reduces programming time. For example, a team of two + physicists and two support people at Los Alamos National Laboratory + implemented a suite of hydrodynamics kernels in six months. Their + work replaced the previous suite of less-powerful kernels which had + taken sixteen people several years to implement and debug. Despite + not previously implementing any of the kernels, they averaged one + new kernel every three days, including the time to read the + corresponding scientific papers! + + +
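As a compact illustration of the expression-template idea sketched above, the
following toy example shows how an expression such as b + c can be stored as a
small parse-tree object and evaluated in a single loop with no intermediate
vectors.  It is a minimal stand-in written for this discussion, not PETE itself,
and the class names (Vec, Sum, assign) are invented for the example.

  #include <cstddef>
  #include <iostream>
  #include <vector>

  // A tiny container; only operator[] is needed for the illustration.
  struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
  };

  // A parse-tree node representing "left + right".  It stores references,
  // so building it allocates no element storage.
  template <class L, class R>
  struct Sum {
    const L &l; const R &r;
    Sum(const L &l_, const R &r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
  };

  // "b + c" builds a Sum node instead of computing a temporary vector.
  template <class L, class R>
  Sum<L, R> operator+(const L &l, const R &r) { return Sum<L, R>(l, r); }

  // Assignment walks the parse tree element by element: one fused loop.
  template <class Expr>
  void assign(Vec &lhs, const Expr &e)
  {
    for (std::size_t i = 0; i < lhs.data.size(); ++i)
      lhs.data[i] = e[i];
  }

  int main()
  {
    Vec a(5), b(5, 1.0), c(5, 2.0);
    assign(a, b + c);                 // evaluated in a single loop
    std::cout << a[0] << std::endl;   // prints 3
    return 0;
  }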
+     History of &pooma;
+ 
+     The &poomatoolkit; was developed at Los Alamos National
+     Laboratory to assist nuclear fusion and fission research.
+     In 1994, the &toolkit; grew out of the Object-Oriented
+     Particle Simulation (OOPS) class library developed for
+     particle-in-cell simulations.  The goals of the Framework, as it
+     was called at the time, were driven by the Numerical Tokamak's
+     Parallel Platform Paradox:
+ 
+ The average time required to implement a moderate-sized + application on a parallel computer architecture is equivalent to + the half-life of the latest parallel supercomputer. +
+ The framework's goal of being able to quickly write efficient + scientific code that could be run on a wide variety of platforms + remains unchanged today. Development, driven mainly by the + Advanced Computing Laboratory at Los Alamos, proceeded rapidly. + A matrix solver application was written using the framework. + + Support for hydrodynamics, Monte Carlo simulations, and molecular + dynamics modeling soon followed.
+ + By 1998, &pooma; was part of the U.S. Department of + Energy's Accelerated Strategic Computing Initiative + (ASCI). The Comprehensive Test Ban Treaty + forbid nuclear weapons testing so they were instead simulated. + ASCI's goal was to radically advance the state + of the art in high-performance computing and numerical simulations + so the nuclear weapon simulations could use 100-teraflop + computers. A linear accelerator code linac and a Monte Carlo neutron + transport code MC++ + were written. + + + + &pooma; 2 involved a new conceptual framework and a + complete rewriting of the source code to improve performance. The + + &array; class was introduced with its use of engines, separating + container use from container storage. An asynchronous scheduler + permitted out-of-order execution to improve cache coherency. + Incorporating the Portable + Expression Template Engine (PETE) + permitted faster loop execution. Soon, container views and + ConstantFunction and IndexFunction + engines were added. Release 2.1.0 included &field;s with + their spatial extent and &dynamicarray;s with the ability to + dynamically change its domain size. Support for particles and + their interaction with &field;s was added. The &pooma; messaging + implementation was revised in release 2.3.0. Use of the + &cheetah; Library separated &pooma; from the actual messaging + library used. Support for applications running on clusters of + computers was added. During the past two years, the &field; + abstraction and implementation was improved to increase its + flexibility, add support for multiple values and materials in the + same cell, and permit lazy evaluation. Simultaneously, the + execution speed of the inner loops was greatly increased. The + particle code has not yet been ported to the new &field; + abstraction. +
+ + Index: makefile =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/makefile,v retrieving revision 1.3 diff -c -p -r1.3 makefile *** makefile 2001/12/14 04:18:13 1.3 --- makefile 2001/12/17 16:56:51 *************** CXXFLAGS= -g -Wall -pedantic -W -Wstrict *** 25,31 **** all: manual.ps ! manual.dvi: manual.xml concepts.xml tutorial.xml %.all: %.ps %.pdf %.html chmod 644 $*.ps $*.pdf --- 25,31 ---- all: manual.ps ! manual.dvi: manual.xml introduction.xml tutorial.xml concepts.xml glossary.xml %.all: %.ps %.pdf %.html chmod 644 $*.ps $*.pdf Index: manual.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v retrieving revision 1.3 diff -c -p -r1.3 manual.xml *** manual.xml 2001/12/14 04:18:13 1.3 --- manual.xml 2001/12/17 16:56:54 *************** *** 30,35 **** --- 30,37 ---- Doof2d" > + Fortran"> + Make"> MM"> *************** *** 42,48 **** POOMA"> ! POOMA Toolkit"> Purify"> --- 44,50 ---- POOMA"> ! POOMA &toolkitcap;"> Purify"> *************** *** 53,58 **** --- 55,64 ---- Tau"> + + + + *************** *** 74,79 **** --- 80,88 ---- Engine"> + false"> + + Field"> Inform"> *************** *** 88,99 **** --- 97,113 ---- MultiPatch"> + Options"> + ReplicatedTag"> Stencil"> Tensor"> + true"> + + Vector"> *************** *** 135,171 **** ! ! ! ! ! ! ! ! ! ! ! ! ! ]> &pooma; ! A &cc; Toolkit for High-Performance Parallel Scientific Computing JeffreyD.Oldham --- 149,195 ---- + + + + + + + + + + + ! ! ! ! ! ! ! ! ! ! ! ! ]> &pooma; ! A &cc; &toolkitcap; for High-Performance Parallel Scientific Computing JeffreyD.Oldham *************** *** 254,353 **** Programming with &pooma; - - - Introduction - - QUESTION: Add a partintro to the part above? - - &pooma; abbreviates Parallel Object-Oriented Methods - and Application. ! This document is an introduction to &pooma; v2.1, a &cc; ! toolkit for high-performance scientific computation. &pooma; ! runs efficiently on single-processor desktop machines, ! shared-memory multiprocessors, and parallel supercomputers ! containing dozens or hundreds of processors. What's more, by making ! extensive use of the advanced features of the ANSI/ISO &cc; ! standard—particularly templates—&pooma; presents a ! compact, easy-to-read interface to its users. ! ! From Section  of ! papers/iscope98.pdf: ! ! Scientific software developers have struggled with the need ! to express mathematical abstractions in an elegant and maintainable ! way without sacrificing performance. The &pooma; (Parallel ! Object-Oriented Methods and Applications) framework, written in ! ANSI/ISO &cc;, has ! demonstrated both high expressiveness and high performance for ! large-scale scientific applications on platforms ranging from ! workstations to massively parallel supercomputers. &pooma; provides ! high-level abstractions for multidimensional arrays, physical ! meshes, mathematical fields, and sets of particles. &pooma; also ! exploits techniques such as expression templates to optimize serial ! performance while encapsulating the details of parallel ! communication and supporting block-based data compression. ! Consequently, scientists can quickly assemble parallel simulation ! codes by focusing directly on the physical abstractions relevant to ! the system under study and not the technical difficulties of ! parallel communication and machine-specific optimization. ! ! ADD: diagram of science and &pooma;. See the diagram that ! Mark and I wrote. ! ! 
Mention efficient evaluation of &pooma; expressions. See ! pooma-publications/iscope98.pdf, ! Section 4. ! !
! Evolution of &pooma; ! ! QUESTION: Is this interesting? Even if it is, it should be ! short. ! ! The file papers/SCPaper-95.html ! describes ?&pooma;1? and its abstraction layers. ! ! The "Introduction" of ! papers/Siam0098.ps describes the DoE's ! funding motivation for &pooma;: Accelerated Strategic Computing ! Initiative (ASCI) and Science-based Stockpile Stewardship (SBSS), ! pp. 1–2. ! ! See list of developers on p. 1 of ! papers/pooma.ps. ! ! See list of developers on p. 1 of ! papers/pooma.ps. See history and motivation ! on p. 3 of papers/pooma.ps. ! Use README for ! information. -
- introduction.html - - &pooma; was designed and implemented by scientists working - at the Los Alamos National Laboratory's Advanced Computing - Laboratory. Between them, these scientists have written and tuned - large applications on almost every commercial and experimental - supercomputer built in the last two decades. As the technology - used in those machines migrates down into departmental computing - servers and desktop multiprocessors, &pooma; is a vehicle for its - designers' experience to migrate as well. In particular, - &pooma;'s authors understand how to get good performance out of - modern architectures, with their many processors and multi-level - memory hierarchies, and how to handle the subtly complex problems - that arise in real-world applications. -
- -
- -
- - &tutorial-chapter; &concepts-chapter; --- 278,288 ---- Programming with &pooma; ! ! &introductory-chapter; &tutorial-chapter; &concepts-chapter; *************** *** 356,361 **** --- 291,305 ---- Writing Sequential Programs + FIXME: Explain the chapter's purpose. + HERE + + FIXME: Explain the format of each section. + HERE + + FIXME: Explain the order of the sections. + HERE + Proposed order. Basically follow the order in the proposed reference section. *************** *** 373,380 **** Include views of containers in the appropriate sections. - - &c;: A Reference Manual uses this structure for &c; libraries: --- 317,322 ---- *************** *** 524,555 **** ! &pooma; can reorder computations to permit more efficient ! computation. When running a sequential program, reordering may ! permit omission of unneeded computations. For example, if only ! values from a particular field are printed, only computations ! involving the field and containers dependent on it need to occur. ! When running a distributed program, reordering may permit ! computation and communication among processors to overlap. &pooma; ! automatically tracks dependences between data-parallel expressions, ! ensuring correct ordering. It does not track statements accessing ! particular &array; and &field; values so the programmer must ! precede these statements with calls to ! Pooma::blockAndEvaluate(). Each call forces ! the executable to wait until all computation has completed. Thus, ! the desired values are known to be available. In practice, some ! calls to Pooma::blockAndEvaluate may not be ! necessary, but omitting them requires knowledge of &pooma;'s ! dependence computations, so the &author; recommends calling ! Pooma::blockAndEvaluate before each access to ! a particular value in an &array; or &field;. Omitting a necessary ! call may lead to a race condition. See for ! instructions how to diagnose and eliminate these race conditions. UNFINISHED
&benchmark; Programs --- 466,663 ---- + +
+ Beginning and Ending &pooma; Programs + + Every &pooma; program must begin with a call to + initialize and end with a call to + finalize. These functions respectively + prepare and shut down &pooma;'s run-time structures. + +
+ Files + + + #include "Pooma/Pooma.h" // or "Pooma/Arrays.h" or "Pooma/Fields.h" or ... + +
+ +
+ Declarations + + + + bool Pooma::initialize + + int &argc, + char ** &argv, + bool initRTS = true, + bool getCLArgsArch = true, + bool initArch = true + + + + + bool Pooma::initialize + + Pooma::Options &opts, + bool initRTS = true, + bool initArch = true + + + + + bool Pooma::finalize + + + + + bool Pooma::finalize + + bool quitRTS, + bool quitArch + + + +
+ +
+     Description
+ 
+       Before its use, the &poomatoolkit; must be initialized by a
+       call to initialize.  This usually occurs in
+       the main function.  The first form removes
+       and processes any &pooma;-specific arguments from the
+       command-line arguments argv and
+       argc.   describes these options.
+       The third, fourth, and fifth arguments all have a default value
+       of &true;.  If initRTS is
+       &true;, the run-time system is initialized; e.g., the contexts
+       are prepared for use.  If getCLArgsArch is &true;,
+       architecture-specific command-line arguments are removed from
+       argv and argc.
+       Architecture-specific initialization occurs if
+       initArch is &true;.  An architecture is specified
+       by a hardware interface, e.g., processor type, but frequently is
+       also associated with an operating system or compiler.  For
+       example, Metrowerks for the Macintosh has an
+       architecture-specific initialization.  The function always
+       returns &true;.
+ 
+       initialize's alternative form
+       assumes the &pooma;-specific and architecture-specific
+       command-line arguments have already been removed from
+       argv and argc and stored in
+       opts.  Its other two
+       parameters have the same meaning, and the two functions'
+       semantics are otherwise the same.
+ 
+       After its use, the &poomatoolkit; should be shut down using
+       a call to finalize.  This usually occurs in
+       the main function.  The former, and more
+       frequently used, form first prints any statistics and turns off
+       all default &pooma; streams.  Then it shuts down the run-time
+       system if it was previously initialized and then shuts down
+       architecture-specific objects if they were previously
+       initialized.  The latter form provides explicit control over
+       whether the run-time system (quitRTS)
+       and architecture-specific objects (quitArch)
+       are shut down.  Both functions always return &true;.
+ 
+       Including almost any &pooma; header file, rather than just
+       Pooma/Pooma.h, suffices
+       since most other &pooma; header files include it.
+ 
+ +
+ Example Program + + Since every &pooma; program must call + initialize and + finalize, the simplest &pooma; program also + must call them. This program also illustrates their usual + use. + + &initialize-finalize; +
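A minimal sketch of such a program, using only the declarations given in this
section, might look like the following; the printed message is illustrative.

  #include <iostream>
  #include "Pooma/Pooma.h"

  int main(int argc, char *argv[])
  {
    // Strip and act on POOMA-specific (and architecture-specific)
    // command-line options, and start the run-time system.
    Pooma::initialize(argc, argv);

    std::cout << "POOMA is initialized." << std::endl;  // illustrative work

    // Shut down POOMA's run-time structures before exiting.
    Pooma::finalize();
    return 0;
  }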
+ +
+ +
+ &pooma; Command-line Options + + Every &pooma; program accepts a set of &pooma;-specific + command-line options to set values at run-time. + +
+ Options Summary + + + + &dashdash;pooma-info + + + HERE Who uses this? + + + + ! FIXME: Be sure to list default values. ! ! ! !
! ! ! ! QUESTION: Should I defer documenting &options; to the ! reference manual, instead just listing commonly used options in ! the previous section? ! ! UNFINISHED ! !
! !
! TMP: Place these somewhere. + &pooma; can reorder computations to permit more efficient + computation. When running a sequential program, reordering may + permit omission of unneeded computations. For example, if only + values from a particular field are printed, only computations + involving the field and containers dependent on it need to occur. + When running a distributed program, reordering may permit + computation and communication among processors to overlap. + &pooma; automatically tracks dependences between data-parallel + expressions, ensuring correct ordering. It does not track + statements accessing particular &array; and &field; values so the + programmer must precede these statements with calls to + Pooma::blockAndEvaluate(). Each call forces + the executable to wait until all computation has completed. Thus, + the desired values are known to be available. In practice, some + calls to Pooma::blockAndEvaluate may not be + necessary, but omitting them requires knowledge of &pooma;'s + dependence computations, so the &author; recommends calling + Pooma::blockAndEvaluate before each access to + a particular value in an &array; or &field;. Omitting a necessary + call may lead to a race condition. See for + instructions how to diagnose and eliminate these race + conditions. + + Where talk about various &pooma; streams? + UNFINISHED +
+ +
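A short sketch of the blockAndEvaluate idiom described above follows.  The
Array constructor form and the enclosing scope are illustrative assumptions
rather than prescribed usage; only the ordering (data-parallel statements, then
Pooma::blockAndEvaluate(), then access to an individual value) is the point.

  #include <iostream>
  #include "Pooma/Arrays.h"

  int main(int argc, char *argv[])
  {
    Pooma::initialize(argc, argv);
    {
      // Data-parallel statements may be reordered or evaluated lazily.
      Array<2> a(100, 100), b(100, 100);
      a = 1.0;
      b = 2.0 * a + 3.0;

      // Wait until all outstanding computation has completed before
      // reading an individual value.
      Pooma::blockAndEvaluate();
      std::cout << b(7, 9) << std::endl;
    }   // containers are destroyed before finalize
    Pooma::finalize();
    return 0;
  }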
&benchmark; Programs *************** *** 561,573 ****
-
- Using <type>Inform</type>s for Output - - UNFINISHED -
- -
Miscellaneous --- 669,674 ---- *************** *** 604,610 **** &pooma; II's expression trees and expression engines. COMMENT: background.html has some related &pete; material.
--- 705,711 ---- &pooma; II's expression trees and expression engines. COMMENT: background.html has some related &pete; material.
*************** *** 652,659 **** in the input domain: A(i1, i2, ..., iN). The &pooma; multi-dimensional Array concept is similar to ! the Fortran 90 array facility, but extends it in several ! ways. Both &pooma; and Fortran arrays can have up to seven dimensions, and can serve as containers for arbitrary types. Both support the notion of views of a portion of the array, known as array sections in F90. The &pooma; Array concept --- 753,760 ---- in the input domain: A(i1, i2, ..., iN). The &pooma; multi-dimensional Array concept is similar to ! the &fortran; 90 array facility, but extends it in several ! ways. Both &pooma; and &fortran; arrays can have up to seven dimensions, and can serve as containers for arbitrary types. Both support the notion of views of a portion of the array, known as array sections in F90. The &pooma; Array concept *************** *** 664,673 **** depending on the particular type of the Array being indexed. ! Fortran arrays are dense and the elements are arranged according to column-major conventions. Therefore, X(i1,i2) refers to element number i1-1+(i2-1)*numberRowsInA. However, as ! Fig. 1 shows, Fortran-style "Brick" storage is not the only storage format of interest to scientific programmers. For compatibility with C conventions, one might want to use an array featuring dense, row-major storage (a C-style Brick). To save --- 765,774 ---- depending on the particular type of the Array being indexed. ! &fortran; arrays are dense and the elements are arranged according to column-major conventions. Therefore, X(i1,i2) refers to element number i1-1+(i2-1)*numberRowsInA. However, as ! Fig. 1 shows, &fortran;-style "Brick" storage is not the only storage format of interest to scientific programmers. For compatibility with C conventions, one might want to use an array featuring dense, row-major storage (a C-style Brick). To save *************** *** 691,697 **** themselves in the template parameters for the &pooma; Array class. The template ! template <int Dim, class T = double, class EngineTag = Brick> class Array; is a specification for creating a set of classes all named --- 792,798 ---- themselves in the template parameters for the &pooma; Array class. The template ! template <int Dim, class T = double, class EngineTag = Brick> class Array; is a specification for creating a set of classes all named *************** *** 771,777 **** general Engine template whose template parameters are identical to those of Array. Next, the Array template determines the type of scalar arguments (indices) to be used in operator(), the function ! that implements &pooma;'s Fortran-style indexing syntax X(i1,i2): typedef typename Engine_t::Index_t Index_t; --- 872,878 ---- general Engine template whose template parameters are identical to those of Array. Next, the Array template determines the type of scalar arguments (indices) to be used in operator(), the function ! that implements &pooma;'s &fortran;-style indexing syntax X(i1,i2): typedef typename Engine_t::Index_t Index_t; *************** *** 816,822 **** framework. Figure 3 illustrates the "Brick" specialization of the ! Engine template, which implements Fortran-style lookup into a block of memory. First, there is the general Engine template, which is empty as there is no default behavior for an unknown EngineTag. The general template is therefore not a model for the --- 917,923 ---- framework. Figure 3 illustrates the "Brick" specialization of the ! Engine template, which implements &fortran;-style lookup into a block of memory. 
First, there is the general Engine template, which is empty as there is no default behavior for an unknown EngineTag. The general template is therefore not a model for the *************** *** 826,832 **** specialization of the Engine template. Finally, there is the partial specialization of the Engine template. Examining its body, we see the required Index_t typedef and the required operator(), ! which follows the Fortran prescription for generating an offset into the data block based on the row, column, and the number of rows. All of the requirements are met, so the Brick-Engine class is a model of the Engine concept. --- 927,933 ---- specialization of the Engine template. Finally, there is the partial specialization of the Engine template. Examining its body, we see the required Index_t typedef and the required operator(), ! which follows the &fortran; prescription for generating an offset into the data block based on the row, column, and the number of rows. All of the requirements are met, so the Brick-Engine class is a model of the Engine concept. *************** *** 1899,1904 **** --- 2000,2023 ---- TMP: What do we do with these …? Remove this section. +
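A skeletal rendering of the Engine pattern just described (a general Engine
template, an engine tag, and a Brick-like partial specialization providing the
required Index_t and operator()) might look like the following sketch.  It is
an illustration of the concept only, not the actual POOMA source, and it uses
zero-based indexing rather than the Fortran-style one-based offsets quoted
above.

  // Illustration only: a cut-down model of the Engine concept.
  template <int Dim, class T, class EngineTag>
  class Engine;                    // general template: no default behavior

  struct Brick {};                 // engine tag selecting dense storage

  template <class T>
  class Engine<2, T, Brick>        // partial specialization for 2-D bricks
  {
  public:
    typedef int Index_t;           // required index type

    Engine(T *data, int rows) : data_m(data), rows_m(rows) {}

    // Required indexing operator: column-major offset into the
    // underlying block of memory.
    T &operator()(Index_t i1, Index_t i2) const
    { return data_m[i1 + i2 * rows_m]; }

  private:
    T *data_m;                     // the block of memory
    int rows_m;                    // number of rows, used for the offset
  };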
+ introduction.html + + &pooma; was designed and implemented by scientists working + at the Los Alamos National Laboratory's Advanced Computing + Laboratory. Between them, these scientists have written and tuned + large applications on almost every commercial and experimental + supercomputer built in the last two decades. As the technology + used in those machines migrates down into departmental computing + servers and desktop multiprocessors, &pooma; is a vehicle for its + designers' experience to migrate as well. In particular, + &pooma;'s authors understand how to get good performance out of + modern architectures, with their many processors and multi-level + memory hierarchies, and how to handle the subtly complex problems + that arise in real-world applications. +
+ QUESTION: Do we describe the &leaffunctor;s specialized for &array;s in src/Array/Array.h or in the &pete; *************** *** 2879,2898 ****
! TMP: Where do we describe these files? src/Utilities/Conform.h: tag for checking whether terms in expression have conforming domains src/Utilities/DerefIterator.h: DerefIterator<T> and ConstDerefIterator<T> automatically dereference themselves to maintain const --- 2998,3017 ---- ! TMP: Where do we describe these files? src/Utilities/Conform.h: tag for checking whether terms in expression have conforming domains src/Utilities/DerefIterator.h: DerefIterator<T> and ConstDerefIterator<T> automatically dereference themselves to maintain const *************** *** 2901,2910 **** src/Utilities/Observable.h, src/Utilities/Observer.h, and src/Utilities/ObserverEvent.h: Observable<T>, SingleObserveable<T>, Observer<T>, and ObserverEvent --- 3020,3029 ---- src/Utilities/Observable.h, src/Utilities/Observer.h, and src/Utilities/ObserverEvent.h: Observable<T>, SingleObserveable<T>, Observer<T>, and ObserverEvent *************** *** 2915,2920 **** --- 3034,3053 ---- + + + TMP: Items to Discuss in Reference Manual + + + + Discuss &options; and related material. Add developer + command-line options listed in Utilities/Options.cmpl.cpp and also + possibly &dashdash;pooma-threads + n. + + +
*************** *** 2946,2952 **** Section 3, "Sample Applications" of papers/SiamOO98_paper.ps describes porting a ! particle program written using High-Performance Fortran to &pooma; and presumably why particles were added to &pooma;. It also describes MC++, a Monte Carlo neutron transport code. --- 3079,3085 ---- Section 3, "Sample Applications" of papers/SiamOO98_paper.ps describes porting a ! particle program written using High-Performance &fortran; to &pooma; and presumably why particles were added to &pooma;. It also describes MC++, a Monte Carlo neutron transport code. *************** *** 3332,3338 **** QUESTION: How do &pooma; parallel concepts compare with ! Fortran D or high-performance Fortran FINISH CITE: {koelbel94:_high_perfor_fortr_handb}? --- 3465,3471 ---- QUESTION: How do &pooma; parallel concepts compare with ! &fortran; D or high-performance &fortran; FINISH CITE: {koelbel94:_high_perfor_fortr_handb}? *************** *** 3500,3505 **** --- 3633,3856 ---- Using MPI Portable Parallel Programming with the Message-Passing Interface second edition + + + + pooma95 + + + JohnV. W.Reynders + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + PaulJ.Hinker + + Dakota Software Systems, Inc. +
Rapid CitySD
+
+
+ + JulianC.Cummings + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + SusanR.Atlas + + Parallel Solutions, Inc. +
Santa FeNM
+
+
+ + SubhankarBanerjee + + New Mexico State University +
Las CrucesNM
+
+
+ + WilliamF.Humphrey + + University of Illinois at Urbana-Champaign +
Urbana-ChampaignIL
+
+
+ + SteveR.Karmesin + + California Institute of Technology +
PasadenaCA
+
+
+ + KatarzynaKeahey + + Indiana University +
BloomingtonIN
+
+
+ + MarydellTholburn + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+
+ &pooma; + A Framework for Scientific Simulation on Parallel Architectures + unpublished +
+ + + pooma-sc95 + + + SusanAtlas + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + SubhankarBanerjee + + New Mexico State University +
Las CrucesNM
+
+
+ + JulianC.Cummings + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + PaulJ.Hinker + + Advanced Computing Laboratory +
Los AlamosNM
+
+
+ + M.Srikant + + New Mexico State University +
Las CrucesNM
+
+
+ + JohnV. W.Reynders + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + MarydellTholburn + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+
+ &pooma; + A High Performance Distributed Simulation Environment for + Scientific Applications + +
+ + + pooma-siam98 + + + JulianC.Cummings + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + JamesA.Crotinger + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + ScottW.Haney + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + WilliamF.Humphrey + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + SteveR.Karmesin + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + JohnV. W.Reynders + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + StephenA.Smith + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+ + TimothyJ.Williams + + Los Alamos National Laboratory +
Los AlamosNM
+
+
+
+     Rapid Application Development and Enhanced Code
+     Interoperability using the &pooma; Framework
+ 
+ + + + pete-99 + + + ScottHaney + + + JamesCrotinger + + + SteveKarmesin + + + StephenSmith + + + Easy Expression Templates Using &pete;: The Portable + Expression Template Engine + Index: tutorial.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v retrieving revision 1.2 diff -c -p -r1.2 tutorial.xml *** tutorial.xml 2001/12/14 04:18:13 1.2 --- tutorial.xml 2001/12/17 16:56:55 *************** *** 36,42 **** a data-parallel &pooma; &field; implementation for ! multi-processor execution. --- 36,42 ---- a data-parallel &pooma; &field; implementation for ! multiprocessor execution. *************** *** 94,100 **** zero. Before presenting various implementations of %doof2d;, we ! explain how to install the &poomaToolkit;. REMOVE: &doof2d; algorithm and code is illustrated in Section 4.1 of --- 94,100 ---- zero. Before presenting various implementations of %doof2d;, we ! explain how to install the &poomatoolkit;. REMOVE: &doof2d; algorithm and code is illustrated in Section 4.1 of *************** *** 111,117 **** LINUXgcc.conf is not available. In this section, we describe how to obtain, build, and ! install the &poomaToolkit;. We focus on installing under the Unix operating system. Instructions for installing on computers running Microsoft Windows or MacOS, as well as more extensive instructions for Unix, appear in LINUXgcc.conf is not available. In this section, we describe how to obtain, build, and ! install the &poomatoolkit;. We focus on installing under the Unix operating system. Instructions for installing on computers running Microsoft Windows or MacOS, as well as more extensive instructions for Unix, appear in &dashdash;arch option is the name of the corresponding configuration file, omitting its .conf suffix. The ! &dashdash;opt indicates the &poomaToolkit; will contain optimized source code, which makes the code run more quickly but may impede debugging. Alternatively, the &dashdash;debug option supports debugging. The --- 142,148 ---- &dashdash;arch option is the name of the corresponding configuration file, omitting its .conf suffix. The ! &dashdash;opt indicates the &poomatoolkit; will contain optimized source code, which makes the code run more quickly but may impede debugging. Alternatively, the &dashdash;debug option supports debugging. The *************** *** 178,184 ****
Hand-Coded Implementation ! Before implementing &doof2d; using the &poomaToolkit;, we present a hand-coded implementation of &doof2d;. See . After querying the user for the number of averagings, the arrays' memory is --- 178,184 ----
Hand-Coded Implementation ! Before implementing &doof2d; using the &poomatoolkit;, we present a hand-coded implementation of &doof2d;. See . After querying the user for the number of averagings, the arrays' memory is *************** *** 290,296 ****
Element-wise &array; Implementation ! The simplest way to use the &poomaToolkit; is to use the &pooma; &array; class instead of &c; arrays. &array;s automatically handle memory allocation and deallocation, support a wider variety of assignments, and can be used in expressions. --- 290,296 ----
Element-wise &array; Implementation ! The simplest way to use the &poomatoolkit; is to use the &pooma; &array; class instead of &c; arrays. &array;s automatically handle memory allocation and deallocation, support a wider variety of assignments, and can be used in expressions. *************** *** 309,315 **** class="headerfile">Pooma/Arrays.h must be included. ! The &poomaToolkit; structures must be constructed before their use. --- 309,315 ---- class="headerfile">Pooma/Arrays.h must be included. ! The &poomatoolkit; structures must be constructed before their use. *************** *** 347,359 **** memory leaks. ! The &poomaToolkit; structures must be destructed after their use. ! We describe the use of &array; and the &poomaToolkit; in . &array;s, declared in the Pooma/Arrays.h, are first-class --- 347,359 ---- memory leaks. ! The &poomatoolkit; structures must be destructed after their use. ! We describe the use of &array; and the &poomatoolkit; in . &array;s, declared in the Pooma/Arrays.h, are first-class *************** *** 391,398 **** elements. This is possible because the array knows the extent of its domain. ! Any program using the &poomaToolkit; must initialize the ! toolkit's data structures using Pooma::initialize(argc,argv). This extracts &pooma;-specific command-line options from the command-line arguments in argv and initializes --- 391,398 ---- elements. This is possible because the array knows the extent of its domain. ! Any program using the &poomatoolkit; must initialize the ! &toolkit;'s data structures using Pooma::initialize(argc,argv). This extracts &pooma;-specific command-line options from the command-line arguments in argv and initializes *************** *** 408,414 **** &pooma; supports data-parallel &array; accesses. Many algorithms are more easily expressed using data-parallel ! expressions. Also, the &poomaToolkit; might be able to reorder the data-parallel computations to be more efficient or distribute them among various processors. In this section, we concentrate the differences between the data-parallel implementation of --- 408,414 ---- &pooma; supports data-parallel &array; accesses. Many algorithms are more easily expressed using data-parallel ! expressions. Also, the &poomatoolkit; might be able to reorder the data-parallel computations to be more efficient or distribute them among various processors. In this section, we concentrate the differences between the data-parallel implementation of *************** *** 618,624 **** indicates a particular dimension. Index parameters i and j are in dimension 0 and 1. upperExtent serves an ! analogous purpose. The &poomaToolkit; uses these functions when distribution computation among various processors, but it does not use these functions to ensure nonexistent &array; values are not accessed. Caveat stencil user! --- 618,624 ---- indicates a particular dimension. Index parameters i and j are in dimension 0 and 1. upperExtent serves an ! analogous purpose. The &poomatoolkit; uses these functions when distribution computation among various processors, but it does not use these functions to ensure nonexistent &array; values are not accessed. Caveat stencil user! *************** *** 632,638 **** To convert a program designed for uniprocessor execution to a program designed for multiprocessor execution, the programmer need only specify how each container's domain should be split into ! patches. 
The &poomaToolkit; automatically distributes the data among the available processors and handles any required communication among processors. --- 632,638 ---- To convert a program designed for uniprocessor execution to a program designed for multiprocessor execution, the programmer need only specify how each container's domain should be split into ! patches. The &poomatoolkit; automatically distributes the data among the available processors and handles any required communication among processors. *************** *** 746,761 **** configured for the particular run-time system. See . ! A layout combines patches with ! contexts so the program can be executed. If &distributedtag; is ! specified, the patches are distributed among the available ! contexts. If &replicatedtag; is specified, each set of patches is ! replicated among each context. Regardless, the containers' ! domains are now distributed among the contexts so the program can ! run. When a patch needs data from another patch, the &pooma; ! toolkit sends messages to the desired patch uses a message-passing ! library. All such communication is automatically performed by the ! toolkit with no need for programmer or user input. FIXME: The two previous paragraphs demonstrate confusion between run-time system and message-passing --- 746,761 ---- configured for the particular run-time system. See . ! A layout combines patches with contexts ! so the program can be executed. If &distributedtag; is specified, ! the patches are distributed among the available contexts. If ! &replicatedtag; is specified, each set of patches is replicated ! among each context. Regardless, the containers' domains are now ! distributed among the contexts so the program can run. When a patch ! needs data from another patch, the &poomatoolkit; sends messages to ! the desired patch uses a message-passing library. All such ! communication is automatically performed by the &toolkit; with no need ! for programmer or user input. FIXME: The two previous paragraphs demonstrate confusion between run-time system and message-passing *************** *** 803,811 **** > or MultiPatch<UniformTag, Remote<CompressibleBrick> > engines. ! The computations for a distributed implementation are ! exactly the same as for a sequential implementation. The &pooma; ! Toolkit and a message-passing library automatically perform all computation. Input and output for distributed programs is different than --- 803,811 ---- > or MultiPatch<UniformTag, Remote<CompressibleBrick> > engines. ! The computations for a distributed implementation are exactly ! the same as for a sequential implementation. The &poomatoolkit; and ! a message-passing library automatically perform all computation. Input and output for distributed programs is different than *************** *** 988,994 **** The mesh and centering declarations are the same for ! uniprocessor and multi-processor implementations. The MultiPatch &engine; distributes requests --- 988,994 ---- The mesh and centering declarations are the same for ! uniprocessor and multiprocessor implementations. The MultiPatch &engine; distributes requests *************** *** 1032,1038 **** The computation for uniprocessor or distributed ! implementations remains the same. The &pooma; toolkit automatically handles all communication necessary to ensure up-to-date values are available when needed. --- 1032,1038 ---- The computation for uniprocessor or distributed ! implementations remains the same. 
The &poomatoolkit; automatically handles all communication necessary to ensure up-to-date values are available when needed. Index: figures/introduction.mp =================================================================== RCS file: introduction.mp diff -N introduction.mp *** /dev/null Fri Mar 23 21:37:44 2001 --- introduction.mp Mon Dec 17 09:56:55 2001 *************** *** 0 **** --- 1,194 ---- + %% Oldham, Jeffrey D. + %% 2001Dec14 + %% Pooma + + %% Illustrations for Introduction + + %% Assumes TEX=latex. + + input boxes; + + verbatimtex + \documentclass[10pt]{article} + \usepackage{amsmath} + \input{macros.ltx} + \begin{document} + etex + + %% Relationship between science, computational science, and Pooma. + beginfig(101) + numeric unit; unit = 0.8cm; + numeric horizSpace; horizSpace = 8unit; + numeric vertSpace; vertSpace = unit; + numeric nuBoxes; % number of boxes + + % Ensure a list of boxes all have the same width. + % input <- suffixes for the boxes; + % output-> all boxes have the same width (maximum picture width + defaultdx) + vardef samewidth(suffix $)(text t) = + save p_; pair p_; + p_ = maxWidthAndHeight($)(t); + numericSetWidth(xpart(p_)+2defaultdx)($)(t); + enddef; + + % Ensure a list of boxes all have the same height. + % input <- suffixes for the boxes; + % output-> all boxes have the same height (maximum picture height + defaultdy) + vardef sameheight(suffix $)(text t) = + save p_; pair p_; + p_ = maxWidthAndHeight($)(t); + numericSetWidth(ypart(p_)+2defaultdy)($)(t); + enddef; + + % Given a list of boxes, determine the maximum picture width and + % maximum picture height. + % input <- suffixes for the boxes + % output-> pair of maximum picture width and height + vardef maxWidthAndHeight(suffix f)(text t) = + save w_, h_; numeric w_, h_; + w_ = xpart((urcorner pic_.f - llcorner pic_.f)); + h_ = ypart((urcorner pic_.f - llcorner pic_.f)); + forsuffixes uu = t: + if xpart((urcorner pic_.uu - llcorner pic_.uu)) > w_ : + w_ := xpart((urcorner pic_.uu - llcorner pic_.uu)); + fi + if ypart((urcorner pic_.uu - llcorner pic_.uu)) > h_ : + h_ := ypart((urcorner pic_.uu - llcorner pic_.uu)); + fi + endfor + (w_, h_) + enddef; + + % Given a width, ensure a box has the given width. + % input <- box width + % suffix for the one box + % output-> the box has the given width by setting its .dx + vardef numericSetWidthOne(expr width)(suffix f) = + f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f)); + enddef; + + % Given a width, ensure all boxes have the given width. + % input <- box width + % suffixes for the boxes + % output-> all boxes have the given width by setting their .dx + vardef numericSetWidth(expr width)(suffix f)(text t) = + f.dx = 0.5(width - xpart(urcorner pic_.f - llcorner pic_.f)); + forsuffixes $ = t: + $.dx = 0.5(width - xpart(urcorner pic_.$ - llcorner pic_.$)); + endfor + enddef; + + % Given a height, ensure all boxes have the given height. + % input <- box height + % suffixes for the boxes + % output-> all boxes have the given height by setting their .dx + vardef numericSetHeight(expr height)(suffix f)(text t) = + f.dy = 0.5(height - ypart(urcorner pic_.f - llcorner pic_.f)); + forsuffixes $ = t: + $.dy = 0.5(height - ypart(urcorner pic_.$ - llcorner pic_.$)); + endfor + enddef; + + % Ensure a list of boxes and circles all to have the same width, height, + % and diameter. 
+ % input <- suffixes for the boxes and circles + % output-> all boxes have .dx and .dy set so they have the same width, + % height, and radius + % The boxes are squares and the circles are circular, not oval. + vardef sameWidthAndHeight(suffix f)(text t) = + save p_; pair p_; + p_ = maxWidthAndHeight(f)(t); + if (xpart(p_)+2defaultdx >= ypart(p_)+2defaultdy): + numericSetWidth(xpart(p_)+2defaultdx)(f)(t); + numericSetHeight(xpart(p_)+2defaultdx)(f)(t); + else: + numericSetWidth(ypart(p_)+2defaultdy)(f)(t); + numericSetHeight(ypart(p_)+2defaultdy)(f)(t); + fi + enddef; + + % Ensure a list of boxes and circles all to have the same width and + % the same height. Unlike sameWidthAndHeight, the width and height + % can differ. + % input <- suffixes for the boxes and circles + % output-> all boxes have .dx and .dy set so they have the same width, + % height, and radius + % The boxes are squares and the circles are circular, not oval. + vardef sameWidthSameHeight(suffix f)(text t) = + save p_; pair p_; + p_ = maxWidthAndHeight(f)(t); + numericSetWidth(xpart(p_)+2defaultdx)(f)(t); + numericSetHeight(ypart(p_)+2defaultdy)(f)(t); + enddef; + + % Create the boxes. + boxit.b0(btex \textsl{science / math} etex); + boxit.b1(btex \textsl{algorithms} etex); + boxit.b2(btex \textsl{engineering} etex); + boxit.b3(btex \strut $\real^{\dimension} \maps \text{values}$ etex); + boxit.b4(btex \strut $\text{discrete space} \maps \text{values}$ etex); + boxit.b5(btex \strut $(\text{layout}, \text{engine}) \maps \text{values}$ etex); + boxit.b6(btex \strut linear algebra etex); + boxit.b7(btex \strut $\naturalNus^{\dimension} \maps \text{values}$ etex); + boxit.b8(btex etex); + nuBoxes = 8; + boxit.b9(btex \textsl{implementation} etex); + sameWidthSameHeight(b3,b4,b5,b6,b7,b8); + for t = 0 upto nuBoxes+1: + fixsize(b[t]); + endfor + + % Position the boxes. + b0.c = origin; + for t = 0 step 3 until nuBoxes: + b[t+2].c - b[t+1].c = b[t+1].c - b[t].c = (horizSpace, 0); + endfor + for t = 0 step 3 until nuBoxes-3: + b[t].s - b[t+3].n = (0, vertSpace); + endfor + b9.c = 0.5[b1.c,b2.c]; + + % Draw the boxes. + for t = 0 upto nuBoxes+1: + if unknown(b[t].c): + show t; + show b[t].c; + fi + endfor + + for t = 0 upto 2: + drawunboxed(b[t]); + endfor + for t = 3 upto nuBoxes-1: + drawboxed(b[t]); + endfor + drawunboxed(b9); + + % Label the boxes. + label.top(btex continuous field etex, b3.n); + label.top(btex discrete field etex, b4.n); + label.top(btex \pooma\ container etex, b5.n); + label.top(btex mathematical array etex, b7.n); + % label.top(btex custom implementation etex, b8.n); + + % Draw the arrows. 
+ vardef drawAndLabelArrow(expr start, stop, txt, parr) =
+   path p; p = start -- stop;
+   drawarrow p;
+   label.top(txt rotated angle (direction parr of p), point parr of p);
+ enddef;
+ vardef drawAndLabelArrowDashed(expr start, stop, txt, parr) =
+   path p; p = start -- stop;
+   drawarrow p dashed evenly;
+   label.top(txt rotated angle (direction parr of p), point parr of p);
+ enddef;
+ % drawAndLabelArrowDashed(b4.e, b8.w, btex etex, 0.5);
+ % drawAndLabelArrowDashed(b7.e, b8.w, btex etex, 0.5);
+ drawAndLabelArrow(b3.e, b4.w, btex discretization etex, 0.5);
+ drawAndLabelArrow(b4.e, b5.w, btex \type{Field} etex, 0.5);
+ drawAndLabelArrow(b6.e, b7.w, btex \begin{tabular}{c} numerical\\ analysis \end{tabular} etex, 0.5);
+ drawAndLabelArrow(b7.e, b5.w, btex \type{Array} etex, 0.3);
+ 
+ endfig;
+ 
+ bye

From oldham at codesourcery.com  Tue Dec 18 00:50:38 2001
From: oldham at codesourcery.com (Jeffrey Oldham)
Date: Mon, 17 Dec 2001 16:50:38 -0800
Subject: Patch: Speed Pooma Evaluations
Message-ID: <20011217165038.A3289@codesourcery.com>

1. This patch permits the KCC optimizer to move most of the non-data
array accesses out of the inner loop.  By using local variables rather
than references, the optimizer can determine that the inner loops'
assignments do not change the local objects' data members, e.g.,
strides_m.  Thus, these values need not be reloaded inside the inner
loops, so the code is similar to hand-coded C loops.  In effect, the
compiler can determine which data members are loop-invariant and which
are not.

Example execution times (seconds) for Linux/KCC's Doof2d with N=1000:

                    C    Brick   FieldBrick   stencil Brick
  before change   6.20    9.89      13.29          7.29
  after change    6.21    7.48       7.44          7.16

2. The idea can be implemented in at least two different ways.
Stephen Smith suggested the idea for the attached patch.  It relies on
two assumptions: 1) cheap, shallow copies and 2) copies of all
non-pointer data members.

3. In Mark Mitchell's suggested implementation, container and engine
data members that are invariant during loop iterations are explicitly
stored in LoopInvariant_t structures.  These constant structures are
constructed before the loop and passed to the reads and writes within
the loop.  These operations use the constant structures rather than
the containers' and engines' data members.  Thus, the optimizer can
determine that the uses of the constant data members can be hoisted
out of the loops.  Although this implementation can deliver cleaner
code since we, as smart humans, might be able to determine better
code, it requires much more programmer time and code.  We can always
implement the idea if needed.  A patch for part of the work is
attached.

4. Two other sets of loops could be sped up using a similar technique
but were not.

a. Evaluator/LoopApply.h uses a function object.  Since we do not know
whether we can copy the object, much less copy back into the original,
I do not know how to transform the loops.

b. Engine/RemoteEngine.h's EngineBlockSerialize could be modified, but
I could not find any user code to confirm the transformation's
correctness.

Thanks to Mark Mitchell for finding the idea and creating the
technique.  Thanks to Stephen Smith for finding the slicker
implementation.

2001-11-02  Jeffrey D. Oldham

	* InlineEvaluator.h (KernelEvaluator::evaluate() for Dim=1..7):
	Use local variables for the left-hand side and the right-hand
	side.  This permits the KCC optimizer to move loop-invariant code
	out of the innermost loop, significantly reducing running times.
* ReductionEvaluator.h (ReductionEvaluator::evaluate() for Dim=1..7: Use local variables for the expression and the accumulator variable. This permits the KCC optimizer to move loop-invariant code out of the innermost loop, significantly reducing running times. Tested on Linux/KCC by compiling Doof2d and running all the array regression tests. Only the inner loops of Doof2d and src/Evaluator/tests/ReductionTest1 were investigated. (Doof2d compiled 17Dec using LINUXgcc --opt.) Approved by Stephen Smith Applied to mainline Thanks, Jeffrey D. Oldham oldham at codesourcery.com -------------- next part -------------- Index: InlineEvaluator.h =================================================================== RCS file: /home/pooma/Repository/r2/src/Evaluator/InlineEvaluator.h,v retrieving revision 1.26 diff -c -p -r1.26 InlineEvaluator.h *** InlineEvaluator.h 2001/04/13 02:15:06 1.26 --- InlineEvaluator.h 2001/11/03 00:15:14 *************** struct KernelEvaluator *** 157,165 **** { CTAssert(Domain::unitStride); PAssert(domain[0].first() == 0); int e0 = domain[0].length(); for (int i0=0; i0 --- 157,167 ---- { CTAssert(Domain::unitStride); PAssert(domain[0].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); for (int i0=0; i0 *************** struct KernelEvaluator *** 169,179 **** CTAssert(Domain::unitStride); PAssert(domain[0].first() == 0); PAssert(domain[1].first() == 0); int e0 = domain[0].length(); int e1 = domain[1].length(); for (int i1=0; i1 --- 171,183 ---- CTAssert(Domain::unitStride); PAssert(domain[0].first() == 0); PAssert(domain[1].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); int e1 = domain[1].length(); for (int i1=0; i1 *************** struct KernelEvaluator *** 184,196 **** PAssert(domain[0].first() == 0); PAssert(domain[1].first() == 0); PAssert(domain[2].first() == 0); int e0 = domain[0].length(); int e1 = domain[1].length(); int e2 = domain[2].length(); for (int i2=0; i2 --- 188,202 ---- PAssert(domain[0].first() == 0); PAssert(domain[1].first() == 0); PAssert(domain[2].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); int e1 = domain[1].length(); int e2 = domain[2].length(); for (int i2=0; i2 *************** struct KernelEvaluator *** 202,207 **** --- 208,215 ---- PAssert(domain[1].first() == 0); PAssert(domain[2].first() == 0); PAssert(domain[3].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); int e1 = domain[1].length(); int e2 = domain[2].length(); *************** struct KernelEvaluator *** 210,216 **** for (int i2=0; i2 --- 218,224 ---- for (int i2=0; i2 *************** struct KernelEvaluator *** 223,228 **** --- 231,238 ---- PAssert(domain[2].first() == 0); PAssert(domain[3].first() == 0); PAssert(domain[4].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); int e1 = domain[1].length(); int e2 = domain[2].length(); *************** struct KernelEvaluator *** 233,239 **** for (int i2=0; i2 --- 243,249 ---- for (int i2=0; i2 *************** struct KernelEvaluator *** 247,252 **** --- 257,264 ---- PAssert(domain[3].first() == 0); PAssert(domain[4].first() == 0); PAssert(domain[5].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); int e1 = domain[1].length(); int e2 = domain[2].length(); *************** struct KernelEvaluator *** 259,266 **** for (int i2=0; i2 --- 271,278 ---- for (int i2=0; i2 *************** struct KernelEvaluator *** 275,280 **** --- 287,294 
---- PAssert(domain[4].first() == 0); PAssert(domain[5].first() == 0); PAssert(domain[6].first() == 0); + LHS localLHS(lhs); + RHS localRHS(rhs); int e0 = domain[0].length(); int e1 = domain[1].length(); int e2 = domain[2].length(); *************** struct KernelEvaluator *** 289,296 **** for (int i2=0; i2 --- 124,137 ---- { CTAssert(Domain::unitStride); PAssert(domain[0].first() == 0); + Expr localExpr(e); int e0 = domain[0].length(); ! T answer(localExpr.read(0)); for (int i0 = 1; i0 < e0; ++i0) ! op(answer, localExpr.read(i0)); ! ! ret = answer; } template *************** struct ReductionEvaluator --- 159,168 ---- else i00 = 0; for (int i0 = i00; i0 < e0; ++i0) ! op(answer, localExpr.read(i0, i1)); } + + ret = answer; } template *************** struct ReductionEvaluator --- 193,202 ---- else i00 = 0; for (int i0 = i00; i0 < e0; ++i0) ! op(answer, localExpr.read(i0, i1, i2)); } + + ret = answer; } template *************** struct ReductionEvaluator --- 230,239 ---- else i00 = 0; for (int i0 = i00; i0 < e0; ++i0) ! op(answer, localExpr.read(i0, i1, i2, i3)); } + + ret = answer; } template *************** struct ReductionEvaluator --- 270,279 ---- else i00 = 0; for (int i0 = i00; i0 < e0; ++i0) ! op(answer, localExpr.read(i0, i1, i2, i3, i4)); } + + ret = answer; } template *************** struct ReductionEvaluator --- 313,322 ---- else i00 = 0; for (int i0 = i00; i0 < e0; ++i0) ! op(answer, localExpr.read(i0, i1, i2, i3, i4, i5)); } + + ret = answer; } template *************** struct ReductionEvaluator
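
To make the transformation concrete, here is a minimal sketch of the
local-copy technique described in the message above.  It is not Pooma
code: View, data_m, stride_m, assignByReference, assignByLocalCopy, and
sumByLocalCopy are hypothetical names standing in for an engine or
container with a data pointer and loop-invariant members.  The sketch
contrasts a kernel that works through references with one that makes
shallow local copies before the loop, plus the analogous reduction form.

// A minimal sketch (not Pooma code) of the local-copy transformation.
// View is a hypothetical stand-in for an engine/container view: it holds
// a data pointer plus a loop-invariant member (stride_m), analogous to
// strides_m mentioned in the message.
#include <cstddef>

struct View                      // hypothetical engine/container stand-in
{
  double *data_m;                // pointer to externally owned storage
  std::ptrdiff_t stride_m;       // loop-invariant member

  double  read(int i) const { return data_m[i * stride_m]; }
  double &operator()(int i) { return data_m[i * stride_m]; }
};

// Working only through references: the optimizer may not prove that
// stride_m is unchanged by the stores inside the loop, so it can be
// reloaded on every iteration.
void assignByReference(View &lhs, const View &rhs, int n)
{
  for (int i = 0; i < n; ++i)
    lhs(i) = rhs.read(i);
}

// The patch's approach: make shallow local copies once, outside the loop.
// This relies on the two assumptions stated in the message: copies are
// cheap and shallow, and they duplicate all non-pointer data members.
void assignByLocalCopy(View &lhs, const View &rhs, int n)
{
  View localLHS(lhs);            // shallow copy: same data pointer,
  View localRHS(rhs);            // private copy of the stride
  for (int i = 0; i < n; ++i)
    localLHS(i) = localRHS.read(i);   // stride provably loop-invariant
}

// The same idea applied to a reduction: copy the expression object and
// accumulate into a local variable, storing the result once at the end.
// Assumes n >= 1, as the original reduction reads element 0 first.
double sumByLocalCopy(const View &expr, int n)
{
  View localExpr(expr);
  double answer = localExpr.read(0);
  for (int i = 1; i < n; ++i)
    answer += localExpr.read(i);
  return answer;
}

Because the copies are shallow, localLHS and localRHS refer to the same
storage as lhs and rhs, so the loop's behavior is unchanged; only the
compiler's ability to hoist the loop-invariant member accesses improves.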
+ In the translation from theoretical science and math to
+ computational science and math to computer programs, &pooma;
+ containers ease the conversion of algorithms into computer
+ programs.
+