Explanation of blockAndEvaluate()

Jeffrey Oldham oldham at codesourcery.com
Tue Dec 4 20:43:13 UTC 2001


Mark requested that Stephen Smith's explanation be posted to the
pooma-dev mailing list so it is archived for posterity.

Jeffrey's complaint:
> When I run the attached Pooma program (from examples/Manual/Doof2d/)
> for one-processor, it works fine, returning 55.0221 for 4 averagings
> and an array size of 20.  When I run it with Pooma configured with
> --messaging and use the MM Shared Memory Library, it returns 0.  Just
> before the blockAndEvaluate() call, the "b" array has the proper value
> but afterwards it has changed to zero.  Why?  Why is it ever dangerous
> to call blockAndEvaluate()?  How do I explain when to call
> blockAndEvaluate()?

The program is attached.

Stephen Smith's (stephens at proximation.com) reply:
> This code is missing a blockAndEvaluate(); it should look
> like:
>  
> a = b = 0;
> Pooma::blockAndEvaluate();
> b(n/2,n/2) = 1000.0;
> 
> Currently the default is non-blocking evaluation, so any code that
> mixes data-parallel expressions with serial element access is
> dangerous, which may not be a good thing.  To ensure correctness,
> you need either to run with --poomaBlockingExpressions or to add
> blockAndEvaluate() in all the necessary places.
> 
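The following toy C++ model of deferred evaluation shows why the final call zeroes the central value. It is a sketch under stated assumptions, not POOMA's implementation. ToyRuntime, ToyArray, assignAll and centralValue are hypothetical names. The model assumes only two things: a whole-array assignment is queued until blockAndEvaluate(), and serial element access through operator() happens immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model of deferred evaluation (hypothetical, not POOMA code):
// data-parallel assignments are queued, serial element access is immediate.
struct ToyRuntime {
  std::vector<std::function<void()>> pending;

  // Runs every queued data-parallel assignment in program order.
  void blockAndEvaluate() {
    for (auto &work : pending) work();
    pending.clear();
  }
};

struct ToyArray {
  ToyRuntime *rt;
  std::vector<double> data;

  // Storage starts at -1.0 so unevaluated values are easy to spot.
  ToyArray(ToyRuntime &r, std::size_t n) : rt(&r), data(n, -1.0) {}

  // Data-parallel "array = v": only queued, not yet performed.
  void assignAll(double v) {
    std::vector<double> *d = &data;
    rt->pending.push_back([d, v] {
      for (double &x : *d) x = v;
    });
  }

  // Serial element access: reads or writes the storage right away.
  double &operator()(std::size_t i) {
    assert(i < data.size());
    return data[i];
  }
};

// Replays the attached program's initialization in one dimension.
double centralValue(bool blockBeforeSerialWrite) {
  ToyRuntime rt;
  const std::size_t n = 20;
  ToyArray b(rt, n);
  b.assignAll(0.0);              // b = 0.0;  (queued)
  if (blockBeforeSerialWrite)
    rt.blockAndEvaluate();       // the fix from the reply
  b(n / 2) = 1000.0;             // b(n/2) = 1000.0;  (immediate)
  rt.blockAndEvaluate();         // the call just before printing
  return b(n / 2);
}
```

In this model, centralValue(false) returns 0 because the queued b = 0.0 runs after the serial write and overwrites it. centralValue(true) blocks first, as in Stephen's fix, and returns 1000.
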
> Here's the basic issue:
> 
> 1: a = b;
> 2: c = a;
> 3: e = c;
> 4: c(5) = 7;
> 5: d = c + e;
> 6: cout << d(5) << d(3) << endl;
> 
> For this code to work correctly, the data-parallel expressions
> writing to c must be done before statement 4 is run and the
> data-parallel expression writing to d must be done before the
> line that prints values from d.  Using blockingExpressions()
> ensures correctness by inserting blockAndEvaluate() after EVERY
> data-parallel statement:
> 
> 1: a = b;
>    blockAndEvaluate();
> 2: c = a;
>    blockAndEvaluate();
> 3: e = c;
>    blockAndEvaluate();
> 4: c(5) = 7;
> 5: d = c + e;
>    blockAndEvaluate();
> 6: cout << d(5) << d(3) << endl;
> 
> This may not be very efficient when the arrays are decomposed
> into patches, because all the patches in statement 1 must execute
> before any from statement 2.  It would be much more cache-efficient
> to perform (a = b; c = a; e = c;) on one patch, then move to the next
> patch.
> 
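The difference between the two loop orders can be sketched in C++ using plain std::vector storage split into fixed-size patches. The helper names and the patching scheme are assumptions for illustration, not how SMARTS actually schedules work. Each statement is an element-wise copy, so both orders produce identical arrays. Only the memory-access pattern differs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// dst[i] = src[i] for i in one patch [lo, hi).
void copyPatch(Vec &dst, const Vec &src, std::size_t lo, std::size_t hi) {
  for (std::size_t i = lo; i < hi; ++i) dst[i] = src[i];
}

// Statement-major order, as forced by blocking after every statement:
// every patch of "a = b", then every patch of "c = a", then "e = c".
void statementMajor(Vec &a, const Vec &b, Vec &c, Vec &e, std::size_t patch) {
  assert(patch > 0);
  const std::size_t n = b.size();
  for (std::size_t lo = 0; lo < n; lo += patch)
    copyPatch(a, b, lo, std::min(lo + patch, n));
  for (std::size_t lo = 0; lo < n; lo += patch)
    copyPatch(c, a, lo, std::min(lo + patch, n));
  for (std::size_t lo = 0; lo < n; lo += patch)
    copyPatch(e, c, lo, std::min(lo + patch, n));
}

// Patch-major order: all three statements on one patch while it is still
// in cache, then the next patch. Legal here because element i of each
// statement depends only on element i of the previous statement.
void patchMajor(Vec &a, const Vec &b, Vec &c, Vec &e, std::size_t patch) {
  assert(patch > 0);
  const std::size_t n = b.size();
  for (std::size_t lo = 0; lo < n; lo += patch) {
    const std::size_t hi = std::min(lo + patch, n);
    copyPatch(a, b, lo, hi);
    copyPatch(c, a, lo, hi);
    copyPatch(e, c, lo, hi);
  }
}

// Checks that both orders leave a, c and e with identical contents.
bool ordersAgree(std::size_t n, std::size_t patch) {
  Vec b(n);
  for (std::size_t i = 0; i < n; ++i) b[i] = 0.5 * double(i);
  Vec a1(n), c1(n), e1(n), a2(n), c2(n), e2(n);
  statementMajor(a1, b, c1, e1, patch);
  patchMajor(a2, b, c2, e2, patch);
  return a1 == a2 && c1 == c2 && e1 == e2 && e2 == b;
}
```
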
> In the past, my recommendation to users was to add blockAndEvaluate
> immediately before any serial code:
> 
> 1: a = b;
> 2: c = a;
> 3: e = c;
>    blockAndEvaluate();
> 4: c(5) = 7;
> 5: d = c + e;
>    blockAndEvaluate();
> 6: cout << d(5) << d(3) << endl;
> 
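The placements above can be compared using a toy C++ model. ToyRuntime, Strategy and runExample are hypothetical names. The model assumes only that data-parallel statements are queued while serial element access is immediate. It runs statements 1-6 with b(i) = i, so the sequential answer for d(5) is 7 + 5 = 12, and it counts how many blockAndEvaluate() calls each strategy makes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model (hypothetical, not POOMA code) of three blocking placements.
enum class Strategy { NoBlocking, BlockEveryStatement, BlockBeforeSerial };

struct ToyRuntime {
  bool blockingExpressions = false;
  int flushes = 0;
  std::vector<std::function<void()>> pending;

  // Queues a data-parallel statement; in blocking mode, runs it at once.
  void dataParallel(std::function<void()> stmt) {
    pending.push_back(std::move(stmt));
    if (blockingExpressions) blockAndEvaluate();
  }
  void blockAndEvaluate() {
    ++flushes;
    for (auto &stmt : pending) stmt();
    pending.clear();
  }
};

struct Outcome {
  double d5;    // value read for d(5)
  int flushes;  // number of blockAndEvaluate() calls
};

// Runs statements 1-6 of the example with b(i) = i.
Outcome runExample(Strategy s) {
  ToyRuntime rt;
  rt.blockingExpressions = (s == Strategy::BlockEveryStatement);
  const bool blockBeforeSerial = (s == Strategy::BlockBeforeSerial);
  const std::size_t n = 10;
  std::vector<double> a(n), b(n), c(n), d(n), e(n);
  for (std::size_t i = 0; i < n; ++i) b[i] = double(i);

  rt.dataParallel([&] { a = b; });        // 1: a = b;
  rt.dataParallel([&] { c = a; });        // 2: c = a;
  rt.dataParallel([&] { e = c; });        // 3: e = c;
  if (blockBeforeSerial) rt.blockAndEvaluate();
  c[5] = 7.0;                             // 4: c(5) = 7;
  rt.dataParallel([&] {                   // 5: d = c + e;
    for (std::size_t i = 0; i < n; ++i) d[i] = c[i] + e[i];
  });
  if (blockBeforeSerial) rt.blockAndEvaluate();

  // Both blocking strategies leave nothing queued when d(5) is read.
  assert(s == Strategy::NoBlocking || rt.pending.empty());
  const double d5 = d[5];                 // 6: read d(5)
  return {d5, rt.flushes};
}
```

With no blocking, d(5) is read before statement 5 has run and still holds 0. Blocking after every statement and blocking only before serial code both give 12. The second strategy needs two flushes instead of four, which leaves statements 1-3 in a single batch that a scheduler could evaluate patch by patch.
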
> This approach guarantees correctness.  There was no way for us
> to implement this automatically.  We know inside POOMA every time a
> data-parallel expression occurs, but we don't know what the next
> statement is going to be.  There's no simple way to check for serial
> access without slowing the code down enormously.  All the inner loops
> run by SMARTS also access elements through operator(), so we would
> have to put an if test on every element access that asks "Are we
> running inside the evaluator, or back in the user's code?"
> 
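For illustration only, the rejected per-element check might look like the following toy C++ sketch. CheckedArray, insideEvaluator and the checks counter are hypothetical and are not POOMA code. The branch in operator() makes serial code safe without an explicit blockAndEvaluate(). However, it runs on every element access, including the accesses made by the evaluator's own inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch of an automatic serial-access check (not POOMA code).
struct ToyRuntime {
  bool insideEvaluator = false;
  std::size_t checks = 0;  // how many times the per-element test ran
  std::vector<std::function<void()>> pending;

  void blockAndEvaluate() {
    insideEvaluator = true;
    for (auto &work : pending) work();
    pending.clear();
    insideEvaluator = false;
  }
};

struct CheckedArray {
  ToyRuntime *rt;
  std::vector<double> data;
  CheckedArray(ToyRuntime &r, std::size_t n) : rt(&r), data(n, -1.0) {}

  // Every element access asks: "evaluator, or user code?"
  double &operator()(std::size_t i) {
    ++rt->checks;
    if (!rt->insideEvaluator && !rt->pending.empty()) rt->blockAndEvaluate();
    assert(i < data.size());
    return data[i];
  }

  // Queued data-parallel "array = v". Its inner loop also goes through
  // operator(), so it pays for the test on every element.
  void assignAll(double v) {
    rt->pending.push_back([this, v] {
      for (std::size_t i = 0; i < data.size(); ++i) (*this)(i) = v;
    });
  }
};

struct Result { double center; double corner; std::size_t checks; };

// The attached program's initialization with no explicit blockAndEvaluate().
Result initialize() {
  ToyRuntime rt;
  CheckedArray b(rt, 20);
  b.assignAll(0.0);     // queued
  b(10) = 1000.0;       // the check flushes "b = 0.0" before this write
  const double center = b(10);
  const double corner = b(0);
  return {center, corner, rt.checks};
}
```

Here initialize() returns a center of 1000 and a corner of 0. The test ran 23 times: 3 times for user accesses and 20 times inside the evaluator's loop over the 20-element array.
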
> So the use of blockAndEvaluate() is an optimization.  Perhaps it would
> be better to make --blockingExpressions the default; users who want
> more efficient code could then add the necessary blockAndEvaluate()
> calls and run with --withoutBlockingExpressions.  Note that if they
> really understand the parallelism issues, they could get trickier:
>  
> 1: a = b;
> 2: c = a;
>    blockAndEvaluate();
> 3: e = c;
> 4: c(5) = 7;
> 5: d = c + e;
>    blockAndEvaluate();
> 6: cout << d(5) << d(3) << endl;
> 
> is also correct because we've guaranteed that c has been computed.  Note
> that blockAndEvaluate() causes EVERY expression to finally be computed.
> We had at one point thought about a more specific syntax:
> 
> blockOnEvaluation(c);
> c(5) = 7;
> 
> This syntax would ensure that all the expressions relating to a given
> array are finished.  (That would allow the main branch of the code to
> continue while other computations are still running.)
> 
> This idea is a ways off from even being prototyped, though.
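
blockOnEvaluation(c) was never implemented, so the following is only a toy C++ sketch of one way it could behave. ToyRuntime, PendingStmt, demo and the string array names are hypothetical. Each queued statement records the arrays it reads or writes. Blocking on one array runs the queue in program order up to the last statement touching that array. This is conservative, because earlier unrelated statements also run, but it preserves program order. Statements after that point stay queued.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of per-array blocking (not POOMA code).
struct PendingStmt {
  std::set<std::string> touches;  // arrays read or written
  std::function<void()> run;
};

struct ToyRuntime {
  std::vector<PendingStmt> pending;

  void dataParallel(std::set<std::string> touches, std::function<void()> run) {
    pending.push_back({std::move(touches), std::move(run)});
  }
  void blockAndEvaluate() { drainPrefix(pending.size()); }

  // Runs, in program order, every statement up to and including the last
  // one that reads or writes `name`; later statements stay queued.
  void blockOnEvaluation(const std::string &name) {
    std::size_t end = 0;
    for (std::size_t i = 0; i < pending.size(); ++i)
      if (pending[i].touches.count(name)) end = i + 1;
    drainPrefix(end);
  }

private:
  void drainPrefix(std::size_t end) {
    assert(end <= pending.size());
    for (std::size_t i = 0; i < end; ++i) pending[i].run();
    pending.erase(pending.begin(),
                  pending.begin() + static_cast<std::ptrdiff_t>(end));
  }
};

struct Demo { std::size_t queued; double c1; double x0; };

Demo demo() {
  ToyRuntime rt;
  std::vector<double> a(4, 1.0), c(4, 0.0), x(4, 0.0);
  rt.dataParallel({"a", "c"}, [&] { c = a; });        // c = a;
  rt.dataParallel({"x"}, [&] { x.assign(4, 2.0); });  // x = 2.0;
  rt.blockOnEvaluation("c");        // runs only "c = a"
  const std::size_t queued = rt.pending.size();
  c[1] = 7.0;                       // serial write to c is now safe
  rt.blockAndEvaluate();            // finishes "x = 2.0"
  return {queued, c[1], x[0]};
}
```

In demo(), the unrelated x = 2.0 is still queued (queued is 1) when the serial write to c happens.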

Thanks,
Jeffrey D. Oldham
oldham at codesourcery.com
-------------- next part --------------
#include <iostream>		// has std::cout, ...
#include <cstdlib>		// has EXIT_SUCCESS
#include "Pooma/Arrays.h"	// has Pooma's Array

// Doof2d: Pooma Arrays, element-wise implementation

int main(int argc, char *argv[])
{
  // Prepare the Pooma library for execution.
  Pooma::initialize(argc,argv);
  
  // Ask the user for the number of averagings.
  long nuAveragings, nuIterations;
  std::cout << "Please enter the number of averagings: ";
  std::cin >> nuAveragings;
  nuIterations = (nuAveragings+1)/2; // Each iteration performs two averagings.

  // Ask the user for the number n of elements along one dimension of
  // the grid.
  long n;
  std::cout << "Please enter the array size: ";
  std::cin >> n;

  // Specify the arrays' domains [0,n) x [0,n).
  Interval<1> N(0, n-1);
  Interval<2> vertDomain(N, N);

  // Create the arrays.
  // The template parameters indicate 2 dimensions, a 'double' element
  // type, and ordinary 'Brick' storage.
  Array<2, double, Brick> a(vertDomain);
  Array<2, double, Brick> b(vertDomain);

  // Set up the initial conditions.
  // All grid values should be zero except for the central value.
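  a = b = 0.0;
  // NOTE: this is the bug discussed above.  A Pooma::blockAndEvaluate()
  // call is needed here so the data-parallel assignment finishes before
  // the serial write below; otherwise it may later overwrite b(n/2,n/2).
  b(n/2,n/2) = 1000.0;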

  // In the average, weight each element by this value.
  const double weight = 1.0/9.0;

  // Perform the simulation.
  for (int k = 0; k < nuIterations; ++k) {
    // Read from b.  Write to a.
    for (int j = 1; j < n-1; j++)
      for (int i = 1; i < n-1; i++)
        a(i,j) = weight *
          (b(i+1,j+1) + b(i+1,j  ) + b(i+1,j-1) +
           b(i  ,j+1) + b(i  ,j  ) + b(i  ,j-1) +
           b(i-1,j+1) + b(i-1,j  ) + b(i-1,j-1));

    // Read from a.  Write to b.
    for (int j = 1; j < n-1; j++)
      for (int i = 1; i < n-1; i++)
        b(i,j) = weight *
          (a(i+1,j+1) + a(i+1,j  ) + a(i+1,j-1) +
           a(i  ,j+1) + a(i  ,j  ) + a(i  ,j-1) +
           a(i-1,j+1) + a(i-1,j  ) + a(i-1,j-1));
  }

  // Print out the final central value.
  std::cout << "before: " << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl; // TMP
  Pooma::blockAndEvaluate();	// Ensure all computation has finished.
  std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;

  // The arrays are automatically deallocated.

  // Tell the Pooma library execution has finished.
  Pooma::finalize();
  return EXIT_SUCCESS;
}

