[pooma-dev] Plan for Reducing Pooma's Running Time

James Crotinger JimC at proximation.com
Tue Aug 21 16:11:19 UTC 2001



> -----Original Message-----
> From: Jeffrey Oldham [mailto:oldham at codesourcery.com]
> Sent: Tuesday, August 14, 2001 9:47 PM
> To: pooma-dev at pooma.codesourcery.com
> Subject: [pooma-dev] Plan for Reducing Pooma's Running Time
> 
[...]

> To permit comparisons between executions of different programs, we can
> compute the abstraction ratio for using data-parallel or Pooma2
> abstractions.  The abstraction ratio is the ratio of a program's
> running time to the corresponding C program's running time.  We want
> this ratio to be at most one.

Two comments: 

First, I think you just said that we want the abstraction penalty to be
negative ("the ratio to be at most one"), which strikes me as unlikely,
especially if the C compiler takes advantage of restrict.

Second, it is impractical for us to write C code for comparison with the
POOMA kernels running in parallel - or at least it is impractical to do this
for very many kernels. Thus we also need to look at scaling and other
measurements of parallel performance. 

> We need to resolve timing granularity problems and which time
> measurements to make, e.g., wall-clock or CPU time.

This is really only an issue with the Benchmark class or other experiments
that simply amount to timing the execution of an entire run. The profiling
tools shouldn't have this problem.

> 
> 
> Infrastructure:
> 
> We should establish daily builds and benchmark runs to check that
> running times do not increase while we try to reduce them.
> Measuring running times on both Irix and Linux is desirable.  We'll
> use QMTest to perform the testing.
> 
> Question: Should we post these reports on a daily basis?

Probably not - if we could automate putting them on a website, that would be
cool.

> 
> We should use KCC.  Some preliminary measurements indicate that KCC
> and gcc perform differently.  Tools to profile the code include
> Linux's gprof (instructions available in info pages and
> http://sources.redhat.com/binutils/docs-2.10/gprof.html) and Irix's
> ssrun(1) and prof(1).
> 
> Question: Are there other profiling tools?

When I tried gprof with KCC, it crashed (gprof did, that is). I haven't yet
looked at Gaby's notes to figure out if I did something wrong, or if we have
a different configuration. 

At any rate, gprof is OK for serial benchmarking, which is where we want to
start, but we need something else when we start benchmarking in parallel.
The tool that we've used before is called Tau. I think there are some links
to it on the acl web site. I've never used it on Linux, so we'll have to
check that out. I believe this is supposed to work either with threads or
with message passing, but currently it doesn't handle both. But neither does
POOMA at this point, so that's OK. 

> Work:
> 
> Scott Haney suggests first speeding Array execution since (New)Fields
> use Arrays.  A good initial step would be checking that the KCC
> optimizer produces code similar to C code without Pooma abstraction
> overheads. 

I've run ABCTest with KCC and, not too surprisingly, there is an observed
abstraction penalty - the C code got about 45 MFlops for large arrays, and
the POOMA (Brick) code got about 30. Given that these asymptotic results
should be measuring memory speed, this is probably the result of the C loops
being jammed or some similar transformation, enabling load-store
optimizations that the optimizer can't perform on the POOMA code, since
POOMA does not inline everything. I haven't looked at the C output from KCC
yet - it can be a pain to decipher. (There used to be a product called
"cback" that was designed to clean up the output from CFRONT; I wonder if
it still exists.) 

> First, we can compare C-style arrays with Brick Array
> engines on uniprocessor machines.  Then, we work with multipatch Array
> engines, trying to reduce the overhead of having multiple patches.
> Trying various patch sizes on a uniprocessor machine will demonstrate
> the overhead of having multipatches.  We'll defer threaded and
> multi-processor execution to later.

The various benchmarks and the Benchmark class were designed with these
sorts of tests in mind.

> 
> Stephen will soon post a list of Array benchmarks, what they test, and
> what they do not test.  We can write additional programs to fill any
> deficiencies in our testing.  Each individual researcher can speed a
> benchmark's execution.
> 
> Work on the NewField should be delayed until Stephen Smith and I merge
> our work into the mainline.  Currently, there is one benchmark program
> benchmarks/Doof2d that uses NewField.h.  We also will have the Lee et
> al. stratigraphic flow code.  Are these sufficient for testing?  If
> not, should we write more test cases?  Will we want to finish the
> Caramana et al. hydrodynamics program?
> 
> Question: Who besides Jeffrey has access to a multi-processor computer
> with more than a handful of processors?


I've got an account on chi now, and should be able to get back onto nirvana
without too much hassle (I hope). 

> 
> Question: Do we need to check for memory leaks?  Bluemountain has
> purify, which should reveal leaks.  Perhaps we can modify the QMTest
> scripts to ease checking.

This isn't a performance issue, but we definitely want to put purify in our
test suite.

> 
> Procedure for Modifying Pooma Code:
> 
> Even though we'll probably work on a separate development branch, we
> need to ensure that the Pooma code compiles at all times to permit
> multiple programmers to work on the same code.  Before committing a
> code change,
> 
> 1. Make sure the Pooma library compiles with the change.  Also check
>    that associated executables still run.
> 2. Obtain patch approval from at least one other person.
> 3. Commit the patch.
> 4. Send email to pooma-dev at pooma.codesourcery.com, listing
>   a. the changes and an explanation,
>   b. the test platform, and
>   c. the patch approver.

I never do step 4 - given that all this information should be in the CVS
checkin message and that we have a CVS mailing list, why make a redundant
post to pooma-dev?

> 
> To Do List:
> 
> o Complete this list.
> o Add this list to the Pooma CVS tree for easy sharing and
>   modification.
> o Describe the existing benchmarks.	Stephen
> o Determine what execution tasks are not covered by existing
>   code.	Stephen
> o Determine interesting benchmarks using Arrays.
>     Stephen recommends starting with benchmarks/Doof2dUMP.	Gaby?
> o Establish nightly Pooma builds for Linux and Irix, producing summary
>   reports.  Jeffrey
> o Ensure Pooma compiles with the threads package.	Jim?

I can work on SMARTS.

I think it is important that we get Tau up and working with POOMA on the
platforms that we'll be profiling on. This may not be a small task. 

  Jim