From jhh at zianet.com Mon Sep 15 08:12:44 2003
From: jhh at zianet.com (John Hall)
Date: Mon, 15 Sep 2003 02:12:44 -0600
Subject: PETE tool
Message-ID: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>

Gang:
I have been reading the text of the articles on PETE, although the permissions seem to be clobbered for the figures on the CodeSourcery website. I would like to play around a little with PETE itself, but it doesn't seem to be accessible anywhere, even on Los Alamos' internal website. Does anyone have an idea of where I can locate the PETE tool? Jeff Oldham referred me to the CodeSourcery/.../POOMA/src/PETE directory, but that is just the generated output from applying PETE to the POOMA types, not the tool itself.

I am now working on porting a new code to POOMA/Tecolote along with Don and Jean Marshall, and this time we are trying to optimize for single-processor performance while still being able to run in parallel. So this time we want to use all of the good stuff y'all designed for POOMA R2, including the various engines, centerings, etc. While Blanca has now been officially killed at Los Alamos, some people are finally beginning to see why we were so interested in POOMA.

While I am at it, I need to do in R2 the equivalent of an R1 loop over all of the vnodes on a processor, for every processor, and then store a sparse collection of Locs which I then use across multiple conformant fields. Does anyone have an example of a safe mechanism for doing this? This is a really big deal: it will allow me to speed up certain sparse operations by at least an order of magnitude over the data-parallel treatment. Think of walking across the same few sparse locations on a hundred different fields and you will have an idea of what I am talking about. Pseudo-code is probably sufficient, but please use the real names of the relevant objects, as I am still slowly coming back up to speed on POOMA.

I am looking at PETE because I want to fill in the operator list for a linked list (maybe a vector, but preferably a linked list), so that I can write some code on an extremely sparse, compressed companion data storage to a data-parallel field. For example, I have a single data-parallel field which holds all of the pure cells and the cell averages for the mixed cells, along with a sparse linked list of the mixed cells (very few cells compared to the problem size). I then want to overload the operations for the linked list so that I can do some simple calculations using PETE.

It is really great to be using this technology again. I got a grant for $300K to pay for next year's work, so I can work on this stuff full time this next year. Believe it or not, I will be funding Mark Mitchell, et al. to develop an open-source Fortran 2000 compiler over the next few years. Is this a crazy world or what? I am hoping to use some of the money to continue to improve the g++ codebase. Version 3.4 of g++ has the new ISO parser we started three years ago fully in place, so g++ is now an ISO-conformant compiler except for the export keyword.

Also, does anyone know the name of the code that Chris Luccini was working on at Sandia?

Thanks for any help you can give me,
John Hall
(505)234-2743 (Home Carlsbad)
(505)661-3535 (Home Los Alamos)
(505)628-1373 (Work Carlsbad)
(505)667-7568 (Work Los Alamos)

P.S. Don and Jean say "hi!". They are staying with me this month and next, and we are working night and day to get a jump start on this project. Forgive me for mixing personal news with technical requests, but it's 2 AM and I didn't feel like writing two messages. Just thinking about you guys again brings back warm feelings.
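[A minimal sketch of the patch (vnode) walk John asks about, assuming POOMA R2 names -- Loc, Interval, and the layout's beginLocal()/endLocal() node iterators are assumptions here, not confirmed API:]

    // Untested sketch: walk the patches (vnodes) local to this context
    // and cache the interesting Locs for reuse across any number of
    // conformant fields. All POOMA identifiers are assumed names.
    #include <vector>

    template <class FieldType>
    void collectSparseLocs(const FieldType& flag,
                           std::vector<Loc<2> >& locs)
    {
      typename FieldType::Layout_t::iterator p = flag.layout().beginLocal();
      typename FieldType::Layout_t::iterator e = flag.layout().endLocal();
      for (; p != e; ++p)                       // each local patch
      {
        const Interval<2>& d = (*p).domain();   // cells owned by the patch
        for (int j = d[1].first(); j <= d[1].last(); ++j)
          for (int i = d[0].first(); i <= d[0].last(); ++i)
            if (flag.read(Loc<2>(i, j)) < 0)    // e.g. flag marks mixed cells
              locs.push_back(Loc<2>(i, j));
      }
    }

    // The cached Locs then drive sparse updates on conformant fields:
    //   for (std::size_t n = 0; n < locs.size(); ++n)
    //     density(locs[n]) += dt * rate(locs[n]);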
From rguenth at tat.physik.uni-tuebingen.de Mon Sep 15 08:28:35 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Mon, 15 Sep 2003 10:28:35 +0200 (CEST)
Subject: [pooma-dev] PETE tool
In-Reply-To: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>

On Mon, 15 Sep 2003, John Hall wrote:

> Gang:
> I have been reading the text of the articles on PETE, although the
> permissions seem to be clobbered for the figures on the CodeSourcery
> website. I would like to play around a little with PETE itself, but it
> doesn't seem to be accessible anywhere, even on Los Alamos' internal
> website. Does anyone have an idea of where I can locate the PETE tool?
> Jeff Oldham referred me to the CodeSourcery/.../POOMA/src/PETE
> directory, but that is just the generated output from applying PETE to
> the POOMA types, not the tool itself.

I have just made two versions that I downloaded a while back accessible here:

  http://www.tat.physik.uni-tuebingen.de/~rguenth/pooma/pete-2.0.tgz
  http://www.tat.physik.uni-tuebingen.de/~rguenth/pooma/pete-2.1.0.tgz

Maybe this helps. Nice to see you guys working on POOMA again!

Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

From oldham at codesourcery.com Tue Sep 16 18:15:48 2003
From: oldham at codesourcery.com (Jeffrey D. Oldham)
Date: Tue, 16 Sep 2003 11:15:48 -0700
Subject: PETE Webpage
In-Reply-To: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>
References: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>
Message-ID: <3F675354.7080508@codesourcery.com>

PETE is the expression-template framework defining data-parallel operators on array-like containers; it is used by POOMA and other tools. A PETE webpage is now available at http://www.codesourcery.com/pooma/pete/ .

PETE versions 2.0 and 2.1.0 are available via the download link on that page. A PETE CVS repository has also been established; it contains a version of 2.1.0 updated to compile with g++ 3.4 and, presumably, an EDG compiler.

A PETE mailing list has been established as well. It is expected to be a very, very low-volume list. Please subscribe if you wish.

Jeffrey D. Oldham
oldham at codesourcery.com

From rguenth at tat.physik.uni-tuebingen.de Tue Sep 16 18:33:19 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Tue, 16 Sep 2003 20:33:19 +0200 (CEST)
Subject: [pooma-dev] PETE Webpage
In-Reply-To: <3F675354.7080508@codesourcery.com>
References: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com> <3F675354.7080508@codesourcery.com>

On Tue, 16 Sep 2003, Jeffrey D. Oldham wrote:

> PETE is the expression-template framework defining data-parallel
> operators on array-like containers; it is used by POOMA and other
> tools. A PETE webpage is now available at
> http://www.codesourcery.com/pooma/pete/ .
>
> PETE versions 2.0 and 2.1.0 are available via the download link on
> that page. A PETE CVS repository has also been established; it
> contains a version of 2.1.0 updated to compile with g++ 3.4 and,
> presumably, an EDG compiler.
>
> A PETE mailing list has been established as well. It is expected to be
> a very, very low-volume list. Please subscribe if you wish.

I presume there is no CVS history available then?

Also, there seems to be PETE material at NERSC,

  http://acts.nersc.gov/pete/main.html

including a tutorial document,

  http://acts.nersc.gov/pete/documents/Tutorials.pdf

which may be useful for people.

Nice to see PETE has a new home,

Richard.
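[For readers new to the technique Oldham describes, a toy, from-scratch sketch of the expression-template idea behind PETE -- these are not PETE's actual class names; the real PETE tool generates operator sets like this automatically for a given list of container types:]

    // Toy expression templates: operator+ builds a lightweight
    // parse-tree node instead of a temporary container; the element
    // loop runs once, at assignment. Not PETE's real classes.
    #include <cstddef>
    #include <vector>

    template <class L, class R>
    struct Sum
    {
      const L& l; const R& r;
      Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
      double operator[](std::size_t i) const { return l[i] + r[i]; }
    };

    struct Vec
    {
      std::vector<double> data;
      double operator[](std::size_t i) const { return data[i]; }
      template <class Expr>
      Vec& operator=(const Expr& e)   // one fused loop, no temporaries
      {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
      }
    };

    inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b)
    { return Sum<Vec, Vec>(a, b); }

    template <class L, class R>
    inline Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b)
    { return Sum<Sum<L, R>, Vec>(a, b); }

[So `a = b + c + d;` builds the type Sum<Sum<Vec, Vec>, Vec> at compile time and evaluates it in a single pass -- the effect PETE automates for arbitrary operator and container sets.]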
From rguenth at tat.physik.uni-tuebingen.de Fri Sep 19 09:51:33 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 19 Sep 2003 11:51:33 +0200 (CEST)
Subject: Reference Documentation

Hi!

I still have a load of pending patches that make the inline documentation available to doxygen. The current state is that I'm still waiting for feedback on how to organize the needed extra files; see the thread starting at

  http://www.codesourcery.com/archives/pooma-dev/msg00315.html

The other patches will touch individual source files and reformat the comments in doxygen style. The results of doxygenification can be viewed at

  http://www.tat.physik.uni-tuebingen.de/~rguenth/pooma/reference/

Thanks,
Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

From dmarshal at dc.rr.com Fri Sep 26 06:27:41 2003
From: dmarshal at dc.rr.com (Jean Marshall)
Date: Fri, 26 Sep 2003 00:27:41 -0600
Subject: Sparse Engine
Message-ID: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>

Hi guys:

John and I are starting to write a sparse engine -- for sparse storage of our material-dependent fields. We think we have come up with an optimization for our Eulerian code that should really make it scream. We have been studying the IndirectionEngine example, which is very similar to what we need.

Unfortunately, the IndirectionEngine example program, indirect_test1.cpp, only demonstrates how to build the engine, not an array or a field. Could someone please show us how to move forward from this example to building a complete array and field version?

Unlike the IndirectionEngine example, we only need local communications (not all-to-all), along with the same type of guard-cell update found in a normal field.

Any help will be greatly appreciated!

Jean, John, Don

Jean and Don Marshall
84250 Indio Springs Dr #291
Indio, CA 92203-3413
760-775-1576 home
760-574-0182 Jean's cell
760-574-0192 Don's cell

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 07:11:06 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 26 Sep 2003 09:11:06 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine
In-Reply-To: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>

On Fri, 26 Sep 2003, Jean Marshall wrote:

> John and I are starting to write a sparse engine -- for sparse storage
> of our material-dependent fields. We think we have come up with an
> optimization for our Eulerian code that should really make it scream.
> We have been studying the IndirectionEngine example, which is very
> similar to what we need.

Can you elaborate some more on the use and principle of this engine? Is it like compressed brick?

Richard.

> Unfortunately, the IndirectionEngine example program,
> indirect_test1.cpp, only demonstrates how to build the engine, not an
> array or a field. Could someone please show us how to move forward
> from this example to building a complete array and field version?
>
> Unlike the IndirectionEngine example, we only need local
> communications (not all-to-all), along with the same type of
> guard-cell update found in a normal field.
>
> Any help will be greatly appreciated!

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
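[On Jean's question of getting from a bare engine to full containers, the usual R2 recipe looks roughly like the sketch below. The engine tag MyIndirection is hypothetical, and the constructor signatures are assumptions patterned after the Brick-based containers, not verified against the IndirectionEngine sources:]

    // Untested sketch: promoting a custom engine to an Array and a
    // Field. "MyIndirection" is a hypothetical engine tag; the Engine
    // specialization itself is what indirect_test1.cpp already builds.

    struct MyIndirection {};  // hypothetical engine tag

    // (1) The example already supplies something like:
    //       template <class T> class Engine<2, T, MyIndirection> {...};

    void sketch()
    {
      Interval<2> domain(Interval<1>(0, 99), Interval<1>(0, 99));

      // (2) Array is just the engine tag plugged into the container
      //     template, wrapping an engine instance (shallow copy).
      Engine<2, double, MyIndirection> e(domain /*, index array, ... */);
      Array<2, double, MyIndirection> a(e);

      // (3) Field adds a mesh and a centering on top of the same tag.
      //     These arguments follow the uniform-rectilinear examples
      //     and may need adjusting.
      Centering<2> cell = canonicalCentering<2>(CellType, Continuous);
      DomainLayout<2> layout(domain, GuardLayers<2>(1));
      Field<UniformRectilinearMesh<2>, double, MyIndirection>
        f(cell, layout, Vector<2>(0.0), Vector<2>(1.0));

      // (4) For expressions and guard-cell updates to work, the engine
      //     must implement the standard engine interface (read(),
      //     domain(), layout()) and the evaluator/engine traits that
      //     the expression machinery queries.
    }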
From jxyh at lanl.gov Fri Sep 26 07:59:06 2003
From: jxyh at lanl.gov (John H. Hall)
Date: Fri, 26 Sep 2003 01:59:06 -0600
Subject: [pooma-dev] Sparse Engine
Message-ID: <4D9D7602-EFF7-11D7-AD5E-0003938E6E0A@lanl.gov>

Richard:
OK. Here goes. The basic idea is that we have a hierarchical field structure (built using hierarchical engines similar to the current multi-material field abstraction) which has a collection of 1-D DynamicFields (for the sparse unstructured storage), a shared (n-D) integer index Array (or Field), and a single (n-D) scalar, vector, or tensor field which holds either the data for a pure cell or a cell-average value for a mixed-material cell. As the problem evolves, the material interfaces migrate, so the actual positions of the unstructured cells change. However, all the indirect indexing is still local to the processor (except for normal guard-cell communications), so this is much simpler than a real unstructured problem with all-to-all communications. In the general case, the sparse dynamic fields are only used to compute the cell-average quantities before a data-parallel computation across the single multi-material (cell-average) field is performed. We would also like to take views of the field in which all of the data for a particular material is gathered/scattered to/from a single sparse dynamic work Array that is shared in this hierarchical structure.

The field would look like this:

     ___________________
    |___________________|  Single-material gather/scatter 1-D dynamic
                           work Array (both mixed and pure cells)
     ______
    |______|  mat A (1-D dynamic Array/Field)
     ______
    |______|  mat B (1-D dynamic Array/Field)
     ______
    |______|  mat C (1-D dynamic Array/Field)
     _____________________________________________________________
    |_____________________________________________________________|
      Cell-average quantities (n-D)
     _____________________________________________________________
    |_____________________________________________________________|
      Integer index Array (n-D)

A single index Array is shared by all sparse fields (e.g. Density, Pressure, etc.). It does double duty: for a pure cell it provides the material index, and for a mixed cell it provides an offset into a collection tracking the unstructured mixed-cell data.

Multi-patch should still work, although the guard-cell communications might be slightly more complicated.

The number of cells which are indirectly addressed is very small (< 5% of the total), so even using compressible brick we are wasting a lot of memory bandwidth and performing numerous extraneous computations. A comparison code using this structure is running 20 times faster than the equivalent data-parallel POOMA R1 computation for the single-processor serial case. We believe we can match that performance by building an engine that encapsulates the sparse nature of the problem and by making more use of the new engines POOMA R2 provides (stencil, etc.).

Again, most of the computations are performed on the cell-average quantities, so we just take a view (operator[]?) that returns that single field.

John and Jean
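[A bare-bones transliteration of this layout into R2-style types, to make the sharing explicit -- illustrative only; the real implementation would live inside a custom engine, and every name below is an assumption:]

    // Sketch of the hierarchical multi-material storage described
    // above; illustrative, not real POOMA code.
    #include <vector>

    template <int Dim>
    struct MultiMaterialStorage
    {
      // n-D cell-average data: one value per cell, pure or mixed.
      Array<Dim, double, CompressibleBrick> cellAverage;

      // n-D shared index: material ID for a pure cell, or an encoded
      // offset into the mixed-cell storage for a mixed cell.
      Array<Dim, int, Brick> index;

      // 1-D dynamic per-material storage for the mixed cells only.
      std::vector<DynamicArray<double> > material;

      // 1-D gather/scatter work array, compressed when not in use.
      DynamicArray<double> work;
    };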
On Friday, September 26, 2003, at 01:11 AM, Richard Guenther wrote:

> [...]
> Can you elaborate some more on the use and principle of this engine?
> Is it like compressed brick?

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 08:07:37 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 26 Sep 2003 10:07:37 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine
In-Reply-To: <4D9D7602-EFF7-11D7-AD5E-0003938E6E0A@lanl.gov>

Ok, still trying to understand -- is this something like (statically) specifying which cells participate in the computation? Like having a usual brick engine in conjunction with a bitfield specifying a mask, and using that in the evaluator loop (of course this would be less memory efficient)? So this would be a cheap way to do it compared to using the sparse tile layout?

Thanks,

Richard.

On Fri, 26 Sep 2003, John H. Hall wrote:

> Richard:
> OK. Here goes. The basic idea is that we have a hierarchical field
> structure (built using hierarchical engines similar to the current
> multi-material field abstraction) [...]
From jhh at zianet.com Fri Sep 26 09:33:01 2003
From: jhh at zianet.com (John Hall)
Date: Fri, 26 Sep 2003 03:33:01 -0600
Subject: [pooma-dev] Sparse Engine
Message-ID: <6BE65ABA-F004-11D7-AD5E-0003938E6E0A@zianet.com>

Richard:
The idea is to get rid of the loop over materials found in previous iterations of our code. In fact, we simply need to compute cell-average quantities for the mixed cells and then perform a single data-parallel computation over the single mixed-material field, which does the work for all materials at once. So we do a little unstructured work to compute the cell-average quantities, and then we do a data-parallel computation. There are other advantages that accrue to this unstructured approach -- it would allow us to store some information that would normally be too expensive to store -- but we won't go into that here.

The complication is that we want to (in the grand tradition of POOMA) hide the underlying complexity of our storage scheme and make things appear beautiful and logically simple. A good analogy is a storage scheme for a symmetric matrix that stores only the upper triangle, but lets you access any index into the array and internally maps the indices to the correct storage location.

In our scheme, the index array entry for a pure cell is positive and is simply the material ID of the material contained in that cell. If the index array contains a negative value, it has traditionally been an index into an unstructured linked list of the mixed-cell data.
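[In code, this convention comes out roughly as follows -- a self-contained illustration; every name here is hypothetical, not POOMA or Tecolote API:]

    // Hypothetical illustration of the shared index-array convention:
    //   index >= 0 : pure cell; the value is the material ID
    //   index <  0 : mixed cell; -(index + 1) is a slot in the
    //                unstructured mixed-cell storage
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct MixedCells
    {
      // slot -> (volume fraction, value) pairs, one per material present
      std::vector<std::vector<std::pair<double, double> > > slots;

      double volumeWeightedAverage(int s) const
      {
        double sum = 0.0;
        for (std::size_t m = 0; m < slots[s].size(); ++m)
          sum += slots[s][m].first * slots[s][m].second;
        return sum;  // volume fractions assumed to sum to 1
      }
    };

    inline bool isPure(int idx)    { return idx >= 0; }
    inline int  mixedSlot(int idx) { return -(idx + 1); }

    // Refresh the cell-average field before the data-parallel sweep:
    // pure cells already hold their value; only mixed cells are touched.
    void updateCellAverages(const std::vector<int>& index,
                            const MixedCells& mixed,
                            std::vector<double>& cellAverage)
    {
      for (std::size_t c = 0; c < index.size(); ++c)
        if (!isPure(index[c]))
          cellAverage[c] = mixed.volumeWeightedAverage(mixedSlot(index[c]));
    }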
We can then access this data and compute a cell-average value, which we store in that cell of the multi-material field, and then we perform our data-parallel operations on that multi-material field. We occasionally need to gather all of the pure and mixed material values for a single material so that we can do a single-material calculation like an EOS evaluation; that is why we want the work array (which we compress/deallocate when we are not using it). So the various views of the data we would take are the multi-material cell-average view, the gathered single-material view, and the overall complicated-storage-scheme view. To get the kind of performance the old code has, we will also need to introduce windowing and activity flags. Basically, we are attempting to throw away any unnecessary computations and minimize the data we are pushing through cache.

The sparse tile layout doesn't have the concept of indirect addressing through an index field; it is simply intended for block-AMR-type meshes. If we do AMR, it would probably be a completely unstructured problem in which any cell can be refined, rather than a block type. Unfortunately, that again introduces the possibility of all-to-all communications (slow) to find your neighbors, etc.

We have also been dealing with the issue of how best to do masking. I am beginning to think that we need another sparse storage idea, so that we end up with something equivalent to a "where" block in which the data is collected into lists by a test and the computation is done only over that collection, which gets progressively smaller as the number of tests increases. Currently, when using a mask, you end up traversing all of the data, perhaps even doing the computation everywhere, and then simply throwing away the result where the mask is not set (either with a conditional or by multiplying by 0.0). Building the list for extremely sparse data can be a huge win. Like I said, the old version of this algorithm is running 20 times faster than the data-parallel version. That is only possible by simply doing less work.

We would also like to have exterior guards on a box containing a lot of very small, logically distinct but shared-memory patches, without guard cells within the box. Then we could maybe achieve some reasonable compression, and our computations should approach AMR storage schemes without the undesired Gibbs phenomenon that AMR incurs from poor impedance matching across T-joints.

I should note that we are aware of the issue of not using certain types of dynamically allocated data structures, because the guard-cell copy scheme might only move the pointer to the data and not the actual data. We are taking this into account.

Hope this helps,
John Hall

On Friday, September 26, 2003, at 02:07 AM, Richard Guenther wrote:

> Ok, still trying to understand -- is this something like (statically)
> specifying which cells participate in the computation? Like having a
> usual brick engine in conjunction with a bitfield specifying a mask,
> and using that in the evaluator loop (of course this would be less
> memory efficient)? So this would be a cheap way to do it compared to
> using the sparse tile layout?
> [...]
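[The masked-versus-list trade-off John describes, in miniature -- plain illustrative C++, not POOMA's evaluator:]

    // Masked evaluation streams every cell through cache even when
    // almost none are active; a gathered index list touches only the
    // active cells. Illustrative code only.
    #include <cstddef>
    #include <vector>

    // Masked: cost is O(total cells) regardless of sparsity.
    void maskedUpdate(const std::vector<bool>& mask,
                      std::vector<double>& f)
    {
      for (std::size_t i = 0; i < f.size(); ++i)
        if (mask[i]) f[i] *= 2.0;   // whole mask and field traversed
    }

    // List-based: build the list once per phase...
    std::vector<std::size_t> buildActiveList(const std::vector<bool>& mask)
    {
      std::vector<std::size_t> active;
      for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) active.push_back(i);
      return active;
    }

    // ...then every sweep over any conformant field is O(active).
    void listUpdate(const std::vector<std::size_t>& active,
                    std::vector<double>& f)
    {
      for (std::size_t i = 0; i < active.size(); ++i)
        f[active[i]] *= 2.0;
    }

[With under 5% of cells active and a hundred fields sharing one list, the list build amortizes almost immediately -- consistent with the speedup John attributes to "simply doing less work".]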
From smino at tkm.physik.uni-karlsruhe.de Fri Sep 26 12:21:49 2003
From: smino at tkm.physik.uni-karlsruhe.de (Sergei Mingaleev)
Date: Fri, 26 Sep 2003 14:21:49 +0200
Subject: [pooma-dev] Sparse Engine
In-Reply-To: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>
References: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>

Hello, Jean:

>> John and I are starting to write a sparse engine -- for sparse
>> storage of our material-dependent fields.
>> ....
>> only demonstrates how to build the engine, not an array or a field.
>> Could someone please show us how to move forward from this example
>> to building a complete array and field version?

Some time ago I wrote a simple version of a SparseEngine -- see it attached. At the moment the engine works only with 2D and 4D arrays, and it cannot be parallelized. I don't really like it -- it should be completely rewritten -- but maybe it is useful to you as an example?

The main file is SparseEngine.h. All other files are included from the main file:

  SparseEngine2.h   - support for 2D Arrays
  SparseEngine4.h   - support for 4D Arrays
  SparseOperators.h - some operators

The program starts with:

  #define SPL_DEBUG_SPARSE
  #include "SparseEngine.h"

A sparse Array can then be created as usual:

  Array<2, Sparse> A(I, J);

I have not tried it, but I guess it should work with the Field class, too. The SparseEngine contains some specific functions, which can be accessed as demonstrated in the example below:

  #ifdef SPL_DEBUG_SPARSE
  A.engine().pack();   // compactify the sparse Array
  cout << "Sparse Array Filling = "
       << int(100 * (1.0 - A.engine().free() / (double)(A.engine().size())))
       << " %" << endl;
  #endif

There are two predefined constants:

  #define SPARSE_TOLERANCE 1e-10
  #define SPARSITY_LEVEL   0.5

Earlier I tried to define these constants in the SparseEngine constructors and initialize() functions, but that worked badly -- the problem is that the constructors and initializers of Arrays make some restrictive assumptions about the corresponding functions of Engines.

Best wishes,
Sergei.

--
Dr. Sergei Mingaleev
Institut fur Theorie der Kondensierten Materie
Universitat Karlsruhe, 76128 Karlsruhe, Germany
Phone: +49-(721)-608-2136 Fax: +49-(721)-608-7779
E-mail: smino at tkm.physik.uni-karlsruhe.de
Web: http://www-tkm.physik.uni-karlsruhe.de/~smino/
     http://wwwrsphysse.anu.edu.au/nonlinear/sfm/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseEngine.h
Type: text/x-c++
Size: 4262 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseEngine2.h
Type: text/x-c++
Size: 12399 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseEngine4.h
Type: text/x-c++
Size: 8033 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseOperators.h
Type: text/x-c++
Size: 1119 bytes

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 14:09:42 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 26 Sep 2003 16:09:42 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine

On Fri, 26 Sep 2003, Sergei Mingaleev wrote:

> Some time ago I wrote a simple version of a SparseEngine -- see it
> attached. At the moment the engine works only with 2D and 4D arrays,
> and it cannot be parallelized.

I think this notion of a sparse engine is different from Jean's. In fact, the whole point of sparsity in Jean's case is probably the multi-material stuff and the resulting optimizations (which I don't completely get at the moment).

The sparsity you invented looks like it could be done better by having a (possibly shared) bitmap of valid locations and an evaluator that takes it into account. Memory usage would be reduced by not accessing the unused parts, thus only wasting virtual memory. Of course the bitmap (if not changing) could be compressed, for instance with run-length encoding. This should be an efficient way to have arbitrarily shaped boundaries, at least in the serial case. For the parallel case you'd probably need some clever load-balancing tricks to avoid being hurt by mostly "empty" bricks.

Richard.

From smino at tkm.physik.uni-karlsruhe.de Fri Sep 26 15:24:13 2003
From: smino at tkm.physik.uni-karlsruhe.de (Sergei Mingaleev)
Date: Fri, 26 Sep 2003 17:24:13 +0200
Subject: [pooma-dev] Sparse Engine

Hi Richard,

>> I think this notion of a sparse engine is different from Jean's.

Yes, now I see that it is quite different...

>> The sparsity you invented looks like it could be done better by
>> having a (possibly shared) bitmap of valid locations and an
>> evaluator that takes it into account. Memory usage would be reduced
>> by not accessing the unused parts, thus only wasting virtual memory.

Do you mean creating a bitmap array with the same size as the sparse Array? That realization is good only for moderately large sparse Arrays -- what if we need to work with an array of (1000000 x 1000000) points or more? In that case the bitmap alone would be about 100 GBytes -- far too huge! So we need to remember only the positions of the non-zero elements, and we need some fast way of determining whether a point (i,j) has a non-zero value A(i,j); it would be very slow just to search for the given point (i,j) in a flat list of non-zero elements. Thus we need some chain-like organization of the list of non-zero elements, with the ability to add new non-zero elements, and to remove (zero out) old ones, as fast as possible.

My realization of the SparseEngine uses the standard storage scheme commonly used for sparse matrices -- for 2D arrays it is rather efficient in both memory usage and speed of element access/modification. Unfortunately, it can hardly be extended to arbitrary-dimensional arrays.
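[The "standard storage scheme" here is presumably compressed row storage (CRS); a minimal sketch of that layout in generic C++ -- not the attached SparseEngine code:]

    // Minimal compressed-row-storage (CRS) sketch. Element lookup is a
    // binary search within one row; matrix-vector products stream each
    // row contiguously, costing O(nnz) with no searching.
    #include <algorithm>
    #include <vector>

    struct CrsMatrix
    {
      int nrows;
      std::vector<int>    rowStart; // size nrows+1; row i occupies
                                    // [rowStart[i], rowStart[i+1])
      std::vector<int>    col;      // column index of each stored element
      std::vector<double> val;      // value of each stored element

      double operator()(int i, int j) const
      {
        std::vector<int>::const_iterator b = col.begin() + rowStart[i];
        std::vector<int>::const_iterator e = col.begin() + rowStart[i + 1];
        std::vector<int>::const_iterator p = std::lower_bound(b, e, j);
        return (p != e && *p == j) ? val[p - col.begin()] : 0.0;
      }

      // y = A * x
      void matvec(const std::vector<double>& x,
                  std::vector<double>& y) const
      {
        for (int i = 0; i < nrows; ++i)
        {
          double s = 0.0;
          for (int k = rowStart[i]; k < rowStart[i + 1]; ++k)
            s += val[k] * x[col[k]];
          y[i] = s;
        }
      }
    };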
By the way -- the tolerance, determined initially by the constant SPARSE_TOLERANCE, can later be changed to a new value with:

  A.engine().tolerance() = 1.0e-5;

One can also add the command:

  A.engine().resize(N);

to be able to increase/decrease the physical memory occupied by the sparse Array. I am not quite sure -- maybe there is a more elegant way to add this kind of functionality?

Cheers,
Sergei.

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 22:26:32 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Sat, 27 Sep 2003 00:26:32 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine

On Fri, 26 Sep 2003, Sergei Mingaleev wrote:

> Do you mean creating a bitmap array with the same size as the sparse
> Array? That realization is good only for moderately large sparse
> Arrays -- what if we need to work with an array of (1000000 x
> 1000000) points or more? In that case the bitmap alone would be about
> 100 GBytes -- far too huge! So we need to remember only the positions
> of the non-zero elements.

Yes, and you'd reduce the memory requirement by doing run-length encoding. This way the size of the bitmap will be no bigger than the number of used cells (usually a lot less).

> And we need some fast way of determining whether a point (i,j) has a
> non-zero value A(i,j); it would be very slow just to search for the
> given point (i,j) in a flat list of non-zero elements. Thus we need
> some chain-like organization of the list of non-zero elements, with
> the ability to add new non-zero elements, and to remove (zero out)
> old ones, as fast as possible.

You should be able to do log n time searches in the bitmap, if you really need to. But in the common case of applying an evaluator you'd just traverse the bitmap in optimal order, and determining which elements are used is nearly a no-op. But maybe we're again talking about "different" sparsity here... I'd say the unused (what you call zero) elements simply do not participate in the calculation, just as with an arbitrarily shaped domain. You seem to be suggesting more of a compressed-engine approach?

> My realization of the SparseEngine uses the standard storage scheme
> commonly used for sparse matrices -- for 2D arrays it is rather
> efficient in both memory usage and speed of element
> access/modification. Unfortunately, it can hardly be extended to
> arbitrary-dimensional arrays.

Yes, for sparse matrices one usually uses very special data structures, and these tend to be used for statically shaped matrices only.

> By the way -- the tolerance, determined initially by the constant
> SPARSE_TOLERANCE, can later be changed to a new value with:
>
>   A.engine().tolerance() = 1.0e-5;
>
> One can also add the command:
>
>   A.engine().resize(N);
>
> to be able to increase/decrease the physical memory occupied by the
> sparse Array. I am not quite sure -- maybe there is a more elegant
> way to add this kind of functionality?

Hmm, this sounds different from what I have in mind. It sounds like you want to do a multidimensional wavelet compression here.

Richard.
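[A toy version of the run-length-encoded activity mask Richard proposes -- illustrative only, not POOMA's evaluator machinery:]

    // Run-length-encoded activity mask: runs store half-open
    // [begin, end) ranges of active (flattened) indices, so storage
    // scales with the number of runs, not the domain size, and the
    // evaluator never tests individual cells. Illustrative sketch.
    #include <cstddef>
    #include <vector>

    struct RleMask
    {
      struct Run { std::size_t begin, end; };
      std::vector<Run> runs;

      // Apply op(i) to every active index i, run by run.
      template <class Op>
      void forEachActive(Op op) const
      {
        for (std::size_t r = 0; r < runs.size(); ++r)
          for (std::size_t i = runs[r].begin; i < runs[r].end; ++i)
            op(i);
      }
    };

    // Usage sketch: double every active element of f.
    //   RleMask m = ...;
    //   m.forEachActive([&](std::size_t i) { f[i] *= 2.0; });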
From smino at tkm.physik.uni-karlsruhe.de Sat Sep 27 14:45:35 2003
From: smino at tkm.physik.uni-karlsruhe.de (Sergei Mingaleev)
Date: Sat, 27 Sep 2003 16:45:35 +0200
Subject: [pooma-dev] Sparse Engine

>> But maybe we're again talking about "different" sparsity here... I'd
>> say the unused (what you call zero) elements simply do not
>> participate in the calculation, just as with an arbitrarily shaped
>> domain. You seem to be suggesting more of a compressed-engine
>> approach?

Yes, we are talking about different sparsity here. I mean just an extension of the sparse-matrix approach, and the performance requirements in this case are rather specific: optimization for matrix-matrix and matrix-vector multiplication, in particular. Even a log(N) search is too slow if we have many array-array multiplications -- the situation I have in some of my programs.

Of course, different problems can need different types of sparsity and, correspondingly, different Engines. For example, sometimes I feel that I really need Arrays with arbitrarily shaped domains, which could be realized as you suggested.

By the way -- sometimes, for specific problems, we really need additional Engines and other classes/subroutines that are not generic enough to be included in Pooma, but that could be very useful in a "contributions" package. Besides new Engines, such a package could include support for Array/Field visualization, input/output of classes in different storage formats, some primitive linear/nonlinear algebra subroutines (such as solving a system of equations), etc. Is it possible to create and manage a directory for such contributions on Pooma.CodeSourcery.com? I think it would also be a good place for testing new, unstable features of Pooma, or for alternative realizations of some of its classes.

Cheers,
Sergei.

--
Dr. Sergei Mingaleev
Institut fur Theorie der Kondensierten Materie
Universitat Karlsruhe, 76128 Karlsruhe, Germany
Phone: +49-(721)-608-2136 Fax: +49-(721)-608-7779
E-mail: smino at tkm.physik.uni-karlsruhe.de
Web: http://www-tkm.physik.uni-karlsruhe.de/~smino/
     http://wwwrsphysse.anu.edu.au/nonlinear/sfm/