From jhh at zianet.com Mon Sep 15 08:12:44 2003
From: jhh at zianet.com (John Hall)
Date: Mon, 15 Sep 2003 02:12:44 -0600
Subject: PETE tool
Message-ID: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>

Gang:
I have been reading the text of the articles on PETE, although the permissions seem to be clobbered for the figures on the CodeSourcery website. I would like to play around a little with PETE itself, but it doesn't seem to be accessible anywhere, even on Los Alamos' internal website. Does anyone have an idea of where I can locate the PETE tool? Jeff Oldham referred me to the CodeSourcery/.../POOMA/src/PETE directory, but that is just the generated output from applying PETE to the POOMA types, not the tool itself.

I am now working on porting a new code to POOMA/Tecolote along with Don and Jean Marshall, and this time we are trying to optimize for single-processor performance while still being able to run in parallel. So this time we want to use all of the good stuff y'all designed for POOMA R2, including the various engines, centerings, etc. While Blanca has now been officially killed at Los Alamos, some people are finally beginning to see why we were so interested in POOMA.

While I am at it, I need to do in R2 the equivalent of an R1 loop over all of the vnodes on a processor, for every processor, and then store a sparse collection of Locs which I then use across multiple conformant fields. Does anyone have an example of a safe mechanism for doing this? This is a really big deal: it will allow me to speed up certain sparse operations by at least an order of magnitude over the data-parallel treatment. Think of walking across the same few sparse locations on a hundred different fields and you will have an idea of what I am talking about. Pseudo-code is probably sufficient, but please use the real names of the relevant objects, as I am still slowly coming back up to speed on POOMA.

I am looking at PETE because I want to fill in the operator list for a linked list (maybe a vector, but preferably a linked list), so that I can write some code on an extremely sparse, compressed companion data storage to a data-parallel field. For example, I have a single data-parallel field which holds all of the pure cells and the cell averages for the mixed cells, along with a sparse linked list of the mixed cells (very few cells compared to the problem size). I then want to overload the operations for the linked list so that I can do some simple calculations using PETE.

It is really great to be using this technology again. I got a grant for $300K to pay for next year's work, so I can work on this stuff full time this next year. Believe it or not, I will be funding Mark Mitchell, et al. to develop an open-source Fortran 2000 compiler over the next few years. Is this a crazy world or what? I am hoping to use some of the money to continue to improve the g++ codebase. Version 3.4 of g++ has the new ISO parser we started three years ago fully in place, so g++ is now an ISO-conformant compiler except for the export keyword.

Also, does anyone know the name of the code that Chris Luccini was working on at Sandia?

Thanks for any help you can give me,
John Hall
(505)234-2743 (Home Carlsbad)
(505)661-3535 (Home Los Alamos)
(505)628-1373 (Work Carlsbad)
(505)667-7568 (Work Los Alamos)

P.S. Don and Jean say "hi!". They are staying with me this month and next, and we are working night and day to get a jump start on this project. Forgive me for mixing personal news with technical requests, but it's 2 AM and I didn't feel like writing two messages. Just thinking about you guys again brings back warm feelings.
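[A minimal sketch of the patch (vnode) walk John asks about, assuming POOMA R2 names -- Loc, Interval, and the layout's beginLocal()/endLocal() node iterators are assumptions here, not confirmed API:]

    // Untested sketch: walk the patches (vnodes) local to this context
    // and cache the interesting Locs for reuse across any number of
    // conformant fields. All POOMA identifiers are assumed names.
    #include <vector>

    template <class FieldType>
    void collectSparseLocs(const FieldType& flag,
                           std::vector<Loc<2> >& locs)
    {
      typename FieldType::Layout_t::iterator p = flag.layout().beginLocal();
      typename FieldType::Layout_t::iterator e = flag.layout().endLocal();
      for (; p != e; ++p)                       // each local patch
      {
        const Interval<2>& d = (*p).domain();   // cells owned by the patch
        for (int j = d[1].first(); j <= d[1].last(); ++j)
          for (int i = d[0].first(); i <= d[0].last(); ++i)
            if (flag.read(Loc<2>(i, j)) < 0)    // e.g. flag marks mixed cells
              locs.push_back(Loc<2>(i, j));
      }
    }

    // The cached Locs then drive sparse updates on conformant fields:
    //   for (std::size_t n = 0; n < locs.size(); ++n)
    //     density(locs[n]) += dt * rate(locs[n]);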
From rguenth at tat.physik.uni-tuebingen.de Mon Sep 15 08:28:35 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Mon, 15 Sep 2003 10:28:35 +0200 (CEST)
Subject: [pooma-dev] PETE tool
In-Reply-To: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>

On Mon, 15 Sep 2003, John Hall wrote:

> Gang:
> I have been reading the text of the articles on PETE, although the
> permissions seem to be clobbered for the figures on the CodeSourcery
> website. I would like to play around a little with PETE itself, but it
> doesn't seem to be accessible anywhere, even on Los Alamos' internal
> website. Does anyone have an idea of where I can locate the PETE tool?
> Jeff Oldham referred me to the CodeSourcery/.../POOMA/src/PETE
> directory, but that is just the generated output from applying PETE to
> the POOMA types, not the tool itself.

I have just made two versions that I downloaded a while back accessible here:

  http://www.tat.physik.uni-tuebingen.de/~rguenth/pooma/pete-2.0.tgz
  http://www.tat.physik.uni-tuebingen.de/~rguenth/pooma/pete-2.1.0.tgz

Maybe this helps. Nice to see you guys working on POOMA again!

Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

From oldham at codesourcery.com Tue Sep 16 18:15:48 2003
From: oldham at codesourcery.com (Jeffrey D. Oldham)
Date: Tue, 16 Sep 2003 11:15:48 -0700
Subject: PETE Webpage
In-Reply-To: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>
References: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com>
Message-ID: <3F675354.7080508@codesourcery.com>

PETE is the expression-template framework defining data-parallel operators on array-like containers; it is used by POOMA and other tools. A PETE webpage is now available at http://www.codesourcery.com/pooma/pete/ .

PETE versions 2.0 and 2.1.0 are available via the download link on that page. A PETE CVS repository has also been established; it contains a version of 2.1.0 updated to compile with g++ 3.4 and, presumably, an EDG compiler.

A PETE mailing list has been established as well. It is expected to be a very, very low-volume list. Please subscribe if you wish.

Jeffrey D. Oldham
oldham at codesourcery.com

From rguenth at tat.physik.uni-tuebingen.de Tue Sep 16 18:33:19 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Tue, 16 Sep 2003 20:33:19 +0200 (CEST)
Subject: [pooma-dev] PETE Webpage
In-Reply-To: <3F675354.7080508@codesourcery.com>
References: <624A6553-E754-11D7-A431-0003938E6E0A@zianet.com> <3F675354.7080508@codesourcery.com>

On Tue, 16 Sep 2003, Jeffrey D. Oldham wrote:

> PETE is the expression-template framework defining data-parallel
> operators on array-like containers; it is used by POOMA and other
> tools. A PETE webpage is now available at
> http://www.codesourcery.com/pooma/pete/ .
>
> PETE versions 2.0 and 2.1.0 are available via the download link on
> that page. A PETE CVS repository has also been established; it
> contains a version of 2.1.0 updated to compile with g++ 3.4 and,
> presumably, an EDG compiler.
>
> A PETE mailing list has been established as well. It is expected to be
> a very, very low-volume list. Please subscribe if you wish.

I presume there is no CVS history available then?

Also, there seems to be PETE material at NERSC,

  http://acts.nersc.gov/pete/main.html

including a tutorial document,

  http://acts.nersc.gov/pete/documents/Tutorials.pdf

which may be useful for people.

Nice to see PETE has a new home,

Richard.
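[For readers new to the technique Oldham describes, a toy, from-scratch sketch of the expression-template idea behind PETE -- these are not PETE's actual class names; the real PETE tool generates operator sets like this automatically for a given list of container types:]

    // Toy expression templates: operator+ builds a lightweight
    // parse-tree node instead of a temporary container; the element
    // loop runs once, at assignment. Not PETE's real classes.
    #include <cstddef>
    #include <vector>

    template <class L, class R>
    struct Sum
    {
      const L& l; const R& r;
      Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
      double operator[](std::size_t i) const { return l[i] + r[i]; }
    };

    struct Vec
    {
      std::vector<double> data;
      double operator[](std::size_t i) const { return data[i]; }
      template <class Expr>
      Vec& operator=(const Expr& e)   // one fused loop, no temporaries
      {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
      }
    };

    inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b)
    { return Sum<Vec, Vec>(a, b); }

    template <class L, class R>
    inline Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b)
    { return Sum<Sum<L, R>, Vec>(a, b); }

[So `a = b + c + d;` builds the type Sum<Sum<Vec, Vec>, Vec> at compile time and evaluates it in a single pass -- the effect PETE automates for arbitrary operator and container sets.]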
From rguenth at tat.physik.uni-tuebingen.de Fri Sep 19 09:51:33 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 19 Sep 2003 11:51:33 +0200 (CEST)
Subject: Reference Documentation

Hi!

I still have a load of pending patches that make the inline documentation available to doxygen. The current state is that I'm still waiting for feedback on how to organize the needed extra files; see the thread starting at

  http://www.codesourcery.com/archives/pooma-dev/msg00315.html

The other patches will touch individual source files and reformat the comments in doxygen style. The results of doxygenification can be viewed at

  http://www.tat.physik.uni-tuebingen.de/~rguenth/pooma/reference/

Thanks,
Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

From dmarshal at dc.rr.com Fri Sep 26 06:27:41 2003
From: dmarshal at dc.rr.com (Jean Marshall)
Date: Fri, 26 Sep 2003 00:27:41 -0600
Subject: Sparse Engine
Message-ID: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>

Hi guys:

John and I are starting to write a sparse engine -- for sparse storage of our material-dependent fields. We think we have come up with an optimization for our Eulerian code that should really make it scream. We have been studying the IndirectionEngine example, which is very similar to what we need.

Unfortunately, the IndirectionEngine example program, indirect_test1.cpp, only demonstrates how to build the engine, not an array or a field. Could someone please show us how to move forward from this example to building a complete array and field version?

Unlike the IndirectionEngine example, we only need local communications (not all-to-all), along with the same type of guard-cell update found in a normal field.

Any help will be greatly appreciated!

Jean, John, Don

Jean and Don Marshall
84250 Indio Springs Dr #291
Indio, CA 92203-3413
760-775-1576 home
760-574-0182 Jean's cell
760-574-0192 Don's cell

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 07:11:06 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 26 Sep 2003 09:11:06 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine
In-Reply-To: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>

On Fri, 26 Sep 2003, Jean Marshall wrote:

> John and I are starting to write a sparse engine -- for sparse storage
> of our material-dependent fields. We think we have come up with an
> optimization for our Eulerian code that should really make it scream.
> We have been studying the IndirectionEngine example, which is very
> similar to what we need.

Can you elaborate some more on the use and principle of this engine? Is it like compressed brick?

Richard.

> Unfortunately, the IndirectionEngine example program,
> indirect_test1.cpp, only demonstrates how to build the engine, not an
> array or a field. Could someone please show us how to move forward
> from this example to building a complete array and field version?
>
> Unlike the IndirectionEngine example, we only need local
> communications (not all-to-all), along with the same type of
> guard-cell update found in a normal field.
>
> Any help will be greatly appreciated!

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
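[On Jean's question of getting from a bare engine to full containers, the usual R2 recipe looks roughly like the sketch below. The engine tag MyIndirection is hypothetical, and the constructor signatures are assumptions patterned after the Brick-based containers, not verified against the IndirectionEngine sources:]

    // Untested sketch: promoting a custom engine to an Array and a
    // Field. "MyIndirection" is a hypothetical engine tag; the Engine
    // specialization itself is what indirect_test1.cpp already builds.

    struct MyIndirection {};  // hypothetical engine tag

    // (1) The example already supplies something like:
    //       template <class T> class Engine<2, T, MyIndirection> {...};

    void sketch()
    {
      Interval<2> domain(Interval<1>(0, 99), Interval<1>(0, 99));

      // (2) Array is just the engine tag plugged into the container
      //     template, wrapping an engine instance (shallow copy).
      Engine<2, double, MyIndirection> e(domain /*, index array, ... */);
      Array<2, double, MyIndirection> a(e);

      // (3) Field adds a mesh and a centering on top of the same tag.
      //     These arguments follow the uniform-rectilinear examples
      //     and may need adjusting.
      Centering<2> cell = canonicalCentering<2>(CellType, Continuous);
      DomainLayout<2> layout(domain, GuardLayers<2>(1));
      Field<UniformRectilinearMesh<2>, double, MyIndirection>
        f(cell, layout, Vector<2>(0.0), Vector<2>(1.0));

      // (4) For expressions and guard-cell updates to work, the engine
      //     must implement the standard engine interface (read(),
      //     domain(), layout()) and the evaluator/engine traits that
      //     the expression machinery queries.
    }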
From jxyh at lanl.gov Fri Sep 26 07:59:06 2003
From: jxyh at lanl.gov (John H. Hall)
Date: Fri, 26 Sep 2003 01:59:06 -0600
Subject: [pooma-dev] Sparse Engine
Message-ID: <4D9D7602-EFF7-11D7-AD5E-0003938E6E0A@lanl.gov>

Richard:
OK. Here goes. The basic idea is that we have a hierarchical field structure (built using hierarchical engines similar to the current multi-material field abstraction) which has a collection of 1-D DynamicFields (for the sparse unstructured storage), a shared (n-D) integer index Array (or Field), and a single (n-D) scalar, vector, or tensor field which holds either the data for a pure cell or a cell-average value for a mixed-material cell. As the problem evolves, the material interfaces migrate, so the actual positions of the unstructured cells change. However, all the indirect indexing is still local to the processor (except for normal guard-cell communications), so this is much simpler than a real unstructured problem with all-to-all communications. In the general case, the sparse dynamic fields are only used to compute the cell-average quantities before a data-parallel computation across the single multi-material (cell-average) field is performed. We would also like to take views of the field in which all of the data for a particular material is gathered/scattered to/from a single sparse dynamic work Array that is shared in this hierarchical structure.

The field would look like this:

     ___________________
    |___________________|  Single-material gather/scatter 1-D dynamic
                           work Array (both mixed and pure cells)
     ______
    |______|  mat A (1-D dynamic Array/Field)
     ______
    |______|  mat B (1-D dynamic Array/Field)
     ______
    |______|  mat C (1-D dynamic Array/Field)
     _____________________________________________________________
    |_____________________________________________________________|
      Cell-average quantities (n-D)
     _____________________________________________________________
    |_____________________________________________________________|
      Integer index Array (n-D)

A single index Array is shared by all sparse fields (e.g. Density, Pressure, etc.). It does double duty: for a pure cell it provides the material index, and for a mixed cell it provides an offset into a collection tracking the unstructured mixed-cell data.

Multi-patch should still work, although the guard-cell communications might be slightly more complicated.

The number of cells which are indirectly addressed is very small (< 5% of the total), so even using compressible brick we are wasting a lot of memory bandwidth and performing numerous extraneous computations. A comparison code using this structure is running 20 times faster than the equivalent data-parallel POOMA R1 computation for the single-processor serial case. We believe we can match that performance by building an engine that encapsulates the sparse nature of the problem and by making more use of the new engines POOMA R2 provides (stencil, etc.).

Again, most of the computations are performed on the cell-average quantities, so we just take a view (operator[]?) that returns that single field.

John and Jean
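[A bare-bones transliteration of this layout into R2-style types, to make the sharing explicit -- illustrative only; the real implementation would live inside a custom engine, and every name below is an assumption:]

    // Sketch of the hierarchical multi-material storage described
    // above; illustrative, not real POOMA code.
    #include <vector>

    template <int Dim>
    struct MultiMaterialStorage
    {
      // n-D cell-average data: one value per cell, pure or mixed.
      Array<Dim, double, CompressibleBrick> cellAverage;

      // n-D shared index: material ID for a pure cell, or an encoded
      // offset into the mixed-cell storage for a mixed cell.
      Array<Dim, int, Brick> index;

      // 1-D dynamic per-material storage for the mixed cells only.
      std::vector<DynamicArray<double> > material;

      // 1-D gather/scatter work array, compressed when not in use.
      DynamicArray<double> work;
    };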
On Friday, September 26, 2003, at 01:11 AM, Richard Guenther wrote:

> [...]
> Can you elaborate some more on the use and principle of this engine?
> Is it like compressed brick?

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 08:07:37 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 26 Sep 2003 10:07:37 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine
In-Reply-To: <4D9D7602-EFF7-11D7-AD5E-0003938E6E0A@lanl.gov>

Ok, still trying to understand -- is this something like (statically) specifying which cells participate in the computation? Like having a usual brick engine in conjunction with a bitfield specifying a mask, and using that in the evaluator loop (of course this would be less memory efficient)? So this would be a cheap way to do it compared to using the sparse tile layout?

Thanks,

Richard.

On Fri, 26 Sep 2003, John H. Hall wrote:

> Richard:
> OK. Here goes. The basic idea is that we have a hierarchical field
> structure (built using hierarchical engines similar to the current
> multi-material field abstraction) [...]
From jhh at zianet.com Fri Sep 26 09:33:01 2003
From: jhh at zianet.com (John Hall)
Date: Fri, 26 Sep 2003 03:33:01 -0600
Subject: [pooma-dev] Sparse Engine
Message-ID: <6BE65ABA-F004-11D7-AD5E-0003938E6E0A@zianet.com>

Richard:
The idea is to get rid of the loop over materials found in previous iterations of our code. In fact, we simply need to compute cell-average quantities for the mixed cells and then perform a single data-parallel computation over the single mixed-material field, which does the work for all materials at once. So we do a little unstructured work to compute the cell-average quantities, and then we do a data-parallel computation. There are other advantages that accrue to this unstructured approach -- it would allow us to store some information that would normally be too expensive to store -- but we won't go into that here.

The complication is that we want to (in the grand tradition of POOMA) hide the underlying complexity of our storage scheme and make things appear beautiful and logically simple. A good analogy is a storage scheme for a symmetric matrix that stores only the upper triangle, but lets you access any index into the array and internally maps the indices to the correct storage location.

In our scheme, the index array entry for a pure cell is positive and is simply the material ID of the material contained in that cell. If the index array contains a negative value, it has traditionally been an index into an unstructured linked list of the mixed-cell data.
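[In code, this convention comes out roughly as follows -- a self-contained illustration; every name here is hypothetical, not POOMA or Tecolote API:]

    // Hypothetical illustration of the shared index-array convention:
    //   index >= 0 : pure cell; the value is the material ID
    //   index <  0 : mixed cell; -(index + 1) is a slot in the
    //                unstructured mixed-cell storage
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct MixedCells
    {
      // slot -> (volume fraction, value) pairs, one per material present
      std::vector<std::vector<std::pair<double, double> > > slots;

      double volumeWeightedAverage(int s) const
      {
        double sum = 0.0;
        for (std::size_t m = 0; m < slots[s].size(); ++m)
          sum += slots[s][m].first * slots[s][m].second;
        return sum;  // volume fractions assumed to sum to 1
      }
    };

    inline bool isPure(int idx)    { return idx >= 0; }
    inline int  mixedSlot(int idx) { return -(idx + 1); }

    // Refresh the cell-average field before the data-parallel sweep:
    // pure cells already hold their value; only mixed cells are touched.
    void updateCellAverages(const std::vector<int>& index,
                            const MixedCells& mixed,
                            std::vector<double>& cellAverage)
    {
      for (std::size_t c = 0; c < index.size(); ++c)
        if (!isPure(index[c]))
          cellAverage[c] = mixed.volumeWeightedAverage(mixedSlot(index[c]));
    }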
We can then access this data and compute a cell-average value, which we store in that cell of the multi-material field, and then we perform our data-parallel operations on that multi-material field. We occasionally need to gather all of the pure and mixed material values for a single material so that we can do a single-material calculation like an EOS evaluation; that is why we want the work array (which we compress/deallocate when we are not using it). So the various views of the data we would take are the multi-material cell-average view, the gathered single-material view, and the overall complicated-storage-scheme view. To get the kind of performance the old code has, we will also need to introduce windowing and activity flags. Basically, we are attempting to throw away any unnecessary computations and minimize the data we are pushing through cache.

The sparse tile layout doesn't have the concept of indirect addressing through an index field; it is simply intended for block-AMR-type meshes. If we do AMR, it would probably be a completely unstructured problem in which any cell can be refined, rather than a block type. Unfortunately, that again introduces the possibility of all-to-all communications (slow) to find your neighbors, etc.

We have also been dealing with the issue of how best to do masking. I am beginning to think that we need another sparse storage idea, so that we end up with something equivalent to a "where" block in which the data is collected into lists by a test and the computation is done only over that collection, which gets progressively smaller as the number of tests increases. Currently, when using a mask, you end up traversing all of the data, perhaps even doing the computation everywhere, and then simply throwing away the result where the mask is not set (either with a conditional or by multiplying by 0.0). Building the list for extremely sparse data can be a huge win. Like I said, the old version of this algorithm is running 20 times faster than the data-parallel version. That is only possible by simply doing less work.

We would also like to have exterior guards on a box containing a lot of very small, logically distinct but shared-memory patches, without guard cells within the box. Then we could maybe achieve some reasonable compression, and our computations should approach AMR storage schemes without the undesired Gibbs phenomenon that AMR incurs from poor impedance matching across T-joints.

I should note that we are aware of the issue of not using certain types of dynamically allocated data structures, because the guard-cell copy scheme might only move the pointer to the data and not the actual data. We are taking this into account.

Hope this helps,
John Hall

On Friday, September 26, 2003, at 02:07 AM, Richard Guenther wrote:

> Ok, still trying to understand -- is this something like (statically)
> specifying which cells participate in the computation? Like having a
> usual brick engine in conjunction with a bitfield specifying a mask,
> and using that in the evaluator loop (of course this would be less
> memory efficient)? So this would be a cheap way to do it compared to
> using the sparse tile layout?
> [...]
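[The masked-versus-list trade-off John describes, in miniature -- plain illustrative C++, not POOMA's evaluator:]

    // Masked evaluation streams every cell through cache even when
    // almost none are active; a gathered index list touches only the
    // active cells. Illustrative code only.
    #include <cstddef>
    #include <vector>

    // Masked: cost is O(total cells) regardless of sparsity.
    void maskedUpdate(const std::vector<bool>& mask,
                      std::vector<double>& f)
    {
      for (std::size_t i = 0; i < f.size(); ++i)
        if (mask[i]) f[i] *= 2.0;   // whole mask and field traversed
    }

    // List-based: build the list once per phase...
    std::vector<std::size_t> buildActiveList(const std::vector<bool>& mask)
    {
      std::vector<std::size_t> active;
      for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) active.push_back(i);
      return active;
    }

    // ...then every sweep over any conformant field is O(active).
    void listUpdate(const std::vector<std::size_t>& active,
                    std::vector<double>& f)
    {
      for (std::size_t i = 0; i < active.size(); ++i)
        f[active[i]] *= 2.0;
    }

[With under 5% of cells active and a hundred fields sharing one list, the list build amortizes almost immediately -- consistent with the speedup John attributes to "simply doing less work".]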
From smino at tkm.physik.uni-karlsruhe.de Fri Sep 26 12:21:49 2003
From: smino at tkm.physik.uni-karlsruhe.de (Sergei Mingaleev)
Date: Fri, 26 Sep 2003 14:21:49 +0200
Subject: [pooma-dev] Sparse Engine
In-Reply-To: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>
References: <5.2.1.1.2.20030926001610.00b9c9e8@pop-server.dc.rr.com>

Hello, Jean:

>> John and I are starting to write a sparse engine -- for sparse
>> storage of our material-dependent fields.
>> ....
>> only demonstrates how to build the engine, not an array or a field.
>> Could someone please show us how to move forward from this example
>> to building a complete array and field version?

Some time ago I wrote a simple version of a SparseEngine -- see it attached. At the moment the engine works only with 2D and 4D arrays, and it cannot be parallelized. I don't really like it -- it should be completely rewritten -- but maybe it is useful to you as an example?

The main file is SparseEngine.h. All other files are included from the main file:

  SparseEngine2.h   - support for 2D Arrays
  SparseEngine4.h   - support for 4D Arrays
  SparseOperators.h - some operators

The program starts with:

  #define SPL_DEBUG_SPARSE
  #include "SparseEngine.h"

A sparse Array can then be created as usual:

  Array<2, Sparse> A(I, J);

I have not tried it, but I guess it should work with the Field class, too. The SparseEngine contains some specific functions, which can be accessed as demonstrated in the example below:

  #ifdef SPL_DEBUG_SPARSE
  A.engine().pack();   // compactify the sparse Array
  cout << "Sparse Array Filling = "
       << int(100 * (1.0 - A.engine().free() / (double)(A.engine().size())))
       << " %" << endl;
  #endif

There are two predefined constants:

  #define SPARSE_TOLERANCE 1e-10
  #define SPARSITY_LEVEL   0.5

Earlier I tried to define these constants in the SparseEngine constructors and initialize() functions, but that worked badly -- the problem is that the constructors and initializers of Arrays make some restrictive assumptions about the corresponding functions of Engines.

Best wishes,
Sergei.

--
Dr. Sergei Mingaleev
Institut fur Theorie der Kondensierten Materie
Universitat Karlsruhe, 76128 Karlsruhe, Germany
Phone: +49-(721)-608-2136 Fax: +49-(721)-608-7779
E-mail: smino at tkm.physik.uni-karlsruhe.de
Web: http://www-tkm.physik.uni-karlsruhe.de/~smino/
     http://wwwrsphysse.anu.edu.au/nonlinear/sfm/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseEngine.h
Type: text/x-c++
Size: 4262 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseEngine2.h
Type: text/x-c++
Size: 12399 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseEngine4.h
Type: text/x-c++
Size: 8033 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SparseOperators.h
Type: text/x-c++
Size: 1119 bytes

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 14:09:42 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Fri, 26 Sep 2003 16:09:42 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine

On Fri, 26 Sep 2003, Sergei Mingaleev wrote:

> Some time ago I wrote a simple version of a SparseEngine -- see it
> attached. At the moment the engine works only with 2D and 4D arrays,
> and it cannot be parallelized.

I think this notion of a sparse engine is different from Jean's. In fact, the whole point of sparsity in Jean's case is probably the multi-material stuff and the resulting optimizations (which I don't completely get at the moment).

The sparsity you invented looks like it could be done better by having a (possibly shared) bitmap of valid locations and an evaluator that takes it into account. Memory usage would be reduced by not accessing the unused parts, thus only wasting virtual memory. Of course the bitmap (if not changing) could be compressed, for instance with run-length encoding. This should be an efficient way to have arbitrarily shaped boundaries, at least in the serial case. For the parallel case you'd probably need some clever load-balancing tricks to avoid being hurt by mostly "empty" bricks.

Richard.

From smino at tkm.physik.uni-karlsruhe.de Fri Sep 26 15:24:13 2003
From: smino at tkm.physik.uni-karlsruhe.de (Sergei Mingaleev)
Date: Fri, 26 Sep 2003 17:24:13 +0200
Subject: [pooma-dev] Sparse Engine

Hi Richard,

>> I think this notion of a sparse engine is different from Jean's.

Yes, now I see that it is quite different...

>> The sparsity you invented looks like it could be done better by
>> having a (possibly shared) bitmap of valid locations and an
>> evaluator that takes it into account. Memory usage would be reduced
>> by not accessing the unused parts, thus only wasting virtual memory.

Do you mean creating a bitmap array with the same size as the sparse Array? That realization is good only for moderately large sparse Arrays -- what if we need to work with an array of (1000000 x 1000000) points or more? In that case the bitmap alone would be about 100 GBytes -- far too huge! So we need to remember only the positions of the non-zero elements, and we need some fast way of determining whether a point (i,j) has a non-zero value A(i,j); it would be very slow just to search for the given point (i,j) in a flat list of non-zero elements. Thus we need some chain-like organization of the list of non-zero elements, with the ability to add new non-zero elements, and to remove (zero out) old ones, as fast as possible.

My realization of the SparseEngine uses the standard storage scheme commonly used for sparse matrices -- for 2D arrays it is rather efficient in both memory usage and speed of element access/modification. Unfortunately, it can hardly be extended to arbitrary-dimensional arrays.
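[The "standard storage scheme" here is presumably compressed row storage (CRS); a minimal sketch of that layout in generic C++ -- not the attached SparseEngine code:]

    // Minimal compressed-row-storage (CRS) sketch. Element lookup is a
    // binary search within one row; matrix-vector products stream each
    // row contiguously, costing O(nnz) with no searching.
    #include <algorithm>
    #include <vector>

    struct CrsMatrix
    {
      int nrows;
      std::vector<int>    rowStart; // size nrows+1; row i occupies
                                    // [rowStart[i], rowStart[i+1])
      std::vector<int>    col;      // column index of each stored element
      std::vector<double> val;      // value of each stored element

      double operator()(int i, int j) const
      {
        std::vector<int>::const_iterator b = col.begin() + rowStart[i];
        std::vector<int>::const_iterator e = col.begin() + rowStart[i + 1];
        std::vector<int>::const_iterator p = std::lower_bound(b, e, j);
        return (p != e && *p == j) ? val[p - col.begin()] : 0.0;
      }

      // y = A * x
      void matvec(const std::vector<double>& x,
                  std::vector<double>& y) const
      {
        for (int i = 0; i < nrows; ++i)
        {
          double s = 0.0;
          for (int k = rowStart[i]; k < rowStart[i + 1]; ++k)
            s += val[k] * x[col[k]];
          y[i] = s;
        }
      }
    };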
By the way -- the tolerance, determined initially by the constant SPARSE_TOLERANCE, can later be changed to a new value with:

  A.engine().tolerance() = 1.0e-5;

One can also add the command:

  A.engine().resize(N);

to be able to increase/decrease the physical memory occupied by the sparse Array. I am not quite sure -- maybe there is a more elegant way to add this kind of functionality?

Cheers,
Sergei.

From rguenth at tat.physik.uni-tuebingen.de Fri Sep 26 22:26:32 2003
From: rguenth at tat.physik.uni-tuebingen.de (Richard Guenther)
Date: Sat, 27 Sep 2003 00:26:32 +0200 (CEST)
Subject: [pooma-dev] Sparse Engine

On Fri, 26 Sep 2003, Sergei Mingaleev wrote:

> Do you mean creating a bitmap array with the same size as the sparse
> Array? That realization is good only for moderately large sparse
> Arrays -- what if we need to work with an array of (1000000 x
> 1000000) points or more? In that case the bitmap alone would be about
> 100 GBytes -- far too huge! So we need to remember only the positions
> of the non-zero elements.

Yes, and you'd reduce the memory requirement by doing run-length encoding. This way the size of the bitmap will be no bigger than the number of used cells (usually a lot less).

> And we need some fast way of determining whether a point (i,j) has a
> non-zero value A(i,j); it would be very slow just to search for the
> given point (i,j) in a flat list of non-zero elements. Thus we need
> some chain-like organization of the list of non-zero elements, with
> the ability to add new non-zero elements, and to remove (zero out)
> old ones, as fast as possible.

You should be able to do log n time searches in the bitmap, if you really need to. But in the common case of applying an evaluator you'd just traverse the bitmap in optimal order, and determining which elements are used is nearly a no-op. But maybe we're again talking about "different" sparsity here... I'd say the unused (what you call zero) elements simply do not participate in the calculation, just as with an arbitrarily shaped domain. You seem to be suggesting more of a compressed-engine approach?

> My realization of the SparseEngine uses the standard storage scheme
> commonly used for sparse matrices -- for 2D arrays it is rather
> efficient in both memory usage and speed of element
> access/modification. Unfortunately, it can hardly be extended to
> arbitrary-dimensional arrays.

Yes, for sparse matrices one usually uses very special data structures, and these tend to be used for statically shaped matrices only.

> By the way -- the tolerance, determined initially by the constant
> SPARSE_TOLERANCE, can later be changed to a new value with:
>
>   A.engine().tolerance() = 1.0e-5;
>
> One can also add the command:
>
>   A.engine().resize(N);
>
> to be able to increase/decrease the physical memory occupied by the
> sparse Array. I am not quite sure -- maybe there is a more elegant
> way to add this kind of functionality?

Hmm, this sounds different from what I have in mind. It sounds like you want to do a multidimensional wavelet compression here.

Richard.
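[A toy version of the run-length-encoded activity mask Richard proposes -- illustrative only, not POOMA's evaluator machinery:]

    // Run-length-encoded activity mask: runs store half-open
    // [begin, end) ranges of active (flattened) indices, so storage
    // scales with the number of runs, not the domain size, and the
    // evaluator never tests individual cells. Illustrative sketch.
    #include <cstddef>
    #include <vector>

    struct RleMask
    {
      struct Run { std::size_t begin, end; };
      std::vector<Run> runs;

      // Apply op(i) to every active index i, run by run.
      template <class Op>
      void forEachActive(Op op) const
      {
        for (std::size_t r = 0; r < runs.size(); ++r)
          for (std::size_t i = runs[r].begin; i < runs[r].end; ++i)
            op(i);
      }
    };

    // Usage sketch: double every active element of f.
    //   RleMask m = ...;
    //   m.forEachActive([&](std::size_t i) { f[i] *= 2.0; });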
From smino at tkm.physik.uni-karlsruhe.de Sat Sep 27 14:45:35 2003
From: smino at tkm.physik.uni-karlsruhe.de (Sergei Mingaleev)
Date: Sat, 27 Sep 2003 16:45:35 +0200
Subject: [pooma-dev] Sparse Engine

>> But maybe we're again talking about "different" sparsity here... I'd
>> say the unused (what you call zero) elements simply do not
>> participate in the calculation, just as with an arbitrarily shaped
>> domain. You seem to be suggesting more of a compressed-engine
>> approach?

Yes, we are talking about different sparsity here. I mean just an extension of the sparse-matrix approach, and the performance requirements in this case are rather specific: optimization for matrix-matrix and matrix-vector multiplication, in particular. Even a log(N) search is too slow if we have many array-array multiplications -- the situation I have in some of my programs.

Of course, different problems can need different types of sparsity and, correspondingly, different Engines. For example, sometimes I feel that I really need Arrays with arbitrarily shaped domains, which could be realized as you suggested.

By the way -- sometimes, for specific problems, we really need additional Engines and other classes/subroutines that are not generic enough to be included in Pooma, but that could be very useful in a "contributions" package. Besides new Engines, such a package could include support for Array/Field visualization, input/output of classes in different storage formats, some primitive linear/nonlinear algebra subroutines (such as solving a system of equations), etc. Is it possible to create and manage a directory for such contributions on Pooma.CodeSourcery.com? I think it would also be a good place for testing new, unstable features of Pooma, or for alternative realizations of some of its classes.

Cheers,
Sergei.

--
Dr. Sergei Mingaleev
Institut fur Theorie der Kondensierten Materie
Universitat Karlsruhe, 76128 Karlsruhe, Germany
Phone: +49-(721)-608-2136 Fax: +49-(721)-608-7779
E-mail: smino at tkm.physik.uni-karlsruhe.de
Web: http://www-tkm.physik.uni-karlsruhe.de/~smino/
     http://wwwrsphysse.anu.edu.au/nonlinear/sfm/