[RFC] Specialize (internal) guard cell handling, more data-flow analysis
Richard Guenther
rguenth at tat.physik.uni-tuebingen.de
Wed Jul 7 17:22:59 UTC 2004
I'm at the point of thinking about how to improve MPI scalarization even
more. And again the obvious point is that we're doing way too much
(unnecessary) communication.
The problem is that we "lower" the representation of guard cells and
their necessary updates too early (in the intersectors) and create
ordinary iterates out of them. A better approach would be to compute
necessary guards at intersection time only (as I introduced in the
previous performance improvement patches), _not_ update them, but store
this information in the iterates. We can then, before finally issuing
the iterates, do data-flow analysis of the necessary guard cells and
insert optimal update iterates at optimal places (in principle).
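
To make the idea concrete, here is a rough sketch of such a pass. It is
not POOMA code: the Iterate struct, its fields and insertGuardUpdates()
are hypothetical names, and the model is a simplified 1D row of patches
in which the guards of patch p mirror data owned by patches p-1 and p+1.
It is a plain forward pass that inserts an update only when an iterate
needs more up-to-date guard layers than are currently valid, so it
illustrates the bookkeeping rather than truly optimal placement.

```cpp
#include <vector>

// Hypothetical, simplified model (not the POOMA classes): a 1D row of
// patches where the guard cells of patch p mirror data owned by the
// neighbouring patches p-1 and p+1.
struct Iterate {
    enum Kind { Compute, GuardUpdate } kind;
    int patch;       // patch this iterate operates on
    int guardsRead;  // Compute: guard layers that must be up to date;
                     // GuardUpdate: guard layers to refresh
    bool writes;     // true if the iterate modifies the patch's own data
};

// Walk the queued compute iterates in issue order, tracking how many
// guard layers of each patch are currently valid, and insert a guard
// update only where an iterate needs more layers than are valid.
std::vector<Iterate> insertGuardUpdates(const std::vector<Iterate>& in,
                                        int numPatches)
{
    std::vector<int> valid(numPatches, 0);  // all guards start out stale
    std::vector<Iterate> out;

    for (const Iterate& it : in) {
        if (it.guardsRead > valid[it.patch]) {
            out.push_back({Iterate::GuardUpdate, it.patch,
                           it.guardsRead, false});
            valid[it.patch] = it.guardsRead;
        }
        out.push_back(it);

        // Writing a patch makes its neighbours' copies of that data stale.
        if (it.writes) {
            if (it.patch > 0)              valid[it.patch - 1] = 0;
            if (it.patch + 1 < numPatches) valid[it.patch + 1] = 0;
        }
    }
    return out;
}
```

With this, two consecutive reads of the same guards cause a single
update instead of two, and an update is only re-issued after a
neighbouring patch has actually been written.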
As iterates are per-patch, in the process of getting the above done, I'd
suggest moving the dirty flag and its handling from the MultiPatchEngine
down to the BrickEngine, together with more accurate information about
the up-to-date-ness of the guards (e.g. use a GuardLayer<> to count the
up-to-date cells).
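
As a sketch of what such per-patch bookkeeping could look like, assuming
a small stand-alone class instead of the real engine code (GuardValidity
and all of its members are hypothetical names, not existing POOMA API):
it replaces a single dirty flag with a count of up-to-date guard layers
per dimension and side.

```cpp
#include <array>

// Hypothetical per-patch record of how many guard layers are currently
// up to date on the lower and upper side of each dimension.
template <int Dim>
class GuardValidity {
public:
    // True if at least `width` guard layers on the given side are valid.
    bool upToDate(int d, bool upperSide, int width) const {
        return (upperSide ? upper_[d] : lower_[d]) >= width;
    }

    // Record that `width` layers on the given side were just refreshed.
    void markUpdated(int d, bool upperSide, int width) {
        int& v = upperSide ? upper_[d] : lower_[d];
        if (width > v) v = width;
    }

    // Neighbouring data changed: no guard layer can be trusted any more.
    void invalidate() {
        lower_.fill(0);
        upper_.fill(0);
    }

private:
    std::array<int, Dim> lower_{};  // valid layers, lower side, per dim
    std::array<int, Dim> upper_{};  // valid layers, upper side, per dim
};
```

A stencil that needs only one guard layer in one direction could then
skip the update whenever upToDate() already holds for that side, which
a single per-engine dirty flag cannot express.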
Were there any previous plans for improving this situation within POOMA?
Have you never experienced performance problems with the communication?
How did SMARTS improve the situation with the guard updates (did it?)?
Thanks for any input (before I start hacking this up)!
Richard.