[pooma-dev] Temporary copies do appear...??
Richard Guenther
rguenth at tat.physik.uni-tuebingen.de
Fri May 21 08:31:40 UTC 2004
Radek Pecher wrote:
> Basically, simple algebraic expressions based on the tiny Vector class
> do create temporary Full-engine copies of individual subexpressions,
> as opposed to what POOMA claims to prevent. The following short main
> code:
>
>
>
> #include "Pooma/Arrays.h"
>
> int main(int argc, char* argv[])
> {
>     Pooma::initialize(argc, argv);
>
>     Vector<2> v1(1, 2), v2;
>     v2 = v1*v1 + v1*v1;
>
>     Pooma::finalize();
>     return 0;
> }
You are right that gcc 3.3 does not optimize away the copy calls. But
compiling the above with g++-3.4 -O2 -fpeel-loops results in
straight-line code. With the Intel 8.0 compiler the asm code is a bit
obfuscated and some calls to destructors are left in (failing to inline
these seems to be a common problem with the Intel compiler).
I don't know whether one can structurally avoid the extra constructor
calls inside the Vector code, but maybe you can have a look at it? This
is certainly a point where optimization would be useful (if only for
compilation speed).
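To make "extra constructor calls" concrete, here is a minimal sketch that counts how many full vector objects get built while evaluating v1*v1 + v1*v1, once with eager operators and once with a small expression-template layer. Everything in it (the sketch namespace, Vec2, Expr, Mul, Add, eager_mul, eager_add, the temporaries counter) is a hypothetical stand-in written for illustration; it is not POOMA's Vector code and says nothing about what a given compiler optimizes away.

```cpp
#include <cassert>

namespace sketch {

int temporaries = 0;  // counts Vec2 objects built from element values

// CRTP base: marks types that may take part in lazy expressions.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical tiny vector; NOT POOMA's Vector<2>.
struct Vec2 : Expr<Vec2> {
    double d[2];
    Vec2() { d[0] = d[1] = 0.0; }
    Vec2(double x, double y) { d[0] = x; d[1] = y; ++temporaries; }
    double operator[](int i) const { return d[i]; }

    // Assigning from an expression evaluates it element by element.
    template <class E>
    Vec2& operator=(const Expr<E>& e) {
        for (int i = 0; i < 2; ++i) d[i] = e.self()[i];
        return *this;
    }
};

// Eager operations: each call builds a complete Vec2 result.
inline Vec2 eager_mul(const Vec2& a, const Vec2& b) {
    return Vec2(a[0] * b[0], a[1] * b[1]);
}
inline Vec2 eager_add(const Vec2& a, const Vec2& b) {
    return Vec2(a[0] + b[0], a[1] + b[1]);
}

// Lazy nodes: they only hold references to their operands.
template <class L, class R>
struct Mul : Expr<Mul<L, R> > {
    const L& l; const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](int i) const { return l[i] * r[i]; }
};
template <class L, class R>
struct Add : Expr<Add<L, R> > {
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](int i) const { return l[i] + r[i]; }
};

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& a, const Expr<R>& b) {
    return Mul<L, R>(a.self(), b.self());
}
template <class L, class R>
Add<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
    return Add<L, R>(a.self(), b.self());
}

// Number of Vec2 objects built while evaluating the eager form.
int count_eager() {
    Vec2 v1(1, 2), v2;
    temporaries = 0;
    v2 = eager_add(eager_mul(v1, v1), eager_mul(v1, v1));
    assert(v2[0] == 2.0 && v2[1] == 8.0);
    return temporaries;
}

// Same expression through the lazy nodes.
int count_lazy() {
    Vec2 v1(1, 2), v2;
    temporaries = 0;
    v2 = v1 * v1 + v1 * v1;
    assert(v2[0] == 2.0 && v2[1] == 8.0);
    return temporaries;
}

}  // namespace sketch
```

In this sketch count_eager() reports three Vec2 objects (two products and one sum) and count_lazy() reports none. The lazy form still creates small Mul and Add node objects, so whether the result becomes straight-line code still depends on how well the compiler inlines their constructors and operator[] calls.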
> g++ -ftemplate-depth-60 -Drestrict=__restrict__ -fno-exceptions \
>   -DNOPAssert -DNOCTAssert -O2 -fno-default-inline -funroll-loops \
>   -fstrict-aliasing -o Main Main.cpp \
>   -I$HOME/lib/Optim/POOMA/linux/lib/PoomaConfiguration-gcc \
>   -I$HOME/lib/Optim/POOMA/linux/src \
>   -I$HOME/lib/Optim/POOMA/linux/lib -fno-exceptions \
>   -L$HOME/lib/Optim/POOMA/linux/lib -lpooma-gcc -lm
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Also, if you are using gcc, you may want to apply the leafify patch,
available at
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/
to your gcc distribution and make the POOMA evaluators use it (I can
provide a patch for that). That is worth roughly a 50% performance
increase.
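For illustration only: later GCC releases document a `flatten` function attribute, which asks the compiler to inline every call inside the marked function where possible. That is the same kind of effect the leafify patch aims for, though treating the two as equivalent is an assumption here. The sketch below uses that attribute on a hypothetical evaluate function; the names SKETCH_FLATTEN, mul, add and evaluate are made up for this example and this is not the POOMA evaluator code or the patch itself.

```cpp
#include <cassert>

// Hypothetical small helpers standing in for the many tiny functions
// an expression evaluator ends up calling.
inline double mul(double a, double b) { return a * b; }
inline double add(double a, double b) { return a + b; }

// Use the attribute only where the compiler understands it.
#if defined(__GNUC__)
#define SKETCH_FLATTEN __attribute__((flatten))
#else
#define SKETCH_FLATTEN
#endif

// Hypothetical evaluator loop: with flatten, the compiler is asked to
// inline the mul and add calls into this function body.
SKETCH_FLATTEN
void evaluate(const double* a, const double* b, double* out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        out[i] = add(mul(a[i], a[i]), mul(b[i], b[i]));
    }
}
```

The attribute changes only how the code is compiled, not what it computes, so any speedup has to be measured on the actual evaluators.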
Hope that helps,
Richard.