[pooma-dev] Yes, Vector temporaries do appear in every operation...!!
Richard Guenther
rguenth at tat.physik.uni-tuebingen.de
Fri May 28 11:35:15 UTC 2004
On Fri, 28 May 2004, Radek Pecher wrote:
> | Note that without your debugging stuff in the constructors, these
> | get inlined and optimized away by the optimizer. Of course one
> | could argue creating the copies should be avoided in the first
> | place, but I cannot see how this can be done, as, f.i. for
> | BinaryOp<Vector1, Vector2, OpMultiply>::operator() we clearly need
> | to return a _new_ Vector as result. To avoid this one would have
> | to expression-template the vector itself, so only primitive
> | variable types are ever copied. But I don't think this will work
> | or pay off.
>
> I actually compiled the code with the original (unmodified) version of
> Vector.h first and used GDB to run it and disassemble it. Without
> much analysing, I noticed several looping jumps at the place of the
> algebraic expression which only confirms that the optimising compiler
> did not produce the required code:
> v2(0) = v1(0)*v1(0) + v1(0)*v1(0);
> v2(1) = v1(1)*v1(1) + v1(1)*v1(1);
> as it was supposed to. (And I also tried several other optimisation
> configurations, of course.)
I don't have these temporaries. Compiling with gcc 3.4, using options
-O2 -funroll-loops -DNOPAssert -S I get:
.L171:
fldl -24(%ebp)
leal -24(%ebp), %eax
leal -72(%ebp), %ecx
fldl -16(%ebp)
fxch %st(1)
movl %eax, -88(%ebp)
fmul %st(0), %st
fxch %st(1)
movl %eax, -84(%ebp)
leal -104(%ebp), %edx
fmul %st(0), %st
fxch %st(1)
movl %eax, -120(%ebp)
movl %eax, -116(%ebp)
leal -56(%ebp), %eax
cmpl %eax, %ebx
fstl -72(%ebp)
fxch %st(1)
fstl -64(%ebp)
fxch %st(1)
fstl -104(%ebp)
fadd %st(0), %st
fxch %st(1)
fstl -96(%ebp)
fadd %st(0), %st
fxch %st(1)
movl %ecx, -136(%ebp)
movl %edx, -132(%ebp)
fstl -56(%ebp)
fxch %st(1)
fstl -48(%ebp)
je .L282
fxch %st(1)
fstpl -40(%ebp)
fstpl -32(%ebp)
jmp .L179
.p2align 4,,7
.L282:
fstp %st(0)
fstp %st(0)
.p2align 4,,15
.L179:
which I haven't analyzed for optimality in detail, but there is certainly
no loop left and there are no calls to constructors/destructors. There are,
however, unnecessary stores to temporaries that were not removed.
> As to the need for the return of a Vector, I suppose that
> Vector<2, double, BinaryVectorOp<...> > is all that is needed (with the
> references to its two operands). There is no need at all to take this
> object and make its Full-engine copy for any subsequent operations.
Well, yes, this would be a step towards expression-templating the vector
classes. You then need assignment operators / constructors that know
how to convert such an expression object into a regular Vector - these
would be the expression template expanders.
Maybe it's really simple - you might want to try ;)
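For illustration, here is a minimal sketch of what such an
expression-templated vector could look like. This is not POOMA code: Vec,
VecExpr and example() are made-up names, and BinaryVectorOp, OpAdd and
OpMultiply only borrow the names used in this thread, with simplified
definitions assumed here. The expression nodes hold references to their
operands, and the templated assignment operator plays the role of the
expander, so only doubles are ever copied.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CRTP base: tags a type E as a vector expression of length N.
template <std::size_t N, class E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Element-wise operation tags (simplified stand-ins, not POOMA's classes).
struct OpAdd      { static double apply(double a, double b) { return a + b; } };
struct OpMultiply { static double apply(double a, double b) { return a * b; } };

// Expression node: stores references to both operands, never a full Vec copy.
// The references are only valid until the end of the full expression, so the
// node must be consumed (assigned) in the statement that creates it.
template <std::size_t N, class L, class R, class Op>
class BinaryVectorOp : public VecExpr<N, BinaryVectorOp<N, L, R, Op> > {
    const L& l_;
    const R& r_;
public:
    BinaryVectorOp(const L& l, const R& r) : l_(l), r_(r) {}
    double operator()(std::size_t i) const { return Op::apply(l_(i), r_(i)); }
};

// Regular vector with its own storage.
template <std::size_t N>
class Vec : public VecExpr<N, Vec<N> > {
    double d_[N];
public:
    Vec() : d_() {}  // zero-initialise the elements
    double& operator()(std::size_t i)       { assert(i < N); return d_[i]; }
    double  operator()(std::size_t i) const { assert(i < N); return d_[i]; }

    // The "expander": evaluates any expression element by element.
    template <class E>
    Vec& operator=(const VecExpr<N, E>& e) {
        const E& x = e.self();
        for (std::size_t i = 0; i < N; ++i) d_[i] = x(i);
        return *this;
    }
};

// Operators build expression nodes instead of computing a result Vec.
template <std::size_t N, class L, class R>
BinaryVectorOp<N, L, R, OpAdd>
operator+(const VecExpr<N, L>& l, const VecExpr<N, R>& r) {
    return BinaryVectorOp<N, L, R, OpAdd>(l.self(), r.self());
}

template <std::size_t N, class L, class R>
BinaryVectorOp<N, L, R, OpMultiply>
operator*(const VecExpr<N, L>& l, const VecExpr<N, R>& r) {
    return BinaryVectorOp<N, L, R, OpMultiply>(l.self(), r.self());
}

// The expression from this thread, evaluated in a single statement.
inline Vec<2> example() {
    Vec<2> v1, v2;
    v1(0) = 3.0;
    v1(1) = 4.0;
    v2 = v1 * v1 + v1 * v1;
    return v2;
}
```

With this, v2 = v1*v1 + v1*v1 creates no intermediate Vec; whether the
compiler then reduces the fixed-length loop to the two scalar statements
still depends on inlining and unrolling.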
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/