InlineEvaluator implementation question
Richard Guenther
rguenth at tat.physik.uni-tuebingen.de
Mon Dec 16 15:55:52 UTC 2002
Hi!
Does anyone remember why we create copies of the LHS and RHS inside
the KernelEvaluator<InlineKernelTag>::evaluate() methods? (There is
similar code in ReductionEvaluator<InlineKernelTag>::evaluate().) That
is, there is code like
  template<class LHS,class Op,class RHS,class Domain>
  inline static void evaluate(const LHS& lhs,const Op& op,const RHS& rhs,
                              const Domain& domain,WrappedInt<1>)
  {
    CTAssert(Domain::unitStride);
    PAssert(domain[0].first() == 0);
    LHS localLHS(lhs);
    RHS localRHS(rhs);
    int e0 = domain[0].length();
    for (int i0=0; i0<e0; ++i0)
      op(localLHS(i0),localRHS.read(i0));
  }
instead of
  template<class LHS,class Op,class RHS,class Domain>
  inline static void evaluate(const LHS& lhs,const Op& op,const RHS& rhs,
                              const Domain& domain,WrappedInt<1>)
  {
    CTAssert(Domain::unitStride);
    PAssert(domain[0].first() == 0);
    int e0 = domain[0].length();
    for (int i0=0; i0<e0; ++i0)
      op(lhs(i0),rhs.read(i0));
  }
Changing the evaluate methods to not copy saves some code size but doesn't
seem to affect performance (checked with gcc 3.0 only).
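
For anyone who wants to compare the two variants above outside of POOMA,
here is a minimal, self-contained sketch. Everything in it is a
hypothetical stand-in and not POOMA code: View mimics an engine whose
copy is shallow (only a pointer is copied, which is an assumption about
the LHS/RHS types), Assign mimics the Op, and Range replaces the Domain
and the WrappedInt<1> dispatch. It only checks that both variants write
the same values; it says nothing about code size or speed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an engine/view: copying it copies only the
// pointer, not the data (shallow-copy semantics are assumed here).
struct View {
  double* data;
  double& operator()(int i) const { return data[i]; }  // writable access
  double read(int i) const { return data[i]; }         // read-only access
};

// Hypothetical stand-in for the assignment operation.
struct Assign {
  void operator()(double& l, double r) const { l = r; }
};

// Hypothetical stand-in for a 1-D, unit-stride, zero-based domain.
struct Range {
  int first;
  int length;
};

// Variant 1: loop over local copies of lhs and rhs.
template<class LHS,class Op,class RHS>
inline void evaluateWithCopies(const LHS& lhs,const Op& op,const RHS& rhs,
                               const Range& domain)
{
  assert(domain.first == 0);
  LHS localLHS(lhs);
  RHS localRHS(rhs);
  int e0 = domain.length;
  for (int i0=0; i0<e0; ++i0)
    op(localLHS(i0),localRHS.read(i0));
}

// Variant 2: loop over the arguments directly, without copying.
template<class LHS,class Op,class RHS>
inline void evaluateByReference(const LHS& lhs,const Op& op,const RHS& rhs,
                                const Range& domain)
{
  assert(domain.first == 0);
  int e0 = domain.length;
  for (int i0=0; i0<e0; ++i0)
    op(lhs(i0),rhs.read(i0));
}

// Runs both variants on the same input and reports whether each one
// reproduced the source values in its destination buffer.
inline bool variantsAgree()
{
  std::vector<double> src = {1.0, 2.0, 3.0, 4.0};
  std::vector<double> a(src.size(), 0.0), b(src.size(), 0.0);
  Range domain{0, static_cast<int>(src.size())};
  View rhs{src.data()};
  evaluateWithCopies(View{a.data()}, Assign{}, rhs, domain);
  evaluateByReference(View{b.data()}, Assign{}, rhs, domain);
  return a == src && b == src;
}
```

Calling variantsAgree() from a small main() and comparing the generated
assembly of the two templates (for example with -S) is one way to look
for the code size difference on other compilers.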
Richard.
--
Richard Guenther <richard.guenther at uni-tuebingen.de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
More information about the pooma-dev mailing list