Temporary copies do appear...??
Radek Pecher
radek.pecher at eng.ox.ac.uk
Fri May 21 08:07:39 UTC 2004
Dear POOMA developers,
My name is Radek and I am a researcher at the University of Oxford, UK.
I have started integrating POOMA into our numerical model of liquid
crystals in 3D. I feel that it is a suitable tool for this
challenging problem, where we have 10 unknowns at each node of the
finite-element mesh. If things go right, we will be happy to
acknowledge POOMA in all our publications that follow.
The reason I am contacting you today is to inform you about a
possible POOMA problem that I encountered while testing my
POOMA/PETE-based implementation of an automatic-differentiation class,
which otherwise works perfectly (I can share the code with you if you
are interested, by the way). Please note that I have already found a
couple of minor POOMA bugs, such as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- although Tensor.h:338 claims:
// The format is: ((t(0,0) t(1,0),... ) ( t(0,1) t(1,1) ... ) ... ))
the truth is in fact:
// The format is: ((t(0,0) t(0,1),... ) ( t(1,0) t(1,1) ... ) ... ))
- this is contrary to TinyMatrix because of the i,j-swapping
(compare: Tensor.h:361 and TinyMatrix.h:236)
====================================================================
- line /src/Tiny/VectorOperators.h:189
inline typename BinaryReturn< Vector<D,T1,E>, T2, TAG >::Type_t
should correctly be:
inline typename BinaryReturn< T1, Vector<D,T2,E>, TAG >::Type_t
- this error may cause problems if T1 and T2 are different types and
when stricter type-conversions are imposed
====================================================================
- line /src/DynamicArray/DynamicArray.h:373
: Array<Dim, T, EngineTag>(s1, model)
should correctly be:
: Array<1, T, EngineTag>(s1, model)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
but the problem that I would like to describe in the
rest of this email seems more serious than that.
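The difference between the two Tensor.h orderings quoted above is
easiest to see on an asymmetric 2x2 example. The following is a minimal
sketch, not POOMA code: the helper format2x2 and its flag are
hypothetical, and a plain double[2][2] stands in for Tensor.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of POOMA): formats a 2x2 array t[i][j]
// in one of the two element orders quoted from the Tensor.h comment.
std::string format2x2(const double t[2][2], bool firstIndexFastest)
{
    std::ostringstream out;
    out << "(";
    for (int outer = 0; outer < 2; ++outer) {
        out << "(";
        for (int inner = 0; inner < 2; ++inner) {
            // firstIndexFastest: inner loop runs over i -> t(0,0) t(1,0) ...
            // otherwise:         inner loop runs over j -> t(0,0) t(0,1) ...
            const double v = firstIndexFastest ? t[inner][outer]
                                               : t[outer][inner];
            out << v << (inner == 0 ? " " : "");
        }
        out << ")";
    }
    out << ")";
    return out.str();
}
```

With t(0,0)=1, t(0,1)=2, t(1,0)=3, t(1,1)=4, the order documented in
the comment gives ((1 3)(2 4)), whereas the order reported as the actual
one gives ((1 2)(3 4)).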
Basically, simple algebraic expressions based on the tiny Vector class
do create temporary Full-engine copies of individual subexpressions,
which is exactly what POOMA claims to prevent. The following short main
program:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
#include "Pooma/Arrays.h"

int main(int argc, char* argv[])
{
  Pooma::initialize(argc, argv);
  Vector<2> v1(1, 2), v2;
  v2 = v1*v1 + v1*v1;
  Pooma::finalize();
  return 0;
}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
was tested by modifying the file /src/Tiny/Vector.h by adding the
following line:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
PrintTypeName(this); PrintTypeName(x); std::cout << std::endl;
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
to the Vector(const X& x)-constructor on line 117 and the
VectorEngine(const X& x)-constructor on line 290. The diagnostic
function PrintTypeName is defined as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
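#include <cstdlib>   // system
#include <sstream>   // std::ostringstream
#include <typeinfo>  // typeid

// Prints the demangled type name of t by passing the mangled name to c++filt.
template<class T> inline void PrintTypeName(const T& t)
{
  std::ostringstream out;
  out << "c++filt " << typeid(t).name();
  system(out.str().c_str());
}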
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
where the GNU tool c++filt is used to demangle the type names. The
following optimising g++ (v. 3.3.1) command has been used to build the
executable under SuSE Linux 9.0 (i586):
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
g++ -ftemplate-depth-60 -Drestrict=__restrict__ -fno-exceptions \
    -DNOPAssert -DNOCTAssert -O2 -fno-default-inline -funroll-loops \
    -fstrict-aliasing -o Main Main.cpp \
    -I$HOME/lib/Optim/POOMA/linux/lib/PoomaConfiguration-gcc \
    -I$HOME/lib/Optim/POOMA/linux/src \
    -I$HOME/lib/Optim/POOMA/linux/lib -fno-exceptions \
    -L$HOME/lib/Optim/POOMA/linux/lib -lpooma-gcc -lm
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
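As an aside on the PrintTypeName diagnostic above: spawning c++filt
through system() for every call could probably be avoided by demangling
in-process. The following is a minimal sketch, assuming a GCC-compatible
compiler that provides abi::__cxa_demangle in <cxxabi.h> (a compiler
extension, not standard C++). The name TypeName is hypothetical and is
not part of POOMA.

```cpp
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang extension)
#include <string>
#include <typeinfo>   // typeid

// Hypothetical replacement for PrintTypeName: returns the demangled name
// of t's type instead of printing it through an external process.
template<class T> inline std::string TypeName(const T& t)
{
    int status = 0;
    // With null buffer arguments the result is malloc()-allocated and
    // must be released with free().
    char* demangled = abi::__cxa_demangle(typeid(t).name(), 0, 0, &status);
    // Fall back to the mangled name if demangling failed.
    const std::string result =
        (status == 0 && demangled != 0) ? demangled : typeid(t).name();
    std::free(demangled);
    return result;
}
```

It would be used as std::cout << TypeName(x) << std::endl; in place of
PrintTypeName(x).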
The code execution output is listed in the following box:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
VectorEngine<2, double, Full>*
Vector<2, double, BinaryVectorOp<Vector<2, double, Full>, Vector<2, double, Full>, OpMultiply> >
Vector<2, double, Full>*
Vector<2, double, BinaryVectorOp<Vector<2, double, Full>, Vector<2, double, Full>, OpMultiply> >
VectorEngine<2, double, Full>*
Vector<2, double, BinaryVectorOp<Vector<2, double, Full>, Vector<2, double, Full>, OpMultiply> >
Vector<2, double, Full>*
Vector<2, double, BinaryVectorOp<Vector<2, double, Full>, Vector<2, double, Full>, OpMultiply> >
VectorEngine<2, double, Full>*
Vector<2, double, BinaryVectorOp<Vector<2, double, Full>, Vector<2, double, Full>, OpAdd> >
Vector<2, double, Full>*
Vector<2, double, BinaryVectorOp<Vector<2, double, Full>, Vector<2, double, Full>, OpAdd> >
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Clearly, every operation in the expression v1*v1 + v1*v1 triggers a
Full-engine copy of the BinaryVectorOp-engine subexpression result:
the operands of OpAdd are already of type Vector<2, double, Full>.
Is this behaviour correct, or am I doing something wrong? Do I
need a better-optimising compiler (I have already ordered Intel's
latest ICC, the successor of KAI) or other command-line flags?
If there is any way to prevent this waste of resources, I would
very much appreciate your kind help.
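The difference between temporary-free evaluation and the behaviour
reported above can be sketched without POOMA. The following is a
minimal illustration of the expression-template idea under stated
assumptions, not POOMA's implementation: all names (Expr, Mul, Add,
Vec2, g_fullCopies, lazyCopies, eagerCopies) are hypothetical, and Vec2
is a stand-in for Vector<2, double, Full>.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; none of these are POOMA names.
static int g_fullCopies = 0;  // counts Vec2 objects built from an expression

// CRTP base so that the operators below match expression types only.
template <class E> struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy nodes: they hold references to their operands and compute elements
// on demand. They must not outlive the full expression that created them.
template <class L, class R> struct Mul : Expr<Mul<L, R> > {
    const L& l; const R& r;
    Mul(const L& a, const R& b) : l(a), r(b) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
};
template <class L, class R> struct Add : Expr<Add<L, R> > {
    const L& l; const R& r;
    Add(const L& a, const R& b) : l(a), r(b) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec2 : Expr<Vec2> {
    double d[2];
    Vec2(double x = 0, double y = 0) { d[0] = x; d[1] = y; }
    // Materialising constructor: this is the "Full copy" being counted.
    template <class E> explicit Vec2(const Expr<E>& e) {
        ++g_fullCopies;
        d[0] = e.self()[0]; d[1] = e.self()[1];
    }
    // Assignment evaluates the expression directly into the destination.
    template <class E> Vec2& operator=(const Expr<E>& e) {
        const double x = e.self()[0], y = e.self()[1];  // read before writing
        d[0] = x; d[1] = y;
        return *this;
    }
    double operator[](std::size_t i) const { return d[i]; }
};

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& a, const Expr<R>& b) {
    return Mul<L, R>(a.self(), b.self());
}
template <class L, class R>
Add<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
    return Add<L, R>(a.self(), b.self());
}

// Whole expression evaluated in one pass: no intermediate Vec2 is built.
int lazyCopies() {
    Vec2 v1(1, 2), v2;
    g_fullCopies = 0;
    v2 = v1 * v1 + v1 * v1;
    assert(v2[0] == 2 && v2[1] == 8);
    return g_fullCopies;
}

// Each subexpression is materialised first, mimicking the reported output.
int eagerCopies() {
    Vec2 v1(1, 2), v2;
    g_fullCopies = 0;
    Vec2 t1(v1 * v1), t2(v1 * v1);  // two copies from the Mul nodes
    Vec2 t3(t1 + t2);               // one copy from the Add node
    v2 = t3;
    assert(v2[0] == 2 && v2[1] == 8);
    return g_fullCopies;
}
```

In this sketch lazyCopies() returns 0 and eagerCopies() returns 3; the
latter corresponds to the three Vector<2, double, Full> constructions
(two from OpMultiply, one from OpAdd) in the output above.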
Sincerely,
Radek
__________________________________
Dr. Radek Pecher
Research Assistant
Department of Engineering Science
University of Oxford
Parks Road, Oxford, OX1 3PJ, UK
Tel: +44 (0)1865 273044
Fax: +44 (0)1865 273905
radek.pecher at eng.ox.ac.uk