Bug in icpc causing Don's problems
Mark Mitchell
mark at codesourcery.com
Thu Sep 29 22:15:08 UTC 2005
I've analyzed the GEMP problem that Don is having.
The short answer is that this is a bug in icpc.
The long answer is that icpc is mishandling the calling conventions for:
  std::complex<float>
  std::operator*<float>(std::complex<float> const&,
                        std::complex<float> const&)
Specifically, the caller and the callee disagree about where the return
value lives. The callee exists at all because icpc is generating an
out-of-line copy of the function.
(Why it's not being inlined is another question; you might be able to
work around the bug by banging on the inline-harder button.)
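If the inline button doesn't cooperate, another possible workaround is to avoid the by-value return of std::complex<float> altogether. What follows is only a sketch: mul_by_parts is a hypothetical helper, not anything in the existing code, and I have not verified that it sidesteps the icpc bug. It computes the same four products as the library function and hands the result back through a reference parameter, so no complex<float> has to come back in a register.

```cpp
#include <complex>

// Hypothetical helper (an assumption for illustration, not existing code).
// Computes a * b with plain float arithmetic and writes the result through
// `out`, so the function never returns a std::complex<float> by value.
// Like the generated code shown in this message, it uses the textbook
// formula with no special handling of NaN or infinity.
inline void mul_by_parts(const std::complex<float>& a,
                         const std::complex<float>& b,
                         std::complex<float>& out)
{
    const float re = a.real() * b.real() - a.imag() * b.imag();
    const float im = a.real() * b.imag() + a.imag() * b.real();
    out = std::complex<float>(re, im);
}
```

A call site would replace `c = a * b;` with `mul_by_parts(a, b, c);`.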
Here's the code generated:
_ZStmlIfESt7complexIT_ERKS2_S4_:
        pushq     %rsi                  #375.5
        movq      (%rdi), %rdx          #376.26
        movss     4(%rdi), %xmm5        #376.26
        movss     (%rdi), %xmm3         #376.26
        movss     (%rsi), %xmm1         #377.11
        movss     4(%rsi), %xmm2        #377.11
        movaps    %xmm3, %xmm4          #377.11
        movaps    %xmm5, %xmm0          #377.11
        mulss     %xmm1, %xmm5          #377.11
        mulss     %xmm1, %xmm4          #377.11
        mulss     %xmm2, %xmm0          #377.11
        mulss     %xmm2, %xmm3          #377.11
        movq      %rdx, (%rsp)          #376.26
        subss     %xmm0, %xmm4          #377.11
        movss     %xmm4, (%rsp)         #377.7
        addss     %xmm3, %xmm5          #377.11
        movss     %xmm5, 4(%rsp)        #377.7
        movq      (%rsp), %rax          #378.14
        popq      %rcx                  #378.14
        ret                             #378.14
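Basically, the inputs are pointed to by %rdi and %rsi; the return value
is stored at %rsp and %rsp + 4 and then loaded into %rax. It never
reaches %xmm0, which is left holding only an intermediate product.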
However, the caller expects the return value in %xmm0:
        call      _ZStmlIfESt7complexIT_ERKS2_S4_ #76.45
        movlps    %xmm0, -64(%rbp)      #76.45
The caller is correct. Because std::complex<float> is a POD, the value
should go in %xmm0, according to the AMD64 ABI.
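To make the ABI argument concrete, here is a small sketch of the properties that matter. It assumes a C++11 or later compiler and standard library; the helper name `part` is hypothetical. The object is exactly two floats with no padding, so it occupies a single eightbyte containing only floating-point data, which the AMD64 ABI classifies as SSE and returns in %xmm0. It is also trivially copyable, so nothing forces it to be returned through memory.

```cpp
#include <complex>
#include <type_traits>

// Two 4-byte floats and no padding: the whole object is one eightbyte.
static_assert(sizeof(std::complex<float>) == 2 * sizeof(float),
              "complex<float> should be exactly two floats");

// No non-trivial copy constructor or destructor, so the C++ ABI does not
// require the value to be passed or returned through memory.
static_assert(std::is_trivially_copyable<std::complex<float> >::value,
              "complex<float> should be trivially copyable");

// Hypothetical helper: reads element i (0 = real, 1 = imaginary) through
// the array-compatible layout that C++11 guarantees for std::complex.
inline float part(const std::complex<float>& z, int i)
{
    return reinterpret_cast<const float(&)[2]>(z)[i];
}
```

If either static_assert fails on some toolchain, the reasoning above would not apply to that toolchain as stated.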
Note, by contrast, the code generated by G++ for the same function:
_ZStmlIfESt7complexIT_ERKS2_S4_:
.LFB1749:
        movss   (%rdi), %xmm3
        movss   4(%rdi), %xmm5
        movaps  %xmm3, %xmm2
        movaps  %xmm5, %xmm0
        movss   (%rsi), %xmm1
        movss   4(%rsi), %xmm4
        mulss   %xmm1, %xmm2
        mulss   %xmm4, %xmm0
        mulss   %xmm4, %xmm3
        mulss   %xmm5, %xmm1
        subss   %xmm0, %xmm2
        addss   %xmm1, %xmm3
        movss   %xmm2, -16(%rsp)
        movss   %xmm3, -12(%rsp)
        movq    -16(%rsp), %xmm0
        ret
Note that GCC correctly loads the value into %xmm0 at the end of the
function.
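For anyone checking the register contents in a debugger, the following sketch shows what the low 64 bits of %xmm0 should look like after a correct return. It assumes a little-endian target with IEEE 754 single-precision floats (true of x86-64); `packed_bits` is a hypothetical helper. The real part sits in the low 32 bits and the imaginary part in the high 32 bits, matching the two movss stores followed by the single movq in the G++ output.

```cpp
#include <complex>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the 8 bytes of z as one 64-bit integer,
// i.e. the bit pattern that `movq -16(%rsp), %xmm0` places in the low
// half of %xmm0. Assumes little-endian byte order.
inline std::uint64_t packed_bits(const std::complex<float>& z)
{
    static_assert(sizeof(std::complex<float>) == sizeof(std::uint64_t),
                  "complex<float> must be 8 bytes");
    std::uint64_t bits;
    std::memcpy(&bits, &z, sizeof bits);  // well-defined byte-wise copy
    return bits;
}
```

For example, 1.0f is 0x3F800000 and 2.0f is 0x40000000, so the value (1, 2) packs to 0x400000003F800000 under these assumptions.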
We should report this problem to Intel. I know the Intel tools manager,
so I'm sure I can get a bug report processed. Will you please (a) send
me the command line you're using for the compilation, and (b) put the
preprocessed source (output of "icpc -E") somewhere? I'll take it from
there.
--
Mark Mitchell
CodeSourcery, LLC
mark at codesourcery.com
(916) 791-8304