[pooma-dev] KCC versus icc
Richard Guenther
rguenth at tat.physik.uni-tuebingen.de
Thu Mar 13 10:15:12 UTC 2003
On Thu, 27 Feb 2003, Paul A. Renard wrote:
> Richard (and Jeffrey);
>
> I've tried various combinations of -O2, -ip, and -ipo. Both -ip options make
> the loop represented by the data-parallel expression run slower. I also tried
> profile-directed optimization, and that did indeed make a very slight
> improvement (on the order of a percent), but icc is still producing code for
> that particular loop that is 3.5X-4X slower. The compiler seems to be inlining
> quite a bit. In fact, all the constructors and destructors look like they are
> inlined. KCC doesn't seem to inline the constructors/destructors for this
> case. Both cases eventually call the evaluator, and at that point I get lost
> in the machine code.
I finally got around to looking up which options I used to get at least some
performance out of icc. The key was to artificially raise the instruction-count
limit below which small functions are always inlined, i.e. I used something
along the lines of
icpc -O2 -unroll -xM -tpp6 -ip -Qoption,c,-ip_ninl_min_stats=1000
This way I get the same performance as with gcc 3.3 using --param
max-inline-slope=1000000 (icpc is slightly faster for in-cache operation, but
who has data that fits into cache...)
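To illustrate why the inlining limit matters so much for a data-parallel expression, here is a minimal expression-template sketch in C++. It is hypothetical and is not POOMA's actual implementation: the names Add, Mul, Vec, is_expr and evaluate_sum_product are made up for this example. The point is that every node of the expression tree exposes a one-line operator[], so the single evaluator loop only becomes a tight loop if the compiler inlines every level of the tree; any level left as a real call costs a function call per element.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical expression-template nodes (not POOMA's real classes).
// Each node only stores references and exposes a one-line operator[].
template <class L, class R>
struct Add {
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

template <class L, class R>
struct Mul {
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] * r[i]; }
  std::size_t size() const { return l.size(); }
};

struct Vec {
  std::vector<double> data;

  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }

  // The "evaluator": one loop over the whole expression tree. It is only
  // tight if every node's operator[] gets inlined into this loop body.
  template <class E>
  Vec& operator=(const E& e) {
    assert(e.size() == data.size());
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    return *this;
  }
};

// Trait restricting the operators below to expression types.
template <class T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <class L, class R> struct is_expr<Add<L, R>> : std::true_type {};
template <class L, class R> struct is_expr<Mul<L, R>> : std::true_type {};

template <class L, class R,
          class = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
Add<L, R> operator+(const L& l, const R& r) {
  assert(l.size() == r.size());  // element-wise ops need equal lengths
  return {l, r};
}

template <class L, class R,
          class = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
Mul<L, R> operator*(const L& l, const R& r) {
  assert(l.size() == r.size());
  return {l, r};
}

// Hypothetical usage: out[i] = a[i] + b[i] * c[i], evaluated in a single
// loop without materializing an intermediate vector for b * c.
inline Vec evaluate_sum_product(const Vec& a, const Vec& b, const Vec& c) {
  Vec out(a.size());
  out = a + b * c;
  return out;
}
```

Comparing the generated assembly for evaluate_sum_product under different inlining settings is one way to check whether the whole tree was flattened into the loop.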
Hope this helps.
Richard.
--
Richard Guenther <richard.guenther at uni-tuebingen.de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/