[pooma-dev] KCC versus icc

Richard Guenther rguenth at tat.physik.uni-tuebingen.de
Thu Mar 13 10:15:12 UTC 2003


On Thu, 27 Feb 2003, Paul A. Renard wrote:

> Richard (and Jeffrey);
>
> I've tried various combinations of -O2, -ip, and -ipo. Both -ip options make
> the loop represented by the data-parallel expression run slower.  I also tried
> profile-directed optimization, and that did indeed make a very slight
> improvement (on the order of a percent), but icc is still producing code for
> that particular loop that is 3.5X-4X slower.  The compiler seems to be inlining
> quite a bit.  In fact, all the constructors and destructors look like they are
> inlined.  KCC doesn't seem to inline the constructors/destructors for this
> case.  Both cases eventually call the evaluator, and at that point I get lost
> in the machine code.

I finally got around to looking up which options I used to get at least
some performance out of icc. The key was to set the instruction-count limit
for small functions that are always inlined artificially high, i.e. I used
something along the lines of

icpc -O2 -unroll -xM -tpp6 -ip -Qoption,c,-ip_ninl_min_stats=1000

This way I get the same performance as gcc 3.3 when using --param
max-inline-slope=1000000 (icpc is slightly faster for in-cache operation,
but who has data that fits into cache...)
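
To illustrate why the inlining limits matter so much for this kind of code,
here is a minimal sketch of an expression-template style evaluation. It is
not POOMA code; Leaf, Add, Mul, add, mul, evaluate and demo are hypothetical
names made up for this example. The inner loop in evaluate() only becomes a
plain arithmetic loop if the compiler inlines every small operator[] in the
expression tree, which is presumably what the raised limits achieve.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for array and expression nodes (not POOMA classes).
struct Leaf {
    const double* data;
    double operator[](std::size_t i) const { return data[i]; }
};

template <class L, class R>
struct Add {
    L lhs;
    R rhs;
    // Small forwarding function: cheap only when inlined into the loop.
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

template <class L, class R>
struct Mul {
    L lhs;
    R rhs;
    double operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
};

template <class L, class R>
Add<L, R> add(const L& l, const R& r) { return Add<L, R>{l, r}; }

template <class L, class R>
Mul<L, R> mul(const L& l, const R& r) { return Mul<L, R>{l, r}; }

// The "evaluator": one loop over the whole expression tree.
// If any operator[] above is left as a real call, each element
// pays for a chain of function calls instead of a few flops.
template <class Expr>
void evaluate(double* out, const Expr& e, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = e[i];
    }
}

// Computes out = a + b * c element-wise with a = 1, b = 2, c = 3.
std::vector<double> demo(std::size_t n) {
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 3.0), out(n);
    Leaf la{a.data()}, lb{b.data()}, lc{c.data()};
    evaluate(out.data(), add(la, mul(lb, lc)), n);
    return out;
}
```

Comparing the generated assembly for evaluate() with and without the raised
inlining limits should show whether the operator[] calls have disappeared
from the loop body.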

Hope this helps.

Richard.

--
Richard Guenther <richard.guenther at uni-tuebingen.de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



