[vsipl++] [patch] Minor CFAR changes
Jules Bergmann
jules at codesourcery.com
Thu Jun 8 22:49:21 UTC 2006
Mark Mitchell wrote:
> Don McCoy wrote:
>> Jules Bergmann wrote:
>>> Attached graphs show original (cfar-orig) and new (cfar) performance,
>>> for GCC 3.4 and GCC 4.1 on Pastec. The changes for the slice version
>>> have a larger impact. Using 4.1 is a win!
>
> What's Pastec?
Pastec is another name for the GTRI cluster, aka durip (some acronym or
such).
>
> It's nice to know GCC 4.1 is good for something!
Good job!
> But, from what you
> said this morning, don't those results still fall short, relative to the
> C code?
Yes, that's right. I'm producing results for those cases now. However,
it looks like 4.1 boosted our "slice" version, while at the same time
pessimizing the plain C "vector" version.
For a particular dataset size (dataset #3 at 200 gates):

  Variation              MFLOPS
  3.4 VSIPL++ slice         136
  3.4 VSIPL++ vector         60
  3.4 C vector              141
  3.4 C+SIMD vector         470
  4.1 VSIPL++ slice         226
  4.1 VSIPL++ vector        100
  4.1 C vector              128
  4.1 C+SIMD vector         830

(I need to repackage/rerun the VSIPL++ + SIMD approach.)
Question on SIMD: For the C+SIMD version, I used the intrinsics from
xmmintrin.h (__m128, _mm_add_ps(), etc). This works with both 3.4 and
4.1. For the VSIPL++ SIMD version, I used the GCC vector extensions
(typedef float v4sf __attribute__ ((vector_size(16))), '+' operator).
The typedefs work with 3.4 and 4.1, but the operators (+, *, etc) only
work with 4.1. Is there any difference in code generated from these two
approaches? In particular, would it be worthwhile at all to recode the
C+SIMD version to use the vector extensions?
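
To make the two approaches concrete, here is a minimal sketch of the same
element-wise add written both ways. It is not the CFAR kernel itself: the
function names add_intrinsics and add_vector_ext are hypothetical, the length
is assumed to be a multiple of 4, and the '+' on v4sf needs a compiler that
supports vector operators (4.1, per the above). The intrinsics path is guarded
by __SSE__ and falls back to a scalar loop on targets without SSE.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* GCC vector extension: four packed floats in 16 bytes. */
typedef float v4sf __attribute__ ((vector_size(16)));

/* Hypothetical helper: out[i] = a[i] + b[i] using xmmintrin.h intrinsics.
   Assumes n is a multiple of 4; unaligned loads/stores keep it simple. */
void add_intrinsics(const float *a, const float *b, float *out, size_t n)
{
  size_t i;
#if defined(__SSE__)
  for (i = 0; i < n; i += 4) {
    __m128 va = _mm_loadu_ps(a + i);
    __m128 vb = _mm_loadu_ps(b + i);
    _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
  }
#else
  /* Scalar fallback for targets without SSE. */
  for (i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
#endif
}

/* Hypothetical helper: the same add using the GCC vector extensions.
   memcpy moves data in and out of the vector type without assuming
   16-byte alignment of the input arrays. Assumes n is a multiple of 4. */
void add_vector_ext(const float *a, const float *b, float *out, size_t n)
{
  size_t i;
  for (i = 0; i < n; i += 4) {
    v4sf va, vb, vc;
    memcpy(&va, a + i, sizeof va);
    memcpy(&vb, b + i, sizeof vb);
    vc = va + vb;               /* the '+' operator on vector types */
    memcpy(out + i, &vc, sizeof vc);
  }
}
```

One way to look at the code-generation question would be to compile a file
like this with "gcc -O2 -msse -S" under each compiler and compare the assembly
for the two functions.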
thanks,
-- Jules
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705