[vsipl++] patch: Enhancements to SIMD loop fusion
Jules Bergmann
jules at codesourcery.com
Thu Aug 17 11:19:15 UTC 2006
Stefan Seefeld wrote:
> The attached patch adds some optimizations as well as more functionality
> (support for complex types, as well as fused multiply-add) to the
> SIMD loop fusion harness.
>
> As SSE(2) doesn't provide fused multiply-add, the fma() implementation
> falls back on mul() and add(). For AltiVec fma() still needs to be implemented.
>
> No regressions were observed with gcc 4.1.
>
> OK to commit ?
Yes, please. This looks good. Thanks -- Jules
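For reference, the SSE fallback described above amounts to composing fma() from the existing mul() and add(). Below is a minimal sketch of that idea. It assumes a hypothetical `vec4` type that emulates a 4-wide float register lane by lane. It is not the actual SSE intrinsic code, and `simd_traits_no_fma` is an illustrative name, not the vsipl++ one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-wide float vector standing in for an SSE register;
// every operation below is applied lane by lane.
struct vec4 {
  std::array<float, 4> lane;
};

// Hypothetical traits mirroring the add/mul/fma interface from the patch.
// The target ISA is assumed to lack a fused multiply-add, so fma() is
// composed from mul() and add() (two separate roundings).
struct simd_traits_no_fma {
  typedef vec4 simd_type;

  static simd_type add(simd_type const& a, simd_type const& b) {
    simd_type r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
  }

  static simd_type mul(simd_type const& a, simd_type const& b) {
    simd_type r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
  }

  // v1 * v2 + v3, computed as a multiply followed by an add.
  static simd_type fma(simd_type const& v1, simd_type const& v2,
                       simd_type const& v3) {
    return add(mul(v1, v2), v3);
  }
};
```

Because the multiply and the add round separately, the result can differ in the last bit from a true fused multiply-add. That is the expected cost of the fallback.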
>
> + static simd_type fma(simd_type const& v1, simd_type const& v2,
> + simd_type const& v3)
> + { assert(0); return v1; } // FIXME: need to be implemented.
This can be implemented as:
  { return vec_madd(v1, v2, v3); }
(Notice that add() and mul() are implemented in terms of vec_madd because
AltiVec only has a fused multiply-add.)
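To illustrate the AltiVec arrangement, here is a sketch in which a single fused primitive is the only arithmetic operation, and add(), mul() and fma() are all derived from it. The following are assumptions: `vfloat4`, `madd()` and `splat()` are hypothetical stand-ins for `vector float`, `vec_madd` and a splat, emulated per lane with `std::fma` so the code runs on any host. Adding -0.0 in mul() is one common way to keep the sign of zero products, not necessarily what vsipl++ does.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical 4-wide float vector standing in for AltiVec's vector float.
struct vfloat4 {
  std::array<float, 4> lane;
};

// Stand-in for vec_madd(a, b, c): a * b + c per lane with a single
// rounding (std::fma). It is the only arithmetic primitive assumed here.
inline vfloat4 madd(vfloat4 const& a, vfloat4 const& b, vfloat4 const& c) {
  vfloat4 r;
  for (std::size_t i = 0; i < 4; ++i)
    r.lane[i] = std::fma(a.lane[i], b.lane[i], c.lane[i]);
  return r;
}

// Broadcast a scalar to all four lanes.
inline vfloat4 splat(float x) {
  vfloat4 r;
  r.lane.fill(x);
  return r;
}

struct simd_traits_madd_only {
  typedef vfloat4 simd_type;

  // a + b computed as a * 1 + b.
  static simd_type add(simd_type const& a, simd_type const& b) {
    return madd(a, splat(1.0f), b);
  }

  // a * b computed as a * b + (-0.0); adding -0.0 preserves a -0.0 product.
  static simd_type mul(simd_type const& a, simd_type const& b) {
    return madd(a, b, splat(-0.0f));
  }

  // Maps directly onto the fused primitive.
  static simd_type fma(simd_type const& v1, simd_type const& v2,
                       simd_type const& v3) {
    return madd(v1, v2, v3);
  }
};
```

With this layout fma() is the cheapest of the three operations, which is why leaving it as an assert(0) stub on AltiVec is the odd one out.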
> + }
> +#else
> + // loop using proxy interface. This generates the best code
> + // with gcc 3.4 (with gcc 4.1 the difference to the first case
> + // above is negligible).
I thought this also generated the best code with gcc 4.1.
> + while (n >= vec_size)
> + {
> + lp.store(rp.load());
> + n -= vec_size;
> + lp.increment();
> + rp.increment();
> + }
> +#endif
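The proxy loop above handles only whole vectors. Below is a self-contained sketch of the same load/store/increment pattern. The `load_proxy` and `store_proxy` types, the fixed `vec_size` of 4, and the scalar tail loop for the remaining n < vec_size elements are all assumptions for illustration, since the quoted hunk doesn't show how the remainder is handled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vector width, in floats.
const std::size_t vec_size = 4;

// Hypothetical SIMD value: a fixed-width group of floats.
struct simd_type {
  float v[vec_size];
};

// Hypothetical read proxy over a contiguous float array.
struct load_proxy {
  float const* ptr;
  simd_type load() const {
    simd_type s;
    for (std::size_t i = 0; i < vec_size; ++i) s.v[i] = ptr[i];
    return s;
  }
  void increment() { ptr += vec_size; }
};

// Hypothetical write proxy over a contiguous float array.
struct store_proxy {
  float* ptr;
  void store(simd_type const& s) {
    for (std::size_t i = 0; i < vec_size; ++i) ptr[i] = s.v[i];
  }
  void increment() { ptr += vec_size; }
};

// Copy n floats from rhs to lhs: whole vectors through the proxies,
// then the remaining n < vec_size elements one at a time.
inline void proxy_assign(float* lhs, float const* rhs, std::size_t n) {
  store_proxy lp = {lhs};
  load_proxy rp = {rhs};
  while (n >= vec_size) {
    lp.store(rp.load());
    n -= vec_size;
    lp.increment();
    rp.increment();
  }
  for (std::size_t i = 0; i < n; ++i) lp.ptr[i] = rp.ptr[i];
}
```

Keeping the pointer arithmetic inside increment() lets the loop body stay identical for any proxy type, which is what makes the fused expression harness generic.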
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705