[vsipl++] patch: Enhancements to SIMD loop fusion

Thu Aug 17 11:19:15 UTC 2006

Stefan Seefeld wrote:
> The attached patch adds some optimizations as well as more functionality
> (support for complex types, as well as fused multiply-add) to the
> SIMD loop fusion harness.
> 
> As SSE(2) doesn't provide a fused multiply-add, the fma() implementation
> falls back on mul() and add(). For AltiVec, fma() still needs to be implemented.
> 
> No regressions were observed with gcc 4.1.
> 
> OK to commit?
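Below is a minimal sketch of the SSE(2) fallback described above, in which fma() is composed from mul() and add(). The 4-wide vector is emulated with std::array so the example runs on any target. The names vec4f and emulated_simd are hypothetical and are not the harness's actual traits class.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide float vector standing in for an SSE __m128.
using vec4f = std::array<float, 4>;

// Hypothetical traits class; the real harness's names may differ.
struct emulated_simd
{
  typedef vec4f simd_type;
  static constexpr std::size_t vec_size = 4;

  // Element-wise product (what _mm_mul_ps would do on real SSE).
  static simd_type mul(simd_type const& v1, simd_type const& v2)
  {
    simd_type r;
    for (std::size_t i = 0; i < vec_size; ++i) r[i] = v1[i] * v2[i];
    return r;
  }

  // Element-wise sum (what _mm_add_ps would do on real SSE).
  static simd_type add(simd_type const& v1, simd_type const& v2)
  {
    simd_type r;
    for (std::size_t i = 0; i < vec_size; ++i) r[i] = v1[i] + v2[i];
    return r;
  }

  // No fused instruction available: compose v1 * v2 + v3 from mul()
  // and add(). This rounds twice (after the multiply and after the add).
  static simd_type fma(simd_type const& v1, simd_type const& v2,
                       simd_type const& v3)
  { return add(mul(v1, v2), v3); }
};
```

Because the product is rounded before the add, this fallback can differ in the last bit from a true fused multiply-add such as AltiVec's vec_madd.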

Yes, please.  This looks good.  Thanks -- Jules

>  
> +  static simd_type fma(simd_type const& v1, simd_type const& v2,
> +		       simd_type const& v3)
> +  { assert(0); return v1; } // FIXME: needs to be implemented.

This can be implemented as:

	{ return vec_madd(v1, v2, v3); }

(Notice that add and mul are implemented in terms of vec_madd, because
AltiVec only has a fused multiply-add.)
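Building add and mul on vec_madd works because a single multiply-add primitive yields both operations through the choice of the extra operand. The scalar sketch below uses std::fma as a stand-in for vec_madd. The helpers madd, madd_add and madd_mul are hypothetical names. The code shows one common idiom, not the library's actual implementation.

```cpp
#include <cmath>

// Scalar stand-in for AltiVec's vec_madd: a * b + c with a single rounding.
inline float madd(float a, float b, float c) { return std::fma(a, b, c); }

// a + b == a * 1 + b. Multiplying by 1.0f is exact, so this is a plain add.
inline float madd_add(float a, float b) { return madd(a, 1.0f, b); }

// a * b == a * b + (-0.0f). Adding -0.0 rather than +0.0 preserves the
// sign of a zero product: (-0) + (-0) == -0, while (-0) + (+0) == +0.
inline float madd_mul(float a, float b) { return madd(a, b, -0.0f); }
```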

> +    }
> +#else
> +    // loop using proxy interface. This generates the best code
> +    // with gcc 3.4 (with gcc 4.1 the difference from the first case
> +    // above is negligible).

I thought this also generates the best code with 4.1.

> +    while (n >= vec_size)
> +    {
> +      lp.store(rp.load());
> +      n -= vec_size;
> +      lp.increment();
> +      rp.increment();
> +    }
> +#endif
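The sketch below is a self-contained version of the proxy-style loop quoted above, applied to a plain element copy. The load_proxy/store_proxy types, vec_size, and the scalar tail loop are assumptions made for illustration. The quoted excerpt does not show how the harness handles leftover elements.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vector width and emulated vector type.
std::size_t const vec_size = 4;
using vec4f = std::array<float, vec_size>;

// Hypothetical read-side proxy: loads one vector's worth of elements.
struct load_proxy
{
  float const* ptr;
  vec4f load() const
  {
    vec4f v;
    for (std::size_t i = 0; i < vec_size; ++i) v[i] = ptr[i];
    return v;
  }
  void increment() { ptr += vec_size; }
};

// Hypothetical write-side proxy: stores one vector's worth of elements.
struct store_proxy
{
  float* ptr;
  void store(vec4f const& v) const
  {
    for (std::size_t i = 0; i < vec_size; ++i) ptr[i] = v[i];
  }
  void increment() { ptr += vec_size; }
};

// Copy n elements from src to dst: whole vectors first, then a scalar tail.
inline void simd_copy(float* dst, float const* src, std::size_t n)
{
  store_proxy lp{dst};
  load_proxy  rp{src};
  while (n >= vec_size)
  {
    lp.store(rp.load());
    n -= vec_size;
    lp.increment();
    rp.increment();
  }
  // Assumed cleanup: the remaining n < vec_size elements, one at a time.
  for (std::size_t i = 0; i < n; ++i)
    lp.ptr[i] = rp.ptr[i];
}
```

Keeping the pointer arithmetic inside the proxies keeps the loop body free of index computations, which is the property the quoted comment credits for the better gcc 3.4 code.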

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705