[vsipl++] patch: SIMD loop fusion
Jules Bergmann
jules at codesourcery.com
Wed Jul 26 21:06:54 UTC 2006
Stefan Seefeld wrote:
> The attached patch adds support for SIMD (i.e. vectorized) loop fusion
> to our expression block evaluation harness. It is enabled for homogenous
> scalar expressions (i.e. all leaf blocks have float or double value_type)
> on platforms supporting SSE(2) instructions (such as 'gcc -msse2').
Stefan,
Excellent! I have some comments below and a general comment:
- We need to make sure that mixed types are handled properly
I.e. we need to avoid dispatching on expressions like:
Vector<float> fA, fB;
Vector<double> dC;
dC = fA + fB; // 1
fA = dC + fB; // 2
IIUC, the Proxy_factor dectects #2. We should extend the expression
evalutor to catch #1.
plus a couple of general nits:
- When you use single-char template parameters, can you document
what they are? i.e. "B is a block type"?
- There are a few 80+ character lines, can you split them?
Once these are fixed, please check this in.
thanks,
-- Jules
> ------------------------------------------------------------------------
> Index: src/vsip/impl/simd/expr_evaluator.hpp
> ===================================================================
B is a block? Can you capture that in a comment? Likewise below for
other single character template parameters (other than T).
> +template <typename B>
> +struct Proxy_factory
> +{
> + typedef Direct_access_traits<typename B::value_type> access_traits;
> + typedef Proxy<access_traits> proxy_type;
> + static bool const ct_valid = true;
I think this should check if block B has direct access.
ct_valid = Ext_data_cost<B>::value == 0;
> +
> + static bool
> + rt_valid(B const &b)
> + {
> + Ext_data<B> dda(b, SYNC_IN);
> + return dda.stride(0) == 1 &&
> + Simd_traits<typename B::value_type>::alignment_of(dda.data())
== 0;
> + }
> + static proxy_type
> + create(B const &b)
> + {
> + Ext_data<B> dda(b, SYNC_IN);
> + return proxy_type(dda.data());
> + }
> +};
> +template <typename LB,
> + typename RB>
> +struct Serial_expr_evaluator<1, LB, RB, Simd_tag>
> +{
> + static bool const ct_valid =
> + // Is SIMD supported at all ?
> + simd::Simd_traits<typename LB::value_type>::is_accel &&
> + // Check that direct access is possible.
> + Ext_data_cost<LB>::value == 0 &&
> + simd::Proxy_factory<RB>::ct_valid;
To be safe, this should check if LB::value_type == RB::value_type
> +
> + static bool rt_valid(LB& lhs, RB const& rhs)
> + {
> + Ext_data<LB> dda(lhs, SYNC_OUT);
> + return (dda.stride(0) == 1 &&
> + simd::Simd_traits<typename LB::value_type>::
> + alignment_of(dda.data()) == 0 &&
> + simd::Proxy_factory<RB>::rt_valid(rhs));
> + }
> +
> + static void exec(LB& lhs, RB const& rhs)
> + {
Maybe these types could named be LAT/RAT to follow LB/RB and li/ri?
> + typedef typename simd::LValue_access_traits<typename
LB::value_type> WAT;
> + typedef typename simd::Proxy_factory<RB>::access_traits EAT;
> + length_type const vec_size = simd::Simd_traits<typename
LB::value_type>::vec_size;
> + Ext_data<LB> dda(lhs, SYNC_OUT);
> + length_type size = dda.size(0) / vec_size;
> + simd::Proxy<WAT> lp(dda.data());
> + simd::Iterator<WAT> li(lp);
> + // Map the expression block to an expression iterator and loop.
> + simd::Iterator<EAT> ri(simd::Proxy_factory<RB>::create(rhs));
> + while (size--) *li++ = *ri++;
> + // Process the remainder, using simple loop fusion.
> + size = dda.size(0);
> + length_type remainder = size % vec_size;
> + for (index_type i = size - remainder; i != size; ++i) lhs.put(i,
rhs.get(i));
> + }
> Index: src/vsip/impl/simd/expr_iterator.hpp
> ===================================================================
> +template <typename T>
> +struct Unary_operator_map<T, op::Plus>
> +{
> + typedef typename Simd_traits<T>::simd_type simd_type;
> + static bool const is_supported = true;
> + static simd_type
> + apply(simd_type const &op)
> + { return Simd_traits<T>::add(Simd_traits<T>::zero(), op);}
This looks correct, but it would be faster to just return 'op' directly.
> +template <typename T>
> +class Proxy<Scalar_access_traits<T> >
> +{
> +public:
> + typedef T value_type;
> + typedef typename Simd_traits<value_type>::simd_type simd_type;
> +
> + Proxy(value_type value) : value_(value) {}
> +
> + simd_type load() const
> + { return Simd_traits<value_type>::load_scalar_all(value_);}
You could create a simd_valud_ member variable to avoid the call to
'load_scalar_all' each time load is called. This increases the SIMD
register pressure, but it would be a win in simple expressions like:
vector = scalar * vector;
> +
> + void increment() {}
> +
> +private:
> + value_type value_;
> +};
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
More information about the vsipl++
mailing list