[vsipl++] SIMD all unaligned dispatch

Jules Bergmann jules at codesourcery.com
Tue Jun 19 12:34:52 UTC 2007


Assem Salama wrote:
 > Everyone,
 >  This patch includes some missing pieces not included in previous patch.
 > This should make a fresh checkout compile ok :) I apologize for last
 > patch's incompleteness.

Assem,

What is the reason for extending the length of type_list?  Is that
needed for this patch?

Rather than add a new evaluator ("all unaligned"), I would like to
have a single evaluator handle the cases where views have the same
alignment (whether it is 0 or N).  The only difference between the two
is cleanup code before SIMD processing.  Can you make that change and
repost a patch?

Also, did you have a chance to benchmark the iterator change (#4)
below?

				-- Jules


 >
 > Thanks,
 > Assem
 >
 >
 > ------------------------------------------------------------------------
 >

 > Index: src/vsip/opt/simd/expr_evaluator.hpp
 > ===================================================================

 > +  static bool rt_valid(LB& lhs, RB const& rhs)
 > +  {
 > +    Ext_data<LB, layout_type> dda(lhs, SYNC_OUT);
 > +    int lhs_a = simd::Proxy_factory<LB,       true>::alignment(lhs);

[1] Instead of calling Proxy_factory::alignment (which internally
creates another Ext_data object -- which is both extra overhead and
potentially undefined), use Simd_traits::alignment_of directly.

 > +    return (dda.stride(0) == 1 &&
 > +            simd::Proxy_factory<RB, true>::rt_valid(rhs, lhs_a));
 > +
 > +
 > +  }
 > +

 > +    // First, deal with unaligned pointers
 > +    typename Ext_data<LB, layout_type>::raw_ptr_type  raw_ptr = 
dda.data();
 > +    while(simd::Simd_traits<typename 
LB::value_type>::alignment_of(raw_ptr) &&
 > +          n > 0)
 > +    {
 > +      lhs.put(size-n, rhs.get(size-n));
 > +      n--;
 > +      raw_ptr++;
 > +    }

[2] What updates the pointers held by lp and rp?  They are still
unaligned, right?

Ah, I see.  You've changed Proxy::Proxy to force alignment below.


 > Index: src/vsip/opt/simd/eval_generic.hpp
 > ===================================================================
 > --- src/vsip/opt/simd/eval_generic.hpp	(revision 174261)
 > +++ src/vsip/opt/simd/eval_generic.hpp	(working copy)
 > @@ -664,6 +664,8 @@
 >
 >    static bool rt_valid(DstBlock& dst, SrcBlock const& src)
 >    {
 > +    typedef simd::Simd_traits<typename SrcBlock::value_type> simd;
 > +
 >      // check if all data is unit stride
 >      Ext_data<DstBlock, dst_lp>     ext_dst(dst,              SYNC_OUT);
 >      Ext_data<Block1,   a_lp>       ext_a(src.first().left(), SYNC_IN);
 > @@ -672,7 +674,11 @@
 >             ext_a.stride(0) == 1 &&
 >  	   ext_b.stride(0) == 1 &&
 >  	   // make sure (A op B, A, k)
 > -	   (&(src.first().left()) == &(src.second())));
 > +	   (&(src.first().left()) == &(src.second())) &&
 > +	   // make sure everyting is aligned!
 > +	   !simd::alignment_of(ext_dst.data()) &&
 > +	   !simd::alignment_of(ext_a.data()) &&
 > +	   !simd::alignment_of(ext_b.data()));

[3] Doesn't threshold handle initial unaligned values?  If so, it is
sufficient to check that dst, a, and b all have the same alignment.



 >    static void exec(DstBlock& dst, SrcBlock const& src)
 > Index: src/vsip/opt/simd/expr_iterator.hpp
 > ===================================================================
 > --- src/vsip/opt/simd/expr_iterator.hpp	(revision 174261)
 > +++ src/vsip/opt/simd/expr_iterator.hpp	(working copy)
 > @@ -268,13 +268,14 @@
 >    simd_type load() const
 >    { return simd::perm(x0_, x1_, sh_); }
 >
 > -  void increment(length_type n = 1)
 > +  //void increment(length_type n = 1)
 > +  void increment()
 >    {
 > -    ptr_unaligned_ += n * Simd_traits<value_type>::vec_size;
 > -    ptr_aligned_   += n;
 > +    ptr_unaligned_ += Simd_traits<value_type>::vec_size;
 > +    ptr_aligned_++;
 >
 >      // update x0
 > -    x0_ = (n == 1)? x1_:simd::load((value_type*)ptr_aligned_);
 > +    x0_ = x1_;

[4] Did you ever benchmark the difference between these two?

 >
 > -  Proxy(value_type *ptr) : ptr_(ptr) {}
 > +  Proxy(value_type *ptr) : ptr_(ptr)
 > +  {
 > +    // Force alignment of pointer.
 > +    intptr_t int_ptr = (intptr_t)ptr_;
 > +    int_ptr &= ~(Simd_traits<value_type>::alignment-1);
 > +    ptr_ = (value_type*) int_ptr;
 > +  }
 > +

[5] For LValue_access_traits, this ignores the IsAligned template
parameter.  since we appear to only handle the case where the LHS
is aligned, we should specialize this for IsAligned = true.




-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705



More information about the vsipl++ mailing list