[vsipl++] SIMD threshold
Jules Bergmann
jules at codesourcery.com
Fri May 11 19:52:48 UTC 2007
Assem Salama wrote:
> Everyone,
> This patch implements a SIMD threshold using the ite operator.
Assem,
This looks good. I have a couple of commments below. Please
address those, then check it in.
Next, can you benchmark this on belgarath, our PowerPC system?
In particular, if you can compare performance with and without
the evlauator.
thanks,
-- Jules
>
> ------------------------------------------------------------------------
>
> Index: src/vsip/opt/simd/threshold.hpp
> ===================================================================
> +// Simd function to do threshold only when K is 0
> +template <typename T>
> +void simd_thresh0(T* Z, T* A, T* B, int n)
[1] For defensive programming, it is a good idea to make A and B
'const' so they aren't accidentally modified. Likewise below.
Hmmm, I notice that vmul also has them non-const (who wrote that? :).
Also, for coding standard consistency, the function name should be
at the start of the line, i.e.
void
simd_thresh0(...
Likewise below.
> +{
> + typedef Simd_traits<T> simd;
> + typedef Simd_traits<int> simdi;
> + typedef typename simd::simd_type simd_type;
> + typedef typename simdi::simd_type simd_itype;
[2] Since you're requiring the caller to guarentee that Z, A, and B
are all SIMD aligned (which is OK by the way), you should:
a) document it
b) check it with an assert
> +
> + simd::enter();
> +
> + while (n >= simd::vec_size)
> + {
> + n -= simd::vec_size;
> +
> + simd_type A_v = simd::load(A);
> + simd_type B_v = simd::load(B);
> + simd_itype mask = simd_itype(simd::gt(A_v,B_v));
> + simd_itype nmask = simdi::bnot(mask);
> + simd_itype res = simdi::band(simd_itype(A_v),nmask);
> + simd::store(Z,simd_type(res));
> +
> + A += simd::vec_size;
> + B += simd::vec_size;
> + Z += simd::vec_size;
> + }
> +
> + simd::exit();
> +}
> +template <typename T>
> +struct Simd_threshold<T, true>
> +{
> + static void exec(T* Z, T* A, T* B, T k, int n)
> + {
> + typedef Simd_traits<T> simd;
> + typedef Simd_traits<int> simdi;
> + typedef typename simd::simd_type simd_type;
> + typedef typename simdi::simd_type simd_itype;
> +
> + // handle mis-aligned vectors
> + if (simd::alignment_of(A) != simd::alignment_of(B) ||
> + simd::alignment_of(Z) != simd::alignment_of(A))
> + {
> + Simd_threshold<T,false>::exec(Z,A,B,k,n);
> + return;
> + }
> +
> + // clean up initial unaligned values
> + while (simd::alignment_of(A) != 0)
> + {
> + if(*A > *B) *Z = *A;
> + else *Z = k;
> + A++;B++;Z++;
> + n--;
> + }
> +
> + if (n == 0) return;
> +
> +
> + if(k != T(0)) {
> + simd_thresh0(Z,A,B,n);
> + } else {
> + simd_thresh(Z,A,B,k,n);
> + }
> +
> + // handle last bits
> + while(n)
[3] And 'n' is what at this point? :)
This cleanup code is going to recompute all the work done by
simd_thresh0 or simd_thresh.
You either need to push this cleanup code into simd_thresh
(which knows the value of 'n' after it has done as much SIMD
work as possible), or have simd_thresh return the new value
of n.
My preference would be to push this down into simd_thresh.
The sharing here is nice, but pushing it down makes simd_thresh
more self-contained.
I wouldn't pass n by reference. The compiler might not generate
optimal code if it is worried about n's value changing out from
underneath.
> + {
> + if(*A > *B) *Z = *A;
> + else *Z = k;
> + A++;B++;Z++;
> + n--;
> + }
> +
> + }
> +};
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
More information about the vsipl++
mailing list