[vsipl++] [patch] vmmul, IPP scalar-view, dense<2> opt.

Mark Mitchell mark at codesourcery.com
Wed Dec 21 22:21:42 UTC 2005


Jules Bergmann wrote:
> This patch contains several performance enhancements:
> 
>  - Optimization of dense 2-D and 3-D get and put.  Previously, it
>    tried to be clever and abstract the 2-D and 3-D varieties into
>    a get(Point) and put(Point), which were later "de-abstracted"
>    by Layout, by converting the point into a location in memory to
>    access.  Our compilers have trouble collapsing this.  Changing
>    the implementation to pass the indices directly to layout
>    improves performance.

That's too bad (about the compilers), but this is exactly the right
decision; the basic goal is to deliver something that works well for the
customers, and the customers don't care about the fact that the code
underneath is a little more verbose.

I suspect that the problem (FYI) is that compilers in general, and GCC
in particular, are not as good at eliminating structures as they are at
eliminating scalars.  When you bind things together into a point, you get
something that doesn't fit in registers, and compilers tend to drop these
things on the stack and then be unable to undo the damage.
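
To make the two access styles concrete, here is a minimal sketch, not the
actual VSIPL++ code.  Point2, RowMajorLayout, and Dense2 are hypothetical
names invented for illustration; the real Point, Layout, and Dense classes
are more general.  The sketch only shows the shape of the change: the same
element access written once through an aggregate and once with the indices
passed as scalars.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D index aggregate, standing in for the library's Point.
struct Point2 {
    std::size_t i, j;
};

// Hypothetical row-major layout: maps 2-D indices to a linear offset.
struct RowMajorLayout {
    std::size_t rows, cols;

    // Abstracted form: the indices arrive bundled in a struct.
    std::size_t index(Point2 const& p) const { return p.i * cols + p.j; }

    // Direct form: the indices arrive as plain scalars.
    std::size_t index(std::size_t i, std::size_t j) const { return i * cols + j; }
};

// Hypothetical dense 2-D block offering both accessor styles.
struct Dense2 {
    RowMajorLayout layout;
    std::vector<float> data;

    Dense2(std::size_t r, std::size_t c) : layout{r, c}, data(r * c, 0.0f) {}

    // Point-based accessors: the index pair travels as one aggregate.
    float get(Point2 const& p) const { return data[layout.index(p)]; }
    void put(Point2 const& p, float v) { data[layout.index(p)] = v; }

    // Index-based accessors: the indices are passed directly to the layout.
    float get(std::size_t i, std::size_t j) const { return data[layout.index(i, j)]; }
    void put(std::size_t i, std::size_t j, float v) { data[layout.index(i, j)] = v; }
};

// Sum every element through the Point-based path.
float sum_via_point(Dense2 const& m) {
    float s = 0.0f;
    for (std::size_t i = 0; i < m.layout.rows; ++i)
        for (std::size_t j = 0; j < m.layout.cols; ++j)
            s += m.get(Point2{i, j});
    return s;
}

// Sum every element through the direct-index path.
float sum_via_indices(Dense2 const& m) {
    float s = 0.0f;
    for (std::size_t i = 0; i < m.layout.rows; ++i)
        for (std::size_t j = 0; j < m.layout.cols; ++j)
            s += m.get(i, j);
    return s;
}
```

Both paths compute the same result; the only difference is whether the
optimizer first has to see through the temporary Point2 in the inner loop.
Whether a given compiler manages that depends on the compiler and its flags,
so any speed difference should be measured rather than assumed.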

>  - Dispatch matrix expressions through Serial_dispatch_evaluator.
> 
>    This, along with an additional transpose tag in LibraryTagList,
>    lets us plug fast transpose algorithms into the dispatch
>    framework.
> 
>  - Add evaluator to decompose vector-matrix multiply into individual
>    vector-vector or scalar-vector element-wise operations (depending on
>    whether the multiply is by-row or by-column and what the output
>    dimension-ordering is), which are themselves dispatched.  For IPP,
>    this results in better performance than plain loop fusion.  For
>    non-IPP, this results in better performance than loop fusion if the
>    decomposition is to scalar-vector (because this reduces the memory
>    bandwidth of the operation), or the same performance if the
>    decomposition is to vector-vector.
> 
>  - Add IPP dispatch for scalar-vector add, subtract, multiply, and
>    divide operations.  New scalar-view test for additional coverage.
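
The vector-matrix multiply decomposition described above can be sketched
roughly as follows.  This is an illustration under stated assumptions, not
the evaluator from the patch: vv_mul, sv_mul, Along, and vmmul_decomposed
are hypothetical names, the matrix is assumed to be row-major, and the plain
loops stand in for kernels that the real code would dispatch (for example
to IPP).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-vector element-wise multiply: out[k] = a[k] * b[k].
void vv_mul(float const* a, float const* b, float* out, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) out[k] = a[k] * b[k];
}

// Hypothetical scalar-vector multiply: out[k] = s * b[k].
// Reads one input stream instead of two, hence less memory traffic.
void sv_mul(float s, float const* b, float* out, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) out[k] = s * b[k];
}

// Which matrix dimension the vector is applied along (naming is an assumption).
enum class Along { Rows, Cols };

// Row-major matrix m of size rows x cols.
//   Along::Rows: out(i,j) = v[j] * m(i,j), v has cols elements
//                -> one vector-vector multiply per row.
//   Along::Cols: out(i,j) = v[i] * m(i,j), v has rows elements
//                -> one scalar-vector multiply per row.
std::vector<float> vmmul_decomposed(Along along,
                                    std::vector<float> const& v,
                                    std::vector<float> const& m,
                                    std::size_t rows, std::size_t cols) {
    std::vector<float> out(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        float const* src = m.data() + i * cols;  // start of row i in the input
        float* dst = out.data() + i * cols;      // start of row i in the output
        if (along == Along::Rows)
            vv_mul(v.data(), src, dst, cols);
        else
            sv_mul(v[i], src, dst, cols);
    }
    return out;
}
```

With a column-major output the roles would swap: the case that decomposes
to scalar-vector here would decompose to vector-vector, and vice versa.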

I'm nowhere near understanding the code, but I think these are all great
ideas!

-- 
Mark Mitchell
CodeSourcery, LLC
mark at codesourcery.com
(650) 331-3385 x713


