[patch] vmmul, IPP scalar-view, dense<2> opt.
Jules Bergmann
jules at codesourcery.com
Wed Dec 21 21:17:37 UTC 2005
This patch contains several performance enhancements:
- Optimization of dense 2-D and 3-D get and put. Previously, the
implementation tried to be clever and abstracted the 2-D and 3-D
varieties into a get(Point) and put(Point), which were later
"de-abstracted" by Layout, which converted the point into a location
in memory to access. Our compilers have trouble collapsing this.
Changing the implementation to pass the indices directly to layout
improves performance.
- Dispatch matrix expressions through Serial_dispatch_evaluator.
This, along with an additional transpose tag in LibraryTagList,
lets us plug fast transpose algorithms into the dispatch
framework.
- Add evaluator to decompose vector-matrix multiply into individual
vector-vector or scalar-vector element-wise operations (depending on
whether the multiply is by-row or by-column and what the output
dimension-ordering is), which are themselves dispatched. For IPP,
this results in better performance than plain loop fusion. For
non-IPP, this results in better performance than loop fusion if the
decomposition is to scalar-vector (because this reduces the memory
bandwidth of the operation), or the same performance if the
decomposition is to vector-vector.
- Add IPP dispatch for scalar-vector add, subtract, multiply, and
divide operations. New scalar-view test for additional coverage.
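To illustrate the dense get/put item, here is a minimal sketch of the two access paths. It is not the code in the patch: Point2, RowMajorLayout, Dense2 and get_via_point are hypothetical stand-ins, and a fixed row-major layout is assumed. The old path packs the indices into a point that the layout then unpacks; the new path hands the indices straight to the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D point type (not the library's Point).
struct Point2 { std::size_t idx[2]; };

// Hypothetical row-major layout: maps (row, col) to a linear offset.
struct RowMajorLayout {
  std::size_t rows, cols;

  // Old style: the indices arrive wrapped in a point and are unpacked here.
  std::size_t index(Point2 const& p) const { return p.idx[0] * cols + p.idx[1]; }

  // New style: the indices are passed directly.
  std::size_t index(std::size_t r, std::size_t c) const { return r * cols + c; }
};

// Hypothetical dense 2-D block of floats, zero-initialized.
class Dense2 {
public:
  Dense2(std::size_t rows, std::size_t cols)
    : layout_{rows, cols}, data_(rows * cols) {}

  // Old path: build a point, then let the layout take it apart again.
  float get_via_point(std::size_t r, std::size_t c) const {
    Point2 p = {{r, c}};
    return data_[layout_.index(p)];
  }

  // New path: no intermediate point object.
  float get(std::size_t r, std::size_t c) const {
    assert(r < layout_.rows && c < layout_.cols);
    return data_[layout_.index(r, c)];
  }

  void put(std::size_t r, std::size_t c, float v) {
    assert(r < layout_.rows && c < layout_.cols);
    data_[layout_.index(r, c)] = v;
  }

private:
  RowMajorLayout layout_;
  std::vector<float> data_;
};
```

Both paths compute the same offset; the difference is only how much abstraction the compiler has to see through in an inner loop.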
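The vector-matrix multiply decomposition can be sketched roughly as follows. This is an illustration under assumptions, not the evaluator in the patch: the names vv_mul, sv_mul, vmmul_along_rows and vmmul_along_cols are hypothetical, the matrix is assumed row-major, and the plain loops stand in for whatever kernel the dispatch would select (IPP or otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-vector element-wise multiply kernel.
void vv_mul(float const* a, float const* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i];
}

// Hypothetical scalar-vector multiply kernel.
void sv_mul(float s, float const* a, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = s * a[i];
}

// r(i,j) = v(j) * m(i,j): each output row is a vector-vector multiply
// of v with the matching row of m. Two input streams per row.
void vmmul_along_rows(std::vector<float> const& v, std::vector<float> const& m,
                      std::vector<float>& r, std::size_t rows, std::size_t cols) {
  assert(v.size() == cols && m.size() == rows * cols && r.size() == rows * cols);
  for (std::size_t i = 0; i < rows; ++i)
    vv_mul(v.data(), m.data() + i * cols, r.data() + i * cols, cols);
}

// r(i,j) = v(i) * m(i,j): each output row is a scalar-vector multiply
// of v(i) with the matching row of m. Only one input stream per row,
// since the scalar is read once.
void vmmul_along_cols(std::vector<float> const& v, std::vector<float> const& m,
                      std::vector<float>& r, std::size_t rows, std::size_t cols) {
  assert(v.size() == rows && m.size() == rows * cols && r.size() == rows * cols);
  for (std::size_t i = 0; i < rows; ++i)
    sv_mul(v[i], m.data() + i * cols, r.data() + i * cols, cols);
}
```

In the scalar-vector case the inner kernel reads one vector instead of two, which is the memory bandwidth reduction mentioned above. With a column-major output the roles of the two cases would swap.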
Ok to commit?
-- Jules
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20051221/8abd50a1/attachment.ksh>