[patch] Fix transpose
Jules Bergmann
jules at codesourcery.com
Tue Apr 11 12:22:38 UTC 2006
This patch fixes the illegal-instruction error with the new fast
transpose when running on EM64T machines. The code was using the macro
__amd64__ to determine whether 3DNow! instructions were supported. However,
this macro is defined when compiling for both EM64T and AMD64, and EM64T
processors do not implement 3DNow!. Now it uses the __3dNOW__ macro.
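A minimal sketch of the guard change, assuming GCC-style predefined macros. The names Transpose_kernel, select_kernel and transpose_generic are hypothetical, and the 3DNow! kernel itself is not shown; the point is only which macro decides whether it is selected.

```cpp
#include <cstddef>

// Hypothetical kernel-selection sketch; not the actual VSIPL++ code.
enum class Transpose_kernel { generic, three_dnow };

// Old guard (buggy): __amd64__ is defined for every x86-64 target,
// including Intel EM64T parts that have no 3DNow! instructions.
#if defined(__amd64__)
constexpr bool old_guard_picks_3dnow = true;
#else
constexpr bool old_guard_picks_3dnow = false;
#endif

// New guard: GCC defines __3dNOW__ only when 3DNow! code generation
// is enabled (e.g. -m3dnow, or an -march that implies it).
#if defined(__3dNOW__)
constexpr bool new_guard_picks_3dnow = true;
#else
constexpr bool new_guard_picks_3dnow = false;
#endif

constexpr Transpose_kernel select_kernel() {
  return new_guard_picks_3dnow ? Transpose_kernel::three_dnow
                               : Transpose_kernel::generic;
}

// Portable fallback used when 3DNow! is unavailable.
// in is rows x cols (row-major); out is cols x rows (row-major).
void transpose_generic(const float* in, float* out,
                       std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = in[r * cols + c];
}
```

On a default x86-64 build the old guard is true while the new one is false, which is exactly the EM64T case that produced the illegal instruction.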
It adds some options to the benchmark driver. The '-single SIZE' option
runs a single benchmark size with a loop count of 1. The center_range
option sweeps the problem sizes so that a specific non-power-of-2 value
(the center) is covered. This is useful for the HPEC corner-turn
benchmark, which uses a 50x5000 matrix.
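One way such a sweep could be generated, as a sketch only: the spacing rule below (the center scaled by powers of two) is an assumption, not necessarily what the driver does. The functions center_range and single_plan and the Sweep_plan struct are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical center_range sweep: sizes are the center scaled by
// powers of two, so the center itself (e.g. 5000 for the 50x5000
// corner turn) is always one of the measured sizes.
std::vector<std::size_t> center_range(std::size_t center, unsigned steps) {
  std::vector<std::size_t> sizes;
  for (unsigned k = steps; k > 0; --k) {  // sizes below the center
    std::size_t s = center >> k;          // center / 2^k, rounded down
    if (s > 0) sizes.push_back(s);
  }
  for (unsigned k = 0; k <= steps; ++k)   // the center and sizes above it
    sizes.push_back(center << k);         // center * 2^k
  return sizes;
}

// Hypothetical plan for '-single SIZE': one size, loop count 1.
struct Sweep_plan {
  std::vector<std::size_t> sizes;
  unsigned loops;
};

Sweep_plan single_plan(std::size_t size) { return Sweep_plan{{size}, 1}; }
```

With center 5000 and two steps on each side this yields 1250, 2500, 5000, 10000 and 20000, whereas a plain power-of-2 sweep would jump from 4096 to 8192 and never measure 5000.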
It adds several cases to the mpi_alltoall benchmark:
- MPI_Alltoallv case, with support for different sets of
source and destination processors (previously the benchmark required
that source and destination be the same)
- Extended persistent_x case, with support for different sets of
src/dst processors and an attempt to order messages to reduce
contention.
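The source does not say how the persistent_x case orders its messages. A common contention-reducing ordering, shown here as an assumption, is a rotation: at step s, source i sends to destination (i + s) mod n_dst. The function alltoall_schedule is hypothetical. In an MPI benchmark each step would map to point-to-point sends and receives (for a persistent variant, requests created with MPI_Send_init / MPI_Recv_init); MPI is omitted so the sketch compiles on its own.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rotation schedule for an all-to-all between n_src
// source ranks and n_dst destination ranks (the sets may differ).
// Over n_dst steps every source reaches every destination once and,
// when n_src <= n_dst, no two sources target the same destination
// in the same step.
std::vector<std::vector<std::size_t>>
alltoall_schedule(std::size_t n_src, std::size_t n_dst) {
  // schedule[s][i] = destination index that source i sends to at step s
  std::vector<std::vector<std::size_t>> schedule(
      n_dst, std::vector<std::size_t>(n_src));
  for (std::size_t s = 0; s < n_dst; ++s)
    for (std::size_t i = 0; i < n_src; ++i)
      schedule[s][i] = (i + s) % n_dst;
  return schedule;
}
```

A naive ordering where every source starts with destination 0 makes all sources hit the same receiver at once; the rotation spreads each step across distinct receivers.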
Finally, it updates the interface of Plain_block (a block used only for
testing) to make the Direct Data Access interface public. This is
necessary for subblocks to implement their own DDA. This bug in
Plain_block was exposed by earlier changes that use memcpy for matrix
copies when possible.
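A minimal sketch of why the DDA interface must be public, assuming a simple row-major block. Toy_block, Row_subblock, data(), stride() and copy_to are hypothetical names, not the VSIPL++ API: the subblock derives its own pointer and stride from the parent's, and a memcpy-based copy then relies on that pointer.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical test-only block with a direct data access (DDA) interface.
class Toy_block {
public:
  Toy_block(std::size_t rows, std::size_t cols)
    : rows_(rows), cols_(cols), storage_(rows * cols, 0.0f) {}

  void put(std::size_t r, std::size_t c, float v) { storage_[r * cols_ + c] = v; }
  float get(std::size_t r, std::size_t c) const { return storage_[r * cols_ + c]; }
  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

  // DDA interface. It must be public: a subblock builds its own DDA
  // from the parent's pointer and strides.
  float* data() { return storage_.data(); }
  std::ptrdiff_t stride(int dim) const {
    return dim == 0 ? static_cast<std::ptrdiff_t>(cols_) : 1;  // row-major
  }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<float> storage_;
};

// Hypothetical subblock viewing one row of a parent block.
class Row_subblock {
public:
  Row_subblock(Toy_block& parent, std::size_t row) : parent_(parent), row_(row) {}

  // Compiles only because Toy_block::data() and stride() are public.
  float* data() {
    return parent_.data() + static_cast<std::ptrdiff_t>(row_) * parent_.stride(0);
  }
  std::ptrdiff_t stride() const { return parent_.stride(1); }
  std::size_t size() const { return parent_.cols(); }

private:
  Toy_block& parent_;
  std::size_t row_;
};

// Copy a subblock into dst: memcpy for unit stride, element loop otherwise.
void copy_to(Row_subblock& src, float* dst) {
  float* p = src.data();
  if (src.stride() == 1) {
    std::memcpy(dst, p, src.size() * sizeof(float));
  } else {
    for (std::size_t i = 0; i < src.size(); ++i)
      dst[i] = p[static_cast<std::ptrdiff_t>(i) * src.stride()];
  }
}
```

If data() and stride() were private, Row_subblock::data() would fail to compile, which mirrors how the memcpy copy path surfaced the problem in Plain_block.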
-- Jules
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
Name: misc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060411/c2f330a2/attachment.ksh>
More information about the vsipl++ mailing list