[patch] Fix transpose

Jules Bergmann jules at codesourcery.com
Tue Apr 11 12:22:38 UTC 2006


This patch fixes the illegal instruction error that the new fast 
transpose triggers on EM64T machines.  The code used the __amd64__ 
macro to decide whether 3DNow! instructions were supported.  However, 
that macro is defined when compiling for both EM64T and AMD64, and 
EM64T processors do not implement 3DNow!.  The code now tests the 
__3dNOW__ macro instead.
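
As a rough illustration of the guard described above (a minimal sketch, 
not the code from the patch): the helper names use_3dnow_transpose and 
transpose_generic are hypothetical, and the sketch assumes GCC's 
behaviour of defining __3dNOW__ only when 3DNow! code generation is 
enabled (for example with -m3dnow).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the patch: true only when the
// compiler has 3DNow! code generation enabled.  Testing __amd64__ here
// would also be true on EM64T, which is what caused the illegal
// instruction.
constexpr bool use_3dnow_transpose()
{
#if defined(__3dNOW__)
  return true;
#else
  return false;
#endif
}

// Portable fallback: transpose a rows x cols row-major matrix.
// An optimized kernel would only be selected when
// use_3dnow_transpose() is true.
std::vector<float>
transpose_generic(std::vector<float> const& in,
                  std::size_t rows, std::size_t cols)
{
  std::vector<float> out(in.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = in[r * cols + c];
  return out;
}
```

Note that this is a compile-time check only: it reflects the flags the 
code was built with, not the processor it eventually runs on.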

It adds some options to the benchmark driver.  The '-single SIZE' option 
runs a single benchmark size with a loop count of 1.  The center_range 
option sweeps the problem sizes so that a specific non-power-of-2 value 
(the center) is included in the sweep.  This is useful for the HPEC 
corner-turn benchmark, which uses a 50x5000 matrix.
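
One way such a centered sweep could be generated is sketched below.  
This is an assumption for illustration, not necessarily what the 
benchmark driver does: the function name centered_sizes is 
hypothetical, and the sketch assumes the sweep scales the center by 
powers of two so that the middle step lands exactly on the center.

```cpp
#include <vector>

// Hypothetical sketch: produce one size per exponent in [start, stop],
// halving below the midpoint and doubling above it, so that the
// midpoint exponent maps exactly to `center`.
std::vector<unsigned>
centered_sizes(unsigned center, unsigned start, unsigned stop)
{
  std::vector<unsigned> sizes;
  unsigned const mid = (start + stop) / 2;
  for (unsigned i = start; i <= stop; ++i)
  {
    if (i < mid)
      sizes.push_back(center / (1u << (mid - i)));  // integer division
    else
      sizes.push_back(center * (1u << (i - mid)));
  }
  return sizes;
}
```

With center 5000 and exponents 0 through 4, this yields 1250, 2500, 
5000, 10000, 20000, so the 5000-element dimension of the corner-turn 
case is measured directly rather than bracketed by 4096 and 8192.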

It adds several cases to the mpi_alltoall benchmark:
  - MPI_Alltoallv case, with support for different sets of
    source and destination processors (previously the benchmark required
    that source and destination be the same)
  - Extended persistent_x case, with support for different sets of
    src/dst processors and an attempt to order messages to reduce
    contention.

Finally, it updates the interface of Plain_block (a block used only for 
testing) to make the Direct Data Access interface public.  This is 
necessary for subblocks to implement their own DDA.  This bug in 
Plain_block was exposed by previous changes to use memcpy for matrix 
copy when possible.

				-- Jules



-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060411/c2f330a2/attachment.ksh>

