From don at codesourcery.com  Mon Jun  2 04:51:35 2008
From: don at codesourcery.com (Don McCoy)
Date: Sun, 01 Jun 2008 22:51:35 -0600
Subject: [patch] fix in SAL backend for outer product
Message-ID: <48437C57.5070502@codesourcery.com>

This patch fixes a block dimension order check in the outer product
evaluator.  Note that this evaluator is only available when
VSIP_IMPL_SAL_USE_MAT_MUL is defined to something other than '0' (the
default).

As a result of the bug, the existing test coverage was missing the cases
where the output matrix was column-major in the smallest dimension.
Now those cases are getting picked up and validated.
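To show why the dimension-order check matters, here is a minimal sketch of a real outer product z = alpha * a * b^T written into a dense matrix that may be row-major or column-major. It is not the SAL evaluator: the `outer` function and the `Order` enum are hypothetical. Only the index mapping differs between the two orders, so a test suite that only ever produces one order never exercises the other branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag for the storage order of a dense M x N matrix.
enum class Order { row_major, col_major };

// Sketch: z(i,j) = alpha * a[i] * b[j] for real vectors, stored densely
// in the requested order.
std::vector<float> outer(float alpha,
                         std::vector<float> const& a,
                         std::vector<float> const& b,
                         Order order)
{
  std::size_t const m = a.size(), n = b.size();
  std::vector<float> z(m * n);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
    {
      // Row-major: unit stride along j.  Column-major: unit stride along i.
      std::size_t const idx = (order == Order::row_major) ? i * n + j
                                                          : j * m + i;
      z[idx] = alpha * a[i] * b[j];
    }
  return z;
}
```

For a = {1, 2} and b = {3, 4, 5}, the row-major result is {3, 4, 5, 6, 8, 10} and the column-major result is {3, 6, 4, 8, 5, 10}.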

Ok to commit?

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: so.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080601/4f6f1774/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: so.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080601/4f6f1774/attachment-0001.ksh>

From jules at codesourcery.com  Mon Jun  2 13:50:45 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 02 Jun 2008 09:50:45 -0400
Subject: [vsipl++] [patch] Kernel updates
In-Reply-To: <483F3DB9.9070102@codesourcery.com>
References: <483D79CC.9060804@codesourcery.com> <483D96FB.8030301@codesourcery.com> <483D9ADE.60202@codesourcery.com> <483F3DB9.9070102@codesourcery.com>
Message-ID: <4843FAB5.4070607@codesourcery.com>


>> It's sort of a moot point right now, as C++ kernels don't work with ALF 3.0.
> 
> */me looks at alf_functions.cpp in the SBIR demo code*
> 
> */me is rather confused*
> 
> As far as I can tell, C++ kernels seem to work fine with ALF 3.0, so
> long as the relevant functions (and the ALF macro stuff at the end) are
> wrapped in an 'extern "C"' block.

Thanks Brooks, I will poke further on this.

FWIW, before giving up on using C++ for the split-complex FFT kernel, I
wrapped the ALF macros in extern "C" but then ran into an ALF runtime error.
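The pattern Brooks describes can be sketched without ALF itself. ALF's own macros and registration are deliberately left out, and `my_kernel` and `kernel_impl::scale` are hypothetical names. The assumption is that the runtime finds entry points by their unmangled names: C++ features such as templates stay in ordinary C++ functions, and only the entry point gets C linkage.

```cpp
#include <cassert>
#include <cstddef>

// Ordinary C++: templates, overloading and namespaces are all fine here.
namespace kernel_impl
{
template <typename T>
void scale(T* data, std::size_t n, T factor)
{
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}
} // namespace kernel_impl

// Entry point with C linkage, so its symbol name is not mangled.
// (In a real kernel, the ALF macros would also go inside this block.)
extern "C"
{
int my_kernel(float* data, unsigned int n, float factor)
{
  assert(data != 0 || n == 0);
  kernel_impl::scale(data, n, factor);
  return 0;
}
}
```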

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon Jun  2 14:29:59 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 02 Jun 2008 10:29:59 -0400
Subject: [vsipl++] patch: enable/disable fftwf, fftw, and fftwl individually
In-Reply-To: <4841625E.7070505@codesourcery.com>
References: <483F91C5.5060701@codesourcery.com> <48405F4C.8040305@codesourcery.com> <48406A32.5030102@codesourcery.com> <48407438.7040301@codesourcery.com> <4841625E.7070505@codesourcery.com>
Message-ID: <484403E7.8030306@codesourcery.com>


> The 'provide_...' macros have values of either '0' or '1', but never ''.
> Thus, the original test:
> 
>>  if test "x$provide_fft_long_double" != "x"; then
> 
> will always succeed. I don't mind whether we use 0/1 or ''/'something', 
> but we should either test the real state or not test at all. So, the 
> least disruptive change would be just to take out the test, as it is 
> always true.

Ok, sounds like we're on the same page.

I was less concerned with removing the check than with changing the 
definition of the macro.

Instead of

>  if test "$neutral_acconfig" = 'y'; then
> -  if test "x$provide_fft_float" != "x"; then
> -    CPPFLAGS="$CPPFLAGS -DVSIP_IMPL_PROVIDE_FFT_FLOAT=$provide_fft_float"
> +  if test $provide_fft_float != 0; then
> +    CPPFLAGS="$CPPFLAGS -DVSIP_IMPL_PROVIDE_FFT_FLOAT=1"
>    fi

How about

>  if test "$neutral_acconfig" = 'y'; then
> -  if test "x$provide_fft_float" != "x"; then
>      CPPFLAGS="$CPPFLAGS -DVSIP_IMPL_PROVIDE_FFT_FLOAT=$provide_fft_float"
> -  fi

With that change, and some answer to the sh portability question (such as
telling me not to worry about it :), the patch looks good.
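Passing `$provide_fft_float` through unchanged means the macro is always defined, sometimes as `0`. The sketch below shows what that implies on the C++ side, assuming the headers test the macro's value with `#if` rather than its existence with `#ifdef` (this assumption has not been checked against the actual headers). `EXAMPLE_UNDEFINED_MACRO` is a hypothetical name.

```cpp
#include <cassert>

// Simulate configure passing -DVSIP_IMPL_PROVIDE_FFT_FLOAT=0.
#define VSIP_IMPL_PROVIDE_FFT_FLOAT 0

// Testing the value: a macro defined as 0 correctly disables the feature.
#if VSIP_IMPL_PROVIDE_FFT_FLOAT
bool const provide_by_value = true;
#else
bool const provide_by_value = false;
#endif

// Testing existence: the same macro looks "enabled" because it is defined.
#ifdef VSIP_IMPL_PROVIDE_FFT_FLOAT
bool const provide_by_existence = true;
#else
bool const provide_by_existence = false;
#endif

// In #if, an identifier that is not a macro evaluates to 0, so #if also
// behaves sensibly when the macro is absent altogether.
#if EXAMPLE_UNDEFINED_MACRO
bool const provide_undefined = true;
#else
bool const provide_undefined = false;
#endif
```

With values of 0 or 1 passed through, `#if` is the test that distinguishes the two states.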

					-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Tue Jun  3 22:46:26 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 03 Jun 2008 18:46:26 -0400
Subject: patch: enhancements for freqswap()
Message-ID: <4845C9C2.4090502@codesourcery.com>

The attached patch reimplements vsip::freqswap(), yielding two important 
changes:

* The recently discovered restriction that the input Block type must be 
allocatable is removed.

* It uses a return-block optimization, allowing it to avoid a temporary
view by using expression template techniques.

Note that the algorithm itself was slightly enhanced to allow it to operate
on a block in place, therefore removing the constraint that input and
output must not alias.
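The sketch below shows one way a 1-D swap can tolerate aliased input and output. It assumes the `fftshift`-style convention `out[i] = in[(i + n - n/2) % n]`; the exact odd-length rule is an assumption and should be checked against the VSIPL++ specification. The function `freqswap_1d` is hypothetical and works on a raw buffer rather than a VSIPL++ block. For odd lengths, the middle element is overwritten before it is needed, so it is saved first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a 1-D frequency swap that is safe when 'in' and 'out' alias.
// Convention assumed here: out[i] = in[(i + (n - n/2)) % n].
void freqswap_1d(float const* in, float* out, std::size_t n)
{
  assert((in != 0 && out != 0) || n == 0);
  std::size_t const h = n / 2;  // length of the first half
  std::size_t const r = n % 2;  // 1 when n is odd
  // For odd n, in[h] is overwritten by the first iteration, so save it.
  float const mid = r ? in[h] : 0.0f;
  for (std::size_t i = 0; i < h; ++i)
  {
    float const tmp = in[h + r + i];  // not yet overwritten
    out[h + i] = in[i];               // in[i] not yet overwritten
    out[i] = tmp;
  }
  if (r)
    out[n - 1] = mid;
}

// Out-of-place reference for the same convention, for checking.
std::vector<float> freqswap_reference(std::vector<float> v)
{
  std::rotate(v.begin(), v.begin() + (v.size() - v.size() / 2), v.end());
  return v;
}
```

Under this convention, {0, 1, 2, 3, 4} becomes {3, 4, 0, 1, 2} whether the call is in place or out of place.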

OK to commit ?

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freqswap.hpp.diff
Type: text/x-patch
Size: 9353 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080603/0aab19bd/attachment.bin>

From jules at codesourcery.com  Wed Jun  4 12:27:39 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 04 Jun 2008 08:27:39 -0400
Subject: [vsipl++] patch: enhancements for freqswap()
In-Reply-To: <4845C9C2.4090502@codesourcery.com>
References: <4845C9C2.4090502@codesourcery.com>
Message-ID: <48468A3B.7050409@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch reimplements vsip::freqswap(), yielding two important 
> changes:
> 
> * The recently discovered restriction that the input Block type must be 
> allocatable is removed.
> 
> * It uses a return-block optimization, allowing it avoid a temporary 
> view by using expression template techniques.

You're now a fully trained return-block optimization expert! :)

> Note that the algorithm itself was slightly enhanced to allow to operate 
> on a block in-place, therefor removing the constraint that input and 
> output must not alias.

Is the 2D algorithm enhanced too?  I could see the 1D algorithm was
enhanced, but wasn't sure about the 2D algorithm.

> 
> OK to commit ?
> 

Yes, this looks good.

				thanks,
				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Wed Jun  4 18:28:05 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 04 Jun 2008 14:28:05 -0400
Subject: [patch] Distributed transpose
Message-ID: <4846DEB5.9050300@codesourcery.com>

This patch fixes handling of distributed transposes of the form:

	A = B.transpose();

Ok to apply?
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: par-trans.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080604/6c21318b/attachment.ksh>

From don at codesourcery.com  Wed Jun  4 21:59:15 2008
From: don at codesourcery.com (Don McCoy)
Date: Wed, 04 Jun 2008 15:59:15 -0600
Subject: [patch] CML bindings for matrix transpose operations
Message-ID: <48471033.2060700@codesourcery.com>

This is patterned after the existing serial evaluator for transpose
operations using SIMD instructions, except that it dispatches operations
to CML.  One important difference is that it handles split complex as
well as interleaved.  Matrix copies are also performed (when the block
layouts match), but only if the strides are unit in the smallest
dimension, so as to have the potential to use the SPUs.
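Split complex stores the real and imaginary parts in two separate real arrays, while interleaved complex alternates them in a single array. The sketch below suggests why split support can come cheaply: a split-complex transpose is just the real transpose applied to each plane. `Split_matrix` and the `transpose_*` names are hypothetical, and plain loops stand in for the CML kernels.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Real M x N row-major transpose into an N x M row-major result.
void transpose_real(float const* a, float* z, std::size_t m, std::size_t n)
{
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      z[j * m + i] = a[i * n + j];
}

// Interleaved complex: one array of std::complex<float>.
void transpose_interleaved(std::complex<float> const* a,
                           std::complex<float>* z,
                           std::size_t m, std::size_t n)
{
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      z[j * m + i] = a[i * n + j];
}

// Split complex: separate real and imaginary planes (hypothetical type).
struct Split_matrix
{
  std::vector<float> re, im;
};

// A split transpose is simply the real transpose applied to each plane.
void transpose_split(Split_matrix const& a, Split_matrix& z,
                     std::size_t m, std::size_t n)
{
  assert(a.re.size() == m * n && a.im.size() == m * n);
  z.re.resize(m * n);
  z.im.resize(m * n);
  transpose_real(a.re.data(), z.re.data(), m, n);
  transpose_real(a.im.data(), z.im.data(), m, n);
}
```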

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ct.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080604/13773eed/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ct.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080604/13773eed/attachment-0001.ksh>

From stefan at codesourcery.com  Thu Jun  5 00:39:46 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 04 Jun 2008 20:39:46 -0400
Subject: [vsipl++] [patch] CML bindings for matrix transpose operations
In-Reply-To: <48471033.2060700@codesourcery.com>
References: <48471033.2060700@codesourcery.com>
Message-ID: <484735D2.3030507@codesourcery.com>

Don,

I have some small comments and questions (one for Jules). Overall this 
looks good.


Don McCoy wrote:
> This is patterned after the existing serial evaluator for transpose
> operations using SIMD instructions, except that it dispatches operations
> to CML.  One important difference is that it handles split complex as
> well as interleaved.  Matrix copies are also performed (when the block
> layouts match), but only if the strides are unit in the smallest
> dimension, so as to have the potential to use the SPU's.

> +// These macros support scalar and interleaved complex types

I don't find these macros very useful. Most of them you use exactly 
once. And even the ones you use twice would result in simpler code if 
spelled out. For example:

> +
> +#define VSIP_IMPL_CML_TRANS(T, FCN, CML_FCN)    \
> +  inline void                                   \
> +  FCN(                                          \
> +    T* a, ptrdiff_t rsa, ptrdiff_t csa,         \
> +    T* z, ptrdiff_t rsz, ptrdiff_t csz,         \
> +    size_t m, size_t n)                         \
> +  {                                             \
> +    typedef Scalar_of<T>::type CML_T;           \
> +    CML_FCN(                                    \
> +      reinterpret_cast<CML_T*>(a), rsa, csa,    \
> +      reinterpret_cast<CML_T*>(z), rsz, csz,    \
> +      m, n );                                   \
> +  }
> +
> +VSIP_IMPL_CML_TRANS(float,               transpose, cml_mtrans_f)
> +VSIP_IMPL_CML_TRANS(std::complex<float>, transpose, cml_cmtrans_f)
> +#undef VSIP_IMPL_CML_TRANS
> +

actually boils down to

inline void
transpose(float *a, ptrdiff_t rsa, ptrdiff_t csa,
           float *z, ptrdiff_t rsz, ptrdiff_t csz,
           size_t m, size_t n)
{
   cml_mtrans_f(a, rsa, csa, z, rsz, csz, m, n);
}

for the first case, which I find much more readable than the macro
code above.


> +  static bool rt_valid(DstBlock& dst, SrcBlock const& src)
> +  { 
> +    bool rt = true;
> +
> +    // If performing a copy, both source and destination blocks
> +    // must be unit stride.
> +    if (Type_equal<src_order_type, dst_order_type>::value)
> +    {
> +      Ext_data<DstBlock> dst_ext(dst, SYNC_OUT);
> +      Ext_data<SrcBlock> src_ext(src, SYNC_IN);

These objects only exist to check the strides, right ? I'm aware that we 
don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we 
? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this 
context. Jules ?

> +
> +      dimension_type const s_dim1 = src_order_type::impl_dim1;
> +      dimension_type const d_dim1 = src_order_type::impl_dim1;

Why two constants, if they hold the same value ?

> +      if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1)
> +        rt = false;
> +    }
> +
> +    return rt; 
> +  }


> +
> +  static void exec(DstBlock& dst, SrcBlock const& src, row2_type, row2_type)
> +  {
> +    vsip::impl::Ext_data<DstBlock> dst_ext(dst, vsip::impl::SYNC_OUT);
> +    vsip::impl::Ext_data<SrcBlock> src_ext(src, vsip::impl::SYNC_IN);
> +

Why the full qualification here (but not above) ? (I know, this is 
really picky, but I like compact and concise code. ;-) )


Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From jules at codesourcery.com  Thu Jun  5 13:50:44 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 05 Jun 2008 09:50:44 -0400
Subject: [vsipl++] [patch] CML bindings for matrix transpose operations
In-Reply-To: <484735D2.3030507@codesourcery.com>
References: <48471033.2060700@codesourcery.com> <484735D2.3030507@codesourcery.com>
Message-ID: <4847EF34.5020306@codesourcery.com>

Stefan, Don,

> 
> for the first case, which I find much more readable than the macro
> code above.

I agree that macros are pretty much always less readable than straight 
code (in fact the first thing I usually do when debugging a problem in a 
macro is manually expand it).

However, in defense of Don's approach, the macros will quickly scale 
when CML supports additional types (double, complex<double>, int, short,
etc.).  Moreover, the functions created by the macros don't do much
(just wrap a CML function with a consistent overloaded function 
interface), so there isn't much to understand.

The other way to get more mileage out of the macros is to use them for 
more than just a single function.  Transpose is an example of a 1-input, 
1-output matrix function.  Instead of calling the macro 
VSIP_IMPL_CML_TRANS, it could be called VSIP_IMPL_CML_MUNARY ("matrix 
function, unary ~ 1 argument").  Then the macro could be reused across 
mtrans, matrix copy (when we add it), and so on.  We don't have too many 
unary matrix functions yet, but the idea generalizes.
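Jules's generalization can be sketched as follows. All names here are hypothetical: `VSIP_IMPL_CML_MUNARY` is only the proposed macro name, `scalar_of` stands in for `Scalar_of`, and `fake_mtrans_f`/`fake_cmtrans_f` are stand-ins for the CML kernels so that the example compiles without CML (CML's real signatures may differ). Each macro instantiation adds one overload to a consistent C++ interface. The `reinterpret_cast` from `std::complex<float>*` to `float*` relies on the standard's guarantee that a `std::complex<float>` is laid out as two floats.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Stand-in for Scalar_of<T>: maps complex<T> to T.
template <typename T> struct scalar_of { typedef T type; };
template <typename T> struct scalar_of<std::complex<T> > { typedef T type; };

// Stand-in real kernel: z(j,i) = a(i,j) for an m x n matrix 'a'.
void fake_mtrans_f(float const* a, std::ptrdiff_t rsa, std::ptrdiff_t csa,
                   float* z, std::ptrdiff_t rsz, std::ptrdiff_t csz,
                   std::size_t m, std::size_t n)
{
  for (std::ptrdiff_t i = 0; i < std::ptrdiff_t(m); ++i)
    for (std::ptrdiff_t j = 0; j < std::ptrdiff_t(n); ++j)
      z[j * rsz + i * csz] = a[i * rsa + j * csa];
}

// Stand-in interleaved-complex kernel: strides count complex elements.
void fake_cmtrans_f(float const* a, std::ptrdiff_t rsa, std::ptrdiff_t csa,
                    float* z, std::ptrdiff_t rsz, std::ptrdiff_t csz,
                    std::size_t m, std::size_t n)
{
  for (std::ptrdiff_t i = 0; i < std::ptrdiff_t(m); ++i)
    for (std::ptrdiff_t j = 0; j < std::ptrdiff_t(n); ++j)
    {
      std::ptrdiff_t const s = 2 * (i * rsa + j * csa);
      std::ptrdiff_t const d = 2 * (j * rsz + i * csz);
      z[d] = a[s];          // real part
      z[d + 1] = a[s + 1];  // imaginary part
    }
}

// One macro for every 1-input, 1-output matrix function of this shape.
#define VSIP_IMPL_CML_MUNARY(T, FCN, CML_FCN)                \
  inline void                                                \
  FCN(T* a, std::ptrdiff_t rsa, std::ptrdiff_t csa,          \
      T* z, std::ptrdiff_t rsz, std::ptrdiff_t csz,          \
      std::size_t m, std::size_t n)                          \
  {                                                          \
    typedef scalar_of<T>::type CML_T;                        \
    CML_FCN(reinterpret_cast<CML_T*>(a), rsa, csa,           \
            reinterpret_cast<CML_T*>(z), rsz, csz, m, n);    \
  }

VSIP_IMPL_CML_MUNARY(float,               transpose, fake_mtrans_f)
VSIP_IMPL_CML_MUNARY(std::complex<float>, transpose, fake_cmtrans_f)
#undef VSIP_IMPL_CML_MUNARY
```

A matrix copy wrapper could reuse the same macro if its kernel shares this argument list.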

> 
> 
>> +  static bool rt_valid(DstBlock& dst, SrcBlock const& src)
>> +  { 
>> +    bool rt = true;
>> +
>> +    // If performing a copy, both source and destination blocks
>> +    // must be unit stride.
>> +    if (Type_equal<src_order_type, dst_order_type>::value)
>> +    {
>> +      Ext_data<DstBlock> dst_ext(dst, SYNC_OUT);
>> +      Ext_data<SrcBlock> src_ext(src, SYNC_IN);
> 
> These objects only exist to check the strides, right ? I'm aware that we 
> don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we 
> ? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this 
> context. Jules ?

I think SYNC_OUT and SYNC_IN make sense because they indicate how the 
data access will be used.

Because ct_valid includes a check that the cost of data access is 0, in 
rt_valid there is no danger that data access will require a copy.

Using SYNC_OUT and SYNC_IN allows the rt_valid declarations to exactly 
match those used in exec.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Thu Jun  5 14:06:13 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 05 Jun 2008 10:06:13 -0400
Subject: [vsipl++] [patch] CML bindings for matrix transpose operations
In-Reply-To: <48471033.2060700@codesourcery.com>
References: <48471033.2060700@codesourcery.com>
Message-ID: <4847F2D5.1090307@codesourcery.com>

Don McCoy wrote:
> This is patterned after the existing serial evaluator for transpose
> operations using SIMD instructions, except that it dispatches operations
> to CML.  One important difference is that it handles split complex as
> well as interleaved.  Matrix copies are also performed (when the block
> layouts match), but only if the strides are unit in the smallest
> dimension, so as to have the potential to use the SPU's.

Don,

This looks good.  I have a couple of minor comments below, but please
check it in.

				-- Jules


> +#define VSIP_IMPL_CML_COPY_UNIT(T, FCN, CML_FCN)   \
> +  inline void                                      \
> +  FCN(                                             \
> +    T* a, ptrdiff_t rsa,                           \
> +    T* z, ptrdiff_t rsz,                           \
> +    size_t n)                                      \
> +  {                                                \
> +    typedef Scalar_of<T>::type CML_T;              \
> +    CML_FCN(                                       \
> +      reinterpret_cast<CML_T*>(a), rsa,            \
> +      reinterpret_cast<CML_T*>(z), rsz,            \
> +      n * (Is_complex<T>::value ? 2 : 1));         \
> +  }
> +
> +VSIP_IMPL_CML_COPY_UNIT(float,          copy_unit, cml_vcopy_f)
> +VSIP_IMPL_CML_COPY_UNIT(complex<float>, copy_unit, cml_vcopy_f)

What does the '_unit' suffix indicate?  'copy_unit' appears to take a
vector with a stride.  It could easily handle a non-unit stride
vector.

For naming, you might call this function 'vcopy' to indicate that it
copies a vector (as opposed to a matrix).
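One plausible reading of the `_unit` suffix, offered as an interpretation rather than something the patch states: multiplying `n` by 2 for complex types treats `n` interleaved complex values as `2n` contiguous floats, which is only correct when the complex elements themselves are contiguous. The sketch below keeps that fast path for unit stride and handles other strides by copying the real and imaginary parts separately. `vcopy` follows the suggested name; `scalar_copy` is a hypothetical stand-in for `cml_vcopy_f`.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Stand-in for a strided real vector copy kernel (hypothetical).
void scalar_copy(float const* a, std::ptrdiff_t sa,
                 float* z, std::ptrdiff_t sz, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    z[std::ptrdiff_t(i) * sz] = a[std::ptrdiff_t(i) * sa];
}

// Real vectors: any stride works directly.
inline void vcopy(float const* a, std::ptrdiff_t sa,
                  float* z, std::ptrdiff_t sz, std::size_t n)
{
  scalar_copy(a, sa, z, sz, n);
}

// Interleaved complex: strides are given in complex elements.
inline void vcopy(std::complex<float> const* a, std::ptrdiff_t sa,
                  std::complex<float>* z, std::ptrdiff_t sz, std::size_t n)
{
  float const* fa = reinterpret_cast<float const*>(a);
  float* fz = reinterpret_cast<float*>(z);
  if (sa == 1 && sz == 1)
    // Contiguous: n complex values are exactly 2n contiguous floats.
    scalar_copy(fa, 1, fz, 1, 2 * n);
  else
  {
    // Strided: real parts, then imaginary parts, each with a float
    // stride of twice the complex stride.
    scalar_copy(fa,     2 * sa, fz,     2 * sz, n);
    scalar_copy(fa + 1, 2 * sa, fz + 1, 2 * sz, n);
  }
}
```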


> +    if      (s == 'r' && d == 'r')    return "Expr_Trans (rr copy)";
> +    else if (s == 'r' && d == 'c')    return "Expr_Trans (rc trans)";
> +    else if (s == 'c' && d == 'r')    return "Expr_Trans (cr trans)";
> +    else /* (s == 'c' && d == 'c') */ return "Expr_Trans (cc copy)";

You could rename these                           ^^^^^^^^^^
to something CML-specific, such as "Cml_tag matrix (rr copy)", etc.


> +  static bool rt_valid(DstBlock& dst, SrcBlock const& src)
> +  { 
...
> +      dimension_type const s_dim1 = src_order_type::impl_dim1;
> +      dimension_type const d_dim1 = src_order_type::impl_dim1;

As Stefan points out, this looks suspicious.  Did you really mean
*dst*_order_type for the second?

Also, because the copy sub-evaluators require the block to be dense
('assert(dst_ext.stride(0) == dst.size(2, 1))' etc.), shouldn't
rt_valid enforce this restriction too?

> +
> +      if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1)
> +        rt = false;
> +    }
> +
> +    return rt; 
> +  }
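Here is a minimal sketch of the check with both review points applied. It uses a hypothetical `Layout` struct in place of `Ext_data` and the order types. Each block's minor dimension is derived from that block's own order, and the copy path also requires dense blocks (unit minor stride, and a major stride equal to the minor dimension's size), mirroring the asserts in the copy sub-evaluators. Inside the copy branch the two orders are equal, so a single minor-dimension constant would give the same value, which is Stefan's point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2-D block's layout as seen through Ext_data.
struct Layout
{
  bool row_major;            // dimension order
  std::size_t rows, cols;
  std::ptrdiff_t stride[2];  // stride[0]: between rows, stride[1]: between columns
};

inline int minor_dim(Layout const& l) { return l.row_major ? 1 : 0; }
inline int major_dim(Layout const& l) { return l.row_major ? 0 : 1; }
inline std::size_t minor_size(Layout const& l) { return l.row_major ? l.cols : l.rows; }

// Dense: unit stride in the minor dimension and no padding between lines.
inline bool is_dense(Layout const& l)
{
  return l.stride[minor_dim(l)] == 1 &&
         l.stride[major_dim(l)] == static_cast<std::ptrdiff_t>(minor_size(l));
}

// Transposes are accepted unconditionally here; copies (same order on
// both sides) require both blocks to be dense.
inline bool rt_valid(Layout const& dst, Layout const& src)
{
  if (dst.row_major == src.row_major)  // copy case
    return is_dense(dst) && is_dense(src);
  return true;
}
```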


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Thu Jun  5 18:43:58 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 05 Jun 2008 14:43:58 -0400
Subject: [patch] Fix lapack MV and VM prod stride bug
Message-ID: <484833EE.4020504@codesourcery.com>

This patch fixes a stride bug in the lapack MV and VM prod evaluators. 
Don's more thorough tests for the Cell turned these up!

Patch applied.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lapack-prod.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080605/0e026c7e/attachment.ksh>

From stefan at codesourcery.com  Thu Jun  5 19:06:13 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 05 Jun 2008 15:06:13 -0400
Subject: [vsipl++] [patch] Distributed transpose
In-Reply-To: <4846DEB5.9050300@codesourcery.com>
References: <4846DEB5.9050300@codesourcery.com>
Message-ID: <48483925.8020703@codesourcery.com>

Jules Bergmann wrote:
> This patch fixes handling of distributed transposes of the form:
> 
>     A = B.transpose();
> 
> Ok to apply?


Jules,

the patch looks good.

There were a couple of changes of the form:

> -/* Copyright (c) 2005 by CodeSourcery, LLC.  All rights reserved. */
> +/* Copyright (c) 2005, 2008 by CodeSourcery, LLC.  All rights reserved. */
>  

where I'd like to suggest we remove the obsolete 'LLC.' while applying
other changes. (I don't think this is worth a full sweep over the sources !)

Also, this patch reminds me of two issues, which I hope we will find the 
time to address over the coming months:

* We need to address the many header interdependencies, in particular 
between the serial and the parallel set. There is some circularity that 
is very hard to break out of.

* There are sets of templates that form a self-contained block of
functionality that I think is important to document, both from a low-level
and from a middle-level view. I'm thinking in particular of the
Combine_return_type logic (which I have run across multiple times this
week already).


Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From jules at codesourcery.com  Thu Jun  5 20:32:22 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 05 Jun 2008 16:32:22 -0400
Subject: [vsipl++] [patch] Distributed transpose
In-Reply-To: <48483925.8020703@codesourcery.com>
References: <4846DEB5.9050300@codesourcery.com> <48483925.8020703@codesourcery.com>
Message-ID: <48484D56.4080905@codesourcery.com>


> * We need to address the many header interdependencies, in particular 
> between the serial and the parallel set. There is some circularity that 
> is very hard to break out of.

I completely agree.

Was there a particular dependency between the parallel/serial headers? 
Unfortunately the spec has a circular definition where maps return the 
processor set in a Vector, but Vectors have blocks, and blocks have 
maps, and maps have processor sets ... well you get the idea.

> 
> * There are sets of templates that form a self-contained block of 
> functionality that I think is important to document, both, from a low 
> level as well as middel-level view. I'm in particular thinking of the 
> Combine_return_type logic (which I have run across multiple times this 
> week already)

I completely agree too.

I'll take an action to add some documentation for Combine_return_type.

You'll notice I did add some gratuitous documentation to one of
Subset_map_decl's functors ;)  Let's take the opportunity, as we get
back into the library (relearning what we've written before but didn't
document), to add some documentation.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Thu Jun  5 20:46:27 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 05 Jun 2008 16:46:27 -0400
Subject: [vsipl++] [patch] Distributed transpose
In-Reply-To: <48484D56.4080905@codesourcery.com>
References: <4846DEB5.9050300@codesourcery.com> <48483925.8020703@codesourcery.com> <48484D56.4080905@codesourcery.com>
Message-ID: <484850A3.9060709@codesourcery.com>

Jules Bergmann wrote:
> 
>> * We need to address the many header interdependencies, in particular 
>> between the serial and the parallel set. There is some circularity 
>> that is very hard to break out of.
> 
> I completely agree.
> 
> Was there a particular dependency between the parallel/serial headers? 
> Unfortunately the spec has a circular definition where maps return the 
> processor set in a Vector, but Vectors have blocks, and blocks have 
> maps, and maps have processor sets ... well you get the idea.

Yes, that's the one. :-)

One issue that I have so far only been able to work around by injecting
"#include <vsip/parallel.hpp>" somewhere is that some
(expression-)block types use a Local_or_global_map, which derives from
Global_map, which comes from parallel/*.
Unless all relevant templates (traits, etc.) are seen by the
compiler, some operations may raise a compilation error claiming
that it's invalid to mix local and global blocks in assignments...

>> * There are sets of templates that form a self-contained block of 
>> functionality that I think is important to document, both, from a low 
>> level as well as middel-level view. I'm in particular thinking of the 
>> Combine_return_type logic (which I have run across multiple times this 
>> week already)
> 
> I completely agree too.
> 
> I'll take an action to add some documentation for Combine_return_type.
> 
> You'll notice I did add some gratuitous documentation to one of 
> Subset_map_decl's functors ;)  Let's take this opportunity as we get 
> back into the library (relearning what we've written before but didn't 
> document), to add some documentation.

Yes ! (I have a list of topics that I want to document myself, so I'm 
happy to wrap that up, too.)

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From don at codesourcery.com  Sat Jun  7 22:52:09 2008
From: don at codesourcery.com (Don McCoy)
Date: Sat, 07 Jun 2008 16:52:09 -0600
Subject: [patch] Fix for Fir destructor not getting called
Message-ID: <484B1119.8060907@codesourcery.com>

This patch fixes a problem in which the Ref_counted_ptr holder's count_
member was being incremented one too many times, resulting in the Fir
destructor not getting called.

The problem occurs because in the dispatch mechanism
(Evaluator::exec()), the Fir_impl backend is created and stored in a
Ref_counted_ptr object, then passed to the Fir class and stored in yet
another Ref_counted_ptr object.  This results in a reference count of 2
after creation, inhibiting the destructor from being called when the Fir
object goes out of scope.

While this does in fact fix the problem, I would like to verify that it
is the correct fix in this case.  Comments?

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rc.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080607/622ca9c0/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080607/622ca9c0/attachment-0001.ksh>

From stefan at codesourcery.com  Sun Jun  8 13:41:27 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Sun, 08 Jun 2008 09:41:27 -0400
Subject: [vsipl++] [patch] Fix for Fir destructor not getting called
In-Reply-To: <484B1119.8060907@codesourcery.com>
References: <484B1119.8060907@codesourcery.com>
Message-ID: <484BE187.6020200@codesourcery.com>

Don McCoy wrote:
> This patch fixes a problem in which the Ref_counted_ptr holder's count_
> member was being incremented one too many times, resulting in the Fir
> destructor not getting called.
> 
> The problem occurs because in the dispatch mechanism
> (Evaluator::exec()), the Fir_impl backend is created and stored in a
> Ref_counted_ptr object, then passed to the Fir class and stored in yet
> another Ref_counted_ptr object.  This results in a reference count of 2
> after creation, inhibiting the destructor from being called when the Fir
> object goes out of scope.
> 
> While this does in fact fix the problem, I would like to verify that it
> is the correct fix in this case.  Comments?

Yes, I believe this is the correct fix: the created objects are of type 
Ref_count, which initializes the counter to 1. When we pass the newly 
created object to Ref_counted_ptr, we really mean to hand the object's 
ownership to it, too, so we must *not* increment the counter.

I have to admit that I find the ref counting API still quite confusing...
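The ownership rule Stefan describes can be sketched with an intrusive reference count. Everything here is hypothetical (`Ref_count_base`, `Counted_ptr`, and the `adopt` tag are not the VSIPL++ API). A freshly created object already holds one reference, so a smart pointer that takes over that object must adopt the existing reference rather than add another. Copies of the smart pointer, by contrast, do increment the count.

```cpp
#include <cassert>

// Intrusive count: a newly created object already holds one reference.
class Ref_count_base
{
public:
  Ref_count_base() : count_(1) {}
  virtual ~Ref_count_base() {}
  void increment() { ++count_; }
  void decrement() { if (--count_ == 0) delete this; }
  int count() const { return count_; }
private:
  int count_;
};

struct adopt_t {};
adopt_t const adopt = adopt_t();

template <typename T>
class Counted_ptr
{
public:
  // Take over the creator's reference: no increment.
  Counted_ptr(T* p, adopt_t) : p_(p) { assert(p_); }
  // Share an object someone else still holds: one more reference.
  explicit Counted_ptr(T* p) : p_(p) { assert(p_); p_->increment(); }
  Counted_ptr(Counted_ptr const& o) : p_(o.p_) { p_->increment(); }
  Counted_ptr& operator=(Counted_ptr const& o)
  {
    o.p_->increment();  // increment first so self-assignment is safe
    p_->decrement();
    p_ = o.p_;
    return *this;
  }
  ~Counted_ptr() { p_->decrement(); }
  T* get() const { return p_; }
private:
  T* p_;
};
```

Wrapping a new object with the sharing constructor leaves its count at 2, so it is never destroyed when the last `Counted_ptr` goes away. That is the Fir symptom described above.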

Regards,
		Stefan


-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From jules at codesourcery.com  Mon Jun  9 21:09:47 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 09 Jun 2008 17:09:47 -0400
Subject: [patch] Fix eval diag when CML not present
Message-ID: <484D9C1B.8060405@codesourcery.com>

This patch disables some CML-specific diag bits when CML is not configured.

Patch applied.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diag.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080609/b8c3b712/attachment.ksh>

From jules at codesourcery.com  Tue Jun 10 15:46:58 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 10 Jun 2008 11:46:58 -0400
Subject: [patch] Fix freqswap errata
Message-ID: <484EA1F2.40706@codesourcery.com>

This patch fixes a couple of bugs in freqswap that broke the cheby 
window test:

  - In-place freqswap was broken for odd vector and matrix sizes.
    For vectors, this was straightforward to fix.

    For matrices, this requires either creating temporary vectors
    approximately the size of the number of rows and columns of the
    matrix, or doing the swap in two phases.  Under the rationale that
    memory allocation is to be avoided outside of early binding, I
    implemented the two phase swap.

    The vector fix was enough to get the cheby window test to pass.
    However, I also optimized cheby to use an out-of-place freqswap
    to avoid a copy, which turned up another bug:

  - Freqswap_functor stored the referee block as a reference.
    This works for by-reference blocks, but not by-value blocks,
    such as expressions.  Fixed by using appropriate View_block_storage
    traits.
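The dangling-reference problem can be sketched with a small traits class. The names `block_storage`, `Dense_block`, `Sum_expr`, and `Swap_functor` are hypothetical stand-ins for `View_block_storage`, a by-reference block, a by-value expression block, and `Freqswap_functor`. Long-lived blocks are held by reference, while lightweight expression blocks, which are often temporaries, are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A by-reference block: owns its data, cheap to refer to, costly to copy.
struct Dense_block
{
  std::vector<float> data;
  float get(std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// A by-value expression block: a small proxy that is often a temporary.
struct Sum_expr
{
  Dense_block const& a;
  Dense_block const& b;
  float get(std::size_t i) const { return a.get(i) + b.get(i); }
  std::size_t size() const { return a.size(); }
};

// Storage traits: reference by default, copy for expression blocks.
template <typename B> struct block_storage { typedef B const& type; };
template <> struct block_storage<Sum_expr> { typedef Sum_expr type; };

// Holding the referee through the traits means a temporary Sum_expr is
// copied into the functor instead of leaving a dangling reference.
template <typename B>
struct Swap_functor
{
  explicit Swap_functor(B const& b) : block(b) {}
  typename block_storage<B>::type block;
};
```

If `Swap_functor` stored a plain `B const&`, the temporary `Sum_expr` in the first usage below would be destroyed at the end of the declaration, leaving `f.block` dangling.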

This patch also extends the freqswap test to cover the in-place and RHS 
expression cases.
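The following sketch shows one way to realize a two-phase in-place matrix swap; it does not describe the committed code. It applies a 1-D in-place swap along every row and then along every column. Each phase saves only one element per line, so no temporary vectors are allocated. The sketch assumes the same odd-length convention as a 1-D `fftshift` (an assumption), and `freqswap_strided`, `freqswap_2d`, and `freqswap_2d_reference` are hypothetical names. The usage compares the in-place result against an out-of-place reference, similar in spirit to the in-place test cases mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place 1-D swap of n elements spaced 'stride' apart.
// Convention assumed: out[i] = in[(i + n - n/2) % n].
void freqswap_strided(float* p, std::size_t n, std::size_t stride)
{
  std::size_t const h = n / 2, r = n % 2;
  float const mid = r ? p[h * stride] : 0.0f;  // overwritten when i == 0
  for (std::size_t i = 0; i < h; ++i)
  {
    float const tmp = p[(h + r + i) * stride];
    p[(h + i) * stride] = p[i * stride];
    p[i * stride] = tmp;
  }
  if (r)
    p[(n - 1) * stride] = mid;
}

// Two-phase in-place swap of a row-major rows x cols matrix.
void freqswap_2d(std::vector<float>& m, std::size_t rows, std::size_t cols)
{
  assert(m.size() == rows * cols);
  if (m.empty())
    return;
  for (std::size_t r = 0; r < rows; ++r)  // phase 1: each row
    freqswap_strided(&m[r * cols], cols, 1);
  for (std::size_t c = 0; c < cols; ++c)  // phase 2: each column
    freqswap_strided(&m[c], rows, cols);
}

// Out-of-place reference:
// out(i,j) = in((i + rows - rows/2) % rows, (j + cols - cols/2) % cols).
std::vector<float> freqswap_2d_reference(std::vector<float> const& m,
                                         std::size_t rows, std::size_t cols)
{
  std::vector<float> out(m.size());
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      out[i * cols + j] = m[((i + rows - rows / 2) % rows) * cols
                            + (j + cols - cols / 2) % cols];
  return out;
}
```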

Patch applied.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fs.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080610/1bee920e/attachment.ksh>

From jules at codesourcery.com  Tue Jun 10 16:56:53 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 10 Jun 2008 12:56:53 -0400
Subject: [vsipl++] [patch] Fix freqswap errata
In-Reply-To: <484EA1F2.40706@codesourcery.com>
References: <484EA1F2.40706@codesourcery.com>
Message-ID: <484EB255.8050303@codesourcery.com>

This patch fixes a bug I just introduced that broke freqswap for cases 
where the input and output blocks have different types!

It would also be possible to cast the pointers to (void*) before comparing
them.  However, this approach allows the check to be avoided at compile time
when the blocks have different types.
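The trade-off can be sketched with two function templates. `may_alias` and `may_alias_void` are hypothetical helpers, not the code in fs2.diff. When the block types differ, only the general overload matches, and it returns a constant `false` that the compiler can fold away. When the types match, the more specialized overload compares addresses at run time. The `(void*)` alternative always performs a run-time comparison. Both sketches assume that blocks of different types never refer to the same object.

```cpp
#include <cassert>

// Different block types: assumed never to be the same object, so the
// answer is a constant the compiler can fold away.
template <typename B1, typename B2>
inline bool may_alias(B1 const&, B2 const&) { return false; }

// Same block type (more specialized, so preferred): compare addresses.
template <typename B>
inline bool may_alias(B const& a, B const& b) { return &a == &b; }

// The (void*) alternative: always a run-time address comparison.
template <typename B1, typename B2>
inline bool may_alias_void(B1 const& a, B2 const& b)
{
  return static_cast<void const*>(&a) == static_cast<void const*>(&b);
}
```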

Ok to apply?

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fs2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080610/2a5062af/attachment.ksh>

From stefan at codesourcery.com  Tue Jun 10 16:59:42 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 10 Jun 2008 12:59:42 -0400
Subject: [vsipl++] [patch] Fix freqswap errata
In-Reply-To: <484EB255.8050303@codesourcery.com>
References: <484EA1F2.40706@codesourcery.com> <484EB255.8050303@codesourcery.com>
Message-ID: <484EB2FE.1080607@codesourcery.com>

Jules Bergmann wrote:
> This patch fixes a bug I just introduced that broke freqswap for cases 
> where the input and output blocks have different types!
> 
> It would also be possible to cast the pointers to (void*).  However, 
> this approach allows the check to be avoided at compile-time when the 
> blocks have different types.
> 
> Ok to apply?

This looks good. Thanks !

		Stefan


-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From don at codesourcery.com  Wed Jun 11 04:42:23 2008
From: don at codesourcery.com (Don McCoy)
Date: Tue, 10 Jun 2008 22:42:23 -0600
Subject: [patch] CML backend fir FIR filters
Message-ID: <484F57AF.8070206@codesourcery.com>

This patch adds support for real, single-precision, floating-point FIR
filters using CML.

There is some overlap with another patch of mine (06/07: [patch] Fix for
Fir destructor not getting called) as well as one of Stefan's (06/09:
patch: Fix various FIR backends to support negative strides), but it's
minimal as most of my changes are in the file opt/cbe/cml/fir.hpp.  I
will merge before committing, once those patches are in.
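For readers unfamiliar with the operation being bound, here is a minimal plain C++ sketch of a real, single-precision, decimating FIR filter that carries state between calls. The class `Simple_fir` is hypothetical and does not reproduce the exact indexing or symmetry options of CML or VSIPL++. It assumes that the kernel is non-empty and that the input length is a multiple of the decimation factor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class Simple_fir
{
public:
  Simple_fir(std::vector<float> const& kernel, std::size_t decimation)
    : h_(kernel), d_(decimation),
      hist_(kernel.empty() ? 0 : kernel.size() - 1, 0.0f)
  { assert(!h_.empty() && d_ > 0); }

  // y[m] = sum_k h[k] * x[m*D - k]; samples before the current block
  // come from the history saved by the previous call (zeros initially).
  std::vector<float> operator()(std::vector<float> const& in)
  {
    assert(in.size() % d_ == 0);
    std::size_t const k_len = h_.size();
    // ext = saved history followed by the new input.
    std::vector<float> ext(hist_);
    ext.insert(ext.end(), in.begin(), in.end());
    std::vector<float> out(in.size() / d_);
    for (std::size_t m = 0; m < out.size(); ++m)
    {
      std::size_t const n = (k_len - 1) + m * d_;  // position of x[m*D] in ext
      float acc = 0.0f;
      for (std::size_t k = 0; k < k_len; ++k)
        acc += h_[k] * ext[n - k];
      out[m] = acc;
    }
    // Keep the last K-1 samples for the next call.
    std::copy(ext.end() - static_cast<std::ptrdiff_t>(hist_.size()),
              ext.end(), hist_.begin());
    return out;
  }

private:
  std::vector<float> h_;
  std::size_t d_;
  std::vector<float> hist_;
};
```

With kernel {1, 1} and no decimation, input {1, 2, 3} yields {1, 3, 5}. A following call with {4} yields {7}, because the saved sample 3 carries over.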

Ok to commit?

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fb.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080610/ff1f234f/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fb.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080610/ff1f234f/attachment-0001.ksh>

From jules at codesourcery.com  Wed Jun 11 12:02:33 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 11 Jun 2008 08:02:33 -0400
Subject: [vsipl++] [patch] fix in SAL backend for outer product
In-Reply-To: <48437C57.5070502@codesourcery.com>
References: <48437C57.5070502@codesourcery.com>
Message-ID: <484FBED9.6000702@codesourcery.com>

Don McCoy wrote:
> This patch fixes a block dimension order check in the outer product
> evaluator.  Note that this evaluator is only available when
> VSIP_IMPL_SAL_USE_MAT_MUL is defined to something other than '0' (the
> default).
> 
> As a result of the bug, the existing test coverage was missing the cases
> where the output matrix was  column-major in the smallest dimension. 
> Now those cases are getting picked up and validated.
> 
> Ok to commit?

Don, Yes, this looks good, please check it in.  -- Jules
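To make the dimension-order issue concrete: an outer product evaluator that writes through a raw pointer has to map element (i, j) to a different offset depending on whether the output block is row-major or column-major, so it must know the block's dimension order. The sketch below is a generic illustration, not the SAL evaluator. The names `Order` and `outer_product` are hypothetical, and it handles real data only (for complex types the VSIPL++ outer product uses the conjugate of the second vector).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage-order tag for a dense 2-D output buffer.
enum class Order { row_major, col_major };

// Writes r(i, j) = alpha * a[i] * b[j] into an m x n buffer laid out in
// the given order. Real-valued only; a complex version would use conj(b[j]).
void outer_product(float alpha,
                   std::vector<float> const &a,
                   std::vector<float> const &b,
                   std::vector<float> &r,
                   Order order)
{
  std::size_t const m = a.size();
  std::size_t const n = b.size();
  r.assign(m * n, 0.f);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
    {
      // Row-major: rows are contiguous; column-major: columns are contiguous.
      std::size_t const offset =
        (order == Order::row_major) ? i * n + j : j * m + i;
      r[offset] = alpha * a[i] * b[j];
    }
}
```

With a = {1, 2} and b = {3, 4, 5}, the row-major buffer holds 3 4 5 6 8 10, while the column-major buffer holds the same matrix as 3 6 4 8 5 10.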

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Wed Jun 11 12:45:14 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 11 Jun 2008 08:45:14 -0400
Subject: [vsipl++] [patch] CML backend fir FIR filters
In-Reply-To: <484F57AF.8070206@codesourcery.com>
References: <484F57AF.8070206@codesourcery.com>
Message-ID: <484FC8DA.1020206@codesourcery.com>

Don McCoy wrote:
> This patch adds support for real, single-precision, floating point FIR
> filters using CML. 
> 
> There is some overlap with another patch of mine (06/07: [patch] Fix for
> Fir destructor not getting called) as well as one of Stefan's (06/09:
> patch: Fix various FIR backends to support negative strides), but it's
> minimal as most of my changes are in the file opt/cbe/cml/fir.hpp.  I
> will merge before committing, once those patches are in.

Don,

This looks good, along with the reference counting and stride fixes.

Please apply.

				thanks,
				-- Jules

> +  Fir_impl(Fir_impl const &fir)
> +    : base(fir),
> +      fir_obj_ptr_(NULL),
> +      filter_state_(fir.filter_state_)
> +  {
> +    fir_create(
> +      &fir_obj_ptr_,
> +      fir.fir_obj_ptr_->K,

CML objects are intended to be opaque.

Let's keep this for now, but create an issue (#175) to add a new CML
attribute function that returns a pointer to the kernel coefficients.

	float*
	cml_fir_attr_kernel_f(
	   cml_fir_f* obj);


> +      1, // kernel stride
> +      this->decimation(),
> +      this->filter_state_,
> +      this->kernel_size(),
> +      this->input_size());
> +  }


> +template <typename T, symmetry_type S, obj_state C> 
> +struct Evaluator<Fir_tag, Cml_tag,
> +                 Ref_counted_ptr<Fir_backend<T, S, C> >
> +                 (aligned_array<T>, 
> +                  length_type, length_type, length_type,
> +                  unsigned, alg_hint_type)>
> +{
> +  static bool const ct_valid = // false;

Some left-over debug code         ^^^^^^^^^

> +    Type_equal<T, float>::value;
> +
> +  typedef Ref_counted_ptr<Fir_backend<T, S, C> > return_type;
> +  // rt_valid takes the first argument by reference to avoid taking
> +  // ownership.
> +  static bool rt_valid(aligned_array<T> const &, length_type k,
> +                       length_type i, length_type d,
> +                       unsigned, alg_hint_type)
> +  {
> +    length_type o = k * (1 + (S != nonsym)) - (S == sym_even_len_odd) - 1;
> +    assert(i > 0); // input size
> +    assert(d > 0); // decimation
> +    assert(o + 1 > d); // M >= decimation
> +    assert(i >= o);    // input_size >= M 
> +
> +    length_type output_size = (i + d - 1) / d;
> +    return i == output_size * d;

What is rt_valid checking exactly?

I think I see.  Does CML FIR not work on fixed size inputs if i % d !=
0?  Argh!  That's a mistake on my part with the API design.  I wonder
how often such cases happen in practice.

Regardless, can you add a comment to that effect?

"CML FIR objects have fixed output size, whereas VSIPL++ FIR objects
have fixed input size.  If input size is not a multiple of the
decimation, output size will vary from frame to frame."


> +  }
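The point in the suggested comment can be checked with a little arithmetic. With a fixed input frame of length i and decimation d, a decimating filter that carries its phase across frames emits a different number of outputs per frame unless i % d == 0. The sketch below counts outputs per frame, assuming outputs fall on absolute input positions that are multiples of d; the function names are hypothetical. It also shows that the rt_valid test in the patch, `i == ((i + d - 1) / d) * d`, is the same condition as `i % d == 0`.

```cpp
#include <cassert>
#include <cstddef>

// Number of outputs a decimating filter emits for input frame 'frame'
// of fixed length n, assuming outputs fall on absolute input indices
// that are multiples of d (phase carried across frames).
std::size_t outputs_in_frame(std::size_t frame, std::size_t n, std::size_t d)
{
  assert(d > 0);
  std::size_t const begin = frame * n;  // first absolute index in frame
  std::size_t const end = begin + n;    // one past the last index
  // Multiples of d in [begin, end) = ceil(end / d) - ceil(begin / d).
  return (end + d - 1) / d - (begin + d - 1) / d;
}

// The rt_valid condition as written in the patch.
bool rt_valid_form(std::size_t i, std::size_t d)
{
  std::size_t const output_size = (i + d - 1) / d;
  return i == output_size * d;
}

// The same condition stated directly.
bool divides(std::size_t i, std::size_t d) { return i % d == 0; }
```

With i = 1024 and d = 3, frames 0, 1 and 2 yield 342, 341 and 341 outputs, so a fixed per-frame output extent of (1024 + 3 - 1) / 3 = 342 is only right for the first frame.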

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Wed Jun 11 14:33:53 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 11 Jun 2008 10:33:53 -0400
Subject: [patch] Fix FFTW3 macro checks
Message-ID: <484FE251.5030705@codesourcery.com>

The FFTW_HAVE_{*} macros are either defined to 1 or left undefined. 
However, they were being checked with "#if", which caused an error 
when they were left undefined.  This patch changes the guard to #ifdef instead.

Since I twisted Stefan's arm :) into keeping the PROVIDE_FFT_{*} macros 
set to either 1 or 0, I just want to make sure this looks OK.
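The two macro conventions call for different guards. A macro that is either 1 or left undefined is tested with #ifdef (or `#if defined(...)`), while a macro that is always defined to 1 or 0 is tested by value with #if. (Standard C and C++ evaluate an undefined identifier in an #if expression as 0, but compilers can warn about it, for example GCC's -Wundef, and such warnings may be promoted to errors.) The macro names below are hypothetical stand-ins, not the real VSIPL++ or FFTW configuration macros.

```cpp
#include <cassert>

// Convention A: defined to 1 when supported, otherwise left undefined.
#define EXAMPLE_FFTW3_HAVE_FLOAT 1
// EXAMPLE_FFTW3_HAVE_LONG_DOUBLE is intentionally not defined.

// Convention B: always defined, to either 1 or 0.
#define EXAMPLE_PROVIDE_FFT_FLOAT 1
#define EXAMPLE_PROVIDE_FFT_LONG_DOUBLE 0

// Convention A is guarded by definition, never by value.
#ifdef EXAMPLE_FFTW3_HAVE_FLOAT
constexpr bool have_float = true;
#else
constexpr bool have_float = false;
#endif

#ifdef EXAMPLE_FFTW3_HAVE_LONG_DOUBLE
constexpr bool have_long_double = true;
#else
constexpr bool have_long_double = false;
#endif

// Convention B is guarded by value; #ifdef would be true for both macros.
#if EXAMPLE_PROVIDE_FFT_FLOAT
constexpr bool provide_float = true;
#else
constexpr bool provide_float = false;
#endif

#if EXAMPLE_PROVIDE_FFT_LONG_DOUBLE
constexpr bool provide_long_double = true;
#else
constexpr bool provide_long_double = false;
#endif
```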

Ok to apply?

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftw3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080611/028416a8/attachment.ksh>

From don at codesourcery.com  Wed Jun 11 22:44:40 2008
From: don at codesourcery.com (Don McCoy)
Date: Wed, 11 Jun 2008 16:44:40 -0600
Subject: [vsipl++] [patch] CML backend fir FIR filters
In-Reply-To: <484FC8DA.1020206@codesourcery.com>
References: <484F57AF.8070206@codesourcery.com> <484FC8DA.1020206@codesourcery.com>
Message-ID: <48505558.9080803@codesourcery.com>

Jules Bergmann wrote:
>   
>> +  Fir_impl(Fir_impl const &fir)
>> +    : base(fir),
>> +      fir_obj_ptr_(NULL),
>> +      filter_state_(fir.filter_state_)
>> +  {
>> +    fir_create(
>> +      &fir_obj_ptr_,
>> +      fir.fir_obj_ptr_->K,
>>     
>
> CML objects are intended to be opaque.
>
> Let's keep this for now, but create an issue (#175) to add a new CML
> attribute function that returns a pointer to the kernel coefficients.
>
> 	float*
> 	cml_fir_attr_kernel_f(
> 	   cml_fir_f* obj);
I should have mentioned this.  It seemed "wrong" to dip beneath the CML
API to get what I wanted, but the above fix will take care of it.  I'll
do this soon.
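The opaque-object point can be illustrated with a toy C-style API: users hold a pointer to an incomplete type and read attributes only through accessor functions, so a copy is built from those accessors rather than from struct fields. Everything below (`toy_fir`, `toy_fir_create`, `toy_fir_attr_kernel`, and so on) is hypothetical and only mirrors the shape of the `cml_fir_attr_kernel_f` accessor proposed above; it is not CML's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- Public interface: users only see an incomplete type. ---
struct toy_fir;
toy_fir *toy_fir_create(float const *kernel, std::size_t kernel_size,
                        std::size_t decimation);
void toy_fir_destroy(toy_fir *obj);
float const *toy_fir_attr_kernel(toy_fir const *obj);
std::size_t toy_fir_attr_kernel_size(toy_fir const *obj);
std::size_t toy_fir_attr_decimation(toy_fir const *obj);

// --- Private implementation, normally hidden inside the library. ---
struct toy_fir
{
  std::vector<float> kernel;
  std::size_t decimation;
};

toy_fir *toy_fir_create(float const *kernel, std::size_t kernel_size,
                        std::size_t decimation)
{
  return new toy_fir{std::vector<float>(kernel, kernel + kernel_size),
                     decimation};
}

void toy_fir_destroy(toy_fir *obj) { delete obj; }

float const *toy_fir_attr_kernel(toy_fir const *obj)
{ return obj->kernel.data(); }

std::size_t toy_fir_attr_kernel_size(toy_fir const *obj)
{ return obj->kernel.size(); }

std::size_t toy_fir_attr_decimation(toy_fir const *obj)
{ return obj->decimation; }

// --- Client code: a copy built only from the public accessors. ---
toy_fir *toy_fir_clone(toy_fir const *src)
{
  return toy_fir_create(toy_fir_attr_kernel(src),
                        toy_fir_attr_kernel_size(src),
                        toy_fir_attr_decimation(src));
}
```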


> What is rt_valid checking exactly?
>
> I think I see.  Does CML FIR not work on fixed size inputs if i % d !=
> 0?  Argh!  That's a mistake on my part with the API design.  I wonder
> how often such cases happen in practice.
>   
Several instances in our test harness were invoking the fir_opt
evaluator instead of ours.  I looked into it and found that this rt_valid
check is the reason.

> Regardless, can you add a comment to that effect?
>
> "CML FIR objects have fixed output size, whereas VSIPL++ FIR objects
> have fixed input size.  If input size is not a multiple of the
> decimation, output size will vary from frame to frame."
>
>
>   
I'll add the comment and leave it that way for now.  I'm not 100% sure
that CML won't do what we want, but I did try it.  It leads to this
error because it ends up exceeding the length of the output vector:


i = 1024, os*d = 1026
fir: ../vpp/src/vsip/core/subblock.hpp:293:
vsip::impl::Subset_block<Block>::Subset_block(const
vsip::Domain<vsip::impl::Subset_block<Block>::dim>&, Block&) [with Block = vsip::Dense<1u, float, vsip::tuple<0u, 1u, 2u>, vsip::Local_map>]:
Assertion `dom_[d].size() == 0 || dom_[d].impl_last() < blk_->size(dim,
d)' failed.
/bin/sh: line 1: 14399 Aborted                 ./tests/fir


The chained FIR filter test is written as

  vsip::length_type got1a = 0;
  for (vsip::length_type i = 0; i < 2 * M; ++i) // chained
  {
    got1a += fir1a(
      input(vsip::Domain<1>(i * N, 1, N)),
      output1(vsip::Domain<1>(got1a, 1, (N + D - 1) / D)));
  }

The value got1a increases by the same amount on every iteration, because
the operator() function (which calls cml_fir_apply_f()) always returns
the same size.

So it is as you suggested -- that CML essentially expects to process
even multiples of the output size each time.  It would take a change in
the apply function to make this work correctly -- such that 'apply'
returns the correct number of output values calculated on each
iteration.  Yet this might not be all that simple.
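The apply-side change described here, an apply call that reports how many outputs each frame actually produced, can be sketched with a toy filter that carries both its input history and its decimation phase across calls. The chained test above relies on the same idea, since it accumulates the count returned by operator(). This is an illustration only, not cml_fir_apply_f or the VSIPL++ implementation; the class name is hypothetical, and it assumes a state-saving filter with zero initial history.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy state-saving decimating FIR (hypothetical; not CML or VSIPL++).
// Computes y[k] = sum_j h[j] * x[k*d - j], with x[n] = 0 for n < 0,
// carrying input history and decimation phase across calls.
class Toy_decimating_fir
{
public:
  Toy_decimating_fir(std::vector<float> const &kernel, std::size_t decimation)
    : h_(kernel), d_(decimation), consumed_(0)
  {
    assert(!h_.empty() && d_ > 0);
    history_.assign(h_.size() - 1, 0.f);  // zero history before first frame
  }

  // Filters one frame, replaces 'out' with the outputs it produced,
  // and returns how many there were.
  std::size_t apply(std::vector<float> const &in, std::vector<float> &out)
  {
    std::size_t const m = h_.size();
    out.clear();
    for (std::size_t n = 0; n < in.size(); ++n)
    {
      if ((consumed_ + n) % d_ != 0)
        continue;                         // not an output position
      float sum = 0.f;
      for (std::size_t j = 0; j < m; ++j)
        sum += h_[j] * (j <= n ? in[n - j]                   // this frame
                               : history_[m - 1 - j + n]);   // earlier frames
      out.push_back(sum);
    }
    // Save the last m-1 inputs (old history followed by this frame).
    std::vector<float> all(history_);
    all.insert(all.end(), in.begin(), in.end());
    history_.assign(all.end() - static_cast<std::ptrdiff_t>(m - 1), all.end());
    consumed_ += in.size();
    return out.size();
  }

private:
  std::vector<float> h_;        // kernel coefficients
  std::size_t d_;               // decimation factor
  std::vector<float> history_;  // last m-1 inputs seen
  std::size_t consumed_;        // total inputs consumed so far
};
```

With kernel {1, 1}, decimation 2 and frames of length 3, successive calls return 2, 1 and 2 outputs, and the first output of the third frame combines its first input with the last input of the second frame taken from the saved history.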

Just an historical note: I copied this implementation, including the
rt_valid check, from the IPP implementation.  Perhaps they had a good
reason for having that restriction as well...


-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From don at codesourcery.com  Wed Jun 11 23:12:59 2008
From: don at codesourcery.com (Don McCoy)
Date: Wed, 11 Jun 2008 17:12:59 -0600
Subject: [vsipl++] [patch] CML backend fir FIR filters
In-Reply-To: <484F57AF.8070206@codesourcery.com>
References: <484F57AF.8070206@codesourcery.com>
Message-ID: <48505BFB.40806@codesourcery.com>

Don McCoy wrote:
> This patch adds support for real, single-precision, floating point FIR
> filters using CML. 
>   
Feedback applied.  Committed as attached.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fb2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080611/1656e965/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fb2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080611/1656e965/attachment-0001.ksh>

From jules at codesourcery.com  Mon Jun 16 17:54:37 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 16 Jun 2008 13:54:37 -0400
Subject: [patch] Fix IPP FIR strides
Message-ID: <4856A8DD.7050108@codesourcery.com>

This patch updates the IPP FIR backend to use 'stride_type' for strides.

Patch applied.
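Strides need a signed type because a reversed (negative-stride) view walks backwards through memory, which is what the negative-stride FIR fixes mentioned earlier in the thread are about. A minimal sketch, assuming a signed stride type along the lines of VSIPL++'s stride_type; the typedefs and the `gather` function are hypothetical stand-ins, not the IPP backend code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: a signed stride, so negative values are legal.
typedef std::ptrdiff_t stride_type;
typedef std::size_t length_type;

// Copies n elements read from 'src' with the given stride into a
// contiguous 'dst'. With stride -1 and src pointing at the last element,
// this reads the data in reverse order.
void gather(float const *src, stride_type stride, length_type n, float *dst)
{
  for (length_type i = 0; i < n; ++i)
    dst[i] = src[static_cast<stride_type>(i) * stride];
}
```

Storing -1 in an unsigned stride type instead would turn it into a huge positive value, and the indexing above would then be undefined behavior.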

			-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fir.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080616/e1af3f80/attachment.ksh>

From jules at codesourcery.com  Mon Jun 16 20:31:10 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 16 Jun 2008 16:31:10 -0400
Subject: [patch] Fix pointer comparison
Message-ID: <4856CD8E.1040900@codesourcery.com>

The pointer equality check for an in-place transpose will cause a 
compilation error for assignments between different value types (such as 
in view.cpp).

This patch fixes that by introducing an 'is_same_ptr' that can compare 
pointers of different types.

It also includes a simple unit-test for in-place transpose.

Ok to apply?

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: trans.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080616/8bba74e4/attachment.ksh>

From jules at codesourcery.com  Mon Jun 16 20:44:43 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 16 Jun 2008 16:44:43 -0400
Subject: [vsipl++] [patch] Fix pointer comparison
In-Reply-To: <4856CD8E.1040900@codesourcery.com>
References: <4856CD8E.1040900@codesourcery.com>
Message-ID: <4856D0BB.8010402@codesourcery.com>


> This patch fixes that by introducing an 'is_same_ptr' that can compare 
> pointers of different types.

Revised to limit application to pointers only.

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: trans.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080616/f37502bc/attachment.ksh>

From stefan at codesourcery.com  Mon Jun 16 20:54:11 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 16 Jun 2008 16:54:11 -0400
Subject: [vsipl++] [patch] Fix pointer comparison
In-Reply-To: <4856D0BB.8010402@codesourcery.com>
References: <4856CD8E.1040900@codesourcery.com> <4856D0BB.8010402@codesourcery.com>
Message-ID: <4856D2F3.6050200@codesourcery.com>

Jules Bergmann wrote:
> 
>> This patch fixes that by introducing an 'is_same_ptr' that can compare 
>> pointers of different types.
> 
> Revised to limit application to pointers only.

This patch looks good. Though I imagine that

template <typename T1, typename T2>
inline bool
is_same_ptr(T1 *ptr1, T2 *ptr2)
{
   return Is_same_ptr<T1*, T2*>::compare(ptr1, ptr2);
}

would result in a simpler error message "no matching function for call 
to 'is_same_ptr(non-pointer-type1, non-pointer-type2)'".
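The Is_same_ptr class template that the wrapper forwards to is not shown in the thread. A minimal sketch of one way to write it, assuming the compile-time approach described earlier (pointers to different value types are treated as distinct, so no comparison across types is ever compiled). Top-level const on the pointee is ignored so that `float*` and `float const*` to the same data still compare equal. This is an illustration, not the committed code.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: pointers to different value types are treated as
// distinct, so the answer is false and no comparison is compiled.
template <typename P1, typename P2>
struct Is_same_ptr
{
  static bool compare(P1, P2) { return false; }
};

// Same pointer type: compare addresses.
template <typename P>
struct Is_same_ptr<P, P>
{
  static bool compare(P p1, P p2) { return p1 == p2; }
};

// Accepts only pointers, so a non-pointer argument fails overload
// resolution with a "no matching function" error.
template <typename T1, typename T2>
inline bool is_same_ptr(T1 *ptr1, T2 *ptr2)
{
  typedef typename std::remove_const<T1>::type U1;
  typedef typename std::remove_const<T2>::type U2;
  return Is_same_ptr<U1 const *, U2 const *>::compare(ptr1, ptr2);
}
```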


Thanks,
		Stefan


-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From don at codesourcery.com  Mon Jun 16 22:43:33 2008
From: don at codesourcery.com (Don McCoy)
Date: Mon, 16 Jun 2008 16:43:33 -0600
Subject: [patch] Add byte-swapping support to save/load_view.hpp
Message-ID: <4856EC95.2050504@codesourcery.com>

This patch uses the Matlab byte-swapping functions to allow
load_view and save_view to work on big-endian platforms using data
written by little-endian systems (or vice versa).  The test has been
expanded to cover the added functionality; however, the default is such
that existing code behaves the same (i.e. bytes are not swapped when
read or written).
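The core of such a feature is reversing the bytes of each scalar as it is read or written, with swapping off by default. The sketch below is a generic illustration, not the Matlab helpers or the load_view/save_view code; `swap_bytes` and `read_elements` are hypothetical names. Complex values have to be swapped per component (real and imaginary parts separately), since reversing the bytes of a whole std::complex<float> would also exchange the two parts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reverses the byte order of one trivially copyable scalar.
template <typename T>
T swap_bytes(T value)
{
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));
  std::reverse(bytes, bytes + sizeof(T));
  std::memcpy(&value, bytes, sizeof(T));
  return value;
}

// Hypothetical read path: copies n scalars from a raw byte buffer,
// swapping each one only when requested. The default (no swap) keeps
// the previous behavior.
template <typename T>
void read_elements(unsigned char const *src, T *dst, std::size_t n,
                   bool swap = false)
{
  for (std::size_t i = 0; i < n; ++i)
  {
    T v;
    std::memcpy(&v, src + i * sizeof(T), sizeof(T));
    dst[i] = swap ? swap_bytes(v) : v;
  }
}
```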

Ok to commit?

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lv.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080616/47c874e0/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lv.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080616/47c874e0/attachment-0001.ksh>

From jules at codesourcery.com  Wed Jun 25 14:58:32 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 25 Jun 2008 10:58:32 -0400
Subject: [vsipl++] [patch] Add byte-swapping support to save/load_view.hpp
In-Reply-To: <4856EC95.2050504@codesourcery.com>
References: <4856EC95.2050504@codesourcery.com>
Message-ID: <48625D18.1080009@codesourcery.com>

Don McCoy wrote:
> This patch utilizes the matlab byte-swapping functions to allow
> load_view and save_view to work on big-endian platforms using data
> written by little-endian systems (or vice-versa).  The test has been
> expanded to cover the added functionality, however the default is such
> that existing code will behave the same (i.e. bytes are not swapped when
> read or written).
> 
> Ok to commit?


Don, Looks good, please check this in.  -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Fri Jun 27 15:25:40 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 27 Jun 2008 11:25:40 -0400
Subject: [patch] Disable profiling in dot products
Message-ID: <48650674.403@codesourcery.com>

The code that builds profiling event names for dot products was consuming 
considerable time even when VSIP_IMPL_PROFILER == 0 (see below for an 
example).  This patch skips that code when (VSIP_IMPL_PROFILER & mask) is zero.
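The general pattern is to test the profiler mask before doing any event-name work, so that with profiling compiled out no string formatting happens at run time. A minimal sketch, assuming a compile-time EXAMPLE_PROFILER mask and a hypothetical feature bit; none of these names are the actual VSIPL++ profiler interface.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

#ifndef EXAMPLE_PROFILER
#define EXAMPLE_PROFILER 0                  // profiling disabled for every feature
#endif

unsigned const example_profile_dot = 0x4;   // hypothetical feature bit

int name_builds = 0;        // how often an event name was built
std::string last_event;     // stand-in for a real event record

// Building the name is the costly part (allocation plus formatting).
std::string dot_event_name(std::size_t size)
{
  ++name_builds;
  std::ostringstream s;
  s << "dot " << size;
  return s.str();
}

float dot(float const *a, float const *b, std::size_t n)
{
  // The mask test is a compile-time constant, so when the bit is off
  // no event name is built at run time.
  if ((EXAMPLE_PROFILER & example_profile_dot) != 0)
    last_event = dot_event_name(n);

  float sum = 0.f;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[i] * b[i];
  return sum;
}
```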

Ok to apply?

				-- Jules


Before the change:

% benchmarks/dot -1 -ops -samples 3
# what             : t_dot1 (1)
# nproc            : 1
# ops_per_point(1) : 2
# riob_per_point(1): 8
# wiob_per_point(1): 0
# metric           : ops_per_sec
# start_loop       : 133341
      16 16.903055 16.885244 16.949444
      32 33.609573 33.514175 33.751736
      64 66.244659 66.077934 66.556038
     128 131.906174 130.535217 133.274078
     256 256.051178 255.275986 258.389465
     512 488.516510 488.469940 490.960785
    1024 859.449341 852.046570 865.133423
    2048 1438.800781 1436.463745 1450.017334
    4096 1849.857788 1845.208984 1853.868042
    8192 2116.325684 2113.715820 2117.107910
   16384 2440.822754 2436.575684 2441.876709
   32768 2707.750977 2697.868652 2708.683594
   65536 2829.450439 2823.039062 2830.946045
  131072 2885.994141 2882.093018 2891.500000
  262144 2871.778809 2868.683350 2883.067627
  524288 1753.057373 1681.539062 1754.856323
 1048576 830.176514 826.936584 856.340210
 2097152 759.201416 740.848389 761.915283


After the change:

% benchmarks/dot -1 -ops -samples 3
# what             : t_dot1 (1)
# nproc            : 1
# ops_per_point(1) : 2
# riob_per_point(1): 8
# wiob_per_point(1): 0
# metric           : ops_per_sec
# start_loop       : 3378559
      16 393.554230 384.410278 424.985718
      32 722.641785 695.435608 730.206116
      64 1371.760376 1341.322266 1387.337769
     128 2211.171875 2210.553955 2213.565918
     256 3231.258545 3229.922119 3234.606445
     512 4069.423584 4054.140625 4075.931396
    1024 4726.631836 4715.732422 4733.557129
    2048 5040.998535 5032.172852 5041.894043
    4096 4921.556641 4913.420410 4929.406738
    8192 2906.246582 2903.860596 2910.550537
   16384 2924.517334 2917.423340 2928.528076
   32768 2962.245605 2960.767822 2964.117432
   65536 2958.519531 2952.603271 2960.056641
  131072 2951.296143 2945.608154 2962.209961
  262144 2858.965332 2855.086182 2867.772949
  524288 1654.068970 1621.036987 1683.348999
 1048576 781.400024 758.868774 808.620667
 2097152 737.620056 735.490540 743.508240


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prof.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080627/805cd8b0/attachment.ksh>

From jules at codesourcery.com  Mon Jun 30 19:54:09 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 30 Jun 2008 15:54:09 -0400
Subject: [patch] Fix FFT macros when neutral_acconfig = n
Message-ID: <486939E1.1050901@codesourcery.com>

This patch avoids defining VSIP_IMPL_FFTW3_HAVE_{TYPE} when TYPE is not 
supported.  This is necessary since the macro users check for definition 
(#ifdef).

Patch applied.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fft-m4.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20080630/249b7b7f/attachment.ksh>