From assem at codesourcery.com  Fri Jun  1 14:46:03 2007
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 01 Jun 2007 10:46:03 -0400
Subject: SIMD loop fusion support for unaligned
Message-ID: <4660312B.5030009@codesourcery.com>

Everyone,
  This patch adds a new unary operator, unaligned. This operator hints 
to the compiler that this array may be unaligned. This allows the user 
to mix unaligned and aligned vectors.

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.05302007.1.log
Type: text/x-log
Size: 27901 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070601/b022fb17/attachment.bin>

From assem at codesourcery.com  Fri Jun  1 15:07:16 2007
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 01 Jun 2007 11:07:16 -0400
Subject: Support for parallel generator blocks
Message-ID: <46603624.5040002@codesourcery.com>

Everyone,
  This patch was submitted a while ago but didn't receive any feedback. 
This patch has a Choose_local_block addition that switches between 
Map_subset_block and Subset_block.

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.06012007.1.log
Type: text/x-log
Size: 2382 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070601/1661812c/attachment.bin>

From assem at codesourcery.com  Fri Jun  1 15:08:38 2007
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 01 Jun 2007 11:08:38 -0400
Subject: fftw3 split support
Message-ID: <46603676.3040504@codesourcery.com>

Everyone,
  This patch supports split ffts using fftw3 backend.

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.06012007.2.log
Type: text/x-log
Size: 27041 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070601/607ce18d/attachment.bin>

From assem at codesourcery.com  Fri Jun  1 15:13:24 2007
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 01 Jun 2007 11:13:24 -0400
Subject: benchmarks
Message-ID: <46603794.4050608@codesourcery.com>

Everyone,
  This patch contains two benchmarks, one for expression template stuff 
and the other for vramp.

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.06012007.3.log
Type: text/x-log
Size: 9754 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070601/997a1391/attachment.bin>

From jules at codesourcery.com  Mon Jun  4 14:41:53 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 04 Jun 2007 10:41:53 -0400
Subject: [vsipl++] SIMD loop fusion support for unaligned
In-Reply-To: <4660312B.5030009@codesourcery.com>
References: <4660312B.5030009@codesourcery.com>
Message-ID: <466424B1.6000802@codesourcery.com>

Assem Salama wrote:
 > Everyone,
 >  This patch adds a new unary operator, unaligned. This operator hints to
 > the compiler that this array may be unaligned. This allows the user to
 > mix unaligned and aligned vectors.

Assem,

This looks good.  Can you add 'has_perm' to the general simd traits
class (faux-SIMD), then check in?

				thanks,
				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon Jun  4 14:49:44 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 04 Jun 2007 10:49:44 -0400
Subject: [vsipl++] Support for parallel generator blocks
In-Reply-To: <46603624.5040002@codesourcery.com>
References: <46603624.5040002@codesourcery.com>
Message-ID: <46642688.8000008@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch was submitted a while ago but didn't receive any feedback. 
> This patch has a Choose_local_block addition that switches between 
> Map_subset_block and Subset_block.

Assem,

Thanks for resending this.  I did have some feedback from the first time 
around, I apologize if you did not see it:


This looks good, however, can you extend Choose_subblock to handle 
Global_map and Replicated_map?  Both maps should be able to use a 
Subset_block.

Also, you might consider specializing Create_subblock based on the 
RetBlock type rather than Map type, since the RetBlock type is what 
governs the arguments to the constructor.  As currently written, if you 
add a new cases to Choose_subblock (say for Global_map), but forget to 
add it to Create_subblock, you'll get an error.

                 -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon Jun  4 14:57:03 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 04 Jun 2007 10:57:03 -0400
Subject: [vsipl++] fftw3 split support
In-Reply-To: <46603676.3040504@codesourcery.com>
References: <46603676.3040504@codesourcery.com>
Message-ID: <4664283F.2080708@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch supports split ffts using fftw3 backend.

Assem,

Is this the same patch as

http://www.codesourcery.com/archives/vsipl%2B%2B/msg01024.html

?

If so, it looks good, modulo one comment.  See:

http://www.codesourcery.com/archives/vsipl%2B%2B/msg01033.html

				thanks,
				-- Jules

> 
> Thanks,
> Assem
> 
> 
> ------------------------------------------------------------------------
> 
> Index: src/vsip/opt/fftw3/fft.cpp
> ===================================================================
> --- src/vsip/opt/fftw3/fft.cpp	(revision 165174)
> +++ src/vsip/opt/fftw3/fft.cpp	(working copy)
> @@ -19,6 +19,11 @@
>  #include <vsip/support.hpp>
>  #include <fftw3.h>
>  
> +// We need to include this create_plan.hpp header file because fft_impl.cpp
> +// uses this file. We cannot include this file in fft_impl.cpp because
> +// fft_impl.cpp gets included multiple times here.
> +#include <vsip/opt/fftw3/create_plan.hpp>
> +
>  /***********************************************************************
>    Declarations
>  ***********************************************************************/
> Index: src/vsip/opt/fftw3/fft_impl.cpp
> ===================================================================
> --- src/vsip/opt/fftw3/fft_impl.cpp	(revision 168725)
> +++ src/vsip/opt/fftw3/fft_impl.cpp	(working copy)
> @@ -21,8 +21,8 @@
>  #include <vsip/core/fft/util.hpp>
>  #include <vsip/opt/fftw3/fft.hpp>
>  #include <vsip/core/equal.hpp>
> +#include <vsip/dense.hpp>
>  #include <fftw3.h>
> -#include <complex>
>  
>  /***********************************************************************
>    Declarations
> @@ -40,25 +40,25 @@
>  {
>    Fft_base(Domain<D> const& dom, int exp, int flags)
>      VSIP_THROW((std::bad_alloc))
> -      : in_buffer_(32, dom.size()),
> -	out_buffer_(32, dom.size())
> +      : in_buffer_(dom.size()),
> +	out_buffer_(dom.size())
>    {
>      // For multi-dimensional transforms, these plans assume both
>      // input and output data is dense, row-major, interleave-complex
>      // format.
> -
> -    for (vsip::dimension_type i = 0; i < D; ++i) size_[i] = dom[i].size();
> -    plan_in_place_ = FFTW(plan_dft)(D, size_,
> -      reinterpret_cast<FFTW(complex)*>(in_buffer_.get()),
> -      reinterpret_cast<FFTW(complex)*>(in_buffer_.get()),
> -      exp, flags);
>      
> +    for(index_type i=0;i<D;i++) size_[i] = dom[i].size();
> +    plan_in_place_ =
> +      Create_plan<vsip::impl::dense_complex_type>
> +        ::create<FFTW(plan), FFTW(iodim)>
> +        (in_buffer_.ptr(), in_buffer_.ptr(), exp, flags, dom);
> +    
>      if (!plan_in_place_) VSIP_IMPL_THROW(std::bad_alloc());
>  
> -    plan_by_reference_ = FFTW(plan_dft)(D, size_,
> -      reinterpret_cast<FFTW(complex)*>(in_buffer_.get()),
> -      reinterpret_cast<FFTW(complex)*>(out_buffer_.get()),
> -      exp, FFTW_PRESERVE_INPUT | flags);    
> +    plan_by_reference_ = Create_plan<vsip::impl::dense_complex_type>
> +      ::create<FFTW(plan), FFTW(iodim)>
> +      (in_buffer_.ptr(), out_buffer_.ptr(), exp, flags, dom);
> +
>      if (!plan_by_reference_)
>      {
>        FFTW(destroy_plan)(plan_in_place_);
> @@ -71,8 +71,8 @@
>      if (plan_by_reference_) FFTW(destroy_plan)(plan_by_reference_);
>    }
>  
> -  aligned_array<std::complex<SCALAR_TYPE> > in_buffer_;
> -  aligned_array<std::complex<SCALAR_TYPE> > out_buffer_;
> +  Cmplx_buffer<dense_complex_type, SCALAR_TYPE> in_buffer_;
> +  Cmplx_buffer<dense_complex_type, SCALAR_TYPE> out_buffer_;
>    FFTW(plan) plan_in_place_;
>    FFTW(plan) plan_by_reference_;
>    int size_[D];
> @@ -84,17 +84,15 @@
>    Fft_base(Domain<D> const& dom, int A, int flags)
>      VSIP_THROW((std::bad_alloc))
>      : in_buffer_(32, dom.size()),
> -      out_buffer_(32, dom.size())
> +      out_buffer_(dom.size())
>    { 
>      for (vsip::dimension_type i = 0; i < D; ++i) size_[i] = dom[i].size();  
>      // FFTW3 assumes A == D - 1.
>      // See also query_layout().
>      if (A != D - 1) std::swap(size_[A], size_[D - 1]);
> -    plan_by_reference_ = FFTW(plan_dft_r2c)(
> -      D, size_,
> -      in_buffer_.get(), reinterpret_cast<FFTW(complex)*>(out_buffer_.get()),
> -      FFTW_PRESERVE_INPUT | flags);
> -    
> +    plan_by_reference_ = Create_plan<dense_complex_type>::
> +      create<FFTW(plan), FFTW(iodim)>
> +      (in_buffer_.get(), out_buffer_.ptr(), A, flags, dom);
>      if (!plan_by_reference_) VSIP_IMPL_THROW(std::bad_alloc());
>    }
>    ~Fft_base() VSIP_NOTHROW
> @@ -103,7 +101,7 @@
>    }
>  
>    aligned_array<SCALAR_TYPE> in_buffer_;
> -  aligned_array<std::complex<SCALAR_TYPE> > out_buffer_;
> +  Cmplx_buffer<dense_complex_type, SCALAR_TYPE> out_buffer_;
>    FFTW(plan) plan_by_reference_;
>    int size_[D];
>  };
> @@ -113,17 +111,16 @@
>  {
>    Fft_base(Domain<D> const& dom, int A, int flags)
>      VSIP_THROW((std::bad_alloc))
> -    : in_buffer_(32, dom.size()),
> +    : in_buffer_(dom.size()),
>        out_buffer_(32, dom.size())
>    {
>      for (vsip::dimension_type i = 0; i < D; ++i) size_[i] = dom[i].size();
>      // FFTW3 assumes A == D - 1.
>      // See also query_layout().
>      if (A != D - 1) std::swap(size_[A], size_[D - 1]);
> -    plan_by_reference_ = FFTW(plan_dft_c2r)(
> -      D, size_,
> -      reinterpret_cast<FFTW(complex)*>(in_buffer_.get()), out_buffer_.get(),
> -      flags);
> +    plan_by_reference_ = Create_plan<dense_complex_type>::
> +      create<FFTW(plan), FFTW(iodim)>
> +      (in_buffer_.ptr(), out_buffer_.get(), A, flags, dom);
>  
>      if (!plan_by_reference_) VSIP_IMPL_THROW(std::bad_alloc());
>    }
> @@ -132,8 +129,8 @@
>      if (plan_by_reference_) FFTW(destroy_plan)(plan_by_reference_);
>    }
>  
> -  aligned_array<std::complex<SCALAR_TYPE> > in_buffer_;
> -  aligned_array<SCALAR_TYPE> out_buffer_;
> +  Cmplx_buffer<dense_complex_type, SCALAR_TYPE> in_buffer_;
> +  aligned_array<SCALAR_TYPE>              out_buffer_;
>    FFTW(plan) plan_by_reference_;
>    int size_[D];
>  };
> @@ -156,6 +153,23 @@
>      : Fft_base<1, ctype, ctype>(dom, E, convert_NoT(number))
>    {}
>    virtual char const* name() { return "fft-fftw3-1D-complex"; }
> +  virtual void query_layout(Rt_layout<1> &rtl_inout)
> +  {
> +    // By default use unit_stride, tuple<0, 1, 2>
> +    rtl_inout.pack = stride_unit_dense;
> +    rtl_inout.order = tuple<0, 1, 2>();
> +    // make default based on library
> +    rtl_inout.complex = Create_plan<dense_complex_type>::format;
> +  }
> +  virtual void query_layout(Rt_layout<1> &rtl_in, Rt_layout<1> &rtl_out)
> +  {
> +    // By default use unit_stride, tuple<0, 1, 2>
> +    rtl_in.pack = rtl_out.pack = stride_unit_dense;
> +    rtl_in.order = rtl_out.order = tuple<0, 1, 2>();
> +    // make default based on library
> +    rtl_in.complex = rtl_out.complex = Create_plan<dense_complex_type>::format;
> +  }
> +
>    virtual void in_place(ctype *inout, stride_type s, length_type l)
>    {
>      assert(s == 1 && static_cast<int>(l) == this->size_[0]);
> @@ -163,8 +177,12 @@
>  		      reinterpret_cast<FFTW(complex)*>(inout),
>  		      reinterpret_cast<FFTW(complex)*>(inout));
>    }
> -  virtual void in_place(ztype, stride_type, length_type)
> +  virtual void in_place(ztype inout, stride_type s, length_type l)
>    {
> +    assert(s == 1 && static_cast<int>(l) == this->size_[0]);
> +    FFTW(execute_split_dft)(plan_in_place_,
> +		      inout.first, inout.second,
> +		      inout.first, inout.second);
>    }
>    virtual void by_reference(ctype *in, stride_type in_stride,
>  			    ctype *out, stride_type out_stride,
> @@ -173,13 +191,18 @@
>      assert(in_stride == 1 && out_stride == 1 &&
>  	   static_cast<int>(length) == this->size_[0]);
>      FFTW(execute_dft)(plan_by_reference_,
> -		      reinterpret_cast<FFTW(complex)*>(in), 
> +		      reinterpret_cast<FFTW(complex)*>(in),
>  		      reinterpret_cast<FFTW(complex)*>(out));
>    }
> -  virtual void by_reference(ztype, stride_type,
> -			    ztype, stride_type,
> -			    length_type)
> +  virtual void by_reference(ztype in, stride_type in_stride,
> +			    ztype out, stride_type out_stride,
> +			    length_type length)
>    {
> +    assert(in_stride == 1 && out_stride == 1 &&
> +	   static_cast<int>(length) == this->size_[0]);
> +    FFTW(execute_split_dft)(plan_by_reference_,
> +		      in.first,  in.second,
> +		      out.first, out.second);
>    }
>  };
>  
> @@ -206,11 +229,29 @@
>      FFTW(execute_dft_r2c)(plan_by_reference_, 
>  			  in, reinterpret_cast<FFTW(complex)*>(out));
>    }
> -  virtual void by_reference(rtype *, stride_type,
> -			    ztype, stride_type,
> -			    length_type)
> +  virtual void by_reference(rtype *in, stride_type is,
> +			    ztype out, stride_type os,
> +			    length_type length)
>    {
> +    FFTW(execute_split_dft_r2c)(plan_by_reference_, 
> +			  in, out.first, out.second);
>    }
> +  virtual void query_layout(Rt_layout<1> &rtl_inout)
> +  {
> +    // By default use unit_stride, tuple<0, 1, 2>
> +    rtl_inout.pack = stride_unit_dense;
> +    rtl_inout.order = tuple<0, 1, 2>();
> +    // make default based on library
> +    rtl_inout.complex = Create_plan<dense_complex_type>::format;
> +  }
> +  virtual void query_layout(Rt_layout<1> &rtl_in, Rt_layout<1> &rtl_out)
> +  {
> +    // By default use unit_stride, tuple<0, 1, 2>
> +    rtl_in.pack = rtl_out.pack = stride_unit_dense;
> +    rtl_in.order = rtl_out.order = tuple<0, 1, 2>();
> +    // make default based on library
> +    rtl_in.complex = rtl_out.complex = Create_plan<dense_complex_type>::format;
> +  }
>  
>  };
>  
> @@ -241,11 +282,29 @@
>      FFTW(execute_dft_c2r)(plan_by_reference_,
>  			  reinterpret_cast<FFTW(complex)*>(in), out);
>    }
> -  virtual void by_reference(ztype, stride_type,
> -			    rtype *, stride_type,
> -			    length_type)
> +  virtual void by_reference(ztype in, stride_type is,
> +			    rtype *out, stride_type os,
> +			    length_type length)
>    {
> +    FFTW(execute_split_dft_c2r)(plan_by_reference_,
> +			  in.first, in.second, out);
>    }
> +  virtual void query_layout(Rt_layout<1> &rtl_inout)
> +  {
> +    // By default use unit_stride, tuple<0, 1, 2>
> +    rtl_inout.pack = stride_unit_dense;
> +    rtl_inout.order = tuple<0, 1, 2>();
> +    // make default based on library
> +    rtl_inout.complex = Create_plan<dense_complex_type>::format;
> +  }
> +  virtual void query_layout(Rt_layout<1> &rtl_in, Rt_layout<1> &rtl_out)
> +  {
> +    // By default use unit_stride, tuple<0, 1, 2>, cmplx_inter_fmt
> +    rtl_in.pack = rtl_out.pack = stride_unit_dense;
> +    rtl_in.order = rtl_out.order = tuple<0, 1, 2>();
> +    // make default based on library
> +    rtl_in.complex = rtl_out.complex = Create_plan<dense_complex_type>::format;
> +  }
>  
>  };
>  
> @@ -270,8 +329,8 @@
>    virtual void query_layout(Rt_layout<2> &rtl_in, Rt_layout<2> &rtl_out)
>    {
>      rtl_in.pack = stride_unit_dense;
> -    rtl_in.complex = cmplx_inter_fmt;
>      rtl_in.order = row2_type();
> +    rtl_in.complex = Create_plan<dense_complex_type>::format;
>      rtl_out = rtl_in;
>    }
>    virtual void in_place(ctype *inout,
> @@ -288,10 +347,13 @@
>  		      reinterpret_cast<FFTW(complex)*>(inout));
>    }
>    /// complex (split) in-place
> -  virtual void in_place(ztype,
> +  virtual void in_place(ztype inout,
>  			stride_type, stride_type,
>  			length_type, length_type)
>    {
> +    FFTW(execute_split_dft)(plan_in_place_,
> +		      inout.first, inout.second,
> +		      inout.first, inout.second);
>    }
>    virtual void by_reference(ctype *in,
>  			    stride_type in_r_stride,
> @@ -311,12 +373,21 @@
>  		      reinterpret_cast<FFTW(complex)*>(in), 
>  		      reinterpret_cast<FFTW(complex)*>(out));
>    }
> -  virtual void by_reference(ztype,
> -			    stride_type, stride_type,
> -			    ztype,
> -			    stride_type, stride_type,
> -			    length_type, length_type)
> +  virtual void by_reference(ztype in,
> +			    stride_type in_r_stride, stride_type in_c_stride,
> +			    ztype out,
> +			    stride_type out_r_stride, stride_type out_c_stride,
> +			    length_type, length_type cols)
>    {
> +    // Check that data is dense row-major.
> +    assert(in_r_stride == static_cast<stride_type>(cols));
> +    assert(in_c_stride == 1);
> +    assert(out_r_stride == static_cast<stride_type>(cols));
> +    assert(out_c_stride == 1);
> +
> +    FFTW(execute_split_dft)(plan_by_reference_,
> +                            in.first, in.second,
> +                            out.first, out.second);
>    }
>  };
>  
> @@ -344,7 +415,7 @@
>      // FFTW3 assumes A is the last dimension.
>      if (A == 0) rtl_in.order = tuple<1, 0, 2>();
>      else rtl_in.order = tuple<0, 1, 2>();
> -    rtl_in.complex = cmplx_inter_fmt;
> +    rtl_in.complex = Create_plan<dense_complex_type>::format;
>      rtl_out = rtl_in;
>    }
>    virtual bool requires_copy(Rt_layout<2> &) { return true;}
> @@ -358,12 +429,14 @@
>      FFTW(execute_dft_r2c)(plan_by_reference_,
>  			  in, reinterpret_cast<FFTW(complex)*>(out));
>    }
> -  virtual void by_reference(rtype *,
> +  virtual void by_reference(rtype *in,
>  			    stride_type, stride_type,
> -			    ztype,
> +			    ztype out,
>  			    stride_type, stride_type,
>  			    length_type, length_type)
>    {
> +    FFTW(execute_split_dft_r2c)(plan_by_reference_,
> +			  in, out.first, out.second);
>    }
>  
>  };
> @@ -392,7 +465,7 @@
>      // FFTW3 assumes A is the last dimension.
>      if (A == 0) rtl_in.order = tuple<1, 0, 2>();
>      else rtl_in.order = tuple<0, 1, 2>();
> -    rtl_in.complex = cmplx_inter_fmt;
> +    rtl_in.complex = Create_plan<dense_complex_type>::format;
>      rtl_out = rtl_in;
>    }
>    virtual bool requires_copy(Rt_layout<2> &) { return true;}
> @@ -406,12 +479,14 @@
>      FFTW(execute_dft_c2r)(plan_by_reference_, 
>  			  reinterpret_cast<FFTW(complex)*>(in), out);
>    }
> -  virtual void by_reference(ztype,
> +  virtual void by_reference(ztype in,
>  			    stride_type, stride_type,
> -			    rtype *,
> +			    rtype *out,
>  			    stride_type, stride_type,
>  			    length_type, length_type)
>    {
> +    FFTW(execute_split_dft_c2r)(plan_by_reference_, 
> +			  in.first, in.second, out);
>    }
>  
>  };
> @@ -437,8 +512,8 @@
>    virtual void query_layout(Rt_layout<3> &rtl_in, Rt_layout<3> &rtl_out)
>    {
>      rtl_in.pack = stride_unit_dense;
> -    rtl_in.complex = cmplx_inter_fmt;
>      rtl_in.order = row3_type();
> +    rtl_in.complex = Create_plan<dense_complex_type>::format;
>      rtl_out = rtl_in;
>    }
>    virtual void in_place(ctype *inout,
> @@ -462,14 +537,26 @@
>  		      reinterpret_cast<FFTW(complex)*>(inout),
>  		      reinterpret_cast<FFTW(complex)*>(inout));
>    }
> -  virtual void in_place(ztype,
> -			stride_type,
> -			stride_type,
> -			stride_type,
> -			length_type,
> -			length_type,
> -			length_type)
> +  virtual void in_place(ztype inout,
> +			stride_type x_stride,
> +			stride_type y_stride,
> +			stride_type z_stride,
> +			length_type x_length,
> +			length_type y_length,
> +			length_type z_length)
>    {
> +    assert(static_cast<int>(x_length) == this->size_[0]);
> +    assert(static_cast<int>(y_length) == this->size_[1]);
> +    assert(static_cast<int>(z_length) == this->size_[2]);
> +
> +    // Check that data is dense row-major.
> +    assert(x_stride == static_cast<stride_type>(y_length*z_length));
> +    assert(y_stride == static_cast<stride_type>(z_length));
> +    assert(z_stride == 1);
> +
> +    FFTW(execute_split_dft)(plan_in_place_,
> +		      inout.first, inout.second,
> +		      inout.first, inout.second);
>    }
>    virtual void by_reference(ctype *in,
>  			    stride_type in_x_stride,
> @@ -499,18 +586,33 @@
>  		      reinterpret_cast<FFTW(complex)*>(in), 
>  		      reinterpret_cast<FFTW(complex)*>(out));
>    }
> -  virtual void by_reference(ztype,
> -			    stride_type,
> -			    stride_type,
> -			    stride_type,
> -			    ztype,
> -			    stride_type,
> -			    stride_type,
> -			    stride_type,
> -			    length_type,
> -			    length_type,
> -			    length_type)
> +  virtual void by_reference(ztype in,
> +			    stride_type in_x_stride,
> +			    stride_type in_y_stride,
> +			    stride_type in_z_stride,
> +			    ztype out,
> +			    stride_type out_x_stride,
> +			    stride_type out_y_stride,
> +			    stride_type out_z_stride,
> +			    length_type x_length,
> +			    length_type y_length,
> +			    length_type z_length)
>    {
> +    assert(static_cast<int>(x_length) == this->size_[0]);
> +    assert(static_cast<int>(y_length) == this->size_[1]);
> +    assert(static_cast<int>(z_length) == this->size_[2]);
> +
> +    // Check that data is dense row-major.
> +    assert(in_x_stride == static_cast<stride_type>(y_length*z_length));
> +    assert(in_y_stride == static_cast<stride_type>(z_length));
> +    assert(in_z_stride == 1);
> +    assert(out_x_stride == static_cast<stride_type>(y_length*z_length));
> +    assert(out_y_stride == static_cast<stride_type>(z_length));
> +    assert(out_z_stride == 1);
> +
> +    FFTW(execute_split_dft)(plan_by_reference_,
> +                      in.first, in.second,
> +                      out.first, out.second);
>    }
>  };
>  
> @@ -542,7 +644,7 @@
>        case 1: rtl_in.order = tuple<0, 2, 1>(); break;
>        default: rtl_in.order = tuple<0, 1, 2>(); break;
>      }
> -    rtl_in.complex = cmplx_inter_fmt;
> +    rtl_in.complex = Create_plan<dense_complex_type>::format;
>      rtl_out = rtl_in;
>    }
>    virtual bool requires_copy(Rt_layout<3> &) { return true;}
> @@ -562,11 +664,11 @@
>      FFTW(execute_dft_r2c)(plan_by_reference_,
>  			  in, reinterpret_cast<FFTW(complex)*>(out));
>    }
> -  virtual void by_reference(rtype *,
> +  virtual void by_reference(rtype *in,
>  			    stride_type,
>  			    stride_type,
>  			    stride_type,
> -			    ztype,
> +			    ztype out,
>  			    stride_type,
>  			    stride_type,
>  			    stride_type,
> @@ -574,6 +676,8 @@
>  			    length_type,
>  			    length_type)
>    {
> +    FFTW(execute_split_dft_r2c)(plan_by_reference_,
> +			  in, out.first, out.second);
>    }
>  
>  };
> @@ -606,7 +710,7 @@
>        case 1: rtl_in.order = tuple<0, 2, 1>(); break;
>        default: rtl_in.order = tuple<0, 1, 2>(); break;
>      }
> -    rtl_in.complex = cmplx_inter_fmt;
> +    rtl_in.complex = Create_plan<dense_complex_type>::format;
>      rtl_out = rtl_in;
>    }
>    virtual bool requires_copy(Rt_layout<3> &) { return true;}
> @@ -626,11 +730,11 @@
>      FFTW(execute_dft_c2r)(plan_by_reference_,
>  			  reinterpret_cast<FFTW(complex)*>(in), out);
>    }
> -  virtual void by_reference(ztype,
> +  virtual void by_reference(ztype in,
>  			    stride_type,
>  			    stride_type,
>  			    stride_type,
> -			    rtype *,
> +			    rtype *out,
>  			    stride_type,
>  			    stride_type,
>  			    stride_type,
> @@ -638,6 +742,8 @@
>  			    length_type,
>  			    length_type)
>    {
> +    FFTW(execute_split_dft_c2r)(plan_by_reference_,
> +			  in.first, in.second, out);
>    }
>  
>  };
> Index: src/vsip/opt/fftw3/create_plan.hpp
> ===================================================================
> --- src/vsip/opt/fftw3/create_plan.hpp	(revision 0)
> +++ src/vsip/opt/fftw3/create_plan.hpp	(revision 0)
> @@ -0,0 +1,224 @@
> +/* Copyright (c) 2007 by CodeSourcery.  All rights reserved.
> +
> +   This file is available for license from CodeSourcery, Inc. under the terms
> +   of a commercial license and under the GPL.  It is not part of the VSIPL++
> +   reference implementation and is not available under the BSD license.
> +*/
> +/** @file    vsip/opt/fftw3/create_plan.hpp
> +    @author  Assem Salama
> +    @date    2007-04-13
> +    @brief   VSIPL++ Library: File that has create_plan struct
> +*/
> +#ifndef VSIP_OPT_FFTW3_CREATE_PLAN_HPP
> +#define VSIP_OPT_FFTW3_CREATE_PLAN_HPP
> +
> +#include <vsip/dense.hpp>
> +
> +#include <vsip/opt/fftw3/fftw_support.hpp>
> +
> +namespace vsip
> +{
> +namespace impl
> +{
> +namespace fftw3
> +{
> +
> +// This is a helper struct to create temporary buffers used durring plan
> +// creation.
> +template <typename complex_type, typename T>
> +struct Cmplx_buffer;
> +
> +// intereaved complex
> +template <typename T>
> +struct Cmplx_buffer<Cmplx_inter_fmt, T>
> +{
> +  std::complex<T> *ptr() { return buffer_.get(); }
> +
> +  Cmplx_buffer(length_type size) : buffer_(32, size)
> +  {}
> +  aligned_array<std::complex<T> > buffer_;
> +};
> +
> +// split complex
> +template <typename T>
> +struct Cmplx_buffer<Cmplx_split_fmt, T>
> +{
> +  Cmplx_buffer(length_type size) :
> +    buffer_r_(32, size),
> +    buffer_i_(32, size)
> +  {}
> +
> +  std::pair<T*,T*> ptr()
> +  { return std::pair<T*, T*>(buffer_r_.get(), buffer_i_.get()); }
> +
> +  aligned_array<T>  buffer_r_;
> +  aligned_array<T>  buffer_i_;
> +};
> +
> +// Convert form axis to tuple
> +template <dimension_type Dim>
> +Rt_tuple tuple_from_axis(int A);
> +
> +template <>
> +Rt_tuple tuple_from_axis<1>(int A) { return Rt_tuple(0,1,2); }
> +template <>
> +Rt_tuple tuple_from_axis<2>(int A) 
> +{
> +  switch (A) {
> +    case 0:  return Rt_tuple(1,0,2);
> +    default: return Rt_tuple(0,1,2);
> +  };
> +}
> +
> +template <>
> +Rt_tuple tuple_from_axis<3>(int A) 
> +{
> +  switch (A) {
> +    case 0:  return Rt_tuple(2,1,0);
> +    case 1:  return Rt_tuple(0,2,1);
> +    default: return Rt_tuple(0,1,2);
> +  };
> +}
> +
> +// This is a helper strcut to create plans
> +template<typename complex_type>
> +struct Create_plan;
> +
> +// interleaved
> +template<>
> +struct Create_plan<vsip::impl::Cmplx_inter_fmt>
> +{
> +
> +  // create function for complex -> complex
> +  template <typename PlanT, typename IodimT,
> +            typename T, dimension_type Dim>
> +  static PlanT
> +  create(std::complex<T>* ptr1, std::complex<T>* ptr2,
> +         int exp, int flags, Domain<Dim> const& size)
> +  {
> +    int sz[Dim],i;
> +    for(i=0;i<Dim;i++) sz[i] = size[i].size();
> +    return create_fftw_plan(Dim, sz, ptr1,ptr2,exp,flags);
> +  }
> +
> +  // create function for real -> complex
> +  template <typename PlanT, typename IodimT,
> +            typename T, dimension_type Dim>
> +  static PlanT
> +  create(T* ptr1, std::complex<T>* ptr2,
> +         int A, int flags, Domain<Dim> const& size)
> +  {
> +    int sz[Dim],i;
> +    for(i=0;i<Dim;i++) sz[i] = size[i].size();
> +    if(A != Dim-1) std::swap(sz[A], sz[Dim-1]);
> +    return create_fftw_plan(Dim,sz,ptr1,ptr2,flags);
> +  }
> +
> +  // create function for complex -> real
> +  template <typename PlanT, typename IodimT,
> +            typename T, dimension_type Dim>
> +  static PlanT
> +  create(std::complex<T>* ptr1, T* ptr2,
> +         int A, int flags, Domain<Dim> const& size)
> +  {
> +    int sz[Dim],i;
> +    for(i=0;i<Dim;i++) sz[i] = size[i].size();
> +    if(A != Dim-1) std::swap(sz[A], sz[Dim-1]);
> +    return create_fftw_plan(Dim,sz,ptr1,ptr2,flags);
> +  }
> +
> +  static rt_complex_type const format = cmplx_inter_fmt;  
> +
> +};
> +
> +// split
> +template<>
> +struct Create_plan<vsip::impl::Cmplx_split_fmt>
> +{
> +
> +  // create for complex -> complex
> +  template <typename PlanT, typename IodimT,
> +            typename T, dimension_type Dim>
> +  static PlanT
> +  create(std::pair<T*,T*> ptr1, std::pair<T*,T*> ptr2,
> +         int exp, int flags, Domain<Dim> const& size)
> +  {
> +    IodimT iodims[Dim];
> +    int i;
> +    Applied_layout<Layout<Dim, typename Row_major<Dim>::type,
> +                   Stride_unit_dense, Cmplx_split_fmt> >
> +    app_layout(size);
> +
> +    for(i=0;i<Dim;i++) 
> +    { 
> +      iodims[i].n = app_layout.size(i);
> +      iodims[i].is = iodims[i].os = app_layout.stride(i);
> +    }
> +
> +    return create_fftw_plan(Dim, iodims, ptr1,ptr2, flags);
> +
> +  }
> +
> +  // create for real -> complex
> +  template <typename PlanT, typename IodimT,
> +            typename T, dimension_type Dim>
> +  static PlanT
> +  create(T *ptr1, std::pair<T*, T*> ptr2, 
> +         int A, int flags, Domain<Dim> const& size)
> +  {
> +    IodimT iodims[Dim];
> +    int i;
> +    Applied_layout<Rt_layout<Dim> >
> +       app_layout(Rt_layout<Dim>(stride_unit_align,
> +                                 tuple_from_axis<Dim>(A),
> +                                 cmplx_split_fmt,
> +                                 0),
> +              size, sizeof(T));
> +
> +
> +    for(i=0;i<Dim;i++) 
> +    { 
> +      iodims[i].n = app_layout.size(i);
> +      iodims[i].is = iodims[i].os = app_layout.stride(i); 
> +    }
> +
> +    return create_fftw_plan(Dim, iodims, ptr1,ptr2, flags);
> +  }
> +
> +  // create for complex -> real
> +  template <typename PlanT, typename IodimT,
> +            typename T, dimension_type Dim>
> +  static PlanT
> +  create(std::pair<T*,T*> ptr1, T* ptr2,
> +         int A, int flags, Domain<Dim> const& size)
> +  {
> +    IodimT iodims[Dim];
> +    int i;
> +    Applied_layout<Rt_layout<Dim> >
> +       app_layout(Rt_layout<Dim>(stride_unit_align,
> +                                 tuple_from_axis<Dim>(A),
> +                                 cmplx_split_fmt,
> +                                 0),
> +              size, sizeof(T));
> +
> +
> +
> +
> +    for(i=0;i<Dim;i++) 
> +    { 
> +      iodims[i].n = app_layout.size(i);
> +      iodims[i].is = iodims[i].os = app_layout.stride(i);
> +    }
> +
> +    return create_fftw_plan(Dim, iodims, ptr1,ptr2, flags);
> +  }
> +
> +  static rt_complex_type const format = cmplx_split_fmt;  
> +};
> +
> +
> +} // namespace vsip::impl::fftw3
> +} // namespace vsip::impl
> +} // namespace vsip
> +
> +#endif // VSIP_OPT_FFTW3_CREATE_PLAN_HPP
> Index: src/vsip/opt/fftw3/fftw_support.hpp
> ===================================================================
> --- src/vsip/opt/fftw3/fftw_support.hpp	(revision 0)
> +++ src/vsip/opt/fftw3/fftw_support.hpp	(revision 0)
> @@ -0,0 +1,92 @@
> +/* Copyright (c) 2007 by CodeSourcery.  All rights reserved.
> +
> +   This file is available for license from CodeSourcery, Inc. under the terms
> +   of a commercial license and under the GPL.  It is not part of the VSIPL++
> +   reference implementation and is not available under the BSD license.
> +*/
> +/** @file    vsip/opt/fftw3/fftw_support.hpp
> +    @author  Assem Salama
> +    @date    2007-04-25
> +    @brief   VSIPL++ Library: File that has overloaded create functions for
> +                              fftw
> +
> +*/
> +#ifndef VSIP_OPT_FFTW3_FFTW_SUPPORT_HPP
> +#define VSIP_OPT_FFTW3_FFTW_SUPPORT_HPP
> +
> +namespace vsip
> +{
> +namespace impl
> +{
> +namespace fftw3
> +{
> +
> +#define DCL_FFTW_PLAN_FUNC_C2C(T, fT) \
> +fT##_plan create_fftw_plan(int dim, int *sz, \
> +                      std::complex<T>* ptr1, std::complex<T>* ptr2,\
> +                      int exp, int flags) \
> +{ return fT##_plan_dft(dim,sz,reinterpret_cast<fT##_complex*>(ptr1), \
> +                     reinterpret_cast<fT##_complex*>(ptr2), exp, flags); \
> +} \
> +\
> +fT##_plan create_fftw_plan(int dim, fT##_iodim *iodim, \
> +                      std::pair<T*,T*> ptr1, std::pair<T*,T*> ptr2,\
> +                      int flags) \
> +{ return fT##_plan_guru_split_dft(dim,iodim,0,NULL, \
> +                            ptr1.first,ptr1.second,ptr2.first,ptr2.second, \
> +                            flags); \
> +}
> +
> +#define DCL_FFTW_PLAN_FUNC_R2C(T, fT) \
> +fT##_plan create_fftw_plan(int dim, int *sz, \
> +                      T* ptr1, std::complex<T>* ptr2,\
> +                      int flags) \
> +{ return fT##_plan_dft_r2c(dim,sz,ptr1, \
> +                     reinterpret_cast<fT##_complex*>(ptr2), flags); \
> +} \
> +\
> +fT##_plan create_fftw_plan(int dim, fT##_iodim *iodim, \
> +                      T* ptr1, std::pair<T*,T*> ptr2,\
> +                      int flags) \
> +{ return fT##_plan_guru_split_dft_r2c(dim,iodim,0,NULL, \
> +                            ptr1,ptr2.first,ptr2.second, \
> +                            flags); \
> +}
> +
> +#define DCL_FFTW_PLAN_FUNC_C2R(T, fT) \
> +fT##_plan create_fftw_plan(int dim, int *sz, \
> +                      std::complex<T>* ptr1, T* ptr2,\
> +                      int flags) \
> +{ return fT##_plan_dft_c2r(dim,sz,reinterpret_cast<fT##_complex*>(ptr1), \
> +                     ptr2, flags); \
> +} \
> +\
> +fT##_plan create_fftw_plan(int dim, fT##_iodim *iodim, \
> +                      std::pair<T*,T*> ptr1, T* ptr2,\
> +                      int flags) \
> +{ return fT##_plan_guru_split_dft_c2r(dim,iodim,0,NULL, \
> +                            ptr1.first,ptr1.second,ptr2, \
> +                            flags); \
> +}
> +
> +#define DCL_FFTW_PLANS(T, fT) \
> +  DCL_FFTW_PLAN_FUNC_C2C(T, fT) \
> +  DCL_FFTW_PLAN_FUNC_R2C(T, fT) \
> +  DCL_FFTW_PLAN_FUNC_C2R(T, fT)
> +
> +
> +#if VSIP_IMPL_PROVIDE_FFT_FLOAT
> +  DCL_FFTW_PLANS(float, fftwf)
> +#endif
> +#if VSIP_IMPL_PROVIDE_FFT_DOUBLE
> +  DCL_FFTW_PLANS(double, fftw)
> +#endif
> +#if VSIP_IMPL_PROVIDE_FFT_LONG_DOUBLE
> +  DCL_FFTW_PLANS(long double, fftwl)
> +#endif
> +
> +} // namespace vsip::impl::fftw3
> +} // namespace vsip::impl
> +} // namespace vsip
> +
> +#endif


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon Jun  4 15:03:10 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 04 Jun 2007 11:03:10 -0400
Subject: [vsipl++] Support for parallel generator blocks
In-Reply-To: <46642880.1070209@codesourcery.com>
References: <46603624.5040002@codesourcery.com> <46642688.8000008@codesourcery.com> <46642880.1070209@codesourcery.com>
Message-ID: <466429AE.9080609@codesourcery.com>

Assem Salama wrote:
> Jules Bergmann wrote:
>> Assem Salama wrote:
>>> Everyone,
>>>  This patch was submitted a while ago but didn't receive any 
>>> feedback. This patch has a Choose_local_block addition that switches 
>>> between Map_subset_block and Subset_block.
>>
>> Assem,
>>
>> Thanks for resending this.  I did have some feedback from the first 
>> time around, I apologize if you did not see it:
>>
>>
>> This looks good, however, can you extend Choose_subblock to handle 
>> Global_map and Replicated_map?  Both maps should be able to use a 
>> Subset_block.
>>
>> Also, you might consider specializing Create_subblock based on the 
>> RetBlock type rather than Map type, since the RetBlock type is what 
>> governs the arguments to the constructor.  As currently written, if 
>> you add a new cases to Choose_subblock (say for Global_map), but 
>> forget to add it to Create_subblock, you'll get an error.
>>
>>                 -- Jules
>>
> Jules,
>  I am confused. This patch does support Global_map and Replicated map...

Assem,

No, I am confused :).  Sorry!  I was looking at feedback from a previous 
version of the patch.  The patch looks good, please check it in.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon Jun  4 15:11:49 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 04 Jun 2007 11:11:49 -0400
Subject: [vsipl++] benchmarks
In-Reply-To: <46603794.4050608@codesourcery.com>
References: <46603794.4050608@codesourcery.com>
Message-ID: <46642BB5.7050102@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch contains two benchmarks, one for expression template stuff 
> and the other for vramp.

Assem,

These look good.

Please check expr.cpp in.

For vramp.cpp, can you
  - use Create_map from benchmarks/create_map.hpp,
  - merge do_test into vramp.cpp, so that vramp.hpp goes away,
and then check in?

				thanks,
				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Tue Jun  5 21:09:16 2007
From: don at codesourcery.com (Don McCoy)
Date: Tue, 05 Jun 2007 15:09:16 -0600
Subject: [vsipl++] [patch] more cleanup with benchmarks
In-Reply-To: <465D836C.8050000@codesourcery.com>
References: <4654C4FC.40300@codesourcery.com> <4655F411.60307@codesourcery.com> <465D836C.8050000@codesourcery.com>
Message-ID: <4665D0FC.1070600@codesourcery.com>

Jules Bergmann wrote:
> Don, this looks good, please check it in. -- Jules
>
Just FYI, this is checked in now.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From assem at codesourcery.com  Wed Jun  6 05:35:29 2007
From: assem at codesourcery.com (Assem Salama)
Date: Wed, 06 Jun 2007 01:35:29 -0400
Subject: Simd unaligned vectors
Message-ID: <466647A1.5060603@codesourcery.com>

Everyone,
  This patch adds support for operations where all vectors are unaligned.

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.06062007.1.log
Type: text/x-log
Size: 9984 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070606/cebc5cf7/attachment.bin>

From jules at codesourcery.com  Wed Jun  6 17:16:34 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 06 Jun 2007 13:16:34 -0400
Subject: [patch] simd.hpp: fix typos, work around ppu-g++
Message-ID: <4666EBF2.9040407@codesourcery.com>


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: simd.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070606/c071c8ef/attachment.ksh>

From jules at codesourcery.com  Wed Jun  6 17:35:14 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 06 Jun 2007 13:35:14 -0400
Subject: [vsipl++] [patch] simd.hpp: fix typos, work around ppu-g++
In-Reply-To: <4666EBF2.9040407@codesourcery.com>
References: <4666EBF2.9040407@codesourcery.com>
Message-ID: <4666F052.30204@codesourcery.com>

Oops, I attached the wrong patch.

Patch applied. -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: simd.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070606/cf39786d/attachment.ksh>

From jules at codesourcery.com  Wed Jun  6 17:41:41 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 06 Jun 2007 13:41:41 -0400
Subject: [patch] Split large tests
Message-ID: <4666F1D5.9060809@codesourcery.com>

This splits several large tests into separate, smaller tests.  The 
smaller tests are easier to compile on machines with limited physical 
memory.

Patch applied.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070606/a91bf845/attachment.ksh>

From don at codesourcery.com  Wed Jun  6 18:33:46 2007
From: don at codesourcery.com (Don McCoy)
Date: Wed, 06 Jun 2007 12:33:46 -0600
Subject: [patch] Fix dot product benchmark for split complex case
Message-ID: <4666FE0A.50504@codesourcery.com>

Ok to commit?

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdot.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070606/c2ed57ec/attachment.ksh>

From jules at codesourcery.com  Wed Jun  6 18:46:12 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 06 Jun 2007 14:46:12 -0400
Subject: [vsipl++] [patch] Fix dot product benchmark for split complex
 case
In-Reply-To: <4666FE0A.50504@codesourcery.com>
References: <4666FE0A.50504@codesourcery.com>
Message-ID: <466700F4.5080606@codesourcery.com>

Don McCoy wrote:
> Ok to commit?

Don,

Instead of ifdef'ing the case out, can you make the test class t_dot2 
check the evaluator's ct_valid flag (the check would be through an 
implicit template parameter / class specialization).  That would be a 
little more robust, in case this code ever gets copied-and-pasted.

				thanks,
				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From hq at export2u.ro  Thu Jun  7 23:14:52 2007
From: hq at export2u.ro (hq at export2u.ro)
Date: Fri, 8 Jun 2007 02:14:52 +0300
Subject: Romanian PHP, Java, ASP, & .NET Software Outsourcing
Message-ID: <eaep.3.0.reg.CorMKN.39241.0924215046@10.101.101.13>

A non-text attachment was scrubbed...
Name: not available
Type: text/plain charset=us-ascii
Size: 4753 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070608/84dd9bcf/attachment.bin>

From jules at codesourcery.com  Mon Jun 11 17:31:56 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 11 Jun 2007 13:31:56 -0400
Subject: [patch] Expression performance optimization
Message-ID: <466D870C.9050108@codesourcery.com>

This patch has several optimizations for expression performance.

For Scalar_blocks, it uses a new shared map for all blocks, instead of 
each block having a local Local_or_global map.  It also removes the 
storage of the Scalar_block's size.  Before these changes, the compiler 
believed that Scalar_blocks had to be stored on the stack.  This added 
significant overhead to expressions using Scalar_blocks.

For unary, binary, and ternary functions defined in fns_elementwise, it 
passes views by const reference, instead of by value.  This avoids the 
need to increment/decrement reference counts, which add significant 
overhead for small vector sizes.

For the mul binary function, it uses the op::Mult functor instead of 
creating a redundant mul_functor.  mul_functor was functionally 
equivalent, but math library evaluators (such as SAL, IPP, and builtin 
SIMD) did not recognize it.  Similar changes need to be made for other 
functions that correspond to an operator.

Currently testing.  Will post some examples of improved performance in a 
bit.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: opt.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070611/d4114e5b/attachment.ksh>

From don at codesourcery.com  Mon Jun 11 22:37:35 2007
From: don at codesourcery.com (Don McCoy)
Date: Mon, 11 Jun 2007 16:37:35 -0600
Subject: [vsipl++] [patch] Fix dot product benchmark for split complex
 case
In-Reply-To: <466700F4.5080606@codesourcery.com>
References: <4666FE0A.50504@codesourcery.com> <466700F4.5080606@codesourcery.com>
Message-ID: <466DCEAF.4040108@codesourcery.com>

Jules Bergmann wrote:
> Instead of ifdef'ing the case out, can you make the test class t_dot2 
> check the evaluator's ct_valid flag (the check would be through an 
> implicit template parameter / class specialization).  That would be a 
> little more robust, in case this code ever gets copied-and-pasted.
Here is a revised patch.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdot2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070611/516f9b2d/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdot2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070611/516f9b2d/attachment-0001.ksh>

From jules at codesourcery.com  Tue Jun 12 14:23:48 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Jun 2007 10:23:48 -0400
Subject: [vsipl++] [patch] Fix dot product benchmark for split complex
 case
In-Reply-To: <466DCEAF.4040108@codesourcery.com>
References: <4666FE0A.50504@codesourcery.com> <466700F4.5080606@codesourcery.com> <466DCEAF.4040108@codesourcery.com>
Message-ID: <466EAC74.1030706@codesourcery.com>

Don McCoy wrote:
> Jules Bergmann wrote:
>> Instead of ifdef'ing the case out, can you make the test class t_dot2 
>> check the evaluator's ct_valid flag (the check would be through an 
>> implicit template parameter / class specialization).  That would be a 
>> little more robust, in case this code ever gets copied-and-pasted.
> Here is a revised patch.

This looks good, please check it in.  thanks, -- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Tue Jun 12 14:55:16 2007
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 12 Jun 2007 10:55:16 -0400
Subject: patch: fix merge conflicts
Message-ID: <466EB3D4.6080505@codesourcery.com>

The attached patch fixes some conflicts seemingly introduced by two overlapping
patches / commits last week.

(Assem: please be careful when applying 'svn resolved'. There were some artifacts
(such as "<<<< .mine") as well as conflicting code checked in with the last commit.)

The new Create_plan harness uses Stride_unit_align everywhere (there was one place
with Stride_unit_dense, that looked like a typo). I'm not sure this is required,
as we only stipulate aligned input for 1D FFTs. Thus, the current code may require
the data to be copied without need.
Should I instead add an overload for Create_plan::create() for 1D FFTs and relax
the alignment for non-1D cases to Stride_unit_dense ?

Thanks,
		Stefan


-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fftw3.patch
Type: text/x-patch
Size: 4729 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070612/bb9fdc6e/attachment.bin>

From assem at codesourcery.com  Tue Jun 12 16:23:01 2007
From: assem at codesourcery.com (Assem Salama)
Date: Tue, 12 Jun 2007 12:23:01 -0400
Subject: [vsipl++] patch: fix merge conflicts
In-Reply-To: <466EB3D4.6080505@codesourcery.com>
References: <466EB3D4.6080505@codesourcery.com>
Message-ID: <466EC865.6060600@codesourcery.com>

Stefan Seefeld wrote:

>The attached patch fixes some conflicts seemingly introduced by two overlapping
>patches / commits last week.
>
>(Assem: please be careful when applying 'svn resolved'. There were some artifacts
>(such as "<<<< .mine") as well as conflicting code checked in with the last commit.)
>  
>
Sorry about that. I actually fixed this same file yesterday but didn't 
get arround to generating a patch.

Thanks,
Assem


From jules at codesourcery.com  Tue Jun 12 16:56:59 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Jun 2007 12:56:59 -0400
Subject: [vsipl++] patch: fix merge conflicts
In-Reply-To: <466EB3D4.6080505@codesourcery.com>
References: <466EB3D4.6080505@codesourcery.com>
Message-ID: <466ED05B.4040405@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch fixes some conflicts seemingly introduced by two overlapping
> patches / commits last week.
> 
> (Assem: please be careful when applying 'svn resolved'. There were some artifacts
> (such as "<<<< .mine") as well as conflicting code checked in with the last commit.)
> 
> The new Create_plan harness uses Stride_unit_align everywhere (there was one place
> with Stride_unit_dense, that looked like a typo). I'm not sure this is required,
> as we only stipulate aligned input for 1D FFTs. Thus, the current code may require
> the data to be copied without need.
> Should I instead add an overload for Create_plan::create() for 1D FFTs and relax
> the alignment for non-1D cases to Stride_unit_dense ?

Stefan,

Thanks for fixing this.

Do you consider FFTM to be 1D or non-1D?

The reason I ask is ...

We implement FFTM by planning for a single 1D FFT (we do this, rather 
than planning for multiple 1D FFTs, because distributed data may cause 
the local multiple count to be different from the global multiple count).

Ideally this single 1D FFT should be planned for aligned data.

We can relax that when the FFT size is not a multiple of the alignment. 
  I.e. if we're doing multiple 257-point FFTs, we can plan for unaligned 
data.

Does that sound reasonable?

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Tue Jun 12 17:09:23 2007
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 12 Jun 2007 13:09:23 -0400
Subject: [vsipl++] patch: fix merge conflicts
In-Reply-To: <466ED05B.4040405@codesourcery.com>
References: <466EB3D4.6080505@codesourcery.com> <466ED05B.4040405@codesourcery.com>
Message-ID: <466ED343.8000002@codesourcery.com>

Jules Bergmann wrote:

> Do you consider FFTM to be 1D or non-1D?

Well, the distinction is only required if we use different alignment
constraints. For planning we use FFTW_UNALIGNED for all but 1D FFT
(i.e. 2D, 3D, as well as M). With assem's patch (and my little fix)
we use Stride_unit_align throughout, which may be overly restrictive,
given that Stride_unit_dense would be perfectly valid for non-1D cases,
so we may end up doing a redundant copy (well, two, actually).

> The reason I ask is ...
> 
> We implement FFTM by planning for a single 1D FFT (we do this, rather
> than planning for multiple 1D FFTs, because distributed data may cause
> the local multiple count to be different from the global multiple count).
> 
> Ideally this single 1D FFT should be planned for aligned data.

Right, understood. We don't do that yet, though this only seems to
require a change to the Fftm_impl constructor's call to Fft_base<>(),
where we would no longer pass FFTW_UNALIGNED.

> We can relax that when the FFT size is not a multiple of the alignment.
>  I.e. if we're doing multiple 257-point FFTs, we can plan for unaligned
> data.
> 
> Does that sound reasonable?

Indeed. Should I add my suggested change above to the patch before checking
it in ?

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From jules at codesourcery.com  Tue Jun 12 17:17:25 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Jun 2007 13:17:25 -0400
Subject: [vsipl++] patch: fix merge conflicts
In-Reply-To: <466ED343.8000002@codesourcery.com>
References: <466EB3D4.6080505@codesourcery.com> <466ED05B.4040405@codesourcery.com> <466ED343.8000002@codesourcery.com>
Message-ID: <466ED525.8060807@codesourcery.com>


> 
> Indeed. Should I add my suggested change above to the patch before checking
> it in ?

Yes, that sounds good.  I suspect we'll have to do something different 
if people ever start using multi-dim FFTs, but for now let's avoid the 
copy.  -- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Tue Jun 12 21:20:48 2007
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 12 Jun 2007 17:20:48 -0400
Subject: [vsipl++] patch: fix merge conflicts
In-Reply-To: <466ED525.8060807@codesourcery.com>
References: <466EB3D4.6080505@codesourcery.com> <466ED05B.4040405@codesourcery.com> <466ED343.8000002@codesourcery.com> <466ED525.8060807@codesourcery.com>
Message-ID: <466F0E30.9090004@codesourcery.com>

Jules Bergmann wrote:
> 
>>
>> Indeed. Should I add my suggested change above to the patch before
>> checking
>> it in ?
> 
> Yes, that sounds good.  I suspect we'll have to do something different
> if people ever start using multi-dim FFTs, but for now let's avoid the
> copy.  -- Jules

Here is a new patch, incorporating the changes we discussed. 1D FFT as
well as FFTM now use / require aligned blocks if the block size is a multiple
of the alignment size (and thus individual rows operations can be vectorized).

(Since the patch is slightly more involved than I originally assumed, I'd
 prefer another round of review.)

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fftw.patch
Type: text/x-patch
Size: 10484 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070612/6a897076/attachment.bin>

From jules at codesourcery.com  Wed Jun 13 02:52:07 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 12 Jun 2007 22:52:07 -0400
Subject: [vsipl++] patch: fix merge conflicts
In-Reply-To: <466F0E30.9090004@codesourcery.com>
References: <466EB3D4.6080505@codesourcery.com> <466ED05B.4040405@codesourcery.com> <466ED343.8000002@codesourcery.com> <466ED525.8060807@codesourcery.com> <466F0E30.9090004@codesourcery.com>
Message-ID: <466F5BD7.7070606@codesourcery.com>

Stefan Seefeld wrote:
> Jules Bergmann wrote:
>>> Indeed. Should I add my suggested change above to the patch before
>>> checking
>>> it in ?
>> Yes, that sounds good.  I suspect we'll have to do something different
>> if people ever start using multi-dim FFTs, but for now let's avoid the
>> copy.  -- Jules
> 
> Here is a new patch, incorporating the changes we discussed. 1D FFT as
> well as FFTM now use / require aligned blocks if the block size is a multiple
> of the alignment size (and thus individual rows operations can be vectorized).
> 
> (Since the patch is slightly more involved than I originally assumed, I'd
>  prefer another round of review.)
> 
> Thanks,
> 		Stefan

Stefan, this looks good.  I like the way you have made aligned/unaligned 
orthogonal in Fft_base.  Please check it in.

				thanks,
				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From assem at codesourcery.com  Wed Jun 13 16:12:39 2007
From: assem at codesourcery.com (Assem Salama)
Date: Wed, 13 Jun 2007 12:12:39 -0400
Subject: SIMD unaligned loop fusion
Message-ID: <46701777.5010809@codesourcery.com>

Everyone,
  This patch makes a new dispatcher that is valid when all operands are 
unaligned. I could make the normal Simd_loop_fusion dispatcher for this 
and if the alignment is 0, don't do the initial cleanup. What does 
everyone think about that?

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.06132007.1.log
Type: text/x-log
Size: 11469 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070613/ef513801/attachment.bin>

From jules at codesourcery.com  Fri Jun 15 11:39:44 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 15 Jun 2007 07:39:44 -0400
Subject: [patch] Fix missing tags and traits
Message-ID: <46727A80.5080907@codesourcery.com>

This should fix the non-FFT test failures.

I'm looking into the FFT failures now.  From the location of the assert 
failures, it looks like complex->real FFT is broken for FFTW3.  Does 
that case ring any bells?

Patch applied.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc-fix.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070615/7762b2af/attachment.ksh>

From jules at codesourcery.com  Sat Jun 16 05:32:55 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 16 Jun 2007 01:32:55 -0400
Subject: [patch] Fix Rt_layout/Rt_extdata/Fftw3 BE
Message-ID: <46737607.5090607@codesourcery.com>

This patch fixes a couple of bugs

  - First, it fixes Applied_layout<Rt_layout<Dim> > (used by
    Rt_extdata) to only pay attention to alignment
    when the pack type is stride_unit_aligned.  Previously it
    adjusted alignment when it was non-zero.  (This is probably
    the only fix strictly necessary to fix the test failures).

  - Second, it robustifies the FFT workspace frontend and
    the FFTW3 BE to deal with alignment requirements.

    For workspace, before constructing temporary input/output
    buffers, the backend is not queried to determine the
    acceptable layout.  This will handle cases
    where padding is needed to fix alignment.

    For the FFTW3 BE, the 2D FFTM implementation class now
    uses the stride arguments to determine the FFT to FFT
    stride (instead of the size).  This lets it deal with
    non-dense but unit-stride matrices.  (Right now query_layout
    guarantees that input/output matrices will be unit-stride,
    but in the future we can relax this).

I'm going to start a snapshot build with this.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftw3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070616/41ecf30e/attachment.ksh>

From jules at codesourcery.com  Mon Jun 18 02:56:50 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 17 Jun 2007 22:56:50 -0400
Subject: [patch] Fix FFTW3 BE alignment for R-to-C and C-to-R plan creation
Message-ID: <4675F472.8010408@codesourcery.com>

Patch applied.
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc-fft.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070617/4c4529d5/attachment.ksh>

From assem at codesourcery.com  Mon Jun 18 16:03:46 2007
From: assem at codesourcery.com (Assem Salama)
Date: Mon, 18 Jun 2007 12:03:46 -0400
Subject: SIMD all unaligned dispatch
Message-ID: <4676ACE2.50104@codesourcery.com>

Everyone,
  This patch includes some missing pieces not included in previous 
patch. This should make a fresh checkout compile ok :) I apologize for 
last patch's incompleteness.

Thanks,
Assem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svn.diff.06172007.1.log
Type: text/x-log
Size: 13313 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070618/cd501c01/attachment.bin>

From jules at codesourcery.com  Mon Jun 18 20:30:56 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 18 Jun 2007 16:30:56 -0400
Subject: [patch] Fix scalar_blocks to work with GCC 3.4
Message-ID: <4676EB80.9090807@codesourcery.com>

This patch scales back the scalar-block optimization a bit when using 
GCC 3.x (anything pre 4.0).

GCC 3.4.4 was having trouble compiling expressions like this one from 
threshold.cpp:

	Vector<float> A, C;
	float b = 0.5;

	C = ite(A >= b, A, 0.0)

The scalar value for b (0.5) was being replaced by the scalar value 
(0.0).  Any of the following changes made the error go away:

  - compiling with lower optimization levels
  - compiling with less aggressive inlining options
  - using printfs to examine the values stored in the
    expression blocks
  - using later versions of GCC

Similar errors occured in the coverage_ternary_*.cpp tests, but were 
even more difficult to debug because most attempts to simplify the test 
case or print debugging information caused the error go away.

The fix adds a copy constructor equivalent to the default copy 
constructor, which IIUC forces GCC to store scalar_blocks (and any 
expression containing scalar_blocks) on the stack.

This patch also separates benchmark installation into a separate rule. 
The rationale for this is that building the benchmarks takes much longer 
than building the core library and installing the benchmarks isn't 
necessary to use the library.

Patch applied, snapshot started!

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fix-sb.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070618/545823ca/attachment.ksh>

From Aramini at LL.MIT.edu  Tue Jun 19 00:59:39 2007
From: Aramini at LL.MIT.edu (Michael Aramini)
Date: Mon, 18 Jun 2007 20:59:39 -0400
Subject: configure fails on Solaris on Intel/AMD architecture
Message-ID: <46772A7B.9090202@LL.MIT.edu>

When attempting to build Sourcery VSIPL++ 1.3 on Solaris running on a
system with 64-bit AMD processors, configure fails as follows:
> ATLAS: CC  gcc
> ATLAS: F77
> ATLAS: CFLAGS -g -O2
> checking build system type... i386-pc-solaris2.10
> checking host system type... i386-pc-solaris2.10
> checking for i386-pc-solaris2.10-gcc... gcc
> checking for C compiler default output file name... a.out
> checking whether the C compiler works... yes
> checking whether we are cross compiling... no
> checking for suffix of executables...
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ANSI C... none needed
> checking for machine type... probe
> checking for asm style... configure: error: cannot determine asm type.
> ===============================================================
> configure: error: built-in ATLAS configure FAILED.

-Michael Aramini


From jules at codesourcery.com  Tue Jun 19 12:34:52 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 19 Jun 2007 08:34:52 -0400
Subject: [vsipl++] SIMD all unaligned dispatch
In-Reply-To: <4676ACE2.50104@codesourcery.com>
References: <4676ACE2.50104@codesourcery.com>
Message-ID: <4677CD6C.3060207@codesourcery.com>

Assem Salama wrote:
 > Everyone,
 >  This patch includes some missing pieces not included in previous patch.
 > This should make a fresh checkout compile ok :) I apologize for last
 > patch's incompleteness.

Assem,

What is the reason for extending the length of type_list?  Is that
needed for this patch?

Rather than add a new evaluator ("all unaligned"), I would like to
have a single evaluator handle the cases where views have the same
alignment (whether it is 0 or N).  The only difference between the two
is cleanup code before SIMD processing.  Can you make that change and
repost a patch?

Also, did you have a chance to benchmark the iterator change (#4)
below?

				-- Jules


 >
 > Thanks,
 > Assem
 >
 >
 > ------------------------------------------------------------------------
 >

 > Index: src/vsip/opt/simd/expr_evaluator.hpp
 > ===================================================================

 > +  static bool rt_valid(LB& lhs, RB const& rhs)
 > +  {
 > +    Ext_data<LB, layout_type> dda(lhs, SYNC_OUT);
 > +    int lhs_a = simd::Proxy_factory<LB,       true>::alignment(lhs);

[1] Instead of calling Proxy_factory::alignment (which internally
creates another Ext_data object -- which is both extra overhead and
potentially undefined), use Simd_traits::alignment_of directly.

 > +    return (dda.stride(0) == 1 &&
 > +            simd::Proxy_factory<RB, true>::rt_valid(rhs, lhs_a));
 > +
 > +
 > +  }
 > +

 > +    // First, deal with unaligned pointers
 > +    typename Ext_data<LB, layout_type>::raw_ptr_type  raw_ptr = 
dda.data();
 > +    while(simd::Simd_traits<typename 
LB::value_type>::alignment_of(raw_ptr) &&
 > +          n > 0)
 > +    {
 > +      lhs.put(size-n, rhs.get(size-n));
 > +      n--;
 > +      raw_ptr++;
 > +    }

[2] What updates the pointers held by lp and rp?  They are still
unaligned, right?

Ah, I see.  You've changed Proxy::Proxy to force alignment below.


 > Index: src/vsip/opt/simd/eval_generic.hpp
 > ===================================================================
 > --- src/vsip/opt/simd/eval_generic.hpp	(revision 174261)
 > +++ src/vsip/opt/simd/eval_generic.hpp	(working copy)
 > @@ -664,6 +664,8 @@
 >
 >    static bool rt_valid(DstBlock& dst, SrcBlock const& src)
 >    {
 > +    typedef simd::Simd_traits<typename SrcBlock::value_type> simd;
 > +
 >      // check if all data is unit stride
 >      Ext_data<DstBlock, dst_lp>     ext_dst(dst,              SYNC_OUT);
 >      Ext_data<Block1,   a_lp>       ext_a(src.first().left(), SYNC_IN);
 > @@ -672,7 +674,11 @@
 >             ext_a.stride(0) == 1 &&
 >  	   ext_b.stride(0) == 1 &&
 >  	   // make sure (A op B, A, k)
 > -	   (&(src.first().left()) == &(src.second())));
 > +	   (&(src.first().left()) == &(src.second())) &&
 > +	   // make sure everyting is aligned!
 > +	   !simd::alignment_of(ext_dst.data()) &&
 > +	   !simd::alignment_of(ext_a.data()) &&
 > +	   !simd::alignment_of(ext_b.data()));

[3] Doesn't threshold handle initial unaligned values?  If so, it is
sufficient to check that dst, a, and b all have the same alignment.


 >    static void exec(DstBlock& dst, SrcBlock const& src)
 > Index: src/vsip/opt/simd/expr_iterator.hpp
 > ===================================================================
 > --- src/vsip/opt/simd/expr_iterator.hpp	(revision 174261)
 > +++ src/vsip/opt/simd/expr_iterator.hpp	(working copy)
 > @@ -268,13 +268,14 @@
 >    simd_type load() const
 >    { return simd::perm(x0_, x1_, sh_); }
 >
 > -  void increment(length_type n = 1)
 > +  //void increment(length_type n = 1)
 > +  void increment()
 >    {
 > -    ptr_unaligned_ += n * Simd_traits<value_type>::vec_size;
 > -    ptr_aligned_   += n;
 > +    ptr_unaligned_ += Simd_traits<value_type>::vec_size;
 > +    ptr_aligned_++;
 >
 >      // update x0
 > -    x0_ = (n == 1)? x1_:simd::load((value_type*)ptr_aligned_);
 > +    x0_ = x1_;

[4] Did you ever benchmark the difference between these two?

 >
 > -  Proxy(value_type *ptr) : ptr_(ptr) {}
 > +  Proxy(value_type *ptr) : ptr_(ptr)
 > +  {
 > +    // Force alignment of pointer.
 > +    intptr_t int_ptr = (intptr_t)ptr_;
 > +    int_ptr &= ~(Simd_traits<value_type>::alignment-1);
 > +    ptr_ = (value_type*) int_ptr;
 > +  }
 > +

[5] For LValue_access_traits, this ignores the IsAligned template
parameter.  since we appear to only handle the case where the LHS
is aligned, we should specialize this for IsAligned = true.


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue Jun 19 12:45:34 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 19 Jun 2007 08:45:34 -0400
Subject: [vsipl++] SIMD all unaligned dispatch
In-Reply-To: <4677CD6C.3060207@codesourcery.com>
References: <4676ACE2.50104@codesourcery.com> <4677CD6C.3060207@codesourcery.com>
Message-ID: <4677CFEE.7010309@codesourcery.com>

Assem,

Also, can you include a unit test for this in your next patch?

				thanks,
				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Wed Jun 20 19:18:04 2007
From: don at codesourcery.com (Don McCoy)
Date: Wed, 20 Jun 2007 13:18:04 -0600
Subject: [patch] fix for MPI type define
Message-ID: <46797D6C.7050504@codesourcery.com>

This patch corrects a minor typo related to the location of the mpi.h 
header file.

Ok to commit?

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mh.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070620/cc6dd2f4/attachment.ksh>

From jules at codesourcery.com  Wed Jun 20 22:44:05 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 20 Jun 2007 18:44:05 -0400
Subject: [vsipl++] [patch] fix for MPI type define
In-Reply-To: <46797D6C.7050504@codesourcery.com>
References: <46797D6C.7050504@codesourcery.com>
Message-ID: <4679ADB5.30008@codesourcery.com>

Don McCoy wrote:
> This patch corrects a minor typo related to the location of the mpi.h 
> header file.
> 
> Ok to commit?

Don,

Looks good, please check it in.  Can you mention the macro name in the 
ChangeLog?  Thanks for catching this.

The obvious response is to wonder how this ever worked.  IIRC we used to 
create a pound-define with the MPI header name, i.e.

	#define VSIP_IMPL_MPI_H <mpi/mpi.h>

and would then include it

	#include VSIP_IMPL_MPI_H

However, this did not work with Intel C++.  The fix was to use 
VSIP_IMPL_MPI_H_TYPE, but apparently we did not test the mpi/mpi.h case 
after making the change!

				-- Jules

> 
> 
> ------------------------------------------------------------------------
> 
> Index: ChangeLog
> ===================================================================
> --- ChangeLog	(revision 174589)
> +++ ChangeLog	(working copy)
> @@ -1,3 +1,8 @@
> +2007-06-20  Don McCoy  <don at codesourcery.com>
> +
> +	* src/vsip/core/mpi/services.hpp: Fix typo for systems having
> +	  their MPI header files in the mpi/ subdirectory.

Can you mention the macro name:

	* src/vsip/core/mpi/services.hpp (VSIP_IMPL_MPI_H_TYPE): ...


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon Jun 25 22:31:42 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 25 Jun 2007 18:31:42 -0400
Subject: Using FFTW3 with the VSIPL++ Reference Implementation
Message-ID: <4680424E.1050309@codesourcery.com>

Here is a description of how to use FFTW3 with the VSIPL++ reference 
implementation.

These instructions are only for the reference implementation.  FFTW3 
already works with the optimized implementation (just configure with 
--enable-fft=fftw3 or --enable-fft=builtin).

Please let me know if you have any questions regarding these instructions.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FFTW3_RefImpl.pdf
Type: application/pdf
Size: 86944 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070625/3ba120a0/attachment.pdf>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 1.3-ri-fftw3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070625/3ba120a0/attachment.ksh>

From assem at codesourcery.com  Tue Jun 26 07:16:24 2007
From: assem at codesourcery.com (Assem Salama)
Date: Tue, 26 Jun 2007 03:16:24 -0400
Subject: reductions-idx
Message-ID: <4680BD48.6000004@codesourcery.com>

Everyone,
  This patch fixes a failure in reduction-idx test.

Thanks,
Assem
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: svn.diff.06262007.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070626/22d6bee3/attachment.ksh>

From assem at codesourcery.com  Tue Jun 26 13:27:52 2007
From: assem at codesourcery.com (Assem Salama)
Date: Tue, 26 Jun 2007 09:27:52 -0400
Subject: Vacation
Message-ID: <46811458.8040601@codesourcery.com>

Everyone,
  I will taking this week off and work half time for next two weeks 
after that. I will have access to interent.

Thanks,
Assem


From John.Day at EssexCorp.com  Tue Jun 26 23:18:09 2007
From: John.Day at EssexCorp.com (Day, John)
Date: Tue, 26 Jun 2007 19:18:09 -0400
Subject: fftm compile problem
Message-ID: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com>

Hello,

I am trying out vsipl++ 1.3 !A32 binary for Windows/XP and am having a
problem instantiating the fftm templates in your fft.cpp code example
(from the source distribution).

I using MinGW with g++ 3.4.5 and I am getting "error: no type named
`first' in `struct vsip::impl::fft::LibraryTagList'"  (see below for all
messages). I suspect this has something to do with
vsip/opt/dispatch.hpp, but I don't know enough about the architecture to
figure out how to fix it.

 
I have also tried to compile Judd and Cottel vsipl++ beamformer and got
the same error for the fftm template instantiation:
http://hpec-si.com/MinimumVarianceBeamformerExample.pdf Everything else
compiled without error.

 
I know that you recommend the Intel compiler for the Windows binary, but
I'm hoping that g++ 3.4.5 might work with MinGW as I assume it does
under Linux. Has anyone else tried this? Also, has anyone built the
vsipl++ source using MinGW alone (i.e. without Cygwin)? [MinGW provides
a convenient environment for linking gcc/g++/g77 to MSVC runtime, Cygwin
dll's are not required] 

 
Thanks,

John Day

Staff Scientist

Essex Corp.

Melbourne, Fl

 
> g++ -c -I/usr/local/include fft.cpp

 
/usr/local/include/vsip/core/fft.hpp: In instantiation of
`vsip::impl::fft_facade<1u, vsip::cscalar_f, vsip::cscalar_f,
vsip::impl::fft::LibraryTagList, -0x000000002,  by_value, 0u,
alg_time>':

/usr/local/include/vsip/core/fft.hpp:432:   instantiated from
`vsip::Fft<vsip::const_Vector, vsip::cscalar_f, vsip::cscalar_f,
-0x000000002,  by_value, 0u,  alg_time>'

fft.cpp:45:   instantiated from here

/usr/local/include/vsip/core/fft.hpp:187: error: no type named `first'
in `struct vsip::impl::fft::LibraryTagList'

 
/usr/local/include/vsip/core/fft.hpp: In instantiation of
`vsip::impl::fft_facade<1u, vsip::cscalar_f, vsip::cscalar_f,
vsip::impl::fft::LibraryTagList, -0x000000001,  by_value, 0u,
alg_time>':

/usr/local/include/vsip/core/fft.hpp:432:   instantiated from
`vsip::Fft<vsip::const_Vector, vsip::cscalar_f, vsip::cscalar_f,
-0x000000001,  by_value, 0u,  alg_time>'

fft.cpp:46:   instantiated from here

/usr/local/include/vsip/core/fft.hpp:187: error: no type named `first'
in `struct vsip::impl::fft::LibraryTagList'

 
/usr/local/include/vsip/core/fft.hpp: In constructor
`vsip::impl::fft_facade<D, I, O, L, S,  by_value, N,
H>::fft_facade(const vsip::Domain<D>&, typename
vsip::impl::fft::base_interface<D, I, O, vsip::impl::fft_facade<D, I, O,
L, S,  by_value, N, H>::axis, vsip::impl::fft_facade<D, I, O, L, S,
by_value, N, H>::exponent>::scalar_type) [with unsigned int D = 1u, I =
vsip::cscalar_f, O = vsip::cscalar_f, L =
vsip::impl::fft::LibraryTagList, int S = -0x000000002, unsigned int N =
0u, vsip::alg_hint_type H =  alg_time]':

/usr/local/include/vsip/core/fft.hpp:439:   instantiated from
`vsip::Fft<V, I, O, S, R, N, H>::Fft(const vsip::Domain<vsip::Fft<V, I,
O, S, R, N, H>::dim>&, typename vsip::impl::fft_facade<vsip::Fft<V, I,
O, S, R, N, H>::dim, I, O, vsip::impl::fft::LibraryTagList, S, R, N,
H>::scalar_type) [with V = vsip::const_Vector, I = vsip::cscalar_f, O =
vsip::cscalar_f, int S = -0x000000002, vsip::return_mechanism_type R =
by_value, unsigned int N = 0u, vsip::alg_hint_type H =  alg_time]'

fft.cpp:45:   instantiated from here

/usr/local/include/vsip/core/fft.hpp:199: error: no type named `first'
in `struct vsip::impl::fft::LibraryTagList'

 
/usr/local/include/vsip/core/fft.hpp: In constructor
`vsip::impl::fft_facade<D, I, O, L, S,  by_value, N,
H>::fft_facade(const vsip::Domain<D>&, typename
vsip::impl::fft::base_interface<D, I, O, vsip::impl::fft_facade<D, I, O,
L, S,  by_value, N, H>::axis, vsip::impl::fft_facade<D, I, O, L, S,
by_value, N, H>::exponent>::scalar_type) [with unsigned int D = 1u, I =
vsip::cscalar_f, O = vsip::cscalar_f, L =
vsip::impl::fft::LibraryTagList, int S = -0x000000001, unsigned int N =
0u, vsip::alg_hint_type H =  alg_time]':

/usr/local/include/vsip/core/fft.hpp:439:   instantiated from
`vsip::Fft<V, I, O, S, R, N, H>::Fft(const vsip::Domain<vsip::Fft<V, I,
O, S, R, N, H>::dim>&, typename vsip::impl::fft_facade<vsip::Fft<V, I,
O, S, R, N, H>::dim, I, O, vsip::impl::fft::LibraryTagList, S, R, N,
H>::scalar_type) [with V = vsip::const_Vector, I = vsip::cscalar_f, O =
vsip::cscalar_f, int S = -0x000000001, vsip::return_mechanism_type R =
by_value, unsigned int N = 0u, vsip::alg_hint_type H =  alg_time]'

fft.cpp:46:   instantiated from here

/usr/local/include/vsip/core/fft.hpp:199: error: no type named `first'
in `struct vsip::impl::fft::LibraryTagList'
 
 
This electronic message and any files transmitted with it contain information which may be privileged and/or proprietary. The information is intended for use solely by the intended recipient(s). If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this information is prohibited. If you have received this electronic message in error, please advise the sender by reply email or by telephone (301-939-7000) and delete the message.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070626/aab7f686/attachment.html>

From stefan at codesourcery.com  Tue Jun 26 23:27:33 2007
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 26 Jun 2007 19:27:33 -0400
Subject: [vsipl++] fftm compile problem
In-Reply-To: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com>
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com>
Message-ID: <4681A0E5.6010707@codesourcery.com>

Hello John,

Day, John wrote:
> Hello,
> 
> I am trying out vsipl++ 1.3 !A32 binary for Windows/XP and am having a
> problem instantiating the fftm templates in your fft.cpp code example
> (from the source distribution).
> 
> I using MinGW with g++ 3.4.5 and I am getting ?error: no type named
> `first' in `struct vsip::impl::fft::LibraryTagList'?  (see below for all
> messages). I suspect this has something to do with
> vsip/opt/dispatch.hpp, but I don?t know enough about the architecture to
> figure out how to fix it.

You are right, the problem is related to the dispatch mechanism we use to
delegate calls to backends. In this case, it sounds as if you haven't
configured any fft backends.
You mention the windows binary release, which is configured / compiled
for use with Intel's IPP and MKL libraries. But then you are talking about
the source distribution, and mingw. To help you a little further it
is important to know what Sourcery VSIPL++ package you use, and how exactly
the command looks that raises the error. Are you configuring / building
Sourcery VSIPL++ yourself ? What commands are you using ?

Thanks,
		Stefan


-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From John.Day at EssexCorp.com  Wed Jun 27 01:07:00 2007
From: John.Day at EssexCorp.com (Day, John)
Date: Tue, 26 Jun 2007 21:07:00 -0400
Subject: [vsipl++] fftm compile problem
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com>
Message-ID: <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com>

Stefan wrote:
>> You mention the windows binary release, which is configured / compiled
>> for use with Intel's IPP and MKL libraries. But then you are talking about
>> the source distribution, and mingw. To help you a little further it
>> is important to know what Sourcery VSIPL++ package you use .....
 
At first I tried to build the source distribution using MinGW and g++ 3.4.5, but the build failed trying to configure ATLAS and I was not able to produce the config files.
 
So then I tried using the IA32 binary, just to see if I could compile the example fft.cpp (and the BeamformEx files) from the MS-DOS command line:
 
> g++ -c -I/usr/local/include fft.cpp
 
That's when the error occurred. I did not configure the fft backends or anything else. Nor did I expect the link step to work because there are no .a libraries in the Windows binary. 
 
I suppose I will have to set up a Cygwin environment, but I was hoping that MinGW alone would work.
 
Tnx,
John Day

________________________________

From: Stefan Seefeld [mailto:stefan at codesourcery.com]
Sent: Tue 6/26/2007 7:27 PM
To: Day, John
Cc: vsipl++ at codesourcery.com
Subject: Re: [vsipl++] fftm compile problem


Hello John,

Day, John wrote:
> Hello,
>
> I am trying out vsipl++ 1.3 !A32 binary for Windows/XP and am having a
> problem instantiating the fftm templates in your fft.cpp code example
> (from the source distribution).
>
> I using MinGW with g++ 3.4.5 and I am getting "error: no type named
> `first' in `struct vsip::impl::fft::LibraryTagList'"  (see below for all
> messages). I suspect this has something to do with
> vsip/opt/dispatch.hpp, but I don't know enough about the architecture to
> figure out how to fix it.

You are right, the problem is related to the dispatch mechanism we use to
delegate calls to backends. In this case, it sounds as if you haven't
configured any fft backends.
You mention the windows binary release, which is configured / compiled
for use with Intel's IPP and MKL libraries. But then you are talking about
the source distribution, and mingw. To help you a little further it
is important to know what Sourcery VSIPL++ package you use, and how exactly
the command looks that raises the error. Are you configuring / building
Sourcery VSIPL++ yourself ? What commands are you using ?

Thanks,
                Stefan


--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
 
 
This electronic message and any files transmitted with it contain information which may be privileged and/or proprietary. The information is intended for use solely by the intended recipient(s). If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this information is prohibited. If you have received this electronic message in error, please advise the sender by reply email or by telephone (301-939-7000) and delete the message.


From stefan at codesourcery.com  Wed Jun 27 02:28:39 2007
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 26 Jun 2007 22:28:39 -0400
Subject: [vsipl++] fftm compile problem
In-Reply-To: <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com>
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com>
Message-ID: <4681CB57.50905@codesourcery.com>

Day, John wrote:
> Stefan wrote:
>>> You mention the windows binary release, which is configured / compiled
>>> for use with Intel's IPP and MKL libraries. But then you are talking about
>>> the source distribution, and mingw. To help you a little further it
>>> is important to know what Sourcery VSIPL++ package you use .....
>  
> At first I tried to build the source distribution using MinGW and g++ 3.4.5, but the build failed trying to configure ATLAS and I was not able to produce the config files.

Right, configuring ATLAS is not easy. We have never attempted to support ATLAS
on Windows. Note, however, that there are a number of configure options to work
around those problems by using alternate lapack implementations, or none at all
(thus disabling parts of the functionality provided by the VSIPL++ spec). You
can find out more about these in the quickstart
(http://www.codesourcery.com/public/vsiplplusplus/sourceryvsipl++-1.3/quickstart/ch02s03.html)

> So then I tried using the IA32 binary, just to see if I could compile the example fft.cpp (and the BeamformEx files) from the MS-DOS command line:
>  
>> g++ -c -I/usr/local/include fft.cpp
>  
> That's when the error occurred. I did not configure the fft backends or anything else. Nor did I expect the link step to work because there are no .a libraries in the Windows binary. 

That is strange, as the Windows binary package is configured / built
for use with Intel's IPP and MKL. I'm thus not sure what causes the
error message you are reporting. Please note that the suggested way
to build applications with Sourcery VSIPL++ is to query compiler options
from the vsipl++.pc files that are part of binary releases. It is possible,
or even likely, that you are missing some important macro definition that
causes the built-in FFT backends to be masked.

> I suppose I will have to set up a Cygwin environment, but I was hoping that MinGW alone would work.

The only supported compiler on Windows is Intel's ICC. We haven't attempted
to build using GCC on Windows, though we are now considering it.

Regards,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718


From jules at codesourcery.com  Wed Jun 27 11:33:40 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 27 Jun 2007 07:33:40 -0400
Subject: [vsipl++] fftm compile problem
In-Reply-To: <4681CB57.50905@codesourcery.com>
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com> <4681CB57.50905@codesourcery.com>
Message-ID: <46824B14.4090009@codesourcery.com>

John,

A couple of bits:

- It should be possible to build Sourcery VSIPL++ with MinGW on windows. 
  Unfortunately, you won't be able to use MinGW with the windows binary 
package from our website, because that has been built with Intel C++, 
which IIUC has a different C++ ABI than GCC on windows.

To use MinGW, you will need to build the library from the source 
package.  This requires you to run configure, so you will need either 
MSys or cygwin (something to provide the equiv of /bin/sh).


- MinGW GCC 3.4.5 will work fine (we use 3.4.4 to build our Linux binary 
pacakges).  GCC 4.1/4.2 will give better performance, but that is 
another matter ...


- The compile error you're seeing is a result of the library not being 
able to find a FFT backend.  This happens because you're missing some 
macro definitions that need to be on the command line.

If you look in the file 'lib/pkgconfig/vsipl++.pc' of the binary 
package, you will see a line:

cppflags=-I${includedir}  -DVSIP_IMPL_PAR_SERVICE=0 
-DVSIP_IMPL_IPP_FFT=1 -DVSIP_IMPL_FFT_USE_FLOAT=1 
-DVSIP_IMPL_FFT_USE_DOUBLE=1 -DVSIP_IMPL_FFT_USE_LONG_DOUBLE=1 
-DVSIP_IMPL_PROVIDE_FFT_FLOAT=1 -DVSIP_IMPL_PROVIDE_FFT_DOUBLE=1 
-DVSIP_IMPL_PROVIDE_FFT_LONG_DOUBLE=0 -DVSIP_IMPL_USE_CBLAS=2

These macros tell the library which FFT backends to use (in this case, 
we're using the IPP FFT, which happens to be how the windows binary 
package was configured).

Those definitions need to be on the command line when you compile.

You might retry compiling fft.cpp as

g++ -c -I/usr/local/include -DVSIP_IMPL_PAR_SERVICE=0 
-DVSIP_IMPL_IPP_FFT=1 -DVSIP_IMPL_FFT_USE_FLOAT=1 
-DVSIP_IMPL_FFT_USE_DOUBLE=1 -DVSIP_IMPL_FFT_USE_LONG_DOUBLE=1 
-DVSIP_IMPL_PROVIDE_FFT_FLOAT=1 -DVSIP_IMPL_PROVIDE_FFT_DOUBLE=1 
-DVSIP_IMPL_PROVIDE_FFT_LONG_DOUBLE=0 -DVSIP_IMPL_USE_CBLAS=2

That should fix the compilation errors.  However, the above mentioned 
problem of ICC and MinGW C++ ABI's being incompatible still remains of 
course!

- Sourcery VSIPL++ can be built with Cygwin too.


Do you have MSYS installed along with MinGW?  If so, you should 
configure the library from the source package.  The following configure 
command would be a good starting point:

	configure				\
		--with-lapack=simple-builtin	\
		--enable-fft=builtin

Let us know how that works!

				-- Jules


Stefan Seefeld wrote:
> Day, John wrote:
>> Stefan wrote:
>>>> You mention the windows binary release, which is configured / compiled
>>>> for use with Intel's IPP and MKL libraries. But then you are talking about
>>>> the source distribution, and mingw. To help you a little further it
>>>> is important to know what Sourcery VSIPL++ package you use .....
>>  
>> At first I tried to build the source distribution using MinGW and g++ 3.4.5, but the build failed trying to configure ATLAS and I was not able to produce the config files.
> 
> Right, configuring ATLAS is not easy. We have never attempted to support ATLAS
> on Windows. Note, however, that there are a number of configure options to work
> around those problems by using alternate lapack implementations, or none at all
> (thus disabling parts of the functionality provided by the VSIPL++ spec). You
> can find out more about these in the quickstart
> (http://www.codesourcery.com/public/vsiplplusplus/sourceryvsipl++-1.3/quickstart/ch02s03.html)
> 
>> So then I tried using the IA32 binary, just to see if I could compile the example fft.cpp (and the BeamformEx files) from the MS-DOS command line:
>>  
>>> g++ -c -I/usr/local/include fft.cpp
>>  
>> That's when the error occurred. I did not configure the fft backends or anything else. Nor did I expect the link step to work because there are no .a libraries in the Windows binary. 
> 
> That is strange, as the Windows binary package is configured / built
> for use with Intel's IPP and MKL. I'm thus not sure what causes the
> error message you are reporting. Please note that the suggested way
> to build applications with Sourcery VSIPL++ is to query compiler options
> from the vsipl++.pc files that are part of binary releases. It is possible,
> or even likely, that you are missing some important macro definition that
> causes the built-in FFT backends to be masked.
> 
>> I suppose I will have to set up a Cygwin environment, but I was hoping that MinGW alone would work.
> 
> The only supported compiler on Windows is Intel's ICC. We haven't attempted
> to build using GCC on Windows, though we are now considering it.
> 
> Regards,
> 		Stefan
> 


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From John.Day at EssexCorp.com  Thu Jun 28 11:31:50 2007
From: John.Day at EssexCorp.com (Day, John)
Date: Thu, 28 Jun 2007 07:31:50 -0400
Subject: [vsipl++] fftm compile problem
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com> <4681CB57.50905@codesourcery.com> <46824B14.4090009@codesourcery.com>
Message-ID: <5CD1C9B961A59D4592F02D12CD93238B066A2A05@STITCH.essexcorp.com>

Jules,
We tried building vsipl++1.3 on Windows using the Cygwin enviroment, but had many problems. However (surprisingly) we were successful in building using standalone MinGW with Msys and gcc/g++/g77 3.4.5, with only two minor glitches:
1. MinGW didn't have sys/times.h, so we created one with just a tms structure which satisfied the make.
2. Modified vendor\fftw\kernel\alloc.c to allow compilation of our_alloc16()
 
The two build examples, fft.exe and example1.exe were linked and ran OK, which suggests that our compiler switches and linkage issues were resolved OK.
 
But we are still having a problem compiling the Judd/Cottel BeamformEx code (http://hpec-si.com/MinimumVarianceBeamformerExample.pdf) in files BeamformEx.cpp and beam_steer_coeff.cpp
[See listing compile/link commands and errors at end of this message]
 
BeamformEx.cpp: pg 7
// Create a cholesky object
 vsip::chold<vsip::cscalar_f, vsip::by_reference>
          chold_object(vsip::chold<vsip::cscalar_f, vsip::by_reference>::lower,nh);
I was able to get this to compile by changing the first parameter of the constructor to (vsip::mat_uplo)0, since it seems to be looking for an enumeration of zero.
 
beam_steer_coeff.cpp: pg37
k *= (2.0 * M_PI/sv);
This statement causes the error, possibly due to incorrect overloading of *= operator. 
 
I can get all of the beamformer files to compile and link if I comment out this last statement. 
Is this still a config problem, or is this code possibly out of date? A comment on page 43 suggests that this is using a very early implementation of VSIPL++.
 
We are trying to get this beamformer working to do signal processing on some towed-array sonar data. Are there any other adaptive beamformers available similar to this in the VSIPL++ community, either commercially or as free software? 
 
Thanks for your help and suggestions,
John Day

>set files=BeamformEx\BeamformEx.cpp BeamformEx\array.cpp BeamformEx\beam_steer_coef.cpp BeamformEx\data_input.cpp BeamformEx\param_mvdr.cpp BeamformEx\phat.cpp 
>g++ -c -I src -I ./src   -I./vendor/clapack/SRC -I/usr/local/include/fftw3  -DVSIP_IMPL_FFTW3=1 -DVSIP_IMPL_PAR_SERVICE=0 -DVSIP_IMPL_FFT_USE_FLOAT=1 -DVSIP_IMPL_FFT_USE_DOUBLE=1 -DVSIP_IMPL_FFT_USE_LONG_DOUBLE=1 -DVSIP_IMPL_PROVIDE_FFT_FLOAT=1 -DVSIP_IMPL_PROVIDE_FFT_DOUBLE=1 -DVSIP_IMPL_PROVIDE_FFT_LONG_DOUBLE=1 -I/lapack -DVSIP_IMPL_USE_CBLAS=0 -g -O2 -I./src BeamformEx\BeamformEx.cpp BeamformEx\array.cpp BeamformEx\beam_steer_coef.cpp BeamformEx\data_input.cpp BeamformEx\param_mvdr.cpp BeamformEx\phat.cpp  
BeamformEx\BeamformEx.cpp: In function `int main(int, char**)':
BeamformEx\BeamformEx.cpp:220: error: `lower' is not a member of `vsip::chold<vsip::cscalar_f,  by_reference>'
src/vsip/core/expr/scalar_block.hpp: In instantiation of `vsip::impl::Scalar_block_base<1u, double>':
src/vsip/core/expr/scalar_block.hpp:69:   instantiated from `vsip::impl::Scalar_block<1u, double>'
src/vsip/core/expr/binary_block.hpp:76:   instantiated from `vsip::impl::Binary_expr_block<1u, vsip::impl::op::Mult, vsip::Dense<1u, vsip::scalar_f, vsip::tuple<0u, 1u, 2u>, vsip::Local_map>, vsip::scalar_f, vsip::impl::Scalar_block<1u, double>, double>'
src/vsip/vector.hpp:45:   instantiated from `vsip::const_Vector<vsip::scalar_f, const vsip::impl::Binary_expr_block<1u, vsip::impl::op::Mult, vsip::Dense<1u, vsip::scalar_f, vsip::tuple<0u, 1u, 2u>, vsip::Local_map>, vsip::scalar_f, vsip::impl::Scalar_block<1u, double>, double> >'
src/vsip/vector.hpp:270:   instantiated from `vsip::Vector<T, B>& vsip::Vector<T, B>::operator*=(const T0&) [with T0 = double, T = vsip::scalar_f, Block = vsip::Dense<1u, vsip::scalar_f, vsip::tuple<0u, 1u, 2u>, vsip::Local_map>]'
BeamformEx\beam_steer_coef.cpp:71:   instantiated from here
src/vsip/core/expr/scalar_block.hpp:60: error: `vsip::impl::Scalar_block_base<D, Scalar>::map_' has incomplete type
src/vsip/core/parallel/local_map.hpp:32: error: declaration of `struct vsip::Local_or_global_map<1u>'


________________________________

From: Jules Bergmann [mailto:jules at codesourcery.com]
Sent: Wed 6/27/2007 7:33 AM
Cc: Day, John; vsipl++ at codesourcery.com
Subject: Re: [vsipl++] fftm compile problem


John,

A couple of bits:

- It should be possible to build Sourcery VSIPL++ with MinGW on windows.
  Unfortunately, you won't be able to use MinGW with the windows binary
package from our website, because that has been built with Intel C++,
which IIUC has a different C++ ABI than GCC on windows.

To use MinGW, you will need to build the library from the source
package.  This requires you to run configure, so you will need either
MSys or cygwin (something to provide the equiv of /bin/sh).


- MinGW GCC 3.4.5 will work fine (we use 3.4.4 to build our Linux binary
pacakges).  GCC 4.1/4.2 will give better performance, but that is
another matter ...


- The compile error you're seeing is a result of the library not being
able to find a FFT backend.  This happens because you're missing some
macro definitions that need to be on the command line.

If you look in the file 'lib/pkgconfig/vsipl++.pc' of the binary
package, you will see a line:

cppflags=-I${includedir}  -DVSIP_IMPL_PAR_SERVICE=0
-DVSIP_IMPL_IPP_FFT=1 -DVSIP_IMPL_FFT_USE_FLOAT=1
-DVSIP_IMPL_FFT_USE_DOUBLE=1 -DVSIP_IMPL_FFT_USE_LONG_DOUBLE=1
-DVSIP_IMPL_PROVIDE_FFT_FLOAT=1 -DVSIP_IMPL_PROVIDE_FFT_DOUBLE=1
-DVSIP_IMPL_PROVIDE_FFT_LONG_DOUBLE=0 -DVSIP_IMPL_USE_CBLAS=2

These macros tell the library which FFT backends to use (in this case,
we're using the IPP FFT, which happens to be how the windows binary
package was configured).

Those definitions need to be on the command line when you compile.

You might retry compiling fft.cpp as

g++ -c -I/usr/local/include -DVSIP_IMPL_PAR_SERVICE=0
-DVSIP_IMPL_IPP_FFT=1 -DVSIP_IMPL_FFT_USE_FLOAT=1
-DVSIP_IMPL_FFT_USE_DOUBLE=1 -DVSIP_IMPL_FFT_USE_LONG_DOUBLE=1
-DVSIP_IMPL_PROVIDE_FFT_FLOAT=1 -DVSIP_IMPL_PROVIDE_FFT_DOUBLE=1
-DVSIP_IMPL_PROVIDE_FFT_LONG_DOUBLE=0 -DVSIP_IMPL_USE_CBLAS=2

That should fix the compilation errors.  However, the above mentioned
problem of ICC and MinGW C++ ABI's being incompatible still remains of
course!

- Sourcery VSIPL++ can be built with Cygwin too.


Do you have MSYS installed along with MinGW?  If so, you should
configure the library from the source package.  The following configure
command would be a good starting point:

        configure                               \
                --with-lapack=simple-builtin    \
                --enable-fft=builtin

Let us know how that works!

                                -- Jules


Stefan Seefeld wrote:
> Day, John wrote:
>> Stefan wrote:
>>>> You mention the windows binary release, which is configured / compiled
>>>> for use with Intel's IPP and MKL libraries. But then you are talking about
>>>> the source distribution, and mingw. To help you a little further it
>>>> is important to know what Sourcery VSIPL++ package you use .....
>> 
>> At first I tried to build the source distribution using MinGW and g++ 3.4.5, but the build failed trying to configure ATLAS and I was not able to produce the config files.
>
> Right, configuring ATLAS is not easy. We have never attempted to support ATLAS
> on Windows. Note, however, that there are a number of configure options to work
> around those problems by using alternate lapack implementations, or none at all
> (thus disabling parts of the functionality provided by the VSIPL++ spec). You
> can find out more about these in the quickstart
> (http://www.codesourcery.com/public/vsiplplusplus/sourceryvsipl++-1.3/quickstart/ch02s03.html)
>
>> So then I tried using the IA32 binary, just to see if I could compile the example fft.cpp (and the BeamformEx files) from the MS-DOS command line:
>> 
>>> g++ -c -I/usr/local/include fft.cpp
>> 
>> That's when the error occurred. I did not configure the fft backends or anything else. Nor did I expect the link step to work because there are no .a libraries in the Windows binary.
>
> That is strange, as the Windows binary package is configured / built
> for use with Intel's IPP and MKL. I'm thus not sure what causes the
> error message you are reporting. Please note that the suggested way
> to build applications with Sourcery VSIPL++ is to query compiler options
> from the vsipl++.pc files that are part of binary releases. It is possible,
> or even likely, that you are missing some important macro definition that
> causes the built-in FFT backends to be masked.
>
>> I suppose I will have to set up a Cygwin environment, but I was hoping that MinGW alone would work.
>
> The only supported compiler on Windows is Intel's ICC. We haven't attempted
> to build using GCC on Windows, though we are now considering it.
>
> Regards,
>               Stefan
>


--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
 
 
This electronic message and any files transmitted with it contain information which may be privileged and/or proprietary. The information is intended for use solely by the intended recipient(s). If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this information is prohibited. If you have received this electronic message in error, please advise the sender by reply email or by telephone (301-939-7000) and delete the message.


From jules at codesourcery.com  Thu Jun 28 13:03:05 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 28 Jun 2007 09:03:05 -0400
Subject: [vsipl++] fftm compile problem
In-Reply-To: <5CD1C9B961A59D4592F02D12CD93238B066A2A05@STITCH.essexcorp.com>
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com> <4681CB57.50905@codesourcery.com> <46824B14.4090009@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A05@STITCH.essexcorp.com>
Message-ID: <4683B189.7040407@codesourcery.com>

Day, John wrote:
 > Jules,

 > We tried building vsipl++1.3 on Windows using the Cygwin enviroment,
 > but had many problems.

If you don't mind, can you describe the problems?  We've had some
success with cygwin, however we would like to make things more robust.

 > However (surprisingly) we were successful in
 > building using standalone MinGW with Msys and gcc/g++/g77 3.4.5,

Great!

 > with only two minor glitches:

 > 1. MinGW didn't have sys/times.h, so we created one with just a tms
 > structure which satisfied the make.

OK.  Do you know where this was being included from?  We try to pull
in <time.h>, but only if you've enabled one of the posix timers
(--enable-timer=posix or --enable-timer=realtime).

 > 2. Modified vendor\fftw\kernel\alloc.c to allow compilation of
 > our_alloc16()

Was this to fix a compilation error in that routine, or to force the
#ifdef to true?

 >
 > The two build examples, fft.exe and example1.exe were linked and ran 
OK, which suggests that our compiler switches and linkage issues were 
resolved OK.
 >
 > But we are still having a problem compiling the Judd/Cottel 
BeamformEx code 
(http://hpec-si.com/MinimumVarianceBeamformerExample.pdf) in files 
BeamformEx.cpp and beam_steer_coeff.cpp
 > [See listing compile/link commands and errors at end of this message]
 >
 > BeamformEx.cpp: pg 7
 > // Create a cholesky object
 >  vsip::chold<vsip::cscalar_f, vsip::by_reference>
 >           chold_object(vsip::chold<vsip::cscalar_f, 
vsip::by_reference>::lower,nh);
 > I was able to get this to compile by changing the first parameter of 
the constructor to (vsip::mat_uplo)0, since it seems to be looking for 
an enumeration of zero.

'lower' is no longer part of the chold object, rather it is in the
vsip namespace.  You might try changing parameter to vsip::lower.

 >
 > beam_steer_coeff.cpp: pg37
 > k *= (2.0 * M_PI/sv);
 > This statement causes the error, possibly due to incorrect
 > overloading of *= operator.
 >
 > I can get all of the beamformer files to compile and link if I
 > comment out this last statement.  Is this still a config problem, or
 > is this code possibly out of date? A comment on page 43 suggests
 > that this is using a very early implementation of VSIPL++.

That statement should work.  From the error message below, the library
may be failing to include a header file.

Can you try adding the following include

	#include <vsip/map.hpp>

and recompiling?

 >  We are trying to get this beamformer working to do signal
 >  processing on some towed-array sonar data. Are there any other
 >  adaptive beamformers available similar to this in the VSIPL++
 >  community, either commercially or as free software?

There is a K-Omega beamformer (also originating from Randy Judd) that
was included with the old VSIPL++ reference implementation.  However,
I am not sure if it is adaptive.
 >
 > Thanks for your help and suggestions,
 > John Day

No problem!  Thanks for your feedback on VSIPL++.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From John.Day at EssexCorp.com  Thu Jun 28 15:05:50 2007
From: John.Day at EssexCorp.com (Day, John)
Date: Thu, 28 Jun 2007 11:05:50 -0400
Subject: [vsipl++] fftm compile problem
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com> <4681CB57.50905@codesourcery.com> <46824B14.4090009@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A05@STITCH.essexcorp.com> <4683B189.7040407@codesourcery.com>
Message-ID: <5CD1C9B961A59D4592F02D12CD93238B07CEBA55@STITCH.essexcorp.com>

Jules wrote:
>> We tried building vsipl++1.3 on Windows using the Cygwin enviroment,
>> but had many problems.

> If you don't mind, can you describe the problems?  We've had some
> success with cygwin, however we would like to make things more robust.

Turns out that there was only a single problem: the configure failed on
fftw3l (using the "builtin" parameters that you suggested). Configure
reported an error on the console, but the logs did not contain any
specific error that we could identify or troubleshoot. That's when we
decided to give MingGW a try.

I should also mention that our development platform is a Dell running an
x64 Dual Core Xeon processor. But we are mostly running in 32-bit
emulation (using the WOW64 emulation) which seems to be slightly
unstable, for example we cannot get gdb (MinGW or Cygwin versions) to
run reliably. So there might be other "x64" side-effects at play here.

 >> However (surprisingly) we were successful in
 >> building using standalone MinGW with Msys and gcc/g++/g77 3.4.5,

>Great!

 >> with only two minor glitches:

 >> 1. MinGW didn't have sys/times.h, so we created one with just a tms
 >> structure which satisfied the make.

>OK.  Do you know where this was being included from?  We try to pull
>in <time.h>, but only if you've enabled one of the posix timers
>(--enable-timer=posix or --enable-timer=realtime).

These CLAPACK files included sys/times.h
vendor/clapack/SRC/dsecnd.c 
vendor/clapack/SRC/second.c

 >> 2. Modified vendor\fftw\kernel\alloc.c to allow compilation of
 >> our_alloc16()

>Was this to fix a compilation error in that routine, or to force the
>#ifdef to true?

We forced with these #defines
#define WITH_OUR_MALLOC16
#define MIN_ALIGNMENT  16
#if defined(WITH_OUR_MALLOC16) && (MIN_ALIGNMENT == 16)
 ?>
>> The two build examples, fft.exe and example1.exe were linked and ran 
>>OK, which suggests that our compiler switches and linkage issues were 
>>resolved OK.
 >>
 >> But we are still having a problem compiling the Judd/Cottel 
>>BeamformEx code 
>>(http://hpec-si.com/MinimumVarianceBeamformerExample.pdf) in files 
>>BeamformEx.cpp and beam_steer_coeff.cpp
 >> [See listing compile/link commands and errors at end of this
message]
 >>
 >> BeamformEx.cpp: pg 7
 >> // Create a cholesky object
 >>  vsip::chold<vsip::cscalar_f, vsip::by_reference>
 >>           chold_object(vsip::chold<vsip::cscalar_f, 
>>vsip::by_reference>::lower,nh);
 >> I was able to get this to compile by changing the first parameter of

>>the constructor to (vsip::mat_uplo)0, since it seems to be looking for

>>an enumeration of zero.

>'lower' is no longer part of the chold object, rather it is in the
>vsip namespace.  You might try changing parameter to vsip::lower.

That worked.

 >>
 >> beam_steer_coeff.cpp: pg37
 >> k *= (2.0 * M_PI/sv);
 >> This statement causes the error, possibly due to incorrect
 >> overloading of *= operator.
 >>
 >> I can get all of the beamformer files to compile and link if I
 >> comment out this last statement.  Is this still a config problem, or
 >> is this code possibly out of date? A comment on page 43 suggests
 >> that this is using a very early implementation of VSIPL++.

>That statement should work.  From the error message below, the library
>may be failing to include a header file.

>Can you try adding the following include

>	#include <vsip/map.hpp>

>and recompiling?

That worked. 
Also tried replacing both includes with a single #include
<vsip/signal.h> and that worked too.

 >>  We are trying to get this beamformer working to do signal
 >>  processing on some towed-array sonar data. Are there any other
 >>  adaptive beamformers available similar to this in the VSIPL++
 >>  community, either commercially or as free software?

>There is a K-Omega beamformer (also originating from Randy Judd) that
>was included with the old VSIPL++ reference implementation.  However,
>I am not sure if it is adaptive.
 
We found this presentation with code snippets, 
http://hpec-si.com/S14-HPEC-SI-VSIPL++.ppt#298,12,VSIPL++ Version

...but can't find the entire source code. How might we obtain this code
or similar VSIPL++ implementations? (We are under Navy contract, so
might reuse some old government code, if any exists).

Thanks,
John Day
 
 
This electronic message and any files transmitted with it contain information which may be privileged and/or proprietary. The information is intended for use solely by the intended recipient(s). If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of this information is prohibited. If you have received this electronic message in error, please advise the sender by reply email or by telephone (301-939-7000) and delete the message.


From jules at codesourcery.com  Thu Jun 28 16:08:39 2007
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 28 Jun 2007 12:08:39 -0400
Subject: [vsipl++] fftm compile problem
In-Reply-To: <5CD1C9B961A59D4592F02D12CD93238B07CEBA55@STITCH.essexcorp.com>
References: <5CD1C9B961A59D4592F02D12CD93238B07C9E806@STITCH.essexcorp.com> <4681A0E5.6010707@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A03@STITCH.essexcorp.com> <4681CB57.50905@codesourcery.com> <46824B14.4090009@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B066A2A05@STITCH.essexcorp.com> <4683B189.7040407@codesourcery.com> <5CD1C9B961A59D4592F02D12CD93238B07CEBA55@STITCH.essexcorp.com>
Message-ID: <4683DD07.8020904@codesourcery.com>

Day, John wrote:
> Jules wrote:
>>> We tried building vsipl++1.3 on Windows using the Cygwin enviroment,
>>> but had many problems.
> 
>> If you don't mind, can you describe the problems?  We've had some
>> success with cygwin, however we would like to make things more robust.
> 
> Turns out that there was only a single problem: the configure failed on
> fftw3l (using the "builtin" parameters that you suggested). Configure
> reported an error on the console, but the logs did not contain any
> specific error that we could identify or troubleshoot. That's when we
> decided to give MingGW a try.

Ok, you can work around that by configuring with --disable-fft-long-double.

> 
> I should also mention that our development platform is a Dell running an
> x64 Dual Core Xeon processor. But we are mostly running in 32-bit
> emulation (using the WOW64 emulation) which seems to be slightly
> unstable, for example we cannot get gdb (MinGW or Cygwin versions) to
> run reliably. So there might be other "x64" side-effects at play here.

Interesting.  As you might know, our company also produces Sourcery G++ 
a productized version of the GNU toolchain.  I'm checking with our G++ 
team to see if we have any solutions for 64-bit windows.


> 
> These CLAPACK files included sys/times.h
> vendor/clapack/SRC/dsecnd.c 
> vendor/clapack/SRC/second.c

Thanks!  Unfotunately we pull in all of lapack, even though we don't use 
all of it, including the timer routines.  I've captured this issue 
internally, we'll correct that in our next release.
> 
>  >> 2. Modified vendor\fftw\kernel\alloc.c to allow compilation of
>  >> our_alloc16()
> 
>> Was this to fix a compilation error in that routine, or to force the
>> #ifdef to true?
> 
> We forced with these #defines
> #define WITH_OUR_MALLOC16
> #define MIN_ALIGNMENT  16
> #if defined(WITH_OUR_MALLOC16) && (MIN_ALIGNMENT == 16)

Thanks, we need to look into why FFTW's configure did not detect 
WITH_OUR_MALLOC16.

> 
>> 'lower' is no longer part of the chold object, rather it is in the
>> vsip namespace.  You might try changing parameter to vsip::lower.
> 
> That worked.

Great!


>> Can you try adding the following include
> 
>> 	#include <vsip/map.hpp>
> 
>> and recompiling?
> 
> That worked. 
> Also tried replacing both includes with a single #include
> <vsip/signal.h> and that worked too.

Great, thanks for trying that out.  That is an issue in our library that 
we need to fix.  Including map should not be required if maps are not 
being explicitly used.

> 
>> There is a K-Omega beamformer (also originating from Randy Judd) that
>> was included with the old VSIPL++ reference implementation.  However,
>> I am not sure if it is adaptive.
>  
> We found this presentation with code snippets, 
> http://hpec-si.com/S14-HPEC-SI-VSIPL++.ppt#298,12,VSIPL++ Version
> 
> ...but can't find the entire source code. How might we obtain this code
> or similar VSIPL++ implementations? (We are under Navy contract, so
> might reuse some old government code, if any exists).

Ok, I'll look into where this code might be.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705