From jules at codesourcery.com Sat Oct 1 14:01:55 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 01 Oct 2005 10:01:55 -0400
Subject: [vsipl++] [patch] matvec: outer, gem, cumsum
In-Reply-To: <433D8B8F.8000202@codesourcery.com>
References: <43398FE1.7080906@codesourcery.com> <4339CE30.9070608@codesourcery.com> <433D8B8F.8000202@codesourcery.com>
Message-ID: <433E96D3.3070509@codesourcery.com>

Don,

Looks good. Please check it in, modulo the two comments below.

thanks,
-- Jules

Don McCoy wrote:
> Suggested changes applied. Using a modified approach that applies the
> 'mat_op_type' makes the code more readable and it was easier to extend
> to include op types mat_herm and mat_conj. Also includes
> specializations that allow herm and conj to be performed on real types
> (by doing transpose and nothing respectively).
> Tested under GCC 3.4 successfully. ICPC 8.0 and 9.0 caused failures
> related to handling of complex types.
>
>
>
> + template + typename Block1, typename Block2, typename Block4>
> + void
> + gemp( T0 alpha, const_Matrix A,
> + const_Matrix B, T3 beta, Matrix C)
> + {
> + assert( A.size(0) == C.size(0) );
> + assert( B.size(1) == C.size(1) );

Also assert that A.size(1) == B.size(0) (calling dot() does this
implicitly, but catching errors earlier in the call chain makes it
easier for users to understand the assertion failure).

> +
> + for ( index_type i = A.size(0); i-- > 0; )
> + for ( index_type j = B.size(1); j-- > 0; )
> + C.put(i, j, alpha * dot( A.row(i), B.col(j) ) + beta * C.get(i, j));
> + }
> +
> +
> + /// outer product of two complex vectors
> + template + typename T1,
> + typename T2,
> + typename Block1,
> + typename Block2>
> + const_Matrix::type>::type>
> + outer( T0 alpha, const_Vector, Block1> v,
> + const_Vector, Block2> w )
> + VSIP_NOTHROW
> + {
> + typedef Matrix + typename Promotion::type>::type> return_type;

I think this should be:

  typedef Matrix, std::complex >::type >::type> return_type;

i.e. promote std::complex instead of T1, same for T2. Also, the
function return type should be updated too.

> + return_type r( v.size(), w.size(), alpha );
> +
> + for ( index_type i = v.size(); i-- > 0; )
> + for ( index_type j = w.size(); j-- > 0; )
> + r.put( i, j, alpha * v.get(i) * conj(w.get(j)) );
> +
> + return r;
> + }
> +
> +

From stefan at codesourcery.com Mon Oct 3 16:27:35 2005
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 03 Oct 2005 12:27:35 -0400
Subject: [vsipl++] status report - 2005-10-03
Message-ID: <43415BF7.8000202@codesourcery.com>

26th September - 30th September
-------------------------------

VSIPL++:

* Implement selgen.
* Start documentation.

QMTest:

* Documentation.

This week
---------

VSIPL++:

* More documentation.

QMTest:

* Documentation.

From jules at codesourcery.com Mon Oct 3 18:08:17 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 03 Oct 2005 14:08:17 -0400
Subject: User's Guide wiki - Re: [vsipl++] status report - 2005-10-03
In-Reply-To: <43415BF7.8000202@codesourcery.com>
References: <43415BF7.8000202@codesourcery.com>
Message-ID: <43417391.4010700@codesourcery.com>

Stefan,

I started a wiki for end-user documentation:

https://intranet.codesourcery.com/VSIPLDocumentation

For the user's guide, there is a strawman outline and a link to some
older documentation on the SAR example. They're meant to be starting
points, so if they're already overcome by events, that's great!

-- Jules

Stefan Seefeld wrote:
> ---------
>
> VSIPL++:
>
> * More documentation.
>

From don at codesourcery.com Tue Oct 4 05:58:40 2005
From: don at codesourcery.com (Don McCoy)
Date: Mon, 03 Oct 2005 23:58:40 -0600
Subject: [vsipl++] [patch] matvec: outer, gem, cumsum
In-Reply-To: <433E96D3.3070509@codesourcery.com>
References: <43398FE1.7080906@codesourcery.com> <4339CE30.9070608@codesourcery.com> <433D8B8F.8000202@codesourcery.com> <433E96D3.3070509@codesourcery.com>
Message-ID: <43421A10.2010403@codesourcery.com>

Jules Bergmann wrote:
> Don, Looks good. Please check it in, modulo the two comments below.
> thanks, -- Jules

Applied and checked in.

Thanks,
--
Don McCoy
CodeSourcery, LLC
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mv.diff
Type: text/x-patch
Size: 20874 bytes
Desc: not available
URL:

From jules at codesourcery.com Tue Oct 4 20:21:20 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 04 Oct 2005 16:21:20 -0400
Subject: [patch] Convolution: implement symmetric coefficients
Message-ID: <4342E440.30806@codesourcery.com>

This patch extends convolution to work with symmetric kernels
(sym_even_len_odd and sym_even_len_even). It extends tests to cover
these new cases, and also to cover cases where decimation != 1.

This uncovered a couple of specification issues:

- First, the VSIPL++ spec defines the convolution accessor
  'kernel_size()' to return the domain having the same length for each
  dimension as 'filter_coeffs'. However, when constructing a
  convolution with a symmetric kernel, 'filter_coeffs' only contains a
  subset of coefficients. In those cases, the true kernel size is
  either '2 * filter_coeffs.size()' or '2 * filter_coeffs.size() - 1'
  (for 1D).

  In contrast, the C-VSIPL spec defines the kernel size as M, and
  specifies the size of 'filter_coeffs' as either M if
  symmetry == non_sym, or 'ceil(M/2)' if
  symmetry == sym_even_len_{odd,even}. (Issue #91)

- Second, the VSIPL++ and C-VSIPL specs define the output size of a
  minimal support output convolution such that values outside of the
  input may be required for some decimations != 1. This contradicts
  the equation that defines the output values. (Issue #90)

Fixing the first issue is straightforward. The second issue is
trickier, since the C-VSIPL spec looks to be "wrong" as well, and
changing the output size would break existing C-VSIPL implementations.

I've added an ifdef (VSIP_IMPL_CONV_CORRECT_MIN_SUPPORT_SIZE) that
controls whether we follow the spec or not. When following the spec,
we treat accesses outside the input range as 0. (This is probably the
right way to "fix" the C-VSIPL spec: keep the length defined as is and
specify that zero values are used when trying to access outside the
input range, similar to support_same and support_full.)

-- Jules

From jules at codesourcery.com Tue Oct 4 20:24:29 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 04 Oct 2005 16:24:29 -0400
Subject: [vsipl++] [patch] matvec: outer, gem, cumsum
In-Reply-To: <43421A10.2010403@codesourcery.com>
References: <43398FE1.7080906@codesourcery.com> <4339CE30.9070608@codesourcery.com> <433D8B8F.8000202@codesourcery.com> <433E96D3.3070509@codesourcery.com> <43421A10.2010403@codesourcery.com>
Message-ID: <4342E4FD.3060308@codesourcery.com>

Don McCoy wrote:
>
> Applied and checked in.
>
> Thanks,

Woo-hoo! Now ref-impl/matvec passes. Only signal-correlation,
signal-fir, and signal-histogram to go!

(ref-impl/selgen also fails, but the test is using an old
clip/invclip API that differs from the specification.)
-- Jules

From jules at codesourcery.com Thu Oct 6 20:55:40 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 06 Oct 2005 16:55:40 -0400
Subject: [patch] 1D Correlation
Message-ID: <43458F4C.4030405@codesourcery.com>

This patch implements and tests 1D correlation.

There are two implementations: a simple loop version based on the
C-VSIPL equations for correlation, and an optimized version that uses
FFT overlap-add. The overlap-add algorithm needs its parameters tuned
(the block size it chooses for a given input size and reference size),
as you can see from the red line on the chart, but overall it performs
better than the straightforward equation.

The chart shows effective flops based on the simple version so that
the two versions can be compared. Also, the chart is for a relatively
small reference vector size (16 elements); for larger sizes the big-O
advantage of overlap-add grows.

This patch also changes the view op= operators to take their argument
by value. This was necessary when the RHS of an expression is a
temporary object, such as that returned by 'ramp', i.e.:

  Vector out(N);
  out /= ramp(1, 1, N);

-- Jules
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corr.png
Type: image/png
Size: 5379 bytes
Desc: not available
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: corr.diff
URL:

From ncm at codesourcery.com Mon Oct 10 01:22:34 2005
From: ncm at codesourcery.com (Nathan (Jasper) Myers)
Date: Sun, 9 Oct 2005 18:22:34 -0700
Subject: [PATCH] Implement Fir<>, native C++ version
Message-ID: <20051010012234.GA29098@codesourcery.com>

The following patch has been committed.

It implements vsip::Fir<> using native C++ code, and a comprehensive
test of all its modes.

Note that a few bits of the test are commented out; it uses
vsip::Convolution<> to generate the reference output, and that has
a little bug I haven't got to tracking down yet.
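What the chained (state_save) mode has to guarantee can be shown with a much smaller piece of code than the patch that follows. The sketch below is a hypothetical plain-C++ illustration, not the patch's vsip::Fir<> implementation: the class name Simple_fir, its use of std::vector, and the convention y[j] = sum_k h[k] * x[j*D - k] (samples before the first input count as zero) are assumptions made for the example. Its point is that carrying the last M-1 input samples plus the decimation phase across calls makes block-by-block filtering produce the same outputs as filtering the whole signal at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of a chained (state-saving) decimating FIR.
// Output j is y[j] = sum_k h[k] * x[j*D - k], where x is the whole
// stream fed so far and samples before the first one count as zero.
template <typename T>
class Simple_fir
{
public:
  Simple_fir(const std::vector<T>& kernel, std::size_t decimation)
    : h_(kernel), d_(decimation), hist_(), phase_(0)
  {
    assert(!h_.empty() && d_ > 0);
    hist_.assign(h_.size() - 1, T(0));  // last M-1 inputs, initially zero
  }

  // Filter one block; consecutive calls act like one long input.
  std::vector<T> operator()(const std::vector<T>& x)
  {
    const std::size_t m = h_.size();
    // buf = saved history followed by the new block, so buf[i + m - 1] is x[i].
    std::vector<T> buf(hist_);
    buf.insert(buf.end(), x.begin(), x.end());

    std::vector<T> y;
    std::size_t i = phase_;  // first index in this block that yields an output
    for (; i < x.size(); i += d_)
    {
      T acc = T(0);
      for (std::size_t k = 0; k < m; ++k)
        acc += h_[k] * buf[i + m - 1 - k];  // h[k] * x[i - k]
      y.push_back(acc);
    }
    phase_ = i - x.size();  // decimation phase carried into the next block
    hist_.assign(buf.end() - static_cast<std::ptrdiff_t>(m - 1), buf.end());
    return y;
  }

  // Discard saved state, so the next call starts from zero history.
  void reset()
  {
    hist_.assign(h_.size() - 1, T(0));
    phase_ = 0;
  }

private:
  std::vector<T> h_;
  std::size_t d_;
  std::vector<T> hist_;
  std::size_t phase_;
};
```

For example, with kernel {1, 1} and D = 2, feeding {1, 2, 3} and then {4, 5} returns {1, 5} and then {9}, the same three values a single call on {1, 2, 3, 4, 5} returns.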
Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.288 diff -u -p -r1.288 ChangeLog --- ChangeLog 7 Oct 2005 13:46:45 -0000 1.288 +++ ChangeLog 10 Oct 2005 01:17:11 -0000 @@ -1,3 +1,9 @@ +2005-10-09 Nathan Myers + + Implement FIR filter, all modes. + * src/vsip/impl/signal-fir.hpp, tests/fir.cpp: New. + * src/vsip/signal.hpp: Include new impl/signal-fir.hpp. + 2005-10-06 Jules Bergmann Implement 1-D correlation. Index: src/vsip/signal.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/signal.hpp,v retrieving revision 1.10 diff -u -p -r1.10 signal.hpp --- src/vsip/signal.hpp 7 Oct 2005 13:46:46 -0000 1.10 +++ src/vsip/signal.hpp 10 Oct 2005 01:17:11 -0000 @@ -19,6 +19,7 @@ #include #include #include +#include #endif // VSIP_SIGNAL_HPP Index: src/vsip/impl/signal-fir.hpp =================================================================== RCS file: src/vsip/impl/signal-fir.hpp diff -N src/vsip/impl/signal-fir.hpp --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ src/vsip/impl/signal-fir.hpp 10 Oct 2005 01:17:11 -0000 @@ -0,0 +1,208 @@ +/* Copyright (c) 2005 by CodeSourcery, LLC. All rights reserved. */ + +/** @file vsip/impl/signal-fir.hpp + @author Nathan Myers + @date 2005-09-30 + @brief VSIPL++ Library: FIR filters +*/ + +#ifndef VSIP_IMPL_SIGNAL_FIR_HPP +#define VSIP_IMPL_SIGNAL_FIR_HPP + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace vsip +{ + +enum obj_state { state_no_save, state_save }; + +namespace impl +{ + +// Fir_aligned; block and view types with optimal layout for FIR operations. 
+ +template +struct Fir_aligned +{ + typedef typename + impl::Fast_block<1,T, + impl::Layout<1, + vsip::tuple<0,1,2>, + impl::Stride_unit_align<16>,impl::Cmplx_inter_fmt>, + Map_type> + block_type; +}; + +} // namespace impl + +/////////////////////////////////////////////////////////////////// +// +// class Fir<> +// + +template +< + typename T = VSIP_DEFAULT_VALUE_TYPE, + vsip::symmetry_type symV = vsip::nonsym, + vsip::obj_state useOldState = vsip::state_save, + unsigned NumberOfTimes = 0, + vsip::alg_hint_type algHint = vsip::alg_time +> +class Fir +{ +public: + static const vsip::symmetry_type symmetry = symV; + static const vsip::obj_state continuous_filter = useOldState; + + template + Fir( + vsip::const_Vector kernel, + vsip::length_type input_size, + vsip::length_type decimation = 1) + VSIP_THROW((std::bad_alloc)) + : input_size_(input_size), + order_(kernel.size() * (1 + (symV != vsip::nonsym)) - + (symV == vsip::sym_even_len_odd) - 1), + decimation_(decimation), + skip_(0), + state_saved_(0), + state_(this->order_, T(0)), + kernel_(this->order_ + 1) + { + assert(input_size > 0); + assert(this->order_ + 1 > 1); // counter unsigned wraparound + assert(decimation > 0); + assert(this->order_ + 1 > decimation); // M >= decimation + assert(input_size >= this->order_); // input_size >= M + + // must be after asserts because of division + this->output_size_ = (input_size + decimation - 1) / decimation; + + // mirror the kernel + unsigned const ksz = kernel.size(); + this->kernel_(vsip::Domain<1>(this->order_, -1, ksz)) = kernel; + // and maybe unmirror a copy, too + if (symV != vsip::nonsym) + this->kernel_(vsip::Domain<1>(ksz)) = kernel; + } + + vsip::length_type kernel_size() const VSIP_NOTHROW + { return this->order_ + 1; } + vsip::length_type filter_order() const VSIP_NOTHROW + { return this->order_ + 1; } + // vsip::symmetry_type symmetry() const VSIP_NOTHROW + // { return symV; } + vsip::length_type input_size() const VSIP_NOTHROW + { return 
this->input_size_; } + vsip::length_type output_size() const VSIP_NOTHROW + { return this->output_size_; } + vsip::obj_state continuous_filtering() const VSIP_NOTHROW + { return useOldState; } + vsip::length_type decimation() const VSIP_NOTHROW + { return this->decimation_; } + + template + vsip::length_type + operator()( + vsip::const_Vector data, + vsip::Vector out) + VSIP_NOTHROW + { + assert(data.size() == this->input_size_); + assert(out.size() == this->output_size_); + + typedef vsip::Domain<1> d_type; + typedef vsip::length_type len_type; + + const len_type dec = this->decimation_; + const len_type m = this->order_; + const len_type skip = this->skip_; + const len_type saved = this->state_saved_; + len_type oix = 0; + len_type i = 0; + for (; i < m - skip; ++oix, i += dec) + { + // Conceptually this comes second, but it's more convenient + // to put it here. So, read the second statement below first. + T sum = vsip::dot( + this->kernel_(d_type(m - skip - i, 1, i + skip + 1)), + data(d_type(i + skip + 1))); + + if (useOldState == vsip::state_save && i < saved) + { + sum += vsip::dot( + this->kernel_(d_type(saved - i)), + this->state_(d_type(i, 1, saved - i))); + } + + out.put(oix, sum); + } + + const len_type isz = this->input_size_; + len_type start = i - (m - skip); + for ( ; start + m < isz; ++oix, start += dec) + { + T sum = vsip::dot(this->kernel_, data(d_type(start, 1, m + 1))); + out.put(oix, sum); + } + + if (useOldState == state_save) + { + this->skip_ = start + m - isz; + const len_type new_save = isz - start; + this->state_saved_ = new_save; + this->state_(d_type(new_save)) = data(d_type(start, 1, new_save)); + } + return oix; + } + + void reset() VSIP_NOTHROW + { this->state_saved_ = this->skip_ = 0; } + + ~Fir() + { } + +public: + + float impl_performance(char* what) const + { + if (!strcmp(what, "mflops")) + { + // Compute rough estimate of flop-count. + unsigned cxmul = impl::Is_complex::value ? 
4 : 1; // *(4*,2+), +(2+) + return (this->timer_.count() * cxmul * 2 * // 1* 1+ + ((this->order + 1) * this->input_size_ / this->decimation_)) / + (1e6 * this->timer_.total()); + } + else if (!strcmp(what, "time")) + return this->timer_.total(); + return 0.f; + } + +private: + vsip::length_type input_size_; + vsip::length_type output_size_; + vsip::length_type order_; // M in the spec + vsip::length_type decimation_; + vsip::length_type skip_; // how much of next input to skip + vsip::length_type state_saved_; // number of elements saved + vsip::Vector::block_type> state_; + vsip::Vector::block_type> kernel_; + + impl::profile::Acc_timer timer_; +}; + +} // namespace vsip + +#endif // VSIP_IMPL_SIGNAL_FIR_HPP + Index: tests/fir.cpp =================================================================== RCS file: tests/fir.cpp diff -N tests/fir.cpp --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ tests/fir.cpp 10 Oct 2005 01:17:11 -0000 @@ -0,0 +1,201 @@ +/* Copyright (c) 2005 by CodeSourcery, LLC. All rights reserved. 
*/ + +/** @file tests/fft.cpp + @author Nathan Myers + @date 2005-10-03 + @brief VSIPL++ Library: Testcases for Fir<> +*/ + +/*********************************************************************** + Included Files +***********************************************************************/ + +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "test.hpp" +#include "output.hpp" + + +/*********************************************************************** + Definitions +***********************************************************************/ + +template +void +test_fir( + vsip::length_type D, + vsip::length_type M, + vsip::length_type N) +{ + const unsigned insize = 2 * M * N; + const unsigned outsize = ((2 * M * N) + D - 1) / D + 1; + vsip::Vector input(insize); + vsip::Vector output1(outsize); + vsip::Vector output2(2 * M * (N+D-1)/D); + vsip::Vector output3(2 * M * (N+D-1)/D); + + vsip::Vector convinput(insize+M, T(0)); // room for initial state + vsip::Vector convout((insize+M-1)/D + 1, T(0)); // per spec + vsip::Vector kernel(M); + + for (vsip::length_type i = 0; i < insize; ++i) + input.put(i, T(i+1)); + for (vsip::length_type i = 0; i < M; ++i) + kernel.put(i, T(2*i+1)); + + vsip::Convolution conv( + kernel, vsip::Domain<1>(convinput.size()), D); + + const vsip::length_type pad = (sym == vsip::nonsym) ? M/2 : + (sym == vsip::sym_even_len_even) ? 
M : M - 1; + convinput(vsip::Domain<1>(pad, 1, insize)) = input; + conv(convinput, convout); // emulate chained FIR + + vsip::Fir fir1a(kernel, N, D); + vsip::Fir fir1b(kernel, N, D); + vsip::Fir fir2(kernel, N, D); + + vsip::length_type got = 0; + for (vsip::length_type i = 0; i < 2 * M; ++i) // chained + { + got += fir1a( + input(vsip::Domain<1>(i * N, 1, N)), + output1(vsip::Domain<1>(got, 1, (N + D - 1) / D))); + } + + + vsip::length_type got1b = 0; + vsip::length_type got2 = 0; + for (vsip::length_type i = 0; i < 2 * M; ++i) // not + { + got1b += fir1b(input(vsip::Domain<1>(i * N, 1, N)), + output2(vsip::Domain<1>(got1b, 1, (N+D-1)/D))); + fir1b.reset(); + got2 += fir2(input(vsip::Domain<1>(i * N, 1, N)), + output3(vsip::Domain<1>(got2, 1, (N+D-1)/D))); + } + + vsip::Vector reference(convout(vsip::Domain<1>(got))); + vsip::Vector result(output1(vsip::Domain<1>(got))); + assert(outsize - got <= 1); + assert(vsip::alltrue(result == reference)); + assert(got1b == got2); + assert(vsip::alltrue( + output2(vsip::Domain<1>(got1b)) == output3(vsip::Domain<1>(got1b)))); +} + +int +main() +{ + vsip::vsipl init; + + + test_fir(1,2,3); + + test_fir(3,23,31); + + test_fir(1,3,5); + test_fir(1,3,9); + test_fir(1,4,8); + test_fir(1,23,31); + test_fir(1,32,256); + + test_fir(2,3,5); + test_fir(2,3,9); + test_fir(2,4,8); + test_fir(2,23,31); + test_fir(2,32,256); + + test_fir(2,3,5); + test_fir(2,3,9); + test_fir(2,4,8); + test_fir(2,23,31); + test_fir(2,32,1024); + + test_fir,vsip::nonsym>(2,3,5); + test_fir,vsip::nonsym>(2,3,9); + test_fir,vsip::nonsym>(2,4,8); + test_fir,vsip::nonsym>(2,23,31); + test_fir,vsip::nonsym>(2,32,256); + + test_fir,vsip::nonsym>(2,3,5); + test_fir,vsip::nonsym>(2,3,9); + test_fir,vsip::nonsym>(2,4,8); + test_fir,vsip::nonsym>(2,23,31); + test_fir,vsip::nonsym>(2,32,1024); + + test_fir(3,4,8); + test_fir(3,4,21); + test_fir(3,9,27); + test_fir(3,23,31); + test_fir(3,32,256); + + test_fir(4,5,13); + test_fir(4,7,31); + test_fir(4,8,32); + 
test_fir(4,23,31); + test_fir(4,32,256); + + test_fir(1,1,3); + test_fir(1,2,3); + test_fir(1,3,5); + test_fir(1,3,9); + test_fir(1,4,8); + test_fir(1,23,57); +#if 0 + // FIXME: this exposes a bug in vsip::Convolution. + test_fir(1,32,256); +#endif + + test_fir(2,2,3); + test_fir(2,3,5); + test_fir(2,3,9); + test_fir(2,4,8); + test_fir(2,23,57); +#if 0 + // FIXME: likewise + test_fir(2,32,256); +#endif + + test_fir(3,3,5); + test_fir(3,4,8); + test_fir(3,23,57); +#if 0 + test_fir(3,32,256); +#endif + + test_fir(1,2,3); + test_fir(1,3,5); + test_fir(1,3,9); + test_fir(1,4,9); + test_fir(1,23,57); +#if 0 + test_fir(1,32,256); +#endif + + test_fir(2,2,3); + test_fir(2,3,5); + test_fir(2,3,9); + test_fir(2,4,10); + test_fir(2,23,57); +#if 0 + test_fir(2,32,256); +#endif + + test_fir(3,3,5); + test_fir(3,4,9); + test_fir(3,23,55); +#if 0 + test_fir(3,32,256); +#endif + + return 0; +} From ncm at codesourcery.com Mon Oct 10 06:34:33 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Sun, 9 Oct 2005 23:34:33 -0700 Subject: [PATCH] more Fir<> tests Message-ID: <20051010063433.GA29454@codesourcery.com> I have also checked in the patch below. It adds Fir<> accessor tests, and a bit of instrumentation for benchmarking. Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.289 diff -u -p -r1.289 ChangeLog --- ChangeLog 10 Oct 2005 01:22:30 -0000 1.289 +++ ChangeLog 10 Oct 2005 06:31:44 -0000 @@ -1,5 +1,11 @@ 2005-10-09 Nathan Myers + * src/vsip/impl/signal-fir.hpp: support Fir<>::impl_performance() + command "count". + * tests/fir.cpp: add tests for accessors, default template arg. + +2005-10-09 Nathan Myers + Implement FIR filter, all modes. * src/vsip/impl/signal-fir.hpp, tests/fir.cpp: New. * src/vsip/signal.hpp: Include new impl/signal-fir.hpp. 
Index: src/vsip/impl/signal-fir.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fir.hpp,v retrieving revision 1.1 diff -u -p -r1.1 signal-fir.hpp --- src/vsip/impl/signal-fir.hpp 10 Oct 2005 01:22:30 -0000 1.1 +++ src/vsip/impl/signal-fir.hpp 10 Oct 2005 06:31:44 -0000 @@ -76,6 +76,7 @@ public: decimation_(decimation), skip_(0), state_saved_(0), + op_calls_(0), state_(this->order_, T(0)), kernel_(this->order_ + 1) { @@ -163,6 +164,7 @@ public: this->state_saved_ = new_save; this->state_(d_type(new_save)) = data(d_type(start, 1, new_save)); } + ++ this->op_calls_; return oix; } @@ -186,6 +188,8 @@ public: } else if (!strcmp(what, "time")) return this->timer_.total(); + else if (!strcmp(what, "count")) + return this->op_calls_; return 0.f; } @@ -196,6 +200,7 @@ private: vsip::length_type decimation_; vsip::length_type skip_; // how much of next input to skip vsip::length_type state_saved_; // number of elements saved + unsigned long op_calls_; vsip::Vector::block_type> state_; vsip::Vector::block_type> kernel_; Index: tests/fir.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fir.cpp,v retrieving revision 1.1 diff -u -p -r1.1 fir.cpp --- tests/fir.cpp 10 Oct 2005 01:22:30 -0000 1.1 +++ tests/fir.cpp 10 Oct 2005 06:31:44 -0000 @@ -59,10 +59,29 @@ test_fir( convinput(vsip::Domain<1>(pad, 1, insize)) = input; conv(convinput, convout); // emulate chained FIR + vsip::Fir<> dummy( + vsip::const_Vector<>(vsip::length_type(3),vsip::scalar_f(1)), N*10); + assert(dummy.decimation() == 1); vsip::Fir fir1a(kernel, N, D); vsip::Fir fir1b(kernel, N, D); vsip::Fir fir2(kernel, N, D); + assert(fir1a.symmetry == sym); + assert(fir2.symmetry == sym); + assert(fir1a.continuous_filter == vsip::state_save); + assert(fir2.continuous_filter == vsip::state_no_save); + + const vsip::length_type order = (sym == vsip::nonsym) ? 
M : + (sym == vsip::sym_even_len_even) ? 2 * M : (2 * M) - 1; + assert(fir1a.kernel_size() == order); + assert(fir1a.filter_order() == order); + // assert(fir1a.symmetry() + assert(fir1a.input_size() == N); + assert(fir1a.output_size() == (N+D-1)/D); + assert(fir1a.continuous_filtering() == fir1a.continuous_filter); + assert(fir2.continuous_filtering() == fir2.continuous_filter); + assert(fir1a.decimation() == D); + vsip::length_type got = 0; for (vsip::length_type i = 0; i < 2 * M; ++i) // chained {

From jules at codesourcery.com Mon Oct 10 14:07:03 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 10 Oct 2005 10:07:03 -0400
Subject: [vsipl++] [PATCH] Implement Fir<>, native C++ version
In-Reply-To: <20051010012234.GA29098@codesourcery.com>
References: <20051010012234.GA29098@codesourcery.com>
Message-ID: <434A7587.2060101@codesourcery.com>

Nathan,

Looks good. A couple of comments below.

-- Jules

Nathan (Jasper) Myers wrote:
> The following patch has been committed.
>
> It implements vsip::Fir<> using native C++ code, and a comprehensive
> test of all its modes.
>
> Note that a few bits of the test are commented out; it uses
> vsip::Convolution<> to generate the reference output, and that has
> a little bug I haven't got to tracking down yet.
>
> Nathan Myers
> ncm
> + order_(kernel.size() * (1 + (symV != vsip::nonsym)) -
> + (symV == vsip::sym_even_len_odd) - 1),

Clever. This is portable, right? (i.e. do comparisons portably
evaluate to 0 or 1?)

> + decimation_(decimation),
> + skip_(0),
> + state_saved_(0),
> + state_(this->order_, T(0)),
> + kernel_(this->order_ + 1)
> + {
> + assert(input_size > 0);
> + assert(this->order_ + 1 > 1); // counter unsigned wraparound

What's going on here? The comment is a bit hard to decipher at first.
Perhaps "Use modulo arithmetic to counter the effect of unsigned
wraparound", although that doesn't fit on a single line.
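Both questions in this review can be checked in isolation. In C++ a comparison yields a bool, and bool promotes to exactly 0 or 1 in arithmetic, so the order_ expression is portable; the real hazard is the unsigned arithmetic around an empty kernel. The sketch below copies the patch's order_ expression into a free function. The local enum and the names fir_order, patch_check and kernel_ok are hypothetical, chosen for illustration; kernel_ok follows the kernel.size() > 0 check suggested in this review.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for vsip::symmetry_type (hypothetical, for illustration).
enum symmetry_type { nonsym, sym_even_len_odd, sym_even_len_even };

// Same arithmetic as the patch's order_ initializer: each comparison
// yields a bool, which converts to exactly 0 or 1. Unsigned arithmetic
// wraps modulo 2^N when the kernel is empty.
std::size_t fir_order(symmetry_type sym, std::size_t ksz)
{
  return ksz * (1 + (sym != nonsym)) - (sym == sym_even_len_odd) - 1;
}

// The patch's check: rejects order 0 and the "-1" wraparound,
// but not the "-2" wraparound (odd symmetry, empty kernel).
bool patch_check(symmetry_type sym, std::size_t ksz)
{
  return fir_order(sym, ksz) + 1 > 1;
}

// Checking the kernel size first rules out every wraparound.
bool kernel_ok(symmetry_type sym, std::size_t ksz)
{
  return ksz > 0 && fir_order(sym, ksz) >= 1;
}
```

With odd symmetry and an empty kernel, fir_order() wraps to the unsigned equivalent of -2, so adding 1 still leaves a huge value and the patch's assertion passes; testing the kernel size before the order closes that gap.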
Does this catch symV == odd && kernel.size() == 0?

  symV     kernel.size()   order_
  nonsym   0               -1
  nonsym   1                0
  even     0               -1
  even     1                1
  odd      0               -2   <<<
  odd      1                0

How about

  assert(kernel.size(0) > 0 && this->order_ >= 1);

> + // Compute rough estimate of flop-count.
> + unsigned cxmul = impl::Is_complex::value ? 4 : 1; // *(4*,2+), +(2+)

A good approximation is that each filter tap is a multiply-add. For
complex this is 8 ops (a 6-op complex multiply plus a 2-op complex
add); for real it is 2 ops.

> + return (this->timer_.count() * cxmul * 2 * // 1* 1+
> + ((this->order + 1) * this->input_size_ / this->decimation_)) /
> + (1e6 * this->timer_.total());
> + }
> + else if (!strcmp(what, "time"))
> + return this->timer_.total();
> + return 0.f;
> + }
> +
> + assert(vsip::alltrue(result == reference));

Using '==' for floating-point values may cause the test to fail. For
example, if FIR and convolution perform the same operations in a
different order, they may have different rounding/accumulation errors
even though they produce effectively the same answer. You should try
using 'view_equal' (which uses almost_equal() underneath) or perhaps
'error_db' instead.

From stefan at codesourcery.com Mon Oct 10 15:25:02 2005
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 10 Oct 2005 11:25:02 -0400
Subject: make doxygen be usable from separate build directory
Message-ID: <434A87CE.5090205@codesourcery.com>

This tiny patch essentially turns the doxygen configuration files into
a configure template, replacing the source path with a variable. With
that, 'make doc/html/index.html' works even from an external build
directory.

Regards,
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: doc.patch
Type: text/x-patch
Size: 103136 bytes
Desc: not available
URL:

From stefan at codesourcery.com Mon Oct 10 15:59:06 2005
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 10 Oct 2005 11:59:06 -0400
Subject: [vsipl++] status report - 2005-10-10
Message-ID: <434A8FCA.7060306@codesourcery.com>

3rd October - 7th October
-------------------------

VSIPL++:

* Code review (notably the sarsim application) and documentation.
* Experimentation with doxygen for better manual generation.
  (synopsis at present has problems with some of the template
  declarations, so I can't even suggest using it in its current
  form :-( )

QMTest:

* Work on tutorial.

This week
---------

VSIPL++:

* More of the above.

QMTest:

* Likewise.

From jules at codesourcery.com Mon Oct 10 18:33:33 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 10 Oct 2005 14:33:33 -0400
Subject: [vsipl++] [PATCH] Implement Fir<>, native C++ version
In-Reply-To: <20051010012234.GA29098@codesourcery.com>
References: <20051010012234.GA29098@codesourcery.com>
Message-ID: <434AB3FD.7090207@codesourcery.com>

Nathan (Jasper) Myers wrote:
> The following patch has been committed.
>
> It implements vsip::Fir<> using native C++ code, and a comprehensive
> test of all its modes.
>
> Note that a few bits of the test are commented out; it uses
> vsip::Convolution<> to generate the reference output, and that has
> a little bug I haven't got to tracking down yet.
>

Nathan,

For your failing cases:

> + // FIXME: this exposes a bug in vsip::Convolution.
> + test_fir(1,32,256);
> + test_fir(2,32,256);
> + test_fir(3,32,256);
> + test_fir(1,32,256);
> + test_fir(2,32,256);
> + test_fir(3,32,256);

There are two causes for differences between the Fir and Convolution
results.

The first has to do with exceeding the dynamic range of float: these
kernels have 63 to 64 coefficients (due to symmetry) ranging from 1 to
63 with an average value of 32. This gives the filter a gain of
approximately 64 * 32 = 2048, or 2^11. Floating-point numbers have 24
bits of precision. As soon as the input values start to have
magnitudes in the 2^13 range, the filter output magnitude will be in
the 2^24 range. At this point, small perturbations in input values
(~ 2^1) will be outside the dynamic range of the floating-point value.
Two computations performing the same operations but in a different
order will likely exceed the dynamic range at different points,
resulting in different rounding and different answers.

Experimentally, for your first case above (even, D=1, M=32, N=256),
when running the non-IPP convolution the failures start to occur
around n = 8240, approximately 2^13. The output value of the filter is
1.681 * 10^7, just a bit over 2^24. By changing the precision to
double (which has 53 bits of precision), these miscompares go away.
However, if you made your input longer, or if you scaled the values
appropriately, you could recreate the same type of miscompare.

The second cause has to do with algorithm choice. When using IPP to
perform convolution, I see a different type of miscompare for smaller
values that should be within the dynamic range of float. I suspect
this may be due to IPP using a different algorithm underneath (such as
FFT-based convolution) where the size of the FFT spreads noise over a
wider range than the size of the convolution kernel.

Either way, you shouldn't use '==' to compare floating-point values in
tests. In most element-wise cases you should use 'equal' (which uses
almost_equal for floats and doubles, which checks relative and
absolute errors against some semi-arbitrary bounds). For comparing
results of non-elementwise operations, it's a bit trickier. In this
case, I would recommend using the 'error_db' comparison to compare the
results, perhaps with a maximum value derived from the entire view to
account for the noise from the FFT-based convolution.
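The dynamic-range argument above can be reproduced in a few lines. The sketch below is plain C++, not VSIPL++ test code: sum_in_order, approx_equal and error_db_sketch are hypothetical helpers, and their tolerances and dB definition are assumptions, not those of the test suite's equal/almost_equal/error_db. It adds the same three floats in two orders around 2^24 and then shows a tolerance-based comparison and a peak-relative error in dB accepting the difference. It assumes IEEE-754 single precision with round-to-nearest-even and no excess intermediate precision (for example SSE arithmetic on x86-64).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Accumulate left to right in single precision.
float sum_in_order(const std::vector<float>& v)
{
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Hypothetical tolerance check: passes if the difference is small in
// absolute terms or relative to the larger magnitude.
bool approx_equal(double a, double b, double rel_tol = 1e-5, double abs_tol = 1e-9)
{
  const double diff = std::fabs(a - b);
  return diff <= abs_tol || diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}

// Hypothetical error measure: worst absolute difference relative to the
// peak magnitude of the reference, in dB (more negative is better).
double error_db_sketch(const std::vector<double>& ref, const std::vector<double>& val)
{
  assert(ref.size() == val.size());
  double peak = 0.0, worst = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i)
  {
    peak  = std::max(peak,  std::fabs(ref[i]));
    worst = std::max(worst, std::fabs(ref[i] - val[i]));
  }
  if (worst == 0.0) return -300.0;  // identical: report a floor value
  if (peak == 0.0)  return 300.0;   // nonzero error against an all-zero reference
  return 20.0 * std::log10(worst / peak);
}
```

Summing {2^24, 1, 1} left to right gives 16777216 (each +1 rounds away), while {1, 1, 2^24} gives 16777218. The two results differ by 2, yet a relative tolerance of 1e-5, or an error of about -138 dB relative to the peak, treats them as the same answer, which is the kind of comparison the review recommends over '=='.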
-- Jules

From jules at codesourcery.com Tue Oct 11 20:00:06 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 11 Oct 2005 16:00:06 -0400
Subject: [patch] General dispatch, use BLAS for dot- and matrix-matrix products
Message-ID: <434C19C6.9050802@codesourcery.com>

This patch adds a general dispatch mechanism based on
Serial_expr_dispatch. Like serial dispatch, it dispatches an operation
to an appropriate implementation using compile-time and run-time
checks for applicability. This dispatch is used to implement
dot-product and matrix-matrix product.

For dot-product, three implementations are provided: a generic
implementation for all types and blocks, a BLAS implementation for
BLAS types and direct data access blocks, and a BLAS implementation
that specializes handling of a conjugated vector (for cvjdot). By
using the block type to determine if conjugation is necessary, the
expressions 'cvjdot(x, y)' and 'dot(x, conj(y))' are evaluated
identically.

For matrix-matrix product, two implementations are provided: a generic
implementation for all types and blocks, and a BLAS implementation for
BLAS types and direct access blocks.

The attached graphs show performance measured locally. For the
mm-product graph, three lines are shown: green is generic VSIPL++
performance (this is also the current library performance), red is
VSIPL++ performance using BLAS, and blue is VSIPL++ performance using
BLAS, but without the overhead of copying the result matrix.

Why define General_dispatch and not use Serial_expr_dispatch for
these? Serial_expr_dispatch is designed to take an expression for the
RHS, as opposed to the list of operand block types that
General_dispatch takes. Wrapping multiple arguments as an expression
is possible, but would likely be cumbersome.

-- Jules
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prod.diff
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: prod.png Type: image/png Size: 5382 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dot.png Type: image/png Size: 5643 bytes Desc: not available URL: From mark at codesourcery.com Tue Oct 11 20:16:32 2005 From: mark at codesourcery.com (Mark Mitchell) Date: Tue, 11 Oct 2005 13:16:32 -0700 Subject: [vsipl++] [patch] General dispatch, use BLAS for dot- and matrix-matrix products In-Reply-To: <434C19C6.9050802@codesourcery.com> References: <434C19C6.9050802@codesourcery.com> Message-ID: <434C1DA0.8060509@codesourcery.com> Jules Bergmann wrote: > For dot-product, three implementations are provided: a generic > implementation for all types and blocks, a BLAS implementation for BLAS > types and direct data access blocks, and a BLAS implementation that > specializes handling of a conjugated vector (for cvjdot). By using the > block type to determine if conjugation is necessary, the expressions > 'cvjdot(x, y)' and 'dot(x, conj(y))' are evaluated identically. That's really, really cool. The performance numbers look great, too. Does VSIPL have cvjdot? If not, we can probably show better performance than any VSIPL implementation on this code. 
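The general dispatch described above, where each candidate implementation has a compile-time applicability check and a run-time one and the first candidate passing both is used, can be sketched in standalone C++17. The names (View, Blas_like_dot, Generic_dot, Dispatch) are hypothetical and this is not the VSIPL++ General_dispatch code; the unit-stride loop merely stands in for a BLAS call such as cblas_ddot.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Records which candidate actually ran (for illustration and testing).
enum class Impl { blas_like, generic };

// Toy strided view; unit stride stands in for "direct data access".
template <typename T>
struct View
{
  const T* data;
  std::size_t size;
  std::ptrdiff_t stride;
  T get(std::size_t i) const { return data[static_cast<std::ptrdiff_t>(i) * stride]; }
};

// Candidate 1: BLAS value types only (compile-time check) and unit-stride
// data only (run-time check). The loop is a placeholder for a BLAS call.
template <typename T>
struct Blas_like_dot
{
  static constexpr bool ct_valid =
    std::is_same<T, float>::value || std::is_same<T, double>::value;
  static bool rt_valid(const View<T>& a, const View<T>& b)
  { return a.stride == 1 && b.stride == 1; }
  static T exec(const View<T>& a, const View<T>& b, Impl& used)
  {
    used = Impl::blas_like;
    T sum = T();
    for (std::size_t i = 0; i < a.size; ++i)
      sum += a.data[i] * b.data[i];
    return sum;
  }
};

// Candidate 2: generic fallback, valid for every type and layout.
template <typename T>
struct Generic_dot
{
  static constexpr bool ct_valid = true;
  static bool rt_valid(const View<T>&, const View<T>&) { return true; }
  static T exec(const View<T>& a, const View<T>& b, Impl& used)
  {
    used = Impl::generic;
    T sum = T();
    for (std::size_t i = 0; i < a.size; ++i)
      sum += a.get(i) * b.get(i);
    return sum;
  }
};

// Try candidates in order; the first one passing both checks is used.
template <typename T, template <typename> class... Impls>
struct Dispatch;

template <typename T>
struct Dispatch<T>
{
  static T exec(const View<T>&, const View<T>&, Impl&)
  {
    assert(false && "no applicable implementation");
    return T();
  }
};

template <typename T, template <typename> class First, template <typename> class... Rest>
struct Dispatch<T, First, Rest...>
{
  static T exec(const View<T>& a, const View<T>& b, Impl& used)
  {
    // if constexpr: a candidate failing ct_valid is never instantiated.
    if constexpr (First<T>::ct_valid)
    {
      if (First<T>::rt_valid(a, b))
        return First<T>::exec(a, b, used);
    }
    return Dispatch<T, Rest...>::exec(a, b, used);
  }
};

template <typename T>
T dot(const View<T>& a, const View<T>& b, Impl& used)
{
  assert(a.size == b.size);
  return Dispatch<T, Blas_like_dot, Generic_dot>::exec(a, b, used);
}
```

Unit-stride double views take the BLAS-like path, while strided views (run-time check fails) and int views (compile-time check fails) fall through to the generic loop. Passing operands as a candidate list, rather than wrapping them as one expression, is the design choice the message gives for not reusing Serial_expr_dispatch.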
-- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From jules at codesourcery.com (Jules Bergmann) Date: Tue, 11 Oct 2005 16:31:55 -0400 Subject: [vsipl++] [patch] General dispatch, use BLAS for dot- and matrix-matrix products In-Reply-To: <434C1DA0.8060509@codesourcery.com> References: <434C19C6.9050802@codesourcery.com> <434C1DA0.8060509@codesourcery.com> Message-ID: <434C213B.9080909@codesourcery.com> Mark Mitchell wrote: > Jules Bergmann wrote: > > >>For dot-product, three implementations are provided: a generic >>implementation for all types and blocks, a BLAS implementation for BLAS >>types and direct data access blocks, and a BLAS implementation that >>specializes handling of a conjugated vector (for cvjdot). By using the >>block type to determine if conjugation is necessary, the expressions >>'cvjdot(x, y)' and 'dot(x, conj(y))' are evaluated identically. > > > That's really, really cool. The performance numbers look great, too. > Does VSIPL have cvjdot? If not, we can probably show better performance > than any VSIPL implementation on this code. Unfortunately :), VSIPL does have cvjdot. It is common enough to justify the special case. However, there is definitely a productivity/performance advantage to recognizing optimizable sequences without requiring the programmer to call a special library function. The programmer doesn't have to remember "oh yeah, I should use cvjdot here". It also allows a wider scope to optimize over (for example, VSIPL has prodt, but no tprod). Some other potential examples of this:

    General              <=> Special-case library call
    prod(A, trans(B))    <=> prodt(A, B)
    prod(trans(A), B)    <=> none
    x * y + z            <=> ma(x, y, z)

Our current dispatch should handle prod(A, trans(B)) and prodt(A, B) identically.
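Recognizing a general spelling such as prod(A, trans(B)) needs no run-time pattern matching if trans() returns a distinct type: ordinary overload resolution then routes the call to the special-case kernel. A toy sketch in standalone C++; Mat, Trans, prod, prodt, and trans here are hypothetical stand-ins, not the VSIPL++ views or functions, and the loops are deliberately naive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy row-major matrix (not the VSIPL++ Matrix view).
struct Mat
{
  std::size_t rows, cols;
  std::vector<double> d;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), d(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return d[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return d[i * cols + j]; }
};

// trans(B) copies nothing; its *type* records "B, transposed".
struct Trans
{
  const Mat& m;
};
inline Trans trans(const Mat& m) { return Trans{m}; }

// Special-case kernel C = A * B^T: reads rows of B directly.
inline Mat prodt(const Mat& a, const Mat& b)
{
  assert(a.cols == b.cols);
  Mat c(a.rows, b.rows);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.rows; ++j)
    {
      double s = 0.0;
      for (std::size_t k = 0; k < a.cols; ++k)
        s += a(i, k) * b(j, k);
      c(i, j) = s;
    }
  return c;
}

// General kernel C = A * B.
inline Mat prod(const Mat& a, const Mat& b)
{
  assert(a.cols == b.rows);
  Mat c(a.rows, b.cols);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j)
    {
      double s = 0.0;
      for (std::size_t k = 0; k < a.cols; ++k)
        s += a(i, k) * b(k, j);
      c(i, j) = s;
    }
  return c;
}

// Overload resolution selects this for prod(A, trans(B)), so the general
// spelling runs the same kernel as the special-case call.
inline Mat prod(const Mat& a, Trans bt) { return prodt(a, bt.m); }
```

The same idea covers dot(x, conj(y)): if conj() returns its own wrapper type, an overload can send it to the conjugated-dot kernel, which is how the two cvjdot spellings end up evaluated identically.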
-- Jules From jules at codesourcery.com Wed Oct 12 13:39:55 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 12 Oct 2005 09:39:55 -0400 Subject: [patch] configure.ac support for ATLAS Message-ID: <434D122B.50609@codesourcery.com> This patch adds configure support for using ATLAS. It includes a 'trypkg == atlas' where it tests for the atlas libraries (-llapack -lcblas -lf77blas -latlas -lg2c) when lapack is enabled. It adds two options to specify the ATLAS prefix or library directory (--with-atlas-prefix and --with-atlas-libdir). On a system where ATLAS is installed into a particular directory (such as the GTRI cluster) you would use --with-atlas-prefix # config for GTRI configure --with-atlas-prefix=/usr/local/vsipl++/atlas-3.6.0/Linux_P4SSE2 On a system where ATLAS is installed directly into a subdirectory of /usr/lib (such as debian, which puts ATLAS in /usr/lib/atlas), you would use --with-atlas-libdir: # config for my local machine configure --with-atlas-libdir=/usr/lib/atlas Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cfg-atlas.diff URL: From jules at codesourcery.com Wed Oct 12 14:14:14 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 12 Oct 2005 10:14:14 -0400 Subject: [vsipl++] make doxygen be usable from separate build directory In-Reply-To: <434A87CE.5090205@codesourcery.com> References: <434A87CE.5090205@codesourcery.com> Message-ID: <434D1A36.9090503@codesourcery.com> Stefan, looks good, please check this in. -- Jules Stefan Seefeld wrote: > This tiny patch essentially makes the doxygen configuration > files a configure template, replacing the source path by > a variable. Using that 'make doc/html/index.html' will > work, even from an external build directory. 
> > Regards, > Stefan From ncm at codesourcery.com Thu Oct 13 10:44:41 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Thu, 13 Oct 2005 03:44:41 -0700 Subject: [PATCH] Use IPP for Fir<> Message-ID: <20051013104441.GA5326@codesourcery.com> I have checked in the patch below. It makes vsip::Fir<> use IPP's FIR support where possible. In practice, that means whenever block size and decimation are not relatively prime. (IPP produces bad output when they are. The IPP API seems to make it impossible, so it amounts to an IPP documentation bug.) Fir<> uses the native C++ implementation for such cases. They are probably rare in real programs. The spec says the copy constructor Fir(Fir const&) is supposed to be VSIP_NOTHROW, but it seems to me that to implement it safely, it needs to do allocation. I declared it VSIP_THROW((std::bad_alloc)). The no-macro method used here to adapt to IPP's version of overloading is similar to that in fft-core.hpp, and seems practical for general use. Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.292 retrieving revision 1.293 diff -u -p -r1.292 -r1.293 --- ChangeLog 12 Oct 2005 13:40:17 -0000 1.292 +++ ChangeLog 13 Oct 2005 10:23:34 -0000 1.293 @@ -1,3 +1,8 @@ +2005-10-13 Nathan Myers + + * src/vsip/impl/signal-fir.hpp: use IPP FIR support where available. + * tests/fir.cpp: forgive FFT noise on big samples. 
+ 2005-10-12 Jules Bergmann * configure.ac (--with-atlas-prefix, --with-atlas-libdir): New Index: src/vsip/impl/signal-fir.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fir.hpp,v retrieving revision 1.2 retrieving revision 1.3 diff -u -p -r1.2 -r1.3 --- src/vsip/impl/signal-fir.hpp 10 Oct 2005 06:33:40 -0000 1.2 +++ src/vsip/impl/signal-fir.hpp 13 Oct 2005 10:23:34 -0000 1.3 @@ -19,6 +19,11 @@ #include #include +#if VSIP_IMPL_HAVE_IPP +#include +#include +#endif + #include namespace vsip @@ -43,6 +48,52 @@ struct Fir_aligned block_type; }; +#if VSIP_IMPL_HAVE_IPP + +template +< + typename T, typename IppT, + IppStatus (&ippFirF)(IppT const*,IppT*,int,IppT const*,int,IppT*,int*), + IppStatus (&ippFirDecF)( + IppT const*,IppT*,int,IppT const*,int,int,int,int,int,IppT*) +> +struct Ipp_fir_driver_base +{ + static void + run_fir( + T const* xin, T* xout, vsip::length_type outsize, + T const* xkernel, vsip::length_type ksize, + T* xstate, vsip::length_type* xstate_ix, vsip::length_type dec) + { + IppT const* const in = reinterpret_cast(xin); + IppT* const out = reinterpret_cast(xout); + IppT const* const kernel = reinterpret_cast(xkernel); + IppT* const state = reinterpret_cast(xstate); + int state_ix = *xstate_ix; + IppStatus stat = (dec == 1) ? 
+ ippFirF(in, out, outsize, kernel, ksize, state, &state_ix) : + ippFirDecF(in, out, outsize, kernel, ksize, 1, 0, dec, 0, state); + assert(stat == ippStsNoErr); + *xstate_ix = state_ix; + } +}; + +template struct Ipp_fir_driver; + +template < > struct Ipp_fir_driver : Ipp_fir_driver_base< + float,Ipp32f,ippsFIR_Direct_32f,ippsFIRMR_Direct_32f> { }; + +template<> struct Ipp_fir_driver : Ipp_fir_driver_base< + double,Ipp64f,ippsFIR_Direct_64f,ippsFIRMR_Direct_64f> {}; + +template<> struct Ipp_fir_driver > : Ipp_fir_driver_base< + std::complex,Ipp32fc,ippsFIR_Direct_32fc,ippsFIRMR_Direct_32fc> {}; + +template<> struct Ipp_fir_driver > : Ipp_fir_driver_base< + std::complex,Ipp64fc,ippsFIR_Direct_64fc,ippsFIRMR_Direct_64fc> {}; + +#endif // VSIP_IMPL_HAVE_IPP + } // namespace impl /////////////////////////////////////////////////////////////////// @@ -63,22 +114,27 @@ class Fir public: static const vsip::symmetry_type symmetry = symV; static const vsip::obj_state continuous_filter = useOldState; - + typedef typename impl::Fir_aligned::block_type block_type; + template Fir( vsip::const_Vector kernel, vsip::length_type input_size, vsip::length_type decimation = 1) VSIP_THROW((std::bad_alloc)) - : input_size_(input_size), - order_(kernel.size() * (1 + (symV != vsip::nonsym)) - - (symV == vsip::sym_even_len_odd) - 1), - decimation_(decimation), - skip_(0), - state_saved_(0), - op_calls_(0), - state_(this->order_, T(0)), - kernel_(this->order_ + 1) + : input_size_(input_size) + , order_(kernel.size() * (1 + (symV != vsip::nonsym)) - + (symV == vsip::sym_even_len_odd) - 1) + , decimation_(decimation) + , skip_(0) + , op_calls_(0) + , kernel_(this->order_ + 1) + , state_(2 * (this->order_ + 1), T(0)) // IPP wants 2x. 
+ , state_saved_(0) +#if VSIP_IMPL_HAVE_IPP + , temp_in_(this->input_size_) + , temp_out_(this->input_size_) +#endif { assert(input_size > 0); assert(this->order_ + 1 > 1); // counter unsigned wraparound @@ -89,14 +145,58 @@ public: // must be after asserts because of division this->output_size_ = (input_size + decimation - 1) / decimation; - // mirror the kernel - unsigned const ksz = kernel.size(); - this->kernel_(vsip::Domain<1>(this->order_, -1, ksz)) = kernel; - // and maybe unmirror a copy, too - if (symV != vsip::nonsym) - this->kernel_(vsip::Domain<1>(ksz)) = kernel; +#if VSIP_IMPL_HAVE_IPP + // use IPP only if decimation is a factor of input size. + if (this->output_size_ * decimation == this->input_size_) + { + // IPP doesn't want it reversed. + this->kernel_(vsip::Domain<1>(kernel.size())) = kernel; + if (symV != vsip::nonsym) + this->kernel_(vsip::Domain<1>( + this->kernel_.size() - 1, -1, kernel.size())) = kernel; + } + else +#endif + { + // mirror the kernel + unsigned const ksz = kernel.size(); + this->kernel_(vsip::Domain<1>(this->order_, -1, ksz)) = kernel; + // and maybe unmirror a copy, too + if (symV != vsip::nonsym) + this->kernel_(vsip::Domain<1>(ksz)) = kernel; + } + } + + // FIXME: spec says this should be nothrow, but it has to allocate + Fir(Fir const& fir) VSIP_THROW((std::bad_alloc)) + : input_size_(fir.input_size_) + , order_(fir.order_) + , decimation_(fir.decimation_) + , skip_(fir.skip_) + , op_calls_(0) + , kernel_(fir.kernel_) + , state_(fir.state_(vsip::Domain<1>(fir.state_.size()))) // actual copy + , state_saved_(fir.state_saved_) +#if VSIP_IMPL_HAVE_IPP + , temp_in_(this->input_size_) // allocate + , temp_out_(this->input_size_) // allocate +#endif + { } + + Fir& operator=(Fir const& fir) VSIP_NOTHROW + { + assert(this->input_size_ == fir.input_size_); + assert(this->order_ == fir.order_); + assert(this->decimation_ = fir.decimation_); + this->skip_ = fir.skip_; + this->op_calls_ = 0; + this->kernel_ = fir.kernel_; + 
this->state_ = fir.state_; + this->state_saved_ = fir.state_saved_; } + ~Fir() VSIP_NOTHROW {} + vsip::length_type kernel_size() const VSIP_NOTHROW { return this->order_ + 1; } vsip::length_type filter_order() const VSIP_NOTHROW @@ -119,6 +219,7 @@ public: vsip::Vector out) VSIP_NOTHROW { + ++ this->op_calls_; assert(data.size() == this->input_size_); assert(out.size() == this->output_size_); @@ -131,52 +232,78 @@ public: const len_type saved = this->state_saved_; len_type oix = 0; len_type i = 0; - for (; i < m - skip; ++oix, i += dec) - { - // Conceptually this comes second, but it's more convenient - // to put it here. So, read the second statement below first. - T sum = vsip::dot( - this->kernel_(d_type(m - skip - i, 1, i + skip + 1)), - data(d_type(i + skip + 1))); - - if (useOldState == vsip::state_save && i < saved) - { - sum += vsip::dot( - this->kernel_(d_type(saved - i)), - this->state_(d_type(i, 1, saved - i))); - } - out.put(oix, sum); - } - - const len_type isz = this->input_size_; - len_type start = i - (m - skip); - for ( ; start + m < isz; ++oix, start += dec) +#if VSIP_IMPL_HAVE_IPP + + // use IPP only if decimation is a factor of input size. 
+ if (this->input_size_ == this->output_size_ * dec) { - T sum = vsip::dot(this->kernel_, data(d_type(start, 1, m + 1))); - out.put(oix, sum); + typedef impl::Layout<1,vsip::tuple<0,1,2>, + impl::Stride_unit,impl::Cmplx_inter_fmt> layout_type; + impl::Ext_data raw_in( + data.block(), impl::SYNC_IN, + impl::Ext_data(this->temp_in_.block()).data()); + impl::Ext_data raw_out( + out.block(), impl::SYNC_OUT, + impl::Ext_data(this->temp_out_.block()).data()); + impl::Ext_data raw_kernel(this->kernel_.block()); + impl::Ext_data raw_state(this->state_.block()); + oix = (this->input_size_ - skip + dec - 1) / dec; + + impl::Ipp_fir_driver::run_fir(raw_in.data(), raw_out.data(), oix, + raw_kernel.data(), m + 1, raw_state.data(), &this->state_saved_, dec); + + if (useOldState != state_save) + this->reset(); } + else + +#endif - if (useOldState == state_save) { - this->skip_ = start + m - isz; - const len_type new_save = isz - start; - this->state_saved_ = new_save; - this->state_(d_type(new_save)) = data(d_type(start, 1, new_save)); + for (; i < m - skip; ++oix, i += dec) + { + // Conceptually this comes second, but it's more convenient + // to put it here. So, read the second statement below first. 
+ T sum = vsip::dot( + this->kernel_(d_type(m - skip - i, 1, i + skip + 1)), + data(d_type(i + skip + 1))); + + if (useOldState == vsip::state_save && i < saved) + { + sum += vsip::dot( + this->kernel_(d_type(saved - i)), + this->state_(d_type(i, 1, saved - i))); + } + + out.put(oix, sum); + } + + const len_type isz = this->input_size_; + len_type start = i - (m - skip); + for ( ; start + m < isz; ++oix, start += dec) + { + T sum = vsip::dot(this->kernel_, data(d_type(start, 1, m + 1))); + out.put(oix, sum); + } + + if (useOldState == state_save) + { + this->skip_ = start + m - isz; + const len_type new_save = isz - start; + this->state_saved_ = new_save; + this->state_(d_type(new_save)) = data(d_type(start, 1, new_save)); + } } - ++ this->op_calls_; return oix; } void reset() VSIP_NOTHROW { this->state_saved_ = this->skip_ = 0; } - ~Fir() - { } - public: - float impl_performance(char* what) const + float impl_performance(char* what) const VSIP_NOTHROW { if (!strcmp(what, "mflops")) { @@ -199,11 +326,14 @@ private: vsip::length_type order_; // M in the spec vsip::length_type decimation_; vsip::length_type skip_; // how much of next input to skip - vsip::length_type state_saved_; // number of elements saved unsigned long op_calls_; - vsip::Vector::block_type> state_; vsip::Vector::block_type> kernel_; - + vsip::Vector::block_type> state_; + vsip::length_type state_saved_; // number of elements saved +#if VSIP_IMPL_HAVE_IPP + vsip::Vector::block_type> temp_in_; + vsip::Vector::block_type> temp_out_; +#endif impl::profile::Acc_timer timer_; }; Index: tests/fir.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fir.cpp,v retrieving revision 1.2 retrieving revision 1.3 diff -u -p -r1.2 -r1.3 --- tests/fir.cpp 10 Oct 2005 06:33:40 -0000 1.2 +++ tests/fir.cpp 13 Oct 2005 10:23:34 -0000 1.3 @@ -28,6 +28,39 @@ Definitions ***********************************************************************/ + +template +double 
+error_db( + vsip::const_Vector v1, + vsip::const_Vector v2) +{ + double refmax = 0.0; + double maxsum = -250; + double sum; + + vsip::Index<1> idx; + + refmax = vsip::maxval(vsip::magsq(v1), idx); + + for (vsip::index_type i=0; i maxsum) + maxsum = sum; + } + + return maxsum; +} + + + template void test_fir( @@ -70,7 +103,7 @@ test_fir( assert(fir2.symmetry == sym); assert(fir1a.continuous_filter == vsip::state_save); assert(fir2.continuous_filter == vsip::state_no_save); - + const vsip::length_type order = (sym == vsip::nonsym) ? M : (sym == vsip::sym_even_len_even) ? 2 * M : (2 * M) - 1; assert(fir1a.kernel_size() == order); @@ -90,6 +123,8 @@ test_fir( output1(vsip::Domain<1>(got, 1, (N + D - 1) / D))); } + // vsip::Vector o1(output1.size(), T(0)); + // o1 = convout(vsip::Domain<1>(output1.size())) - output1; vsip::length_type got1b = 0; vsip::length_type got2 = 0; @@ -104,11 +139,26 @@ test_fir( vsip::Vector reference(convout(vsip::Domain<1>(got))); vsip::Vector result(output1(vsip::Domain<1>(got))); + assert(outsize - got <= 1); - assert(vsip::alltrue(result == reference)); + if (got > 256) + { + double error = error_db(result, reference); + assert(error < -100); + } + else + assert(view_equal(result, reference)); + assert(got1b == got2); - assert(vsip::alltrue( - output2(vsip::Domain<1>(got1b)) == output3(vsip::Domain<1>(got1b)))); + if (got > 256) + { + double error = error_db(output2(vsip::Domain<1>(got1b)), + output3(vsip::Domain<1>(got1b))); + assert(error < -100); + } + else + assert(view_equal(output2(vsip::Domain<1>(got1b)), + output3(vsip::Domain<1>(got1b)))); } int @@ -116,22 +166,18 @@ main() { vsip::vsipl init; - test_fir(1,2,3); - - test_fir(3,23,31); - test_fir(1,3,5); test_fir(1,3,9); test_fir(1,4,8); test_fir(1,23,31); - test_fir(1,32,256); + test_fir(1,32,1024); test_fir(2,3,5); test_fir(2,3,9); test_fir(2,4,8); test_fir(2,23,31); - test_fir(2,32,256); + test_fir(2,32,1024); test_fir(2,3,5); test_fir(2,3,9); @@ -143,7 +189,7 @@ main() 
test_fir,vsip::nonsym>(2,3,9); test_fir,vsip::nonsym>(2,4,8); test_fir,vsip::nonsym>(2,23,31); - test_fir,vsip::nonsym>(2,32,256); + test_fir,vsip::nonsym>(2,32,1024); test_fir,vsip::nonsym>(2,3,5); test_fir,vsip::nonsym>(2,3,9); @@ -155,13 +201,13 @@ main() test_fir(3,4,21); test_fir(3,9,27); test_fir(3,23,31); - test_fir(3,32,256); + test_fir(3,32,1024); test_fir(4,5,13); test_fir(4,7,31); test_fir(4,8,32); test_fir(4,23,31); - test_fir(4,32,256); + test_fir(4,32,1024); test_fir(1,1,3); test_fir(1,2,3); @@ -169,52 +215,38 @@ main() test_fir(1,3,9); test_fir(1,4,8); test_fir(1,23,57); -#if 0 - // FIXME: this exposes a bug in vsip::Convolution. - test_fir(1,32,256); -#endif + test_fir(1,32,1024); test_fir(2,2,3); test_fir(2,3,5); test_fir(2,3,9); test_fir(2,4,8); test_fir(2,23,57); -#if 0 - // FIXME: likewise - test_fir(2,32,256); -#endif + test_fir(2,32,1024); test_fir(3,3,5); test_fir(3,4,8); test_fir(3,23,57); -#if 0 - test_fir(3,32,256); -#endif + test_fir(3,32,1024); test_fir(1,2,3); test_fir(1,3,5); test_fir(1,3,9); test_fir(1,4,9); test_fir(1,23,57); -#if 0 - test_fir(1,32,256); -#endif + test_fir(1,32,1024); test_fir(2,2,3); test_fir(2,3,5); test_fir(2,3,9); test_fir(2,4,10); test_fir(2,23,57); -#if 0 - test_fir(2,32,256); -#endif + test_fir(2,32,1024); test_fir(3,3,5); test_fir(3,4,9); test_fir(3,23,55); -#if 0 - test_fir(3,32,256); -#endif + test_fir(3,32,1024); return 0; } From don at codesourcery.com Thu Oct 13 18:13:28 2005 From: don at codesourcery.com (Don McCoy) Date: Thu, 13 Oct 2005 12:13:28 -0600 Subject: [patch] SAL library integration Message-ID: <434EA3C8.6000508@codesourcery.com> Please see attached. Testing is currently being done with the C-SAL library rather than a cross-compiled version for use on actual Mercury hardware. C-SAL comes with a pre-built 32-bit library only. I rebuilt it from source to use on a 64-bit machine (see /home/don/mercury/csal). 
Regards, -- Don McCoy CodeSourcery, LLC -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sal.changes URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sal.diff Type: text/x-patch Size: 40709 bytes Desc: not available URL: From jules at codesourcery.com Thu Oct 13 19:45:50 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 13 Oct 2005 15:45:50 -0400 Subject: [vsipl++] [PATCH] Use IPP for Fir<> In-Reply-To: <20051013104441.GA5326@codesourcery.com> References: <20051013104441.GA5326@codesourcery.com> Message-ID: <434EB96E.30607@codesourcery.com> Nathan (Jasper) Myers wrote: > I have checked in the patch below. > > It makes vsip::Fir<> use IPP's FIR support where possible. In practice, > that means whenever block size and decimation are not relatively prime. > (IPP produces bad output when they are. The IPP API seems to make it > impossible, so it amounts to an IPP documentation bug.) Fir<> uses > the native C++ implementation for such cases. They are probably rare > in real programs. What happens when using a type not supported by IPP (such as long double)? Does the generic code get used? Also, what happens when instantiating a FIR for a type we don't support (such as int)? > > The spec says the copy constructor Fir(Fir const&) is supposed to > be VSIP_NOTHROW, but it seems to me that to implement it safely, it > needs to do allocation. I declared it VSIP_THROW((std::bad_alloc)). Please file an issue for this in the tracker. -- Jules From jules at codesourcery.com Thu Oct 13 19:58:37 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 13 Oct 2005 15:58:37 -0400 Subject: [vsipl++] [patch] SAL library integration In-Reply-To: <434EA3C8.6000508@codesourcery.com> References: <434EA3C8.6000508@codesourcery.com> Message-ID: <434EBC6D.1080202@codesourcery.com> Don, This looks good, please check it in. -- Jules Don McCoy wrote: > Please see attached.
Testing is currently being done with the C-SAL > library rather than a cross-compiled version for use on actual Mercury > hardware. C-SAL comes with a pre-built 32-bit library only. I rebuilt > it from source to use on a 64-bit machine (see /home/don/mercury/csal). > > Regards, > From ncm at codesourcery.com Thu Oct 13 23:18:07 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Thu, 13 Oct 2005 16:18:07 -0700 Subject: [vsipl++] [PATCH] Use IPP for Fir<> In-Reply-To: <434EB96E.30607@codesourcery.com> References: <20051013104441.GA5326@codesourcery.com> <434EB96E.30607@codesourcery.com> Message-ID: <20051013231807.GC14408@codesourcery.com> On Thu, Oct 13, 2005 at 03:45:50PM -0400, Jules Bergmann wrote: > Nathan (Jasper) Myers wrote: > >I have checked in the patch below. > > > >It makes vsip::Fir<> use IPP's FIR support where possible. In practice, > >that means whenever block size and decimation are not relatively prime. > >(IPP produces bad output when they are. The IPP API seems to make it > >impossible, so it amounts to an IPP documentation bug.) Fir<> uses > >the native C++ implementation for such cases. They are probably rare > >in real programs. > > What happens when using a type not supported by IPP (such as long double)? > Does the generic code get used? Also, what happens when instantiating > a FIR for a type we don't support (such as int)? For the existing code, it will get a compile-time error saying there is no definition for (e.g.) vsip::impl::Ipp_fir_driver. It is a minor effort to make it do something more friendly. > >The spec says the copy constructor Fir(Fir const&) is supposed to > >be VSIP_NOTHROW, but it seems to me that to implement it safely, it > >needs to do allocation. I declared it VSIP_THROW((std::bad_alloc)). > > Please file an issue for this in the tracker. Will do.
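The "no-macro" technique from the patch (binding each type-suffixed C routine through a reference-to-function template parameter) and the compile-time error described above for unsupported types can be reduced to a standalone sketch. vendor_scale_32f, vendor_scale_64f, Scale_driver, and Scale_driver_base are hypothetical names that only mirror the shape of Ipp_fir_driver_base; they are not IPP calls.

```cpp
#include <cassert>

// Stand-ins for a C vendor library that encodes the value type in the
// function name, the way IPP uses suffixes such as _32f and _64f.
enum Vendor_status { vendor_ok = 0, vendor_bad_size = -1 };

Vendor_status vendor_scale_32f(const float* in, float* out, int n, float k)
{
  if (n < 0) return vendor_bad_size;
  for (int i = 0; i < n; ++i) out[i] = in[i] * k;
  return vendor_ok;
}

Vendor_status vendor_scale_64f(const double* in, double* out, int n, double k)
{
  if (n < 0) return vendor_bad_size;
  for (int i = 0; i < n; ++i) out[i] = in[i] * k;
  return vendor_ok;
}

// The vendor routine is a non-type template parameter of reference-to-
// function type, so each specialization binds one routine without a macro.
template <typename T, Vendor_status (&Fn)(const T*, T*, int, T)>
struct Scale_driver_base
{
  static void run(const T* in, T* out, int n, T k)
  {
    Vendor_status st = Fn(in, out, n, k);
    assert(st == vendor_ok);
    (void)st;  // avoid an unused-variable warning when NDEBUG is defined
  }
};

// Declared but never defined: Scale_driver<long double> or Scale_driver<int>
// is rejected at compile time as an incomplete type.
template <typename T> struct Scale_driver;

template <> struct Scale_driver<float>
  : Scale_driver_base<float, vendor_scale_32f> {};
template <> struct Scale_driver<double>
  : Scale_driver_base<double, vendor_scale_64f> {};
```

Adding a type means adding one specialization line, and the argument types of the bound routine are checked by the compiler against the template parameter, which a macro-generated wrapper would not do.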
Nathan Myers ncm From stefan at codesourcery.com Fri Oct 14 14:52:16 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Fri, 14 Oct 2005 10:52:16 -0400 Subject: IPP and split complex block layout Message-ID: <434FC620.2090605@codesourcery.com> Trying out Don's elementwise.cpp tests I now get (expected) errors with IPP, since its elementwise functions assume interleaved format, i.e. complex *, instead of std::pair. The attached patch adds a ct-check to make the evaluator fail on complex split blocks. Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: ipp.hpp.diff Type: text/x-patch Size: 948 bytes Desc: not available URL: From stefan at codesourcery.com Mon Oct 17 15:05:49 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 17 Oct 2005 11:05:49 -0400 Subject: [vsipl++] status report - 2005-10-17 Message-ID: <4353BDCD.5010500@codesourcery.com> 10th October - 14th October --------------------------- VSIPL++: * Code review (notably the sarsim application) and documentation. * Fixed IPP dispatch to explicitly require interleaved format for complex blocks. * Started setting up a test report to have a good status indicator across all supported platforms / configurations. QMTest: * Work on tutorial. Started to add features that cover more middle-ground between the 'basic' and 'sophisticated' use cases. This week --------- VSIPL++: * More of the above. QMTest: * Likewise. From mark at codesourcery.com Mon Oct 17 15:11:44 2005 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 17 Oct 2005 08:11:44 -0700 Subject: [vsipl++] status report - 2005-10-17 In-Reply-To: <4353BDCD.5010500@codesourcery.com> References: <4353BDCD.5010500@codesourcery.com> Message-ID: <4353BF30.6060703@codesourcery.com> Stefan Seefeld wrote: > 10th October - 14th October > --------------------------- Stefan -- This is a public list. Internal status reports need to go to all at codesourcery.com.
-- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From jules at codesourcery.com Fri Oct 21 19:21:11 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 21 Oct 2005 15:21:11 -0400 Subject: [vsipl++] [PATCH] Use IPP for Fir<> In-Reply-To: <20051013104441.GA5326@codesourcery.com> References: <20051013104441.GA5326@codesourcery.com> Message-ID: <43593FA7.5070507@codesourcery.com> Nathan, Just need to make the final push here so we can check this off as done. Can you:
 - Encapsulate the use of IPP so that user programs don't see the ipps.h header. This could go into ipp.hpp/ipp.cpp.
 - Have the IPP version defer to the generic version for types not supported by IPP (such as long double and possibly int).
 - Get the benchmark checked in.
 - Fix the assertions to handle unsigned wrap-around when the input view is size 0.
 - Add a tracker issue for the copy constructor and NOTHROW.
thanks, -- Jules Nathan (Jasper) Myers wrote: > I have checked in the patch below. > > It makes vsip::Fir<> use IPP's FIR support where possible. In practice, > that means whenever block size and decimation are not relatively prime. > (IPP produces bad output when they are. The IPP API seems to make it > impossible, so it amounts to an IPP documentation bug.) Fir<> uses > the native C++ implementation for such cases. They are probably rare > in real programs. > > The spec says the copy constructor Fir(Fir const&) is supposed to > be VSIP_NOTHROW, but it seems to me that to implement it safely, it > needs to do allocation. I declared it VSIP_THROW((std::bad_alloc)). > > The no-macro method used here to adapt to IPP's version of overloading > is similar to that in fft-core.hpp, and seems practical for general > use.
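The request to defer to the generic code for types IPP does not handle can be met with a type trait and a compile-time branch, instead of leaving the driver undefined. A sketch under stated assumptions: Has_vendor_fir, Fir_path, fir, fir_vendor, and fir_generic are hypothetical names, the "vendor" branch just reuses the generic loop as a placeholder for the library call, and C++17 `if constexpr` is used for brevity (a 2005-era codebase would use tag dispatch or partial specialization instead). The leading assertion also illustrates the size-0 wrap-around point.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Which value types the (hypothetical) vendor FIR supports.
template <typename T> struct Has_vendor_fir : std::false_type {};
template <> struct Has_vendor_fir<float>  : std::true_type {};
template <> struct Has_vendor_fir<double> : std::true_type {};

enum class Fir_path { vendor, generic };

// Generic direct-form FIR, zero initial state, no decimation:
// y[n] = sum over k of h[k] * x[n - k].
template <typename T>
std::vector<T> fir_generic(const std::vector<T>& h, const std::vector<T>& x)
{
  std::vector<T> y(x.size(), T());
  for (std::size_t n = 0; n < x.size(); ++n)
    for (std::size_t k = 0; k < h.size() && k <= n; ++k)
      y[n] += h[k] * x[n - k];
  return y;
}

// Placeholder for the vendor call; only instantiated for supported types.
template <typename T>
std::vector<T> fir_vendor(const std::vector<T>& h, const std::vector<T>& x)
{
  static_assert(Has_vendor_fir<T>::value, "vendor FIR used with unsupported type");
  return fir_generic(h, x);  // a real backend would call the library here
}

template <typename T>
std::vector<T> fir(const std::vector<T>& h, const std::vector<T>& x, Fir_path& used)
{
  // Test emptiness directly: h.size() - 1 would wrap around to a huge
  // unsigned value for an empty kernel and slip past a '>' comparison.
  assert(!h.empty());
  if constexpr (Has_vendor_fir<T>::value)
  {
    used = Fir_path::vendor;
    return fir_vendor(h, x);
  }
  else
  {
    used = Fir_path::generic;  // long double, int, ... take this branch
    return fir_generic(h, x);
  }
}
```

Because the branch is discarded at compile time, fir_vendor is never instantiated for long double or int, so the static_assert only fires if someone calls the vendor path directly with an unsupported type.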
> > Nathan Myers > ncm From ncm at codesourcery.com Mon Oct 24 13:30:52 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Mon, 24 Oct 2005 06:30:52 -0700 Subject: [PATCH] IPP, other minor changes Message-ID: <20051024133052.GA1523@codesourcery.com> I have checked in the patch below. Mostly minor changes, but fixes four real bugs in Fir<>, and adds new benchmarks/fir.cpp. Now tests/ref-impl/signal-fir.cpp passes on sethra with IPP. Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.295 diff -u -p -r1.295 ChangeLog --- ChangeLog 14 Oct 2005 16:00:47 -0000 1.295 +++ ChangeLog 24 Oct 2005 13:23:53 -0000 @@ -1,3 +1,14 @@ +2005-10-24 Nathan Myers + + * configure.ac: fix help for "--enable-profile-timer". + * src/vsip/impl/sal.cpp: #if out if SAL not configured. + * src/vsip/impl/signal-fir.hpp: robustify assertions; make copy-ctor + copy output size, fix overload ambiguity copying state_ member; + make op= return *this; make reset() clear state more thoroughly. + * tests/fir.cpp: test copy ctor more thoroughly. + * benchmarks/fir.cpp: new. + * benchmarks/loop.hpp: quiet printf-format warnings. + 2005-10-14 Stefan Seefeld * src/vsip/impl/ipp.hpp: Explicitely test for Cmplx_inter_fmt as IPP Index: configure.ac =================================================================== RCS file: /home/cvs/Repository/vpp/configure.ac,v retrieving revision 1.44 diff -u -p -r1.44 configure.ac --- configure.ac 14 Oct 2005 14:07:45 -0000 1.44 +++ configure.ac 24 Oct 2005 13:23:53 -0000 @@ -113,7 +113,7 @@ AC_ARG_WITH(mkl_prefix, AC_ARG_ENABLE([profile_timer], - AS_HELP_STRING([--profile-timer=type], + AS_HELP_STRING([--enable-profile-timer=type], [set profile timer type. 
Choices include none, posix, realtime, pentiumtsc, x86_64_tsc]),, [enable_profile_timer=none]) Index: benchmarks/fir.cpp =================================================================== RCS file: benchmarks/fir.cpp diff -N benchmarks/fir.cpp --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ benchmarks/fir.cpp 24 Oct 2005 13:23:53 -0000 @@ -0,0 +1,130 @@ +/* Copyright (c) 2005 by CodeSourcery, LLC. All rights reserved. */ + +/** @file benchmarks/fir.cpp + @author Jules Bergmann, Nathan Myers + @date 2005-08-13 + @brief VSIPL++ Library: Benchmark for FIR filter. + +*/ + +/*********************************************************************** + Included Files +***********************************************************************/ + +#include + +#include +#include +#include +#include + +#include + +#include "test.hpp" +#include "loop.hpp" +#include "ops_info.hpp" + +using namespace vsip; + +template +struct t_fir1 +{ + + char* what() { return "t_fir1"; } + + float ops_per_point(length_type size) + { + float ops = (coeff_size_ * size / Dec) * + (Ops_info::mul + Ops_info::add); + + return ops / size; + } + + int riob_per_point(length_type) + { return 2 * this->coeff_size_ * sizeof(T); } + + int wiob_per_point(length_type) + { return this->coeff_size_ * sizeof(T); } + + void operator()(length_type size, length_type loop, float& time) + { + typedef Fir fir_type; + + Vector coeff(coeff_size_, T()); + coeff(0) = T(1); + coeff(1) = T(2); + + fir_type fir(coeff, size, Dec); + + Vector in (size, T()); + Vector out(fir.output_size()); + + vsip::impl::profile::Timer t1; + + t1.start(); + for (index_type l=0; l value types. +// +// Non-symmetric, continuous, where kernel size and decimation +// are parameters and input size is swept. +// Float and complex value types. 
+ +int +test(Loop1P& loop, int what) +{ + typedef std::complex CX; + switch (what) + { + case 1: loop(t_fir1(loop.user_param_)); break; + case 2: loop(t_fir1(loop.user_param_)); break; + case 3: loop(t_fir1(loop.user_param_)); break; + case 4: loop(t_fir1(loop.user_param_)); break; + case 5: loop(t_fir1(loop.user_param_)); break; + + case 6: loop(t_fir1(loop.user_param_)); break; + case 7: loop(t_fir1(loop.user_param_)); break; + case 8: loop(t_fir1(loop.user_param_)); break; + case 9: loop(t_fir1(loop.user_param_)); break; + case 10: loop(t_fir1(loop.user_param_)); break; + + case 11: loop(t_fir1(loop.user_param_)); break; + case 12: loop(t_fir1(loop.user_param_)); break; + case 13: loop(t_fir1(loop.user_param_)); break; + case 14: loop(t_fir1(loop.user_param_)); break; + case 15: loop(t_fir1(loop.user_param_)); break; + + case 16: loop(t_fir1(loop.user_param_)); break; + case 17: loop(t_fir1(loop.user_param_)); break; + case 18: loop(t_fir1(loop.user_param_)); break; + case 19: loop(t_fir1(loop.user_param_)); break; + case 20: loop(t_fir1(loop.user_param_)); break; + + default: return 0; + } + return 1; +} Index: benchmarks/loop.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/benchmarks/loop.hpp,v retrieving revision 1.4 diff -u -p -r1.4 loop.hpp --- benchmarks/loop.hpp 7 Sep 2005 12:19:30 -0000 1.4 +++ benchmarks/loop.hpp 24 Oct 2005 13:23:53 -0000 @@ -152,7 +152,7 @@ Loop1P::operator()( "*unknown*"); if (this->note_) printf("# note: %s\n", this->note_); - printf("# start_loop : %d\n", loop); + printf("# start_loop : %lu\n", (unsigned long) loop); if (this->do_prof_) vsip::impl::profile::prof->set_mode(vsip::impl::profile::pm_accum); @@ -167,25 +167,26 @@ Loop1P::operator()( if (this->do_prof_) { - sprintf(filename, "vprof.%d.out", M); + sprintf(filename, "vprof.%lu.out", (unsigned long) M); vsip::impl::profile::prof->dump(filename); } std::sort(mtime.begin(), mtime.end()); if (this->metric_ == 
all_per_sec) - printf("%7d %f %f %f\n", M, + printf("%7lu %f %f %f\n", (unsigned long) M, this->metric(fcn, M, loop, mtime[(n_time-1)/2], pts_per_sec), this->metric(fcn, M, loop, mtime[(n_time-1)/2], ops_per_sec), this->metric(fcn, M, loop, mtime[(n_time-1)/2], iob_per_sec)); else if (n_time > 1) // Note: max time is min op/s, and min time is max op/s - printf("%7d %f %f %f\n", M, + printf("%7lu %f %f %f\n", (unsigned long) M, this->metric(fcn, M, loop, mtime[(n_time-1)/2], metric_), this->metric(fcn, M, loop, mtime[n_time-1], metric_), this->metric(fcn, M, loop, mtime[0], metric_)); else - printf("%7d %f\n", M, this->metric(fcn, M, loop, mtime[0], metric_)); + printf("%7lu %f\n", (unsigned long) M, + this->metric(fcn, M, loop, mtime[0], metric_)); time = mtime[(n_time-1)/2]; Index: src/vsip/impl/sal.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/sal.cpp,v retrieving revision 1.1 diff -u -p -r1.1 sal.cpp --- src/vsip/impl/sal.cpp 14 Oct 2005 14:07:45 -0000 1.1 +++ src/vsip/impl/sal.cpp 24 Oct 2005 13:23:53 -0000 @@ -7,6 +7,8 @@ Mercury SAL.
*/ +#if defined(VSIP_IMPL_HAVE_SAL) + /*********************************************************************** Included Files ***********************************************************************/ @@ -313,3 +315,4 @@ void vdiv(std::pair co } // namespace vsip::impl } // namespace vsip +#endif Index: src/vsip/impl/signal-fir.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fir.hpp,v retrieving revision 1.3 diff -u -p -r1.3 signal-fir.hpp --- src/vsip/impl/signal-fir.hpp 13 Oct 2005 10:23:34 -0000 1.3 +++ src/vsip/impl/signal-fir.hpp 24 Oct 2005 13:23:53 -0000 @@ -137,7 +137,9 @@ public: #endif { assert(input_size > 0); - assert(this->order_ + 1 > 1); // counter unsigned wraparound + assert(kernel.size() > 0); + // spec says a nonsym kernel size has to be >1, but symmetric can be ==1: + assert(this->kernel_.size() > (symV == vsip::nonsym)); assert(decimation > 0); assert(this->order_ + 1 > decimation); // M >= decimation assert(input_size >= this->order_); // input_size >= M @@ -170,13 +172,14 @@ public: // FIXME: spec says this should be nothrow, but it has to allocate Fir(Fir const& fir) VSIP_THROW((std::bad_alloc)) : input_size_(fir.input_size_) + , output_size_(fir.output_size_) , order_(fir.order_) , decimation_(fir.decimation_) , skip_(fir.skip_) , op_calls_(0) , kernel_(fir.kernel_) - , state_(fir.state_(vsip::Domain<1>(fir.state_.size()))) // actual copy - , state_saved_(fir.state_saved_) + , state_(fir.state_.get(vsip::Domain<1>(fir.state_.size()))) // actual copy + , state_saved_(fir.state_saved_) #if VSIP_IMPL_HAVE_IPP , temp_in_(this->input_size_) // allocate , temp_out_(this->input_size_) // allocate @@ -193,6 +196,7 @@ public: this->kernel_ = fir.kernel_; this->state_ = fir.state_; this->state_saved_ = fir.state_saved_; + return *this; } ~Fir() VSIP_NOTHROW {} @@ -299,7 +303,8 @@ public: } void reset() VSIP_NOTHROW - { this->state_saved_ = this->skip_ = 0; } + { 
this->state_saved_ = this->skip_ = 0; + this->state_ = T(0.0); } public: Index: tests/fir.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fir.cpp,v retrieving revision 1.3 diff -u -p -r1.3 fir.cpp --- tests/fir.cpp 13 Oct 2005 10:23:34 -0000 1.3 +++ tests/fir.cpp 24 Oct 2005 13:23:53 -0000 @@ -96,7 +96,7 @@ test_fir( vsip::const_Vector<>(vsip::length_type(3),vsip::scalar_f(1)), N*10); assert(dummy.decimation() == 1); vsip::Fir fir1a(kernel, N, D); - vsip::Fir fir1b(kernel, N, D); + vsip::Fir fir1b(fir1a); vsip::Fir fir2(kernel, N, D); assert(fir1a.symmetry == sym); @@ -107,13 +107,18 @@ test_fir( const vsip::length_type order = (sym == vsip::nonsym) ? M : (sym == vsip::sym_even_len_even) ? 2 * M : (2 * M) - 1; assert(fir1a.kernel_size() == order); + assert(fir1b.kernel_size() == order); assert(fir1a.filter_order() == order); + assert(fir1b.filter_order() == order); // assert(fir1a.symmetry() assert(fir1a.input_size() == N); + assert(fir1b.input_size() == N); assert(fir1a.output_size() == (N+D-1)/D); + assert(fir1b.output_size() == (N+D-1)/D); assert(fir1a.continuous_filtering() == fir1a.continuous_filter); assert(fir2.continuous_filtering() == fir2.continuous_filter); assert(fir1a.decimation() == D); + assert(fir1b.decimation() == D); vsip::length_type got = 0; for (vsip::length_type i = 0; i < 2 * M; ++i) // chained From ncm at codesourcery.com Wed Oct 26 00:05:27 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Tue, 25 Oct 2005 17:05:27 -0700 Subject: [PATCH] Fir<> IPP cleanup Message-ID: <20051026000527.GF13447@codesourcery.com> I have checked in the patch below. Fir<> now uses IPP for types and modes it supports, and native C++ code otherwise. Before, if IPP was turned on it would only support types IPP supports -- e.g., not long double, or int. It also avoids exposing user code to Intel-header definitions. 
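The "native C++ code" fallback mentioned above is, in essence, a direct-form decimating FIR that carries its input history and decimation phase across calls. The following is a minimal sketch under that assumption; `Native_fir`, `hist_`, and `skip_` are hypothetical names, not the VSIPL++ `Fir<>` implementation, and the sketch requires a non-empty kernel and a decimation factor of at least 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a plain C++ decimating FIR with saved state.
// Not the VSIPL++ Fir<> code; names are illustrative only.
template <typename T>
class Native_fir
{
public:
  // Requires kernel.size() >= 1 and decimation >= 1.
  Native_fir(std::vector<T> const& kernel, std::size_t decimation)
    : h_(kernel), dec_(decimation), hist_(kernel.size() - 1, T(0)), skip_(0)
  {}

  // Filter one block: y = sum_j h[j] * x[n - j], evaluated at every
  // dec_-th input position n. History and phase persist between calls.
  std::vector<T> operator()(std::vector<T> const& in)
  {
    std::size_t const m = hist_.size();        // number of saved samples
    std::vector<T> buf(hist_);                 // buf[m + n] == in[n]
    buf.insert(buf.end(), in.begin(), in.end());

    std::vector<T> out;
    std::size_t n = skip_;
    for (; n < in.size(); n += dec_)
    {
      T acc = T(0);
      for (std::size_t j = 0; j < h_.size(); ++j)
        acc += h_[j] * buf[m + n - j];
      out.push_back(acc);
    }
    skip_ = n - in.size();                     // phase of the next output
    hist_.assign(buf.end() - static_cast<std::ptrdiff_t>(m), buf.end());
    return out;
  }

  // Clear both the phase and the saved history, as the patched reset() does.
  void reset()
  {
    hist_.assign(hist_.size(), T(0));
    skip_ = 0;
  }

private:
  std::vector<T> h_;     // kernel, in natural (unreversed) order
  std::size_t dec_;      // decimation factor
  std::vector<T> hist_;  // last (kernel size - 1) inputs
  std::size_t skip_;     // input offset of the next output sample
};
```

With no history, a block of N inputs yields (N + D - 1) / D outputs, which matches the `output_size()` assertion in tests/fir.cpp; splitting the input across calls produces the same output sequence as one call.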
Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.296 diff -u -p -r1.296 ChangeLog --- ChangeLog 24 Oct 2005 13:25:30 -0000 1.296 +++ ChangeLog 25 Oct 2005 23:50:18 -0000 @@ -1,3 +1,10 @@ +2005-10-25 Nathan Myers + + * src/vsip/impl/ipp.cpp, src/vsip/impl/signal-fir.hpp: + Use native C++ FIR code for all types and modes not supported + by IPP FIR. Confine Intel ipp*.h includes to ipp.cpp where + users' code will not be exposed to them. + 2005-10-24 Nathan Myers * configure.ac: fix help for "--enable-profile-timer". Index: src/vsip/impl/ipp.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/ipp.cpp,v retrieving revision 1.5 diff -u -p -r1.5 ipp.cpp --- src/vsip/impl/ipp.cpp 21 Sep 2005 09:38:59 -0000 1.5 +++ src/vsip/impl/ipp.cpp 25 Oct 2005 23:50:18 -0000 @@ -10,7 +10,13 @@ Included Files ***********************************************************************/ -#include "ipp.hpp" +#include + +#if defined(VSIP_IMPL_HAVE_IPP) + +#include +#include +#include #include /*********************************************************************** @@ -192,7 +198,74 @@ void conv(double* coeff, length_type coe ippsConv_64f(coeff, coeff_size, in, in_size, out); } +// +// FIR support +// + +template +< + typename T, typename IppT, + IppStatus (&ippFirF)(IppT const*,IppT*,int,IppT const*,int,IppT*,int*), + IppStatus (&ippFirDecF)( + IppT const*,IppT*,int,IppT const*,int,int,int,int,int,IppT*) +> +struct Ipp_fir_base +{ + typedef Ipp_fir_base base_type; + + inline static void + run( + T const* xin, T* xout, vsip::length_type outsize, + T const* xkernel, vsip::length_type ksize, + T* xstate, vsip::length_type* xstate_ix, vsip::length_type dec) + { + IppT const* const in = reinterpret_cast(xin); + IppT* const out = reinterpret_cast(xout); + IppT const* const kernel = reinterpret_cast(xkernel); + 
IppT* const state = reinterpret_cast(xstate); + int state_ix = *xstate_ix; + IppStatus stat = (dec == 1) ? + ippFirF(in, out, outsize, kernel, ksize, state, &state_ix) : + ippFirDecF(in, out, outsize, kernel, ksize, 1, 0, dec, 0, state); + assert(stat == ippStsNoErr); + *xstate_ix = state_ix; + } +}; + +template struct Ipp_fir; + +template<> struct Ipp_fir : Ipp_fir_base< + float,Ipp32f,ippsFIR_Direct_32f,ippsFIRMR_Direct_32f> { }; + +template<> struct Ipp_fir : Ipp_fir_base< + double,Ipp64f,ippsFIR_Direct_64f,ippsFIRMR_Direct_64f> {}; + +template<> struct Ipp_fir > : Ipp_fir_base< + std::complex,Ipp32fc,ippsFIR_Direct_32fc,ippsFIRMR_Direct_32fc> {}; + +template<> struct Ipp_fir > : Ipp_fir_base< + std::complex,Ipp64fc,ippsFIR_Direct_64fc,ippsFIRMR_Direct_64fc> {}; + +template +void +Ipp_fir_driver::run_fir( + T const* xin, T* xout, vsip::length_type outsize, + T const* xkernel, vsip::length_type ksize, + T* xstate, vsip::length_type* xstate_ix, vsip::length_type dec) +{ + Ipp_fir::run( + xin, xout, outsize, xkernel, ksize, xstate, xstate_ix, dec); +} + +// instantiate the specialized IPP FIR drivers here, along with what they use. 
+ +template struct Ipp_fir_driver; +template struct Ipp_fir_driver; +template struct Ipp_fir_driver >; +template struct Ipp_fir_driver >; + } // namespace vsip::impl::ipp } // namespace vsip::impl } // namespace vsip +#endif /* VSIP_IMPL_HAVE_IPP */ Index: src/vsip/impl/signal-fir.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fir.hpp,v retrieving revision 1.4 diff -u -p -r1.4 signal-fir.hpp --- src/vsip/impl/signal-fir.hpp 24 Oct 2005 13:25:30 -0000 1.4 +++ src/vsip/impl/signal-fir.hpp 25 Oct 2005 23:50:18 -0000 @@ -19,13 +19,6 @@ #include #include -#if VSIP_IMPL_HAVE_IPP -#include -#include -#endif - -#include - namespace vsip { @@ -48,49 +41,54 @@ struct Fir_aligned block_type; }; -#if VSIP_IMPL_HAVE_IPP - -template -< - typename T, typename IppT, - IppStatus (&ippFirF)(IppT const*,IppT*,int,IppT const*,int,IppT*,int*), - IppStatus (&ippFirDecF)( - IppT const*,IppT*,int,IppT const*,int,int,int,int,int,IppT*) -> -struct Ipp_fir_driver_base +template +struct Fir_driver { + static const bool reverse_kernel = true; + static const bool use_native = true; + static const bool mismatch_ok = true; + + // code that calls this should be elided by the optimizer. static void run_fir( - T const* xin, T* xout, vsip::length_type outsize, - T const* xkernel, vsip::length_type ksize, - T* xstate, vsip::length_type* xstate_ix, vsip::length_type dec) - { - IppT const* const in = reinterpret_cast(xin); - IppT* const out = reinterpret_cast(xout); - IppT const* const kernel = reinterpret_cast(xkernel); - IppT* const state = reinterpret_cast(xstate); - int state_ix = *xstate_ix; - IppStatus stat = (dec == 1) ? 
- ippFirF(in, out, outsize, kernel, ksize, state, &state_ix) : - ippFirDecF(in, out, outsize, kernel, ksize, 1, 0, dec, 0, state); - assert(stat == ippStsNoErr); - *xstate_ix = state_ix; - } + T const* xin, T* xout, vsip::length_type outsize, + T const* xkernel, vsip::length_type ksize, + T* xstate, vsip::length_type* xstate_ix, vsip::length_type dec) + { assert(false); } }; -template struct Ipp_fir_driver; +#if VSIP_IMPL_HAVE_IPP -template < > struct Ipp_fir_driver : Ipp_fir_driver_base< - float,Ipp32f,ippsFIR_Direct_32f,ippsFIRMR_Direct_32f> { }; +namespace ipp +{ + +template +struct Ipp_fir_driver +{ + static const bool reverse_kernel = false; + static const bool use_native = false; + static const bool mismatch_ok = false; + + // same API as in Fir_driver, but implemented in src/vsip/impl/ipp.cpp + static void + run_fir( + T const* xin, T* xout, vsip::length_type outsize, + T const* xkernel, vsip::length_type ksize, + T* xstate, vsip::length_type* xstate_ix, vsip::length_type dec); +}; -template<> struct Ipp_fir_driver : Ipp_fir_driver_base< - double,Ipp64f,ippsFIR_Direct_64f,ippsFIRMR_Direct_64f> {}; +} // namespace vsip::impl::ipp -template<> struct Ipp_fir_driver > : Ipp_fir_driver_base< - std::complex,Ipp32fc,ippsFIR_Direct_32fc,ippsFIRMR_Direct_32fc> {}; +// use IPP specialization for certain T: -template<> struct Ipp_fir_driver > : Ipp_fir_driver_base< - std::complex,Ipp64fc,ippsFIR_Direct_64fc,ippsFIRMR_Direct_64fc> {}; +template<> struct Fir_driver + : ipp::Ipp_fir_driver {}; +template<> struct Fir_driver + : ipp::Ipp_fir_driver {}; +template<> struct Fir_driver > + : ipp::Ipp_fir_driver > {}; +template<> struct Fir_driver > + : ipp::Ipp_fir_driver > {}; #endif // VSIP_IMPL_HAVE_IPP @@ -147,18 +145,10 @@ public: // must be after asserts because of division this->output_size_ = (input_size + decimation - 1) / decimation; -#if VSIP_IMPL_HAVE_IPP // use IPP only if decimation is a factor of input size. 
- if (this->output_size_ * decimation == this->input_size_) - { - // IPP doesn't want it reversed. - this->kernel_(vsip::Domain<1>(kernel.size())) = kernel; - if (symV != vsip::nonsym) - this->kernel_(vsip::Domain<1>( - this->kernel_.size() - 1, -1, kernel.size())) = kernel; - } - else -#endif + if (impl::Fir_driver::reverse_kernel || + (!impl::Fir_driver::mismatch_ok && + this->output_size_ * decimation != this->input_size_)) { // mirror the kernel unsigned const ksz = kernel.size(); @@ -167,6 +157,14 @@ public: if (symV != vsip::nonsym) this->kernel_(vsip::Domain<1>(ksz)) = kernel; } + else + { + // e.g. IPP doesn't want it reversed. + this->kernel_(vsip::Domain<1>(kernel.size())) = kernel; + if (symV != vsip::nonsym) + this->kernel_(vsip::Domain<1>( + this->kernel_.size() - 1, -1, kernel.size())) = kernel; + } } // FIXME: spec says this should be nothrow, but it has to allocate @@ -240,7 +238,9 @@ public: #if VSIP_IMPL_HAVE_IPP // use IPP only if decimation is a factor of input size. 
- if (this->input_size_ == this->output_size_ * dec) + if (!impl::Fir_driver::use_native && + (impl::Fir_driver::mismatch_ok || + this->input_size_ == this->output_size_ * dec)) { typedef impl::Layout<1,vsip::tuple<0,1,2>, impl::Stride_unit,impl::Cmplx_inter_fmt> layout_type; @@ -254,7 +254,7 @@ public: impl::Ext_data raw_state(this->state_.block()); oix = (this->input_size_ - skip + dec - 1) / dec; - impl::Ipp_fir_driver::run_fir(raw_in.data(), raw_out.data(), oix, + impl::Fir_driver::run_fir(raw_in.data(), raw_out.data(), oix, raw_kernel.data(), m + 1, raw_state.data(), &this->state_saved_, dec); if (useOldState != state_save) @@ -304,7 +304,7 @@ public: void reset() VSIP_NOTHROW { this->state_saved_ = this->skip_ = 0; - this->state_ = T(0.0); } + this->state_ = T(0); } public: From don at codesourcery.com Wed Oct 26 19:48:17 2005 From: don at codesourcery.com (Don McCoy) Date: Wed, 26 Oct 2005 13:48:17 -0600 Subject: [patch] SAL dispatch for matrix and vector products Message-ID: <435FDD81.9070002@codesourcery.com> I am working on BLAS dispatch as well. Patch to follow. This one just includes SAL. Hopefully I've separated them well. Two issues worth pointing out: 1) The exec() function checks for row-major ordering because it calls the newer SAL functions (mat_mul) that allow the column-stride to be specified. I believe the rows must be unit stride. This is a little less general than the older ones (mulx) which allow non-unit strides. Recently, we heard back from Mercury that the older ones perform better (at this time). Given that the older ones handle non-unit strides and are faster, should we revert to using those? If Mercury changes in the future, then we can follow. 2) Split-complex products (other than vector-vector) are not handled at this time. Just a reminder that we were going to discuss how to address this issue sometime. 
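Point 2 hinges on the difference between interleaved complex storage (one array of `std::complex<float>`) and split complex storage (separate real and imaginary arrays, passed here as a `std::pair` of pointers). The following is a naive reference sketch of the same product in both formats; `ref_mmul` and the typedefs are hypothetical and make no attempt to mirror the actual SAL entry points or their argument order.

```cpp
#include <complex>
#include <cstddef>
#include <utility>

// Hypothetical split-complex pointer types: first = real, second = imag.
typedef std::pair<float*, float*>             split_ptr;
typedef std::pair<float const*, float const*> split_cptr;

// Interleaved: C = A * B, dense row-major, A is m x k, B is k x n.
void ref_mmul(std::complex<float> const* a, std::complex<float> const* b,
              std::complex<float>* c,
              std::size_t m, std::size_t k, std::size_t n)
{
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
    {
      std::complex<float> acc(0.f, 0.f);
      for (std::size_t p = 0; p < k; ++p)
        acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
}

// Split: same product, but real and imaginary parts live in separate arrays,
// so the complex multiply is written out component by component.
void ref_mmul(split_cptr a, split_cptr b, split_ptr c,
              std::size_t m, std::size_t k, std::size_t n)
{
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
    {
      float re = 0.f, im = 0.f;
      for (std::size_t p = 0; p < k; ++p)
      {
        float ar = a.first[i * k + p], ai = a.second[i * k + p];
        float br = b.first[p * n + j], bi = b.second[p * n + j];
        re += ar * br - ai * bi;
        im += ar * bi + ai * br;
      }
      c.first[i * n + j]  = re;
      c.second[i * n + j] = im;
    }
}
```

Because the two formats take different argument types, overloading on the pointer type lets a caller select the right kernel at compile time, provided all operands share one format.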
Regards, -- Don McCoy CodeSourcery, LLC -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sd.changes URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sd.diff Type: text/x-patch Size: 19306 bytes Desc: not available URL: From jules at codesourcery.com Thu Oct 27 11:44:38 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 27 Oct 2005 07:44:38 -0400 Subject: [vsipl++] [PATCH] Fir<> IPP cleanup In-Reply-To: <20051026000527.GF13447@codesourcery.com> References: <20051026000527.GF13447@codesourcery.com> Message-ID: <4360BDA6.7020708@codesourcery.com> Nathan, Nathan (Jasper) Myers wrote: > I have checked in the patch below. > > Fir<> now uses IPP for types and modes it supports, and native C++ > code otherwise. Before, if IPP was turned on it would only support > types IPP supports -- e.g., not long double, or int. It also > avoids exposing user code to Intel-header definitions. > > Nathan Myers > ncm > > > -#include "ipp.hpp" > +#include > + > +#if defined(VSIP_IMPL_HAVE_IPP) This file (ipp.cpp) is only compiled if VSIP_IMPL_HAVE_IPP is defined. Why is it necessary to add this guard? > + > +#include > +#include Why does ipp.cpp need to include these? > +#include > #include > -struct Ipp_fir_driver_base > +template > +struct Fir_driver > { > + static const bool reverse_kernel = true; > + static const bool use_native = true; > + static const bool mismatch_ok = true; Can you document what 'mismatch_ok' means? > } > > // FIXME: spec says this should be nothrow, but it has to allocate Please capture this fixme with an issue and then remove it. 
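Reading the patch's two conditions together suggests what the three `Fir_driver` flags mean: `use_native` rules the backend out entirely, `mismatch_ok` says whether the backend tolerates an input size that is not a multiple of the decimation factor, and `reverse_kernel` says whether the kernel is stored mirrored. The following is a minimal sketch of that interpretation; `Fir_traits`, `use_backend`, and `mirror_kernel` are hypothetical stand-ins, and the meaning assigned to `mismatch_ok` is inferred from the code rather than documented.

```cpp
#include <cstddef>

// Hypothetical stand-in for the patch's Fir_driver traits.
// Primary template: no accelerated backend, so the native loop runs.
template <typename T>
struct Fir_traits
{
  static const bool reverse_kernel = true;   // native loop stores kernel mirrored
  static const bool use_native     = true;   // never call the backend
  static const bool mismatch_ok    = true;   // native loop accepts any input size
};

// A type the backend supports (float, as with IPP in the patch).
template <>
struct Fir_traits<float>
{
  static const bool reverse_kernel = false;  // backend takes kernel as given
  static const bool use_native     = false;
  static const bool mismatch_ok    = false;  // backend needs input_size % dec == 0
};

// Mirrors the operator() test: call the backend only if it exists for T
// and the input size is compatible with the decimation factor.
template <typename T>
bool use_backend(std::size_t input_size, std::size_t dec)
{
  std::size_t const output_size = (input_size + dec - 1) / dec;
  return !Fir_traits<T>::use_native &&
         (Fir_traits<T>::mismatch_ok || input_size == output_size * dec);
}

// Mirrors the constructor test that decides whether to mirror the kernel.
template <typename T>
bool mirror_kernel(std::size_t input_size, std::size_t dec)
{
  std::size_t const output_size = (input_size + dec - 1) / dec;
  return Fir_traits<T>::reverse_kernel ||
         (!Fir_traits<T>::mismatch_ok && output_size * dec != input_size);
}
```

With these flag values the kernel is mirrored exactly when the native path runs, so the constructor and `operator()` must apply the same size test; documenting `mismatch_ok` next to both uses would make that coupling explicit.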
From jules at codesourcery.com Thu Oct 27 13:28:59 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 27 Oct 2005 09:28:59 -0400 Subject: [vsipl++] [patch] SAL dispatch for matrix and vector products In-Reply-To: <435FDD81.9070002@codesourcery.com> References: <435FDD81.9070002@codesourcery.com> Message-ID: <4360D61B.6040508@codesourcery.com> Don, Don McCoy wrote: > I am working on BLAS dispatch as well. Patch to follow. This one just > includes SAL. Hopefully I've separated them well. > > Two issues worth pointing out: > > 1) The exec() function checks for row-major ordering because it calls > the newer SAL functions (mat_mul) that allow the column-stride to be > specified. I believe the rows must be unit stride. This is a little > less general than the older ones (mulx) which allow non-unit strides. > Recently, we heard back from Mercury that the older ones perform better > (at this time). Given that the older ones handle non-unit strides and > are faster, should we revert to using those? If Mercury changes in the > future, then we can follow.

Yes, we should revert to the old ones for now. Because the old and new functions have different dispatch requirements (for supported strides and mixing of dimension orderings), we should have separate evaluators for each (as opposed to trying to hide the difference in sal::mmul). We almost need a Venn diagram to represent the non-overlapping sets of functionality:

The old matrix-multiply
 - required all operands to have the same dimension-ordering
 - supported non-unit stride in the minor dimension
 - required the major dimension to be "dense", i.e. the major dimension stride == minor dimension size * minor dimension stride
 - provided a special case for unit-stride minor dimension (so-called "fast" versions)

The new matrix-multiply
 - supports mixing of dimension-ordering (via the transpose flags)
 - requires unit-stride in the minor dimensions
 - allows major dimensions to be non-dense, via the column stride.
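The two requirement lists above translate into runtime layout checks that separate evaluators could use. The following is a minimal sketch of those predicates; `Mat_layout` and the `*_ok` functions are hypothetical, strides are in elements, and the sketch assumes the new multiply's transpose flags let every operand pick its own ordering (the patch's `exec()` currently insists on row-major, so that assumption may be too generous).

```cpp
#include <cstddef>

// Hypothetical description of one matrix operand's memory layout.
struct Mat_layout
{
  bool        row_major;     // dimension ordering
  std::size_t minor_size;    // number of elements along the minor dimension
  std::size_t minor_stride;  // element stride along the minor dimension
  std::size_t major_stride;  // element stride along the major dimension
};

// "Dense" major dimension: no padding between rows (or columns).
inline bool dense_major(Mat_layout const& m)
{ return m.major_stride == m.minor_size * m.minor_stride; }

// Old multiply: same ordering for all operands, any minor stride,
// but every major dimension must be dense.
inline bool old_mmul_ok(Mat_layout const& r, Mat_layout const& a,
                        Mat_layout const& b)
{
  return r.row_major == a.row_major && a.row_major == b.row_major &&
         dense_major(r) && dense_major(a) && dense_major(b);
}

// Old "fast" variants: additionally require unit stride in the minor dimension.
inline bool old_fast_ok(Mat_layout const& r, Mat_layout const& a,
                        Mat_layout const& b)
{
  return old_mmul_ok(r, a, b) &&
         r.minor_stride == 1 && a.minor_stride == 1 && b.minor_stride == 1;
}

// New multiply: orderings may differ (assumed handled by transpose flags),
// minor stride must be 1, and the major stride is unconstrained.
inline bool new_mmul_ok(Mat_layout const& r, Mat_layout const& a,
                        Mat_layout const& b)
{
  return r.minor_stride == 1 && a.minor_stride == 1 && b.minor_stride == 1;
}
```

Neither predicate implies the other: a strided dense layout passes only the old check, while a padded unit-stride layout passes only the new one, which is the non-overlap the Venn diagram remark describes.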
> > 2) Split-complex products (other than vector-vector) are not handled > at this time. Just a reminder that we were going to discuss how to > address this issue sometime. We should be able to handle this by: - providing overloads of sal::mmul for std::pair, and - checking that all the matrices have the same complex format in ct_valid. Granted, we won't be able to fully exercise this until we get prod integrated into the expression templates. More comments below on the matrix-matrix evaluator. Some may apply to the matrix-vector and vector-matrix evaluators too. -- Jules > ------------------------------------------------------------------------ > > + > + > + // SAL evaluator for matrix-matrix products. > + > + template + typename Block1, > + typename Block2> > + struct Evaluator, > + Mercury_sal_tag> > + { > + typedef typename Block0::value_type T; You could move these typedefs here so the new ct_valid conditions below fit on a single line: typedef typename Block_layout::order_type order0_type; ... typedef typename Block_layout::complex_type complex0_type; ... > + > + static bool const ct_valid = > + impl::sal::Sal_traits::valid && > + Type_equal::value && > + Type_equal::value && > + // check that direct access is supported > + Ext_data_cost::value == 0 && > + Ext_data_cost::value == 0 && > + Ext_data_cost::value == 0; Assuming that we're going to modify this evaluator to handle the old matrix multiply, the ct_valid should also check that all three matrices have the same dimension ordering. Also check that all the matrices have the same complex format (this applies to both the old and new multiply). > + > + static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b) > + { > + typedef typename Block_layout::order_type order0_type; > + typedef typename Block_layout::order_type order1_type; > + typedef typename Block_layout::order_type order2_type; > + > + Ext_data ext_r(const_cast(r)); > + Ext_data ext_a(const_cast(a)); > + Ext_data