From Wayne.Haney at gd-ais.com Tue Mar 6 14:53:00 2007 From: Wayne.Haney at gd-ais.com (Haney, Wayne W.) Date: Tue, 6 Mar 2007 09:53:00 -0500 Subject: Needing help with VSIPL++ nomenclature... Message-ID: <3D54FD86EBFE0540BDA2048CE1A4F258AA06DA@vaff06-mail01.ad.gd-ais.com> The last few days I've been researching VSIPL & VSIPL++ in order to integrate into our Submarine SONAR code. The VSIPL++ specification is confusing with the definition of what a Domain<> is. Could you please elucidate a bit more on what Domain<>s are? Any information you can give is greatly appreciated. Thank you! Wayne Haney -------------- next part -------------- An HTML attachment was scrubbed... URL: From jules at codesourcery.com Tue Mar 6 16:03:10 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 06 Mar 2007 11:03:10 -0500 Subject: [patch] Test updates Message-ID: <45ED90BE.7030104@codesourcery.com> This patch: - Splits scalar-view.cpp into 4 separate tests. Compiling scalar-view was taking 2 GB of core! - Adds TEST_LEVEL=0 cases for some long running tests. - Disables some double-precision tests when TEST_DOUBLE is not defined. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test.diff URL: From don at codesourcery.com Tue Mar 6 16:50:54 2007 From: don at codesourcery.com (Don McCoy) Date: Tue, 06 Mar 2007 09:50:54 -0700 Subject: [vsipl++] Needing help with VSIPL++ nomenclature... In-Reply-To: <3D54FD86EBFE0540BDA2048CE1A4F258AA06DA@vaff06-mail01.ad.gd-ais.com> References: <3D54FD86EBFE0540BDA2048CE1A4F258AA06DA@vaff06-mail01.ad.gd-ais.com> Message-ID: <45ED9BEE.3050908@codesourcery.com> Haney, Wayne W. wrote: > > The last few days I've been researching VSIPL & VSIPL++ in order to > integrate into our Submarine SONAR code. The VSIPL++ specification is > confusing with the definition of what a Domain<> is.
Could you please > elucidate a bit more on what Domain<>s are? Any information you can > give is greatly appreciated. > Domains provide an efficient way to specify a subset of the elements in a particular view, i.e. a matrix or a vector. In specifying a domain, one gives the starting index, an element-to-element stride and a length. Using domains is one way to extract a "sub-view" from a set of data, for cases where built-in methods are inadequate. Normally, subviews are expressed through various, more convenient, view member functions which depend on the type of view being considered. For example, complex views provide the real() and imag() subviews that one would expect. These could also be expressed with domains, but one must then be concerned with whether the data is held in split or interleaved forms. Other examples of subviews are the row() and col() operators for matrices, which again may be expressed using domains, but only given foreknowledge of whether the data is stored in row-major or col-major form, etc... In short, domains are an important construct but are not often needed (at least in application programs) due to the high-level syntax provided by the VSIPL++ standard. I'd encourage you to take a look at the tutorial if you haven't already. It contains an in-depth example (fast convolution), which is relevant to a number of signal processing algorithms. The reference section included in part two discusses views, subviews and domains among other things. http://www.codesourcery.com/vsiplplusplus/1.3/tutorial.pdf I personally find it easier to work from things like the tutorial or from functional examples. We do not at this time have as many examples as I would like, but we are adding more all the time. If you have any specific requests, please don't hesitate to contact us again. We'll do everything we can to assist you in your evaluation! Feedback is appreciated as well.
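To make the start/stride/length description above concrete, here is a minimal sketch in plain C++. It deliberately does not use the VSIPL++ library: `Domain1` and `select_elements` are hypothetical stand-ins invented for this illustration, and the real `Domain<1>` type (which appears elsewhere in this archive in forms such as `Domain<1>(0, 2, 8)`) may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a one-dimensional domain:
// a starting index, an element-to-element stride, and a length.
struct Domain1
{
  std::size_t    first;   // index of the first selected element
  std::ptrdiff_t stride;  // distance between consecutive selected elements
  std::size_t    length;  // number of selected elements
};

// Copy the elements of `data` that the domain selects.
// (A real subview would refer to the data in place rather than copy it.)
std::vector<int> select_elements(std::vector<int> const &data, Domain1 const &dom)
{
  std::vector<int> out;
  out.reserve(dom.length);
  for (std::size_t i = 0; i != dom.length; ++i)
  {
    std::ptrdiff_t const idx = static_cast<std::ptrdiff_t>(dom.first)
                             + static_cast<std::ptrdiff_t>(i) * dom.stride;
    assert(idx >= 0);  // a negative stride must not run off the front
    out.push_back(data.at(static_cast<std::size_t>(idx)));  // at() checks the upper bound
  }
  return out;
}
```

With data holding 0 through 7, a domain of (first 1, stride 2, length 3) picks the elements at indices 1, 3 and 5. The same triple applied along one dimension of a matrix is what makes the storage order (row-major or col-major) matter, as noted above.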
Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From jules at codesourcery.com Wed Mar 7 01:00:09 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 06 Mar 2007 20:00:09 -0500 Subject: [patch] Split fastconv benchmark Message-ID: <45EE0E99.1070006@codesourcery.com> This patch splits the Cbe benchmark cases into benchmarks/cell/fastconv.cpp, and puts common bits into benchmarks/fastconv.hpp. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fc.diff URL: From jules at codesourcery.com Wed Mar 7 02:01:29 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 06 Mar 2007 21:01:29 -0500 Subject: [patch] Missing file Message-ID: <45EE1CF9.5090208@codesourcery.com> This file is used by the cell/fastconv benchmark for block creation. It can create a block using either user-provided storage or library provided storage. Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ab.diff URL: From jules at codesourcery.com Wed Mar 7 02:07:52 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 06 Mar 2007 21:07:52 -0500 Subject: [vsipl++] [patch] Missing file In-Reply-To: <45EE1CF9.5090208@codesourcery.com> References: <45EE1CF9.5090208@codesourcery.com> Message-ID: <45EE1E78.5050103@codesourcery.com> This patch removes a debug assert that used a private member function, causing a compile error. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: ab2.diff URL: From jules at codesourcery.com Wed Mar 7 16:24:21 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 07 Mar 2007 11:24:21 -0500 Subject: [patch] RBO preview Message-ID: <45EEE735.1000302@codesourcery.com> This patch - adds RBO support, - applies it to by-value FFT and FFTM, - adds a simple RBO evaluator for expressions like 'A = fft(B)' which avoids the temporary and copy, - adds fastconv RBO evaluators for the general case using Fftm underneath, and the special case when using Cbe Fastconv underneath. - adds single-line fastconv case to the fastconv benchmark This patch is fairly close to ready. However there are a few bits missing: - add distributed support for Return_expr_blocks - validation (this does work for the fastconv benchmark) Comments? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rbo.diff URL: From don at codesourcery.com Wed Mar 7 17:38:17 2007 From: don at codesourcery.com (Don McCoy) Date: Wed, 07 Mar 2007 10:38:17 -0700 Subject: [patch] support for non-contiguous rows or columns with Cell FFTM Message-ID: <45EEF889.8090303@codesourcery.com> Ok to commit? -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fnc.diff URL: From stefan at codesourcery.com Wed Mar 7 18:32:25 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 07 Mar 2007 13:32:25 -0500 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45EEF889.8090303@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> Message-ID: <45EF0539.7030706@codesourcery.com> Don McCoy wrote: > Index: src/vsip/opt/cbe/ppu/fft.cpp > =================================================================== > --- src/vsip/opt/cbe/ppu/fft.cpp (revision 165069) > +++ src/vsip/opt/cbe/ppu/fft.cpp (working copy) > @@ -306,7 +306,20 @@ > length_type, length_type) > { > } > - > + virtual void query_layout(Rt_layout<2> &rtl_inout) > + { > + // must have unit stride, but does not have to be dense > + rtl_inout.pack = stride_unit; > + rtl_inout.order = tuple<0, 1, 2>(); Since we want unit-stride in the direction in which the FFT is taken, we need to take the axis parameter 'A' into account. So, for example: if (A == 0) rtl_inout.order = tuple<0, 1, 2>(); else rtl_inout.order = tuple<1, 0, 2>(); > + rtl_inout.complex = cmplx_inter_fmt; > + } > + virtual void query_layout(Rt_layout<2> &rtl_in, Rt_layout<2> &rtl_out) > + { > + // must have unit stride, but does not have to be dense > + rtl_in.pack = rtl_out.pack = stride_unit; > + rtl_in.order = rtl_out.order = tuple<0, 1, 2>(); Same here. > + rtl_in.complex = rtl_out.complex = cmplx_inter_fmt; > + } > private: > rtype scale_; > length_type fft_length_; Regards, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From stefan at codesourcery.com Thu Mar 8 21:57:27 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Thu, 08 Mar 2007 16:57:27 -0500 Subject: patch: conditionalize support for bool and int C-VSIPL views. 
Message-ID: <45F086C7.10304@codesourcery.com> The attached patch adds checks for bool and int view creation in the used C-VSIPL library, and conditionalizes appropriate View_traits<>. OK to check in ? Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 -------------- next part -------------- A non-text attachment was scrubbed... Name: cvsip.patch Type: text/x-patch Size: 16015 bytes Desc: not available URL: From jules at codesourcery.com Fri Mar 9 16:05:14 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 09 Mar 2007 11:05:14 -0500 Subject: [vsipl++] patch: conditionalize support for bool and int C-VSIPL views. In-Reply-To: <45F086C7.10304@codesourcery.com> References: <45F086C7.10304@codesourcery.com> Message-ID: <45F185BA.4070304@codesourcery.com> Stefan Seefeld wrote: > The attached patch adds checks for bool and int view creation > in the used C-VSIPL library, and conditionalizes appropriate > View_traits<>. > > OK to check in ? Stefan, Yes, please check it in. thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Fri Mar 9 17:43:27 2007 From: don at codesourcery.com (Don McCoy) Date: Fri, 09 Mar 2007 10:43:27 -0700 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45EF0539.7030706@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> Message-ID: <45F19CBF.9030203@codesourcery.com> Stefan Seefeld wrote: > Since we want unit-stride in the direction in which the FFT is taken, > we need to take the axis parameter 'A' into account. > Thanks for catching that Stefan. I've fixed the attached patch and tested it locally. I'm presently adding test cases to the FFT "backend" test (tests/fft_be.cpp), but will include those changes in a separate patch. 
-- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fnc2.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fnc2.diff URL: From don at codesourcery.com Mon Mar 12 19:47:39 2007 From: don at codesourcery.com (Don McCoy) Date: Mon, 12 Mar 2007 13:47:39 -0600 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F19CBF.9030203@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> Message-ID: <45F5AE5B.1050604@codesourcery.com> Don McCoy wrote: > Thanks for catching that Stefan. I've fixed the attached patch and > tested it locally. I'm presently adding test cases to the FFT > "backend" test (tests/fft_be.cpp), but will include those changes in a > separate patch. And here is that patch. In putting this together I found and fixed two defects in the Cell FFT code. Yay! The tests for FFTM also cover using column-major data when doing column-wise FFT's, including the case where the columns are not dense (tightly packed), as is the case when a subview of a matrix is taken (i.e. every other row, etc...). Note also this patch includes the changes from the previous patch as well. The changes in FFT require that FFT_BE_TESTS be defined in order to run the backend-specific tests. Without it, the test presently fails to compile because the CBE backend does not support 2-D and 3-D FFT's as of yet. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cbe_tests.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: cbe_tests.diff URL: From stefan at codesourcery.com Mon Mar 12 20:10:29 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 12 Mar 2007 16:10:29 -0400 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F5AE5B.1050604@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> <45F5AE5B.1050604@codesourcery.com> Message-ID: <45F5B3B5.6090403@codesourcery.com> Don McCoy wrote: > The changes in FFT require that FFT_BE_TESTS be defined in order to run > the backend-specific tests. Without it, the test presently fails to > compile because the CBE backend does not support 2-D and 3-D FFT's as of > yet. Is that because the cbe evaluator doesn't invalidate 2D and 3D operations, or because you didn't configure with --enable-fft=no_fft (or any real fft, for that matter) ? I'm asking because the way fft_be is designed is to only generate runtime-errors, so all those 'little' subtests can be driven by a single executable. 
Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From stefan at codesourcery.com Mon Mar 12 22:37:26 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 12 Mar 2007 18:37:26 -0400 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F5AE5B.1050604@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> <45F5AE5B.1050604@codesourcery.com> Message-ID: <45F5D626.4080609@codesourcery.com> Don McCoy wrote: > Index: src/vsip/opt/cbe/ppu/fft.cpp > =================================================================== > --- src/vsip/opt/cbe/ppu/fft.cpp (revision 165340) > +++ src/vsip/opt/cbe/ppu/fft.cpp (working copy) > @@ -53,18 +53,16 @@ > fft(std::complex const* in, std::complex* out, > length_type length, T scale, int exponent) > { > - // Note: the twiddle factors require only 1/4 the memory of the input and > - // output arrays. > Fft_params fftp; > fftp.direction = (exponent == -1 ? fwd_fft : inv_fft); > fftp.elements = length; > fftp.scale = scale; > fftp.ea_twiddle_factors = > reinterpret_cast(twiddle_factors_.get()); > - fftp.ea_input_buffer = 0; > - fftp.ea_output_buffer = 0; > - fftp.in_blk_stride = 0; > - fftp.out_blk_stride = 0; > + fftp.ea_input_buffer = reinterpret_cast(in); > + fftp.ea_output_buffer = reinterpret_cast(out); > + fftp.in_blk_stride = 1; // not applicable in the single FFT case > + fftp.out_blk_stride = 1; > > Task_manager *mgr = Task_manager::instance(); > // The stack size is determined by accounting for the *worst case* > @@ -76,11 +74,9 @@ > sizeof(Fft_params), > sizeof(complex)*length*2, Could you please add a comment explaining this factor '2' ? It isn't obvious... 
> sizeof(complex)*length, > - false); > - Workblock block = task.create_block(); > + true); > + Workblock block = task.create_multi_block(1); > block.set_parameters(fftp); > - block.add_input(in, length); > - block.add_output(out, length); > task.enqueue(block); > task.sync(); > } [...] > Index: tests/fft_be.cpp > =================================================================== > --- tests/fft_be.cpp (revision 165340) > +++ tests/fft_be.cpp (working copy) [...] > @@ -152,24 +166,33 @@ > static Domain out_dom(Domain const &dom) { return dom;} > }; > > -template > +template + typename OrderT> > const_Vector > const> > ramp(Domain<1> const &dom) > { return vsip::ramp(T(0.), T(1.), dom.length() * dom.stride());} > > -template > -Matrix > +template + typename OrderT> > +Matrix > > ramp(Domain<2> const &dom) > { > + typedef OrderT order_type; > + typedef Dense<2, T, order_type> block_type; > length_type rows = dom[0].length() * dom[0].stride(); > length_type cols = dom[1].length() * dom[1].stride(); > - Matrix m(rows, cols); > - for (size_t r = 0; r != rows; ++r) > - m.row(r) = ramp(T(r), T(1.), m.size(1)); > + Matrix m(rows, cols); > + if (impl::Type_equal::value) > + for (size_t r = 0; r != rows; ++r) > + m.row(r) = ramp(T(r), T(1.), m.size(1)); > + else > + for (size_t c = 0; c != cols; ++c) > + m.col(c) = ramp(T(c), T(1.), m.size(0)); > return m; > } While I like the addition of the dimension-ordering parameter, I think the conditional initialization here is a bit misleading: The value of matrix(x, y) should be the same, no matter its dimension-ordering. > -template > +template + typename OrderT> > Tensor > ramp(Domain<3> const &dom) > { [...] 
> @@ -222,7 +246,7 @@ > typedef typename rfft_type::I I; > static typename impl::View_of_dim >::type > create(Domain const &dom) > - { return ramp(rfft_type::in_dom(dom));} > + { return ramp(rfft_type::in_dom(dom));} > }; I think with the above in place we should go all the way and push the order parameter up to the highest level, so all tests get run twice, once for row-major and once for col-major. That gives maximum coverage. > // Real inverse 2D FFT. > @@ -238,7 +262,7 @@ > length_type rows2 = rows/2+1; > length_type cols2 = cols/2+1; > > - Matrix input = ramp(rfft_type::in_dom(dom)); > + Matrix input = ramp(rfft_type::in_dom(dom)); > if (rfft_type::axis == 0) > { > // Necessary symmetry: > @@ -330,8 +354,8 @@ > typedef impl::Fast_block block_type; > typedef typename impl::View_of_dim::type View; > > - View data = ramp(dom); > - View ref = ramp(dom); > + View data = ramp(dom); > + View ref = ramp(dom); > > typename View::subview_type sub_data = data(dom); > > @@ -357,9 +381,10 @@ > { > typedef typename T::I I; > typedef typename T::O O; > - typedef typename impl::Layout<2, row1_type, > + typedef typename T::order_type order_type; > + typedef typename impl::Layout<2, order_type, > impl::Stride_unit_dense, typename T::i_format> i_layout_type; > - typedef typename impl::Layout<2, row1_type, > + typedef typename impl::Layout<2, order_type, > impl::Stride_unit_dense, typename T::o_format> o_layout_type; > return_mechanism_type const r = by_reference; > > @@ -371,7 +396,7 @@ > Domain<2> in_dom = T::in_dom(dom); > Domain<2> out_dom = T::out_dom(dom); > > - Iview input = input_creator::create(dom); > + Iview input = input_creator::create(dom); > typename Iview::subview_type sub_input = input(in_dom); > > Oview output = empty(out_dom); > @@ -408,8 +433,8 @@ > typedef impl::Fast_block<2, CT, layout_type> block_type; > typedef Matrix View; > > - View data = ramp(dom); > - View ref = ramp(dom); > + View data = ramp(dom); > + View ref = ramp(dom); > > typename 
View::subview_type sub_data = data(dom); > > @@ -498,6 +523,13 @@ > fft_in_place(Domain<1>(0, 2, 8)); > #endif > > +#if VSIP_IMPL_CBE_SDK > + std::cout << "testing fwd in_place cbe..."; > + fft_in_place(Domain<1>(32)); > + std::cout << "testing inv in_place cbe..."; > + fft_in_place(Domain<1>(32)); > +#endif > + > #if VSIP_IMPL_FFTW3 > std::cout << "testing c->c fwd by_ref fftw..."; > fft_by_ref, fftw>(Domain<1>(16)); > @@ -558,7 +590,14 @@ > fft_by_ref, cvsip>(Domain<1>(0, 2, 8)); > #endif > > +#if VSIP_IMPL_CBE_SDK > + std::cout << "testing c->c fwd by_ref cbe..."; > + fft_by_ref, cbe>(Domain<1>(32)); > + std::cout << "testing c->c inv by_ref cbe..."; > + fft_by_ref, cbe>(Domain<1>(32)); > #endif > + > +#endif > } > > template > @@ -902,6 +941,23 @@ > fftm_in_place(Domain<2>(8, 16)); > #endif > > +#if VSIP_IMPL_CBE_SDK > +// Note: column-wise FFTs need to be performed on > +// col-major data in this case. These are commented > +// out until fftm_in_place is changed to be like > +// fftm_by_ref, where the cfft_type<> template allows > +// the dimension order to be specified. That's OK, though I believe we should fix that as soon as possible, such that fft_be.cpp remains as much backend-agnostic as possible, i.e. no backend-specific tests creep in. (I can complete that if you are busy finishing other bits.) 
> + > +// std::cout << "testing fwd on cols in_place cbe..."; > +// fftm_in_place(Domain<2>(64, 32)); > + std::cout << "testing fwd on rows in_place cbe..."; > + fftm_in_place(Domain<2>(32, 64)); > +// std::cout << "testing inv on cols in_place cbe..."; > +// fftm_in_place(Domain<2>(64, 32)); > + std::cout << "testing inv on rows in_place cbe..."; > + fftm_in_place(Domain<2>(32, 64)); > +#endif > + > #if VSIP_IMPL_FFTW3 > std::cout << "testing c->c fwd 0 by_ref fftw..."; > fftm_by_ref, fftw>(Domain<2>(8, 16)); > @@ -978,7 +1034,24 @@ > fftm_by_ref, cvsip> (Domain<2>(4, 16)); > #endif > > +#if VSIP_IMPL_CBE_SDK > + std::cout << "testing c->c fwd on cols by_ref cbe..."; > + fftm_by_ref, cbe>(Domain<2>(32, 64)); > + fftm_by_ref, cbe>(Domain<2>(Domain<1>(32), Domain<1>(0, 2, 32))); > + std::cout << "testing c->c fwd on rows by_ref cbe..."; > + fftm_by_ref, cbe>(Domain<2>(32, 64)); > + fftm_by_ref, cbe>(Domain<2>(Domain<1>(0, 2, 32), Domain<1>(64))); > + std::cout << "testing c->c inv 0 by_ref cbe..."; > + fftm_by_ref, cbe>(Domain<2>(32, 64)); > + fftm_by_ref, cbe>(Domain<2>(Domain<1>(32), Domain<1>(0, 2, 32))); > + std::cout << "testing c->c inv 1 by_ref cbe..."; > + fftm_by_ref, cbe>(Domain<2>(32, 64)); > + fftm_by_ref, cbe>(Domain<2>(Domain<1>(0, 2, 32), Domain<1>(64))); > #endif > + > + > + > +#endif > } > > int main(int argc, char **argv) Same here. Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From jules at codesourcery.com Tue Mar 13 16:22:40 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 13 Mar 2007 12:22:40 -0400 Subject: [patch] RBO - Re: [vsipl++] [patch] RBO preview In-Reply-To: <45EEE735.1000302@codesourcery.com> References: <45EEE735.1000302@codesourcery.com> Message-ID: <45F6CFD0.3090807@codesourcery.com> This is an updated RBO patch, that should be ready to check in. 
It adds support for distributed expressions, and has been validated (all tests pass using the IPP/MKL backends on cugel, and all tests either pass or use too much VM using the FFTW backend on belgarath). It also has the following: - Fft_return_functor is now templatized by block type, rather than view type (it continues to store the operand by block). This makes the conversion from distributed block to local block easier. - Return_block and Fft_return_functor properly hide their member data, and provide accessor functions. This necessitates using const references/pointers to FFT Workspaces and FFT backends. I made the workspace member functions const correct, but did not attempt this for the backends. - Diagnostics for ext_data. - Moves files around: RBO is part of the optimized implementation, not the ref-impl. - Adds error checking to fastconv benchmark. It includes some unrelated benchmark updates for characterizing performance on the PowerStream. Ok to commit? -- Jules Jules Bergmann wrote: > This patch > - adds RBO support, > - applies it to by-value FFT and FFTM, > - adds a simple RBO evaluator for expressions like 'A = fft(B)' > which avoids the temporary and copy, > - adds fastconv RBO evaluators for the general case using Fftm > underneath, and the special case when using Cbe Fastconv underneath. > - adds single-line fastconv case to the fastconv benchmark > > This patch is fairly close to ready. However there are a few bits missing: > - add distributed support for Return_expr_blocks > - validation (this does work for the fastconv benchmark) > > Comments? -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: rbo.diff URL: From jules at codesourcery.com Tue Mar 13 18:18:47 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 13 Mar 2007 14:18:47 -0400 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F5AE5B.1050604@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> <45F5AE5B.1050604@codesourcery.com> Message-ID: <45F6EB07.2060207@codesourcery.com> Don, These changes to ppu/fft.cpp look good. I have a minor suggestion below, but otherwise please check it in. I'll defer to Stefan on the fft_be.cpp changes. Once he is happy, please check them in too. -- Jules > Index: src/vsip/opt/cbe/ppu/fft.cpp > =================================================================== > - fftp.ea_input_buffer = 0; > - fftp.ea_output_buffer = 0; > - fftp.in_blk_stride = 0; > - fftp.out_blk_stride = 0; > + fftp.ea_input_buffer = reinterpret_cast(in); > + fftp.ea_output_buffer = reinterpret_cast(out); > + fftp.in_blk_stride = 1; // not applicable in the single FFT case > + fftp.out_blk_stride = 1; I would keep the strides set to 0 if they aren't applicable. That way, if the SPE kernel needs to check them for some reason, it can assume non-zero strides are valid. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Tue Mar 13 19:18:10 2007 From: don at codesourcery.com (Don McCoy) Date: Tue, 13 Mar 2007 13:18:10 -0600 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F6EB07.2060207@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> <45F5AE5B.1050604@codesourcery.com> <45F6EB07.2060207@codesourcery.com> Message-ID: <45F6F8F2.5040305@codesourcery.com> Jules Bergmann wrote: > These changes to ppu/fft.cpp look good.
I have a minor suggestion > below, but otherwise please check it in. > > I'll defer to Stefan on the fft_be.cpp changes. Once he is happy, > please check them in too. > Committed as attached. I believe all comments have been addressed. -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fnc3.diff URL: From don at codesourcery.com Tue Mar 13 19:27:45 2007 From: don at codesourcery.com (Don McCoy) Date: Tue, 13 Mar 2007 13:27:45 -0600 Subject: [patch] SPU timer In-Reply-To: <45F6D712.8000408@codesourcery.com> References: <45F6204B.7060107@codesourcery.com> <45F69117.90300@codesourcery.com> <45F6D712.8000408@codesourcery.com> Message-ID: <45F6FB31.1040500@codesourcery.com> This file is not included (presently) from any other file, but it is useful for debugging and testing. Committed. -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: timer.diff URL: From stefan at codesourcery.com Wed Mar 14 03:02:56 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 13 Mar 2007 23:02:56 -0400 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F5AE5B.1050604@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> <45F5AE5B.1050604@codesourcery.com> Message-ID: <45F765E0.7000506@codesourcery.com> Don McCoy wrote: > The changes in FFT require that FFT_BE_TESTS be defined in order to run > the backend-specific tests. Without it, the test presently fails to > compile because the CBE backend does not support 2-D and 3-D FFT's as of > yet. The attached patch fixes the CBE FFT Evaluator to only enable 1D FFTs. With that the above workaround isn't needed. Checked in. 
Regards, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 -------------- next part -------------- A non-text attachment was scrubbed... Name: fft.hpp.diff Type: text/x-patch Size: 1300 bytes Desc: not available URL: From stefan at codesourcery.com Wed Mar 14 03:58:40 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 13 Mar 2007 23:58:40 -0400 Subject: [vsipl++] [patch] support for non-contiguous rows or columns with Cell FFTM In-Reply-To: <45F5D626.4080609@codesourcery.com> References: <45EEF889.8090303@codesourcery.com> <45EF0539.7030706@codesourcery.com> <45F19CBF.9030203@codesourcery.com> <45F5AE5B.1050604@codesourcery.com> <45F5D626.4080609@codesourcery.com> Message-ID: <45F772F0.2050806@codesourcery.com> Please find attached a cleanup patch. (Checked in.) Comments below... Stefan Seefeld wrote: > Don McCoy wrote: > >> Index: tests/fft_be.cpp >> =================================================================== >> --- tests/fft_be.cpp (revision 165340) >> +++ tests/fft_be.cpp (working copy) > > [...] 
> >> @@ -152,24 +166,33 @@ >> static Domain out_dom(Domain const &dom) { return dom;} >> }; >> >> -template >> +template > + typename OrderT> >> const_Vector > const> >> ramp(Domain<1> const &dom) >> { return vsip::ramp(T(0.), T(1.), dom.length() * dom.stride());} >> >> -template >> -Matrix >> +template > + typename OrderT> >> +Matrix > >> ramp(Domain<2> const &dom) >> { >> + typedef OrderT order_type; >> + typedef Dense<2, T, order_type> block_type; >> length_type rows = dom[0].length() * dom[0].stride(); >> length_type cols = dom[1].length() * dom[1].stride(); >> - Matrix m(rows, cols); >> - for (size_t r = 0; r != rows; ++r) >> - m.row(r) = ramp(T(r), T(1.), m.size(1)); >> + Matrix m(rows, cols); >> + if (impl::Type_equal::value) >> + for (size_t r = 0; r != rows; ++r) >> + m.row(r) = ramp(T(r), T(1.), m.size(1)); >> + else >> + for (size_t c = 0; c != cols; ++c) >> + m.col(c) = ramp(T(c), T(1.), m.size(0)); >> return m; >> } > > While I like the addition of the dimension-ordering parameter, I think > the conditional initialization here is a bit misleading: The value of > matrix(x, y) should be the same, no matter its dimension-ordering. Having another look at that code I realized that the layout of the views created by ramp() (and input_creator::create(), for that matter), doesn't play any role in the actual tests, as they are assigned to other views only. Thus, I removed the dimension-ordering parameter from the above, only adding it to the harness in fft_by_ref and fftm_by_ref. I still need to change the way fft_in_place as well as fftm_in_place handle their template parameters, so I can easily add the dimension-ordering there, too, but I'll defer that to some later point. Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: fft_be.cpp.diff Type: text/x-patch Size: 4065 bytes Desc: not available URL: From jules at codesourcery.com Fri Mar 16 15:22:38 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 16 Mar 2007 11:22:38 -0400 Subject: [patch] Cell fixes Message-ID: <45FAB63E.7000407@codesourcery.com> This patch - Adds loop fusion init/fini calls before block copy uses get to access values. This is necessary if copying from an expression block with Return_blocks. This is a temporary fix. The real fix is to make get/put do the right thing for Return_blocks, that is check if the result has been computed. In cases where get is called often enough that this overhead would have an impact (such as loop fusion evaluators), an expression tree transformation would be done to replace the return blocks with regular blocks. However, this is a day or two of work so I've created an issue (#132). - Pulls additional command line arguments from the SVPP_OPT environment variable. export SVPP_OPTS="--svpp-num-spes 1" ./run-program Is equiv to ./run-program --svpp-num-spes 1 I added this to run 'make check' without using all the 8 SPEs, but it should be useful for other things as well. - Fixes Cbe dispatch for vmul to check if type is supported (some tests were attempting to perform double and int vector-multiplies). Patch applied, but suggestions for simplifying how arguments are pulled from the environment are welcome! thanks -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: misc.diff URL: From stefan at codesourcery.com Fri Mar 16 15:32:54 2007 From: stefan at codesourcery.com (Stefan Seefeld) Date: Fri, 16 Mar 2007 11:32:54 -0400 Subject: [vsipl++] [patch] Cell fixes In-Reply-To: <45FAB63E.7000407@codesourcery.com> References: <45FAB63E.7000407@codesourcery.com> Message-ID: <45FAB8A6.9010703@codesourcery.com> Jules Bergmann wrote: > - Pulls additional command line arguments from the SVPP_OPT environment > variable. > > export SVPP_OPTS="--svpp-num-spes 1" > ./run-program > > Is equiv to > > ./run-program --svpp-num-spes 1 > > I added this to run 'make check' without using all the 8 SPEs, but it > should be useful for other things as well. I'm not sure the VSIPL++ library is the best place to do this. If users want to pass extra arguments by default, why can't they just write a little wrapper script themselves ? (We may even provide a generic wrapper to do this, if there is a common use case for it.) My point is that (shell) scripts are much better suited to do this kind of command-line argument / environment meddling than C++ code. We could even make it work more easily for windows that way. :-) Regards, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From jules at codesourcery.com Fri Mar 16 18:02:07 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 16 Mar 2007 14:02:07 -0400 Subject: [patch] Avoid invalid DMA sizes for vmul Message-ID: <45FADB9F.4020605@codesourcery.com> This patch fixes the cleanup code to avoid DMA sizes that aren't a multiple of 16. This fixes test failures for coverage_binary. It also adds a new regression test that sweeps through vmul sizes from 1 to 128. Don, is this ok to commit? Is there a better place than bindings.hpp to put the GRANULARITY macro and is_dma_size_ok() function? I'm currently running a regression now to see how this works out. 
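The size constraint described in this message can be sketched as follows. This is only an illustration, not the code in dmasize.diff: the names dma_granularity, is_dma_size_ok_sketch and dma_safe_length are hypothetical, and the sketch assumes the rule is simply "a nonzero multiple of 16 bytes".

```cpp
#include <cassert>
#include <cstddef>

// Assumed transfer granularity in bytes (hypothetical stand-in for the
// GRANULARITY macro mentioned in the message).
std::size_t const dma_granularity = 16;

// True if a transfer of this many bytes is a nonzero multiple of the
// granularity. Hypothetical counterpart of is_dma_size_ok().
inline bool is_dma_size_ok_sketch(std::size_t size_in_bytes)
{
  return size_in_bytes != 0 && size_in_bytes % dma_granularity == 0;
}

// Largest element count <= length whose byte size is a multiple of the
// granularity. Any remaining elements would need separate handling.
template <typename T>
std::size_t dma_safe_length(std::size_t length)
{
  assert(sizeof(T) <= dma_granularity && dma_granularity % sizeof(T) == 0);
  std::size_t const per_block = dma_granularity / sizeof(T);
  return (length / per_block) * per_block;
}
```

With 4-byte floats, a length of 7 would be split into 4 elements that satisfy the size rule and 3 leftover elements.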
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dmasize.diff URL: From don at codesourcery.com Fri Mar 16 20:32:51 2007 From: don at codesourcery.com (Don McCoy) Date: Fri, 16 Mar 2007 14:32:51 -0600 Subject: [vsipl++] [patch] Avoid invalid DMA sizes for vmul In-Reply-To: <45FADB9F.4020605@codesourcery.com> References: <45FADB9F.4020605@codesourcery.com> Message-ID: <45FAFEF3.5040201@codesourcery.com> Jules Bergmann wrote: > This patch fixes the cleanup code to avoid DMA sizes that aren't a > multiple of 16. This fixes test failures for coverage_binary. It > also adds a new regression test that sweeps through vmul sizes from 1 > to 128. > > Don, is this ok to commit? Is there a better place than bindings.hpp > to put the GRANULARITY macro and is_dma_size_ok() function? > I think this is the right place. But there is also a bit of code in the vmul kernels that deals with any leftover values in cases where the length is not a multiple of four floats (16 bytes). We could probably get rid of that now. -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From jules at codesourcery.com Sat Mar 17 02:46:10 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 16 Mar 2007 22:46:10 -0400 Subject: [vsipl++] [patch] Avoid invalid DMA sizes for vmul In-Reply-To: <45FAFEF3.5040201@codesourcery.com> References: <45FADB9F.4020605@codesourcery.com> <45FAFEF3.5040201@codesourcery.com> Message-ID: <45FB5672.7020507@codesourcery.com> Don McCoy wrote: > Jules Bergmann wrote: >> This patch fixes the cleanup code to avoid DMA sizes that aren't a >> multiple of 16. This fixes test failures for coverage_binary. It >> also adds a new regression test that sweeps through vmul sizes from 1 >> to 128. >> >> Don, is this ok to commit? 
Is there a better place than bindings.hpp >> to put the GRANULARITY macro and is_dma_size_ok() function? >> > I think this is the right place. But there is also a bit of code in the > vmul kernels that deals with any leftover values in cases where the > length is not a multiple of four floats (16 bytes). We could probably > get rid of that now. > Patch applied. As before, but with removal of cleanup code for float vmul kernel. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dmasize.diff URL: From jules at codesourcery.com Sun Mar 18 03:51:48 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Sat, 17 Mar 2007 23:51:48 -0400 Subject: [patch] Misc fixes Message-ID: <45FCB754.7070200@codesourcery.com> This patch: - Fixes the DFT FFT backend to force the input and output layouts to have the same complex format. Previously attempting to use the backend when the input and output had different formats resulted in an assertion failure in the workspace. This was causing the regressions/fft_expr_arg test to fail. A new test regressions/fft_split_inter was added for more direct coverage. - Adds a 'name()' member to the FFT backends. This is useful for debugging (determining which backend is being used). It may also be useful for diagnostics and profiling in the future. - Changes the DFT backend to use double-precision internally for accumulation. This fixes precision difference that were arising between the DFT backend and the ref::dft routine. This was causing parallel/fftm to fail. IIRC it was also causing fft_be to fail when testing the DFT backend. - Checks DMA address alignment. Address must have 16-byte alignment on the Cell. This caused vmmul test to fail because vmmul redispatch generated vector multiplies that were unaligned (i.e. the second row of a 5 x 7 matrix of floats). 
- Updates SIMD traits for AltiVec (also tested on PPC 970FX with GCC 4.1 and PowerPC 7447A with GreenHills), and adds a unit-test for SIMD traits that I've been meaning to check in for some time. - Fixes the builtin SIMD vmul routine for split-complex to work correctly when the output aliases one of the inputs. This was causing coverage_binary to fail. Curiously, ppu-g++ -m32 does not define __VEC__, while ppu32-g++ does. With this patch, all tests should pass on the Cell, with the following exceptions: - convolution fails with OpenMPI because "MPI_BOR reduction not define for non-intrinsic type". Passes in serial build. - parallel/fftm likewise. => These two appear to be an OpenMPI problem, not a Cell problem. We can debug them later. - Some of the fft_ext test cases fail, in particular real->complex => I have not debugged this. - correlation fails because of a precision error (error_db threshold). - ref-impl/fft-coverage fails because of a precision error (test does not use error_db, but if it did, it would fail for our usual threshold) => It looks like the libfft FFT is noisy. This isn't worth diagnosing too much since we'll eventually replace it with a faster FFT. Also, the regressions/transpose_assign test takes a long time to run. Granted, it is doing a lot of transposes and I had optimization turned off, but it runs much faster on EM64t. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fixes.diff URL: From jules at codesourcery.com Sun Mar 18 21:17:57 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 18 Mar 2007 17:17:57 -0400 Subject: [patch] Handle unaligned Fft, Fftm; Split view_functions test. Message-ID: <45FDAC85.9030203@codesourcery.com> This patch fixes the Cbe Fft and Fftm backends to request 16-byte alignment, which is necessary for DMA. 
It also adds regression tests for unaligned Fft and Fftm. It splits the view_functions test into three smaller tests. Compiling view_functions with optimization turned on was being killed by the process killer on snipes. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: align.diff URL: From jules at codesourcery.com Mon Mar 19 17:15:45 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 19 Mar 2007 13:15:45 -0400 Subject: [patch] Fix bug for aligned rt_layouts Message-ID: <45FEC541.3020206@codesourcery.com> This fixes a bug where the layout was larger than the memory allocated by an Fftm workspace, leading to memory corruption. Hurrah for valgrind! Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: layout-bug.diff URL: From jules at codesourcery.com Mon Mar 19 21:01:13 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 19 Mar 2007 17:01:13 -0400 Subject: [patch] Quickstart changes Message-ID: <45FEFA19.1000808@codesourcery.com> This patch documents the new configure options for Cell. Does it look ok to apply? -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qs.diff URL: From jules at codesourcery.com Tue Mar 20 00:00:55 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 19 Mar 2007 20:00:55 -0400 Subject: [patch] Quickstart, + configure/build bits Message-ID: <45FF2437.6060008@codesourcery.com> Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: qs.diff URL: From jules at codesourcery.com Tue Mar 20 00:13:28 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 19 Mar 2007 20:13:28 -0400 Subject: [patch] Bump FFTW to 3.1.2 Message-ID: <45FF2728.2070507@codesourcery.com> This bumps vendor/fftw to 3.1.2. 3.0.1 + ppu-gcc + altivec did not get along together :( Bumping to 3.1.2 fixes this, plus should give better performance on em64t and altivec. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ext.diff URL: From don at codesourcery.com Fri Mar 23 17:46:59 2007 From: don at codesourcery.com (Don McCoy) Date: Fri, 23 Mar 2007 11:46:59 -0600 Subject: [patch] Increase max size of split-complex fast convolution Message-ID: <46041293.2000609@codesourcery.com> This patch allows the split-complex version of the fast convolution for Cell BE run at twice the former size limit. It now works for up to 4K points (2K for interleaved complex). Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fc4k.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fc4k.diff URL: From jules at codesourcery.com Fri Mar 23 18:21:04 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 23 Mar 2007 14:21:04 -0400 Subject: [vsipl++] [patch] Increase max size of split-complex fast convolution In-Reply-To: <46041293.2000609@codesourcery.com> References: <46041293.2000609@codesourcery.com> Message-ID: <46041A90.1080700@codesourcery.com> Don McCoy wrote: > This patch allows the split-complex version of the fast convolution for > Cell BE run at twice the former size limit. It now works for up to 4K > points (2K for interleaved complex). 
Don, This looks good, please check it in. Did we resolve the use of asserts on the SPE? Do they abort the program, or still cause it to deadlock? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Fri Mar 23 19:32:14 2007 From: don at codesourcery.com (Don McCoy) Date: Fri, 23 Mar 2007 13:32:14 -0600 Subject: [vsipl++] [patch] Increase max size of split-complex fast convolution In-Reply-To: <46041A90.1080700@codesourcery.com> References: <46041293.2000609@codesourcery.com> <46041A90.1080700@codesourcery.com> Message-ID: <46042B3E.5000803@codesourcery.com> Jules Bergmann wrote: > This looks good, please check it in. > > Did we resolve the use of asserts on the SPE? Do they abort the > program, or still cause it to deadlock? > > They still cause a deadlock. This patch adds an alternative, called spe_assert(). I note that it still hangs in abort(), but at least the messages get out to the console. Note: we may want to consider using -DNDEBUG for release builds of SPE code, as this new version pulls in the stdio header otherwise. Is the same true with the system assert? Still ok to check in? -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fc4k2.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fc4k2.diff URL: From jules at codesourcery.com Fri Mar 23 19:41:49 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 23 Mar 2007 15:41:49 -0400 Subject: [vsipl++] [patch] Increase max size of split-complex fast convolution In-Reply-To: <46042B3E.5000803@codesourcery.com> References: <46041293.2000609@codesourcery.com> <46041A90.1080700@codesourcery.com> <46042B3E.5000803@codesourcery.com> Message-ID: <46042D7D.90108@codesourcery.com> > They still cause a deadlock. 
This patch adds an alternative, called > spe_assert(), though I note that it still hangs in abort(), but at least > the messages get out to the console. That is an improvement. > > Note: we may want to consider using -DNDEBUG for release builds of SPE > code, as this new version pulls in the stdio header otherwise. Is the > same true with the system assert? It is a good idea to use -DNDEBUG for the SPE cflags when building a release. That shouldn't require any changes to the library itself, but should be a matter of configuration options. I'll handle that in scripts/config. Just to make sure I understand the impact of including stdio: it is going to increase the SPE text size, which may result in our larger problem sizes being single buffered instead of double buffered. However, it should not create a correctness issue, i.e. it shouldn't affect stack usage if an assertion is not thrown. I don't know if the SPE system assert pulls in stdio, but I hope not since it doesn't actually print anything! > > Still ok to check in? Yes, please do. thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Fri Mar 23 19:47:26 2007 From: don at codesourcery.com (Don McCoy) Date: Fri, 23 Mar 2007 13:47:26 -0600 Subject: [vsipl++] [patch] Increase max size of split-complex fast convolution In-Reply-To: <46042D7D.90108@codesourcery.com> References: <46041293.2000609@codesourcery.com> <46041A90.1080700@codesourcery.com> <46042B3E.5000803@codesourcery.com> <46042D7D.90108@codesourcery.com> Message-ID: <46042ECE.4060606@codesourcery.com> Jules Bergmann wrote: > Just to make sure I understand the impact of including stdio: it is > going to increase the SPE text size, which may result in our larger > problem sizes being single buffered instead of double buffered. > However, it should not create a correctness issue, i.e. it shouldn't > affect stack usage if an assertion is not thrown. That matches my assumptions. 
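The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, assuming an assert replacement that prints before aborting; SKETCH_ASSERT, sketch_assert_fail and checked_halve are hypothetical names, and the spe_assert() in the actual patch may look quite different. The point shown is that when NDEBUG is defined, both the check and the stdio dependency drop out of the build.

```cpp
#ifdef NDEBUG
// Release build: the check compiles away and <cstdio> is never included.
#  define SKETCH_ASSERT(expr) ((void)0)
#else
#  include <cstdio>
#  include <cstdlib>

// Print the failed expression and its location, flush so the message
// reaches the console, then abort.
inline void sketch_assert_fail(char const* expr, char const* file, int line)
{
  std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", expr, file, line);
  std::fflush(stderr);
  std::abort();
}

#  define SKETCH_ASSERT(expr) \
     ((expr) ? (void)0 : sketch_assert_fail(#expr, __FILE__, __LINE__))
#endif

// Hypothetical caller: requires an even argument in debug builds.
inline int checked_halve(int n)
{
  SKETCH_ASSERT(n % 2 == 0);
  return n / 2;
}
```

Building the same source with -DNDEBUG on the compiler command line selects the first branch, which is why the change can be made through configuration options alone.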
Patch committed. -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From assem at codesourcery.com Sat Mar 24 10:25:24 2007 From: assem at codesourcery.com (Assem Salama) Date: Sat, 24 Mar 2007 06:25:24 -0400 Subject: par evaluators Message-ID: <4604FC94.4010204@codesourcery.com> Everyone, This patch address Jules comments. Thanks, Assem -------------- next part -------------- A non-text attachment was scrubbed... Name: svn.diff.03242007.1.log Type: text/x-log Size: 15152 bytes Desc: not available URL: From assem at codesourcery.com Sat Mar 24 10:26:46 2007 From: assem at codesourcery.com (Assem Salama) Date: Sat, 24 Mar 2007 06:26:46 -0400 Subject: maxval_test Message-ID: <4604FCE6.9050301@codesourcery.com> Everyone, This test tests the maxval operator. It creates a vector on a subset of the processors to test the processor mapping. Thanks, Assem -------------- next part -------------- A non-text attachment was scrubbed... Name: maxval_test.cpp Type: text/x-c++src Size: 1680 bytes Desc: not available URL: From joseph.j.cook at lmco.com Sun Mar 25 20:53:46 2007 From: joseph.j.cook at lmco.com (Cook, Joseph J) Date: Sun, 25 Mar 2007 16:53:46 -0400 Subject: VSIPL++ Compile Problem Message-ID: <2ACA4DC2D980EC43B5771343F896B6C6012560E0@EMSS04M21.us.lmco.com> Good Afternoon, I'm trying to compile the following simple program using Parallel VSIPL++ for Mercury mcoe, but I am getting a compile error: #include "vsip/vector.hpp" int main() { vsip::Vector foo(10,2.f); vsip::Vector bar(10,2.f); bar *= foo; } The error I get is on a computer unfortunately inaccessible to e-mail so I can't cut and paste it. A portion of it is: "incomplete type is not allowed: map_type map_; ...many more lines...complaining about BinaryOperator *= in the last line" If you cannot replicate this problem, let me know. My impression was that even though I am using Parallel VSIPL++, declaring a Map for my Vectors was an optional parameter, is this true? Thanks! 
Joe Cook -------------- next part -------------- An HTML attachment was scrubbed... URL: From jules at codesourcery.com Mon Mar 26 13:17:32 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Mar 2007 09:17:32 -0400 Subject: [vsipl++] VSIPL++ Compile Problem In-Reply-To: <2ACA4DC2D980EC43B5771343F896B6C6012560E0@EMSS04M21.us.lmco.com> References: <2ACA4DC2D980EC43B5771343F896B6C6012560E0@EMSS04M21.us.lmco.com> Message-ID: <4607C7EC.4000600@codesourcery.com> Cook, Joseph J wrote: > Good Afternoon, > > I'm trying to compile the following simple program using Parallel > VSIPL++ for Mercury mcoe, but I am getting a compile error: > > > #include "vsip/vector.hpp" > int main() > { > vsip::Vector foo(10,2.f); > vsip::Vector bar(10,2.f); > > bar *= foo; > } > > The error I get is on a computer unfortunately inaccessible to e-mail so > I can't cut and paste it. A portion of it is: > > "incomplete type is not allowed: > map_type map_; > ...many more lines...complaining about BinaryOperator *= in the last line" > > If you cannot replicate this problem, let me know. Joe, Thanks for the problem report. I will try to reproduce the error locally. If there are any file names and line numbers associated with the message above, that might be helpful. This is using the 1.3 release? > My impression was that even though I am using Parallel VSIPL++, > declaring a Map for my Vectors was an optional parameter, is this true? Yes, that is correct. The program above should compile without error. The only thing that stands out is the library is not being initialized (there is no "vsip::vsipl" object). Not initializing the library will not cause a compile-time error (but it may lead to run-time errors, especially when using the library in parallel). However, the error "incomplete type" suggests that a definition is not being included, which may be a result of not including a particular header file. 
You might try including the header that declares vsip::vsipl (necessary to initialize the library), and potentially map.hpp. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Mon Mar 26 13:42:33 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Mar 2007 09:42:33 -0400 Subject: [vsipl++] VSIPL++ Compile Problem In-Reply-To: <4607C7EC.4000600@codesourcery.com> References: <2ACA4DC2D980EC43B5771343F896B6C6012560E0@EMSS04M21.us.lmco.com> <4607C7EC.4000600@codesourcery.com> Message-ID: <4607CDC9.6010804@codesourcery.com> > > However, the error "incomplete type" suggests that a definition is not > being included, which may be a result of not including a particular > header file. You might try including the header that declares vsip::vsipl (necessary to > initialize the library), and potentially map.hpp. Joe, To confirm, the problem is because of a missing header file. If you include map.hpp the program should compile and execute correctly. This is a bug in the library. It should not be necessary to include map.hpp in this program. We'll definitely fix it. However, I assume that the workaround should be sufficient and that an updated snapshot is not necessary. Is that OK? thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Mon Mar 26 14:08:21 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Mar 2007 10:08:21 -0400 Subject: [patch] Regression test for vector headers Message-ID: <4607D3D5.9050808@codesourcery.com> This regression test captures the bug reported by Joe Cook. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: vh.diff URL: From joseph.j.cook at lmco.com Mon Mar 26 14:17:32 2007 From: joseph.j.cook at lmco.com (Cook, Joseph J) Date: Mon, 26 Mar 2007 10:17:32 -0400 Subject: [vsipl++] VSIPL++ Compile Problem In-Reply-To: <4607CDC9.6010804@codesourcery.com> References: <2ACA4DC2D980EC43B5771343F896B6C6012560E0@EMSS04M21.us.lmco.com> <4607C7EC.4000600@codesourcery.com> <4607CDC9.6010804@codesourcery.com> Message-ID: <2ACA4DC2D980EC43B5771343F896B6C6012560E4@EMSS04M21.us.lmco.com> Thanks for your very quick response. Adding map.hpp allowed the program to compile. No, we won't need a new drop just for this fix since the workaround is simple enough. Thanks, Joe Cook -----Original Message----- From: Jules Bergmann [mailto:jules at codesourcery.com] Sent: Monday, March 26, 2007 9:43 AM To: Jules Bergmann Cc: Cook, Joseph J; vsipl++ at codesourcery.com; Steck, Thomas F; McClean, Tom Subject: Re: [vsipl++] VSIPL++ Compile Problem > > However, the error "incomplete type" suggests that a definition is not > being included, which may be a result of not including a particular > header file. You might try including the header that declares vsip::vsipl (necessary to > initialize the library), and potentially map.hpp. Joe, To confirm, the problem is because of a missing header file. If you include map.hpp the program should compile and execute correctly. This is a bug in the library. It should not be necessary to include map.hpp in this program. We'll definitely fix it. However, I assume that the workaround should be sufficient and that an updated snapshot is not necessary. Is that OK? 
thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Mon Mar 26 16:58:32 2007 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Mar 2007 12:58:32 -0400 Subject: [vsipl++] par evaluators In-Reply-To: <4604FC94.4010204@codesourcery.com> References: <4604FC94.4010204@codesourcery.com> Message-ID: <4607FBB8.3020906@codesourcery.com> Assem Salama wrote: > Everyone, > This patch address Jules comments. Assem, There are several items from my last feedback that you did not address: --- [1] When you are adding to existing code, try to follow the lead set by it if possible (and if it doesn't violate our coding standards :) . Here the '#include's are indented to improve readability of the #ifdef logic. Your new include should be indented too. --- (for global_from_local_index_blk:) [2] What namespace is this in? vsip or vsip::impl ? If it is in vsip, please move it to vsip::impl. (Can you answer the question? Is it in vsip or vsip::impl?) --- [3] ^^^^ opt --- Also, I have some additional feedback below. -- Jules > ------------------------------------------------------------------------ > > Index: benchmarks/maxval.cpp > =================================================================== > @@ -96,6 +173,12 @@ > case 1: loop(t_maxval1(0)); break; > case 2: loop(t_maxval1(1)); break; > case 3: loop(t_maxval1(2)); break; > + case 4: loop(t_maxval2,impl::Parallel_tag>(0)); break; > + case 5: loop(t_maxval2,impl::Parallel_tag>(1)); break; > + case 6: loop(t_maxval2,impl::Parallel_tag>(2)); break; > + case 7: loop(t_maxval2,impl::Cvsip_tag>(0)); break; > + case 8: loop(t_maxval2,impl::Cvsip_tag>(1)); break; > + case 9: loop(t_maxval2,impl::Cvsip_tag>(2)); break; [1] Will this benchmark build if the C-VSIP backend is not configured in? If not, please guard these cases with an ifdef. 
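The kind of guard requested in comment [1] might look like the sketch below. It is an illustration only: HAVE_CVSIP_BACKEND_SKETCH is a hypothetical configuration macro standing in for whatever the build system actually defines, and backend_kind and select_case are made-up names that replace the benchmark's real loop() calls.

```cpp
// Which group of benchmark cases a selector value maps to.
enum backend_kind { generic_backend, parallel_backend, cvsip_backend, no_backend };

// Map a case number to a backend. The C-VSIP cases only exist when the
// (hypothetical) configuration macro is defined, so the file still
// builds when that backend is not configured in.
inline backend_kind select_case(int what)
{
  switch (what)
  {
  case 1: case 2: case 3: return generic_backend;
  case 4: case 5: case 6: return parallel_backend;
#ifdef HAVE_CVSIP_BACKEND_SKETCH
  case 7: case 8: case 9: return cvsip_backend;
#endif
  default: return no_backend;
  }
}
```

Without the macro, cases 7 through 9 fall through to the default branch, which corresponds to the benchmark's existing "default: return 0;" path for unknown cases.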
> default: return 0; > } > return 1; > Index: benchmarks/maxval.hpp > =================================================================== > --- benchmarks/maxval.hpp (revision 0) > +++ benchmarks/maxval.hpp (revision 0) > @@ -0,0 +1,101 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. [2] Please fix the date. > + > + This file is available for license from CodeSourcery, Inc. under the terms > + of a commercial license and under the GPL. It is not part of the VSIPL++ > + reference implementation and is not available under the BSD license. > +*/ > +/** @file benchmarks/maxval.hpp > + @author Assem Salama > + @date 2006-07-22 [3] Please fix the date. > + @brief VSIPL++ Library: Helper file for maxval benchmark > + > +*/ > +#ifndef BENCHMARKS_MAXVAL_HPP > +#define BENCHMARKS_MAXVAL_HPP > + > +using namespace vsip::impl; > +using namespace vsip; [4] Header files shouldn't have global 'using namespace' statements that import names into the global namespace. They can introduce order-dependent behavior (names introduced before this header will be included, names introduced after will not). Instead, names should either have an explicit namespace (e.g. vsip::dimension_type), or 'using' statements should have limited scope (such as within a function). > Index: src/vsip/opt/reductions/par_reductions.hpp > =================================================================== > +/*********************************************************************** > + Parallel evaluators. [5] Please update or remove this comment. > +***********************************************************************/ > +/********************************************************************** > +* Parallel evaluators for index returning reductions [6] Please change this to the singular "Parallel evaluator". There is only one evaluator. > +**********************************************************************/ > + > +template