From don at codesourcery.com Mon Jun 2 04:51:35 2008 From: don at codesourcery.com (Don McCoy) Date: Sun, 01 Jun 2008 22:51:35 -0600 Subject: [patch] fix in SAL backend for outer product Message-ID: <48437C57.5070502@codesourcery.com> This patch fixes a block dimension order check in the outer product evaluator. Note that this evaluator is only available when VSIP_IMPL_SAL_USE_MAT_MUL is defined to something other than '0' (the default). As a result of the bug, the existing test coverage was missing the cases where the output matrix was column-major in the smallest dimension. Now those cases are getting picked up and validated. Ok to commit? -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: so.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: so.diff URL: From jules at codesourcery.com Mon Jun 2 13:50:45 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 02 Jun 2008 09:50:45 -0400 Subject: [vsipl++] [patch] Kernel updates In-Reply-To: <483F3DB9.9070102@codesourcery.com> References: <483D79CC.9060804@codesourcery.com> <483D96FB.8030301@codesourcery.com> <483D9ADE.60202@codesourcery.com> <483F3DB9.9070102@codesourcery.com> Message-ID: <4843FAB5.4070607@codesourcery.com> >> It's sort of a moot point right now, as C++ kernels don't work with ALF 3.0. > > */me looks at alf_functions.cpp in the SBIR demo code* > > */me is rather confused* > > As far as I can tell, C++ kernels seem to work fine with ALF 3.0, so > long as the relevant functions (and the ALF macro stuff at the end) are > wrapped in an 'extern "C"' block. Thanks Brooks, I will poke further on this. FWIW, before giving up on using C++ for the split-complex FFT kernel, I wrapped the ALF macros in extern "C" but then ran into a ALF runtime error. 
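
For reference, the 'extern "C"' approach mentioned above can be sketched roughly as follows. This is only an illustration of the linkage technique: the entry-point name kernel_compute and its signature are hypothetical, and no ALF headers or ALF export macros are shown, since their exact form is not given in this thread.

```cpp
#include <cstddef>

namespace
{
// Ordinary C++ code (templates, overloads) stays outside the extern "C"
// block and keeps C++ linkage.
template <typename T>
void scale(T* data, std::size_t n, T factor)
{
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}
} // namespace

// Only the entry point that a C runtime looks up by name gets C linkage.
// 'kernel_compute' is a hypothetical name, not an ALF symbol.
extern "C" int kernel_compute(float* data, unsigned int n)
{
  scale(data, static_cast<std::size_t>(n), 2.0f);
  return 0;
}
```

Whether this is enough for a given ALF release is exactly the open question in the thread; the sketch only shows that the C++ body itself does not need to change.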
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Mon Jun 2 14:29:59 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 02 Jun 2008 10:29:59 -0400 Subject: [vsipl++] patch: enable/disable fftwf, fftw, and fftwl individually In-Reply-To: <4841625E.7070505@codesourcery.com> References: <483F91C5.5060701@codesourcery.com> <48405F4C.8040305@codesourcery.com> <48406A32.5030102@codesourcery.com> <48407438.7040301@codesourcery.com> <4841625E.7070505@codesourcery.com> Message-ID: <484403E7.8030306@codesourcery.com> > The 'provide_...' macros have values of either '0' or '1', but never ''. > Thus, the original test: > >> if test "x$provide_fft_long_double" != "x"; then > > will always succeed. I don't mind whether we use 0/1 or ''/'something', > but we should either test the real state or not test at all. So, the > least disruptive change would be just to take out the test, as it is > always true. Ok, sounds like we're on the same page. I was less concerned with removing the check than with changing the definition of the macro. Instead of > if test "$neutral_acconfig" = 'y'; then > - if test "x$provide_fft_float" != "x"; then > - CPPFLAGS="$CPPFLAGS -DVSIP_IMPL_PROVIDE_FFT_FLOAT=$provide_fft_float" > + if test $provide_fft_float != 0; then > + CPPFLAGS="$CPPFLAGS -DVSIP_IMPL_PROVIDE_FFT_FLOAT=1" > fi How about > if test "$neutral_acconfig" = 'y'; then > - if test "x$provide_fft_float" != "x"; then > CPPFLAGS="$CPPFLAGS -DVSIP_IMPL_PROVIDE_FFT_FLOAT=$provide_fft_float" > - fi With that, and some address of the sh portability question (such as telling me not to worry about it :), and the patch looks good. 
-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From stefan at codesourcery.com Tue Jun 3 22:46:26 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 03 Jun 2008 18:46:26 -0400
Subject: patch: enhancements for freqswap()
Message-ID: <4845C9C2.4090502@codesourcery.com>

The attached patch reimplements vsip::freqswap(), yielding two important changes:

* The recently discovered restriction that the input Block type must be allocatable is removed.

* It uses a return-block optimization, allowing it to avoid a temporary view by using expression template techniques.

Note that the algorithm itself was slightly enhanced so that it can operate on a block in place, thereby removing the constraint that input and output must not alias.

OK to commit ?

Thanks,
Stefan

--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freqswap.hpp.diff
Type: text/x-patch
Size: 9353 bytes
Desc: not available
URL:

From jules at codesourcery.com Wed Jun 4 12:27:39 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 04 Jun 2008 08:27:39 -0400
Subject: [vsipl++] patch: enhancements for freqswap()
In-Reply-To: <4845C9C2.4090502@codesourcery.com>
References: <4845C9C2.4090502@codesourcery.com>
Message-ID: <48468A3B.7050409@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch reimplements vsip::freqswap(), yielding two important
> changes:
>
> * The recently discovered restriction that the input Block type must be
> allocatable is removed.
>
> * It uses a return-block optimization, allowing it to avoid a temporary
> view by using expression template techniques.

You're now a fully trained return-block optimization expert! :)

> Note that the algorithm itself was slightly enhanced so that it can operate
> on a block in place, thereby removing the constraint that input and
> output must not alias.

Is the 2D algorithm enhanced too?
I could see the 1D algorithm was enhanced, but wasn't sure about the 2D algorithm.

>
> OK to commit ?
>

Yes, this looks good.

thanks,
-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From jules at codesourcery.com Wed Jun 4 18:28:05 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 04 Jun 2008 14:28:05 -0400
Subject: [patch] Distributed transpose
Message-ID: <4846DEB5.9050300@codesourcery.com>

This patch fixes handling of distributed transposes of the form:

    A = B.transpose();

Ok to apply?

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: par-trans.diff
URL:

From don at codesourcery.com Wed Jun 4 21:59:15 2008
From: don at codesourcery.com (Don McCoy)
Date: Wed, 04 Jun 2008 15:59:15 -0600
Subject: [patch] CML bindings for matrix transpose operations
Message-ID: <48471033.2060700@codesourcery.com>

This is patterned after the existing serial evaluator for transpose operations using SIMD instructions, except that it dispatches operations to CML. One important difference is that it handles split complex as well as interleaved. Matrix copies are also performed (when the block layouts match), but only if the strides are unit in the smallest dimension, so as to have the potential to use the SPUs.

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ct.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ct.diff
URL:

From stefan at codesourcery.com Thu Jun 5 00:39:46 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 04 Jun 2008 20:39:46 -0400
Subject: [vsipl++] [patch] CML bindings for matrix transpose operations
In-Reply-To: <48471033.2060700@codesourcery.com>
References: <48471033.2060700@codesourcery.com>
Message-ID: <484735D2.3030507@codesourcery.com>

Don,

I have some small comments and questions (one for Jules). Overall this looks good.

Don McCoy wrote:
> This is patterned after the existing serial evaluator for transpose
> operations using SIMD instructions, except that it dispatches operations
> to CML. One important difference is that it handles split complex as
> well as interleaved. Matrix copies are also performed (when the block
> layouts match), but only if the strides are unit in the smallest
> dimension, so as to have the potential to use the SPU's.

> +// These macros support scalar and interleaved complex types

I don't find these macros very useful. Most of them you use exactly once. And even the ones you use twice would result in simpler code if spelled out. For example:

> +
> +#define VSIP_IMPL_CML_TRANS(T, FCN, CML_FCN) \
> +  inline void \
> +  FCN( \
> +    T* a, ptrdiff_t rsa, ptrdiff_t csa, \
> +    T* z, ptrdiff_t rsz, ptrdiff_t csz, \
> +    size_t m, size_t n) \
> +  { \
> +    typedef Scalar_of<T>::type CML_T; \
> +    CML_FCN( \
> +      reinterpret_cast<CML_T*>(a), rsa, csa, \
> +      reinterpret_cast<CML_T*>(z), rsz, csz, \
> +      m, n ); \
> +  }
> +
> +VSIP_IMPL_CML_TRANS(float, transpose, cml_mtrans_f)
> +VSIP_IMPL_CML_TRANS(std::complex<float>, transpose, cml_cmtrans_f)
> +#undef VSIP_IMPL_CML_TRANS
> +

actually boils down to

inline void
transpose(float *a, ptrdiff_t rsa, ptrdiff_t csa,
          float *z, ptrdiff_t rsz, ptrdiff_t csz,
          size_t m, size_t n)
{
  cml_mtrans_f(a, rsa, csa, z, rsz, csz, m, n);
}

for the first case, which I find much more readable than the macro code above.
> + static bool rt_valid(DstBlock& dst, SrcBlock const& src) > + { > + bool rt = true; > + > + // If performing a copy, both source and destination blocks > + // must be unit stride. > + if (Type_equal::value) > + { > + Ext_data dst_ext(dst, SYNC_OUT); > + Ext_data src_ext(src, SYNC_IN); These objects only exist to check the strides, right ? I'm aware that we don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we ? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this context. Jules ? > + > + dimension_type const s_dim1 = src_order_type::impl_dim1; > + dimension_type const d_dim1 = src_order_type::impl_dim1; Why two constants, if they hold the same value ? > + if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1) > + rt = false; > + } > + > + return rt; > + } > + > + static void exec(DstBlock& dst, SrcBlock const& src, row2_type, row2_type) > + { > + vsip::impl::Ext_data dst_ext(dst, vsip::impl::SYNC_OUT); > + vsip::impl::Ext_data src_ext(src, vsip::impl::SYNC_IN); > + Why the full qualification here (but not above) ? (I know, this is really picky, but I like compact and concise code. ;-) ) Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From jules at codesourcery.com Thu Jun 5 13:50:44 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Jun 2008 09:50:44 -0400 Subject: [vsipl++] [patch] CML bindings for matrix transpose operations In-Reply-To: <484735D2.3030507@codesourcery.com> References: <48471033.2060700@codesourcery.com> <484735D2.3030507@codesourcery.com> Message-ID: <4847EF34.5020306@codesourcery.com> Stefan, Don, > > for the first case, which I find much more readable than the macro > code above. I agree that macros are pretty much always less readable than straight code (in fact the first thing I usually do when debugging a problem in a macro is manually expand it). 
However, in defense of Don's approach, the macros will quickly scale when CML supports additional types (double, complex, int, short, etc etc). Moreover, the functions created by the macros don't do much (just wrap a CML function with a consistent overloaded function interface), so there isn't much to understand. The other way to get more mileage out of the macros is to use them for more than just a single function. Transpose is an example of a 1-input, 1-output matrix function. Instead of calling the macro VSIP_IMPL_CML_TRANS, it could be called VSIP_IMPL_CML_MUNARY ("matrix function, unary ~ 1 argument"). Then the macro could be reused across mtrans, matrix copy (when we add it), and so on. We don't have too many unary matrix functions yet, but the idea generalizes. > > >> + static bool rt_valid(DstBlock& dst, SrcBlock const& src) >> + { + bool rt = true; >> + >> + // If performing a copy, both source and destination blocks >> + // must be unit stride. >> + if (Type_equal::value) >> + { >> + Ext_data dst_ext(dst, SYNC_OUT); >> + Ext_data src_ext(src, SYNC_IN); > > These objects only exist to check the strides, right ? I'm aware that we > don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we > ? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this > context. Jules ? I think SYNC_OUT and SYNC_IN make sense because they indicate how the data access will be used. Because ct_valid includes a check that the cost of data access is 0, in rt_valid there is no danger that data access will require a copy. Using SYNC_OUT and SYNC_IN allows the rt_valid declarations to exactly match those used in exec. 
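
As a rough illustration of the macro-reuse idea above (one macro stamping out overloads for several element types), here is a minimal self-contained sketch. The backend functions stub_mtrans_f and stub_cmtrans_f are stand-ins written for this example, not CML functions, and the macro name SKETCH_MUNARY is likewise hypothetical.

```cpp
#include <complex>
#include <cstddef>

// Stand-in for a real-valued backend transpose (hypothetical, not CML).
// Input is m x n, output is n x m; strides are in elements.
inline void stub_mtrans_f(float const* a, std::ptrdiff_t rsa, std::ptrdiff_t csa,
                          float* z, std::ptrdiff_t rsz, std::ptrdiff_t csz,
                          std::size_t m, std::size_t n)
{
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(m); ++i)
    for (std::ptrdiff_t j = 0; j < static_cast<std::ptrdiff_t>(n); ++j)
      z[j * rsz + i * csz] = a[i * rsa + j * csa];
}

// Stand-in for an interleaved-complex backend transpose (hypothetical).
// Pointers address float pairs; strides are in complex elements.
inline void stub_cmtrans_f(float const* a, std::ptrdiff_t rsa, std::ptrdiff_t csa,
                           float* z, std::ptrdiff_t rsz, std::ptrdiff_t csz,
                           std::size_t m, std::size_t n)
{
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(m); ++i)
    for (std::ptrdiff_t j = 0; j < static_cast<std::ptrdiff_t>(n); ++j)
    {
      float const* s = a + 2 * (i * rsa + j * csa);
      float*       d = z + 2 * (j * rsz + i * csz);
      d[0] = s[0];
      d[1] = s[1];
    }
}

// One macro per function "shape" (here: unary matrix function).
// Each use adds one overload that forwards to the named backend.
#define SKETCH_MUNARY(T, FCN, BACKEND_FCN)                              \
  inline void FCN(T const* a, std::ptrdiff_t rsa, std::ptrdiff_t csa,   \
                  T* z, std::ptrdiff_t rsz, std::ptrdiff_t csz,         \
                  std::size_t m, std::size_t n)                         \
  {                                                                     \
    BACKEND_FCN(reinterpret_cast<float const*>(a), rsa, csa,            \
                reinterpret_cast<float*>(z), rsz, csz, m, n);           \
  }

SKETCH_MUNARY(float,               transpose, stub_mtrans_f)
SKETCH_MUNARY(std::complex<float>, transpose, stub_cmtrans_f)
#undef SKETCH_MUNARY
```

Adding another element type or another unary matrix function is then one extra macro line, which is the scaling argument; the cost is that each generated overload is harder to read and debug than the spelled-out version.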
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Thu Jun 5 14:06:13 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Jun 2008 10:06:13 -0400 Subject: [vsipl++] [patch] CML bindings for matrix transpose operations In-Reply-To: <48471033.2060700@codesourcery.com> References: <48471033.2060700@codesourcery.com> Message-ID: <4847F2D5.1090307@codesourcery.com> Don McCoy wrote: > This is patterned after the existing serial evaluator for transpose > operations using SIMD instructions, except that it dispatches operations > to CML. One important difference is that it handles split complex as > well as interleaved. Matrix copies are also performed (when the block > layouts match), but only if the strides are unit in the smallest > dimension, so as to have the potential to use the SPU's. Don, This looks good. I have a couple of minor comments below, but please check it in. -- Jules > +#define VSIP_IMPL_CML_COPY_UNIT(T, FCN, CML_FCN) \ > + inline void \ > + FCN( \ > + T* a, ptrdiff_t rsa, \ > + T* z, ptrdiff_t rsz, \ > + size_t n) \ > + { \ > + typedef Scalar_of::type CML_T; \ > + CML_FCN( \ > + reinterpret_cast(a), rsa, \ > + reinterpret_cast(z), rsz, \ > + n * (Is_complex::value ? 2 : 1)); \ > + } > + > +VSIP_IMPL_CML_COPY_UNIT(float, copy_unit, cml_vcopy_f) > +VSIP_IMPL_CML_COPY_UNIT(complex, copy_unit, cml_vcopy_f) What does the '_unit' suffix indicate? 'copy_unit' appears to take a vector with a stride. It could easily handle a non-unit stride vector. For naming, you might call this function 'vcopy' to indicate that it copies a vector (as opposed to a matrix). 
> + if (s == 'r' && d == 'r') return "Expr_Trans (rr copy)"; > + else if (s == 'r' && d == 'c') return "Expr_Trans (rc trans)"; > + else if (s == 'c' && d == 'r') return "Expr_Trans (cr trans)"; > + else /* (s == 'c' && d == 'c') */ return "Expr_Trans (cc copy)"; You could rename these ^^^^^^^^^^ to something CML specific "Cml_tag matrix (rr copy)" etc. > + static bool rt_valid(DstBlock& dst, SrcBlock const& src) > + { ... > + dimension_type const s_dim1 = src_order_type::impl_dim1; > + dimension_type const d_dim1 = src_order_type::impl_dim1; As Stefan points out, this looks suspicious. Did you really mean *dst*_order_type for the second? Also, because the copy sub-evaluators require the block to be dense ('assert(dst_ext.stride(0) == dst.size(2, 1))' etc), shouldn't rt_valid enfore this restriction too? > + > + if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1) > + rt = false; > + } > + > + return rt; > + } -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Thu Jun 5 18:43:58 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Jun 2008 14:43:58 -0400 Subject: [patch] Fix lapack MV and VM prod stride bug Message-ID: <484833EE.4020504@codesourcery.com> This patch fixes a stride bug in the lapack MV and VM prod evaluators. Don's more thorough tests for the Cell turned these up! Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: lapack-prod.diff
URL:

From stefan at codesourcery.com Thu Jun 5 19:06:13 2008
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 05 Jun 2008 15:06:13 -0400
Subject: [vsipl++] [patch] Distributed transpose
In-Reply-To: <4846DEB5.9050300@codesourcery.com>
References: <4846DEB5.9050300@codesourcery.com>
Message-ID: <48483925.8020703@codesourcery.com>

Jules Bergmann wrote:
> This patch fixes handling of distributed transposes of the form:
>
> A = B.transpose();
>
> Ok to apply?

Jules,

the patch looks good. There were a couple of changes of the form:

> -/* Copyright (c) 2005 by CodeSourcery, LLC. All rights reserved. */
> +/* Copyright (c) 2005, 2008 by CodeSourcery, LLC. All rights reserved. */
>

where I'd like to suggest we remove the obsolete 'LLC.' while applying other changes. (I don't think this is worth a full sweep over the sources !)

Also, this patch reminds me of two issues, which I hope we will find the time to address over the coming months:

* We need to address the many header interdependencies, in particular between the serial and the parallel set. There is some circularity that is very hard to break out of.

* There are sets of templates that form a self-contained block of functionality that I think is important to document, both from a low-level and a middle-level view.
I'm in particular thinking of the Combine_return_type logic (which I have run across multiple times this week already) Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From jules at codesourcery.com Thu Jun 5 20:32:22 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Jun 2008 16:32:22 -0400 Subject: [vsipl++] [patch] Distributed transpose In-Reply-To: <48483925.8020703@codesourcery.com> References: <4846DEB5.9050300@codesourcery.com> <48483925.8020703@codesourcery.com> Message-ID: <48484D56.4080905@codesourcery.com> > * We need to address the many header interdependencies, in particular > between the serial and the parallel set. There is some circularity that > is very hard to break out of. I completely agree. Was there a particular dependency between the parallel/serial headers? Unfortunately the spec has a circular definition where maps return the processor set in a Vector, but Vectors have blocks, and blocks have maps, and maps have processor sets ... well you get the idea. > > * There are sets of templates that form a self-contained block of > functionality that I think is important to document, both, from a low > level as well as middel-level view. I'm in particular thinking of the > Combine_return_type logic (which I have run across multiple times this > week already) I completely agree too. I'll take an action to add some documentation for Combine_return_type. You'll notice I did add some gratuitous documentation to one of Subset_map_decl's functors ;) Let's take this opportunity as we get back into the library (relearning what we've written before but didn't document), to add some documentation. 
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From stefan at codesourcery.com Thu Jun 5 20:46:27 2008 From: stefan at codesourcery.com (Stefan Seefeld) Date: Thu, 05 Jun 2008 16:46:27 -0400 Subject: [vsipl++] [patch] Distributed transpose In-Reply-To: <48484D56.4080905@codesourcery.com> References: <4846DEB5.9050300@codesourcery.com> <48483925.8020703@codesourcery.com> <48484D56.4080905@codesourcery.com> Message-ID: <484850A3.9060709@codesourcery.com> Jules Bergmann wrote: > >> * We need to address the many header interdependencies, in particular >> between the serial and the parallel set. There is some circularity >> that is very hard to break out of. > > I completely agree. > > Was there a particular dependency between the parallel/serial headers? > Unfortunately the spec has a circular definition where maps return the > processor set in a Vector, but Vectors have blocks, and blocks have > maps, and maps have processor sets ... well you get the idea. Yes, that's the one. :-) One issue that so far I have only been able to work around by somewhere injecting "#include " was that some (expression-)block types use a Local_or_global_map, which derives from Global_map, which comes from parallel/*. And unless all relevant templates (traits, etc.) are seen by the compiler, some operations may raise a compilation-error pointing out that it's invalid to mix local and global blocks in assignments... >> * There are sets of templates that form a self-contained block of >> functionality that I think is important to document, both, from a low >> level as well as middel-level view. I'm in particular thinking of the >> Combine_return_type logic (which I have run across multiple times this >> week already) > > I completely agree too. > > I'll take an action to add some documentation for Combine_return_type. 
> > You'll notice I did add some gratuitous documentation to one of > Subset_map_decl's functors ;) Let's take this opportunity as we get > back into the library (relearning what we've written before but didn't > document), to add some documentation. Yes ! (I have a list of topics that I want to document myself, so I'm happy to wrap that up, too.) Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From don at codesourcery.com Sat Jun 7 22:52:09 2008 From: don at codesourcery.com (Don McCoy) Date: Sat, 07 Jun 2008 16:52:09 -0600 Subject: [patch] Fix for Fir destructor not getting called Message-ID: <484B1119.8060907@codesourcery.com> This patch fixes a problem in which the Ref_counted_ptr holder's count_ member was being incremented one too many times, resulting in the Fir destructor not getting called. The problem occurs because in the dispatch mechanism (Evaluator::exec()), the Fir_impl backend is created and stored in a Ref_counted_ptr object, then passed to the Fir class and stored in yet another Ref_counted_ptr object. This results in a reference count of 2 after creation, inhibiting the destructor from being called when the Fir object goes out of scope. While this does in fact fix the problem, I would like to verify that it is the correct fix in this case. Comments? Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rc.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: rc.diff URL: From stefan at codesourcery.com Sun Jun 8 13:41:27 2008 From: stefan at codesourcery.com (Stefan Seefeld) Date: Sun, 08 Jun 2008 09:41:27 -0400 Subject: [vsipl++] [patch] Fix for Fir destructor not getting called In-Reply-To: <484B1119.8060907@codesourcery.com> References: <484B1119.8060907@codesourcery.com> Message-ID: <484BE187.6020200@codesourcery.com> Don McCoy wrote: > This patch fixes a problem in which the Ref_counted_ptr holder's count_ > member was being incremented one too many times, resulting in the Fir > destructor not getting called. > > The problem occurs because in the dispatch mechanism > (Evaluator::exec()), the Fir_impl backend is created and stored in a > Ref_counted_ptr object, then passed to the Fir class and stored in yet > another Ref_counted_ptr object. This results in a reference count of 2 > after creation, inhibiting the destructor from being called when the Fir > object goes out of scope. > > While this does in fact fix the problem, I would like to verify that it > is the correct fix in this case. Comments? Yes, I believe this is the correct fix: the created objects are of type Ref_count, which initializes the counter to 1. When we pass the newly created object to Ref_counted_ptr, we really mean to hand the object's ownership to it, too, so we must *not* increment the counter. I have to admit that I find the ref counting API still quite confusing... Regards, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From jules at codesourcery.com Mon Jun 9 21:09:47 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 09 Jun 2008 17:09:47 -0400 Subject: [patch] Fix eval diag when CML not present Message-ID: <484D9C1B.8060405@codesourcery.com> This patch disables some CML specific diag bits when CML is not configured. Patch applied. 
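
The ownership rule from the Fir destructor thread above (a freshly created object already carries a count of 1, so the pointer that takes ownership must not increment it again) can be sketched as follows. The classes Counted, Counted_ptr and the tag adopt_t are hypothetical names for this illustration; they are not the library's Ref_count / Ref_counted_ptr API.

```cpp
// Base class for intrusively counted objects (hypothetical).
class Counted
{
public:
  Counted() : count_(1) {}            // the creator holds the first reference
  virtual ~Counted() {}
  void increment() { ++count_; }
  void decrement() { if (--count_ == 0) delete this; }
  unsigned count() const { return count_; }
private:
  unsigned count_;
};

struct adopt_t {};                    // tag: take over the creator's reference

template <typename T>
class Counted_ptr
{
public:
  // Sharing constructor: someone else keeps their reference, so add one.
  explicit Counted_ptr(T* p) : p_(p) { if (p_) p_->increment(); }
  // Adopting constructor: ownership is handed over, so do not increment.
  Counted_ptr(T* p, adopt_t) : p_(p) {}
  Counted_ptr(Counted_ptr const& o) : p_(o.p_) { if (p_) p_->increment(); }
  Counted_ptr& operator=(Counted_ptr const&) = delete;
  ~Counted_ptr() { if (p_) p_->decrement(); }
  T* get() const { return p_; }
private:
  T* p_;
};

// Test object that records its own destruction.
struct Probe : Counted
{
  explicit Probe(bool* destroyed) : destroyed_(destroyed) {}
  ~Probe() { *destroyed_ = true; }
  bool* destroyed_;
};
```

Passing a new object through the sharing constructor leaves the count at 2 with only one owner, so the destructor never runs, which matches the symptom described for Fir; the adopting constructor keeps the count at 1.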
-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diag.diff
URL:

From jules at codesourcery.com Tue Jun 10 15:46:58 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 10 Jun 2008 11:46:58 -0400
Subject: [patch] Fix freqswap errata
Message-ID: <484EA1F2.40706@codesourcery.com>

This patch fixes a couple of bugs in freqswap that broke the cheby window test:

- In-place freqswap was broken for odd vector and matrix sizes. For vectors, this was straightforward to fix. For matrices, this requires either creating temporary vectors approximately the size of the number of rows and columns of the matrix, or doing the swap in two phases. Under the rationale that memory allocation is to be avoided outside of early binding, I implemented the two-phase swap.

The vector fix was enough to get the cheby window test to pass. However, I also optimized cheby to use an out-of-place freqswap to avoid a copy, which turned up another bug:

- Freqswap_functor stored the referee block as a reference. This works for by-reference blocks, but not by-value blocks, such as expressions. Fixed by using appropriate View_block_storage traits.

This patch also extends the freqswap test to cover the in-place and RHS expression cases.

Patch applied.

-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fs.diff URL: From jules at codesourcery.com Tue Jun 10 16:56:53 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 10 Jun 2008 12:56:53 -0400 Subject: [vsipl++] [patch] Fix freqswap errata In-Reply-To: <484EA1F2.40706@codesourcery.com> References: <484EA1F2.40706@codesourcery.com> Message-ID: <484EB255.8050303@codesourcery.com> This patch fixes a bug I just introduced that broke freqswap for cases where the input and output blocks have different types! It would also be possible to cast the pointers to (void*). However, this approach allows the check to be avoided at compile-time when the blocks have different types. Ok to apply? -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fs2.diff URL: From stefan at codesourcery.com Tue Jun 10 16:59:42 2008 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 10 Jun 2008 12:59:42 -0400 Subject: [vsipl++] [patch] Fix freqswap errata In-Reply-To: <484EB255.8050303@codesourcery.com> References: <484EA1F2.40706@codesourcery.com> <484EB255.8050303@codesourcery.com> Message-ID: <484EB2FE.1080607@codesourcery.com> Jules Bergmann wrote: > This patch fixes a bug I just introduced that broke freqswap for cases > where the input and output blocks have different types! > > It would also be possible to cast the pointers to (void*). However, > this approach allows the check to be avoided at compile-time when the > blocks have different types. > > Ok to apply? This looks good. Thanks ! Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From don at codesourcery.com Wed Jun 11 04:42:23 2008 From: don at codesourcery.com (Don McCoy) Date: Tue, 10 Jun 2008 22:42:23 -0600 Subject: [patch] CML backend fir FIR filters Message-ID: <484F57AF.8070206@codesourcery.com> This patch adds support for real, single-precision, floating point FIR filters using CML. 
There is some overlap with another patch of mine (06/07: [patch] Fix for Fir destructor not getting called) as well as one of Stefan's (06/09: patch: Fix various FIR backends to support negative strides), but it's minimal as most of my changes are in the file opt/cbe/cml/fir.hpp. I will merge before committing, once those patches are in. Ok to commit? -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fb.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fb.diff URL: From jules at codesourcery.com Wed Jun 11 12:02:33 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 11 Jun 2008 08:02:33 -0400 Subject: [vsipl++] [patch] fix in SAL backend for outer product In-Reply-To: <48437C57.5070502@codesourcery.com> References: <48437C57.5070502@codesourcery.com> Message-ID: <484FBED9.6000702@codesourcery.com> Don McCoy wrote: > This patch fixes a block dimension order check in the outer product > evaluator. Note that this evaluator is only available when > VSIP_IMPL_SAL_USE_MAT_MUL is defined to something other than '0' (the > default). > > As a result of the bug, the existing test coverage was missing the cases > where the output matrix was column-major in the smallest dimension. > Now those cases are getting picked up and validated. > > Ok to commit? Don, Yes, this looks good, please check it in. 
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Wed Jun 11 12:45:14 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 11 Jun 2008 08:45:14 -0400 Subject: [vsipl++] [patch] CML backend fir FIR filters In-Reply-To: <484F57AF.8070206@codesourcery.com> References: <484F57AF.8070206@codesourcery.com> Message-ID: <484FC8DA.1020206@codesourcery.com> Don McCoy wrote: > This patch adds support for real, single-precision, floating point FIR > filters using CML. > > There is some overlap with another patch of mine (06/07: [patch] Fix for > Fir destructor not getting called) as well as one of Stefan's (06/09: > patch: Fix various FIR backends to support negative strides), but it's > minimal as most of my changes are in the file opt/cbe/cml/fir.hpp. I > will merge before committing, once those patches are in. Don, This looks good, along with the reference counting and stride fixes. Please apply. thanks, -- Jules > + Fir_impl(Fir_impl const &fir) > + : base(fir), > + fir_obj_ptr_(NULL), > + filter_state_(fir.filter_state_) > + { > + fir_create( > + &fir_obj_ptr_, > + fir.fir_obj_ptr_->K, CML objects are intended to be opaque. Let's keep this for now, but create an issue (#175) to add a new CML attribute function that returns a pointer to the kernel coefficients. float* cml_fir_attr_kernel_f( cml_fir_f* obj); > + 1, // kernel stride > + this->decimation(), > + this->filter_state_, > + this->kernel_size(), > + this->input_size()); > + } > +template > +struct Evaluator + Ref_counted_ptr > > + (aligned_array, > + length_type, length_type, length_type, > + unsigned, alg_hint_type)> > +{ > + static bool const ct_valid = // false; Some left-over debug code ^^^^^^^^^ > + Type_equal::value; > + > + typedef Ref_counted_ptr > return_type; > + // rt_valid takes the first argument by reference to avoid taking > + // ownership. 
> + static bool rt_valid(aligned_array const &, length_type k, > + length_type i, length_type d, > + unsigned, alg_hint_type) > + { > + length_type o = k * (1 + (S != nonsym)) - (S == sym_even_len_odd) - 1; > + assert(i > 0); // input size > + assert(d > 0); // decimation > + assert(o + 1 > d); // M >= decimation > + assert(i >= o); // input_size >= M > + > + length_type output_size = (i + d - 1) / d; > + return i == output_size * d; What is rt_valid checking exactly? I think I see. Does CML FIR not work on fixed size inputs if i % d != 0? Argh! That's a mistake on my part with the API design. I wonder how often such cases happen in practice. Regardless, can you add a comment to that effect? "CML FIR objects have fixed output size, whereas VSIPL++ FIR objects have fixed input size. If input size is not a multiple of the decimation, output size will vary from frame to frame." > + } -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Wed Jun 11 14:33:53 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 11 Jun 2008 10:33:53 -0400 Subject: [patch] Fix FFTW3 macro checks Message-ID: <484FE251.5030705@codesourcery.com> The FFTW_HAVE_{*} variables are either set to 1 or left undefined. However, they were being checked with an "#if", which caused an error when left undefined. This patch changes the guard to #ifdef instead. Since I twisted Stefan's arm :) into keeping the PROVIDE_FFT_{*} macros set to either 1 or 0, I just want to make sure this looks OK. Ok to apply? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fftw3.diff URL: From don at codesourcery.com Wed Jun 11 22:44:40 2008 From: don at codesourcery.com (Don McCoy) Date: Wed, 11 Jun 2008 16:44:40 -0600 Subject: [vsipl++] [patch] CML backend fir FIR filters In-Reply-To: <484FC8DA.1020206@codesourcery.com> References: <484F57AF.8070206@codesourcery.com> <484FC8DA.1020206@codesourcery.com> Message-ID: <48505558.9080803@codesourcery.com> Jules Bergmann wrote: > >> + Fir_impl(Fir_impl const &fir) >> + : base(fir), >> + fir_obj_ptr_(NULL), >> + filter_state_(fir.filter_state_) >> + { >> + fir_create( >> + &fir_obj_ptr_, >> + fir.fir_obj_ptr_->K, >> > > CML objects are intended to be opaque. > > Let's keep this for now, but create an issue (#175) to add a new CML > attribute function that returns a pointer to the kernel coefficients. > > float* > cml_fir_attr_kernel_f( > cml_fir_f* obj); I should have mentioned this. It seemed "wrong" to dip beneath the CML API to get what I wanted, but the above fix will take care of it. I'll do this soon. > What is rt_valid checking exactly? > > I think I see. Does CML FIR not work on fixed size inputs if i % d != > 0? Argh! That's a mistake on my part with the API design. I wonder > how often such cases happen in practice. > Several instances in our test harness were invoking the fir_opt evaluator instead of ours. I looked into it and found out this rt_valid check is the reason. > Regardless, can you add a comment to that effect? > > "CML FIR objects have fixed output size, whereas VSIPL++ FIR objects > have fixed input size. If input size is not a multiple of the > decimation, output size will vary from frame to frame." > > > I'll add the comment and leave it that way for now. I'm not 100% sure that CML won't do what we want, but I did try it. 
It leads to this error because it ends up exceeding the length of the output vector:

i = 1024, os*d = 1026
fir: ../vpp/src/vsip/core/subblock.hpp:293: vsip::impl::Subset_block::Subset_block(const vsip::Domain::dim>&, Block&) [with Block = vsip::Dense<1u, float, vsip::tuple<0u, 1u, 2u>, vsip::Local_map>]: Assertion `dom_[d].size() == 0 || dom_[d].impl_last() < blk_->size(dim, d)' failed.
/bin/sh: line 1: 14399 Aborted ./tests/fir

The chained FIR filter test is written as

  vsip::length_type got1a = 0;
  for (vsip::length_type i = 0; i < 2 * M; ++i) // chained
  {
    got1a += fir1a(
      input(vsip::Domain<1>(i * N, 1, N)),
      output1(vsip::Domain<1>(got1a, 1, (N + D - 1) / D)));
  }

The value got1a is incrementing by a fixed interval each time, because the operator() function (which calls cml_fir_apply_f()) always returns the same size. So it is as you suggested -- that CML essentially expects to process even multiples of the output size each time.

It would take a change in the apply function to make this work correctly -- such that 'apply' returns the correct number of output values calculated on each iteration. Yet this might not be all that simple.

Just an historical note: I copied this implementation, including the rt_valid check, from the IPP implementation. Perhaps they had a good reason for having that restriction as well...

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

From don at codesourcery.com Wed Jun 11 23:12:59 2008
From: don at codesourcery.com (Don McCoy)
Date: Wed, 11 Jun 2008 17:12:59 -0600
Subject: [vsipl++] [patch] CML backend fir FIR filters
In-Reply-To: <484F57AF.8070206@codesourcery.com>
References: <484F57AF.8070206@codesourcery.com>
Message-ID: <48505BFB.40806@codesourcery.com>

Don McCoy wrote:
> This patch adds support for real, single-precision, floating point FIR
> filters using CML.
>

Feedback applied. Committed as attached.
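The size restriction behind the rt_valid check discussed in this thread can be illustrated with a small sketch. It reuses only the arithmetic quoted in the review (output size is (i + d - 1) / d, and the check passes when i equals output size times d). The helper names `fir_output_size` and `frame_size_is_fixed` are made up for this example and are not part of VSIPL++ or CML. The decimation of 3 used in the note below is an assumption that happens to reproduce the "i = 1024, os*d = 1026" figures; the archive does not state the value the test used.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of outputs produced for `input_size` samples
// at the given decimation, rounded up as in the quoted rt_valid code.
inline std::size_t fir_output_size(std::size_t input_size, std::size_t decimation)
{
  assert(decimation > 0);
  return (input_size + decimation - 1) / decimation;
}

// Hypothetical helper mirroring the final test of the quoted rt_valid:
// true only when input_size is an exact multiple of the decimation, i.e.
// when every frame yields the same number of outputs.
inline bool frame_size_is_fixed(std::size_t input_size, std::size_t decimation)
{
  return input_size == fir_output_size(input_size, decimation) * decimation;
}
```

Under these assumptions, `fir_output_size(1024, 3)` is 342 and 342 * 3 = 1026, which is larger than the 1024 input samples, so `frame_size_is_fixed(1024, 3)` is false and a different evaluator would be chosen. With a decimation of 4 the product is exactly 1024 and the check passes.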
-- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fb2.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fb2.diff URL: From jules at codesourcery.com Mon Jun 16 17:54:37 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 16 Jun 2008 13:54:37 -0400 Subject: [patch] Fix IPP FIR strides Message-ID: <4856A8DD.7050108@codesourcery.com> This patch catches up the IPP FIR BE to use 'stride_type' for strides. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fir.diff URL: From jules at codesourcery.com Mon Jun 16 20:31:10 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 16 Jun 2008 16:31:10 -0400 Subject: [patch] Fix pointer comparison Message-ID: <4856CD8E.1040900@codesourcery.com> The pointer equality check for an in-place transpose will cause a compilation error for assignments between different value types (such as in view.cpp). This patch fixes that by introducing an 'is_same_ptr' that can compare pointers of different types. It also includes a simple unit-test for in-place transpose. Ok to apply? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: trans.diff URL: From jules at codesourcery.com Mon Jun 16 20:44:43 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 16 Jun 2008 16:44:43 -0400 Subject: [vsipl++] [patch] Fix pointer comparison In-Reply-To: <4856CD8E.1040900@codesourcery.com> References: <4856CD8E.1040900@codesourcery.com> Message-ID: <4856D0BB.8010402@codesourcery.com> > This patch fixes that by introducing an 'is_same_ptr' that can compare > pointers of different types. Revised to limit application to pointers only. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: trans.diff URL: From stefan at codesourcery.com Mon Jun 16 20:54:11 2008 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 16 Jun 2008 16:54:11 -0400 Subject: [vsipl++] [patch] Fix pointer comparison In-Reply-To: <4856D0BB.8010402@codesourcery.com> References: <4856CD8E.1040900@codesourcery.com> <4856D0BB.8010402@codesourcery.com> Message-ID: <4856D2F3.6050200@codesourcery.com> Jules Bergmann wrote: > >> This patch fixes that by introducing an 'is_same_ptr' that can compare >> pointers of different types. > > Revised to limit application to pointers only. This patch looks good. Though I imagine that template inline bool is_same_ptr(T1 *ptr1, T2 *ptr2) { return Is_same_ptr::compare(ptr1, ptr2); } would result in a simpler error message "no matching function for call to 'is_same_ptr(non-pointer-type1, non-pointer-type2)'". 
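
For readers without access to trans.diff, the idea of comparing pointers of different types can be sketched as follows. This is only an illustration, assuming the comparison is done through `void const *`; the actual patch is not shown in the archive and may differ. Taking both arguments as pointers, as suggested above, means a call with non-pointer arguments is rejected during overload resolution.

```cpp
// Illustrative stand-in for the helper discussed in this thread; the
// signature and body are assumptions, not the committed code.
template <typename T1, typename T2>
inline bool is_same_ptr(T1 const *ptr1, T2 const *ptr2)
{
  // Any object pointer converts to 'void const *', which gives both
  // operands a common type, so the comparison compiles even when
  // T1 and T2 are unrelated (e.g. float and int).
  return static_cast<void const *>(ptr1) == static_cast<void const *>(ptr2);
}
```

A direct `ptr1 == ptr2` between, say, a `float *` and an `int *` does not compile, which matches the compilation error the patch was written to avoid.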
Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From don at codesourcery.com Mon Jun 16 22:43:33 2008 From: don at codesourcery.com (Don McCoy) Date: Mon, 16 Jun 2008 16:43:33 -0600 Subject: [patch] Add byte-swapping support to save/load_view.hpp Message-ID: <4856EC95.2050504@codesourcery.com> This patch utilizes the matlab byte-swapping functions to allow load_view and save_view to work on big-endian platforms using data written by little-endian systems (or vice-versa). The test has been expanded to cover the added functionality, however the default is such that existing code will behave the same (i.e. bytes are not swapped when read or written). Ok to commit? -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lv.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lv.diff URL: From jules at codesourcery.com Wed Jun 25 14:58:32 2008 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 25 Jun 2008 10:58:32 -0400 Subject: [vsipl++] [patch] Add byte-swapping support to save/load_view.hpp In-Reply-To: <4856EC95.2050504@codesourcery.com> References: <4856EC95.2050504@codesourcery.com> Message-ID: <48625D18.1080009@codesourcery.com> Don McCoy wrote: > This patch utilizes the matlab byte-swapping functions to allow > load_view and save_view to work on big-endian platforms using data > written by little-endian systems (or vice-versa). The test has been > expanded to cover the added functionality, however the default is such > that existing code will behave the same (i.e. bytes are not swapped when > read or written). > > Ok to commit? Don, Looks good, please check this in. 
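
The byte-swapping behaviour described in this thread can be sketched in a few lines. The helpers below are hypothetical (they are not the matlab byte-swapping functions the patch uses, and `swap_bytes` and `swap_buffer` are names invented here). They assume trivially copyable element types, and the `swap` flag defaults to false to mirror the statement that existing code behaves the same unless swapping is requested.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns a copy of 'value' with its bytes reversed.
template <typename T>
T swap_bytes(T value)
{
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));    // copy out the object representation
  std::reverse(bytes, bytes + sizeof(T));   // reverse the byte order
  std::memcpy(&value, bytes, sizeof(T));    // copy the swapped bytes back
  return value;
}

// Hypothetical helper: swaps every element of a buffer in place, but only
// when 'swap' is true; by default the data is left untouched.
template <typename T>
void swap_buffer(T *data, std::size_t count, bool swap = false)
{
  if (!swap) return;
  for (std::size_t i = 0; i < count; ++i)
    data[i] = swap_bytes(data[i]);
}
```

For example, `swap_bytes<std::uint32_t>(0x01020304u)` yields `0x04030201u`, and applying the swap twice restores the original value.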
-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From jules at codesourcery.com Fri Jun 27 15:25:40 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 27 Jun 2008 11:25:40 -0400
Subject: [patch] Disable profiling in dot products
Message-ID: <48650674.403@codesourcery.com>

Some of the profiling event name code for dot product was consuming considerable time even when VSIP_IMPL_PROFILER == 0 (see below for an example).

This patch disables that when VSIP_IMPL_PROFILER & mask is false.

Ok to apply?

-- Jules

Pre change

% benchmarks/dot -1 -ops -samples 3
# what             : t_dot1 (1)
# nproc            : 1
# ops_per_point(1) : 2
# riob_per_point(1): 8
# wiob_per_point(1): 0
# metric           : ops_per_sec
# start_loop       : 133341
     16   16.903055   16.885244   16.949444
     32   33.609573   33.514175   33.751736
     64   66.244659   66.077934   66.556038
    128  131.906174  130.535217  133.274078
    256  256.051178  255.275986  258.389465
    512  488.516510  488.469940  490.960785
   1024  859.449341  852.046570  865.133423
   2048 1438.800781 1436.463745 1450.017334
   4096 1849.857788 1845.208984 1853.868042
   8192 2116.325684 2113.715820 2117.107910
  16384 2440.822754 2436.575684 2441.876709
  32768 2707.750977 2697.868652 2708.683594
  65536 2829.450439 2823.039062 2830.946045
 131072 2885.994141 2882.093018 2891.500000
 262144 2871.778809 2868.683350 2883.067627
 524288 1753.057373 1681.539062 1754.856323
1048576  830.176514  826.936584  856.340210
2097152  759.201416  740.848389  761.915283

Post change

% benchmarks/dot -1 -ops -samples 3
# what             : t_dot1 (1)
# nproc            : 1
# ops_per_point(1) : 2
# riob_per_point(1): 8
# wiob_per_point(1): 0
# metric           : ops_per_sec
# start_loop       : 3378559
     16  393.554230  384.410278  424.985718
     32  722.641785  695.435608  730.206116
     64 1371.760376 1341.322266 1387.337769
    128 2211.171875 2210.553955 2213.565918
    256 3231.258545 3229.922119 3234.606445
    512 4069.423584 4054.140625 4075.931396
   1024 4726.631836 4715.732422 4733.557129
   2048 5040.998535 5032.172852 5041.894043
   4096 4921.556641 4913.420410 4929.406738
   8192 2906.246582 2903.860596 2910.550537
  16384 2924.517334 2917.423340 2928.528076
  32768 2962.245605 2960.767822 2964.117432
  65536 2958.519531 2952.603271 2960.056641
 131072 2951.296143 2945.608154 2962.209961
 262144 2858.965332 2855.086182 2867.772949
 524288 1654.068970 1621.036987 1683.348999
1048576  781.400024  758.868774  808.620667
2097152  737.620056  735.490540  743.508240

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prof.diff
URL:

From jules at codesourcery.com Mon Jun 30 19:54:09 2008
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 30 Jun 2008 15:54:09 -0400
Subject: [patch] Fix FFT macros when neutral_acconfig = n
Message-ID: <486939E1.1050901@codesourcery.com>

This patch avoids defining VSIP_IMPL_FFTW3_HAVE_{TYPE} when TYPE is not supported. This is necessary since the macro users check for definition (#ifdef).

Patch applied.

-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fft-m4.diff
URL:
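
The reason a macro must stay undefined, rather than be defined as 0, when its users test it with #ifdef can be shown with a short sketch. The DEMO_HAVE_* names below are hypothetical stand-ins chosen for this example, not the macros from the patch.

```cpp
// Hypothetical macros for illustration only.
#define DEMO_HAVE_FLOAT 1
#define DEMO_HAVE_DOUBLE 0      // defined, but meant to say "not available"
// DEMO_HAVE_LONG_DOUBLE is deliberately left undefined.

// #ifdef only asks whether the macro exists, so a value of 0 still passes.
#ifdef DEMO_HAVE_FLOAT
bool const float_by_ifdef = true;
#else
bool const float_by_ifdef = false;
#endif

#ifdef DEMO_HAVE_DOUBLE
bool const double_by_ifdef = true;   // taken, even though the value is 0
#else
bool const double_by_ifdef = false;
#endif

// #if evaluates the value, so the same macro is treated as disabled.
#if DEMO_HAVE_DOUBLE
bool const double_by_if = true;
#else
bool const double_by_if = false;     // taken, because the value is 0
#endif

// An undefined macro is the only state that #ifdef reports as "off".
#ifdef DEMO_HAVE_LONG_DOUBLE
bool const long_double_by_ifdef = true;
#else
bool const long_double_by_ifdef = false;
#endif
```

In this sketch `double_by_ifdef` ends up true while `double_by_if` is false, which is the mismatch that leaving the macro undefined avoids.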