From jules at codesourcery.com Wed Nov 1 19:53:19 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 01 Nov 2006 14:53:19 -0500
Subject: [patch] Script to post process profiler output
Message-ID: <4548FB2F.80908@codesourcery.com>

This patch adds a fmt-profile.pl script to post-process profiler
output files.

It has two modes. If run with no options and the name of a profile
dump:

    fmt-profile.pl profile.txt

it will align all the columns (the input file is overwritten).

If run with the "-sec" option:

    fmt-profile.pl -sec profile.txt

it will convert the ticks into seconds.

An example output file from using the "-sec" option is attached.

This has only been tested with "accumulate" profiling output.

Patch applied.

    -- jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fp.diff
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 1.txt
URL:

From don at codesourcery.com Wed Nov 1 22:55:40 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 01 Nov 2006 15:55:40 -0700
Subject: [vsipl++] [patch] Scalable SAR benchmark
In-Reply-To: <454659E2.3090902@codesourcery.com>
References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com>
Message-ID: <454925EC.1000200@codesourcery.com>

Jules Bergmann wrote:
> Since this code isn't going into the core library, and since this is
> going to be in a flux as we optimize, let's do the following:
>
> - address the easy comments:
>    - Definitely 1, 5, 8
>    - Perhaps 4, 6, 7
> - Later: 2, 3, 9.

This patch addresses comments 4, 6, and 7 (diffview, fft_shift
implementation), in addition to some cleanups that allow it to run with
the current reorganized code base.

BTW, the change in the fftshift function resulted in a 2.5x speedup!
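
For readers without the patch at hand, here is a minimal sketch of the
half-swap that an fftshift performs on an even-length vector. It is
written against std::vector so that it is self-contained; that is an
assumption for illustration only, since the functions in the patch
operate on VSIPL++ views. The by-value overload is built on the
by-reference one, and only even lengths are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// By-reference version: swap the two halves of `in` into a
// caller-provided `out`, so the call itself allocates nothing.
// std::vector is a stand-in for a VSIPL++ view (assumption).
template <typename T>
void fftshift(std::vector<T> const& in, std::vector<T>& out)
{
  std::size_t const nx = in.size();
  assert(nx % 2 == 0);       // only even lengths are handled here
  assert(out.size() == nx);  // out must already have the right size
  assert(&in != &out);       // out-of-place only

  std::size_t const half = nx / 2;
  for (std::size_t i = 0; i < half; ++i)
  {
    out[i]        = in[i + half];  // left half of out  <- right half of in
    out[i + half] = in[i];         // right half of out <- left half of in
  }
}

// By-value convenience version, defined in terms of the one above.
// It pays for one allocation per call.
template <typename T>
std::vector<T> fftshift(std::vector<T> const& in)
{
  std::vector<T> out(in.size());
  fftshift(in, out);
  return out;
}
```

In a steady-state loop the by-reference form is the one to prefer,
because the output storage can be allocated once up front.
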
Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftshift.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftshift.diff
URL:

From jules at codesourcery.com Thu Nov 2 14:37:40 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 02 Nov 2006 09:37:40 -0500
Subject: [vsipl++] QR Solver
In-Reply-To: <4547BF4B.8050607@codesourcery.com>
References: <4547BF4B.8050607@codesourcery.com>
Message-ID: <454A02B4.2060805@codesourcery.com>

Assem,

This looks good. Can you address the comments below and then check it in?

    thanks,
    -- Jules

Assem Salama wrote:
> Everyone,
> This patch implements the QR backend using Cvsipl.
>
> Thanks,
> Assem
>
>
> ------------------------------------------------------------------------
>
> Index: src/vsip/core/cvsip/solver_qr.hpp
> ===================================================================
> +/// Qrd implementation using CVSIP
> +
> +/// Requires:
> +/// T to be a value type supported by SAL's QR routines

[1] Change reference to C-VSIPL (instead of SAL).

> +
> +template
> + bool Blocked>
> +class Qrd_impl
> +{
> + typedef vsip::impl::dense_complex_type complex_type;
> + typedef Layout<2, col2_type, Stride_unit_dense, complex_type> data_LP;

[2] Please change this to row2_type. col2_type was necessary for
Lapack. For C-VSIPL, either row2_type or col2_type will work.
Since most user data will be row-major, using row2_type will avoid
the need for a transpose in the common case.

> + typedef Fast_block<2, T, data_LP> data_block_type;
> + // Member data.
> +private:
> + typedef std::vector > vector_type;

[3] Remove this (unused) typedef.

> +
> + length_type m_; // Number of rows.
> + length_type n_; // Number of cols.
> + storage_type st_; // Q storage type
> +
> + Matrix data_; // Factorized QR(mxn) matrix
> + cvsip::Cvsip_matrix cvsip_data_;

[4] Please use cvsip::View<2, T, true> instead of Cvsip_matrix.

> + cvsip::Cvsip_qr cvsip_qr_;
> +};
> +#endif // VSIP_CORE_CVSIP_SOLVER_QR_HPP
> Index: src/vsip/core/cvsip/cvsip.hpp
> ===================================================================
> +// some support defines

[5] First, anything we #define has to be prefixed with VSIP_ (if it is
in the spec) or VSIP_IMPL_ (if not).

Second, the following macros should be inline functions. In C++,
inline functions are just as efficient as macros, and they don't
have all the downsides.

> +#define get_vsip_st(st) \
> + ( (st == qrd_nosaveq)? VSIP_QRD_NOSAVEQ: \
> + (st == qrd_saveq1)? VSIP_QRD_SAVEQ1: \
> + VSIP_QRD_SAVEQ \
> + )
> +
> +#define get_vsip_side(s) \
> + ( (s == mat_lside)? VSIP_MAT_LSIDE:VSIP_MAT_RSIDE )
> +
> +#define get_vsip_mat_op(op) \
> + ( (op == mat_ntrans)? VSIP_MAT_NTRANS: \
> + (op == mat_trans)? VSIP_MAT_TRANS: \
> + VSIP_MAT_HERM \
> + )
> +
> } // namespace cvsip
>
> } // namespace impl
> Index: src/vsip/core/cvsip/cvsip_qr.hpp
> ===================================================================
> +template
> +class Cvsip_qr;

[6] Please remove this forward declaration. It isn't necessary,
since the definition follows immediately.

> +
> +template
> +class Cvsip_qr : Non_copyable
> +{
> + typedef typename Cvsip_traits::qr_object_type qr_object_type;
> +
> + public:
> + Cvsip_qr(int m, int n, vsip_qrd_qopt op);
> + ~Cvsip_qr();

[7] Please name the constructor 'Cvsip_qr' instead of 'Cvsip_qr'.
Likewise for the destructor. This is consistent with all the other
class definitions in the library.

Technically, referring to the class as 'Cvsip_qr' is OK. However,
this style will become burdensome for classes with many template
parameters.
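
A minimal sketch of what comment [5] asks for: the three macros
rewritten as inline functions. To keep it self-contained, the C-VSIPL
enumerations (normally supplied by the C-VSIPL header) and the VSIPL++
enumerations are declared here as stand-ins. The namespace `sketch`
and the type names `side_type` and `op_type` are assumptions made for
illustration; only `storage_type` and the enumerator names appear in
the patch.

```cpp
// Stand-ins for the C-VSIPL enumerations (assumption: in real code
// these come from the C-VSIPL header and must not be redeclared).
enum vsip_qrd_qopt { VSIP_QRD_NOSAVEQ, VSIP_QRD_SAVEQ, VSIP_QRD_SAVEQ1 };
enum vsip_mat_side { VSIP_MAT_LSIDE, VSIP_MAT_RSIDE };
enum vsip_mat_op   { VSIP_MAT_NTRANS, VSIP_MAT_TRANS, VSIP_MAT_HERM };

namespace sketch  // hypothetical namespace, for illustration only
{

// Stand-ins for the VSIPL++ enumerations used by the macros.
enum storage_type { qrd_nosaveq, qrd_saveq1, qrd_saveq };
enum side_type    { mat_lside, mat_rside };
enum op_type      { mat_ntrans, mat_trans, mat_herm };

// Same mapping as the get_vsip_st macro, but type-checked, scoped to a
// namespace, and evaluating its argument exactly once.
inline vsip_qrd_qopt get_vsip_st(storage_type st)
{
  return (st == qrd_nosaveq) ? VSIP_QRD_NOSAVEQ
       : (st == qrd_saveq1)  ? VSIP_QRD_SAVEQ1
       :                       VSIP_QRD_SAVEQ;
}

// Same mapping as the get_vsip_side macro.
inline vsip_mat_side get_vsip_side(side_type s)
{
  return (s == mat_lside) ? VSIP_MAT_LSIDE : VSIP_MAT_RSIDE;
}

// Same mapping as the get_vsip_mat_op macro.
inline vsip_mat_op get_vsip_mat_op(op_type op)
{
  return (op == mat_ntrans) ? VSIP_MAT_NTRANS
       : (op == mat_trans)  ? VSIP_MAT_TRANS
       :                      VSIP_MAT_HERM;
}

} // namespace sketch
```

Because these are ordinary functions inside a namespace, they need no
VSIP_IMPL_ prefix, and passing the wrong enumeration is a compile-time
error rather than a silent fall-through to the default branch.
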
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From jules at codesourcery.com Thu Nov 2 14:59:41 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 02 Nov 2006 09:59:41 -0500
Subject: [vsipl++] [patch] Scalable SAR benchmark
In-Reply-To: <454925EC.1000200@codesourcery.com>
References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com> <454925EC.1000200@codesourcery.com>
Message-ID: <454A07DD.2080208@codesourcery.com>

> This patch addresses comments 4, 6, and 7 (diffview, fft_shift
> implementation), in addition to some cleanups that allow it to run with
> the current reorganized code base.

Don,

This looks good, I have three suggestions below, otherwise it looks
good to check in.

>
> BTW, the change in the fftshift function resulted in a 2.5x speedup!

Sweet!

    thanks,
    -- Jules

> ------------------------------------------------------------------------
>
> Index: src/vsip_csl/matlab_utils.hpp
> ===================================================================
> +// The following versions are not as efficient as those above due
> +// to the overhead of creating a new view. For optimized code,
> +// use the ones above.
> +
> +template
> + typename Block1>
> +Vector
> +fftshift(
> + const_Vector in)
> +{

You could define these by-value versions in terms of the above
by-reference versions:

    Vector out(nx);
    fftshift(in, out);
    return out;
    }

Also, because you don't know what Block1 is, you should use
'Vector' (defaulting to a Dense block) instead of 'Vector'. For
example, Block1 could be a subblock type which can't be used in this
context. This affects both the defn of out, and the return type of
the function.

> + // This function swaps halves of a vector (dimension
> + // must be even).
> +
> + length_type nx = in.size(0);
> + assert(!(nx & 1));
> + assert(nx == out.size(0));
> +
> + Domain<1> left(0, 1, nx/2);
> + Domain<1> right(nx/2, 1, nx/2);
> +
> + Vector out(nx);
> + out(left) = in(right);
> + out(right) = in(left);
> +
> + return out;
> +}
> +
> +
> +template
> + typename Block1>
> +Matrix
> +fftshift(
> + const_Matrix in)
> +{

Likewise.

> Index: apps/ssar/kernel1.hpp
> ===================================================================
> @@ -225,15 +221,16 @@
> // (spatial frequency) domain.
>
> // 59. (n by mc array of complex numbers) filtered echoed signal
> - s_filt_ = vmmul(fast_time_filter_, col_fftm_(this->fft_shift(s_raw_)));
> + s_filt_ = vmmul(fast_time_filter_,
> + col_fftm_(vsip_csl::matlab::fftshift(s_raw_, s_filt_)));

You could use a 'using vsip_csl::matlab::fftshift' in the function body.

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From stefan at codesourcery.com Thu Nov 2 15:01:28 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 02 Nov 2006 10:01:28 -0500
Subject: [vsipl++] QR Solver
In-Reply-To: <4547BF4B.8050607@codesourcery.com>
References: <4547BF4B.8050607@codesourcery.com>
Message-ID: <454A0848.4020803@codesourcery.com>

Assem,

I have some high-level comments / suggestions. Sorry for being a bit behind.

Assem Salama wrote:

> Index: src/vsip/core/cvsip/solver_qr.hpp

Can this file be named src/vsip/core/cvsip/qr.hpp, for consistency with the
other backends ?

> +template
> + bool Blocked>
> +class Qrd_impl
> +{
> + typedef vsip::impl::dense_complex_type complex_type;
> + typedef Layout<2, col2_type, Stride_unit_dense, complex_type> data_LP;
> + typedef Fast_block<2, T, data_LP> data_block_type;
> +
> + // Constructors, copies, assignments, and destructors.
> +public:
> + Qrd_impl(length_type, length_type, storage_type)
> + VSIP_THROW((std::bad_alloc));
> + Qrd_impl(Qrd_impl const&)
> + VSIP_THROW((std::bad_alloc));

We discussed the use of non-empty exception-specifiers, and came to the
conclusion that such use will likely make the code slower, not faster.
I thus think it is best not to issue 'VSIP_THROW(...)' at all, at least not
in non-public parts of the API that are not covered by the spec.

> + cvsip::Cvsip_matrix cvsip_data_;

As Jules suggests, this should become cvsip::View<2, T, true>.

> + cvsip::Cvsip_qr cvsip_qr_;

I think it would simplify the code if the cvsip::Cvsip_qr template became
a traits template ('Qrd_traits', or maybe even a unified 'Factor_traits').
Then, your Qrd_impl class would contain a Qrd_traits::solver_type * member
(or Factor_traits::qr_solver_type *), and make calls to the static
Qrd_traits::decompose(), etc.

This would tidy up the code quite a bit. (See all the other C-VSIPL bindings.)

Thanks,
    Stefan

--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718

From assem at codesourcery.com Thu Nov 2 17:14:30 2006
From: assem at codesourcery.com (Assem Salama)
Date: Thu, 02 Nov 2006 12:14:30 -0500
Subject: [vsipl++] QR Solver
In-Reply-To: <454A02B4.2060805@codesourcery.com>
References: <4547BF4B.8050607@codesourcery.com> <454A02B4.2060805@codesourcery.com>
Message-ID: <454A2776.5080202@codesourcery.com>

Jules Bergmann wrote:
>
> [4] Please use cvsip::View<2, T, true> instead of Cvsip_matrix.
>
Does the cvsip::View<2,T,true> offer a way to get the vsip_mview_f* ?

From stefan at codesourcery.com Thu Nov 2 17:17:03 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 02 Nov 2006 12:17:03 -0500
Subject: [vsipl++] QR Solver
In-Reply-To: <454A2776.5080202@codesourcery.com>
References: <4547BF4B.8050607@codesourcery.com> <454A02B4.2060805@codesourcery.com> <454A2776.5080202@codesourcery.com>
Message-ID: <454A280F.8030406@codesourcery.com>

Assem Salama wrote:
> Jules Bergmann wrote:
>>
>> [4] Please use cvsip::View<2, T, true> instead of Cvsip_matrix.
>>
> Does the cvsip::View<2,T,true> offer a way to get the vsip_mview_f* ?

It has a ptr() method.

Regards,
    Stefan

--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718

From jules at codesourcery.com Thu Nov 2 18:12:21 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 02 Nov 2006 13:12:21 -0500
Subject: [patch] Nested profile event names
Message-ID: <454A3505.9@codesourcery.com>

This patch adds support for nested profile event names in the
accumulate mode.

Nested events occur when a high-level operation (such as vmmul) is
implemented in terms of low-level operations (such as vmul). In
accumulate mode, this nesting is lost, so the profile output will have
double booking of time shared between the two events. In trace mode,
this isn't a problem since the nesting is shown in the output.

This patch changes how events are recorded in accumulate mode so that
the event name includes any events which it is nested in. Double
booking of time still occurs, but now the profile output shows which
events occurred inside of others.

I.e. instead of vmmul and vmul events, there will be vmmul and
vmmul:vmul events, indicating that the vmul was nested inside of vmmul.

This feature is controlled by a new configure option
(--disable-profile-accum-nest-events). It is enabled by default. It
has to be set at configure time (as opposed to on the command line)
because it affects how profile.cpp is compiled.
When disabled, there is no overhead over the current implementation.

The patch also modifies the fmt-profile.pl script to parse the nested
events and produce a nested pretty-printed file.

Attached is example output for the SSAR app including raw profile
output: (3-raw.txt) and pretty-printed output: (3.txt). Note: this is
with the IPP backend, compiled with fast options. However, cugel was
loaded around ~1 - 1.5. This is also before Don's recent patch
optimizing fftshift.

It is interesting that of the 11 seconds in kernel 1, most of it is
being spent outside of operations that show up in the profile.
Perhaps this time is being spent in some of the get/put loops?

The configure option name is a bear. Any suggestions?
--disable-nested-event-names?

Don, OK to commit?

    -- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 4-raw.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 4.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nested-events.diff
URL:

From don at codesourcery.com Thu Nov 2 18:26:17 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 02 Nov 2006 11:26:17 -0700
Subject: [vsipl++] [patch] Scalable SAR benchmark
In-Reply-To: <454A07DD.2080208@codesourcery.com>
References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com> <454925EC.1000200@codesourcery.com> <454A07DD.2080208@codesourcery.com>
Message-ID: <454A3849.2060004@codesourcery.com>

Jules Bergmann wrote:
> This looks good, I have three suggestions below, otherwise it looks
> good to check in.
>
> You could define these by-value versions in terms of the above
> by-reference versions:
>
> Vector out(nx);
> fftshift(in, out);
> return out;
> }
>
> Also, because you don't know what Block1 is, you should use
> 'Vector' (defaulting to a Dense block) instead of 'Vector Block1>'.
> For example, Block1 could be a subblock type which can't be
> used in this context. This affects both the defn of out, and the
> return type of the function.
>
I've addressed both of these.

>
> You could use a 'using vsip_csl::matlab::fftshift' in the function
> body.
>
>
That helps readability. Thanks for the suggestions.

Committed with the attached changes.

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftshift2.diff
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftshift2.changes
URL:

From mark at codesourcery.com Thu Nov 2 18:31:22 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 02 Nov 2006 10:31:22 -0800
Subject: [vsipl++] [patch] Nested profile event names
In-Reply-To: <454A3505.9@codesourcery.com>
References: <454A3505.9@codesourcery.com>
Message-ID: <454A397A.7040805@codesourcery.com>

Jules Bergmann wrote:
> This patch adds support for nested profile event names in the accumulate
> mode.

Nice!

> This feature is controlled by a new configure option
> (--disable-profile-accum-nest-events). It is enabled by default.

Maybe it should just be always-on, rather than configurable? Is the
overhead really a big deal?

Maybe --disable-profile-nest would be a simpler configuration name?
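
To make the cost question concrete, here is a minimal sketch of how
nested event names such as vmmul:vmul could be accumulated. It is not
the code in profile.cpp; the Accumulator, Scope, Entry and Frame types
and the toy vmul/vmmul functions are all hypothetical. The extra work
per event is one string concatenation and one stack push and pop.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::chrono::steady_clock Clock;

// One accumulated row of the profile (hypothetical layout).
struct Entry
{
  std::size_t     count;
  Clock::duration total;
  Entry() : count(0), total(Clock::duration::zero()) {}
};

// One currently open event.
struct Frame
{
  std::string       name;   // full nested name, e.g. "vmmul:vmul"
  Clock::time_point start;
};

// Hypothetical accumulate-mode profiler keyed by nested event name.
class Accumulator
{
public:
  void enter(std::string const& name)
  {
    // Nested name = enclosing event's full name + ':' + this name.
    std::string full =
      stack_.empty() ? name : stack_.back().name + ":" + name;
    stack_.push_back(Frame{full, Clock::now()});
  }

  void leave()
  {
    Frame f = stack_.back();
    stack_.pop_back();
    Entry& e = table_[f.name];
    e.count += 1;
    // The enclosing event keeps running, so this time is also counted
    // there: double booking still occurs, but the name shows where.
    e.total += Clock::now() - f.start;
  }

  std::map<std::string, Entry> const& table() const { return table_; }

private:
  std::vector<Frame>           stack_;
  std::map<std::string, Entry> table_;
};

// RAII guard so that every enter() is paired with a leave().
class Scope
{
public:
  Scope(Accumulator& acc, std::string const& name) : acc_(acc)
  { acc_.enter(name); }
  ~Scope() { acc_.leave(); }
  Scope(Scope const&) = delete;
  Scope& operator=(Scope const&) = delete;
private:
  Accumulator& acc_;
};

// Toy operations: vmmul is implemented in terms of vmul.
inline void vmul(Accumulator& acc)  { Scope s(acc, "vmul"); }
inline void vmmul(Accumulator& acc) { Scope s(acc, "vmmul"); vmul(acc); }
```

Calling vmmul once and vmul once at top level yields three rows:
vmmul, vmmul:vmul and vmul.
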
Thanks,

--
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713

From jules at codesourcery.com Thu Nov 2 18:46:20 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 02 Nov 2006 13:46:20 -0500
Subject: [vsipl++] [patch] Nested profile event names
In-Reply-To: <454A397A.7040805@codesourcery.com>
References: <454A3505.9@codesourcery.com> <454A397A.7040805@codesourcery.com>
Message-ID: <454A3CFC.5010607@codesourcery.com>

Mark Mitchell wrote:
> Jules Bergmann wrote:
>> This patch adds support for nested profile event names in the
>> accumulate mode.
>
> Nice!
>
>> This feature is controlled by a new configure option
>> (--disable-profile-accum-nest-events). It is enabled by default.
>
> Maybe it should just be always-on, rather than configurable? Is the
> overhead really a big deal?

I don't know how big of a deal the overhead for this is. There are
some overheads with nested events that would be good to reduce. This
patch will make those overheads worse, but I'm not sure by how much.
This is something we will get a better handle on as we get experience
using the profiling.

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Thu Nov 2 21:43:26 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 02 Nov 2006 14:43:26 -0700
Subject: [vsipl++] [patch] Nested profile event names
In-Reply-To: <454A3505.9@codesourcery.com>
References: <454A3505.9@codesourcery.com>
Message-ID: <454A667E.3070905@codesourcery.com>

Jules Bergmann wrote:
> This patch adds support for nested profile event names in the
> accumulate mode.
>
This is a nice feature!

> The configure option name is a bear. Any suggestions?
> --disable-nested-event-names?

Or maybe --disable-profile-nesting? I think we will prefer to leave this
enabled the vast majority of the time, so it probably is ok as is.

>
> Don, OK to commit?
>
Fine by me.
--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

From don at codesourcery.com Fri Nov 3 09:17:24 2006
From: don at codesourcery.com (Don McCoy)
Date: Fri, 03 Nov 2006 02:17:24 -0700
Subject: [vsipl++] [patch] Scalable SAR benchmark
In-Reply-To: <454659E2.3090902@codesourcery.com>
References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com>
Message-ID: <454B0924.2030703@codesourcery.com>

Jules Bergmann wrote:
> - Later: 2, 3, 9.
>
This addresses the last of these comments, with the following notes:

[2] save_view_as - Created this template function, but did not do the
pre-allocated view optimization yet (for avoiding memory allocation
during steady-state operation).

[3] load_view_as - Likewise.

[9] view_cast in viewtoraw - simplified using view_cast<>.

I'd like to do the remaining I/O changes after we get the computational
performance where we'd like it. Does that sound ok?

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: viewcast.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: viewcast.diff
URL:

From jules at codesourcery.com Fri Nov 3 16:07:24 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 03 Nov 2006 11:07:24 -0500
Subject: [vsipl++] [patch] Scalable SAR benchmark
In-Reply-To: <454B0924.2030703@codesourcery.com>
References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com> <454B0924.2030703@codesourcery.com>
Message-ID: <454B693C.7000502@codesourcery.com>

Don McCoy wrote:
> Jules Bergmann wrote:
>> - Later: 2, 3, 9.
>>
> This addresses the last of these comments, with the following notes:
>
> [2] save_view_as - Created this template function, but did not do the
> pre-allocated view optimization yet (for avoiding memory allocation
> during steady-state operation).
> [3] load_view_as - Likewise.
>
> [9] view_cast in viewtoraw - simplified using view_cast<>.
>

Don,

This looks good.

>
> I'd like to do the remaining I/O changes after we get the computational
> performance where we'd like it. Does that sound ok?

That sounds fine.

    -- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Sun Nov 5 22:32:23 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 05 Nov 2006 15:32:23 -0700
Subject: More SSAR optimizations
Message-ID: <454E6677.7090101@codesourcery.com>

The attached patch splits the Kernel 1 processing class into two parts.
The new base class is responsible for most of the setup that is
applicable to images with the same geometry. Its constructor also
computes the dimensions of the final output image. The benefit to the
derived class is that it can now pre-allocate the remaining memory
needed for processing, including the creation of the Fftm objects, which
includes a potentially lengthy planning process.

Also of note, this "pre-processing" phase allows two equations to be
reduced (at run-time that is) to simple multiplications, which can then
be vectorized by the SIMD unit. See equations 62 and 68. The setup for
these equations is expensive in part because they involve two
vector-matrix multiplies (one along the rows and one along the columns)
which results in a hard-to-optimize memory access pattern. As this
portion is now done outside the computational loop, the cost is less of
an issue. It should be possible to use the resulting matrices (that I'm
correctly calling 'filters') on any incoming radar data.

An explicit loop at eq. 65 was also removed.

The good news: These simple changes realized a 1.5x performance
improvement over the current (SVN head) version!

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: k1_base.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: k1_base.diff
URL:

From jules at codesourcery.com Mon Nov 6 13:19:06 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 06 Nov 2006 08:19:06 -0500
Subject: [vsipl++] More SSAR optimizations
In-Reply-To: <454E6677.7090101@codesourcery.com>
References: <454E6677.7090101@codesourcery.com>
Message-ID: <454F364A.3010203@codesourcery.com>

Don,

This looks good, please check it in.

    thanks,
    -- Jules

Don McCoy wrote:
> The attached patch splits the Kernel 1 processing class into two parts.
> The new base class is responsible for most of the setup that is
> applicable to images with the same geometry. Its constructor also
> computes the dimensions of the final output image. The benefit to the
> derived class is that it can now pre-allocate the remaining memory
> needed for processing, including the creation of the Fftm objects, which
> includes a potentially lengthy planning process.
>
> Also of note, this "pre-processing" phase allows two equations to be
> reduced (at run-time that is) to simple multiplications, which can then
> be vectorized by the SIMD unit. See equations 62 and 68. The setup for
> these equations is expensive in part because they involve two
> vector-matrix multiplies (one along the rows and one along the columns)
> which results in a hard-to-optimize memory access pattern. As this
> portion is now done outside the computational loop, the cost is less of
> an issue. It should be possible to use the resulting matrices (that I'm
> correctly calling 'filters') on any incoming radar data.
>
> An explicit loop at eq. 65 was also removed.
>
> The good news: These simple changes realized a 1.5x performance
> improvement over the current (SVN head) version!
>

Woo-hoo!
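
The pre-computation described in the quoted message can be illustrated
with a small sketch. It assumes, purely for illustration, that the
per-frame work is a scaling of each row by a vector u combined with a
scaling of each column by a vector v; equations 62 and 68 themselves
are not reproduced here, and Filter, u and v are hypothetical names.
The outer product is formed once at setup, leaving one element-wise
multiply per frame.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pre-computed filter for a rows x cols frame stored in
// row-major order. Assumed per-frame operation:
//   out(i, j) = u[i] * in(i, j) * v[j]
class Filter
{
public:
  // Setup (done once): f(i, j) = u[i] * v[j]. This is the part with
  // the awkward access pattern, now outside the processing loop.
  Filter(std::vector<double> const& u, std::vector<double> const& v)
    : rows_(u.size()), cols_(v.size()), f_(u.size() * v.size())
  {
    for (std::size_t i = 0; i < rows_; ++i)
      for (std::size_t j = 0; j < cols_; ++j)
        f_[i * cols_ + j] = u[i] * v[j];
  }

  // Steady state (done per frame): a single unit-stride element-wise
  // multiply, which a compiler or vendor library can vectorize.
  void apply(std::vector<double> const& in, std::vector<double>& out) const
  {
    for (std::size_t k = 0; k < f_.size(); ++k)
      out[k] = f_[k] * in[k];
  }

  std::size_t size() const { return rows_ * cols_; }

private:
  std::size_t         rows_;
  std::size_t         cols_;
  std::vector<double> f_;
};
```

The trade-off is memory: the filter occupies as much storage as one
frame, in exchange for removing the two scaling passes from every
iteration.
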
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Mon Nov 6 18:12:58 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 06 Nov 2006 11:12:58 -0700
Subject: [patch] SSAR interpolation loop
Message-ID: <454F7B2A.7030209@codesourcery.com>

This patch gets a 6x speedup in the interpolation loop over the previous
version, again using the same methodology of pre-computing values where
possible.

The memory footprint is worth considering at this point. For example,
this computation stores a cube of 1072 x 1144 x 17 doubles for this
particular problem size. As the problem scales, the speedup may not be
worth the trade-off in memory consumed.

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interp.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interp.diff
URL:

From jules at codesourcery.com Wed Nov 8 14:22:40 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 08 Nov 2006 09:22:40 -0500
Subject: [patch] Fix bug leaving PAS psets open; Add MCOE TMR timer support
Message-ID: <4551E830.307@codesourcery.com>

This patch fixes a bug where PAS psets were being left open,
eventually confusing PAS about which psets had been opened by the
process. It also uses VSIP_IMPL_CHECK_RC to check PAS return codes
instead of assert, so that better failure messages are given.

This patch also adds support for the MCOE TMR timer.

This patch puts the profiling events in Setup_assign under control of
the VSIP_IMPL_PROFILER macro so they can be disabled.

Finally, this patch has several benchmark improvements, changes, and
additions.

Patch applied.

    -- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pas.diff
URL:

From don at codesourcery.com Tue Nov 14 01:47:05 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 13 Nov 2006 18:47:05 -0700
Subject: [patch] Misc. SSAR Speedups
Message-ID: <45592019.8040005@codesourcery.com>

This patch results in about a 15% decrease in runtime overall (in the
computational loop) relative to the current version.

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fft_ref.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fft_ref.diff
URL:

From jules at codesourcery.com Tue Nov 14 13:55:40 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 14 Nov 2006 08:55:40 -0500
Subject: [vsipl++] [patch] Misc. SSAR Speedups
In-Reply-To: <45592019.8040005@codesourcery.com>
References: <45592019.8040005@codesourcery.com>
Message-ID: <4559CADC.5000705@codesourcery.com>

Don McCoy wrote:
> This patch results in about a 15% decrease in runtime overall (in the
> computational loop) relative to the current version.
>
> Regards,

Don, this looks good, please check it in, if not already.

    -- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From jules at codesourcery.com Tue Nov 14 18:32:35 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 14 Nov 2006 13:32:35 -0500
Subject: [patch] IPP dispatch for mag()
Message-ID: <455A0BC3.9070906@codesourcery.com>

Patch applied.

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mag.diff
URL:

From jules at codesourcery.com Wed Nov 15 02:42:34 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 14 Nov 2006 21:42:34 -0500
Subject: [patch] Profiling bits
Message-ID: <455A7E9A.4090600@codesourcery.com>

This patch:
 - adds row/col to Fftm event names to indicate which type of
   Fft is being done.
 - adds info to the name returned by the Transpose_tag evaluator
   name() to distinguish between copy and transpose.
 - adds a '-sum' option to fmt-profile to sum the times for
   events nested under events with 0 operations (and compute
   the mflop/s).
 - adds a '-extra ' option to fmt-profile to add synthetic
   events with the extra time not accounted for in nested
   events.

   The rationale for requiring the event name to be specified
   is that many nested library events will have unaccounted for
   time (e.g. Ffts with scaling).

An example profile output with '-sum' and '-extra "Kernel1 total"'
options is attached.

Don, is this OK to commit?

    -- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prof.diff
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: profile.txt
URL:

From don at codesourcery.com Wed Nov 15 03:06:37 2006
From: don at codesourcery.com (Don McCoy)
Date: Tue, 14 Nov 2006 20:06:37 -0700
Subject: [vsipl++] [patch] Profiling bits
In-Reply-To: <455A7E9A.4090600@codesourcery.com>
References: <455A7E9A.4090600@codesourcery.com>
Message-ID: <455A843D.5040004@codesourcery.com>

Jules Bergmann wrote:
> Don, is this OK to commit?

Looks good to me -- and useful.
Thanks,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

From don at codesourcery.com Wed Nov 15 20:06:55 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 15 Nov 2006 13:06:55 -0700
Subject: [vsipl++-csl] [patch] Fix Fftm ops count
In-Reply-To: <455B715C.8050506@codesourcery.com>
References: <455B715C.8050506@codesourcery.com>
Message-ID: <455B735F.8030701@codesourcery.com>

Jules Bergmann wrote:
> Ok to check in?
>
I thought you were going to have to pass both sizes and the axis down to
a heavily modified Op_count function. This is better.

Ok by me.

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

From jules at codesourcery.com Fri Nov 17 20:04:09 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 17 Nov 2006 15:04:09 -0500
Subject: [patch] SSAR: by-row/col processor for digital_spotlighting
Message-ID: <455E15B9.2080808@codesourcery.com>

Patch applied.

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssar.diff
URL:

From don at codesourcery.com Sat Nov 18 23:13:28 2006
From: don at codesourcery.com (Don McCoy)
Date: Sat, 18 Nov 2006 16:13:28 -0700
Subject: [patch] SSAR refactoring
Message-ID: <455F9398.3010302@codesourcery.com>

Ok to commit?

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rmtmps.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rmtmps.diff
URL:

From jules at codesourcery.com Mon Nov 20 02:34:05 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 19 Nov 2006 21:34:05 -0500
Subject: [vsipl++] [patch] SSAR refactoring
In-Reply-To: <455F9398.3010302@codesourcery.com>
References: <455F9398.3010302@codesourcery.com>
Message-ID: <4561141D.3020201@codesourcery.com>

Don McCoy wrote:
> Ok to commit?

Yes, this looks good. Please check it in.

    thanks
    -- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Tue Nov 21 21:04:42 2006
From: don at codesourcery.com (Don McCoy)
Date: Tue, 21 Nov 2006 14:04:42 -0700
Subject: [patch] SSAR Interpolation
Message-ID: <456369EA.1080908@codesourcery.com>

This patch changes the processing order of the interpolation loop to
work on columns first and rows second. This entailed switching all
views in that loop to use column-major storage and adding an explicit
transpose to get it in the right format for processing. Just after this
loop, the order of the FFTs is reversed to take advantage of the new
ordering -- keeping the net processing time for them the same.

The change results in a 2x speedup for the interpolation loop! That
translates to a 25% increase overall, at the cost of an additional
transpose.

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interp2.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interp2.diff
URL:

From jules at codesourcery.com Wed Nov 22 13:35:37 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 22 Nov 2006 08:35:37 -0500
Subject: [vsipl++] [patch] SSAR Interpolation
In-Reply-To: <456369EA.1080908@codesourcery.com>
References: <456369EA.1080908@codesourcery.com>
Message-ID: <45645229.8010106@codesourcery.com>

Don McCoy wrote:
> This patch changes the processing order of the interpolation loop to
> work on columns first and rows second. This entailed switching all
> views in that loop to use column-major storage and adding an explicit
> transpose to get it in the right format for processing. Just after this
> loop, the order of the FFTs is reversed to take advantage of the new
> ordering -- keeping the net processing time for them the same.
>
> The change results in a 2x speedup for the interpolation loop! That
> translates to a 25% increase overall, at the cost of an additional
> transpose.

Sweet! 2x is good :)

The white-paper fodder here is that it is easy to experiment with
different dimension-orderings primarily by changing the matrix decls.

Also, thinking out loud, in parallel the transposes will be more
expensive, which might alter this tradeoff of an extra transpose. We'll
cross that bridge when we get there.

One minor comment below, please check this in.

thanks,
-- Jules

> + Tensor > SINC_HAM_;

[x] col2_type happens to work, but this is undefined behavior.
I.e. 'col2_type = tuple<1, 0, undefined>', where 'undefined' happens
to be 2.

Please use an explicit 'tuple<1, 0, 2>' instead.
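
To illustrate the comment above, here is a minimal, self-contained sketch
of why all three indices of a 3-D dimension order need to be spelled out.
The names dim_order, extent3 and stride() are hypothetical stand-ins
written for this note; they are not VSIPL++ types or functions. The sketch
assumes a dense layout in which the first index names the slowest-varying
dimension and the last index names the unit-stride dimension.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 3-D dimension-order tag (NOT the VSIPL++
// tuple type). Assumption: D0 is the slowest-varying dimension and D2
// is the unit-stride (fastest-varying) dimension.
template <std::size_t D0, std::size_t D1, std::size_t D2>
struct dim_order
{
  static constexpr std::size_t slowest = D0;
  static constexpr std::size_t middle  = D1;
  static constexpr std::size_t fastest = D2;
};

// Extents of a 3-D block, indexed by dimension number.
struct extent3
{
  std::size_t size[3];
};

// Stride, in elements, of dimension d for a dense block laid out
// according to Order. All three indices of Order are needed: if the
// last one is not pinned down, neither is the unit-stride dimension.
template <typename Order>
constexpr std::size_t stride(extent3 const& e, std::size_t d)
{
  return d == Order::fastest ? 1
       : d == Order::middle  ? e.size[Order::fastest]
       :                       e.size[Order::fastest] * e.size[Order::middle];
}
```

With extents 4 x 5 x 6, stride<dim_order<1, 0, 2> > gives dimension 1 a
stride of 24, dimension 0 a stride of 6 and dimension 2 a stride of 1.
A two-index order such as <1, 0> says nothing about which of the three
dimensions ends up with unit stride.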

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Wed Nov 22 16:54:13 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 22 Nov 2006 09:54:13 -0700
Subject: [vsipl++] [patch] SSAR Interpolation
In-Reply-To: <45645229.8010106@codesourcery.com>
References: <456369EA.1080908@codesourcery.com>
 <45645229.8010106@codesourcery.com>
Message-ID: <456480B5.8070007@codesourcery.com>

Jules Bergmann wrote:
> > + Tensor > SINC_HAM_;
>
> [x] col2_type happens to work, but this is undefined behavior.
> I.e. 'col2_type = tuple<1, 0, undefined>', where 'undefined' happens
> to be 2.
>
> Please use an explicit 'tuple<1, 0, 2>' instead.
>
>

Done. Committed.

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

From don at codesourcery.com Wed Nov 22 20:29:09 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 22 Nov 2006 13:29:09 -0700
Subject: SSAR cleanup
Message-ID: <4564B315.5030904@codesourcery.com>

This patch cleans up a few minor issues as well as improving the output
by giving some processing time statistics (useful when running with
-loop N > 1).

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cleanup.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cleanup.diff
URL:

From jules at codesourcery.com Wed Nov 22 21:26:45 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 22 Nov 2006 16:26:45 -0500
Subject: [vsipl++] SSAR cleanup
In-Reply-To: <4564B315.5030904@codesourcery.com>
References: <4564B315.5030904@codesourcery.com>
Message-ID: <4564C095.8060509@codesourcery.com>

Don McCoy wrote:
> This patch cleans up a few minor issues as well as improving the output
> by giving some processing time statistics (useful when running with
> -loop N > 1).

Don,

This looks good, please check it in.

thanks,
-- Jules

From don at codesourcery.com Mon Nov 27 07:11:32 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 27 Nov 2006 00:11:32 -0700
Subject: [patch] SSAR Make targets
Message-ID: <456A8FA4.9080802@codesourcery.com>

This patch adds a couple of switches to the SSAR makefile as well as a
README file that (as a starting point) describes how to run the
benchmark.

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssarmake.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssarmake.diff
URL:

From jules at codesourcery.com Tue Nov 28 03:03:23 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 27 Nov 2006 22:03:23 -0500
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456A8FA4.9080802@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
Message-ID: <456BA6FB.9000204@codesourcery.com>

Don McCoy wrote:
> This patch adds a couple of switches to the SSAR makefile as well as a
> README file that (as a starting point) describes how to run the benchmark.
>

Don, this looks good, please check it in.

-- Jules

> +
> + The makefile assumes the default install location of /usr/local. If
> + Sourcery VSIPL++ is installed in a non-standard location, edit the
> + 'prefix=...' line in the makefile or pass the correct value on
> + the command line ('make prefix=/path/to/vsipl++').

This is fine for now, but ideally pkg-config lets us not worry about
where the library is installed, if people set PKG_CONFIG_PATH properly.

"By default the makefile uses the VSIPL++ package found by pkg-config in
the PKG_CONFIG_PATH. However, if the prefix variable is set, either by
editing the makefile or from the command line, then the makefile uses
the library installed in that location."

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Tue Nov 28 06:25:37 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 27 Nov 2006 23:25:37 -0700
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456BA6FB.9000204@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
 <456BA6FB.9000204@codesourcery.com>
Message-ID: <456BD661.6070907@codesourcery.com>

Jules Bergmann wrote:
>> +
>> + The makefile assumes the default install location of /usr/local. If
>> + Sourcery VSIPL++ is installed in a non-standard location, edit the
>> + 'prefix=...' line in the makefile or pass the correct value on
>> + the command line ('make prefix=/path/to/vsipl++').
>
> This is fine for now, but ideally pkg-config lets us not worry about
> where the library is installed, if people set PKG_CONFIG_PATH properly.
>
> ...
>

If I understand correctly, we intend that leaving 'prefix' blank should
work normally, but allow it to be overridden if the user so desires. Is
that correct? If so, I will adjust.

However, I'm not sure why we need to override prefix in the .pc file
(--define-variable=prefix=...). If prefix is not set, this leads to an
error.

  PKG_CONFIG_PATH=/lib/pkgconfig pkg-config vsipl++-ser-builtin-32 --define-variable=prefix= --cflags
  --define-variable argument does not have a value for the variable
  make: *** [vars] Error 1

Would it be correct to leave that out since the .pc files already have
the correct value for prefix? I am assuming an install from a binary
package. In what case would we need to override 'prefix'?

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

From jules at codesourcery.com Tue Nov 28 20:29:40 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 28 Nov 2006 15:29:40 -0500
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456BD661.6070907@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
 <456BA6FB.9000204@codesourcery.com>
 <456BD661.6070907@codesourcery.com>
Message-ID: <456C9C34.2000909@codesourcery.com>

> Would it be correct to leave that out since the .pc files already have
> the correct value for prefix? I am assuming an install from a binary
> package. In what case would we need to override 'prefix'?
>

If the prefix in the .pc file is correct then you don't need to override
'prefix'. This happens when you install the library in /usr/local, or
when you install the library somewhere else and run the 'set-prefix.sh'
script to correct the prefixes.

If you install the library somewhere else and do not run 'set-prefix.sh'
(or otherwise correct the prefixes), then it's necessary to override the
prefix.

-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From don at codesourcery.com Thu Nov 30 00:28:42 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 29 Nov 2006 17:28:42 -0700
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456C9C34.2000909@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
 <456BA6FB.9000204@codesourcery.com>
 <456BD661.6070907@codesourcery.com>
 <456C9C34.2000909@codesourcery.com>
Message-ID: <456E25BA.3000603@codesourcery.com>

Jules Bergmann wrote:
> If the prefix in the .pc file is correct then you don't need to
> override 'prefix'. This happens when you install the library in
> /usr/local, or when you install the library somewhere else and run the
> 'set-prefix.sh' script to correct the prefixes.
>
> If you install the library somewhere else and do not run
> 'set-prefix.sh' (or otherwise correct the prefixes), then it's
> necessary to override the prefix.
>
>

Ah, thank you for explaining.

An adjusted patch that takes account of whether prefix is set is
attached. The README is updated slightly as well. Committed.

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssar2.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssar2.diff
URL:

From stefan at codesourcery.com Thu Nov 30 00:48:53 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 29 Nov 2006 19:48:53 -0500
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456E25BA.3000603@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
 <456BA6FB.9000204@codesourcery.com>
 <456BD661.6070907@codesourcery.com>
 <456C9C34.2000909@codesourcery.com>
 <456E25BA.3000603@codesourcery.com>
Message-ID: <456E2A75.3040903@codesourcery.com>

Don McCoy wrote:

> Modified: csl/vpp/trunk/apps/ssar/GNUmakefile
> ==============================================================================
> +# The default precision is single (double may also be used)
> +precision = single
> +
> +ifeq ($(precision),double)
> +ref_image_base = ref_image_dp
> +ssar_type = SSAR_BASE_TYPE=double
> +else
> +ref_image_base = ref_image_sp
> +ssar_type = SSAR_BASE_TYPE=float
> +endif
> +

What is the effect of "ssar_type = SSAR_BASE_TYPE=double" in a (GNU) makefile ?
We usually use the ':=' assignment operator whenever possible. The only reason
not to do that is if we have to evaluate the assignment lazily (i.e. because it
refers to an expression of variables with yet unknown values).

> Index: apps/ssar/load_save.hpp
> ===================================================================
> --- apps/ssar/load_save.hpp (revision 0)
> +++ apps/ssar/load_save.hpp (revision 0)
> @@ -0,0 +1,114 @@
> +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */
> +
> +/** @file load_save.hpp
> + @author Don McCoy
> + @date 2006-10-26
> + @brief Extensions to allow type double to be used as the view
> + data type while using float as the storage type on disk.
> +*/
> +
> +#ifndef LOAD_SAVE_HPP
> +#define LOAD_SAVE_HPP
> +
> +#include
> +#include
> +
> +using namespace vsip_csl;

Please never, ever, use a 'using namespace' declaration in global scope in a header.
I understand that here it isn't dangerous since this is not a library, but it is still
confusing, and error-prone.

> Index: apps/ssar/diffview.cpp
> ===================================================================
> --- apps/ssar/diffview.cpp (revision 0)
> +++ apps/ssar/diffview.cpp (revision 0)
> @@ -0,0 +1,110 @@
> +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */
> +
> +/** @file diffview.cpp
> + @author Don McCoy
> + @date 2006-10-29
> + @brief Utility to compare VSIPL++ views to determine equality
> +*/
> +
> +#include
> +#include
> +
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +
> +
> +using namespace vsip;
> +using namespace vsip_csl;
> +using namespace std;
> +
> +
> +enum data_format_type
> +{
> + COMPLEX_VIEW = 0,
> + REAL_VIEW,
> + INTEGER_VIEW
> +};
> +
> +void compare(data_format_type format,
> + char const* infile, char const* ref, length_type rows, length_type cols);
> +
> +int
> +main(int argc, char** argv)
> +{
> + vsip::vsipl init(argc, argv);
> +
> + if (argc < 5 || argc > 6)
> + {
> + fprintf(stderr, "Usage: %s [-rn] \n",
> + argv[0]);

Why not

  std::cerr << "Usage: " << argv[0] << "[-rn] " << std::endl;

?

Now we are using both, std::iostreams, as well as stdio. I think we should use one consistently.
That may even reduce the size of the program.

Thanks,
Stefan

--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718

From don at codesourcery.com Thu Nov 30 02:31:41 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 29 Nov 2006 19:31:41 -0700
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456E2A75.3040903@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
 <456BA6FB.9000204@codesourcery.com>
 <456BD661.6070907@codesourcery.com>
 <456C9C34.2000909@codesourcery.com>
 <456E25BA.3000603@codesourcery.com>
 <456E2A75.3040903@codesourcery.com>
Message-ID: <456E428D.6020607@codesourcery.com>

Stefan Seefeld wrote:
> What is the effect of "ssar_type = SSAR_BASE_TYPE=double" in a (GNU) makefile ?
> We usually use the ':=' assignment operator whenever possible. The only reason
> not to do that is if we have to evaluate the assignment lazily (i.e. because it
> refers to an expression of variables with yet unknown values).
>
>
>

Corrected.

>
>> Index: apps/ssar/load_save.hpp
>> ===================================================================
>>
>>
> Please never, ever, use a 'using namespace' declaration in global scope in a header.
> I understand that here it isn't dangerous since this is not a library, but it is still
> confusing, and error-prone.
>
>

This was done in a number of cases during development, but I've tried to
weed them all out. I hope this was one of the last... load_save.hpp is
no longer used.

>> Index: apps/ssar/diffview.cpp
>>
>>
> Why not
>
>
> std::cerr << "Usage: " << argv[0] << "[-rn] " << std::endl;
>
> ?
>
> Now we are using both, std::iostreams, as well as stdio. I think we should use one consistently.
>

I agree and believe it's already been changed. Do you have an old copy?
In any case, thanks for catching these things! I found it was including
"stdlib.h", so I removed that.

The attached patch fixes these two things. Committed.

Regards,

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssarmake3.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssarmake3.diff
URL:

From don at codesourcery.com Thu Nov 30 04:08:32 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 29 Nov 2006 21:08:32 -0700
Subject: [vsipl++] [patch] SSAR Make targets
In-Reply-To: <456E25BA.3000603@codesourcery.com>
References: <456A8FA4.9080802@codesourcery.com>
 <456BA6FB.9000204@codesourcery.com>
 <456BD661.6070907@codesourcery.com>
 <456C9C34.2000909@codesourcery.com>
 <456E25BA.3000603@codesourcery.com>
Message-ID: <456E5940.4010602@codesourcery.com>

Don McCoy wrote:
> Jules Bergmann wrote:
>> If the prefix in the .pc file is correct then you don't need to
>> override 'prefix'. This happens when you install the library in
>> /usr/local, or when you install the library somewhere else and run
>> the 'set-prefix.sh' script to correct the prefixes.
>>
>> If you install the library somewhere else and do not run
>> 'set-prefix.sh' (or otherwise correct the prefixes), then it's
>> necessary to override the prefix.
>>
>>
>
> Ah, thank you for explaining.
> An adjusted patch that takes account of whether prefix is set is
> attached. The README is updated slightly as well. Committed.

The attachment to this message contained the wrong patch. It was
committed correctly. What should have been attached, now is.

--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssarmake2.changes
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ssarmake2.diff
URL: