From jules at codesourcery.com Tue Aug 1 12:19:28 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 01 Aug 2006 08:19:28 -0400 Subject: [patch] Add SIMD operations for logical operations, optimize distributed get Message-ID: <44CF46D0.1000806@codesourcery.com> This patch: - Updates configure to support both SIMD loop fusion and SIMD builtin routines. The intent is that as SIMD loop fusion performance improves, SIMD builtin routines will either decrease in number or go away altogether. The dispatch tag for SIMD loop fusion is 'Simd_loop_fusion_tag'. The dispatch tag for SIMD builtin routines is 'Simd_builtin_tag'. The old tag 'Simd_tag' has gone away to avoid confusion. To configure the library to use SIMD loop fusion, use: --enable-simd-loop-fusion To configure the library to use the generic SIMD builtin routines, use: --enable-builtin-simd-routines=generic Currently SIMD loop fusion is disabled by default (so that we can make a snapshot release), but the intent is for it to be enabled by default. - Adds generic SIMD routines for logic operations ({b,l},{and,or,xor,not}) and greater-than comparison (gt()). These routines work with Intel SSE when using GCC 3.4, and with PowerPC altivec when using GreenHills. - Extends test coverage for these logic operators. - Optimizes distributed get() to avoid communication when running on a single processor, and when data is globally replicated. - Un-reverts the FFTW changes in vendor/GNUmakefile.inc.in This patch is being tested as part of making a snapshot. 
So far, things look good: /scratch/jules/release-snapshot/log-test-ParallelIntel64 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-ParallelIntel64 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialBuiltin32 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialBuiltin32 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialBuiltinAMD64 ( unknown): 133 / 150 /scratch/jules/release-snapshot/log-test-SerialBuiltinAMD64 ( unknown): 148 / 150 /scratch/jules/release-snapshot/log-test-SerialBuiltinEM64T ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialBuiltinEM64T ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialIntel32 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialIntel32 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialIntel64 ( unknown): 149 / 150 /scratch/jules/release-snapshot/log-test-SerialIntel64 ( unknown): 149 / 150 (The 1 failure for the non-AMD64 cases is due to a test that needs to be linked with -lvsip_csl. The AMD64 failures are expected.) -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: simd-logic.diff URL: From assem at codesourcery.com Tue Aug 1 14:47:51 2006 From: assem at codesourcery.com (Assem Salama) Date: Tue, 01 Aug 2006 10:47:51 -0400 Subject: configure.ac Message-ID: <44CF6997.4040109@codesourcery.com> Everyone, This is my patch to configure.ac to allow for simple-builtin option and to take into account the new variables that the vendor makefile uses. Thanks, Assem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: svn.diff.0812006.log URL: From jules at codesourcery.com Tue Aug 1 14:58:46 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 01 Aug 2006 10:58:46 -0400 Subject: [vsipl++] configure.ac In-Reply-To: <44CF6997.4040109@codesourcery.com> References: <44CF6997.4040109@codesourcery.com> Message-ID: <44CF6C26.5080003@codesourcery.com> Assem Salama wrote: > Everyone, > This is my patch to configure.ac to allow for simple-builtin option and > to take into account the new variables that the vendor makefile uses. Assem, This looks good. I have two comments below about how libF77 is handled. Once those are in shape, please check it in. thanks, -- Jules > + ln -s ../../clapack/F2CLIBS/libF77/libF77.a vendor/atlas/lib/libF77.a It is no longer necessary to create a symbolic link. vendor/GNUmakefile.inc.in now copies libF77.a into the lib/ subdirectory. This is similar to how we handle the FFTW libraries, and eventually we'll handle all the Lapack libraries this way too. > + INT_LDFLAGS="$INT_LDFLAGS -L$curdir/vendor/clapack/F2CLIBS/libF77" This shouldn't be necessary either, for the same reason as above. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From assem at codesourcery.com Tue Aug 1 15:05:32 2006 From: assem at codesourcery.com (Assem Salama) Date: Tue, 01 Aug 2006 11:05:32 -0400 Subject: [vsipl++] configure.ac In-Reply-To: <44CF6C26.5080003@codesourcery.com> References: <44CF6997.4040109@codesourcery.com> <44CF6C26.5080003@codesourcery.com> Message-ID: <44CF6DBC.5060402@codesourcery.com> I have now checked in this patch. Jules Bergmann wrote: > Assem Salama wrote: > > Everyone, > > This is my patch to configure.ac to allow for simple-builtin option > and > > to take into account the new variables that the vendor makefile uses. > > Assem, > > This looks good. I have two comments below about how libF77 is handled. > Once those are in shape, please check it in. 
> > thanks, > -- Jules > > > > + ln -s ../../clapack/F2CLIBS/libF77/libF77.a > vendor/atlas/lib/libF77.a > > It is no longer necessary to create a symbolic link. > vendor/GNUmakefile.inc.in now copies libF77.a into the lib/ > subdirectory. > > This is similar to how we handle the FFTW libraries, and eventually > we'll handle all the Lapack libraries this way too. > > > > > + INT_LDFLAGS="$INT_LDFLAGS > -L$curdir/vendor/clapack/F2CLIBS/libF77" > > This shouldn't be necessary either, for the same reason as above. > > > From don at codesourcery.com Tue Aug 1 23:38:40 2006 From: don at codesourcery.com (Don McCoy) Date: Tue, 01 Aug 2006 17:38:40 -0600 Subject: [vsipl++] configure.ac patch for Athlon In-Reply-To: <23738f080607292049p25ae0068r98bf95e92b7075ee@mail.gmail.com> References: <23738f080607292049p25ae0068r98bf95e92b7075ee@mail.gmail.com> Message-ID: <44CFE600.4000403@codesourcery.com> Sashan Govender wrote: > Hi > > Tried to compile vsipl++ on my AMD Athlon and configure failed. I've > attached a patch for vendor/atlas/configure.ac. > I had a chance to test this today and added a few minor changes to your patch (giving credit where it is due, of course!). Please find the revised patch attached. Thank you for finding this defect and bringing it to our attention. Taking the time to investigate and post a patch is very much appreciated! Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: va.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: va.diff URL: From jules at codesourcery.com Fri Aug 4 16:50:53 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 04 Aug 2006 12:50:53 -0400 Subject: [patch] Expr_ops_per_point Message-ID: <44D37AED.2020801@codesourcery.com> This patch adds the reduction to count the number of ops/point for an expression. 
It needs to be extended (and tested) to deal with other operator types besides binary + and *, but the basic functionality is there. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: opp.diff URL: From jules at codesourcery.com Fri Aug 4 18:32:17 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 04 Aug 2006 14:32:17 -0400 Subject: [patch] Optimize logic functions Message-ID: <44D392B1.9050907@codesourcery.com> This patch ... ... modifies configure and dispatch so that the new SIMD loop fusion and old SIMD generic routines can both be used. ... introduces generic SIMD routines for all of the logic functions (band, bor, bxor, bnot, land, lor, lxor, and lnot) and one comparison function (gt). ... optimizes distributed get performance for several cases (single processor and when data is globally replicated). ... cleans up several build problems with FFTW rules. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mc-release.diff URL: From jules at codesourcery.com Mon Aug 7 17:28:04 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 07 Aug 2006 13:28:04 -0400 Subject: [vsipl++] [patch] Profiling for IIR, FIR and matrix-vector functions In-Reply-To: <44CA2E84.6070402@codesourcery.com> References: <44CA2E84.6070402@codesourcery.com> Message-ID: <44D77824.4060606@codesourcery.com> Don McCoy wrote: > This patch also reorganizes some of the description and operation > counting functions to one place and puts them under a namespace matching > their section name from the specification. For example, 'dot', 'outer' > and other matrix-vector helper functions are under the 'impl::matvec' > namespace. 
Signal processing helper functions, including the > Convolution and Correlation functions, are now under 'impl::signal' > namespace. > > This reorganization is helpful because it keeps all of these related > functions in one place, which should be easier for maintenance. Note, > FFT helper functions and some of the operations counting functions have > not yet been moved either, pending approval of the current changes. > > Two miscellaneous fixes are included: A change to the benchmarks > makefile skips building MPI benchmarks when not configured with MPI. > Second, a benchmark missed getting updated due to the change in location > of the ops_info.hpp header file. Don, This looks good. I have a couple of comments below: - I think that 'impl::Length' would be more efficient than 'Domain' for passing view sizes to the Op_count_xyz::value() functions. - The template parameter to Domain is a 'dimension_type'. To be correct, the template parameters for Op_count_xyz classes that take a dimension should also use 'dimension_type'. (The same would be true if you switch to impl::Length). Please have a look to see if those make sense. Otherwise it looks good, please check it in. Also, I will rename the MPI specific benchmarks to use the same naming convention as IPP and SAL specific benchmarks. -- Jules [1] 'dimension_type' should be used for dimensions (such as D). Likewise for several template declarations below. > +template > +struct Description > +{ > + static std::string tag(const char* op, length_type size) > + { > + std::ostringstream st; > + st << op << " " << Desc_datatype::value() << " "; > + > + st.width(7); > + st << size; > + > + return st.str(); > + } > + > + static std::string tag(const char* op, Domain const &dom_kernel, > + Domain const &dom_output) > + { > + std::ostringstream st; > + st << op << " " > + << D << "D " > + << Desc_datatype::value() << " "; > + > + st.width(4); > + st << dom_kernel[0].size(); > + st.width(1); > + st << "x" << (D == 2 ? 
dom_kernel[1].size() : 1) << " "; > + > + st.width(7); > + st << dom_output[0].size(); > + st.width(1); > + st << "x" << (D == 2 ? dom_output[1].size() : 1); > + > + return st.str(); > + } > +}; > + > +} // namespace signal > + > + > +namespace matvec > +{ > +template > +struct Op_count_dot > +{ > + static length_type value(Domain<1> const &dom) [2] Given the way these functions are called, it will probably be more efficient to pass the size as a 'length_type' or a 'impl::Length' instead of a 'Domain'. A Domain also encodes offset and stride fields that aren't used in the op-count calculation. Because the Domain is being passed by reference, it is possible that the compiler could figure out that the offset and stride aren't used and avoid creating them, but I don't think we can rely on that. > + { > + length_type count = dom[0].size() * Ops_info::mul; > + if ( dom[0].size() > 1 ) > + count += (dom[0].size() - 1) * Ops_info::add; > + return count; > + } > +}; > @@ -545,18 +573,13 @@ > const_Vector v, > const_Vector w) VSIP_NOTHROW > { > - typedef typename Promotion::type return_type; > + typedef typename Promotion::type result_type; > + Domain<1> dom_v( view_domain(v) ); > + impl::profile::Scope_event event( > + impl::matvec::Description::tag("dot", dom_v), > + impl::matvec::Op_count_dot::value(dom_v) ); [3] If you change the Op_count_dot::value to accept a Length, you can use the 'extent()' function to get the size of the view as a Length. 
This becomes: impl::profile::Scope_event event( impl::matvec::Description::tag("dot", dom_v), impl::matvec::Op_count_dot::value(extent(v)) ); > Index: benchmarks/GNUmakefile.inc.in > =================================================================== > --- benchmarks/GNUmakefile.inc.in (revision 145922) > +++ benchmarks/GNUmakefile.inc.in (working copy) > @@ -41,6 +41,7 @@ > $(srcdir)/benchmarks/qrd.cpp > benchmarks_cxx_srcs_ipp := $(wildcard $(srcdir)/benchmarks/*_ipp.cpp) > benchmarks_cxx_srcs_sal := $(wildcard $(srcdir)/benchmarks/*_sal.cpp) > +benchmarks_cxx_srcs_mpi := $(wildcard $(srcdir)/benchmarks/mpi_*.cpp) [4] I will rename the mpi only benchmarks to match the convention. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Tue Aug 8 19:08:07 2006 From: don at codesourcery.com (Don McCoy) Date: Tue, 08 Aug 2006 13:08:07 -0600 Subject: [vsipl++] [patch] Profiling for IIR, FIR and matrix-vector functions In-Reply-To: <44D77824.4060606@codesourcery.com> References: <44CA2E84.6070402@codesourcery.com> <44D77824.4060606@codesourcery.com> Message-ID: <44D8E117.40704@codesourcery.com> Jules Bergmann wrote: > > - I think that 'impl::Length' would be more efficient than 'Domain' > for passing view sizes to the Op_count_xyz::value() functions. > > - The template parameter to Domain is a 'dimension_type'. To be correct, > the template parameters for Op_count_xyz classes that take a dimension > should also use 'dimension_type'. (The same would be true if you switch > to impl::Length). > > Please have a look to see if those make sense. Otherwise it looks > good, please check it in. > Yes. Thanks for pointing that out. Committed with changes as noted below. > Also, I will rename the MPI specific benchmarks to use the same naming > convention as IPP and SAL specific benchmarks. > Sounds good. > [1] 'dimension_type' should be used for dimensions (such as D). > Likewise for several template declarations below. 
> Changed. > > +namespace matvec > > +{ > > +template > > +struct Op_count_dot > > +{ > > + static length_type value(Domain<1> const &dom) > > [2] Given the way these functions are called, it will probably be more > efficient to pass the size as a 'length_type' or a 'impl::Length' > instead of a 'Domain'. A Domain encodes has offset and stride fields > that aren't used in the op-count calculation. > All these use Length in place of Domain now. > > const_Vector v, > > const_Vector w) VSIP_NOTHROW > > { > > - typedef typename Promotion::type return_type; > > + typedef typename Promotion::type result_type; > > + Domain<1> dom_v( view_domain(v) ); > > + impl::profile::Scope_event event( > > + impl::matvec::Description::tag("dot", dom_v), > > + impl::matvec::Op_count_dot::value(dom_v) ); > > [3] if you change the Op_count_dot::value to accept a Length, you can > use the 'extent()' function to get the size of the view as a Length. > This becomes: > > impl::profile::Scope_event event( > impl::matvec::Description::tag("dot", dom_v), > impl::matvec::Op_count_dot::value(extent(v)) ); > That is nicer. I also found we have a built-in converter for making Length objects from Domains. That was needed in signal-conv.hpp where the function returning the output size does so using a domain. > > Index: benchmarks/GNUmakefile.inc.in > > =================================================================== > > --- benchmarks/GNUmakefile.inc.in (revision 145922) > > +++ benchmarks/GNUmakefile.inc.in (working copy) > > @@ -41,6 +41,7 @@ > > $(srcdir)/benchmarks/qrd.cpp > > benchmarks_cxx_srcs_ipp := $(wildcard > $(srcdir)/benchmarks/*_ipp.cpp) > > benchmarks_cxx_srcs_sal := $(wildcard > $(srcdir)/benchmarks/*_sal.cpp) > > +benchmarks_cxx_srcs_mpi := $(wildcard > $(srcdir)/benchmarks/mpi_*.cpp) > > [4] I will rename the mpi only benchmarks to match the convention. > I changed it to *_mpi.cpp to correspond. Thanks for the suggestions! 
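The size-based op count discussed in this thread can be sketched in isolation as follows. This is only an illustration and not the library code: `Ops_info_sketch`, `op_count_dot` and `check_op_count_dot` are hypothetical names, `std::size_t` stands in for `length_type`, and the per-operation costs for complex values (6 flops per multiply, 2 per add) are an assumption rather than something stated in the patch.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical stand-in for the library's Ops_info: floating-point
// operations needed for one multiply and one add of type T.
template <typename T>
struct Ops_info_sketch
{
  static constexpr std::size_t mul = 1;
  static constexpr std::size_t add = 1;
};

// Assumed costs for complex values: a complex multiply is 4 real
// multiplies plus 2 real adds; a complex add is 2 real adds.
template <typename T>
struct Ops_info_sketch<std::complex<T> >
{
  static constexpr std::size_t mul = 6;
  static constexpr std::size_t add = 2;
};

// Op count for a dot product of two vectors with `size` elements:
// `size` multiplies plus `size - 1` additions, mirroring the quoted
// Op_count_dot but taking a plain size instead of a Domain.
template <typename T>
std::size_t op_count_dot(std::size_t size)
{
  std::size_t count = size * Ops_info_sketch<T>::mul;
  if (size > 1)
    count += (size - 1) * Ops_info_sketch<T>::add;
  return count;
}

// Usage check: 4 multiplies + 3 adds for a 4-element real dot product.
inline void check_op_count_dot()
{
  assert(op_count_dot<float>(4) == 7);
  assert(op_count_dot<std::complex<float> >(4) == 30);
}
```

Passing only the size keeps the call site free of the offset and stride fields that a Domain would carry.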
-- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pm2.diff URL: From jules at codesourcery.com Tue Aug 8 19:24:45 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 08 Aug 2006 15:24:45 -0400 Subject: [patch] MPI benchmarks Message-ID: <44D8E4FD.1000803@codesourcery.com> Renames mpi_alltoall to alltoall_mpi (to follow ipp and sal benchmark conventions). New copy_mpi benchmark, measures point-to-point transfer rate. Used to help diagnose cost of using derived datatypes. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mpi.diff URL: From don at codesourcery.com Wed Aug 9 20:26:16 2006 From: don at codesourcery.com (Don McCoy) Date: Wed, 09 Aug 2006 14:26:16 -0600 Subject: [patch] Serial Expression Profiling Message-ID: <44DA44E8.1040108@codesourcery.com> The attached patch extends the profiling further by handling some of the dispatched expression evaluations. The three specific cases covered are: * Loop fusion - collapsing multiple loops into one when doing element-wise operations on views. * Dense expressions - converting tightly-packed 2-D and 3-D views into 1-D views that are then evaluated normally. * Matrix transpose - transposing matrices with possibly different storage formats (row/col) This can conceivably be extended to cover cases where we are dispatching to IPP and SAL as well. All expressions are tagged in the profiler output with "Expr[/type/]", where type is LF, Dense or Trans. Following that is the dimensionality (1D, 2D or 3D), a compact representation of the expression and finally the size(s). 
For example, the following expression (where all are the same size and of type Vector): r = v1 * v2; Gets logged as: Expr[LF] 1D *SS 262144 : 66929535 : 1 : 262144 : 14.0664 The expression is represented as "*SS", meaning "the binary multiply operator applied to two single-precision real values" (again using the BLAS/LAPACK convention of S/D/C/Z for operand types). In general, operators are designated with a 'u', 'b' or 't' for unary, binary and ternary operators respectively, with the exception of the common binary operators, shown in their more familiar +-*/ form. Multiple operators are evaluated in order, therefore v1 * T(4) + v2 / v3 is tagged as: Expr[LF] 1D *SS/SS+SS 2048 : 1527534 : 1 : 6144 : 14.4451 Changing it to (v1 * T(4) + v2) / v3 yields: Expr[LF] 1D *SS+SS/SS 2048 : 1536309 : 1 : 6144 : 14.3626 Dense expressions will appear twice in the profiler output -- once when it is converted from a 2- or 3-D view and once when evaluated as a 1-D expression. They do, in fact, refer to the same expression. For example: Expr[Dense] 3D *SS 64x64x64 : 67455693 : 1 : 262144 : 13.9567 Expr[LF] 1D *SS 262144 : 66991743 : 1 : 262144 : 14.0533 Note that the dense evaluation includes the time it takes to perform the loop fusion evaluation, hence the slightly longer amount of time spent there. However, the time difference is probably dominated by the amount of time it takes to generate the tag itself. Note also that the sizes are reported differently, but are equivalent as 64x64x64=262144 Finally, please note that not all the operation counts are done at this point. Missing ones should probably be counted in some fashion. Currently, if an operator is not handled, it defaults to adding zero ops to the total count. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: se1.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: se1.diff URL: From don at codesourcery.com Thu Aug 10 22:54:11 2006 From: don at codesourcery.com (Don McCoy) Date: Thu, 10 Aug 2006 16:54:11 -0600 Subject: [patch] Profiler Configuration Options Message-ID: <44DBB913.7030301@codesourcery.com> This patch adds the ability to enable/disable the profiler or selected portions. The new option is: --enable-profiler=type Specify list of areas to profile. Choices include none, all or a combination of: signal, matvec, fns and user. Default is none. There is a built-in dependency on the timer or it produces an error at configuration time. The timer has also been renamed to help avoid confusion. Although there are four options, only signal and matvec are implemented yet. The former controls profiling of FFT's, Convolutions etc. (all part of the signal processing portion of the standard) and the latter controls profiling of matrix-vector functions like dot-product and matrix multiplication. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pc1.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pc1.diff URL: From don at codesourcery.com Fri Aug 11 08:06:58 2006 From: don at codesourcery.com (Don McCoy) Date: Fri, 11 Aug 2006 02:06:58 -0600 Subject: [patch] Profiler Command Line Options Message-ID: <44DC3AA2.9080108@codesourcery.com> This patch adds two new command line options related to profiling: --vsipl++-profile-mode={accum,trace} --vsipl++-profile-output=/filename/ Both should normally be used together to enable the profiler, but if the filename is omitted, the output will go to stdout. 
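As a rough illustration of how the two options above might be recognized, here is a minimal sketch. It is not the code from po1.diff: `Profile_options_sketch`, `match_option` and `parse_profile_options` are hypothetical names, and the handling of an unrecognized mode value (it is simply ignored here) is an assumption.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Result of scanning argv for the two profiling options.
struct Profile_options_sketch
{
  bool        enabled;  // true once a valid mode option was seen
  std::string mode;     // "accum" or "trace"
  std::string output;   // empty means "write to stdout"
};

// If `arg` has the form "<name>=<value>", store <value> and return true.
inline bool match_option(char const* arg, char const* name, std::string& value)
{
  assert(arg != 0 && name != 0);
  std::size_t const len = std::strlen(name);
  if (std::strncmp(arg, name, len) != 0 || arg[len] != '=')
    return false;
  value = arg + len + 1;
  return true;
}

// Scan the arguments (skipping argv[0]) for the profiling options.
inline Profile_options_sketch
parse_profile_options(int argc, char const* const* argv)
{
  Profile_options_sketch opts = { false, "", "" };
  for (int i = 1; i < argc; ++i)
  {
    std::string value;
    if (match_option(argv[i], "--vsipl++-profile-mode", value))
    {
      // Assumption: values other than accum/trace leave profiling off.
      if (value == "accum" || value == "trace")
      {
        opts.enabled = true;
        opts.mode = value;
      }
    }
    else if (match_option(argv[i], "--vsipl++-profile-output", value))
      opts.output = value;
  }
  return opts;
}
```

A caller would check `enabled` and, when `output` is empty, fall back to stdout as described above.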
Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: po1.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: po1.diff URL: From jules at codesourcery.com Fri Aug 11 19:42:26 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 11 Aug 2006 15:42:26 -0400 Subject: [vsipl++] [patch] Profiler Configuration Options In-Reply-To: <44DBB913.7030301@codesourcery.com> References: <44DBB913.7030301@codesourcery.com> Message-ID: <44DCDDA2.7090807@codesourcery.com> Don McCoy wrote: > This patch adds the ability to enable/disable the profiler or selected > portions. The new option is: > > --enable-profiler=type Specify list of areas to profile. Choices include > none, all or a combination of: signal, matvec, > fns > and user. Default is none. > > There is a built-in dependency on the timer or it produces an error at > configuration time. The timer has also been renamed to help avoid > confusion. > > Although there are four options, only signal and matvec are implemented > yet. The former controls profiling of FFT's, Convolutions etc. (all > part of the signal processing portion of the standard) and the latter > controls profiling of matrix-vector functions like dot-product and > matrix multiplication. Don, This looks good. I have two comments: The first comment: in general, I like the way the current profiling code has a minimal footprint on the functional code. This minimizes the impact on code readability. In particular, you have done a good job using techniques like RAII so that in many cases a profiling event can be inserted in just a single line with the Scope_event class. We should be able to keep this up as we add the ability to disable profiling. 
For example, instead of disabling a Scope_event class with an ifdef: #if PROFILING_ENABLED Scope_event ev("name"); #endif we could define a VSIP_IMPL_PROFILE macro: VSIP_IMPL_PROFILE(PROFILING_ENABLED, Scope_event ev("name")) That lets us keep this as a single line. We could even fold the PROFILING_ENABLED into the VSIP_IMPL_PROFILE macro: VSIP_IMPL_PROFILE(Scope_event ev("name")) Or go all the way down to: VSIP_IMPL_SCOPE_EVENT(ev("name")) VSIP_IMPL_PROFILE could be used for other things besides Scope_events: VSIP_IMPL_PROFILE(pm_in_ext_cost_ += in_ext.cost) Of course it will make sense to use #if/#endif for some multi-line chunks of profiling code. There are a couple of ways to implement this. First, at the top of each file, you could define those macros: #define PROFILING_ENABLED (VSIP_IMPL_PROFILER & ...) #if PROFILING_ENABLED # define VSIP_IMPL_PROFILE(X) X; # define VSIP_IMPL_SCOPE_EVENT(X) Scope_event X; ... #else # define VSIP_IMPL_PROFILE(X) # define VSIP_IMPL_SCOPE_EVENT(X) ... #endif However, that leads to replication of the macros in each file, which we should avoid. A better approach is to put the VSIP_IMPL_PROFILE macro in profile.hpp. That requires a bit of work because it will be defined before PROFILING_ENABLED. Something like this should work: /* in profile.hpp: */ // Enable (or not) for a single statement #define VSIP_IMPL_PROFILE_EN_0(X) #define VSIP_IMPL_PROFILE_EN_1(X) X; // Join two names together (allowing for expansion of macros) #define VSIP_IMPL_JOIN(A, B) VSIP_IMPL_JOIN_1(A, B) #define VSIP_IMPL_JOIN_1(A, B) A ## B #define VSIP_IMPL_PROFILE(STMT) \ VSIP_IMPL_JOIN(VSIP_IMPL_PROFILE_EN_, PROFILING_ENABLED) (STMT) #define VSIP_IMPL_SCOPE_EVENT(DECL) \ VSIP_IMPL_JOIN(VSIP_IMPL_PROFILE_EN_, PROFILING_ENABLED) \ (Scope_event DECL) One more change is necessary. 
The PROFILING_ENABLED variable set at the top of each file needs to be set to either 0 or 1: #if (VSIP_IMPL_PROFILER & VSIP_IMPL_PROFILER_SIGNAL) # define PROFILING_ENABLED 1 #else # define PROFILING_ENABLED 0 #endif An alternative to this is to have configure.ac set individual macros for each class of profiling (set VSIP_IMPL_PROFILER_SIGNAL to either a 0 or a 1) instead of rolling them together into a mask. Then each header would have: #define PROFILING_ENABLED VSIP_IMPL_PROFILER_SIGNAL The second comment is more of a wish that can be addressed later. I would like the ability to separately enable/disable the profiling and performance APIs. The performance API should have a lower overhead than the profiling because it doesn't store data in a std::vector or std::map. Right now we can punt on this, as I'm not entirely sure what the performance API and profiling overheads are and how folks will actually use all this. Other than that, this looks good to check in. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Fri Aug 11 19:49:35 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 11 Aug 2006 15:49:35 -0400 Subject: [vsipl++] [patch] Profiler Command Line Options In-Reply-To: <44DC3AA2.9080108@codesourcery.com> References: <44DC3AA2.9080108@codesourcery.com> Message-ID: <44DCDF4F.8040703@codesourcery.com> Don McCoy wrote: > This patch adds two new command line options related to profiling: > > --vsipl++-profile-mode={accum,trace} > --vsipl++-profile-output=/filename/ > > Both should normally be used together to enable the profiler, but if the > filename is omitted, the output will go to stdout. Don, This looks good! Please check it in. Oops, this patch made me think of another comment for the previous patch. 
thanks, -- Jules > +#define MODE_OPTION "--vsipl++-profile-mode" > +#define MODE_LENGTH (strlen(MODE_OPTION)) > + > +#define OUTPUT_OPTION "--vsipl++-profile-output" > +#define OUTPUT_LENGTH (strlen(OUTPUT_OPTION)) I was going to make the following comment: Any macros that we define in the library should be prefixed by "VSIP_IMPL_", even those that we later undefine. The reason for this is that a user program may define those macros before including the library file. However, this does not apply to macros in .cpp files, so no problemo. It does apply to the PROFILING_ENABLED macro in the previous patch. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From mark at codesourcery.com Sun Aug 13 17:29:43 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Sun, 13 Aug 2006 10:29:43 -0700 Subject: [vsipl++] [patch] Profiler Configuration Options In-Reply-To: <44DCDDA2.7090807@codesourcery.com> References: <44DBB913.7030301@codesourcery.com> <44DCDDA2.7090807@codesourcery.com> Message-ID: <44DF6187.2040304@codesourcery.com> Jules Bergmann wrote: > For example, instead of disabling a Scope_event class with an ifdef: > > #if PROFILING_ENABLED > Scope_event ev("name"); > #endif I expected you to suggest making the Scope_event class itself conditional: class Scope_event { Scope_event (const char *name) { #if PROFILING_ENABLED // Do interesting stuff. #endif } In theory, the compiler should optimize away completely empty functions and such. I'm not sure that's true, in practice, though, so your way may be more robust in the real world. 
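The "conditional class" idea can be sketched as below. This is a standalone illustration, not the library's Scope_event: `Scope_event_sketch`, `Event_record`, `event_log` and the `SKETCH_PROFILING_ENABLED` macro are hypothetical names, and timing with `std::chrono` is an assumption made so the example is self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical compile-time switch; define it to 0 to compile the
// profiling work out of the class.
#ifndef SKETCH_PROFILING_ENABLED
#  define SKETCH_PROFILING_ENABLED 1
#endif

// One finished event: its name and the elapsed wall-clock time.
struct Event_record
{
  std::string name;
  double      seconds;
};

// Process-wide list of finished events (stand-in for a real profiler).
inline std::vector<Event_record>& event_log()
{
  static std::vector<Event_record> log;
  return log;
}

// RAII event: the constructor starts timing and the destructor records
// the event.  With profiling disabled, both bodies are empty.
class Scope_event_sketch
{
public:
  explicit Scope_event_sketch(char const* name)
#if SKETCH_PROFILING_ENABLED
    : name_(name), start_(std::chrono::steady_clock::now())
#endif
  {
    assert(name != 0);
    (void)name;  // avoids an unused-parameter warning when compiled out
  }

  ~Scope_event_sketch()
  {
#if SKETCH_PROFILING_ENABLED
    std::chrono::duration<double> const elapsed =
      std::chrono::steady_clock::now() - start_;
    event_log().push_back(Event_record{name_, elapsed.count()});
#endif
  }

  // Copying would record the same event twice.
  Scope_event_sketch(Scope_event_sketch const&) = delete;
  Scope_event_sketch& operator=(Scope_event_sketch const&) = delete;

private:
#if SKETCH_PROFILING_ENABLED
  char const* name_;
  std::chrono::steady_clock::time_point start_;
#endif
};
```

Declaring `Scope_event_sketch ev("fft");` at the top of a block records one entry when the block exits. Note that a single switch like this applies to every user of the class.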
Just a random thought, -- Mark Mitchell CodeSourcery mark at codesourcery.com (650) 331-3385 x713 From don at codesourcery.com Mon Aug 14 05:49:10 2006 From: don at codesourcery.com (Don McCoy) Date: Sun, 13 Aug 2006 23:49:10 -0600 Subject: [vsipl++] [patch] Profiler Configuration Options In-Reply-To: <44DCDDA2.7090807@codesourcery.com> References: <44DBB913.7030301@codesourcery.com> <44DCDDA2.7090807@codesourcery.com> Message-ID: <44E00ED6.6030501@codesourcery.com> Jules Bergmann wrote: > A better approach is to put the VSIP_IMPL_PROFILE macro in profile.hpp. > That requires a bit of work because it will be defined before > PROFILING_ENABLED. Something like this should work: > I chose this method and added an explanatory comment in profile.hpp. > The second comment is more of a wish that can be addressed later. I > would like the ability to separately enable/disable the profiling and > performance APIs. The performance API should have a lower overhead than > the profiling because it doesn't store data in a std::vector or > std::map. Right now we can punt on this, as I'm not entire sure what > the performance API and profiling overheads are and how folks will > actually use all this. > I see your point. It shouldn't be too hard to separate the two in the future. After the overhead is characterized we can decide to leave the performance API enabled always (if the impact is shown to be very small) or change it to use its own configure option. > > Other than that, this looks good to check in. > Committed with attached changes. Thanks for the help. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pc2.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: pc2.diff URL: From jules at codesourcery.com Mon Aug 14 11:27:08 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 14 Aug 2006 07:27:08 -0400 Subject: [vsipl++] [patch] Profiler Configuration Options In-Reply-To: <44DF6187.2040304@codesourcery.com> References: <44DBB913.7030301@codesourcery.com> <44DCDDA2.7090807@codesourcery.com> <44DF6187.2040304@codesourcery.com> Message-ID: <44E05E0C.9090201@codesourcery.com> > > I expected you to suggest making the Scope_event class itself conditional: > > class Scope_event { > Scope_event (const char *name) { > #if PROFILING_ENABLED > // Do interesting stuff. > #endif > } > The problem with this is that PROFILING_ENABLED is re-defined locally in each file that uses profiling. I.e. we may have profiling turned on for FFT, but turned off for element-wise expressions. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From mark at codesourcery.com Mon Aug 14 15:02:04 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 14 Aug 2006 08:02:04 -0700 Subject: [vsipl++] [patch] Profiler Configuration Options In-Reply-To: <44E05E0C.9090201@codesourcery.com> References: <44DBB913.7030301@codesourcery.com> <44DCDDA2.7090807@codesourcery.com> <44DF6187.2040304@codesourcery.com> <44E05E0C.9090201@codesourcery.com> Message-ID: <44E0906C.2000302@codesourcery.com> Jules Bergmann wrote: > >> >> I expected you to suggest making the Scope_event class itself >> conditional: >> >> class Scope_event { >> Scope_event (const char *name) { >> #if PROFILING_ENABLED >> // Do interesting stuff. >> #endif >> } >> > > The problem with this is that PROFILING_ENABLED is re-defined locally in > each file that uses profiling. I.e. we may have profiling turned on for > FFT, but turned off for element-wise expressions. Ah! 
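The per-file behaviour discussed in this exchange can be sketched with the two-level join technique from earlier in the thread. This is an illustration only: all `SKETCH_` macros, `profile_hits`, `fft_like`, `expr_like` and `check_sketch` are hypothetical names, and the two "files" are simulated inside one translation unit by redefining the enable macro between functions.

```cpp
#include <cassert>

// Expands to nothing (disabled) or to the statement itself (enabled).
#define SKETCH_PROFILE_EN_0(X)
#define SKETCH_PROFILE_EN_1(X) X;

// Two-level join so that macro arguments are expanded before pasting.
#define SKETCH_JOIN(A, B)   SKETCH_JOIN_1(A, B)
#define SKETCH_JOIN_1(A, B) A ## B

// Defined once; picks EN_0 or EN_1 from whatever value
// SKETCH_PROFILING_ENABLED has at the point of use.
#define SKETCH_PROFILE(STMT) \
  SKETCH_JOIN(SKETCH_PROFILE_EN_, SKETCH_PROFILING_ENABLED) (STMT)

// Counter standing in for real profiling work.
inline int& profile_hits()
{
  static int hits = 0;
  return hits;
}

// "File" 1: profiling turned on (e.g. signal processing).
#define SKETCH_PROFILING_ENABLED 1
inline void fft_like() { SKETCH_PROFILE(++profile_hits()) }
#undef SKETCH_PROFILING_ENABLED

// "File" 2: profiling turned off (e.g. element-wise expressions).
#define SKETCH_PROFILING_ENABLED 0
inline void expr_like() { SKETCH_PROFILE(++profile_hits()) }
#undef SKETCH_PROFILING_ENABLED

// Small self-check: only the enabled "file" bumps the counter.
inline void check_sketch()
{
  profile_hits() = 0;
  fft_like();
  expr_like();
  assert(profile_hits() == 1);
}
```

Because the macro is expanded where it is used, the same definition yields different code in each file, which a single conditional inside the class body cannot do.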

--
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713

From jules at codesourcery.com Mon Aug 14 18:14:30 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 14 Aug 2006 14:14:30 -0400
Subject: [vsipl++] [patch] Serial Expression Profiling
In-Reply-To: <44DA44E8.1040108@codesourcery.com>
References: <44DA44E8.1040108@codesourcery.com>
Message-ID: <44E0BD86.8040503@codesourcery.com>

Don,

This patch looks good. There are 2 things I would like to change:

 - First, I would like to move the profiling code from the evaluator
   class specializations into the Dispatch_assign class. This requires
   some changes to the Evaluator framework, and some help from the
   evaluators themselves, but not much.

   Doing this will reduce the amount of duplication, making it easier
   to add a new evaluator. It will also give us visibility into
   distributed expressions before they are reduced.

 - Second, the expression name generator is pretty cool, but the
   pseudo-postfix notation seems unintuitive. Since the framework you
   have is fairly general, it should not be too hard to generate a
   name with proper prefix or infix notation.

This email only discusses the second bullet. I need to take a look at
the evaluator framework before discussing the first in more detail.

				-- Jules

> All expressions are tagged in the profiler output with "Expr[/type/]",
> where type is LF, Dense or Trans. Following that is the dimensionality
> (1D, 2D or 3D), a compact representation of the expression and finally
> the size(s). For example, the following expression (where all are the
> same size and of type Vector):
>
>    r = v1 * v2;
>
> Gets logged as:
>
>    Expr[LF] 1D *SS 262144 : 66929535 : 1 : 262144 : 14.0664
>
> The expression is represented as "*SS", meaning "the binary multiply
> operator applied to two single-precision real values" (again using the
> BLAS/LAPACK convention of S/D/C/Z for operand types).
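
[Editor's sketch] To make the notation question concrete, here is a rough illustration of how a compact operator-plus-operand-letters tag, a prefix name and an infix name could each be generated from the same expression tree. It is a sketch under stated assumptions and not the patch's expr_op_names.hpp: Expr, leaf, bin, compact_tag, prefix_name and infix_name are hypothetical names, every leaf is assumed to be a single-precision operand ('S'), and the compact tag is assumed to be a post-order walk in which each binary operator emits itself followed by "SS".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression tree: op == '\0' marks a leaf operand.
struct Expr {
  char op;
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf() { return std::make_shared<Expr>(Expr{'\0', nullptr, nullptr}); }
ExprPtr bin(char op, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{op, std::move(l), std::move(r)});
}

// Compact tag: post-order walk, each operator emits "<op>SS".
// The tree structure is not recorded, so different trees can collide.
std::string compact_tag(const ExprPtr& e) {
  if (!e->op) return "";
  return compact_tag(e->lhs) + compact_tag(e->rhs) + e->op + "SS";
}

// Prefix name: op(lhs,rhs). Also works for operators with no infix form.
std::string prefix_name(const ExprPtr& e) {
  if (!e->op) return "S";
  return std::string(1, e->op) + "(" + prefix_name(e->lhs) + "," +
         prefix_name(e->rhs) + ")";
}

// Infix name: sub-expressions are parenthesized, the top level is not.
std::string infix_name(const ExprPtr& e, bool top = true) {
  if (!e->op) return "S";
  std::string s = infix_name(e->lhs, false) + e->op + infix_name(e->rhs, false);
  return top ? s : "(" + s + ")";
}
```

Under these assumptions, the trees for (a*b + c) / d and (a*b) / (c+d) both get the compact tag *SS+SS/SS, while their infix names differ: ((S*S)+S)/S versus (S*S)/(S+S).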
> In general, operators are designated with a 'u', 'b' or 't' for unary,
> binary and ternary operators respectively, with the exception of the
> common binary operators, shown in their more familiar +-*/ form.
> Multiple operators are evaluated in order, therefore
>
>    v1 * T(4) + v2 / v3
>
> is tagged as:
>
>    Expr[LF] 1D *SS/SS+SS 2048 : 1527534 : 1 : 6144 : 14.4451

I think it would be easier to read the expression name if

 - it used prefix or infix notation
 - it treated sub-expressions differently from leaves

I.e. the above expression could be:

	prefix: +(*(S,S), /(S,S))
	infix:  (S*S)+(S/S)

I would suggest doing prefix first, even though it is harder to read,
and then adding support for infix, since we'll have to support operators
(such as 'hypot') that don't have infix equivalents.

>
> Changing it to
>
>    (v1 * T(4) + v2) / v3
>
> yields:
>
>    Expr[LF] 1D *SS+SS/SS 2048 : 1536309 : 1 : 6144 : 14.3626

As an example of how this notation breaks down, (v1 * T(4)) / (v2 + v3)
also has the same name: '*SS+SS/SS'.

An alternative to generating the name in this way is to use the standard
C++ typeinfo (i.e. 'typeid(ExprBlockType).name()'). This is *much* more
verbose and difficult to read than the above, but it would be possible
to clean up in a post-processing step.

> Index: src/vsip/impl/expr_op_names.hpp
> ===================================================================
> +template