From jules at codesourcery.com Thu Feb 2 17:57:30 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 02 Feb 2006 12:57:30 -0500 Subject: [patch] Fix for lvalue_proxy's of complex types. Message-ID: <43E2480A.3050409@codesourcery.com> When trying to compile the library with split storage of complex for Dense blocks, instances of 'view(index)' syntax that previously used true lvalues now started using proxy lvalues. This uncovered some problems with proxies of complex values (for example, simple things like 'a += view(i)' didn't work!). See comment in regr_complex_proxy.cpp below for my attempt at describe the problem. The solution is to have Lvalue_proxy derive from std::complex when proxying a complex value. This isn't efficient or pretty, but it appears to work OK (with both GCC and ICC). I extended the existing lvalue-proxy test to cover the problems even when the library is compiled with interleaved storage of complex, and added a new regression to test the problem cases directly. In general, we should avoid using the 'view(index)' syntax from within the library and in high-performance code (esp for views of complex values). It is OK to continue using the 'view(index)' syntax in our tests cases (it is easier to read and it helps shake out Lvalue_proxy functionality). Any concerns with this approach? -- Jules -------------- next part -------------- A non-text attachment was scrubbed... Name: proxy.diff Type: text/x-patch Size: 23162 bytes Desc: not available URL: From jules at codesourcery.com Fri Feb 3 17:37:57 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 03 Feb 2006 12:37:57 -0500 Subject: [patch] Support for split-storage of complex by Dense Message-ID: <43E394F5.6090409@codesourcery.com> This patch primarily allows dense blocks to store complex data in split format: - New configure option: --with-complex=FORMAT For example, to store data in split format, '--with-complex=split' Right now the default is to use interleaved, i.e. '--with-complex=inter'. However, on the mercury we will want to make the default split. - New Ext_data_local class that combines aspects of working_view_holder (that translates a global view into a local view) and Ext_data (that returns a pointer to a local view, potentially rearranging the format). This arrangement avoids a potential extra copy and perform memory allocations early. This patch also: fixes a bug in Histogram::operator(). It should return a view with value-type scalar_i, not with value-type T. (Don, can you double-check that this is correct?) -- Jules -------------- next part -------------- A non-text attachment was scrubbed... Name: split.diff Type: text/x-patch Size: 121045 bytes Desc: not available URL: From don at codesourcery.com Fri Feb 3 17:53:56 2006 From: don at codesourcery.com (Don McCoy) Date: Fri, 03 Feb 2006 10:53:56 -0700 Subject: [vsipl++] [patch] Support for split-storage of complex by Dense In-Reply-To: <43E394F5.6090409@codesourcery.com> References: <43E394F5.6090409@codesourcery.com> Message-ID: <43E398B4.8060106@codesourcery.com> Jules Bergmann wrote: > This patch also: fixes a bug in Histogram::operator(). It should > return a view with value-type scalar_i, not with value-type T. (Don, > can you double-check that this is correct?) > Yes, indeed, that is correct. Thanks for finding that. -- Don McCoy CodeSourcery From jules at codesourcery.com Tue Feb 14 13:44:12 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 14 Feb 2006 08:44:12 -0500 Subject: [patch] MPI/Pro on mercury, potential fix for benchmark wrap-around Message-ID: <43F1DEAC.5070007@codesourcery.com> Woo-hoo! Last night I was able to configure the library with MPI and run the distributed-block.cpp test with 2 processors. A simpler corner-turn test ran fine too on 1, 2, and 4 processors. I'll update the wiki, but to run an mpi program using mpirun, it is necessary to first run sysmc as follows (assume you are using CEs 2,3,4,5 with -np 4): sysmc -ce 2 init 3-5 mpirun -np 4 job.exe sysmc -ce 2 reset 3-5 The 'sysmc -ce 2 init 3-5' establishes a communication cluster of the specified CEs. The patch below updates configure to work with MPI/Pro. Because the MPI/Pro headers don't have any distinguishing defines, the user must enable mpi as '--enable-mpi=mpipro'. MPI/Pro on the mercury doesn't have an mpicxx, so this patch avoids doing the showme probe. This patch also updates the benchmark loop driver to use doubles instead of size_t's as intermediate values when computing the ops/sec. On 32-bit machines (such as the GTRI cluster), I've seen wrap-around with large sample times (i.e. (big loop count) * (ops/point) * (points) > 2^32 -> number wraps around). Don, I'm hoping this will fix the funny drops you were seeing in SAL performance at 128 points and 2048 points. Patch applied. -- Jules -------------- next part -------------- A non-text attachment was scrubbed... Name: mc.diff Type: text/x-patch Size: 3600 bytes Desc: not available URL: From stefan at codesourcery.com Thu Feb 16 16:22:31 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Thu, 16 Feb 2006 11:22:31 -0500 Subject: patch: support sal-fft Message-ID: <43F4A6C7.7030708@codesourcery.com> The attached patch adds support for SAL as a new backend for our FFT engine. This new backend does not support 3D ffts, and block sizes that are not powers of two. I therefor (conditionally) masked those tests in tests/fft.cpp that SAL does not support. This can be reverted as soon as the fft_dispatch framework is in place, i.e. multiple fft backends are supported in parallel. I believe the code in src/vsip/impl/sal/fft.h can be tidied up quite a bit, though I'd prefer to do that only once the code is refactored, i.e. for now focus on feature completeness. Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From jules at codesourcery.com Fri Feb 17 17:26:36 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 17 Feb 2006 12:26:36 -0500 Subject: [vsipl++] patch: support sal-fft In-Reply-To: <43F4A6C7.7030708@codesourcery.com> References: <43F4A6C7.7030708@codesourcery.com> Message-ID: <43F6074C.4080309@codesourcery.com> Stefan Seefeld wrote: > The attached patch adds support for SAL as a new backend for our FFT > engine. > This new backend does not support 3D ffts, and block sizes that are not > powers of two. > I therefor (conditionally) masked those tests in tests/fft.cpp that SAL > does not > support. This can be reverted as soon as the fft_dispatch framework is > in place, > i.e. multiple fft backends are supported in parallel. That sounds fine. > > I believe the code in src/vsip/impl/sal/fft.h can be tidied up quite a bit, > though I'd prefer to do that only once the code is refactored, i.e. for now > focus on feature completeness. Stefan, This looks good. We need to avoid copying data, unless it is necessary to pack/unpack it, align it, or transpose it. In particular,for complex to complex FFTs we should use fft_coptx() instead of fft_copx(). -- Jules > > Regards, > Stefan > > > ------------------------------------------------------------------------ > +#elif defined(VSIP_IMPL_SAL_FFT) > + // In some contexts, SAL destroys the input data itself, and sometimes > + // we have to modify it to 'pack' data into the format SAL expects > + // (see SAL Tutorial for details). > + // Therefor, we always copy the input. > + static const bool must_copy = true; For complex-to-complex FFTs, there are non-clobbering variants of some of the functions (fft_coptx instead of fft_copx) that we should use instead of copying the data. For real-to-complex and complex-to-real it still might be necessary to copy. > + // SAL cannot handle non-unit strides properly as 'complex' isn't > + // a real (packed) datatype, so the stride would be applied to the real/imag > + // offset, too. > + static const vsip::dimension_type transpose_target = 1; > #else > static const bool must_copy = false; > static const vsip::dimension_type transpose_target = axis; > #endif > - > typename impl::Maybe_force_copy< > must_copy,typename local_type::block_type, > typename impl::Maybe_transpose<2,axis,transpose_target>::type>::type > @@ -833,7 +890,7 @@ > impl::Ext_data(this->in_temp_).data()); > > const bool native_order = (axis == transpose_target); > - > + > this->core_->scale_ = this->scale_; > this->core_->runs_ = local_inout.size(1-axis); > this->core_->stride_ = 1; > Index: src/vsip/impl/sal/fft.hpp > + > +template > +inline void > +from_to(Fft_core& self, inT const* in, outT* out) > + VSIP_NOTHROW > +{ > + assert(0 && "Sorry, operation not yet supported for this type !"); > + // TBD This shouldn't make it out into a release, but to be on the safe size, you should really say: VSIP_IMPL_THROW(vsip::impl::unimplemented("...")); > +} > + > +// 1D real -> complex forward fft > + > +inline void > +from_to(Fft_core<1, float, std::complex, false>& self, > + float const* in, std::complex* out) > + VSIP_NOTHROW > +{ > + FFT_setup setup = reinterpret_cast(self.plan_); > + float *out_ = reinterpret_cast(out); We reserve the "_" suffix for member variables. Perhaps you could call the parameter "arg_out" and then use "out" for the cast? > + fft_ropx(&setup, const_cast(in), 1, out_, 1, > + self.size_[0], FFT_FORWARD, sal::ESAL); > + // unpack the data (see SAL reference for details). > + int const N = (1 << self.size_[0]) + 2; > + out_[N - 2] = out_[1]; > + out_[1] = 0.f; > + out_[N - 1] = 0.f; > + // forward fft_ropx is scaled up by 2. > + float scale = 0.5f; > + vsmulx(out_, 1, &scale, out_, 1, N, sal::ESAL); > +} > +inline void > +from_to(Fft_core<1, std::complex, std::complex, false>& self, > + std::complex const *in, std::complex *out) VSIP_NOTHROW > +{ > + FFT_setup setup = reinterpret_cast(self.plan_); > + COMPLEX *in_ = reinterpret_cast(const_cast*>(in)); > + COMPLEX *out_ = reinterpret_cast(out); > + long stride = 2; > + long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE; > + fft_copx(&setup, in_, stride, out_, stride, self.size_[0], > + direction, sal::ESAL); We should use fft_coptx() instead to avoid clobbering the input. Likewise for fft_copdtx. From stefan at codesourcery.com Fri Feb 17 18:00:00 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Fri, 17 Feb 2006 13:00:00 -0500 Subject: [vsipl++] patch: support sal-fft In-Reply-To: <43F6074C.4080309@codesourcery.com> References: <43F4A6C7.7030708@codesourcery.com> <43F6074C.4080309@codesourcery.com> Message-ID: <43F60F20.1060907@codesourcery.com> Jules Bergmann wrote: >> +#elif defined(VSIP_IMPL_SAL_FFT) >> + // In some contexts, SAL destroys the input data itself, and >> sometimes >> + // we have to modify it to 'pack' data into the format SAL expects >> + // (see SAL Tutorial for details). >> + // Therefor, we always copy the input. >> + static const bool must_copy = true; > > > For complex-to-complex FFTs, there are non-clobbering variants of some > of the functions (fft_coptx instead of fft_copx) that we should use > instead of copying the data. For real-to-complex and complex-to-real it > still might be necessary to copy. I agree. However, I believe the required logic will be much simpler after our fft code is modularized, i.e. backends become slightly better encapsulated. For example, SAL has its own memory allocator that we might want to use to allocate temporaries used during fft... Also, the packing / unpacking issue only exists on the complex side of a real fft, so copying is only needed for the real inverse FFTs, but not forward. >> + assert(0 && "Sorry, operation not yet supported for this type !"); >> + // TBD > > > This shouldn't make it out into a release, but to be on the safe size, > you should really say: > > VSIP_IMPL_THROW(vsip::impl::unimplemented("...")); I agree, though, with a dispatcher in place, this code would be unreachable. >> + FFT_setup setup = reinterpret_cast(self.plan_); >> + float *out_ = reinterpret_cast(out); > > > We reserve the "_" suffix for member variables. Perhaps you could call > the parameter "arg_out" and then use "out" for the cast? Fine. Thanks for the review ! Stefan From don at codesourcery.com Wed Feb 22 16:12:07 2006 From: don at codesourcery.com (Don McCoy) Date: Wed, 22 Feb 2006 09:12:07 -0700 Subject: [patch] fixes for profile timer 'realtime' option Message-ID: <43FC8D57.9090407@codesourcery.com> The attached patch fixes a minor problem with --enable-profile-timer=realtime. This option uses the function clock_gettime(), which operates on a timespec structure instead of a simple type like time_t. This caused problems where the underlying type 'stamp_type' was treated as an integer. Note: Tested with ghs and gcc. On RHEL, the options CXXFLAGS=-D_POSIX_C_SOURCE=199309L LDFLAGS=-lrt are needed to include the correct header definitions and link to the correct library. These options are not needed with the ghs compiler. Should these be incorporated into the configuration script, or is it better to require they be provided explicitly? Regards, -- Don McCoy CodeSourcery -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pt.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pt.diff URL: From jules at codesourcery.com Wed Feb 22 16:47:08 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 22 Feb 2006 11:47:08 -0500 Subject: [vsipl++] [patch] fixes for profile timer 'realtime' option In-Reply-To: <43FC8D57.9090407@codesourcery.com> References: <43FC8D57.9090407@codesourcery.com> Message-ID: <43FC958C.2090507@codesourcery.com> Don McCoy wrote: > The attached patch fixes a minor problem with > --enable-profile-timer=realtime. This option uses the function > clock_gettime(), which operates on a timespec structure instead of a > simple type like time_t. This caused problems where the underlying type > 'stamp_type' was treated as an integer. Don, This looks good, please check it in Instead of duplicating functionality for std::cout in dump(), can you change dump(char* filename, char mode) to special case the filename '-'? Something like: void Profiler::dump(char* filename, char /*mode*/) { std::ofstream file; if (strcmp(filename, "-") == 0) file = std::cout; else file.open(filename); ... body ... if (strcmp(filename, "-") != 0) file.close(); } Have you gotten file IO to work for mercury? > > Note: Tested with ghs and gcc. On RHEL, the options > CXXFLAGS=-D_POSIX_C_SOURCE=199309L LDFLAGS=-lrt are needed to include > the correct header definitions and link to the correct library. These > options are not needed with the ghs compiler. Should these be > incorporated into the configuration script, or is it better to require > they be provided explicitly? No, lets leave that out of configure.ac for now. -- Jules From don at codesourcery.com Thu Feb 23 08:21:35 2006 From: don at codesourcery.com (Don McCoy) Date: Thu, 23 Feb 2006 01:21:35 -0700 Subject: [vsipl++] [patch] fixes for profile timer 'realtime' option In-Reply-To: <43FC958C.2090507@codesourcery.com> References: <43FC8D57.9090407@codesourcery.com> <43FC958C.2090507@codesourcery.com> Message-ID: <43FD708F.8020100@codesourcery.com> Jules Bergmann wrote: > This looks good, please check it in > > Instead of duplicating functionality for std::cout in dump(), can you > change dump(char* filename, char mode) to special case the filename > '-'? Something like: Eliminated the duplicate dump() function as "/dev/stdout" covers that case. Checked in. -- Don McCoy CodeSourcery -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pt2.diff URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pt.changes URL: From stefan at codesourcery.com Fri Feb 24 22:26:58 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Fri, 24 Feb 2006 17:26:58 -0500 Subject: [patch] Some enhancements to the FFT code. Message-ID: <43FF8832.9040701@codesourcery.com> Please find attached two patches (folded into one): The SAL FFT backend uses a number of different functions that take temporary workspace, to avoid the input array to be clobbered. This is a first step to allow the frontend not to copy data in some (the most frequent ?) cases. The second part of the patch is the addition of tests to tests/fft.cpp that specifically targets the Complex_split_fmt. It passes with builtin fft (i.e. FFTW), and I'm planning to add support for split format to the SAR backend in the coming days. Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From jules at codesourcery.com Mon Feb 27 13:50:25 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 27 Feb 2006 08:50:25 -0500 Subject: [vsipl++] [patch] Some enhancements to the FFT code. In-Reply-To: <43FF8832.9040701@codesourcery.com> References: <43FF8832.9040701@codesourcery.com> Message-ID: <440303A1.9040007@codesourcery.com> Stefan Seefeld wrote: > Please find attached two patches (folded into one): > > The SAL FFT backend uses a number of different > functions that take temporary workspace, to avoid > the input array to be clobbered. This is a first > step to allow the frontend not to copy data in > some (the most frequent ?) cases. > > The second part of the patch is the addition of > tests to tests/fft.cpp that specifically targets > the Complex_split_fmt. It passes with builtin fft > (i.e. FFTW), and I'm planning to add support for > split format to the SAR backend in the coming > days. > > Regards, > Stefan Stefan, I'm not sure if the mercury FFTs have any stated alignment requirements for their temporary buffer, but to be safe we should allocate with either a 16-byte (altivec) or 32-byte (cache line) alignment using the alloc_aligned function. Otherwise, this looks good, please check it in. -- Jules > > > ------------------------------------------------------------------------ > VSIP_THROW((std::bad_alloc)) > { > self.is_forward_ = (expn == -1); > + self.buffer_ = new Complex_of::type[dom.size()]; int const sal_alignment = 16; self.buffer_ = vsip::impl::alloc_align(sal_alignment, dom.size() * sizeof(...)); > unsigned long max = sal::log2n::translate(dom, sd, self.size_); > sal::fft_planner::create(self.plan_, max); > } > @@ -202,6 +203,7 @@ > destroy(Fft_core& self) VSIP_THROW((std::bad_alloc)) > { > sal::fft_planner::destroy(self.plan_); > + delete [] self.buffer_; vsip::impl::free_align(self.buffer_); > } > -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From stefan at codesourcery.com Mon Feb 27 15:10:00 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 27 Feb 2006 10:10:00 -0500 Subject: [vsipl++] [patch] Some enhancements to the FFT code. In-Reply-To: <440303A1.9040007@codesourcery.com> References: <43FF8832.9040701@codesourcery.com> <440303A1.9040007@codesourcery.com> Message-ID: <44031648.501@codesourcery.com> Jules Bergmann wrote: > I'm not sure if the mercury FFTs have any stated alignment requirements > for their temporary buffer, but to be safe we should allocate with > either a 16-byte (altivec) or 32-byte (cache line) alignment using the > alloc_aligned function. Ok, I use alloc_align(32, ...) for now. (Various backends may have their own optimized memory management routines, so this has to be revisited later.) I took the occasion to apply a patch to the alloc_align function which we had discussed many moons ago. It is now parametrized, i.e. instead of double *array = static_cast(alloc_align(32, 1024 * sizeof(double))); you simply write double *array = alloc_align(32, 1024); I adjusted all the code that uses alloc_align accordingly (and fixed a long-standing issue with paths in some sarsim-related scripts). The attached patch is checked in. Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From jules at codesourcery.com Tue Feb 28 22:47:29 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 28 Feb 2006 17:47:29 -0500 Subject: [patch] Mercury bits, performance bits Message-ID: <4404D301.7060407@codesourcery.com> This patch covers fixes and improvements for running on the mercury. It also includes some of the performance work done for the LM site visit. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mc2.diff URL: