From postmaster at codesourcery.com Fri Sep 16 18:53:08 2005
From: postmaster at codesourcery.com (postmaster at codesourcery.com)
Date: 16 Sep 2005 18:53:08 -0000
Subject: Welcome to vsipl++@codesourcery.com
Message-ID: <20050916185308.28794.qmail@mail.codesourcery.com>

Welcome to the vsipl++ at codesourcery.com mailing list!

From jules at codesourcery.com Fri Sep 16 20:04:20 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 16 Sep 2005 16:04:20 -0400
Subject: [patch] distributed user-storage, setup_assign
Message-ID: <432B2544.6040403@codesourcery.com>

This patch adds initial support for distributed user-storage, along with unit tests. It is possible to create a distributed block that can be admitted/released. When creating a block, each processor supplies a pointer to memory large enough for the subblock it owns.

Some of the Chained_par_assign code that built MPI datatypes assumed that the data address would not change between when the send/recv lists are constructed and when they are executed. For single statement assignments 'A = B', this is true. However, for early-bound assignments (using Setup_assign, also in this patch) of views with user-storage, it is possible that the address can change between building the lists and executing them. To address this, lists are now built relative to the subblock's data pointer, and then offset at execution time.

This patch includes a Setup_assign object which allows expressions to be bound early and executed later:

	Setup_assign expr(A, B + C); // prebind A = B + C
	...
	expr(); // execute A = B + C

For serial expressions, not a lot of early binding is done. For parallel expressions, the maps are examined to determine if the expression is simple or requires communication. If the expr requires communication, any necessary setup is done during early binding.

Finally, this patch includes some setup work for mappings that can either be global or local, depending on their context.
An example of where this might be used is for the generator block returned from the ramp function. This fixes a small number of FIXMEs and moves two into the tracker. Issues were created for Distributed_block::get/put and more efficient admit/release data copy (#59 and #60). -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dar.diff URL: From jules at codesourcery.com Fri Sep 16 20:16:16 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 16 Sep 2005 16:16:16 -0400 Subject: Thanks - Re: [vsipl++] math.fns.operators In-Reply-To: <432B0D94.7000405@codesourcery.com> References: <4329C474.2050202@codesourcery.com> <4329C5CE.5020506@codesourcery.com> <4329D46A.9030808@codesourcery.com> <4329E365.6090201@codesourcery.com> <4329E71A.9050009@codesourcery.com> <432A1644.5030309@codesourcery.com> <432B0884.2090400@codesourcery.com> <432B0D94.7000405@codesourcery.com> Message-ID: <432B2810.2020904@codesourcery.com> Mark, Nathan, I just want to say thanks for taking the time and effort to help us understand and fix this! This is a great example of how having folks who really understand the details of the compiler and language in-house benefits our HPC work. thanks, -- Jules Jules Bergmann wrote: > Stefan, > > Looks good (third time is a charm!). Thanks for resolving this. > > -- Jules > > Stefan Seefeld wrote: > >> Jules Bergmann wrote: >> >>> >>> Jules Bergmann wrote: >>> >>>> Here's my quick & dirty kludge. -- Jules >>>> >>> >>> Well, as Stefan pointed out, this patch doesn't work 3.4 (it fails >>> for me locally with "gcc version 3.4.5 20050706 (prerelease) (Debian >>> 3.4.4-5)"). >> >> >> >> >> The attached patch uses either of the two versions of the macro, >> depending >> on which compiler is used. I tested with gcc 3.4, gcc 4.0.1, and icc >> 8.0 (sethra). 
>> >> Regards, >> Stefan >> > From mark at codesourcery.com Fri Sep 16 20:22:20 2005 From: mark at codesourcery.com (Mark Mitchell) Date: Fri, 16 Sep 2005 13:22:20 -0700 Subject: [vsipl++] [patch] distributed user-storage, setup_assign In-Reply-To: <432B2544.6040403@codesourcery.com> References: <432B2544.6040403@codesourcery.com> Message-ID: <432B297C.8090208@codesourcery.com> Jules Bergmann wrote: > This patch includes a Setup_assign object which allows expressions to be > bound early and executed later: Very cool! Boy, you are going to have some documentation to write, post-HPEC. :-) -- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From ncm at codesourcery.com Sat Sep 17 08:49:54 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Sat, 17 Sep 2005 01:49:54 -0700 Subject: [PATCH] fix real->complex fftm stride bug Message-ID: <20050917084954.GA32661@codesourcery.com> I have checked in the patch below. ref-impl/fft-coverage.cpp passes on x86/FFTW3 now, and most likely others besides. Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.248 diff -u -p -r1.248 ChangeLog --- ChangeLog 16 Sep 2005 22:03:20 -0000 1.248 +++ ChangeLog 17 Sep 2005 08:44:29 -0000 @@ -1,3 +1,8 @@ +2005-09-17 Nathan Myers + + * src/vsip/impl/signal-fft.hpp: fix a real->complex FFTM + stride bug detected by ref-impl/fft-coverage.hpp. 
+ 2005-09-16 Jules Bergmann * src/vsip/impl/aligned_allocator.hpp (VSIP_IMPL_ALLOC_ALIGNMENT): Index: src/vsip/impl/signal-fft.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fft.hpp,v retrieving revision 1.21 diff -u -p -r1.21 signal-fft.hpp --- src/vsip/impl/signal-fft.hpp 16 Sep 2005 02:13:38 -0000 1.21 +++ src/vsip/impl/signal-fft.hpp 17 Sep 2005 08:44:29 -0000 @@ -792,9 +792,11 @@ protected: this->core_->stride_ = 1; this->core_->dist_ = 1; if (native_order == (axis == 1)) - this->core_->dist_ = local_out.size(axis); + this->core_->dist_ = (sizeof(inT) <= sizeof(outT)) ? + local_in.size(axis) : local_out.size(axis); else - this->core_->stride_ = local_out.size(1-axis); + this->core_->stride_ = (sizeof(inT) <= sizeof(outT)) ? + local_in.size(1-axis) : local_out.size(1-axis); this->core_->from_to(raw_in.data(), raw_out.data()); } From jules at codesourcery.com Sat Sep 17 16:18:58 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Sat, 17 Sep 2005 12:18:58 -0400 Subject: [patch] configure typo Message-ID: <432C41F2.2070207@codesourcery.com> Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: conf.diff URL: From jules at codesourcery.com Sat Sep 17 17:09:43 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Sat, 17 Sep 2005 13:09:43 -0400 Subject: [patch] Wall cleanup Message-ID: <432C4DD7.7040104@codesourcery.com> Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: wall.diff
URL:

From mark at codesourcery.com Sat Sep 17 19:58:54 2005
From: mark at codesourcery.com (Mark Mitchell)
Date: Sat, 17 Sep 2005 12:58:54 -0700
Subject: PATCH: Remove JADE probes
Message-ID: <432C757E.7030503@codesourcery.com>

Now that we're set up to use xsltproc, which seems to be much more reliable than OpenJade and to have much more consistent behavior, I've removed any default use of Jade. You can still use it with "make JADE=...", but it's not the default.

Checked in.

--
Mark Mitchell
CodeSourcery, LLC
mark at codesourcery.com
(916) 791-8304
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vsip.patch
URL:

From jules at codesourcery.com Sat Sep 17 20:36:56 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 17 Sep 2005 16:36:56 -0400
Subject: [patch] Fix FFTs to compile when destination is a temporary view.
Message-ID: <432C7E68.2080506@codesourcery.com>

Patch applied.

-- Jules
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ftv.diff
URL:

From ncm at codesourcery.com Sun Sep 18 01:49:04 2005
From: ncm at codesourcery.com (Nathan (Jasper) Myers)
Date: Sat, 17 Sep 2005 18:49:04 -0700
Subject: [PATCH] fix fftm-par.cpp under LAM.
Message-ID: <20050918014904.GA6144@codesourcery.com>

I have checked in the patch below to make fftm-par.cpp run correctly in parallel under mpich-1.2.7 "ch_p4" mode on my x86, and under LAM on sethra.

(I still don't know why comm.barrier() has no apparent effect, for me, both in LAM on sethra and in mpich-shmem, here.)
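
The error_db() hunks in the patch below have every non-root rank send its local value to rank 0 and then wait to receive the combined maximum back, so that all ranks finish with the same number. A minimal sketch of that reduce-then-broadcast pattern follows. It is not the code from tests/fftm-par.cpp: it uses no MPI at all and simulates the ranks with a plain vector, and the name reduce_max_then_broadcast is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of VSIPL++ or its test suite.
// local[r] is the value rank r holds before the exchange; the result
// holds the value each rank would have after the exchange.
std::vector<float>
reduce_max_then_broadcast(std::vector<float> const& local)
{
  assert(!local.empty());

  // Phase 1: ranks 1..size-1 "send" to rank 0, which keeps the running max.
  float refmax = local[0];
  for (std::size_t i = 1; i < local.size(); ++i)
    refmax = std::max(refmax, local[i]);

  // Phase 2: rank 0 "sends" the result back, so every rank agrees on it.
  return std::vector<float>(local.size(), refmax);
}
```

Calling reduce_max_then_broadcast on the values 1, 5, 3 gives every simulated rank the value 5. In a real MPI program a single collective such as MPI_Allreduce with MPI_MAX expresses the same exchange; whether the comm wrapper used in the test exposes such a call is not something this thread states.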
Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.257 diff -u -p -r1.257 ChangeLog --- ChangeLog 17 Sep 2005 21:52:22 -0000 1.257 +++ ChangeLog 18 Sep 2005 01:44:37 -0000 @@ -1,3 +1,8 @@ +2005-09-17 Nathan Myers + + * tests/fftm-par.cpp: robustify against mysterious behavior + in sethra lam mpi. + 2005-09-17 Mark Mitchell * doc/quickstart/quickstart.xml: Mention FFTW, IPP, MKL, and Index: tests/fftm-par.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fftm-par.cpp,v retrieving revision 1.1 diff -u -p -r1.1 fftm-par.cpp --- tests/fftm-par.cpp 10 Sep 2005 10:18:43 -0000 1.1 +++ tests/fftm-par.cpp 18 Sep 2005 01:44:37 -0000 @@ -197,7 +197,10 @@ error_db( int size = comm.size(); if (rank != 0) + { comm.buf_send(0, &refmax, 1); + comm.recv(0, &refmax, 1); + } else { for (int i = 1; i < size; ++i) @@ -207,6 +210,8 @@ error_db( if (refmax < otherefmax) refmax = otherefmax; } + for (int i = 1; i < size; ++i) + comm.buf_send(i, &refmax, 1); } @@ -226,7 +231,10 @@ error_db( } if (rank != 0) + { comm.buf_send(0, &maxsum, 1); + comm.recv(0, &maxsum, 1); + } else { for (int i = 1; i < size; ++i) @@ -236,6 +244,8 @@ error_db( if (maxsum < othersum) maxsum = othersum; } + for (int i = 1; i < size; ++i) + comm.buf_send(i, &maxsum, 1); return maxsum; } @@ -718,7 +728,8 @@ main(int argc, char** argv) << endl; // Stop each process, allow debugger to be attached. - if (comm.rank() == 0) getchar(); + char c; + if (comm.rank() == 0) read(0,&c,1); comm.barrier(); #endif @@ -744,4 +755,5 @@ main(int argc, char** argv) test_real(242); test_real(16); #endif + return 0; } From jules at codesourcery.com Sun Sep 18 22:26:47 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 18 Sep 2005 18:26:47 -0400 Subject: [patch] ICC test fixes. Message-ID: <432DE9A7.4050201@codesourcery.com> patch applied. 
-- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: icc.diff URL: From jules at codesourcery.com Mon Sep 19 02:12:56 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 18 Sep 2005 22:12:56 -0400 Subject: [patch] Fix hypot in ref-impl/view-math.cpp Message-ID: <432E1EA8.5080103@codesourcery.com> Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: hypot.diff URL: From jules at codesourcery.com Mon Sep 19 03:40:38 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 18 Sep 2005 23:40:38 -0400 Subject: [patch] Final bit of cleanup. Message-ID: <432E3336.1070305@codesourcery.com> Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fixme2.diff URL: From don at codesourcery.com Mon Sep 19 08:36:59 2005 From: don at codesourcery.com (Don McCoy) Date: Mon, 19 Sep 2005 02:36:59 -0600 Subject: [patch] matvec: dot, trans, kron Message-ID: <432E78AB.6010207@codesourcery.com> The attached patch implements some of the matrix and vector operations. I tested it against the functions in ref-impl/math-matvec.cpp and it passes up through kron(). Also wrote a supplementary test for kron that checks it when called with matrix views [matvec.cpp] (not checked in ref-impl tests). Don ______ Added support for dot, trans and kron functions in [math.matvec] * src/vsip/math.hpp: included impl/matvec.hpp * src/vsip/impl/matvec.hpp: new file * tests/matvec.cpp: new file -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mv.diff
Type: text/x-patch
Size: 7168 bytes
Desc: not available
URL:

From jules at codesourcery.com Mon Sep 19 10:08:38 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 19 Sep 2005 06:08:38 -0400
Subject: [vsipl++] [patch] matvec: dot, trans, kron
In-Reply-To: <432E78AB.6010207@codesourcery.com>
References: <432E78AB.6010207@codesourcery.com>
Message-ID: <432E8E26.8040105@codesourcery.com>

Don,

Looks good. Can you try out the changes to trans and see if it works? My suggestion for herm needs a proper return type to work, so let's keep the current version of that for now. Please check in.

-- Jules

Don McCoy wrote:
> The attached patch implements some of the matrix and vector operations.
> I tested it against the functions in ref-impl/math-matvec.cpp and it
> passes up through kron(). Also wrote a supplementary test for kron that
> checks it when called with matrix views [matvec.cpp] (not checked in
> ref-impl tests).
>
> Don

> +

For trans and herm, I was thinking we should be able to directly return the subview:

> + // Transpositions [math.matvec.transpose]
> +
> + /// transpose
> + template constMatrix::transpose_view
> + trans(const_Matrix m) VSIP_NOTHROW
> + {
> + return ( Matrix(m.transpose()) );

	return m.transpose();

> + }
> +
> + /// conjugate transpose
> + template
> + const_Matrix >

Uh, the return type for herm is a bit more complex... Maybe Stefan can suggest a type to use. If not, go ahead and keep the current function.
> + herm(const_Matrix, Block> m) VSIP_NOTHROW > + { > + return Matrix >(conj(m.transpose())); return conj(m.transpose()); > + } > + From stefan at codesourcery.com Mon Sep 19 19:05:13 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 19 Sep 2005 15:05:13 -0400 Subject: [vsipl++] [patch] matvec: dot, trans, kron In-Reply-To: <432E8E26.8040105@codesourcery.com> References: <432E78AB.6010207@codesourcery.com> <432E8E26.8040105@codesourcery.com> Message-ID: <432F0BE9.4000406@codesourcery.com> Jules Bergmann wrote: >> + + /// conjugate transpose >> + template >> + const_Matrix > > > > Uh, the return type for herm is a bit more complex... Maybe Stefan can > suggest a type to use. If not, go ahead and keep the current function. > > >> + herm(const_Matrix, Block> m) VSIP_NOTHROW >> + { >> + return Matrix >(conj(m.transpose())); > > return conj(m.transpose()); > >> + } what about template typename Unary_func_view, Block>::transpose_type>::result_type herm(const_Matrix, Block> m) VSIP_NOTHROW { typedef typename const_Matrix, Block>::transpose_type transpose_type; typedef Unary_func_view functor_type; return functor_type::apply(m.transpose()); } This assumes the conj_functor is already defined (through the macro machinery in fns_elementwise.hpp that defines the conj function). Regards, Stefan From don at codesourcery.com Mon Sep 19 21:09:02 2005 From: don at codesourcery.com (Don McCoy) Date: Mon, 19 Sep 2005 15:09:02 -0600 Subject: [vsipl++] [patch] matvec: dot, trans, kron In-Reply-To: <432F0BE9.4000406@codesourcery.com> References: <432E78AB.6010207@codesourcery.com> <432E8E26.8040105@codesourcery.com> <432F0BE9.4000406@codesourcery.com> Message-ID: <432F28EE.1090107@codesourcery.com> Stefan Seefeld wrote: > Jules Bergmann wrote: > >> Uh, the return type for herm is a bit more complex... Maybe Stefan >> can suggest a type to use. If not, go ahead and keep the current >> function. 
>
>
> what about
>
> template
> typename Unary_func_view typename const_Matrix,
> Block>::transpose_type>::result_type
> herm(const_Matrix, Block> m) VSIP_NOTHROW
> {
> typedef typename const_Matrix, Block>::transpose_type
> transpose_type;
> typedef Unary_func_view functor_type;
> return functor_type::apply(m.transpose());
> }
>
> This assumes the conj_functor is already defined (through the macro
> machinery in fns_elementwise.hpp that defines the conj function).
>
> Regards,
> Stefan

This worked. Thank you Stefan. Function trans() is updated as per your suggestion Jules. Thank you also.

Retested with icc 8.0 and gcc 3.4.0. Checked in.

_____
Changelog:

Added support for dot, trans and kron functions in [math.matvec]
* src/vsip/math.hpp: included impl/matvec.hpp
* src/vsip/impl/matvec.hpp: new file
* tests/matvec.cpp: new file

--
Don McCoy
CodeSourcery, LLC
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mv.diff
Type: text/x-patch
Size: 7432 bytes
Desc: not available
URL:

From ncm at codesourcery.com Tue Sep 20 00:55:07 2005
From: ncm at codesourcery.com (Nathan (Jasper) Myers)
Date: Mon, 19 Sep 2005 17:55:07 -0700
Subject: [PATCH] switch to --with-fft=...
Message-ID: <20050920005507.GA10733@codesourcery.com>

I have checked in the patch below. Your VSIPL++ "configure" command lines must change accordingly. In particular,

	--enable-fftw3                          => --with-fft=fftw3
	--enable-fftw2 --disable-fftw2-generic  => --with-fft=fftw2-float
	--enable-fftw2 --enable-fftw2-generic   => --with-fft=fftw2-generic
	--enable-ipp-fft                        => --with-fft=ipp

Note that it is now possible to build with double-precision FFTW2, although the test suite's not very friendly to that choice. (Failures occur for fftw2-float, too, but fewer; more tests assume float support.)

Also, if you're configuring in IPP, you'll need to add one of

	--with-ipp-suffix=
	--with-ipp-suffix=em64t
	--with-ipp-suffix=m7

or what-have-you, according to your IPP installation.
Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.262 diff -u -p -r1.262 ChangeLog --- ChangeLog 19 Sep 2005 21:06:45 -0000 1.262 +++ ChangeLog 20 Sep 2005 00:45:36 -0000 @@ -1,3 +1,10 @@ +2005-09-19 Nathan Myers + + * configure.ac: replace all --enable-fftw* and --enable-ipp-fft with + --with-fft={fftw3,fftw2-float,fftw2-double,fftw2-generic,ipp}. + Enable building with fftw2-double. Add --with-ipp-suffix, and + require it if using IPP. + 2005-09-19 Don McCoy Added support for dot, trans and kron functions in [math.matvec] Index: configure.ac =================================================================== RCS file: /home/cvs/Repository/vpp/configure.ac,v retrieving revision 1.38 diff -u -p -r1.38 configure.ac --- configure.ac 19 Sep 2005 03:39:54 -0000 1.38 +++ configure.ac 20 Sep 2005 00:45:36 -0000 @@ -41,42 +41,33 @@ AC_ARG_WITH(ipp_prefix, must be in PATH/include; libraries in PATH/lib.]), dnl If the user specified --with-ipp-prefix, they mean to use IPP for sure. [enable_ipp=yes]) - -AC_ARG_ENABLE([ipp-fft], - AS_HELP_STRING([--enable-ipp-fft], - [use IPP FFT (default is to use it if it is found and - no other FFT is enabled and found.)]),, - [enable_ipp_fft=probe]) - -AC_ARG_ENABLE([fftw3], - AS_HELP_STRING([--disable-fftw3], - [don't use FFTW3 (default is to use it if found)]),, - [enable_fftw3=probe]) +AC_ARG_WITH(ipp_suffix, + AS_HELP_STRING([--with-ipp-suffix=TARGET], + [Specify the optimization target of IPP libraries, such as + a6, em64t, i7, m7, mx, px, t7, w7. E.g. a6 => -lippsa6. + TARGET may be the empty string.]), + dnl If the user specified --with-ipp-suffix, they mean to use IPP for sure. + [enable_ipp=yes]) + +AC_ARG_WITH(fft, + AS_HELP_STRING([--with-fft=LIB], + [Specify FFT engine: fftw3, fftw2-float, fftw2-double, + fftw2-generic, or ipp. 
For fftw2-generic, float support + is in and -lfftw, not and -lsfftw.]), + [chose_fft=yes]) + AC_ARG_WITH(fftw3_prefix, AS_HELP_STRING([--with-fftw3-prefix=PATH], [Specify the installation prefix of the fftw3 library. Headers must be in PATH/include; libraries in PATH/lib.]), dnl If the user specified --with-fftw3-prefix, they mean to use FFTW3 for sure. - [enable_fftw3=yes]) + [with_fft=fftw3]) -AC_ARG_ENABLE([fftw2], - AS_HELP_STRING([--disable-fftw2], - [don't use FFTW2 (default is to try to use it)]),, - [enable_fftw2=probe]) AC_ARG_WITH(fftw2_prefix, AS_HELP_STRING([--with-fftw2-prefix=PATH], [Specify an installation prefix of the FFTW2 library. Headers must be in PATH/include; libraries in PATH/lib.]), - [enable_fftw2=yes]) -AC_ARG_ENABLE([fftw2-generic], - AS_HELP_STRING([--disable-fftw2-generic], - [Look in , not for fftw2 float headers. - Link -lsfftw instead of -lfftw to get float fftw2 lib]),, - [enable_fftw2_generic=yes]) -AC_ARG_ENABLE([fft_use_float], - AS_HELP_STRING([--disable-fft-use-float], - [Do not try to compile in float FFT support.]),, - [fft_use_float=1]) + [with_fft=fftw2]) # LAPACK and related libraries (Intel MKL) @@ -201,17 +192,32 @@ vsip_impl_avoid_posix_memalign= # At present, IPP, FFTW3, and FFTW2 are supported. 
# -if test "$enable_ipp_fft" == "yes"; then - if test "$enable_fftw3" == "yes"; then - AC_MSG_ERROR([Cannot enable both FFTW3 and IPP_FFT]) - fi - enable_fftw3="no" - - if test "$enable_fftw2" == "yes" ; then - AC_MSG_ERROR([Cannot enable both FFTW2 and IPP_FFT]) - fi - enable_fftw2="no" -fi +enable_fftw3="no" +enable_fftw2="no" +enable_ipp_fft="no" + +if test "$with_fft" = "fftw3"; then + enable_fftw3="yes" +elif test "$with_fft" = "fftw2-float"; then + enable_fftw2="yes" + enable_fftw2_float="yes" +elif test "$with_fft" = "fftw2-double"; then + enable_fftw2="yes" + enable_fftw2_double="yes" +elif test "$with_fft" = "fftw2-generic"; then + enable_fftw2="yes" + enable_fftw2_generic="yes" + enable_fftw2_float="yes" +elif test "$with_fft" = "ipp"; then + enable_ipp_fft="yes" +elif test "$chose_fft" != "yes"; then + enable_fftw3="probe" + enable_fftw2="probe" + enable_ipp_fft="probe" +else + AC_MSG_ERROR([Argument to --with-fft= must be one of fftw3, fftw2-float, + fftw2-double, fftw2-generic, or ipp.]) +fi if test "$enable_fftw3" != "no" ; then keep_CPPFLAGS=$CPPFLAGS @@ -231,8 +237,6 @@ if test "$enable_fftw3" != "no" ; then LIBS="$keep_LIBS" fi else - enable_ipp_fft="no" - enable_fftw2="no" AC_DEFINE_UNQUOTED(VSIP_IMPL_FFTW3, 1, [Define to build using FFTW3 headers.]) @@ -267,12 +271,19 @@ if test "$enable_fftw3" != "no" ; then keep_LIBS="$keep_LIBS -lfftw3l"]) LIBS="$keep_LIBS" + + enable_ipp_fft="no" + enable_fftw2="no" fi fi if test "$enable_fftw2" != "no" ; then - vsip_impl_use_float=1 + if test "$enable_fftw2_double" != "yes" ; then + vsip_impl_use_double=1 + else + vsip_impl_use_float=1 + fi vsip_impl_fftw2=1 FFT_CPPFLAGS= @@ -282,7 +293,8 @@ if test "$enable_fftw2" != "no" ; then FFT_LDFLAGS="-L$with_fftw2_prefix/lib" fi FFT_LIBS= - if test "$enable_fftw2_generic" == "yes" ; then + if test "$enable_fftw2_generic" == "yes" -o \ + "$enable_fftw2_double" ; then FFT_LIBS="-lfftw -lrfftw" fftw2_h="fftw.h" else @@ -306,9 +318,13 @@ if test "$enable_fftw2" != "no" 
; then CPPFLAGS="$keep_CPPFLAGS" fi else - enable_ipp_fft="no" - AC_DEFINE_UNQUOTED(VSIP_IMPL_FFT_USE_FLOAT, $vsip_impl_use_float, - [Define to build code with support for FFT on float types.]) + if test "$enable_fftw2_double" == "yes"; then + AC_DEFINE_UNQUOTED(VSIP_IMPL_FFT_USE_DOUBLE, $vsip_impl_use_double, + [Define to build code with support for FFT on double types.]) + else + AC_DEFINE_UNQUOTED(VSIP_IMPL_FFT_USE_FLOAT, $vsip_impl_use_float, + [Define to build code with support for FFT on float types.]) + fi AC_DEFINE_UNQUOTED(VSIP_IMPL_FFTW2, $vsip_impl_fftw2, [Define to build using FFTW2 headers.]) if test "$enable_fftw2_generic" == "yes" ; then @@ -318,6 +334,8 @@ if test "$enable_fftw2" != "no" ; then AC_SUBST(FFT_CPPFLAGS) AC_SUBST(FFT_LIBS) + + enable_ipp_fft="no" fi fi @@ -436,8 +454,9 @@ AC_DEFINE_UNQUOTED(VSIP_IMPL_PAR_SERVICE if test "$enable_ipp_fft" == "yes"; then if test "$enable_ipp" == "no"; then AC_MSG_ERROR([IPP FFT requires IPP]) - fi - enable_ipp="yes" + else + enable_ipp="yes" + fi fi if test "$enable_ipp" != "no"; then @@ -454,22 +473,26 @@ if test "$enable_ipp" != "no"; then AC_CHECK_HEADER([ipps.h], [vsipl_ipps_h_name=''],, [// no prerequisites]) if test "$vsipl_ipps_h_name" == "not found"; then if test "$enable_ipp" != "probe" -o "$enable_ipp_fft" == "yes"; then - AC_MSG_ERROR([IPP or IPP_FFT enabled, but no ipps.h detected]) + AC_MSG_ERROR([IPP enabled, but no ipps.h detected]) else CPPFLAGS="$save_CPPFLAGS" fi + else + if test "${with_ipp_suffix-unset}" == "unset"; then + AC_MSG_ERROR([IPP enabled, but library suffix not set.]) + fi # Find the library. 
save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS $IPP_LDFLAGS" LIBS="-lpthread $LIBS" - AC_SEARCH_LIBS(ippCoreGetCpuType, [ippcoreem64t],, + AC_SEARCH_LIBS(ippCoreGetCpuType, ["ippcore$with_ipp_suffix"],, [LD_FLAGS="$save_LDFLAGS"]) save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS $IPP_LDFLAGS" - AC_SEARCH_LIBS(ippsMul_32f, [ippsem64t ippsm7 ipps], + AC_SEARCH_LIBS(ippsMul_32f, ["ipps$with_ipp_suffix"], [ AC_SUBST(VSIP_IMPL_HAVE_IPP, 1) AC_DEFINE_UNQUOTED(VSIP_IMPL_HAVE_IPP, 1, @@ -502,7 +525,7 @@ int main(int, char **) LDFLAGS="$LDFLAGS $IPP_FFT_LDFLAGS" AC_SEARCH_LIBS( - [ippiFFTFwd_CToC_32fc_C1R], [ippiem64t ippim7 ippi], + [ippiFFTFwd_CToC_32fc_C1R], ["ippi$with_ipp_suffix"], [ AC_SUBST(VSIP_IMPL_IPP_FFT, 1) AC_DEFINE_UNQUOTED(VSIP_IMPL_IPP_FFT, 1, From ncm at codesourcery.com Tue Sep 20 01:32:38 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Mon, 19 Sep 2005 18:32:38 -0700 Subject: [PATCH] fft-core.hpp minor cleanup Message-ID: <20050920013238.GA12541@codesourcery.com> The patch below is checked in. It does some minor whitespace cleanup, re-arranging, and comment improvements for better maintainability in fft-core.hpp. It doesn't matter much whether it ends up in the release. Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.263 retrieving revision 1.264 diff -u -p -r1.263 -r1.264 --- ChangeLog 20 Sep 2005 00:46:29 -0000 1.263 +++ ChangeLog 20 Sep 2005 01:29:43 -0000 1.264 @@ -1,5 +1,10 @@ 2005-09-19 Nathan Myers + * src/vsip/impl/fft-core.hpp: minor format cleanup, documentation + improvements. + +2005-09-19 Nathan Myers + * configure.ac: replace all --enable-fftw* and --enable-ipp-fft with --with-fft={fftw3,fftw2-float,fftw2-double,fftw2-generic,ipp}. Enable building with fftw2-double. 
Add --with-ipp-suffix, and Index: src/vsip/impl/fft-core.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/fft-core.hpp,v retrieving revision 1.15 retrieving revision 1.16 diff -u -p -r1.15 -r1.16 --- src/vsip/impl/fft-core.hpp 19 Sep 2005 03:39:54 -0000 1.15 +++ src/vsip/impl/fft-core.hpp 20 Sep 2005 01:29:43 -0000 1.16 @@ -905,22 +905,10 @@ int_log2(unsigned size) // assume siz return n; } -template inline IppStatus dum(P**, int, int, IppHintAlgorithm) - { return ippStsNoErr; } -template inline IppStatus dum(P**, int, int, int, IppHintAlgorithm) - { return ippStsNoErr; } -template inline IppStatus dum(P**, IppiSize, int, IppHintAlgorithm) - { return ippStsNoErr; } -template inline IppStatus dum(P*) - { return ippStsNoErr; } -template inline IppStatus dum(P const*, int*) - { return ippStsNoErr; } -template inline IppStatus dum( - T const*, T*, P const*, Ipp8u*) - { return ippStsNoErr; } -template inline IppStatus dum( - T const*, int, T*, int, P const*, Ipp8u*) - { return ippStsNoErr; } +// Ipp_DFT_Base is the generic driver for all IPP calls. +// +// Note the differing signatures for 2D plans in the FFT (power-of-two +// array argument size) and DFT forms (non-), planFFun2 vs. planDFun2. 
template < vsip::dimension_type Dim, @@ -933,8 +921,8 @@ template < IppStatus (*forwardFFun1)(T const*, T*, planFT const*, Ipp8u*), IppStatus (*inverseFFun1)(T const*, T*, planFT const*, Ipp8u*), IppStatus (*forwardFFun2)(T const*, int, T*, int, planFT const*, Ipp8u*), - IppStatus (*inverseFFun2)(T const*, int, T*, int, planFT const*, Ipp8u*), - typename planDT, + IppStatus (*inverseFFun2) + (T const*, int, T*, int, planFT const*, Ipp8u*), typename planDT, IppStatus (*planDFun1)(planDT**, int, int, IppHintAlgorithm), IppStatus (*planDFun2)(planDT**, IppiSize, int, IppHintAlgorithm), IppStatus (*disposeDFun)(planDT*), @@ -1009,7 +997,8 @@ struct Ipp_DFT_base } static void - forward2(void* plan, void const* in, void* out, void* buffer, bool f) VSIP_NOTHROW + forward2(void* plan, void const* in, void* out, void* buffer, bool f) + VSIP_NOTHROW { IppStatus result = (f ? (*forwardFFun2)( @@ -1024,7 +1013,8 @@ struct Ipp_DFT_base } static void - inverse(void* plan, void const* in, void* out, void* buffer, bool f) VSIP_NOTHROW + inverse(void* plan, void const* in, void* out, void* buffer, bool f) + VSIP_NOTHROW { IppStatus result = (f ? (*inverseFFun1)( @@ -1039,7 +1029,8 @@ struct Ipp_DFT_base } static void - inverse2(void* plan, void const* in, void* out, void* buffer, bool f) VSIP_NOTHROW + inverse2(void* plan, void const* in, void* out, void* buffer, bool f) + VSIP_NOTHROW { IppStatus result = (f ? (*inverseFFun2)( @@ -1054,10 +1045,34 @@ struct Ipp_DFT_base } }; +// These are dummy functions to act as place-holders for arguments to +// template Ipp_DFT_base<>. 
+ +template inline IppStatus dum(P**, int, int, IppHintAlgorithm) + { return ippStsNoErr; } +template inline IppStatus dum(P**, int, int, int, IppHintAlgorithm) + { return ippStsNoErr; } +template inline IppStatus dum(P**, IppiSize, int, IppHintAlgorithm) + { return ippStsNoErr; } +template inline IppStatus dum(P*) + { return ippStsNoErr; } +template inline IppStatus dum(P const*, int*) + { return ippStsNoErr; } +template inline IppStatus dum( + T const*, T*, P const*, Ipp8u*) + { return ippStsNoErr; } +template inline IppStatus dum( + T const*, int, T*, int, P const*, Ipp8u*) + { return ippStsNoErr; } + + +// Specializations of Ipp_DFT create the mappings from argument +// types to the appropriate IPP library functions. + template struct Ipp_DFT; -// 1D, C to C, float +// IPP driver, 1D, C to C, float template <> struct Ipp_DFT<1,std::complex > @@ -1077,7 +1092,7 @@ struct Ipp_DFT<1,std::complex > typedef std::complex out_type; }; -// 2D, C to C, float +// IPP driver, 2D, C to C, float template <> struct Ipp_DFT<2,std::complex > @@ -1097,7 +1112,7 @@ struct Ipp_DFT<2,std::complex > typedef std::complex out_type; }; -// 1D, C to C, double +// IPP driver, 1D, C to C, double template <> struct Ipp_DFT<1,std::complex > @@ -1119,7 +1134,7 @@ struct Ipp_DFT<1,std::complex > // 2D, C to C, double, power of 2 -// IPP has no 2D double +// IPP driver, IPP has no 2D double template <> struct Ipp_DFT<2,std::complex > : Ipp_DFT_base<2,Ipp64fc,void,dum,dum,dum,dum,dum,dum,dum,dum, @@ -1132,7 +1147,7 @@ struct Ipp_DFT<2,std::complex > ///////////////////////////////////////////////////////////////////////// -// 1D, R to/from C, float +// IPP driver, 1D, R to/from C, float template <> struct Ipp_DFT<1,float> @@ -1152,7 +1167,7 @@ struct Ipp_DFT<1,float> typedef std::complex out_type; }; -// 2D, R to C, float +// IPP driver, 2D, R to/from C, float template <> struct Ipp_DFT<2,float> @@ -1172,7 +1187,7 @@ struct Ipp_DFT<2,float> typedef std::complex out_type; }; -// 1D, R to C, 
double
+// IPP driver, 1D, R to/from C, double
 template <>
 struct Ipp_DFT<1,double>
@@ -1192,7 +1207,7 @@ struct Ipp_DFT<1,double>
 typedef std::complex out_type;
 };
-// 2D, R to C, double
+// 2D, R to/from C, double
 // IPP doesn't implement 2D double
 template <>
@@ -1253,7 +1268,7 @@ create_ipp_plan(
 }
 }
-// IPP FFT any
+// IPP FFT plan any
 template
 inline void
@@ -1272,7 +1287,7 @@ create_plan(
 }
-// IPP FFTM
+// IPP FFTM plan
 template
 inline void

From don at codesourcery.com Tue Sep 20 05:41:29 2005
From: don at codesourcery.com (Don McCoy)
Date: Mon, 19 Sep 2005 23:41:29 -0600
Subject: [vsipl++] [patch] signal.windows
In-Reply-To: <432F3522.2070203@codesourcery.com>
References: <4329F238.1090406@codesourcery.com> <432F2E24.7060508@codesourcery.com> <432F3522.2070203@codesourcery.com>
Message-ID: <432FA109.3050900@codesourcery.com>

Don McCoy wrote:
>> Don McCoy wrote:
>>
>> This implements the four windowing functions, Blackman, Chebyshev,
>> Hanning and Kaiser. Tested against Intel 8.0/9.0 and GCC 3.4.0.
>
> This module passes against the tests I wrote, but fails as of this
> moment against ref-impl/signal-windows.cpp. If you are getting ready
> to rebuild and need it checked in, please let me know. Otherwise, I'm
> working on it as quickly as possible.

Resolved. Passes against ref-impl tests as well as new unit tests. Please let me know if this is ready to be checked in.

--
Don McCoy
CodeSourcery, LLC
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ChangeLog.window
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sw2.diff Type: text/x-patch Size: 14802 bytes Desc: not available URL: From jules at codesourcery.com Tue Sep 20 10:07:41 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 20 Sep 2005 06:07:41 -0400 Subject: [vsipl++] [patch] signal.windows In-Reply-To: <432FA109.3050900@codesourcery.com> References: <4329F238.1090406@codesourcery.com> <432F2E24.7060508@codesourcery.com> <432F3522.2070203@codesourcery.com> <432FA109.3050900@codesourcery.com> Message-ID: <432FDF6D.8000404@codesourcery.com> Don, This looks good, please check it in. thanks, -- Jules Don McCoy wrote: > Don McCoy wrote: > >>> Don McCoy wrote: >>> >>> This implements the four windowing functions, Blackman, Chebyshev, >>> Hanning and Kaiser. Tested against Intel 8.0/9.0 and GCC 3.4.0. >> >> >> This module passes against the tests I wrote, but fails as of this >> moment against ref-impl/signal-windows.cpp. If you are getting ready >> to rebuild and need it checked in, please let me know. Otherwise, I'm >> working on it as quickly as possible. > > > > Resolved. Passes against ref-impl tests as well as new unit tests. > Please let me know if this is ready to be checked in. From ncm at codesourcery.com Tue Sep 20 15:53:29 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Tue, 20 Sep 2005 08:53:29 -0700 Subject: [PATCH] FFT off by default; clean signal-window.cpp Message-ID: <20050920155329.GA31596@codesourcery.com> This small cleanup is not yet applied, pending Jules's opinion. Note that running "configure" with no arguments still looks for and enables MPICH on my machine. I don't know if that is wanted, or if we should also try to turn off any MPI-dependent library components.
Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.265 diff -u -p -r1.265 ChangeLog --- ChangeLog 20 Sep 2005 12:38:56 -0000 1.265 +++ ChangeLog 20 Sep 2005 15:50:06 -0000 @@ -1,3 +1,10 @@ +2005-09-20 Nathan Myers + + * configure.ac: turn off all FFT libraries by default. + * src/vsip/signal-window.cpp: remove unused local variable. + * src/vsip/impl/signal-fft.hpp: move definition of member scale_ + outside #if to allow compilation with no FFT engines defined. + 2005-09-19 Don McCoy Implemented functions from [signal.windows] Index: configure.ac =================================================================== RCS file: /home/cvs/Repository/vpp/configure.ac,v retrieving revision 1.39 diff -u -p -r1.39 configure.ac --- configure.ac 20 Sep 2005 00:46:29 -0000 1.39 +++ configure.ac 20 Sep 2005 15:50:06 -0000 @@ -210,10 +210,10 @@ elif test "$with_fft" = "fftw2-generic"; enable_fftw2_float="yes" elif test "$with_fft" = "ipp"; then enable_ipp_fft="yes" -elif test "$chose_fft" != "yes"; then - enable_fftw3="probe" - enable_fftw2="probe" - enable_ipp_fft="probe" +elif test "$chose_fft" != "yes"; then : +# enable_fftw3="probe" +# enable_fftw2="probe" +# enable_ipp_fft="probe" else AC_MSG_ERROR([Argument to --with-fft= must be one of fftw3, fftw2-float, fftw2-double, fftw2-generic, or ipp.]) Index: src/vsip/signal-window.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/signal-window.cpp,v retrieving revision 1.1 diff -u -p -r1.1 signal-window.cpp --- src/vsip/signal-window.cpp 20 Sep 2005 12:38:57 -0000 1.1 +++ src/vsip/signal-window.cpp 20 Sep 2005 15:50:06 -0000 @@ -33,7 +33,6 @@ blackman(length_type len) VSIP_THROW((st Vector v(len); - length_type n = 0; scalar_f temp1 = 2 * M_PI / (len - 1); scalar_f temp2 = 2 * temp1; Index: src/vsip/impl/signal-fft.hpp 
=================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fft.hpp,v retrieving revision 1.24 diff -u -p -r1.24 signal-fft.hpp --- src/vsip/impl/signal-fft.hpp 19 Sep 2005 03:39:54 -0000 1.24 +++ src/vsip/impl/signal-fft.hpp 20 Sep 2005 15:50:06 -0000 @@ -66,11 +66,6 @@ struct Fft_core : impl::Ref_countscale_ back to 1 so the caller will know not to repeat it. - - typename impl::Scalar_of::type scale_; - void* plan_in_place_; void* plan_from_to_; @@ -88,6 +83,11 @@ struct Fft_core : impl::Ref_countscale_ back to 1 so the caller will know not to repeat it. + + typename impl::Scalar_of::type scale_; }; // From stefan at codesourcery.com Tue Sep 20 19:49:48 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 20 Sep 2005 15:49:48 -0400 Subject: test database fix Message-ID: <433067DC.7090904@codesourcery.com> The attached patch fixes the test database to correctly recognize and scan subdirectories, even for the empty target. Checked in. Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: vpp_database.py Type: application/x-python Size: 9199 bytes Desc: not available URL: From stefan at codesourcery.com Tue Sep 20 19:51:39 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 20 Sep 2005 15:51:39 -0400 Subject: [vsipl++] test database fix Message-ID: <4330684B.7030407@codesourcery.com> Sorry, I meant to send the patch, not the entire file. Here it is. Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: vpp_database.py.diff Type: text/x-patch Size: 1330 bytes Desc: not available URL: From don at codesourcery.com Wed Sep 21 00:30:33 2005 From: don at codesourcery.com (Don McCoy) Date: Tue, 20 Sep 2005 18:30:33 -0600 Subject: [patch] fft_ext, window tests Message-ID: <4330A9A9.1060504@codesourcery.com> Attached is a patch that makes all of the "fft_ext" tests pass. 
Also added conditional compiler directive such that it will build, run and pass even if no FFT is defined. The fft_ext.cpp module may now be run on data files without command line options, provided that the first two letters of the filename indicate the desired fft type (c-c, c-r, or r-c). It also runs both single and double precision FFT's on the data, unless an option is provided to select one or the other. While making these changes, I also caught the fact that the window.cpp test also needed this conditional (because the Chebyshev function is dependent on FFT). Ok to commit? -- Don McCoy CodeSourcery, LLC -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fe.changes URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fe.diff Type: text/x-patch Size: 10581 bytes Desc: not available URL: From jules at codesourcery.com Wed Sep 21 05:35:28 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 21 Sep 2005 01:35:28 -0400 Subject: [vsipl++] [PATCH] switch to --with-fft=... In-Reply-To: <20050920005507.GA10733@codesourcery.com> References: <20050920005507.GA10733@codesourcery.com> Message-ID: <4330F120.2090703@codesourcery.com> Nathan (Jasper) Myers wrote: > > Also, if you're configuring in IPP, you'll need to add one of > > --with-ipp-suffix= > --with-ipp-suffix=em64t > --with-ipp-suffix=m7 > > or what-have-you, according to your IPP installation. > Nathan, Is there a reason for requiring a suffix? If by default we search ipps.so (no suffix) and ippsem64t.so, we'll do the right thing in most cases. On ia32 systems, ipps.so will hit. It is a dispatcher library that then detects the right processor-specific library at runtime. On em64t systems, ippsem64t.so will hit. It is not clear from the IPP getting started page if it too is a dispatcher, but presumably it is. Any objections to making the suffix optional?
This will let people override the default when necessary, but not force everyone to set it. -- Jules From jules at codesourcery.com Wed Sep 21 09:52:36 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 21 Sep 2005 05:52:36 -0400 Subject: [patch] Pre-release fixes Message-ID: <43312D64.8050100@codesourcery.com> Several small patches. - merged Nathan's patch to disable FFT with a patch to disable the old libraries, - made --with-ipp-suffix optional, - fixed a missing static definition when timers are disabled, - added checking to the IPP dispatch to check the operands have the same type. It was matching an expression View> = View * View> but there was no corresponding IPP vmul wrapper. - reverse the order of parameters to IPP Subtract and Divide. ippsSub(A, B, Z, ...) is equivalent to Z = B - A go figure! Patches applied. These last two bugs were causing view-math to fail. I'm checking that it is fixed now. If it looks good, it will be our release 0.9! -- Jules From stefan at codesourcery.com Wed Sep 21 11:56:27 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 21 Sep 2005 07:56:27 -0400 Subject: [vsipl++] [patch] Pre-release fixes In-Reply-To: <43312D64.8050100@codesourcery.com> References: <43312D64.8050100@codesourcery.com> Message-ID: <43314A6B.3090003@codesourcery.com> Jules Bergmann wrote: > - added checking to the IPP dispatch to check the operands have the > same type. It was matching an expression > > View> = View * View> > > but there was no corresponding IPP vmul wrapper. > > - reverse the order of parameters to IPP Subtract and Divide. > > ippsSub(A, B, Z, ...) > > is equivalent to > > Z = B - A > > go figure! doh ! I was wondering how to best test serial dispatch. It appears I was a bit too sloppy when testing as I didn't add a new test that provides specifically expressions that match the ones IPP can deal with.
Instead I locally modified an existing test to make sure the right backend was called, but without checking the results. What we need is a set of expression tests that match all the patterns we provide backends for, and then somehow mark them up during execution so we know we have complete coverage. Regards, Stefan From ncm at codesourcery.com Wed Sep 21 15:36:41 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Wed, 21 Sep 2005 08:36:41 -0700 Subject: [vsipl++] [PATCH] switch to --with-fft=... In-Reply-To: <4330F120.2090703@codesourcery.com> References: <20050920005507.GA10733@codesourcery.com> <4330F120.2090703@codesourcery.com> Message-ID: <20050921153641.GH31167@codesourcery.com> On Wed, Sep 21, 2005 at 01:35:28AM -0400, Jules Bergmann wrote: > Nathan (Jasper) Myers wrote: > >Also, if you're configuring in IPP, you'll need to add one of > > > > --with-ipp-suffix= > > --with-ipp-suffix=em64t > > --with-ipp-suffix=m7 > > > >or what-have-you, according to your IPP installation. > > Is there a reason for requiring a suffix? Just two. First, for some (e.g. in /opt/intel/ipp41_eval/em64t) there's no non-suffix version provided. Second, Mark recommended requiring the suffix so that we don't pick the wrong one by accident. Nathan Myers ncm From don at codesourcery.com Fri Sep 23 15:58:39 2005 From: don at codesourcery.com (Don McCoy) Date: Fri, 23 Sep 2005 09:58:39 -0600 Subject: [vsipl++] [patch] fft_ext, window tests In-Reply-To: <4330A9A9.1060504@codesourcery.com> References: <4330A9A9.1060504@codesourcery.com> Message-ID: <4334262F.3050206@codesourcery.com> Don McCoy wrote: > Attached is a patch that makes all of the "fft_ext" tests pass. Also > added conditional compiler directive such that it will build, run and > pass even if no FFT is defined. The fft_ext.cpp module may now be run > on data files without command line options, provided that the first > two letters of the filename indicate the desired fft type (c-c, c-r, > or r-c). 
It also runs both single and double precision FFT's on the > data, unless an option is provided to select one or the other. > > While making these changes, I also caught the fact that the window.cpp > test also needed this conditional (because the Chebyshev function is > dependent on FFT). > > Ok to commit? Resubmitting this after realizing I had incorrectly applied a conditional compilation directive around the FFT call in src/vsip/signal-window.cpp. -- Don McCoy CodeSourcery, LLC -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fe2.changes URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fe2.diff Type: text/x-patch Size: 10287 bytes Desc: not available URL: From jules at codesourcery.com Fri Sep 23 16:06:00 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 23 Sep 2005 12:06:00 -0400 Subject: [vsipl++] [patch] fft_ext, window tests In-Reply-To: <4334262F.3050206@codesourcery.com> References: <4330A9A9.1060504@codesourcery.com> <4334262F.3050206@codesourcery.com> Message-ID: <433427E8.4070108@codesourcery.com> Don McCoy wrote: > Don McCoy wrote: > >> Attached is a patch that makes all of the "fft_ext" tests pass. Also >> added conditional compiler directive such that it will build, run and >> pass even if no FFT is defined. The fft_ext.cpp module may now be run >> on data files without command line options, provided that the first >> two letters of the filename indicate the desired fft type (c-c, c-r, >> or r-c). It also runs both single and double precision FFT's on the >> data, unless an option is provided to select one or the other. >> >> While making these changes, I also caught the fact that the window.cpp >> test also needed this conditional (because the Chebyshev function is >> dependent on FFT). >> >> Ok to commit? 
> > > Resubmitting this after realizing I had incorrectly applied a > conditional compilation directive around the FFT call in > src/vsip/signal-window.cpp. > Don, Looks good, please commit. -- Jules From jules at codesourcery.com Fri Sep 23 18:39:38 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 23 Sep 2005 14:39:38 -0400 Subject: [patch] VERSIONS Message-ID: <43344BEA.2030704@codesourcery.com> Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: v.diff URL: From jules at codesourcery.com Fri Sep 23 19:58:10 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 23 Sep 2005 15:58:10 -0400 Subject: [patch] Vector assignment, sarsim bits Message-ID: <43345E52.9070909@codesourcery.com> A bunch of misc things collected over the past few weeks to optimize and parallel sarsim. Perhaps the most substantial bit, I changed the Vector assignment operators (+=, -=, etc) to go through the same dispatch as 'operator=', so that 'A += B' gets evaluated as 'A = A + B'. This throws away the knowledge that it is an update expression, but it lets it get evaluated by IPP when possible. In the long term, we may want to add special dispatch for operator assignment so we don't throw this knowledge away. Thoughts? -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: misc.diff URL: From mark at codesourcery.com Fri Sep 23 20:35:59 2005 From: mark at codesourcery.com (Mark Mitchell) Date: Fri, 23 Sep 2005 13:35:59 -0700 Subject: [vsipl++] [patch] Vector assignment, sarsim bits In-Reply-To: <43345E52.9070909@codesourcery.com> References: <43345E52.9070909@codesourcery.com> Message-ID: <4334672F.2020409@codesourcery.com> Jules Bergmann wrote: > A bunch of misc things collected over the past few weeks to optimize and > parallel sarsim. 
> > Perhaps the most substantial bit, I changed the Vector assignment > operators (+=, -=, etc) to go through the same dispatch as 'operator=', > so that 'A += B' gets evaluated as 'A = A + B'. This throws away the > knowledge that it is an update expression, but it lets it get evaluated > by IPP when possible. In the long term, we may want to add special > dispatch for operator assignment so we don't throw this knowledge away. > > Thoughts? We do the same thing in the compiler; "i += j" is treated exactly like "i = i + j". If there are special operations for update, you want to apply them in both cases, i.e., you want to optimize "i = i + j" and "i = j + i" if the user happens to write it that way. So, first you turn "i += j" into "i = i + j"; then you (later) look for the update case. In VSIPL++, you could do that at runtime-dispatch time. In a compiler, there's generally very little runtime dispatch; these things are decided up front. That does suggest that, in the long run, you may want to do compile-time dispatch for the += case if you have a library that specially supports that case. But, you'll probably want to do the runtime dispatch anyhow, and that will get you most of the bang. So, I think your strategy makes sense. -- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From jules at codesourcery.com Fri Sep 23 21:00:59 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 23 Sep 2005 17:00:59 -0400 Subject: [patch] vector-matrix multiply Message-ID: <43346D0B.7010705@codesourcery.com> Rough implementation of vmmul. Tries to do the right thing with respect to dimension ordering of the matrix. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: vm.diff URL: From ncm at codesourcery.com Fri Sep 23 23:11:31 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Fri, 23 Sep 2005 16:11:31 -0700 Subject: [vsipl++] [patch] Vector assignment, sarsim bits In-Reply-To: <43345E52.9070909@codesourcery.com> References: <43345E52.9070909@codesourcery.com> Message-ID: <20050923231131.GA15306@codesourcery.com> On Fri, Sep 23, 2005 at 03:58:10PM -0400, Jules Bergmann wrote: > ... I changed the Vector assignment > operators (+=, -=, etc) to go through the same dispatch as 'operator=', > so that 'A += B' gets evaluated as 'A = A + B'. This throws away the > knowledge that it is an update expression, but it lets it get evaluated > by IPP when possible. In the long term, we may want to add special > dispatch for operator assignment so we don't throw this knowledge away. I guess I think of op= as a special case of op+=, not the other way 'round. That is, if you imagine an operator # such that (a # b) => b, then a.op#=(b), which must mean (a = a # b), is identical to what we call a.op=(b). In normal code we usually implement op+ using op+=, making the latter the more fundamental. I don't know if that means anything in terms of the code we have. Anyhow I thought you were talking about distributed operation, rather than IPP. I see IPP implements both a = b + c and a += b. I wonder if we're better off ignoring one or other of those. My guess would be that if we used just one, it should be the second. Anyway it extends more naturally to their operation a += b * c. Nathan Myers ncm From ncm at codesourcery.com Mon Sep 26 08:26:27 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Mon, 26 Sep 2005 01:26:27 -0700 Subject: [PATCH] #if out FFT tests when not config'd Message-ID: <20050926082627.GA17236@codesourcery.com> This patch adds #if blocks around tests that depend on FFT support, pending addition of native FFT code to fill in lacunae. It also adds tests using double and complex. 
Note this does not patch the tests in ref-impl. OK to apply? Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.271 diff -u -p -r1.271 ChangeLog --- ChangeLog 23 Sep 2005 19:21:36 -0000 1.271 +++ ChangeLog 26 Sep 2005 08:26:47 -0000 @@ -1,3 +1,9 @@ +2005-09-26 Nathan Myers + + * tests/extdata-fft.cpp, tests/fft.cpp, tests/fftm-par.cpp, + tests/fftm.cpp: #if out tests that depend on FFT where FFT + is not enabled; add tests for double-precision. + 2005-09-23 Jules Bergmann * VERSIONS: New file, describes varius CVS tagged versions of @@ -32,7 +38,8 @@ 2005-09-20 Stefan Seefeld - * tests/QMTest/vpp_database.py: Make qmtest properly scan subdirectories. + * tests/QMTest/vpp_database.py: Make qmtest properly scan + subdirectories. 2005-09-19 Don McCoy Index: tests/extdata-fft.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/extdata-fft.cpp,v retrieving revision 1.3 diff -u -p -r1.3 extdata-fft.cpp --- tests/extdata-fft.cpp 18 Jun 2005 16:40:45 -0000 1.3 +++ tests/extdata-fft.cpp 26 Sep 2005 08:26:48 -0000 @@ -314,11 +314,10 @@ test_fft_1d(length_type size, int k) fft("subvector", in(Domain<1>(size)), out(Domain<1>(size))); } - - int main() { test_fft_1d > >(256, 3); test_fft_1d > >(256, 3); + return 0; } Index: tests/fft.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fft.cpp,v retrieving revision 1.6 diff -u -p -r1.6 fft.cpp --- tests/fft.cpp 19 Sep 2005 03:39:54 -0000 1.6 +++ tests/fft.cpp 26 Sep 2005 08:26:48 -0000 @@ -313,6 +313,8 @@ main() { vsipl init; +#if defined(VSIP_IMPL_FFT_USE_FLOAT) + test_by_ref >(2, 64); test_by_ref >(1, 68); test_by_ref >(2, 256); @@ -326,4 +328,26 @@ main() test_real(1, 128); test_real(2, 242); test_real(3, 16); + +#endif + +#if defined(VSIP_IMPL_FFT_USE_DOUBLE) + + test_by_ref >(2, 64); + 
test_by_ref >(1, 68); + test_by_ref >(2, 256); + test_by_ref >(2, 252); + test_by_ref >(3, 17); + + test_by_val >(1, 128); + test_by_val >(2, 256); + test_by_val >(3, 512); + + test_real(1, 128); + test_real(2, 242); + test_real(3, 16); + +#endif + + return 0; } Index: tests/fftm-par.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fftm-par.cpp,v retrieving revision 1.3 diff -u -p -r1.3 fftm-par.cpp --- tests/fftm-par.cpp 19 Sep 2005 03:39:54 -0000 1.3 +++ tests/fftm-par.cpp 26 Sep 2005 08:26:48 -0000 @@ -733,6 +733,7 @@ main(int argc, char** argv) comm.barrier(); #endif +#if defined(VSIP_IMPL_FFT_USE_FLOAT) test_by_ref_x >(18); test_by_ref_x >(64); test_by_ref_x >(68); @@ -749,11 +750,38 @@ main(int argc, char** argv) test_by_val_y >(18); test_by_val_y >(256); -#if 0 +# if 0 // Tests for test r->c, c->r. test_real(128); test_real(242); test_real(16); +# endif #endif + +#if defined(VSIP_IMPL_FFT_USE_DOUBLE) + test_by_ref_x >(18); + test_by_ref_x >(64); + test_by_ref_x >(68); + test_by_ref_x >(256); + test_by_ref_x >(252); + + test_by_ref_y >(68); + test_by_ref_y >(256); + + test_by_val_x >(128); + test_by_val_x >(256); + test_by_val_x >(512); + + test_by_val_y >(18); + test_by_val_y >(256); + +# if 0 + // Tests for test r->c, c->r. + test_real(128); + test_real(242); + test_real(16); +# endif +#endif + return 0; } Index: tests/fftm.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fftm.cpp,v retrieving revision 1.6 diff -u -p -r1.6 fftm.cpp --- tests/fftm.cpp 19 Sep 2005 03:39:54 -0000 1.6 +++ tests/fftm.cpp 26 Sep 2005 08:26:48 -0000 @@ -477,6 +477,7 @@ main() { vsipl init; +#if defined(VSIP_IMPL_FFT_USE_FLOAT) test_by_ref_x >(18); test_by_ref_x >(64); test_by_ref_x >(68); @@ -493,10 +494,38 @@ main() test_by_val_y >(18); test_by_val_y >(256); -#if 0 +# if 0 // Tests for test r->c, c->r. 
test_real(128); test_real(242); test_real(16); +# endif #endif + +#if defined(VSIP_IMPL_FFT_USE_DOUBLE) + test_by_ref_x >(18); + test_by_ref_x >(64); + test_by_ref_x >(68); + test_by_ref_x >(256); + test_by_ref_x >(252); + + test_by_ref_y >(68); + test_by_ref_y >(256); + + test_by_val_x >(128); + test_by_val_x >(256); + test_by_val_x >(512); + + test_by_val_y >(18); + test_by_val_y >(256); + +# if 0 + // Tests for test r->c, c->r. + test_real(128); + test_real(242); + test_real(16); +# endif +#endif + + return 0; } From stefan at codesourcery.com Mon Sep 26 12:59:26 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 26 Sep 2005 08:59:26 -0400 Subject: operator^ Message-ID: <4337F0AE.7060100@codesourcery.com> The attached patch implements the operator^ for view/view and view/scalar. In particular, as required by the spec, for View it maps to bxor, and to lxor for anything else. Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: xor.patch Type: text/x-patch Size: 3095 bytes Desc: not available URL: From jules at codesourcery.com Mon Sep 26 14:01:17 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Sep 2005 10:01:17 -0400 Subject: [vsipl++] operator^ In-Reply-To: <4337F0AE.7060100@codesourcery.com> References: <4337F0AE.7060100@codesourcery.com> Message-ID: <4337FF2D.9090301@codesourcery.com> Looks good, please commit. thanks -- Jules Stefan Seefeld wrote: > The attached patch implements the operator^ for view/view and view/scalar. > In particular, as required by the spec, for View it maps > to bxor, and to lxor for anything else. 
> > Regards, > Stefan > > From jules at codesourcery.com Mon Sep 26 14:21:07 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Sep 2005 10:21:07 -0400 Subject: [vsipl++] [PATCH] #if out FFT tests when not config'd In-Reply-To: <20050926082627.GA17236@codesourcery.com> References: <20050926082627.GA17236@codesourcery.com> Message-ID: <433803D3.70603@codesourcery.com> Nathan, extdata-fft doesn't call vsip::Fft, it just demonstrates how one might use Ext_data to implement FFTs. It shouldn't need anything #if'd out. Otherwise looks OK. In the short term, we need to make sure that any attempt to use an unimplemented FFT function results in either a compilation error or an "unimplemented" exception. In the long term, we need to implement a generic FFT that (a) works when no FFT library is provided and (b) fills in the gaps of whatever FFT library we're using. -- Jules Nathan (Jasper) Myers wrote: > This patch adds #if blocks around tests that depend on FFT support, > pending addition of native FFT code to fill in lacunae. It also adds > tests using double and complex. Note this does not patch the > tests in ref-impl. > > OK to apply? > From ncm at codesourcery.com Mon Sep 26 17:52:56 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Mon, 26 Sep 2005 10:52:56 -0700 Subject: [vsipl++-csl] [patch] Vector assignment, sarsim bits In-Reply-To: <4338290C.90809@codesourcery.com> References: <43345E52.9070909@codesourcery.com> <43345ECB.6030700@codesourcery.com> <43374B4D.3090001@codesourcery.com> <20050926014907.GC15306@codesourcery.com> <4338290C.90809@codesourcery.com> Message-ID: <20050926175256.GN4613@codesourcery.com> On Mon, Sep 26, 2005 at 09:59:56AM -0700, Mark Mitchell wrote: > Nathan (Jasper) Myers wrote: > > >>Writing a test for my fresh implementation for operator^ I observe > >>that > >> > >> std::cout << typeid(false^true).name() << std::endl; > >> > >>prints 'i', and not 'b' as I had expected. 
> > I think Nathan Sidwell may have already answered, but this is not a bug; > the usual arithmetic conversions are applied to the operands before > applying "^", so the result of "false ^ true" is of type "int". From a C++ coder standpoint, this is very surprising. "The usual arithmetic conversions" was one of the areas where the C++ committee (library, perhaps, moreso than core?) deliberately broke from C. Am I right, then, that it's allowed-but-not-required for the result to stay bool? If G++ can do that, it should. Nathan Myers ncm From mark at codesourcery.com Mon Sep 26 18:04:35 2005 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 26 Sep 2005 11:04:35 -0700 Subject: [vsipl++] Re: [vsipl++-csl] [patch] Vector assignment, sarsim bits In-Reply-To: <20050926175256.GN4613@codesourcery.com> References: <43345E52.9070909@codesourcery.com> <43345ECB.6030700@codesourcery.com> <43374B4D.3090001@codesourcery.com> <20050926014907.GC15306@codesourcery.com> <4338290C.90809@codesourcery.com> <20050926175256.GN4613@codesourcery.com> Message-ID: <43383833.40004@codesourcery.com> Nathan (Jasper) Myers wrote: > From a C++ coder standpoint, this is very surprising. "The usual > arithmetic conversions" was one of the areas where the C++ committee > (library, perhaps, moreso than core?) deliberately broke from C. > Am I right, then, that it's allowed-but-not-required for the result > to stay bool? If G++ can do that, it should. A conforming compiler is required to promote to int. See [expr]/9: "Otherwise, the integral promotions shall be performed on both operands". There's nothing special about "^"; the usual arithmetic conversions are applied to all operands of arithmetic binary operators, like +, -, *, etc., and, as a result, the type of such expressions is always at least as wide as "int".
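A minimal sketch of the promotion rule described above, checked at compile time. It assumes a C++11 compiler (for static_assert, decltype and constexpr, none of which existed when this thread was written), and the helper names xor_of_bools_is_int and logical_xor are hypothetical, chosen only for illustration:

```cpp
#include <type_traits>

// Hypothetical helper: reports whether applying ^ to two bool operands
// yields an int, as the integral promotions require.
constexpr bool xor_of_bools_is_int()
{
  return std::is_same<decltype(false ^ true), int>::value;
}

// The same promotion applies to the other arithmetic binary operators.
static_assert(std::is_same<decltype(true + true), int>::value,
              "bool + bool is promoted to int");
static_assert(std::is_same<decltype(true * false), int>::value,
              "bool * bool is promoted to int");

// Hypothetical helper: a logical xor that returns bool by converting the
// promoted int result back explicitly.
constexpr bool logical_xor(bool a, bool b)
{
  return static_cast<bool>(a ^ b);
}
```

Code that needs a bool result from "^" therefore has to convert back explicitly, as logical_xor does here; the built-in operator will not preserve the operand type.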
-- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From stefan at codesourcery.com Mon Sep 26 18:08:55 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 26 Sep 2005 14:08:55 -0400 Subject: [vsipl++] Re: [vsipl++-csl] [patch] Vector assignment, sarsim bits In-Reply-To: <43383833.40004@codesourcery.com> References: <43345E52.9070909@codesourcery.com> <43345ECB.6030700@codesourcery.com> <43374B4D.3090001@codesourcery.com> <20050926014907.GC15306@codesourcery.com> <4338290C.90809@codesourcery.com> <20050926175256.GN4613@codesourcery.com> <43383833.40004@codesourcery.com> Message-ID: <43383937.4020307@codesourcery.com> Mark Mitchell wrote: > A conforming compiler is required to promote to int. See [expr]/9: > "Otherwise, the integral promotions shall be performed on both > operands". There's nothing special about "^"; the usual arithmetic > conversions are applied to all operands of arithmetic binary operators, > like +, -, *, etc., and, as a result, the type of such expressions is > always at least as wide as "int". Considering this logic, I'm wondering why the VSIPL++ specs require two distinct versions of operator^, one doing a binary and the other a logical xor, depending on the operands having type bool or not. Isn't that inconsistent with the above ? 
Thanks, Stefan From mark at codesourcery.com Mon Sep 26 19:58:53 2005 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 26 Sep 2005 12:58:53 -0700 Subject: [vsipl++] Re: [vsipl++-csl] [patch] Vector assignment, sarsim bits In-Reply-To: <43383937.4020307@codesourcery.com> References: <43345E52.9070909@codesourcery.com> <43345ECB.6030700@codesourcery.com> <43374B4D.3090001@codesourcery.com> <20050926014907.GC15306@codesourcery.com> <4338290C.90809@codesourcery.com> <20050926175256.GN4613@codesourcery.com> <43383833.40004@codesourcery.com> <43383937.4020307@codesourcery.com> Message-ID: <433852FD.6050703@codesourcery.com> Stefan Seefeld wrote: > Mark Mitchell wrote: > >> A conforming compiler is required to promote to int. See [expr]/9: >> "Otherwise, the integral promotions shall be performed on both >> operands". There's nothing special about "^"; the usual arithmetic >> conversions are applied to all operands of arithmetic binary operators, >> like +, -, *, etc., and, as a result, the type of such expressions is >> always at least as wide as "int". > > > Considering this logic, I'm wondering why the VSIPL++ specs require > two distinct versions of operator^, one doing a binary and the other > a logical xor, depending on the operands having type bool or not. > Isn't that inconsistent with the above ? I'm sure that comes from VSIPL, but I'm not sure exactly why. Perhaps in VSIPL, "a lxor b" works even if "a" and "b" are of type "int"; i.e., maybe "a lxor b" is the C++ operation "bool(bool(a) ^ bool(b))". -- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From jules at codesourcery.com Mon Sep 26 20:24:09 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 26 Sep 2005 16:24:09 -0400 Subject: [patch] Generator block, ramp() function Message-ID: <433858E9.4090603@codesourcery.com> Patch applied. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: ramp.diff URL: From nathan at codesourcery.com Tue Sep 27 07:35:17 2005 From: nathan at codesourcery.com (Nathan Sidwell) Date: Tue, 27 Sep 2005 08:35:17 +0100 Subject: [vsipl++-csl] [patch] Vector assignment, sarsim bits In-Reply-To: <20050926175256.GN4613@codesourcery.com> References: <43345E52.9070909@codesourcery.com> <43345ECB.6030700@codesourcery.com> <43374B4D.3090001@codesourcery.com> <20050926014907.GC15306@codesourcery.com> <4338290C.90809@codesourcery.com> <20050926175256.GN4613@codesourcery.com> Message-ID: <4338F635.4030808@codesourcery.com> Nathan (Jasper) Myers wrote: > From a C++ coder standpoint, this is very surprising. "The usual > arithmetic conversions" was one of the areas where the C++ committee > (library, perhaps, moreso than core?) deliberately broke from C. > Am I right, then, that it's allowed-but-not-required for the result > to stay bool? If G++ can do that, it should. Not in my understanding of clause 5. nathan -- Nathan Sidwell :: http://www.codesourcery.com :: CodeSourcery LLC nathan at codesourcery.com :: http://www.planetfall.pwp.blueyonder.co.uk From stefan at codesourcery.com Tue Sep 27 13:01:56 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 27 Sep 2005 09:01:56 -0400 Subject: Cleanup patch Message-ID: <433942C4.3040009@codesourcery.com> The attached patch does some cleanup in order to enhance header independency: * view_traits.hpp forward-declares views with all default arguments, which vector.hpp, matrix.hpp, and tensor.hpp then don't issue a second time. * dense.hpp doesn't depend on view_traits.hpp * expr_functor.hpp depends on expr_binary_operators.hpp * matvec.hpp requires promote.hpp and fns_elementwise.hpp to be self-contained. Ok to commit ? Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: patch URL: From jules at codesourcery.com Tue Sep 27 13:33:33 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 27 Sep 2005 09:33:33 -0400 Subject: [vsipl++] Cleanup patch In-Reply-To: <433942C4.3040009@codesourcery.com> References: <433942C4.3040009@codesourcery.com> Message-ID: <43394A2D.8040701@codesourcery.com> Stefan, looks good. Please commit. -- Jules Stefan Seefeld wrote: > The attached patch does some cleanup in order to enhance > header independency: > > * view_traits.hpp forward-declares views with all default > arguments, which vector.hpp, matrix.hpp, and tensor.hpp > then don't issue a second time. > * dense.hpp doesn't depend on view_traits.hpp > * expr_functor.hpp depends on expr_binary_operators.hpp > * matvec.hpp requires promote.hpp and fns_elementwise.hpp > to be self-contained. > > Ok to commit ? > > Regards, > Stefan > > From jules at codesourcery.com Tue Sep 27 16:29:16 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 27 Sep 2005 12:29:16 -0400 Subject: [patch] SVD solver Message-ID: <4339735C.5060609@codesourcery.com> Implementation (and tests) of the SVD solver object, using LAPACK underneath. -- Jules -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: svd.diff URL: From don at codesourcery.com Tue Sep 27 18:30:57 2005 From: don at codesourcery.com (Don McCoy) Date: Tue, 27 Sep 2005 12:30:57 -0600 Subject: [patch] matvec: outer, gem, cumsum Message-ID: <43398FE1.7080906@codesourcery.com> The attached patch rounds out the functionality of [math.matvec] with the exception of a few of the matrix-vector product functions. Since those are implemented in a separate file, this patch stands by itself pretty well. -- Don McCoy CodeSourcery, LLC -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mv2.changes URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mv2.diff Type: text/x-patch Size: 16548 bytes Desc: not available URL: From stefan at codesourcery.com Tue Sep 27 20:40:42 2005 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 27 Sep 2005 16:40:42 -0400 Subject: [selgen] Message-ID: <4339AE4A.4010007@codesourcery.com> The attached patch implements all functions from section 9.4 ([selgen]) of the spec, i.e. * first() * indexbool() * gather() * scatter() * clip() * invclip() * swap() together with unit tests. It also contains some bits and pieces I submitted earlier today to cleanup header dependencies etc., as I wasn't able to easily separate the two. Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: selgen.patch Type: text/x-patch Size: 17340 bytes Desc: not available URL: From jules at codesourcery.com Tue Sep 27 22:18:13 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 27 Sep 2005 18:18:13 -0400 Subject: [vsipl++] [selgen] In-Reply-To: <4339AE4A.4010007@codesourcery.com> References: <4339AE4A.4010007@codesourcery.com> Message-ID: <4339C525.7000707@codesourcery.com> Stefan Seefeld wrote: > The attached patch implements all functions from section 9.4 ([selgen]) > of the spec, i.e. > > * first() > * indexbool() > * gather() > * scatter() > * clip() > * invclip() > * swap() > > together with unit tests. It also contains some bits and pieces I > submitted earlier today to cleanup header dependencies etc., as I > wasn't able to easily separate the two. Stefan, Looks good. I have one suggestion for indexbool to make it a little more robust, otherwise it looks ready to check in. Also, were the unit tests included in the patch? 
thanks,
-- Jules

>
> /***********************************************************************
> @@ -30,6 +31,29 @@
>  namespace impl
>  {
>
> +template
> +length_type
> +indexbool(const_Vector source, Vector, B2> indices)
> +{
> +  index_type cursor = 0;
> +  for (index_type i = 0; i != source.size(); ++i)
> +    if (source.get(i))
> +      indices.put(cursor++, Index<1>(i));
> +  return cursor;
> +}

I'm trying to think if we can do better error checking here. This
doesn't check if cursor < indices.size(0), but the put does, so that's
good. It would be good to have an assertion in indexbool so that the
failure is more obvious.

However, I don't think the specification of indexbool makes it very
useful. It should handle an overflow more gracefully than either
aborting or corrupting memory. Since the overflow condition is
data-dependent, it forces me to size indices for the absolute worst
case.

Hypothetically, if I'm doing target detection on an IR sensor and a
flare goes off, I'm going to have way more detections for a few frames
until I have a chance to adapt my thresholds. As a system engineer, I
would probably choose to drop some detections for a few frames rather
than size my detection buffer for the absolute worst case. I certainly
don't want the application to crash or corrupt itself!

There is a future opportunity here to design a better interface (such
as a stateful one that avoids overflow by getting the next N detections
from source, where N is the size of indices).

In the short term, let's check that cursor is less than indices.size()
before doing the put, i.e.:

	index_type cursor = 0;
	for (index_type i = 0; i != source.size(); ++i)
	  if (source.get(i) && cursor++ < indices.size())
	    indices.put(cursor-1, Index<1>(i));
	return cursor;

The returned value (cursor) is still the "number of non-false values in
source" (as required by the spec) and we avoid overwriting memory. A
concerned user can check if the returned value is greater than
indices.size().
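The bounded variant described above can be sketched outside the library. This is only an illustration under stated assumptions: std::vector stands in for the VSIPL++ views, std::size_t for index_type and length_type, and the name indexbool_bounded is hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for impl::indexbool (not the library code).
// Stores the index of every true element of `source` into `indices`
// for as long as there is room, and keeps counting after that.
// Returns the total number of true elements, which may be larger
// than indices.size() when detections had to be dropped.
inline std::size_t
indexbool_bounded(std::vector<bool> const& source,
                  std::vector<std::size_t>& indices)
{
  std::size_t cursor = 0;
  for (std::size_t i = 0; i != source.size(); ++i)
    if (source[i])
    {
      if (cursor < indices.size())
        indices[cursor] = i;  // room left: record this detection
      ++cursor;               // count it whether or not it was stored
    }
  return cursor;
}
```

A caller can detect overflow by comparing the returned count with indices.size(); entries past the end of indices are simply never written.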
> +
> +template
> +length_type
> +indexbool(const_Matrix source, Vector, B2> indices)
> +{
> +  index_type cursor = 0;
> +  for (index_type r = 0; r != source.size(0); ++r)
> +    for (index_type c = 0; c != source.size(1); ++c)
> +      if (source.get(r, c))
> +        indices.put(cursor++, Index<2>(r, c));
> +  return cursor;
> +}

Let's do the same as above.

>
> +namespace impl
> +{
> +template
> +struct clip_wrapper
> +{
> +  template
> +  struct clip_functor
> +  {
> +    typedef Tout result_type;
> +    result_type operator()(Tin0 t) const
> +    {
> +      return t <= lower_threshold ? lower_clip_value
> +        : t < upper_threshold ? t
> +        : upper_clip_value;
> +    }
> +
> +    Tin1 lower_threshold;
> +    Tin1 upper_threshold;
> +    result_type lower_clip_value;
> +    result_type upper_clip_value;
> +  };
> +  template
> +  struct invclip_functor
> +  {
> +    typedef Tout result_type;
> +    result_type operator()(Tin0 t) const
> +    {
> +      return t < lower_threshold ? t
> +        : t < middle_threshold ? lower_clip_value
> +        : t <= upper_threshold ? upper_clip_value
> +        : t;
> +    }
> +
> +    Tin1 lower_threshold;
> +    Tin1 middle_threshold;
> +    Tin1 upper_threshold;
> +    result_type lower_clip_value;
> +    result_type upper_clip_value;
> +  };
> +};
> +

Why are clip_functor and invclip_functor nested in clip_wrapper?
(I'm just curious, I'm not suggesting that it should be changed)

> +
> +namespace impl
> +{
> +/// Generic swapping of the content of two blocks.
> +template
> +struct Swap
> +{
> +  static void apply(Block1 &block1, Block2 &block2)
> +  {
> +    assert(block1.size() == block2.size());
> +    for (index_type i = 0; i != block1.size(); ++i)
> +    {
> +      typename Block1::value_type tmp = block1.get(i);
> +      block1.put(i, block2.get(i));
> +      block2.put(i, tmp);
> +    }
> +
> +  }
> +};

Looks good. We can plug in specializations to Swap for things like
swapping pointers (if we decide it's worth doing).
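As a rough illustration of how such a specialization could plug in, here is a self-contained sketch. It assumes standard containers in place of VSIPL++ blocks (operator[] instead of get/put), and swap_blocks is a hypothetical helper; none of this is taken from the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Generic case: element-by-element exchange, as in the quoted patch,
// but written against operator[] because standard containers are
// used here instead of blocks.
template <typename Block1, typename Block2>
struct Swap
{
  static void apply(Block1& block1, Block2& block2)
  {
    assert(block1.size() == block2.size());
    for (std::size_t i = 0; i != block1.size(); ++i)
    {
      typename Block1::value_type tmp = block1[i];
      block1[i] = block2[i];
      block2[i] = tmp;
    }
  }
};

// Specialization: when both sides are the same std::vector type, the
// underlying storage can be exchanged without touching the elements.
template <typename T>
struct Swap<std::vector<T>, std::vector<T> >
{
  static void apply(std::vector<T>& block1, std::vector<T>& block2)
  {
    assert(block1.size() == block2.size());
    block1.swap(block2);
  }
};

// Hypothetical convenience wrapper that deduces the block types.
template <typename Block1, typename Block2>
void swap_blocks(Block1& block1, Block2& block2)
{
  Swap<Block1, Block2>::apply(block1, block2);
}
```

Callers only ever name swap_blocks; which implementation runs is decided by the block types at compile time, so a faster path can be added later without changing call sites.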
From stefan at codesourcery.com  Tue Sep 27 22:44:31 2005
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 27 Sep 2005 18:44:31 -0400
Subject: [vsipl++] [selgen]
In-Reply-To: <4339C525.7000707@codesourcery.com>
References: <4339AE4A.4010007@codesourcery.com> <4339C525.7000707@codesourcery.com>
Message-ID: <4339CB4F.2080608@codesourcery.com>

Jules Bergmann wrote:

> Looks good. I have one suggestion for indexbool to make it a little
> more robust, otherwise it looks ready to check in.
>
> Also, were the unit tests included in the patch?

Oops, that was a new file, and thus it wasn't part of the diff.
I attach it now for the record.

> In the short term, let's check that cursor is less than indices.size()
> before doing the put, i.e.:
>
>     index_type cursor = 0;
>     for (index_type i = 0; i != source.size(); ++i)
>       if (source.get(i) && cursor++ < indices.size())
>         indices.put(cursor-1, Index<1>(i));
>     return cursor;
>
> The returned value (cursor) is still the "number of non-false values in
> source" (as required by the spec) and we avoid overwriting memory. A
> concerned user can check if the returned value is greater than
> indices.size().

Done.

> Why are clip_functor and invclip_functor nested in clip_wrapper? (I'm
> just curious, I'm not suggesting that it should be changed)

The Unary_expr_block harness expects a functor that is a class template
taking a single parameter. As here we have three, I put the two
additional parameters in the outer 'wrapper' template. I'm looking
forward to times when template typedefs become available. :-)

The patch is checked in now.
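The wrapper trick described above can be shown in isolation. In this sketch Unary_harness is a hypothetical stand-in for a harness that accepts only single-parameter functor templates (it is not the real Unary_expr_block), and the template parameter lists of clip_wrapper and clip_functor are assumptions, since the archive stripped them from the quoted patch.

```cpp
#include <cassert>

// Hypothetical harness: it can only be handed a class template that
// takes exactly one type parameter.
template <template <typename> class Functor, typename Arg>
struct Unary_harness
{
  typedef Functor<Arg> functor_type;
  typedef typename functor_type::result_type result_type;

  explicit Unary_harness(functor_type const& f) : functor_(f) {}
  result_type operator()(Arg a) const { return functor_(a); }

  functor_type functor_;
};

// The functor needs three types. Two of them are bound by the outer
// wrapper, which leaves a one-parameter member template that the
// harness can accept. (Parameter names are assumed.)
template <typename Tout, typename Tin1>
struct clip_wrapper
{
  template <typename Tin0>
  struct clip_functor
  {
    typedef Tout result_type;
    result_type operator()(Tin0 t) const
    {
      return t <= lower_threshold ? lower_clip_value
           : t <  upper_threshold ? static_cast<result_type>(t)
           :                        upper_clip_value;
    }

    Tin1 lower_threshold;
    Tin1 upper_threshold;
    result_type lower_clip_value;
    result_type upper_clip_value;
  };
};
```

For example, clip_wrapper<float, float>::clip_functor can be passed as the Functor argument of Unary_harness. C++11 later added alias templates, which cover much of what "template typedefs" refers to here.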
Regards,
		Stefan

From jules at codesourcery.com  Tue Sep 27 22:56:48 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 27 Sep 2005 18:56:48 -0400
Subject: [vsipl++] [patch] matvec: outer, gem, cumsum
In-Reply-To: <43398FE1.7080906@codesourcery.com>
References: <43398FE1.7080906@codesourcery.com>
Message-ID: <4339CE30.9070608@codesourcery.com>

Don McCoy wrote:
> The attached patch rounds out the functionality of [math.matvec] with
> the exception of a few of the matrix-vector product functions. Since
> those are implemented in a separate file, this patch stands by itself
> pretty well.

Don,

gemp and gems need to support the mat_conj and mat_herm mat_op_types as
well.

(The spec is a bit confusing. [math.matvec.gem]/3 defines the 4
mat_op_types: mat_ntrans, mat_trans, mat_herm, and mat_conj. gemp's
requirements then say that OpA and OpB must be mat_ntrans or mat_trans
unless T is complex. The implication is that if T is complex, OpA and
OpB can be mat_herm and mat_conj as well).

The approach you've taken for gemp is fine, it is definitely possible to
plug those additional cases in. However, since the number of cases is
multiplicative (size(OpA) x size(OpB)), you might want to separate the
handling of OpA and OpB to simplify things.

One way to do this is to define a class that applies a mat_op to a
single matrix:

	template
	struct Apply_mat_op;

	template
	struct Apply_mat_op
	{
	  typedef typename const_Matrix result_type;

	  static result_type
	  exec(const_Matrix m) VSIP_NOTHROW
	  { return m; }
	};

	template
	struct Apply_mat_op
	{
	  typedef typename const_Matrix::transpose_type result_type;

	  static result_type
	  exec(const_Matrix m) VSIP_NOTHROW
	  { return m.transpose(); }
	};

	template
	struct Apply_mat_op, Block>
	  // this definition makes mat_herm valid only for complex
	{
	  ...
	};

You could optionally provide a convenience function to use Apply_mat_op:

	template
	typename Apply_mat_op::result_type
	apply_mat_op(...)
	{
	  return Apply_mat_op::exec(m);
	}

Now, you could implement the top-level gemp as:

	void
	gemp(
	  T0 alpha,
	  const_Matrix A,
	  const_Matrix B,
	  T3 beta,
	  Matrix C)
	VSIP_NOTHROW
	{
	  // equivalent to C = alpha * OpA(A) * OpB(B) + beta * C
	  impl::gemp(alpha,
	             apply_mat_op(A),
	             apply_mat_op(B),
	             beta,
	             C);
	}

>
>
> ------------------------------------------------------------------------
> +
> +
> +  template
> +            typename T0,
> +            typename T1,
> +            typename Block0,
> +            typename Block1>
> +  void
> +  cumsum(
> +    const_Vector v,
> +    Vector w)
> +  VSIP_NOTHROW
> +  {
> +    // Effects: w has values equaling the cumulative sum of values in v.
> +    //
> +    // If View is Vector, d is ignored and, for
> +    //   0 <= i < v.size(),
> +    //   w.get(i) equals the sum over 0 <= j <= i of v.get(j)
> +    assert( v.size() == w.size() );
> +
> +    for ( index_type i = 0; i < v.size(); ++i )
> +    {
> +      T1 sum = T0();
> +      for ( index_type j = 0; j <= i; ++j )
> +        sum += v.get(j);
> +      w.put(i, sum);
> +    }

You could avoid recomputing the sum each time by keeping a running total:

	T1 sum = T0();
	for (index_type i=0; ...)
	{
	  sum += v.get(i);
	  w.put(i, sum);
	}

You should be able to do something similar for matrix cumsum.

From stefan at codesourcery.com  Tue Sep 27 22:57:47 2005
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 27 Sep 2005 18:57:47 -0400
Subject: [vsipl++] [selgen]
In-Reply-To: <4339CB4F.2080608@codesourcery.com>
References: <4339AE4A.4010007@codesourcery.com> <4339C525.7000707@codesourcery.com> <4339CB4F.2080608@codesourcery.com>
Message-ID: <4339CE6B.70106@codesourcery.com>

Stefan Seefeld wrote:
> Jules Bergmann wrote:
>
>> Looks good. I have one suggestion for indexbool to make it a little
>> more robust, otherwise it looks ready to check in.
>>
>> Also, were the unit tests included in the patch?
>
>
> Oops, that was a new file, and thus it wasn't part of the diff.
> I attach it now for the record.

Yes I do !

Regards,
		Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: selgen.cpp Type: text/x-c++src Size: 3090 bytes Desc: not available URL: From ncm at codesourcery.com Wed Sep 28 00:37:39 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Tue, 27 Sep 2005 17:37:39 -0700 Subject: fix two small FFT bugs Message-ID: <20050928003739.GA20043@codesourcery.com> The patch below has been applied. It fixes the only FFT bug revealed thus far by comprehensive testing, and a bug Jules discovered by inspection (respectively). Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.276 diff -u -p -r1.276 ChangeLog --- ChangeLog 27 Sep 2005 22:44:40 -0000 1.276 +++ ChangeLog 28 Sep 2005 00:33:36 -0000 @@ -1,3 +1,9 @@ +2005-09-27 Nathan Myers + + * src/vsip/impl/signal-fft.hpp: fix compilation/instantiation typo + in 2D by-value FFT. + * src/vsip/impl/fft-core.hpp: fix IPP FFT scaling-request flag. + 2005-09-27 Stefan Seefeld * src/vsip/dense.hpp: Remove redundant header inclusion. 
Index: src/vsip/impl/signal-fft.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fft.hpp,v retrieving revision 1.26 diff -u -p -r1.26 signal-fft.hpp --- src/vsip/impl/signal-fft.hpp 26 Sep 2005 20:11:05 -0000 1.26 +++ src/vsip/impl/signal-fft.hpp 28 Sep 2005 00:33:36 -0000 @@ -241,7 +241,7 @@ empty_view_like(vsip::Domain<1> const& d template View empty_view_like(vsip::Domain<2> const& dom) - { return View(dom[0].size(), dom[1].size(1)); } + { return View(dom[0].size(), dom[1].size()); } template View Index: src/vsip/impl/fft-core.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/fft-core.hpp,v retrieving revision 1.16 diff -u -p -r1.16 fft-core.hpp --- src/vsip/impl/fft-core.hpp 20 Sep 2005 01:29:43 -0000 1.16 +++ src/vsip/impl/fft-core.hpp 28 Sep 2005 00:33:36 -0000 @@ -1250,7 +1250,7 @@ create_ipp_plan( self.doing_scaling_ = (self.scale_ == 1.0/dom.size()); const int flags = self.doing_scaling_ ? (self.is_forward_ ? - IPP_FFT_DIV_FWD_BY_N : IPP_FFT_DIV_FWD_BY_N) : IPP_FFT_NODIV_BY_ANY; + IPP_FFT_DIV_FWD_BY_N : IPP_FFT_DIV_INV_BY_N) : IPP_FFT_NODIV_BY_ANY; typedef typename Time_domain::type time_domain_type; typedef Ipp_DFT< (Dim-doFFTM),time_domain_type> fft_type; From don at codesourcery.com Wed Sep 28 17:38:40 2005 From: don at codesourcery.com (Don McCoy) Date: Wed, 28 Sep 2005 11:38:40 -0600 Subject: [patch] matvec: remaining prod functions Message-ID: <433AD520.6010108@codesourcery.com> The attached implements the last of the functions needed for [math.matvec]. -- Don McCoy CodeSourcery, LLC -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mp.changes URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mp.diff
Type: text/x-patch
Size: 13020 bytes
Desc: not available
URL:

From jules at codesourcery.com  Wed Sep 28 19:07:31 2005
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 28 Sep 2005 15:07:31 -0400
Subject: [patch] enable use of refcount policy for ext_data
Message-ID: <433AE9F3.9080403@codesourcery.com>

Comparing our vector-add performance (using IPP) against IPP directly
showed that we had some overhead for small vector sizes (for vector
sizes less than 1024 elements, our red line falls below IPP's green line).

This overhead appears to be from incrementing and decrementing reference
counts for the blocks being used. This is being done by Ext_data when
getting a pointer to the block's data to pass to IPP.

Ext_data takes a policy template parameter to indicate whether reference
counting should be done, but it was being ignored and reference counting
was always done. This patch adds a mechanism to View_block_storage to
hold a reference according to a reference-counting policy.

With this patch, our performance (blue line) is closer to IPP for small
vector sizes.

Patch applied.

				-- Jules
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rp.diff
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vmul.png
Type: image/png
Size: 7872 bytes
Desc: not available
URL:

From mark at codesourcery.com  Wed Sep 28 19:12:28 2005
From: mark at codesourcery.com (Mark Mitchell)
Date: Wed, 28 Sep 2005 12:12:28 -0700
Subject: [vsipl++] [patch] enable use of refcount policy for ext_data
In-Reply-To: <433AE9F3.9080403@codesourcery.com>
References: <433AE9F3.9080403@codesourcery.com>
Message-ID: <433AEB1C.9000403@codesourcery.com>

Jules Bergmann wrote:

> With this patch, our performance (blue line) is closer to IPP for small
> vector sizes.

Great!
For very small vectors (16 elements), I bet we can eventually beat IPP by (when we're compiling with GCC) using GCC's vector extensions, and thereby avoiding the function-call overhead. -- Mark Mitchell CodeSourcery, LLC mark at codesourcery.com (916) 791-8304 From jules at codesourcery.com Wed Sep 28 22:10:09 2005 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 28 Sep 2005 18:10:09 -0400 Subject: [vsipl++] [patch] matvec: remaining prod functions In-Reply-To: <433AD520.6010108@codesourcery.com> References: <433AD520.6010108@codesourcery.com> Message-ID: <433B14C1.2020508@codesourcery.com> Don, This looks good, please check it in. -- Jules Don McCoy wrote: > The attached implements the last of the functions needed for [math.matvec]. > > From ncm at codesourcery.com Thu Sep 29 02:12:58 2005 From: ncm at codesourcery.com (Nathan (Jasper) Myers) Date: Wed, 28 Sep 2005 19:12:58 -0700 Subject: [PATCH] fix IPP 2D FFT, complete FFT tests Message-ID: <20050929021258.GA24272@codesourcery.com> I have checked in the patch below. It adds (nearly) exhaustive testing on Fft features, and fixes failures in IPP FFT support the testing reveals. It also adds tests for real->complex and complex -> real Fftm. Don't be surprised when fft.cpp takes one or two minutes to compile, now, and spends most of that time producing 40MB of assembly code. Nathan Myers ncm Index: ChangeLog =================================================================== RCS file: /home/cvs/Repository/vpp/ChangeLog,v retrieving revision 1.280 retrieving revision 1.281 diff -u -p -r1.280 -r1.281 --- ChangeLog 28 Sep 2005 19:07:26 -0000 1.280 +++ ChangeLog 29 Sep 2005 02:01:09 -0000 1.281 @@ -1,3 +1,17 @@ +2005-09-28 Nathan Myers + + * src/vsip/fft-core.hpp: Make IPP FFT work for 2D FFT. + Make unimplemented IPP driver functions report failure. + * src/vsip/signal-fft.hpp: Initialize scale member early enough + for IPP create_plan use. + * tests/fftm.cpp: Enable tests for complex->real, real->complex. 
+ * tests/fft.cpp: Add comprehensive testing: + (2D, 3D) x ((cx->cx fwd, inv), ((re->cx, cx->re) x (all axes))) + x (Dense/row-major, Dense/column-major, Fast_block) + x (single,double) x (in-place, by_reference, by_value) + x (unscaled, arbitrary-scaled, scaled by N) + Tested with gcc-3.4/em64t/IPP and gcc-4.0.1/x86/FFTW3. + 2005-09-28 Jules Bergmann * src/vsip/impl/block-traits.hpp (View_block_storage): Index: src/vsip/impl/fft-core.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/fft-core.hpp,v retrieving revision 1.17 retrieving revision 1.18 diff -u -p -r1.17 -r1.18 --- src/vsip/impl/fft-core.hpp 28 Sep 2005 00:34:11 -0000 1.17 +++ src/vsip/impl/fft-core.hpp 29 Sep 2005 02:01:09 -0000 1.18 @@ -997,17 +997,19 @@ struct Ipp_DFT_base } static void - forward2(void* plan, void const* in, void* out, void* buffer, bool f) + forward2( + void* plan, void const* in, unsigned in_row_step, + void* out, unsigned out_row_step, void* buffer, bool f) VSIP_NOTHROW { IppStatus result = (f ? (*forwardFFun2)( - reinterpret_cast(in), sizeof(T), - reinterpret_cast(out), sizeof(T), + reinterpret_cast(in), in_row_step, + reinterpret_cast(out), out_row_step, reinterpret_cast(plan), reinterpret_cast(buffer)) : (*forwardDFun2)( - reinterpret_cast(in), sizeof(T), - reinterpret_cast(out), sizeof(T), + reinterpret_cast(in), in_row_step, + reinterpret_cast(out), out_row_step, reinterpret_cast(plan), reinterpret_cast(buffer))); assert(result == ippStsNoErr); } @@ -1029,17 +1031,19 @@ struct Ipp_DFT_base } static void - inverse2(void* plan, void const* in, void* out, void* buffer, bool f) + inverse2( + void* plan, void const* in, unsigned in_row_step, + void* out, unsigned out_row_step, void* buffer, bool f) VSIP_NOTHROW { IppStatus result = (f ? 
(*inverseFFun2)( - reinterpret_cast(in), sizeof(T), - reinterpret_cast(out), sizeof(T), + reinterpret_cast(in), in_row_step, + reinterpret_cast(out), out_row_step, reinterpret_cast(plan), reinterpret_cast(buffer)) : (*inverseDFun2)( - reinterpret_cast(in), sizeof(T), - reinterpret_cast(out), sizeof(T), + reinterpret_cast(in), in_row_step, + reinterpret_cast(out), out_row_step, reinterpret_cast(plan), reinterpret_cast(buffer))); assert(result == ippStsNoErr); } @@ -1049,21 +1053,21 @@ struct Ipp_DFT_base // template Ipp_DFT_base<>. template inline IppStatus dum(P**, int, int, IppHintAlgorithm) - { return ippStsNoErr; } + { return ippStsErr; } template inline IppStatus dum(P**, int, int, int, IppHintAlgorithm) - { return ippStsNoErr; } + { return ippStsErr; } template inline IppStatus dum(P**, IppiSize, int, IppHintAlgorithm) - { return ippStsNoErr; } + { return ippStsErr; } template inline IppStatus dum(P*) - { return ippStsNoErr; } + { return ippStsErr; } template inline IppStatus dum(P const*, int*) - { return ippStsNoErr; } + { return ippStsErr; } template inline IppStatus dum( T const*, T*, P const*, Ipp8u*) - { return ippStsNoErr; } + { return ippStsErr; } template inline IppStatus dum( T const*, int, T*, int, P const*, Ipp8u*) - { return ippStsNoErr; } + { return ippStsErr; } // Specializations of Ipp_DFT create the mappings from argument @@ -1255,10 +1259,15 @@ create_ipp_plan( typedef typename Time_domain::type time_domain_type; typedef Ipp_DFT< (Dim-doFFTM),time_domain_type> fft_type; - self.plan_from_to_ = ((Dim - doFFTM == 1) ? 
- fft_type::create_plan(sizex, flags, self.use_fft_) : - fft_type::create_plan2(sizex, sizey, flags, self.use_fft_)); - + if (Dim - doFFTM == 1) + self.plan_from_to_ = fft_type::create_plan(sizex, flags, self.use_fft_); + else + { + self.plan_from_to_ = + fft_type::create_plan2(sizex, sizey, flags, self.use_fft_); + self.row_step_ = sizeof(outT) * dom[0].size(); + } + self.p_buffer_ = impl::alloc_align( 16, fft_type::bufsize(self.plan_from_to_, self.use_fft_)); if (self.p_buffer_ == 0) @@ -1373,11 +1382,11 @@ from_to( // IPP doesn't implement 2D double FFT. Spec allows that. #if ! defined(VSIP_IMPL_DOUBLE) if (self.is_forward_) - Ipp_DFT<2,std::complex >::forward( - self.plan_from_to_, in, out, self.p_buffer_, self.use_fft_) ; + Ipp_DFT<2,std::complex >::forward2(self.plan_from_to_, + in, self.row_step_, out, self.row_step_, self.p_buffer_, self.use_fft_) ; else - Ipp_DFT<2,std::complex >::inverse( - self.plan_from_to_, in, out, self.p_buffer_, self.use_fft_); + Ipp_DFT<2,std::complex >::inverse2(self.plan_from_to_, + in, self.row_step_, out, self.row_step_, self.p_buffer_, self.use_fft_); if (self.doing_scaling_) self.scale_ = 1.0; @@ -1421,8 +1430,8 @@ from_to( VSIP_IMPL_THROW(impl::unimplemented( "IPP FFT-2D real->complex not implemented")); #if 0 - Ipp_DFT<1,SCALAR_TYPE>::forward2( - self.plan_from_to_, in, out, self.p_buffer_, self.use_fft_) ; + Ipp_DFT<1,SCALAR_TYPE>::forward2(self.plan_from_to_, + in, self.row_step_, out, self.row_step_, self.p_buffer_, self.use_fft_) ; // unpack in place if (self.doing_scaling_) self.scale_ = 1.0; @@ -1463,8 +1472,8 @@ from_to( #if 0 // pack in place; maybe this must happen in // fft_by_ref, where _in_, just copied into, is writeable. 
- Ipp_DFT<1,SCALAR_TYPE>::inverse2( - self.plan_from_to_, in, out, self.p_buffer_, self.use_fft_) ; + Ipp_DFT<1,SCALAR_TYPE>::inverse2(self.plan_from_to_, + in, self.row_step_, out, self.row_step_, self.p_buffer_, self.use_fft_) ; if (self.doing_scaling_) self.scale_ = 1.0; #endif Index: src/vsip/impl/signal-fft.hpp =================================================================== RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fft.hpp,v retrieving revision 1.27 retrieving revision 1.28 diff -u -p -r1.27 -r1.28 --- src/vsip/impl/signal-fft.hpp 28 Sep 2005 00:34:11 -0000 1.27 +++ src/vsip/impl/signal-fft.hpp 29 Sep 2005 02:01:10 -0000 1.28 @@ -80,6 +80,7 @@ struct Fft_core : impl::Ref_countinput_size_) , out_temp_(this->output_size_) { + core_->scale_ = scale; // IPP needs this. impl::Ext_data raw_in(this->in_temp_); impl::Ext_data raw_out(this->out_temp_); this->core_->create_plan( Index: tests/fftm.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fftm.cpp,v retrieving revision 1.7 retrieving revision 1.8 diff -u -p -r1.7 -r1.8 --- tests/fftm.cpp 28 Sep 2005 04:32:55 -0000 1.7 +++ tests/fftm.cpp 29 Sep 2005 02:01:10 -0000 1.8 @@ -107,6 +107,32 @@ void dft_y( } +template +void dft_y_real( + vsip::Matrix in, + vsip::Matrix, Block2> out) +{ + length_type const xsize = in.size(1); + length_type const ysize = in.size(0); + assert(in.size(0)/2 + 1 == out.size(0)); + assert(in.size(1) == out.size(1)); + typedef long double AccT; + + AccT const phi = -2.0 * M_PI/ysize; + + for (index_type v=0; v < xsize; ++v) + for (index_type w=0; w < ysize / 2 + 1; ++w) + { + vsip::complex sum = vsip::complex(); + for (index_type k=0; k(in(k,v)) * sin_cos(phi*k*w); + out(w,v) = vsip::complex(sum); + } +} + + // Error metric between two vectors. template c and c->r by-value Fft. 
template void -test_real(const int set, const length_type N) +test_real(const length_type N) { - typedef Fftm, col, 0, by_value, 1, alg_space> + typedef Fftm, col, fft_fwd, by_value, 1, alg_space> f_fftm_type; - typedef Fftm, T, col, 0, by_value, 1, alg_space> + typedef Fftm, T, col, fft_inv, by_value, 1, alg_space> i_fftm_type; const length_type N2 = N/2 + 1; - f_fftm_type f_fftm(Domain<1>(N), 1.0); - i_fftm_type i_fftm(Domain<1>(N), 1.0/(N)); + f_fftm_type f_fftm(Domain<2>(Domain<1>(N),Domain<1>(5)), 1.0); + i_fftm_type i_fftm(Domain<2>(Domain<1>(N),Domain<1>(5)), 1.0/N); - assert(f_fftm.input_size().size() == N); - assert(f_fftm.output_size().size() == N2); + assert(f_fftm.input_size().size() == 5*N); + assert(f_fftm.output_size().size() == 5*N2); - assert(i_fftm.input_size().size() == N2); - assert(i_fftm.output_size().size() == N); + assert(i_fftm.input_size().size() == 5*N2); + assert(i_fftm.output_size().size() == 5*N); assert(f_fftm.scale() == 1.0); // can represent exactly assert(i_fftm.scale() > 1.0/(N + 1) && i_fftm.scale() < 1.0/(N - 1)); assert(f_fftm.forward() == true); assert(i_fftm.forward() == false); - Matrix in(N, T()); - Matrix > out(N2); - Matrix > ref(N2); - Matrix inv(N); - Matrix inv2(N); + Matrix in(N, 5, T()); + Matrix > out(N2, 5); + Matrix > ref(N2, 5); + Matrix inv(N, 5); - setup_data(set, in, 3.0); + setup_data_y(in); + dft_y_real(in, ref); out = f_fftm(in); - - if (set == 1) - { - setup_data(3, ref, 3.0); - assert(error_db(ref, out) < -100); - } - if (set == 3) - { - setup_data(1, ref, 3.0 * N); - assert(error_db(ref, out) < -100); - } - - ref = out; inv = i_fftm(out); + assert(error_db(ref, out) < -100); assert(error_db(inv, in) < -100); - - // make sure out has not been scribbled in during the conversion. - assert(error_db(ref,out) < -100); } -#endif int @@ -494,12 +503,10 @@ main() test_by_val_y >(18); test_by_val_y >(256); -# if 0 // Tests for test r->c, c->r. 
test_real(128); test_real(242); test_real(16); -# endif #endif #if defined(VSIP_IMPL_FFT_USE_DOUBLE) @@ -519,12 +526,10 @@ main() test_by_val_y >(18); test_by_val_y >(256); -# if 0 // Tests for test r->c, c->r. test_real(128); test_real(242); test_real(16); -# endif #endif return 0; Index: tests/fft.cpp =================================================================== RCS file: /home/cvs/Repository/vpp/tests/fft.cpp,v retrieving revision 1.7 retrieving revision 1.8 diff -u -p -r1.7 -r1.8 --- tests/fft.cpp 28 Sep 2005 04:32:54 -0000 1.7 +++ tests/fft.cpp 29 Sep 2005 02:01:10 -0000 1.8 @@ -17,6 +17,7 @@ #include #include #include +#include #include "test.hpp" #include "output.hpp" @@ -134,6 +135,52 @@ error_db( return maxsum; } +// Error metric between two Matrices. + +template +double +error_db( + const_Matrix v1, + const_Matrix v2) +{ + double maxsum = -250; + for (unsigned i = 0; i < v1.size(0); ++i) + { + double sum = error_db(v1.row(i), v2.row(i)); + if (sum > maxsum) + maxsum = sum; + } + return maxsum; +} + + + +// Error metric between two Tensors. + +template +double +error_db( + const_Tensor v1, + const_Tensor v2) +{ + double maxsum = -250; + for (unsigned i = 0; i < v1.size(0); ++i) + { + vsip::Domain<1> y(v1.size(1)); + vsip::Domain<1> x(v1.size(2)); + double sum = error_db(v1(i,y,x), v2(i,y,x)); + if (sum > maxsum) + maxsum = sum; + } + return maxsum; +} + // Setup input data for Fft. 
@@ -307,12 +354,573 @@ test_real(const int set, const length_ty
   assert(error_db(ref,out) < -100);
 }
+/////////////////////////////////////////////////////////////////////
+//
+// Comprehensive 2D, 3D test
+//
+
+// Elt: unsigned -> element type
+
+template struct Elt;
+template struct Elt
+{
+  typedef T in_type;
+  typedef std::complex out_type;
+};
+template struct Elt
+{
+  typedef std::complex in_type;
+  typedef std::complex out_type;
+};
+
+template struct Arg;
+
+template
+struct Arg
+{
+  typedef typename vsip::impl::View_of_dim::type> >::type type;
+};
+
+template
+struct Arg
+{
+  typedef typename vsip::impl::View_of_dim::type> >::type type;
+};
+
+template
+struct Arg
+{
+  typedef typename vsip::impl::View_of_dim::type,
+      vsip::impl::Stride_unit_dense
+  > > >::type type;
+};
+
+inline unsigned
+adjust_size(unsigned size, bool is_short, bool is_short_dim, bool no_odds)
+{
+  // no odd sizes along axis for real->complex
+  if ((size & 1) && no_odds && is_short_dim)
+    ++size;
+  return (is_short && is_short_dim) ? size / 2 + 1 : size;
+}
+
+template vsip::Domain make_dom(unsigned*, bool, int, bool);
+template <> vsip::Domain<2> make_dom<2>(
+  unsigned* d, bool is_short, int sd, bool no_odds)
+{
+  return vsip::Domain<2>(
+    vsip::Domain<1>(adjust_size(d[1], is_short, sd == 0, no_odds)),
+    vsip::Domain<1>(adjust_size(d[2], is_short, sd == 1, no_odds)));
+}
+template <> vsip::Domain<3> make_dom<3>(
+  unsigned* d, bool is_short, int sd, bool no_odds)
+{
+  return vsip::Domain<3>(
+    vsip::Domain<1>(adjust_size(d[0], is_short, sd == 0, no_odds)),
+    vsip::Domain<1>(adjust_size(d[1], is_short, sd == 1, no_odds)),
+    vsip::Domain<1>(adjust_size(d[2], is_short, sd == 2, no_odds)));
+}
+
+template
+vsip::Domain<2>
+domain_of(vsip::Matrix const& src)
+{
+  return vsip::Domain<2>(vsip::Domain<1>(src.size(0)),
+                         vsip::Domain<1>(src.size(1)));
+}
+
+
+template
+vsip::Domain<3>
+domain_of(vsip::Tensor const& src)
+{
+  return vsip::Domain<3>(vsip::Domain<1>(src.size(0)),
+                         vsip::Domain<1>(src.size(1)),
+                         vsip::Domain<1>(src.size(2)));
+}
+
+//
+
+template
+vsip::Matrix
+force_copy_init(vsip::Matrix const& src)
+{
+  vsip::Matrix tmp(src.size(0), src.size(1));
+  tmp = src;
+  return tmp;
+}
+
+template
+vsip::Tensor
+force_copy_init(vsip::Tensor const& src)
+{
+  vsip::Tensor tmp(src.size(0), src.size(1), src.size(2));
+  tmp = src;
+  return tmp;
+}
+
+//
+
+template void set_values(T& v1, T& v2)
+{ v1 = T(10); v2 = T(20); }
+
+template void set_values(std::complex& z1, std::complex& z2)
+{
+  z1 = std::complex(T(10), T(10));
+  z2 = std::complex(T(20), T(20));
+}
+
+#if 1
+
+// 2D
+
+template
+void fill_random(
+  vsip::Matrix in, vsip::Rand& rander)
+{
+  in = (rander.randu(in.size(0), in.size(1)) * 20.0) - 10.0;
+}
+
+template
+void fill_random(
+  vsip::Matrix,BlockT> in,
+  vsip::Rand >& rander)
+{
+  in = rander.randu(in.size(0), in.size(1)) * std::complex(20.0) -
+    std::complex(10.0, 10.0);
+}
+
+// 3D
+
+template
+void fill_random(
+  vsip::Tensor& in, vsip::Rand& rander)
+{
+  vsip::Domain<2> sub(vsip::Domain<1>(in.size(1)),
+                      vsip::Domain<1>(in.size(2)));
+  for (unsigned i = in.size(0); i-- > 0;)
+    fill_random(in(i, vsip::Domain<1>(in.size(1)),
+                   vsip::Domain<1>(in.size(2))), rander);
+}
+
+#else
+// debug -- keep this.
+
+// 2D
+
+template
+void fill_random(
+  vsip::Matrix in, vsip::Rand& rander)
+{
+  in = T(0);
+  in.block().put(0, 0, T(1.0));
+}
+
+// 3D
+
+template
+void fill_random(
+  vsip::Tensor& in, vsip::Rand& rander)
+{
+  in = T(0);
+  in.block().put(0, 0, 0, T(1.0));
+}
+
+#endif
+
+//////
+
+// 2D, cc
+
+template
+void
+compute_ref(
+  vsip::Matrix,inBlock> const& in,
+  vsip::Domain<2> const& in_dom,
+  vsip::Matrix,outBlock>& ref,
+  vsip::Domain<2> const& out_dom,
+  int (& /* dum */)[1])
+{
+  vsip::Fftm,std::complex,0,
+    vsip::fft_fwd,vsip::by_reference,1> fftm_across(in_dom, 1.0);
+  fftm_across(in, ref);
+
+  vsip::Fftm,std::complex,1,
+    vsip::fft_fwd,vsip::by_reference,1> fftm_down(out_dom, 1.0);
+  fftm_down(ref);
+}
+
+// 2D, rc
+
+template
+void
+compute_ref(
+  vsip::Matrix const& in,
+  vsip::Domain<2> const& in_dom,
+  vsip::Matrix,outBlock>& ref,
+  vsip::Domain<2> const& out_dom,
+  int (& /* dum */)[1])
+{
+  vsip::Fftm,1,
+    vsip::fft_fwd,vsip::by_reference,1> fftm_across(in_dom, 1.0);
+  fftm_across(in, ref);
+
+  typedef std::complex CT;
+  vsip::Fftm fftm_down(out_dom, 1.0);
+  fftm_down(ref);
+}
+
+// 2D, rc
+
+template
+void
+compute_ref(
+  vsip::Matrix const& in,
+  vsip::Domain<2> const& in_dom,
+  vsip::Matrix,outBlock>& ref,
+  vsip::Domain<2> const& out_dom,
+  int (& /* dum */)[2])
+{
+  vsip::Fftm,0,
+    vsip::fft_fwd,vsip::by_reference,1> fftm_across(in_dom, 1.0);
+  fftm_across(in, ref);
+
+  typedef std::complex CT;
+  vsip::Fftm fftm_down(out_dom, 1.0);
+  fftm_down(ref);
+}
+
+// 3D, cc
+
+template
+void
+compute_ref(
+  vsip::Tensor,inBlock> const& in,
+  vsip::Domain<3> const& in_dom,
+  vsip::Tensor,outBlock>& ref,
+  vsip::Domain<3> const& out_dom,
+  int (& /* dum */)[1])
+{
+  typedef std::complex CT;
+
+  vsip::Fft fft_across(
+    vsip::Domain<2>(in_dom[1], in_dom[2]), 1.0);
+  for (unsigned i = in_dom[0].size(); i-- > 0; )
+    fft_across(in(i, in_dom[1], in_dom[2]),
+               ref(i, out_dom[1], out_dom[2]));
+
+  // note: axis ---v--- here is reverse of notation used otherwise.
+  vsip::Fftm fftm_down(
+    vsip::Domain<2>(in_dom[0], in_dom[1]), 1.0);
+  for (unsigned k = in_dom[2].size(); k-- > 0; )
+    fftm_down(ref(out_dom[0], out_dom[1], k));
+}
+
+// 3D, rc, shorten bottom-top
+
+template
+void
+compute_ref(
+  vsip::Tensor const& in,
+  vsip::Domain<3> const& in_dom,
+  vsip::Tensor,outBlock>& ref,
+  vsip::Domain<3> const& out_dom,
+  int (& /* dum */)[1])
+{
+  typedef std::complex CT;
+
+  // first, planes left-right, squeeze top-bottom
+  vsip::Fft fft_across(
+    vsip::Domain<2>(in_dom[0], in_dom[1]), 1.0);
+  for (unsigned k = in_dom[2].size(); k-- > 0; )
+    fft_across(in(in_dom[0], in_dom[1], k),
+               ref(out_dom[0], out_dom[1], k));
+
+  // planes top-bottom, running left-right
+  // note: axis ---v--- here is reverse of notation used otherwise.
+  vsip::Fftm fftm_down(
+    vsip::Domain<2>(in_dom[1], in_dom[2]), 1.0);
+  for (unsigned i = out_dom[0].size(); i-- > 0; )
+    fftm_down(ref(i, out_dom[1], out_dom[2]));
+}
+
+// 3D, rc, shorten front->back
+
+template
+void
+compute_ref(
+  vsip::Tensor const& in,
+  vsip::Domain<3> const& in_dom,
+  vsip::Tensor,outBlock>& ref,
+  vsip::Domain<3> const& out_dom,
+  int (& /* dum */)[2])
+{
+  typedef std::complex CT;
+
+  // planes top-bottom, squeeze front-back
+  vsip::Fft fft_across(
+    vsip::Domain<2>(in_dom[1], in_dom[2]), 1.0);
+  for (unsigned i = in_dom[0].size(); i-- > 0; )
+    fft_across(in(i, in_dom[1], in_dom[2]),
+               ref(i, out_dom[1], out_dom[2]));
+
+  // planes front-back, running bottom-top
+  // note: axis ---v--- here is reverse of notation used otherwise.
+  vsip::Fftm fftm_down(
+    vsip::Domain<2>(in_dom[0], in_dom[2]), 1.0);
+  for (unsigned j = out_dom[1].size(); j-- > 0; )
+    fftm_down(ref(out_dom[0], j, out_dom[2]));
+}
+
+// 3D, rc, shorten left-right
+
+template
+void
+compute_ref(
+  vsip::Tensor const& in,
+  vsip::Domain<3> const& in_dom,
+  vsip::Tensor,outBlock>& ref,
+  vsip::Domain<3> const& out_dom,
+  int (& /* dum */)[3])
+{
+  typedef std::complex CT;
+
+  // planes top-bottom, squeeze left-right
+  vsip::Fft fft_across(
+    vsip::Domain<2>(in_dom[1], in_dom[2]), 1.0);
+  for (unsigned i = in_dom[0].size(); i-- > 0; )
+    fft_across(in(i, in_dom[1], in_dom[2]),
+               ref(i, out_dom[1], out_dom[2]));
+
+  // planes left-right, running bottom-top
+  // note: axis ---v--- here is reverse of notation used otherwise.
+  vsip::Fftm fftm_down(
+    vsip::Domain<2>(in_dom[0], in_dom[1]), 1.0);
+  for (unsigned k = out_dom[2].size(); k-- > 0; )
+    fftm_down(ref(out_dom[0], out_dom[1], k));
+}
+
+template
+struct Test_fft;
+
+template
+struct Test_fft<2,T1,T2,sD,How>
+{ typedef vsip::Fft type; };
+
+template
+struct Test_fft<3,T1,T2,sD,How>
+{ typedef vsip::Fft type; };
+
+// check_in_place
+//
+
+// there is no in-place for real->complex
+
+template