From jules at codesourcery.com Tue Oct 3 14:24:12 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 03 Oct 2006 10:24:12 -0400 Subject: Missing file for PAS Message-ID: <4522728C.4080800@codesourcery.com> Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: offset.diff URL: From stefan at codesourcery.com Thu Oct 5 01:45:27 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 04 Oct 2006 21:45:27 -0400 Subject: patch: python bindings prototype Message-ID: <452463B7.6020906@codesourcery.com> I just checked in the attached patch, which provides a prototype for python bindings for VSIPL++, using boost.python. This is really more a proof-of-concept than a complete binding, since most functions are still missing. However, I'm able to use it to create vectors and run simple functions, as well as ffts and convolutions on them. Regards, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 -------------- next part -------------- A non-text attachment was scrubbed... Name: scripting.patch Type: text/x-patch Size: 16936 bytes Desc: not available URL: From jules at codesourcery.com Thu Oct 5 04:57:26 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Oct 2006 00:57:26 -0400 Subject: [vsipl++] patch: python bindings prototype In-Reply-To: <452463B7.6020906@codesourcery.com> References: <452463B7.6020906@codesourcery.com> Message-ID: <452490B6.9060304@codesourcery.com> Stefan, How does this behave when python isn't present on the system? It looks like configure will run python even if scripting isn't enabled. Can you gate the python bindings with --enable-scripting? If the user doesn't explicitly '--enable-scripting', then configure shouldn't try to run python. Also, what is the story with shared libraries? That is only for the scripting, right? 
-- Jules Stefan Seefeld wrote: > I just checked in the attached patch, which provides a prototype > for python bindings for VSIPL++, using boost.python. > This is really more a proof-of-concept than a complete binding, > since most functions are still missing. > However, I'm able to use it to create vectors and run simple > functions, as well as ffts and convolutions on them. > > Regards, > Stefan > > > > ------------------------------------------------------------------------ > > +} > Index: configure.ac > =================================================================== > --- configure.ac (revision 150667) > +++ configure.ac (working copy) > @@ -333,6 +333,28 @@ > > AC_SUBST(QMTEST, $with_qmtest) > > +AC_ARG_ENABLE(scripting, > + [ --enable-scripting Specify whether or not to build the python bindings.],, > + [enable_scripting="no"]) > + > +AC_ARG_WITH(python, > + [ --with-python=PATH Specify the Python interpreter.], > + PYTHON="$with_python", > + PYTHON="python" > +) > + > +AC_ARG_WITH(boost-prefix, > + [ --with-boost-prefix=PATH Specify the boost installation prefix.], > + BOOST_PREFIX="$with_boost_prefix", > + BOOST_PREFIX="/usr" > +) > + > +AC_ARG_WITH(boost-version, > + [ --with-boost-version=VERSION Specify the boost version.], > + BOOST_VERSION="$with_boost_version", > + BOOST_VERSION="1.33" > +) > + > # > # Put libs directory int INT_LDFLAGS: > # > @@ -1329,7 +1351,6 @@ > # Copy libg2c into libdir, if requested. > # > if test "x$with_g2c_copy" != "x"; then > - mkdir -p lib > cp $with_g2c_copy lib > curdir=`pwd` > G2C_LDFLAGS="-L$curdir/lib" > @@ -2009,6 +2030,76 @@ > AC_SUBST(INT_CPPFLAGS) > > # > +# Python frontend > +# > +echo "PYTHON $PYTHON" > +if test -n "$PYTHON" -a "$PYTHON" != yes; then Why is this code commented out? Either explain why it is commented out, or delete it. 
> +dnl AC_CHECK_FILE($PYTHON,,AC_MSG_ERROR([Cannot find Python interpreter])) > +dnl else > + AC_PATH_PROG(PYTHON, python2 python, python) > +fi > +PYTHON_INCLUDE=`$PYTHON -c "from distutils import sysconfig; print sysconfig.get_python_inc()"` > +PYTHON_EXT=`$PYTHON -c "from distutils import sysconfig; print sysconfig.get_config_var('SO')"` > + > +case $build in > +CYGWIN*) > + if test `$PYTHON -c "import os; print os.name"` = posix; then > + PYTHON_PREFIX=`$PYTHON -c "import sys; print sys.prefix"` > + PYTHON_VERSION=`$PYTHON -c "import sys; print '%d.%d'%(sys.version_info[[0]],sys.version_info[[1]])"` > + PYTHON_LIBS="-L $PYTHON_PREFIX/lib/python$PYTHON_VERSION/config -lpython$PYTHON_VERSION" This sounds like a FIXME. Let's just document what we do: "Cygwin doesn't have a -lutil, but some versions of distutils tell us to use it anyway. This has been tested for cygwin versions UMPTY-UMP." Then add an issue for checking each library individually, if that is important to fix later. > +dnl Cygwin doesn't have an -lutil, but some versions of distutils tell us to use it anyway. > +dnl It would be better to check for each library it tells us to use with AC_CHECK_LIB, but > +dnl to do that, we need the name of a function in each one, so we'll just hack -lutil out > +dnl of the list. 
> + PYTHON_DEP_LIBS=`$PYTHON -c "from distutils import sysconfig; import re; print re.sub(r'\\s*-lutil', '', sysconfig.get_config_var('LIBS') or '')"` > + else dnl this is 'nt' > + if test "$CXX" = "g++"; then > + CFLAGS="-mno-cygwin $CFLAGS" > + CXXFLAGS="-mno-cygwin $CXXFLAGS" > + LDFLAGS="-mno-cygwin $LDFLAGS" > + PYTHON_PREFIX=`$PYTHON -c "import sys; print sys.prefix"` > + PYTHON_VERSION=`$PYTHON -c "import sys; print '%d%d'%(sys.version_info[[0]],sys.version_info[[1]])"` > + PYTHON_LIBS="-L `cygpath -a $PYTHON_PREFIX`/Libs -lpython$PYTHON_VERSION" > + fi > + PYTHON_INCLUDE=`cygpath -a $PYTHON_INCLUDE` > + PYTHON_DEP_LIBS=`$PYTHON -c "from distutils import sysconfig; print sysconfig.get_config_var('LIBS') or ''"` > + fi > + LDSHARED="$CXX -shared" > + PYTHON_LIBS="$PYTHON_LIBS $PYTHON_DEP_LIBS" > + ;; > +*) > + LDSHARED="$CXX -shared" > + ;; > +esac > + > +PYTHON_LIBS="$PYTHON_LIBS $PYTHON_DEP_LIBS" > + > +AC_SUBST(PYTHON) > +AC_SUBST(PYTHON_CPP, "-I $PYTHON_INCLUDE") > +AC_SUBST(PYTHON_LIBS) > +AC_SUBST(PYTHON_EXT) > + > +AC_SUBST(LDSHARED) > + What's the AC_LANG(C++) for? We should have set it to C++ at the top of configure. > +AC_LANG(C++) > +if test "$enable_scripting" == "yes"; then > + AC_SUBST(enable_scripting, 1) > + if test -n "$with_boost_prefix"; then > + BOOST_CPPFLAGS="-I$with_boost_prefix/include" > + BOOST_LDFLAGS="-L$with_boost_prefix/lib" > + fi > + save_CPPFLAGS=$CPPFLAGS > + CPPFLAGS="$CPPFLAGS $BOOST_CPPFLAGS $PYTHON_CPP" > + AC_CHECK_HEADER([boost/python.hpp], [], > + [AC_MSG_ERROR([boost.python could not be found])]) > + CPPFLAGS="$save_CPPFLAGS" Likewise, why is this dnl? > +dnl save_LIBS=$LIBS > +dnl LIBS="$LIBS $BOOST_LDFLAGS -lboost_wave" > +dnl AC_CHECK_LIB(boost_wave, boost::wave::wave_init) > + AC_SUBST(BOOST_CPPFLAGS) > + AC_SUBST(BOOST_LDFLAGS) > +fi > +# > # Print summary. 
> # > AC_MSG_NOTICE(Summary) > @@ -2032,10 +2123,14 @@ > AC_MSG_RESULT([Complex storage format: interleaved]) > fi > AC_MSG_RESULT([Timer: ${enable_timer}]) > +AC_MSG_RESULT([With Python bindings: ${enable_scripting}]) > > # > # Done. > # > +mkdir -p bin > +mkdir -p lib > +mkdir -p lib/python/site-packages/vsip > mkdir -p src/vsip/impl/sal > mkdir -p src/vsip/impl/ipp > mkdir -p src/vsip/impl/fftw3 -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Thu Oct 5 05:01:05 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Oct 2006 01:01:05 -0400 Subject: Document --with-{obj,lib,exe}-ext configure opts Message-ID: <45249191.8010402@codesourcery.com> Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: doc.diff URL: From jules at codesourcery.com Thu Oct 5 06:32:14 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 05 Oct 2006 02:32:14 -0400 Subject: PAS Updates Message-ID: <4524A6EE.1020708@codesourcery.com> This patch fixes several issues in the 1.2 release when using PAS: - Fix configure to work without a pkg-config file for PAS (tested for both Linux cluster PAS and MCOE PAS), - Fix to install PAS headers, - copy benchmark attempted to measure MPI parallel assign, even when configured for PAS, It also: - adds an early-binding PAS parallel assignment. - dispatches SIMD greater-than routine for less-than expressions. - adds heuristic to configure to determine correct LIBEXT for Mercury systems. - Fixes benchmarks attempting to copy communicators by value. - provides a function (library_config) that returns important ifdefs used to build library. - Makes portions of fft_be test conditional to reduce the compilation effort. 
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pas.diff URL: From stefan at codesourcery.com Thu Oct 5 11:40:43 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Thu, 05 Oct 2006 07:40:43 -0400 Subject: [vsipl++] patch: python bindings prototype In-Reply-To: <452490B6.9060304@codesourcery.com> References: <452463B7.6020906@codesourcery.com> <452490B6.9060304@codesourcery.com> Message-ID: <4524EF3B.1000609@codesourcery.com> Jules Bergmann wrote: > Stefan, > > How does this behave when python isn't present on the system? It looks > like configure will run python even if scripting isn't enabled. > > Can you gate the python bindings with --enable-scripting? If the user > doesn't explicitly '--enable-scripting', then configure shouldn't try to > run python. You are right. The attached patch moves the python checks into the block that is only executed if scripting is enabled. (Patch is checked in.) > Also, what is the story with shared libraries? That is only for the > scripting, right? Yes. Python extension modules are built as DSOs, so I had to add some harness to support that. Nothing else is affected by that. Regards, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From assem at codesourcery.com Mon Oct 9 15:27:03 2006 From: assem at codesourcery.com (Assem Salama) Date: Mon, 09 Oct 2006 11:27:03 -0400 Subject: Lu Solver Message-ID: <452A6A47.60003@codesourcery.com> Everyone, This is the new lu solver that uses the cvsipl backend. Thanks, Assem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: svn.diff.10092006.1.log URL: From jules at codesourcery.com Mon Oct 9 19:51:49 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 09 Oct 2006 15:51:49 -0400 Subject: [vsipl++] Lu Solver In-Reply-To: <452A6A47.60003@codesourcery.com> References: <452A6A47.60003@codesourcery.com> Message-ID: <452AA855.3040801@codesourcery.com> Assem Salama wrote: > Everyone, > This is the new lu solver that uses the cvsipl backend. Assem, Thanks, this looks good overall. I have several comments on specifics below. Did you run the test suite against this yet? -- Jules > ------------------------------------------------------------------------ > > Index: ref_impl/vsipl/solver_lu.hpp First, I would like to use another name besides 'ref_impl' because that implies the directory files are reference-implementation only. Second, I would like to use the subdirectory name 'cvsip' instead of 'vsipl' to avoid confusion between VSIPL (the C API) and VSIPL++. Also, we should use the name 'cvsip' instead of 'cvsipl' for consistency. We use the directory and namespace names 'vsip'. C-VSIPL uses 'vsip_' as a prefix, etc. If we use the name 'cvsipl' it will be a source of confusion. Please make sure all your uses in code of the name vsip/cvsip (i.e. especially in namespaces, class names, and function names, but also in variable names, etc) avoid the 'l'. Using the names "VSIPL", "C-VSIPL", etc. is OK in comments. > =================================================================== > --- ref_impl/vsipl/solver_lu.hpp (revision 0) > +++ ref_impl/vsipl/solver_lu.hpp (revision 0) > @@ -0,0 +1,230 @@ > +/* Copyright (c) 2005, 2006 by CodeSourcery, LLC. All rights reserved. */ Update the copyright: it should be 2006 and it should be "CodeSourcery" instead of "CodeSourcery, LLC". > + > +/** @file vsip/impl/lapack/solver_lu.hpp [1] Update subdirectory name > + @author Assem Salama > + @date 2006-04-13 [2] Update the date. > + @brief VSIPL++ Library: LU linear system solver using lapack. 
[3] using cvsipl. > + > +*/ > + > +#ifndef VSIP_REF_IMPL_SOLVER_LU_HPP > +#define VSIP_REF_IMPL_SOLVER_LU_HPP [4] The ifdef guard should include the path. If we were going to keep this file in 'ref_impl/vsipl' the guard should be: #ifndef VSIP_REF_IMPL_VSIPL_SOLVER_LU_HPP > + > +/*********************************************************************** > + Included Files > +***********************************************************************/ > + > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > + > + > +/*********************************************************************** > + Declarations > +***********************************************************************/ > + > +namespace vsip > +{ > + > +namespace impl > +{ > + > +/// LU factorization implementation class. Common functionality > +/// for lud by-value and by-reference classes. > + > +template > +class Lud_impl > + : Compile_time_assert::valid> [5] We need a Cvsip_traits equivalent of Blas_traits to determine if C-VSIPL supports a value type. > +{ > + // BLAS/LAPACK require complex data to be in interleaved format. > + typedef Layout<2, col2_type, Stride_unit_dense, Cmplx_inter_fmt> data_LP; > + typedef Fast_block<2, T, data_LP> data_block_type; [6] C-VSIPL supports both split and interleaved complex. We should take advantage of that. If the split/interleave type used by C-VSIPL for solve and decompose has to be consistent, we should determine split/interleave based on the default for Dense. If not, let's just pass split/interleave directly through. [7] Although it's not mentioned in the comment, BLAS/LAPACK also require data to be column-major. That shouldn't be necessary for C-VSIPL. If possible, we should pass both row-major and column-major data directly to C-VSIPL and let it sort it out. > + > + // Constructors, copies, assignments, and destructors. 
> +public: > + Lud_impl(length_type) > + VSIP_THROW((std::bad_alloc)); > + Lud_impl(Lud_impl const&) > + VSIP_THROW((std::bad_alloc)); > + > + Lud_impl& operator=(Lud_impl const&) VSIP_NOTHROW; > + ~Lud_impl() VSIP_NOTHROW; > + > + // Accessors. > +public: > + length_type length()const VSIP_NOTHROW { return length_; } > + > + // Solve systems. > +public: > + template > + bool decompose(Matrix) VSIP_NOTHROW; > + > +protected: > + template + typename Block0, > + typename Block1> > + bool impl_solve(const_Matrix, Matrix) > + VSIP_NOTHROW; > + > + // Member data. > +private: > + typedef std::vector > vector_type; > + > + length_type length_; // Order of A. > + vector_type ipiv_; // Additional info on Q > + > + Matrix data_; // Factorized Cholesky matrix (A) > + vsip::ref_impl::cvsipl::CVSIPL_Matrix cvsipl_data_; > + vsip::ref_impl::cvsipl::CVSIPL_Lud cvsipl_lud_; > +}; > + > +} // namespace vsip::impl > + > + > +/*********************************************************************** > + Definitions > +***********************************************************************/ > + > +namespace impl > +{ > + > +template > +Lud_impl::Lud_impl( > + length_type length > + ) > +VSIP_THROW((std::bad_alloc)) > + : length_ (length), > + ipiv_ (length_), > + data_ (length_, length_), > + cvsipl_data_ (data_.block().impl_data(), length_, length_), > + cvsipl_lud_ (length_) > +{ > + assert(length_ > 0); > +} > + > + > + > +template > +Lud_impl::Lud_impl(Lud_impl const& lu) > +VSIP_THROW((std::bad_alloc)) > + : length_ (lu.length_), > + ipiv_ (length_), > + data_ (length_, length_), > + cvsipl_data_ (data_.block().impl_data(), length_, length_), > + cvsipl_lud_ (length_) > +{ > + data_ = lu.data_; > + for (index_type i=0; i + ipiv_[i] = lu.ipiv_[i]; > +} > + > + > + > +template > +Lud_impl::~Lud_impl() > + VSIP_NOTHROW > +{ > +} > + > + > + > +/// Form LU factorization of matrix A > +/// > +/// Requires > +/// A to be a square matrix, either > +/// > +/// FLOPS: > +/// real : UPDATE 
> +/// complex: UPDATE > + > +template > +template > +bool > +Lud_impl::decompose(Matrix m) > + VSIP_NOTHROW > +{ > + assert(m.size(0) == length_ && m.size(1) == length_); > + > + assign_local(data_, m); > + > + Ext_data ext(data_.block()); [8] 'ext' isn't being used. > + > + bool success = cvsipl_lud_.decompose(cvsipl_data_); > + > + > + return success; > +} > + > + > + > +/// Solve Op(A) x = b (where A previously given to decompose) > +/// > +/// Op(A) is > +/// A if tr == mat_ntrans > +/// A^T if tr == mat_trans > +/// A' if tr == mat_herm (valid for T complex only) > +/// > +/// Requires > +/// B to be a (length, P) matrix > +/// X to be a (length, P) matrix > +/// > +/// Effects: > +/// X contains solution to Op(A) X = B > + > +template > +template + typename Block0, > + typename Block1> > +bool > +Lud_impl::impl_solve( > + const_Matrix b, > + Matrix x) > + VSIP_NOTHROW > +{ > + assert(b.size(0) == length_); > + assert(b.size(0) == x.size(0) && b.size(1) == x.size(1)); > + > + vsip_mat_op trans; > + > + Matrix b_int(b.size(0), b.size(1)); > + assign_local(b_int, b); > + > + if (tr == mat_ntrans) > + trans = VSIP_MAT_NTRANS; > + else if (tr == mat_trans) > + trans = VSIP_MAT_TRANS; > + else if (tr == mat_herm) > + { > + assert(Is_complex::value); > + trans = VSIP_MAT_HERM; > + } > + > + { > + Ext_data b_ext(b_int.block()); > + > + vsip::ref_impl::cvsipl::CVSIPL_Matrix > + cvsipl_b_int(b_ext.data(), b.size(0),b.size(1)); > + > + cvsipl_lud_.solve(trans,cvsipl_b_int); > + > + } > + assign_local(x, b_int); > + > + return true; > +} > + > +} // namespace vsip::impl > + > +} // namespace vsip > + > + > +#endif // VSIP_IMPL_LAPACK_SOLVER_LU_HPP [9] Update guard name in comment. > Index: ref_impl/vsipl/cvsipl_support.hpp [10] This file looks like it will have the core traits and function definitions for using the C-VSIPL backend. 
Similar to the ipp.hpp, sal.hpp, and lapack.hpp files we use for those backends, I would recommend calling it 'impl/vsip/cvsip/cvsip.hpp' > =================================================================== > --- ref_impl/vsipl/cvsipl_support.hpp (revision 0) > +++ ref_impl/vsipl/cvsipl_support.hpp (revision 0) [11] All files should have the library header. > @@ -0,0 +1,195 @@ > +#ifndef CVSIPL_SUPPORT_HPP > +#define CVSIPL_SUPPORT_HPP [12] The guard should include the path VSIP_IMPL_CVSIP_CVSIP_SUPPORT_HPP > + > +extern "C" { > +#include > +} > +#include > + > +namespace vsip > +{ > + > +namespace ref_impl > +{ [13] The implementation namespace should always be 'impl', regardless of whether the code is shared or optimization only. > + > +namespace cvsipl > +{ > + [14] Let's add a comment to describe what the class is doing: // Traits class to define the C-VSIPL view type for a given // value type T. > + template > + struct CVSIPL_mview; [15] To follow our class name convention, this should be: 'Cvsip_mview'. 
> + > + template<> struct CVSIPL_mview { typedef vsip_mview_f type; }; > + template<> struct CVSIPL_mview { typedef vsip_mview_d type; }; > + template<> struct CVSIPL_mview > > + { typedef vsip_cmview_f type; }; > + template<> struct CVSIPL_mview > > + { typedef vsip_cmview_d type; }; > + [16] Add comment to this trait too > + template > + struct CVSIPL_block; > + > + template<> struct CVSIPL_block { typedef vsip_block_f type; }; > + template<> struct CVSIPL_block { typedef vsip_block_d type; }; > + template<> struct CVSIPL_block > > + { typedef vsip_cblock_f type; }; > + template<> struct CVSIPL_block > > + { typedef vsip_cblock_d type; }; > + > + > + template > + struct CVSIPL_Lud_object; > + > + template <> struct CVSIPL_Lud_object { typedef vsip_lu_f type; }; > + template <> struct CVSIPL_Lud_object { typedef vsip_lu_d type; }; > + template <> struct CVSIPL_Lud_object > > + { typedef vsip_clu_f type; }; > + template <> struct CVSIPL_Lud_object > > + { typedef vsip_clu_d type; }; [17] First, the 'Cvsip_mview', 'Cvsip_block', and 'Cvsip_lud_object' classes above all look good. However, they represent one approach to creating traits: one trait per class. Another approach is multiple traits per class. Here such a class might look like: template <typename T> struct Cvsip_traits; template <> struct Cvsip_traits<float> { typedef vsip_mview_f mview_type; typedef vsip_block_f block_type; typedef vsip_lu_f lu_solver_type; ... }; The general tradeoffs are: - One trait per class gives you finer-grained control, while multiple traits per class forces you to define all traits even if only one trait is unique. - One trait per class is more verbose to define. In this particular usage, the first tradeoff doesn't buy the one-trait-per-class approach much because all the traits need to be uniquely defined for each value type (i.e. C-VSIPL doesn't share the same types between float and double data structures). 
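For reference, the multiple-traits-per-class option in [17] can be sketched as a small, self-contained example. This is only a sketch under stated assumptions: the `vsip_*` structs are empty stand-ins for the opaque C-VSIPL types (so it compiles without a C-VSIPL installation), `std::complex` is assumed as the complex value type, `Lud_handles` is a hypothetical client added purely for illustration, and the compile-time checks use C++11 `static_assert`.

```cpp
#include <complex>
#include <type_traits>

// Empty stand-ins for the opaque C-VSIPL types. In real code these come from
// the C-VSIPL header; they are declared here only so the sketch compiles alone.
struct vsip_mview_f {};   struct vsip_block_f {};   struct vsip_lu_f {};
struct vsip_mview_d {};   struct vsip_block_d {};   struct vsip_lu_d {};
struct vsip_cmview_f {};  struct vsip_cblock_f {};  struct vsip_clu_f {};
struct vsip_cmview_d {};  struct vsip_cblock_d {};  struct vsip_clu_d {};

namespace vsip { namespace impl { namespace cvsip {

// Traits class collecting every C-VSIPL type used for value type T.
// The primary template is declared but never defined, so asking for an
// unsupported value type fails at compile time.
template <typename T> struct Cvsip_traits;

template <> struct Cvsip_traits<float>
{
  typedef vsip_mview_f mview_type;
  typedef vsip_block_f block_type;
  typedef vsip_lu_f    lu_solver_type;
};

template <> struct Cvsip_traits<double>
{
  typedef vsip_mview_d mview_type;
  typedef vsip_block_d block_type;
  typedef vsip_lu_d    lu_solver_type;
};

template <> struct Cvsip_traits<std::complex<float> >
{
  typedef vsip_cmview_f mview_type;
  typedef vsip_cblock_f block_type;
  typedef vsip_clu_f    lu_solver_type;
};

template <> struct Cvsip_traits<std::complex<double> >
{
  typedef vsip_cmview_d mview_type;
  typedef vsip_cblock_d block_type;
  typedef vsip_clu_d    lu_solver_type;
};

// Hypothetical client: one traits lookup yields all the types it needs.
template <typename T>
struct Lud_handles
{
  typedef Cvsip_traits<T> traits;
  typename traits::mview_type*     view;
  typename traits::lu_solver_type* lu;
};

} } } // namespace vsip::impl::cvsip

// Compile-time checks that the mapping is what we expect.
static_assert(std::is_same<
                vsip::impl::cvsip::Cvsip_traits<float>::mview_type,
                vsip_mview_f>::value,
              "float maps to vsip_mview_f");
static_assert(std::is_same<
                vsip::impl::cvsip::Cvsip_traits<std::complex<float> >::lu_solver_type,
                vsip_clu_f>::value,
              "complex<float> maps to vsip_clu_f");
```

Adding a new per-type entry (say, an FFT object type) then means touching each of the four specializations once, rather than introducing a new trait class with four specializations of its own.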
The approach you've taken is fine, but since there will be more traits to add, I would consider changing over to a multiple-traits-per-class approach. > + > + > +#define CVSIPL_BLOCKBIND(BT, T, ST, VF) \ > +inline BT *vsip_blockbind(T *data, vsip_length N, vsip_memory_hint hint) \ > +{ \ > + return VF((ST*)data, N, hint); \ > +} [18] I would remove the 'vsip_' prefix for these function names. You don't have to worry about name conflicts since the functions are already part of the vsip::impl::cvsip namespace. It just makes using them more verbose than necessary. If you want to maintain verbosity, you can use the 'vsip::impl' namespace but not the 'vsip::impl::cvsip' namespace. Then refer to them as cvsip::blockbind(...) [19] Macro names in the library need to start with VSIP_IMPL_ to avoid conflicts with user code. I.e. this should be VSIP_IMPL_..._BLOCKBIND ... > + > +CVSIPL_LUSOL(vsip_lu_f, vsip_mview_f, vsip_lusol_f) > +CVSIPL_LUSOL(vsip_lu_d, vsip_mview_d, vsip_lusol_d) > +CVSIPL_LUSOL(vsip_clu_f, vsip_cmview_f, vsip_clusol_f) > +CVSIPL_LUSOL(vsip_clu_d, vsip_cmview_d, vsip_clusol_d) [20] If you're done with these macros, it is a good idea to undefine them. #undef VSIP_IMPL_...LUSOL etc. > + > +} // namespace cvsipl > + > +} // namespace ref_impl > + > +} // namespace vsip > + > +#endif // CVSIPL_SUPPORT_HPP > Index: ref_impl/vsipl/cvsipl_lu.hpp > =================================================================== > --- ref_impl/vsipl/cvsipl_lu.hpp (revision 0) > +++ ref_impl/vsipl/cvsipl_lu.hpp (revision 0) > @@ -0,0 +1,72 @@ > +#ifndef CVSIPL_LU_HPP > +#define CVSIPL_LU_HPP > + > +#include > +#include > + > +namespace vsip > +{ > + > +namespace ref_impl > +{ > + > +namespace cvsipl > +{ > + > +template > +class CVSIPL_Lud; > + > +template > +class CVSIPL_Lud [21] This should be 'Non_copyable'. If a copy was made, vsip_lud_destroy(lu_) would get called twice. 
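To illustrate the double-destroy concern in [21], here is a minimal, self-contained sketch of a handle wrapper that cannot be copied. Everything in it is an assumption made for illustration: the `Non_copyable` class below is a local stand-in written in the pre-C++11 style (private, undefined copy operations) rather than the library's own class, and `fake_lu`, `fake_lu_create`, and `fake_lu_destroy` are hypothetical replacements for the C-VSIPL LU object and its create/destroy calls. The compile-time checks need C++11 `<type_traits>`.

```cpp
#include <type_traits>

// Local stand-in for a non-copyable base class: the copy constructor and
// copy assignment are private and never defined, so derived classes cannot
// be copied.
class Non_copyable
{
protected:
  Non_copyable() {}
  ~Non_copyable() {}

private:
  Non_copyable(Non_copyable const&);            // declared, never defined
  Non_copyable& operator=(Non_copyable const&); // declared, never defined
};

// Hypothetical C-style handle API standing in for the C-VSIPL LU object.
struct fake_lu { int n; };

inline fake_lu* fake_lu_create(int n)
{
  fake_lu* lu = new fake_lu;
  lu->n = n;
  return lu;
}

inline void fake_lu_destroy(fake_lu* lu) { delete lu; }

// Wrapper that owns exactly one handle. Because it derives from
// Non_copyable, no second owner can ever exist, so the destructor releases
// the handle exactly once.
class Lud_handle : Non_copyable
{
public:
  explicit Lud_handle(int n) : lu_(fake_lu_create(n)) {}
  ~Lud_handle() { fake_lu_destroy(lu_); }

  int order() const { return lu_->n; }

private:
  fake_lu* lu_;
};

// Any attempt to copy is rejected at compile time.
static_assert(!std::is_copy_constructible<Lud_handle>::value,
              "Lud_handle must not be copy-constructible");
static_assert(!std::is_copy_assignable<Lud_handle>::value,
              "Lud_handle must not be copy-assignable");
```

Without the base class, the compiler-generated copy constructor would copy the raw `lu_` pointer, and both objects' destructors would then destroy the same handle.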
> +{ > + typedef typename CVSIPL_Lud_object::type lud_object_type; > + > + public: > + CVSIPL_Lud(int n); > + ~CVSIPL_Lud(); > + > + int decompose(CVSIPL_Matrix &a); > + int solve(vsip_mat_op op, CVSIPL_Matrix &xb); > + > + private: > + lud_object_type *lu_; > +}; > + > +template > +CVSIPL_Lud::CVSIPL_Lud(int n) > +{ > + vsip_lud_create(n, &lu_); > +} > + > +template > +CVSIPL_Lud::~CVSIPL_Lud() > +{ > + vsip_lud_destroy(lu_); > +} > + > +template > +int CVSIPL_Lud::decompose(CVSIPL_Matrix &a) > +{ > + a.admit(); [22] Here's a case where you want to admit with update true. This should be: a.admit(true); (Assuming you add an update flag to Cvsip_matrix, as suggested below). > + int ret = vsip_lud(lu_, a.get_view()); > + a.release(); [23] If vsip_lud did not modify 'a', this would also be a case where the update flag should be true. Since you don't know what the user will do with 'a' next, it would be bad form to scramble the values. But vsip_lud is allowed to modify 'a' and then later uses those values while solving. This makes me doubt whether it is correct to immediately release 'a' at this point. Can you check the C-VSIPL spec on this? > + return ret; > +} > + > +template > +int CVSIPL_Lud::solve(vsip_mat_op op, CVSIPL_Matrix &xb) > +{ [24] Here update should be true for admit and release. 
> + xb.admit(); > + int ret = vsip_lusol(lu_, op, xb.get_view()); > + xb.release(); > + return ret; > +} > + > + > +} // namespace cvsipl > + > +} // namespace ref_impl > + > +} // namespace vsip > + > +#endif // CVSIPL_LU_HPP > Index: ref_impl/vsipl/cvsipl_matrix.hpp > =================================================================== > --- ref_impl/vsipl/cvsipl_matrix.hpp (revision 0) > +++ ref_impl/vsipl/cvsipl_matrix.hpp (revision 0) > @@ -0,0 +1,81 @@ > +#ifndef CVSIPL_MATRIX_HPP > +#define CVSIPL_MATRIX_HPP > + > +#include > + > +namespace vsip > +{ > + > +namespace ref_impl > +{ > + > +namespace cvsipl > +{ > + > +template > +class CVSIPL_Matrix; > + > +template > +class CVSIPL_Matrix [25] Should be Non_copyable. > +{ > + typedef typename CVSIPL_mview::type mview_type; > + typedef typename CVSIPL_block::type block_type; > + > + public: > + CVSIPL_Matrix(T *block, int m, int n); > + CVSIPL_Matrix(int m, int n); > + ~CVSIPL_Matrix(); > + > + mview_type *get_view() { return mview_; } > + void admit() { vsip_blockadmit(mblock_, false); } > + void release() { vsip_blockrelease(mblock_,false); } [26] Always setting the update flags to false is most definitely wrong. If you don't care about what values you pass to C-VSIPL, and you don't care about what values you get back, why bother with the computation? Always setting the update flags to true would be correct, but it would cause unnecessary data copies in some situations. You should pass update as an argument, with a default value of true. > + > + private: > + mview_type *mview_; > + block_type *mblock_; > + bool local_data_; > + > + > +}; > + > + > +template > +CVSIPL_Matrix::CVSIPL_Matrix(T *block, int m, int n) > +{ > + // block is allocated, just bind to it. 
> + mblock_ = vsip_blockbind(block, m*n, VSIP_MEM_NONE); > + > + // block must be dense > + mview_ = vsip_mbind(mblock_, 0, 1, n, n, m); > + > + local_data_ = false; > +} > + > +template > +CVSIPL_Matrix::CVSIPL_Matrix(int m, int n) > +{ [27] How/where is dimension-ordering handled? The VSIPL++ LU solver object creates a Cvsip_matrix for column-major VSIPL++ matrices. Is Cvsip_matrix implicitly column-major? It would be better to pass dimensionality to Cvsip_matrix explicitly, probably as a template parameter. > + // create block > + vsip_blockcreate(m*n, VSIP_MEM_NONE, &mblock_); > + > + // block must be dense > + mview_ = vsip_mbind(mblock_, 0, 1, n, n, m); > + > + local_data_ = true; > +} > + > +template > +CVSIPL_Matrix::~CVSIPL_Matrix() > +{ > + // destroy everything! > + if(local_data_) vsip_blockdestroy(mblock_); > + > + vsip_mdestroy(mview_); > +} > + > +} // namespace cvsipl > + > +} // namespace ref_impl > + > +} // namespace vsip > + > +#endif // CVSIPL_MATRIX_HPP > Index: impl/solver-lu.hpp > =================================================================== > --- impl/solver-lu.hpp (revision 151073) > +++ impl/solver-lu.hpp (working copy) > @@ -28,6 +28,9 @@ > #ifdef VSIP_IMPL_HAVE_LAPACK > # include > #endif [28] We need to distinguish between the presence of C-VSIPL backends and building the library in reference mode. It's possible to use the C-VSIPL backend with the optimized library. 
[29] This guard should be: #ifdef VSIP_IMPL_HAVE_CVSIP > +#ifdef VSIP_IMPL_HAVE_REF > +# include > +#endif > > > > @@ -62,6 +65,10 @@ > template > struct Choose_lud_impl > { [30] This guard should be: #ifdef VSIP_IMPL_IS_REF_IMPL > +#ifdef VSIP_IMPL_HAVE_REF > + typedef Ref_impl_tag use_type; > + > +#else > typedef typename Choose_solver_impl< > Is_lud_impl_avail, > T, > @@ -71,6 +78,8 @@ > Type_equal::value, > As_type, > As_type >::type use_type; > +#endif > + > }; > > } // namespace impl -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From stefan at codesourcery.com Mon Oct 9 20:15:15 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 09 Oct 2006 16:15:15 -0400 Subject: [vsipl++] Lu Solver In-Reply-To: <452AA855.3040801@codesourcery.com> References: <452A6A47.60003@codesourcery.com> <452AA855.3040801@codesourcery.com> Message-ID: <452AADD3.3040108@codesourcery.com> Jules Bergmann wrote: > Second, I would like to use the subdirectory name 'cvsip' instead of > 'vsipl' to avoid confusion between VSIPL (the C API) and VSIPL++. > Also, we should use the name 'cvsip' instead of 'cvsipl' for > consistency. We use the directory and namespace names 'vsip'. > C-VSIPL uses 'vsip_' as a prefix, etc. If we use the name 'cvsipl' it > will be a source of confusion. Please make sure all your uses in code > of the name vsip/csvip (i.e. especially in namespaces, class names, > and function names, but also in variable names, etc) avoid the 'l'. > Using the names "VSIPL" "C-VSIPL", etc is OK in comments. So, just to be totally clear: the new C-VSIPL bindings will be contained in the directory src/vsip/core/cvsip/, and the associated namespace will be vsip::impl::cvsip, right ? 
Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From jules at codesourcery.com Mon Oct 9 20:21:57 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 09 Oct 2006 16:21:57 -0400 Subject: [vsipl++] Lu Solver In-Reply-To: <452AADD3.3040108@codesourcery.com> References: <452A6A47.60003@codesourcery.com> <452AA855.3040801@codesourcery.com> <452AADD3.3040108@codesourcery.com> Message-ID: <452AAF65.8010308@codesourcery.com> > So, just to be totally clear: the new C-VSIPL bindings will be contained > in the directory src/vsip/core/cvsip/, and the associated namespace will > be vsip::impl::cvsip, right ? Yes, that's right. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Wed Oct 11 01:56:04 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 10 Oct 2006 21:56:04 -0400 Subject: [patch] Fix for matrix-matrix subviews Message-ID: <452C4F34.1030505@codesourcery.com> This patch fixes several problems with distributed matrix-matrix subviews: - the get_local_block() overload for Subblock was not handling the case where the local processor has no subblock. - the Replicated_map and Global_map maps were missing several impl_ functions necessary to translate domains from global to local indices. Distributed matrix-matrix subviews get limited use because they are restricted to cases where the matrix dimensions are not distributed. This patch also includes a test case. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mmsv.diff URL: From jules at codesourcery.com Wed Oct 11 02:33:58 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 10 Oct 2006 22:33:58 -0400 Subject: [patch] Fix mcoe-setup.sh to work with solaris /bin/sh Message-ID: <452C5816.2010407@codesourcery.com> Patch applied. 
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ex.diff URL: From jules at codesourcery.com Thu Oct 12 15:50:30 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 12 Oct 2006 11:50:30 -0400 Subject: [patch] Fix tests to use length_type for a number of processors Message-ID: <452E6446.1000402@codesourcery.com> Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fix-test.diff URL: From jules at codesourcery.com Fri Oct 13 12:19:33 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 13 Oct 2006 08:19:33 -0400 Subject: [Patch] fix for Re: [vsipl++-csl] Questions falling out from 'Scalable SAR' application In-Reply-To: <452F827C.2050609@codesourcery.com> References: <452F301A.9020901@codesourcery.com> <452F827C.2050609@codesourcery.com> Message-ID: <452F8455.2040407@codesourcery.com> This patch moves the vmmul evaluator off of the Loop_fusion_tag to a new Op_expr_tag. This way, if the vmmul evaluator cannot handle a vmmul expression, loop fusion will be a backstop. In the future, other special operations similar to vmmul could be placed on this tag. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: vmmul.diff URL: From assem at codesourcery.com Mon Oct 16 12:24:56 2006 From: assem at codesourcery.com (Assem Salama) Date: Mon, 16 Oct 2006 08:24:56 -0400 Subject: New file reordering Message-ID: <45337A18.3010702@codesourcery.com> I noticed that the GNUmakefiles still have impl stuff and no core and opt stuff. Did anyone test a make install? 
Thanks, Assem From jules at codesourcery.com Mon Oct 16 12:42:41 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 16 Oct 2006 08:42:41 -0400 Subject: [patch] More include updates Message-ID: <45337E41.8010508@codesourcery.com> Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: simd.diff URL: From assem at codesourcery.com Tue Oct 17 17:49:07 2006 From: assem at codesourcery.com (Assem Salama) Date: Tue, 17 Oct 2006 13:49:07 -0400 Subject: LU Message-ID: <45351793.3050506@codesourcery.com> Everyone, This is the patch that adds support for CVSIP Lu backend. I'm still having some trouble with LU test but will fix that shortly. The basic CVSIP stuff should be ok. Thanks, Assem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: svn.diff.10172006.1.log URL: From jules at codesourcery.com Tue Oct 17 18:58:40 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 17 Oct 2006 14:58:40 -0400 Subject: [vsipl++] LU In-Reply-To: <45351793.3050506@codesourcery.com> References: <45351793.3050506@codesourcery.com> Message-ID: <453527E0.2020303@codesourcery.com> Assem, This is looking good. For priority, can you first address the items in core/vsip/vsip.hpp (5-9)? Once those are done, you can check in core/cvsip/cvsip.hpp. That way Stefan can merge in the bindings he needs for FFT. Next, can you address the items in core/cvsip/cvsip_matrix.hpp (11-15), and then check that file in? Let me know if the comments make sense, thanks, -- Jules Assem Salama wrote: > Everyone, > This is the patch that adds support for CVSIP Lu backend. I'm still > having some trouble with LU test but will fix that shortly. The basic > CVSIP stuff should be ok. 
> > Thanks, > Assem > > > ------------------------------------------------------------------------ > > Index: cvsip/solver_lu.hpp > =================================================================== > --- cvsip/solver_lu.hpp (revision 0) > +++ cvsip/solver_lu.hpp (revision 0) > @@ -0,0 +1,232 @@ > +/* Copyright (c) 2005, 2006 by CodeSourcery, LLC. All rights reserved. */ > + > +/** @file vsip/impl/lapack/solver_lu.hpp > + @author Assem Salama > + @date 2006-04-13 > + @brief VSIPL++ Library: LU linear system solver using lapack. > + > +*/ > + > +#ifndef VSIP_REF_IMPL_SOLVER_LU_HPP > +#define VSIP_REF_IMPL_SOLVER_LU_HPP [1*] fix header and guard names > + > +/*********************************************************************** > + Included Files > +***********************************************************************/ > + > +#include > + > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > + > + > +/*********************************************************************** > + Declarations > +***********************************************************************/ > + > +namespace vsip > +{ > + > +namespace impl > +{ > + > +/// LU factorization implementation class. Common functionality > +/// for lud by-value and by-reference classes. > + > +template > +class Lud_impl > + : Compile_time_assert::valid> > +{ > + typedef Layout<2, col2_type, Stride_unit_dense, Cmplx_inter_fmt> data_LP; > + typedef Fast_block<2, T, data_LP> data_block_type; [2] For now: change layout to row2_type. > + > + // Constructors, copies, assignments, and destructors. > +public: > + Lud_impl(length_type) > + VSIP_THROW((std::bad_alloc)); > + Lud_impl(Lud_impl const&) > + VSIP_THROW((std::bad_alloc)); > + > + Lud_impl& operator=(Lud_impl const&) VSIP_NOTHROW; > + ~Lud_impl() VSIP_NOTHROW; > + > + // Accessors. > +public: > + length_type length()const VSIP_NOTHROW { return length_; } > + > + // Solve systems. 
> +public: > + template > + bool decompose(Matrix) VSIP_NOTHROW; > + > +protected: > + template + typename Block0, > + typename Block1> > + bool impl_solve(const_Matrix, Matrix) > + VSIP_NOTHROW; > + > + // Member data. > +private: > + typedef std::vector > vector_type; > + > + length_type length_; // Order of A. > + vector_type ipiv_; // Additional info on Q [3] don't need ipiv_ for C-VSIPL > + > + Matrix data_; // Factorized Cholesky matrix (A) > + cvsip::Cvsip_matrix cvsip_data_; > + cvsip::Cvsip_lud cvsip_lud_; > +}; > + > +} // namespace vsip::impl > + > + > +/*********************************************************************** > + Definitions > +***********************************************************************/ > + > +namespace impl > +{ > + > +template > +Lud_impl::Lud_impl( > + length_type length > + ) > +VSIP_THROW((std::bad_alloc)) > + : length_ (length), > + ipiv_ (length_), > + data_ (length_, length_), > + cvsip_data_ (data_.block().impl_data(), length_, length_), > + cvsip_lud_ (length_) > +{ > + assert(length_ > 0); > +} > + > + > + > +template > +Lud_impl::Lud_impl(Lud_impl const& lu) > +VSIP_THROW((std::bad_alloc)) > + : length_ (lu.length_), > + ipiv_ (length_), > + data_ (length_, length_), > + cvsip_data_ (data_.block().impl_data(), length_, length_), > + cvsip_lud_ (length_) > +{ > + data_ = lu.data_; > + for (index_type i=0; i + ipiv_[i] = lu.ipiv_[i]; > +} > + > + > + > +template > +Lud_impl::~Lud_impl() > + VSIP_NOTHROW > +{ > +} > + > + > + > +/// Form LU factorization of matrix A > +/// > +/// Requires > +/// A to be a square matrix, either > +/// > +/// FLOPS: > +/// real : UPDATE > +/// complex: UPDATE > + > +template > +template > +bool > +Lud_impl::decompose(Matrix m) > + VSIP_NOTHROW > +{ > + assert(m.size(0) == length_ && m.size(1) == length_); [4] See [10] below. Basically, we need to manage admit/release here, not inside the cvsip_lud_ object. 
before the assignment, release the cvsip_data_ matrix so we can overwrite it (update is false, because we don't care about the values, we're going to overwrite them): cvsip_data_.release(false); > + > + assign_local(data_, m); Admit the data before decomposing it: cvsip_data_.admit(true); > + > + bool success = cvsip_lud_.decompose(cvsip_data_); > + > + > + return success; > +} > + > + > + > +/// Solve Op(A) x = b (where A previously given to decompose) > +/// > +/// Op(A) is > +/// A if tr == mat_ntrans > +/// A^T if tr == mat_trans > +/// A' if tr == mat_herm (valid for T complex only) > +/// > +/// Requires > +/// B to be a (length, P) matrix > +/// X to be a (length, P) matrix > +/// > +/// Effects: > +/// X contains solution to Op(A) X = B > + > +template > +template + typename Block0, > + typename Block1> > +bool > +Lud_impl::impl_solve( > + const_Matrix b, > + Matrix x) > + VSIP_NOTHROW > +{ > + typedef typename Block_layout::order_type order_type; > + typedef typename Block_layout::complex_type complex_type; > + typedef Layout<2, order_type, Stride_unit_dense, complex_type> data_LP; > + typedef Fast_block<2, T, data_LP, Local_map> block_type; > + > + assert(b.size(0) == length_); > + assert(b.size(0) == x.size(0) && b.size(1) == x.size(1)); > + > + vsip_mat_op trans; > + > + Matrix b_int(b.size(0), b.size(1)); > + assign_local(b_int, b); > + > + if (tr == mat_ntrans) > + trans = VSIP_MAT_NTRANS; > + else if (tr == mat_trans) > + trans = VSIP_MAT_TRANS; > + else if (tr == mat_herm) > + { > + assert(Is_complex::value); > + trans = VSIP_MAT_HERM; > + } > + > + { > + Ext_data b_ext(b_int.block()); > + > + cvsip::Cvsip_matrix > + cvsip_b_int(b_ext.data(),b_ext.size(0),b_ext.size(1), > + b_ext.stride(0),b_ext.stride(1)); > + > + cvsip_lud_.solve(trans,cvsip_b_int); > + > + } > + assign_local(x, b_int); > + > + return true; > +} > + > +} // namespace vsip::impl > + > +} // namespace vsip > + > + > +#endif // VSIP_IMPL_LAPACK_SOLVER_LU_HPP > Index: cvsip/cvsip.hpp 
> =================================================================== > --- cvsip/cvsip.hpp (revision 0) > +++ cvsip/cvsip.hpp (revision 0) > @@ -0,0 +1,208 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */ > + > +/** @file vsip/core/cvsip/cvsip.hpp > + @author Assem Salama > + @date 2006-10-12 > + @brief VSIPL++ Library: CVSIP support wrappers. > + > +*/ > + > +#ifndef VSIP_CORE_CVSIP_CVSIPL_HPP > +#define VSIP_CORE_CVSIP_CVSIPL_HPP [5] s/CVSIPL/CVSIP/ > + > +extern "C" { > +#include > +} > +#include > + > +namespace vsip > +{ > + > +namespace impl > +{ > + > +namespace cvsip > +{ > + > + template > + struct Cvsip_traits; [6] Add the following body for the general case: { static bool const valid = false; }; This way checks for Cvsip_traits<T>::valid will compile even if the type is not supported. > + [7] I asked Stefan to define VSIP_IMPL_CVSIP_HAVE_FLOAT and ..._HAVE_DOUBLE in configure.ac, depending on whether the C-VSIP library supports float and double (and correspondingly complex<float> and complex<double>). 
Let's use those to guard these traits: #if VSIP_IMPL_CVSIP_HAVE_FLOAT > + template<> struct Cvsip_traits > + { > + typedef vsip_mview_f mview_type; > + typedef vsip_block_f block_type; > + typedef vsip_lu_f lud_object_type; > + static bool const valid = true; > + }; > + #endif #if VSIP_IMPL_CVSIP_HAVE_DOUBLE > + template<> struct Cvsip_traits > + { > + typedef vsip_mview_d mview_type; > + typedef vsip_block_d block_type; > + typedef vsip_lu_d lud_object_type; > + static bool const valid = true; > + }; #endif #if VSIP_IMPL_CVSIP_HAVE_FLOAT > + > + template<> struct Cvsip_traits > > + { > + typedef vsip_cmview_f mview_type; > + typedef vsip_cblock_f block_type; > + typedef vsip_clu_f lud_object_type; > + static bool const valid = true; > + }; #endif #if VSIP_IMPL_CVSIP_HAVE_DOUBLE > + > + template<> struct Cvsip_traits > > + { > + typedef vsip_cmview_d mview_type; > + typedef vsip_cblock_d block_type; > + typedef vsip_clu_d lud_object_type; > + static bool const valid = true; > + }; #endif > + > + [8*] change macro names from CVSIPL_ to VSIP_IMPL_CVSIP_ > +#define CVSIPL_BLOCKBIND(BT, T, ST, VF) \ > +inline BT *blockbind(T *data, vsip_length N, vsip_memory_hint hint) \ > +{ \ > + return VF((ST*)data, N, hint); \ > +} > + > +#define CVSIPL_CBLOCKBIND(BT, T, ST, VF) \ > +inline BT *blockbind(complex *data, \ > + vsip_length N, vsip_memory_hint hint) \ > +{ \ > + return VF((ST*)data, NULL, N, hint); \ > +} > + > +#define CVSIPL_MBIND(VT, BT, VF) \ > +inline VT *mbind(const BT *b, vsip_offset o, \ > + vsip_stride cs, vsip_length cl, vsip_stride rs, vsip_length rl) \ > +{ \ > + return VF(b, o, cs, cl, rs, rl); \ > +} > + > +#define CVSIPL_BLOCKCREATE(BT, VF) \ > +inline void blockcreate(vsip_length N, vsip_memory_hint hint, BT **block) \ > +{ \ > + *block = VF(N,hint); \ > +} > + > +#define CVSIPL_BLOCKDESTROY(BT, VF) \ > +inline void blockdestroy(BT *block) \ > +{ \ > + VF(block); \ > +} > + > +#define CVSIPL_BLOCKADMIT(BT, VF) \ > +inline void blockadmit(BT *block, 
vsip_scalar_bl flag) \ > +{ \ > + VF(block,flag); \ > +} > + > +#define CVSIPL_BLOCKRELEASE(BT, VF) \ > +inline void blockrelease(BT *block, vsip_scalar_bl flag) \ > +{ \ > + VF(block,flag); \ > +} > + > +#define CVSIPL_CBLOCKRELEASE(BT, VF, ST) \ > +inline void blockrelease(BT *block, vsip_scalar_bl flag) \ > +{ \ > + ST *a1,*a2; \ > + VF(block,flag,&a1,&a2); \ > +} > + > +#define CVSIPL_MDESTROY(VT, VF) \ > +inline void mdestroy(VT *view) \ > +{ \ > + VF(view); \ > +} > + > +#define CVSIPL_LUD_CREATE(LT, VF) \ > +inline void lud_create(vsip_length N, LT **lu_obj) \ > +{ \ > + *lu_obj = VF(N); \ > +} > + > +#define CVSIPL_LUD_DESTROY(LT, VF) \ > +inline void lud_destroy(LT *lu_obj) \ > +{ \ > + VF(lu_obj); \ > +} > + > +#define CVSIPL_LUD(LT, VT, VF) \ > +inline int lud(LT *lu_obj, VT *view) \ > +{ \ > + return VF(lu_obj, view); \ > +} > + > +#define CVSIPL_LUSOL(LT, VT, VF) \ > +inline int lusol(LT *lu_obj, vsip_mat_op op, VT *view) \ > +{ \ > + return VF(lu_obj, op, view); \ > +} > +/****************************************************************************** > + * Function declarations > +******************************************************************************/ [9] Similar to the traits above, let's also guard these with VSIP_IMPL_CVSIP_HAVE_FLOAT and ..._HAVE_DOUBLE: > + > +CVSIPL_BLOCKBIND(vsip_block_f, float, vsip_scalar_f, vsip_blockbind_f) > +CVSIPL_BLOCKBIND(vsip_block_d, double, vsip_scalar_d, vsip_blockbind_d) > +CVSIPL_CBLOCKBIND(vsip_cblock_f, float, vsip_scalar_f,vsip_cblockbind_f) > +CVSIPL_CBLOCKBIND(vsip_cblock_d, double, vsip_scalar_d,vsip_cblockbind_d) > + > +CVSIPL_MBIND(vsip_mview_f, vsip_block_f, vsip_mbind_f) > +CVSIPL_MBIND(vsip_mview_d, vsip_block_d, vsip_mbind_d) > +CVSIPL_MBIND(vsip_cmview_f, vsip_cblock_f, vsip_cmbind_f) > +CVSIPL_MBIND(vsip_cmview_d, vsip_cblock_d, vsip_cmbind_d) > + > +CVSIPL_BLOCKCREATE(vsip_block_f, vsip_blockcreate_f) > +CVSIPL_BLOCKCREATE(vsip_block_d, vsip_blockcreate_d) > 
+CVSIPL_BLOCKCREATE(vsip_cblock_f, vsip_cblockcreate_f) > +CVSIPL_BLOCKCREATE(vsip_cblock_d, vsip_cblockcreate_d) > + > +CVSIPL_BLOCKDESTROY(vsip_block_f, vsip_blockdestroy_f) > +CVSIPL_BLOCKDESTROY(vsip_block_d, vsip_blockdestroy_d) > +CVSIPL_BLOCKDESTROY(vsip_cblock_f, vsip_cblockdestroy_f) > +CVSIPL_BLOCKDESTROY(vsip_cblock_d, vsip_cblockdestroy_d) > + > +CVSIPL_BLOCKADMIT(vsip_block_f, vsip_blockadmit_f) > +CVSIPL_BLOCKADMIT(vsip_block_d, vsip_blockadmit_d) > +CVSIPL_BLOCKADMIT(vsip_cblock_f, vsip_cblockadmit_f) > +CVSIPL_BLOCKADMIT(vsip_cblock_d, vsip_cblockadmit_d) > + > +CVSIPL_BLOCKRELEASE(vsip_block_f, vsip_blockrelease_f) > +CVSIPL_BLOCKRELEASE(vsip_block_d, vsip_blockrelease_d) > +CVSIPL_CBLOCKRELEASE(vsip_cblock_f, vsip_cblockrelease_f,vsip_scalar_f) > +CVSIPL_CBLOCKRELEASE(vsip_cblock_d, vsip_cblockrelease_d,vsip_scalar_d) > + > +CVSIPL_MDESTROY(vsip_mview_f, vsip_mdestroy_f) > +CVSIPL_MDESTROY(vsip_mview_d, vsip_mdestroy_d) > +CVSIPL_MDESTROY(vsip_cmview_f, vsip_cmdestroy_f) > +CVSIPL_MDESTROY(vsip_cmview_d, vsip_cmdestroy_d) > + > +CVSIPL_LUD_CREATE(vsip_lu_f, vsip_lud_create_f) > +CVSIPL_LUD_CREATE(vsip_lu_d, vsip_lud_create_d) > +CVSIPL_LUD_CREATE(vsip_clu_f, vsip_clud_create_f) > +CVSIPL_LUD_CREATE(vsip_clu_d, vsip_clud_create_d) > + > +CVSIPL_LUD_DESTROY(vsip_lu_f, vsip_lud_destroy_f) > +CVSIPL_LUD_DESTROY(vsip_lu_d, vsip_lud_destroy_d) > +CVSIPL_LUD_DESTROY(vsip_clu_f, vsip_clud_destroy_f) > +CVSIPL_LUD_DESTROY(vsip_clu_d, vsip_clud_destroy_d) > + > +CVSIPL_LUD(vsip_lu_f, vsip_mview_f, vsip_lud_f) > +CVSIPL_LUD(vsip_lu_d, vsip_mview_d, vsip_lud_d) > +CVSIPL_LUD(vsip_clu_f, vsip_cmview_f, vsip_clud_f) > +CVSIPL_LUD(vsip_clu_d, vsip_cmview_d, vsip_clud_d) > + > +CVSIPL_LUSOL(vsip_lu_f, vsip_mview_f, vsip_lusol_f) > +CVSIPL_LUSOL(vsip_lu_d, vsip_mview_d, vsip_lusol_d) > +CVSIPL_LUSOL(vsip_clu_f, vsip_cmview_f, vsip_clusol_f) > +CVSIPL_LUSOL(vsip_clu_d, vsip_cmview_d, vsip_clusol_d) > + > +} // namespace cvsip > + > +} // namespace impl > + > +} // 
namespace vsip > + > +#endif // VSIP_CORE_CVSIP_CVSIPL_HPP > Index: cvsip/cvsip_lu.hpp > =================================================================== > --- cvsip/cvsip_lu.hpp (revision 0) > +++ cvsip/cvsip_lu.hpp (revision 0) > @@ -0,0 +1,81 @@ > +/* Copyright (c) 2005, 2006 by CodeSourcery, LLC. All rights reserved. */ > + > +/** @file vsip/core/cvsip/cvsip_lu.hpp > + @author Assem Salama > + @date 2006-10-12 > + @brief VSIPL++ Library: CVSIP wrapper for LU object > + > +*/ > + > +#ifndef VSIP_CORE_CVSIP_CVSIP_LU_HPP > +#define VSIP_CORE_CVSIP_CVSIP_LU_HPP > + > +#include > +#include > + > +namespace vsip > +{ > + > +namespace impl > +{ > + > +namespace cvsip > +{ > + > +template > +class Cvsip_lud; > + > +template > +class Cvsip_lud : Non_copyable > +{ > + typedef typename Cvsip_traits::lud_object_type lud_object_type; > + > + public: > + Cvsip_lud(int n); > + ~Cvsip_lud(); > + > + int decompose(Cvsip_matrix &a); > + int solve(vsip_mat_op op, Cvsip_matrix &xb); > + > + private: > + lud_object_type *lu_; > +}; > + > +template > +Cvsip_lud::Cvsip_lud(int n) > +{ > + lud_create(n, &lu_); > +} > + > +template > +Cvsip_lud::~Cvsip_lud() > +{ > + lud_destroy(lu_); > +} > + > +template > +int Cvsip_lud::decompose(Cvsip_matrix &a) > +{ > + a.admit(true); > + int ret = lud(lu_, a.get_view()); > + a.release(true); [10] According to the C-VSIPL spec, the decomposition is allowed to overwrite 'a'. We can't modify 'a' "as long as the factorization is required". This includes releasing it. To handle this, let's move the admit/release out of Cvsip_lud::decompose and up into Lud_impl::decompose (see [4]). 
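
The ownership rule in [4] and [10] can be sketched with a toy model. This is only an illustration under stated assumptions: Bound_block and decompose_sketch are hypothetical stand-ins, not VSIPL++ or C-VSIPL names, the block is assumed to start out released, and no real factorization is performed. It shows the intended ordering: release before overwriting the user memory, admit before the library touches it, and no release afterwards while the factorization is still needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a user-bound C-VSIPL block. It only tracks
// whether the library (admitted) or the caller (released) owns the memory.
class Bound_block
{
public:
  explicit Bound_block(std::vector<double>& storage)
    : storage_(storage), admitted_(false) {}   // assumption: starts released

  void admit(bool /*update*/)   { admitted_ = true; }
  void release(bool /*update*/) { admitted_ = false; }
  bool admitted() const         { return admitted_; }

  // The caller may only touch the memory while the block is released.
  std::vector<double>& user_data()
  {
    assert(!admitted_);
    return storage_;
  }

private:
  std::vector<double>& storage_;
  bool                 admitted_;
};

// Mirrors the order suggested for Lud_impl::decompose in [4].
inline bool decompose_sketch(Bound_block& blk, std::vector<double> const& a)
{
  blk.release(false);                    // old values are not needed
  std::vector<double>& dst = blk.user_data();
  assert(dst.size() == a.size());
  std::copy(a.begin(), a.end(), dst.begin());
  blk.admit(true);                       // new values must be visible
  // A real backend would factorize here. The block stays admitted,
  // because the factorization may keep referring to this memory.
  return blk.admitted();
}
```

With this split, the LU wrapper itself never calls admit or release on the matrix being factorized; only the owner of the data does.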
> + return ret; > +} > + > +template > +int Cvsip_lud::solve(vsip_mat_op op, Cvsip_matrix &xb) > +{ > + xb.admit(true); > + int ret = lusol(lu_, op, xb.get_view()); > + xb.release(true); > + return ret; > +} > + > + > +} // namespace cvsip > + > +} // namespace impl > + > +} // namespace vsip > + > +#endif // VSIP_CORE_CVSIP_CVSIP_LU_HPP > Index: cvsip/cvsip_matrix.hpp > =================================================================== > --- cvsip/cvsip_matrix.hpp (revision 0) > +++ cvsip/cvsip_matrix.hpp (revision 0) > @@ -0,0 +1,115 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */ > + > +/** @file vsip/core/cvsip/cvsip_matrix.hpp > + @author Assem Salama > + @date 2006-10-12 > + @brief VSIPL++ Library: CVSIP wrapper for Matrix views. > + > +*/ > + > +#ifndef VSIP_CORE_CVSIP_CVSIP_MATRIX_HPP > +#define VSIP_CORE_CVSIP_CVSIP_MATRIX_HPP > + > +#include > + > +namespace vsip > +{ > + > +namespace impl > +{ > + > +namespace cvsip > +{ > + > +template > +class Cvsip_matrix; > + > +template > +class Cvsip_matrix : Non_copyable > +{ > + typedef typename Cvsip_traits::mview_type mview_type; > + typedef typename Cvsip_traits::block_type block_type; > + > + public: > + Cvsip_matrix(T *block, int m, int n, int s1, int s2); > + Cvsip_matrix(int m, int n, int s1, int s2); > + Cvsip_matrix(T *block, int m, int n); > + Cvsip_matrix(int m, int n); > + ~Cvsip_matrix(); > + > + mview_type *get_view() { return mview_; } > + void admit(bool flag) { blockadmit(mblock_, flag); } > + void release(bool flag) { blockrelease(mblock_, flag); } > + > + private: > + mview_type *mview_; > + block_type *mblock_; [11] Our coding standard prefers 'mview_type*' to 'mview_type *'. > + bool local_data_; > + > + > +}; > + > +template > +Cvsip_matrix::Cvsip_matrix(T *block, int m, int n, int s1, int s2) > +{ This interface is OK. You call it with values from a Ext_data object. > + // block is allocated, just bind to it. 
> + mblock_ = blockbind(block, m*n, VSIP_MEM_NONE); [12] Unfortunately, size != m*n if the block is not dense. I think the right size should be (n-1)*s1 + (m-1)*s2 + 1 > + > + // block must be dense > + mview_ = mbind(mblock_, 0, s1, n, s2, m); > + > + local_data_ = false; > +} > + > +template > +Cvsip_matrix::Cvsip_matrix(int m, int n, int s1, int s2) [13] Does this interface get used? I don't think it is a good one because it requires the user to specify the strides. There are only two correct values (1, n) and (m, 1), but many wrong ones. > +{ > + // create block > + blockcreate(m*n, VSIP_MEM_NONE, &mblock_); > + > + // block must be dense > + mview_ = mbind(mblock_, 0, s1, n, s2, m); > + > + local_data_ = true; > +} It would be better to do something like this: template template Cvsip_matrix::Cvsip_matrix(T *block, int m, int n, OrderT const& = row2_type()) { // create block blockcreate(m*n, VSIP_MEM_NONE, &mblock_); // block must be dense if (Type_equal::value) mview_ = mbind(mblock_, 0, m, n, 1, m); else mview_ = mbind(mblock_, 0, 1, n, n, m); local_data_ = true; } > + > +template > +Cvsip_matrix::Cvsip_matrix(T *block, int m, int n) [14] This could also take an OrderT template parameter > +{ > + // block is allocated, just bind to it. > + mblock_ = blockbind(block, m*n, VSIP_MEM_NONE); > + > + // block must be dense > + mview_ = mbind(mblock_, 0, 1, n, n, m); > + > + local_data_ = false; > +} > + > +template > +Cvsip_matrix::Cvsip_matrix(int m, int n) [15] this would go away > +{ > + // create block > + blockcreate(m*n, VSIP_MEM_NONE, &mblock_); > + > + // block must be dense > + mview_ = mbind(mblock_, 0, 1, n, n, m); > + > + local_data_ = true; > +} > + > +template > +Cvsip_matrix::~Cvsip_matrix() > +{ > + // destroy everything! 
> + if(local_data_) blockdestroy(mblock_); > + > + mdestroy(mview_); > +} > + > +} // namespace cvsip > + > +} // namespace impl > + > +} // namespace vsip > + > +#endif // VSIP_CORE_CVSIP_CVSIP_MATRIX_HPP > Index: solver/lu.hpp > =================================================================== > --- solver/lu.hpp (revision 151692) > +++ solver/lu.hpp (working copy) > @@ -28,6 +28,12 @@ > #ifdef VSIP_IMPL_HAVE_LAPACK > # include > #endif > +#ifdef VSIP_IMPL_HAVE_LAPACK > +# include > +#endif > +#ifdef VSIP_IMPL_HAVE_CVSIP > +# include > +#endif > > > > @@ -62,6 +68,10 @@ > template > struct Choose_lud_impl > { [16*] This guard should be (HAVE_CVSIP does not imply IS_REF_IMPL, we want to be able to use the C-VSIP backend with the optimized implementation): #ifdef VSIP_IMPL_IS_REF_IMPL > +#ifdef VSIP_IMPL_HAVE_CVSIP > + typedef Cvsip_tag use_type; > + typedef Cvsip_tag type; > +#else > typedef typename Choose_solver_impl< > Is_lud_impl_avail, > T, > @@ -71,6 +81,7 @@ > Type_equal::value, > As_type, > As_type >::type use_type; > +#endif > }; > > } // namespace impl > Index: solver/common.hpp > =================================================================== > --- solver/common.hpp (revision 151692) > +++ solver/common.hpp (working copy) > @@ -71,6 +71,7 @@ > > // Implementation tags > struct Lapack_tag; > +struct Cvsip_tag; > > // Error tags > struct Error_no_solver_for_this_type; -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From assem at codesourcery.com Wed Oct 18 13:35:44 2006 From: assem at codesourcery.com (Assem Salama) Date: Wed, 18 Oct 2006 09:35:44 -0400 Subject: [vsipl++] LU In-Reply-To: <453527E0.2020303@codesourcery.com> References: <45351793.3050506@codesourcery.com> <453527E0.2020303@codesourcery.com> Message-ID: <45362DB0.80109@codesourcery.com> > [13] Does this interface get used? I don't think it is a good one > because it requires the user to specify the strides. 
There are only > two correct values (1, n) and (m, 1), but many wrong ones. Why are there only two possible strides here? What if someone has an arbitrary column stride, for example 3 for an RGB image? From jules at codesourcery.com Wed Oct 18 13:55:51 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 18 Oct 2006 09:55:51 -0400 Subject: [vsipl++] LU In-Reply-To: <45362DB0.80109@codesourcery.com> References: <45351793.3050506@codesourcery.com> <453527E0.2020303@codesourcery.com> <45362DB0.80109@codesourcery.com> Message-ID: <45363267.4010804@codesourcery.com> Assem Salama wrote: > >> [13] Does this interface get used? I don't think it is a good one >> because it requires the user to specify the strides. There are only >> two correct values (1, n) and (m, 1), but many wrong ones. > Why are there only two possible strides here? What if someone has an > arbitrary column stride, for example 3 for an RGB image? > For a dense, 2-dim block (which is what is being constructed by blockcreate()), there are only two sets of valid strides, corresponding to row-major and column-major. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From assem at codesourcery.com Wed Oct 18 13:59:47 2006 From: assem at codesourcery.com (Assem Salama) Date: Wed, 18 Oct 2006 09:59:47 -0400 Subject: [vsipl++] LU In-Reply-To: <45363267.4010804@codesourcery.com> References: <45351793.3050506@codesourcery.com> <453527E0.2020303@codesourcery.com> <45362DB0.80109@codesourcery.com> <45363267.4010804@codesourcery.com> Message-ID: <45363353.1050908@codesourcery.com> Oh, I see. I didn't realize this was the constructor calling blockcreate! Thanks for pointing that out. --Assem Jules Bergmann wrote: > Assem Salama wrote: >> >>> [13] Does this interface get used? I don't think it is a good one >>> because it requires the user to specify the strides. There are only >>> two correct values (1, n) and (m, 1), but many wrong ones. 
>> Why are there only two possible strides here? What if someone has an >> arbitrary column stride, for example 3 for an RGB image? >> > > For a dense, 2-dim block (which is what is being constructed by > blockcreate()), there are only two sets of valid strides, > corresponding to row-major and column-major. > > -- Jules > From jules at codesourcery.com Thu Oct 19 02:12:24 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 18 Oct 2006 22:12:24 -0400 Subject: [patch] Parallel assignment algorithm and PAS fixes. Message-ID: <4536DF08.2000307@codesourcery.com> Fix dispatch to use the non-early-binding version of PAS Par_assign for normal parallel assignments (i.e. non-Setup_assign). Fix a bug in the non-early-binding PAS Par_assign where PAS_WAIT was not always being set, creating a race condition (thanks to John Watson for catching this!). Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pas.diff URL: From jules at codesourcery.com Thu Oct 19 15:47:17 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 19 Oct 2006 11:47:17 -0400 Subject: [patch] Add dispatch to SAL vthresx and vthrx routines Message-ID: <45379E05.3010408@codesourcery.com> Plus fix a few headers and guards for files moved in the reorg. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Thu Oct 19 15:48:09 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 19 Oct 2006 11:48:09 -0400 Subject: [vsipl++] [patch] Add dispatch to SAL vthresx and vthrx routines In-Reply-To: <45379E05.3010408@codesourcery.com> References: <45379E05.3010408@codesourcery.com> Message-ID: <45379E39.3070703@codesourcery.com> Oops! Patch attached. Jules Bergmann wrote: > Plus fix a few headers and guards for files moved in the reorg. 
> > -- Jules > -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: vthresh.diff URL: From jules at codesourcery.com Mon Oct 23 19:02:26 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 23 Oct 2006 15:02:26 -0400 Subject: [patch] Fix sarsim to compile Message-ID: <453D11C2.70509@codesourcery.com> This patch fixes sarsim to compile with the current library. It also adds support for parallel sarsim. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sarsim.diff URL: From assem at codesourcery.com Wed Oct 25 14:29:37 2006 From: assem at codesourcery.com (Assem Salama) Date: Wed, 25 Oct 2006 10:29:37 -0400 Subject: lu solver Message-ID: <453F74D1.5030805@codesourcery.com> Everyone, This is the LU solver using CVSIP. This is still using Cvsip_matrix and Cvsip_lu. Thanks, Assem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: svn.diff.10252006.1.log URL: From stefan at codesourcery.com Wed Oct 25 15:00:40 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 25 Oct 2006 11:00:40 -0400 Subject: [vsipl++] lu solver In-Reply-To: <453F74D1.5030805@codesourcery.com> References: <453F74D1.5030805@codesourcery.com> Message-ID: <453F7C18.9050607@codesourcery.com> Assem, it would be best to generate patches from the top-level source tree, as opposed to some subdirectory therein. That makes it clear which files are being talked about. (This gets particularly confusing if the @file key on top of a given file is wrong, as below. :-) ) Assem Salama wrote: > Everyone, > This is the LU solver using CVSIP. This is still using Cvsip_matrix > and Cvsip_lu. 
> > Thanks, > Assem > >------------------------------------------------------------------------ > >Index: solver_lu.hpp >=================================================================== >--- solver_lu.hpp (revision 151855) >+++ solver_lu.hpp (working copy) >@@ -3,12 +3,12 @@ > /** @file vsip/impl/lapack/solver_lu.hpp > > I suspect that should be vsip/core/cvsip/solver_lu.hpp, right ? > @author Assem Salama > @date 2006-04-13 > > Some nit-picking: If we insist on having a @date key in the files, they should contain some real value, not just a copy of a file this originally was a copy of. (Note that I'm indeed not sure about the need for @date, nor most of the other keys. But that's for another discussion...) >- @brief VSIPL++ Library: LU linear system solver using lapack. >+ @brief VSIPL++ Library: LU linear system solver using cvsip. > > */ > >-#ifndef VSIP_REF_IMPL_SOLVER_LU_HPP >-#define VSIP_REF_IMPL_SOLVER_LU_HPP >+#ifndef VSIP_CORE_CVSIP_SOLVER_LU_HPP >+#define VSIP_CORE_CVSIP_SOLVER_LU_HPP > > /*********************************************************************** > Included Files >@@ -25,6 +25,7 @@ > #include > #include > >+#include > > This file shouldn't depend on vsip_csl code. > > > /*********************************************************************** >@@ -78,7 +79,6 @@ > typedef std::vector > vector_type; > > length_type length_; // Order of A. 
>- vector_type ipiv_; // Additional info on Q > > Matrix data_; // Factorized Cholesky matrix (A) > cvsip::Cvsip_matrix cvsip_data_; >@@ -101,9 +101,8 @@ > ) > VSIP_THROW((std::bad_alloc)) > : length_ (length), >- ipiv_ (length_), > data_ (length_, length_), >- cvsip_data_ (data_.block().impl_data(), length_, length_), >+ cvsip_data_ (data_.block().impl_data(), length_, length_, col2_type()), > cvsip_lud_ (length_) > { > assert(length_ > 0); >@@ -115,14 +114,11 @@ > Lud_impl::Lud_impl(Lud_impl const& lu) > VSIP_THROW((std::bad_alloc)) > : length_ (lu.length_), >- ipiv_ (length_), > data_ (length_, length_), > cvsip_data_ (data_.block().impl_data(), length_, length_), > cvsip_lud_ (length_) > { > data_ = lu.data_; >- for (index_type i=0; i- ipiv_[i] = lu.ipiv_[i]; > } > > >@@ -143,6 +139,7 @@ > /// FLOPS: > /// real : UPDATE > /// complex: UPDATE >+// > > template > template >@@ -152,16 +149,15 @@ > { > assert(m.size(0) == length_ && m.size(1) == length_); > >+ cvsip_data_.release(false); > assign_local(data_, m); >+ cvsip_data_.admit(true); > > bool success = cvsip_lud_.decompose(cvsip_data_); > >- > return success; > } > >- >- > /// Solve Op(A) x = b (where A previously given to decompose) > /// > /// Op(A) is >@@ -201,12 +197,13 @@ > > if (tr == mat_ntrans) > trans = VSIP_MAT_NTRANS; >- else if (tr == mat_trans) >+ else if (tr == mat_trans && ! Is_complex::value) > trans = VSIP_MAT_TRANS; >- else if (tr == mat_herm) >- { >- assert(Is_complex::value); >+ else if (tr == mat_herm && Is_complex::value) > trans = VSIP_MAT_HERM; >+ else { >+ VSIP_IMPL_THROW(unimplemented( >+ "Lud_impl cvsip solver doesn't support this transformation")); > } > > > Since the above exception would percolate up to the public API, I don't think "Lud_impl cvsip solver" is the best name to give to the actual code. May be "cvsip LU solver backend" ? 
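
As a rough sketch of the mapping discussed above, the translation from the requested operation to the backend operation can be isolated in one small function that also carries the user-facing message. Everything here is a stand-in chosen for illustration: the two enums only mirror the names used in the patch, backend_supports and to_backend_op are hypothetical helpers, and std::runtime_error replaces the library's own VSIP_IMPL_THROW(unimplemented(...)) mechanism.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for the VSIPL++ and C-VSIPL enumerations (assumed names).
enum mat_op_type { mat_ntrans, mat_trans, mat_herm };
enum backend_op  { BACKEND_NTRANS, BACKEND_TRANS, BACKEND_HERM };

// True if the (operation, value type) pair is one the patch accepts:
// no-transpose always, transpose for real T, Hermitian for complex T.
inline bool backend_supports(mat_op_type tr, bool is_complex)
{
  return tr == mat_ntrans
      || (tr == mat_trans && !is_complex)
      || (tr == mat_herm  &&  is_complex);
}

// Translate the operation, or report the failure in terms a user of the
// public API would recognize, rather than naming an internal class.
inline backend_op to_backend_op(mat_op_type tr, bool is_complex)
{
  if (!backend_supports(tr, is_complex))
    throw std::runtime_error(
      "cvsip LU solver backend doesn't support this transformation");
  if (tr == mat_ntrans) return BACKEND_NTRANS;
  if (tr == mat_trans)  return BACKEND_TRANS;
  return BACKEND_HERM;
}
```

Keeping the predicate separate lets a caller test for support up front instead of relying on the exception.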
> { >@@ -215,7 +212,6 @@ > cvsip::Cvsip_matrix > cvsip_b_int(b_ext.data(),b_ext.size(0),b_ext.size(1), > b_ext.stride(0),b_ext.stride(1)); >- > cvsip_lud_.solve(trans,cvsip_b_int); > > } >@@ -229,4 +225,4 @@ > } // namespace vsip > > >-#endif // VSIP_IMPL_LAPACK_SOLVER_LU_HPP >+#endif // VSIP_CORE_CVSIP_SOLVER_LU_HPP >Index: cvsip.hpp >=================================================================== >--- cvsip.hpp (revision 151857) >+++ cvsip.hpp (working copy) >@@ -147,6 +147,8 @@ > { \ > return VF(lu_obj, op, view); \ > } >+ >+ > /****************************************************************************** > * Function declarations > ******************************************************************************/ > > Please make sure only real changes make their way into a patch. >Index: cvsip_lu.hpp >=================================================================== >--- cvsip_lu.hpp (revision 151855) >+++ cvsip_lu.hpp (working copy) >@@ -31,8 +31,8 @@ > typedef typename Cvsip_traits::lud_object_type lud_object_type; > > public: >- Cvsip_lud(int n); >- ~Cvsip_lud(); >+ Cvsip_lud(int n); >+ ~Cvsip_lud(); > > The original 'Cvsip_lud' declarator should be fine. 
> > int decompose(Cvsip_matrix &a); > int solve(vsip_mat_op op, Cvsip_matrix &xb); >@@ -56,10 +56,9 @@ > template > int Cvsip_lud::decompose(Cvsip_matrix &a) > { >- a.admit(false); >+ > int ret = lud(lu_, a.get_view()); >- a.release(true); >- return ret; >+ return !ret; > } > > template >@@ -67,7 +66,6 @@ > { > xb.admit(true); > int ret = lusol(lu_, op, xb.get_view()); >- printf("RET: %d\n", ret); > xb.release(true); > return ret; > } >Index: cvsip_matrix.hpp >=================================================================== >--- cvsip_matrix.hpp (revision 151855) >+++ cvsip_matrix.hpp (working copy) >@@ -32,9 +32,10 @@ > > public: > Cvsip_matrix(T *block, int m, int n, int s1, int s2); >- Cvsip_matrix(int m, int n, int s1, int s2); >- Cvsip_matrix(T *block, int m, int n); >- Cvsip_matrix(int m, int n); >+ template >+ Cvsip_matrix(int m, int n, OrderT const&); >+ template >+ Cvsip_matrix(T *block, int m, int n, OrderT const&); > ~Cvsip_matrix(); > > mview_type *get_view() { return mview_; } >@@ -42,8 +43,8 @@ > void release(bool flag) { blockrelease(mblock_, flag); } > > private: >- mview_type *mview_; >- block_type *mblock_; >+ mview_type* mview_; >+ block_type* mblock_; > bool local_data_; > > >@@ -53,51 +54,47 @@ > Cvsip_matrix::Cvsip_matrix(T *block, int m, int n, int s1, int s2) > { > // block is allocated, just bind to it. 
>- mblock_ = blockbind(block, m*n, VSIP_MEM_NONE); >+ mblock_ = blockbind(block, (n-1)*s2 + (m-1)*s1 + 1, VSIP_MEM_NONE); > >- // block must be dense >- mview_ = mbind(mblock_, 0, s1, n, s2, m); >+ mview_ = mbind(mblock_, 0, s1, m, s2, n); > > local_data_ = false; > } > > template >-Cvsip_matrix::Cvsip_matrix(int m, int n, int s1, int s2) >+template >+Cvsip_matrix::Cvsip_matrix(int m, int n, OrderT const& = row2_type()) > { > // create block > blockcreate(m*n, VSIP_MEM_NONE, &mblock_); > > // block must be dense >- mview_ = mbind(mblock_, 0, s1, n, s2, m); >+ if(Type_equal::value) >+ mview_ = mbind(mblock_, 0, m, n, 1, m); >+ else >+ mview_ = mbind(mblock_, 0, 1, n, n, m); > > local_data_ = true; > } > > template >-Cvsip_matrix::Cvsip_matrix(T *block, int m, int n) >+template >+Cvsip_matrix::Cvsip_matrix(T *block, int m, int n, >+ OrderT const& = row2_type()) > { > // block is allocated, just bind to it. > mblock_ = blockbind(block, m*n, VSIP_MEM_NONE); > > // block must be dense >- mview_ = mbind(mblock_, 0, 1, n, n, m); >+ if(Type_equal::value) >+ mview_ = mbind(mblock_, 0, m, n, 1, m); >+ else >+ mview_ = mbind(mblock_, 0, 1, n, n, m); > > local_data_ = false; > } > > template >-Cvsip_matrix::Cvsip_matrix(int m, int n) >-{ >- // create block >- blockcreate(m*n, VSIP_MEM_NONE, &mblock_); >- >- // block must be dense >- mview_ = mbind(mblock_, 0, 1, n, n, m); >- >- local_data_ = true; >-} >- >-template > Cvsip_matrix::~Cvsip_matrix() > { > // destroy everything! >Index: solver-lu.cpp > > As we just relocated and renamed most files, please make sure to follow the naming conventions. Use '_' instead of '-', and use lu.hpp, instead of solver_hpp (and likewise for the cpp). 
>=================================================================== >--- solver-lu.cpp (revision 151693) >+++ solver-lu.cpp (working copy) >@@ -26,6 +26,12 @@ > #include "test-random.hpp" > #include "solver-common.hpp" > >+#ifdef VSIP_IMPL_HAVE_CVSIP >+#define TEST_TRANSPOSE_SOLVE 0 >+#else >+#define TEST_TRANSPOSE_SOLVE 1 >+#endif >+ > #define VERBOSE 0 > > This looks like debug code. Should that really go into the repository ? > #define DO_ASSERT 1 > > Same here. Additionally, why don't you use instead (i.e. a noop in release mode, and a real test with potential abort() otherwise) ? > #define DO_SWEEP 0 > > Likewise. >@@ -100,7 +106,9 @@ > > // 2. Solve A X = B. > lu.template solve(b, x1); >+#if TEST_TRANSPOSE_SOLVE == 1 > lu.template solve(b, x2); >+#endif > lu.template solve::trans>(b, x3); // mat_herm if T complex > } > if (rtm == by_value) >@@ -114,7 +122,9 @@ > > // 2. Solve A X = B. > x1 = lu.template solve(b); >+#if TEST_TRANSPOSE_SOLVE == 1 > x2 = lu.template solve(b); >+#endif > x3 = lu.template solve::trans>(b); // mat_herm if T complex > } > >@@ -126,7 +136,9 @@ > Matrix chk3(n, p); > > prod(a, x1, chk1); >+#if TEST_TRANSPOSE_SOLVE == 1 > prod(trans(a), x2, chk2); >+#endif > prod(trans_or_herm(a), x3, chk3); > > typedef typename vsip::impl::Scalar_of::type scalar_type; >@@ -169,8 +181,13 @@ > { > scalar_type residual_1 = norm_2((b - chk1).col(i)); > scalar_type err1 = residual_1 / (a_norm_2 * norm_2(x1.col(i)) * eps); >+#if TEST_TRANSPOSE_SOLVE == 1 > scalar_type residual_2 = norm_2((b - chk2).col(i)); > scalar_type err2 = residual_2 / (a_norm_2 * norm_2(x2.col(i)) * eps); >+#else >+ scalar_type residual_2 = 0; >+ scalar_type err2 = 0; >+#endif > scalar_type residual_3 = norm_2((b - chk3).col(i)); > scalar_type err3 = residual_3 / (a_norm_2 * norm_2(x3.col(i)) * eps); > >@@ -192,7 +209,9 @@ > > #if DO_ASSERT > test_assert(err1 < p_limit); >+#if TEST_TRANSPOSE_SOLVE == 1 > test_assert(err2 < p_limit); >+#endif > test_assert(err3 < p_limit); > #endif > 
>@@ -247,7 +266,9 @@ > > // 2. Solve A X = B. > lu.template solve(b, x1); >+#if TEST_TRANSPOSE_SOLVE == 1 > lu.template solve(b, x2); >+#endif > lu.template solve::trans>(b, x3); // mat_herm if T complex > } > if (rtm == by_value) >@@ -261,7 +282,9 @@ > > // 2. Solve A X = B. > impl::assign_local(x1, lu.template solve(b)); >+#if TEST_TRANSPOSE_SOLVE == 1 > impl::assign_local(x2, lu.template solve(b)); >+#endif > impl::assign_local(x3, lu.template solve::trans>(b)); > } > >@@ -273,7 +296,9 @@ > Matrix chk3(n, p); > > prod(a, x1, chk1); >+#if TEST_TRANSPOSE_SOLVE == 1 > prod(trans(a), x2, chk2); >+#endif > prod(trans_or_herm(a), x3, chk3); > > typedef typename vsip::impl::Scalar_of::type scalar_type; >@@ -317,8 +342,13 @@ > { > scalar_type residual_1 = norm_2((b - chk1).col(i)); > scalar_type err1 = residual_1 / (a_norm_2 * norm_2(x1.col(i)) * eps); >+#if TEST_TRANSPOSE_SOLVE == 1 > scalar_type residual_2 = norm_2((b - chk2).col(i)); > scalar_type err2 = residual_2 / (a_norm_2 * norm_2(x2.col(i)) * eps); >+#else >+ scalar_type residual_2 = 0; >+ scalar_type err2 = 0; >+#endif > scalar_type residual_3 = norm_2((b - chk3).col(i)); > scalar_type err3 = residual_3 / (a_norm_2 * norm_2(x3.col(i)) * eps); > >@@ -339,7 +369,9 @@ > #endif > > test_assert(err1 < p_limit); >+#if TEST_TRANSPOSE_SOLVE == 1 > test_assert(err2 < p_limit); >+#endif > test_assert(err3 < p_limit); > > if (err1 > max_err1) max_err1 = err1; > > Thanks, Stefan From jules at codesourcery.com Wed Oct 25 15:54:31 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 25 Oct 2006 11:54:31 -0400 Subject: [vsipl++] lu solver In-Reply-To: <453F7C18.9050607@codesourcery.com> References: <453F74D1.5030805@codesourcery.com> <453F7C18.9050607@codesourcery.com> Message-ID: <453F88B7.2050207@codesourcery.com> > > Some nit-picking: If we insist on having a @date key in the files, they > should contain some real value, not just > a copy of a file this originally was a copy of. 
(Note that I'm indeed
> not sure about the need for @date, nor
> most of the other keys. But that's for another discussion...)
>

Right. Let's make the keys be valid. (Changing/removing the keys is not open for discussion :)

>> + VSIP_IMPL_THROW(unimplemented(
>> + "Lud_impl cvsip solver doesn't support this transformation"));
>> }
>>
>>
>>
>
> Since the above exception would percolate up to the public API, I don't
> think "Lud_impl cvsip solver" is the best
> name to give to the actual code. May be "cvsip LU solver backend" ?
>

I agree. How about "LU solver (CVSIP backend) does not implement this transformation". Start with the user-level VSIPL++ object that failed, give some extra info that might be useful (the backend in this case), then give the error.

>> Index: solver-lu.cpp
>>
>>
>
> As we just relocated and renamed most files, please make sure to follow
> the naming
> conventions. Use '_' instead of '-', and use lu.hpp, instead of
> solver_hpp (and likewise
> for the cpp).

This file is an existing unit test in the tests subdirectory (which would be more obvious if the diff were taken from the top-level) -- i.e. we can't beat up on Assem too much for the name I gave it way back when :)

>> +
>> #define VERBOSE 0
>>
>>
>
> This looks like debug code. Should that really go into the repository ?
>
>> #define DO_ASSERT 1
>>
>>
>
> Same here. Additionally, why don't you use instead
> (i.e. a noop in release mode, and a real test with potential abort()
> otherwise) ?

This is in a unit test, so debug code like this is OK. That way when we write a new backend, debugging test failures is marginally easier. The DO_ASSERT flag lets assertions be turned off, which IIRC was useful in debugging for getting past the first error to see other errors.

Just to be clear, in unit tests, we should use 'test_assert', not 'assert' from . This lets us run the test cases with the release mode flags (which include -DNDEBUG, which disables asserts).
Inside the library, we should use 'assert' for the same reason.

-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

From jules at codesourcery.com Wed Oct 25 20:12:36 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 25 Oct 2006 16:12:36 -0400
Subject: [patch] Add functions for isfinite, isnan, and isnormal; use them from error_db
Message-ID: <453FC534.2020308@codesourcery.com>

This patch adds view functions for isfinite, isnan, and isnormal that take a view of floating point type values (including complex) and return a view of bools.

To check if a view contains NaNs:

  if (anytrue(isnan(view))) ...

To count the number of NaNs in a view:

  int count = sumval(isnan(view));

etc etc

This patch extends error_db to return 201 if either input view contains a NaN. (Note that the largest value that error_db can return for two views that contain only finite numbers is 0).

The reason that error_db was not propagating the NaN value is that reductions like maxval do not reliably propagate NaNs.

Deep inside maxval there is a loop:

  maxval = X.get(0);
  for (i= 1 .. size)
    if (X.get(i) > maxval)
      maxval = X.get(i)

If X.get(i) is a NaN, the comparison is false and the value is skipped over. If X.get(0) is NaN, this would be propagated.

We could change maxval to check for NaN:

  for (i = ...
    if (X.get(i) > maxval || isnan(X.get(i)))
      ..

but that is going down a murky path. Primarily it would degrade performance. It would also create differences when another library is used to perform maxval (such as SAL) that doesn't check for NaNs.

C-VSIPL has the concept of development and release modes for the libraries, with the idea that in development mode the library might do additional checks (such as check for NaNs) that aren't done in release mode. At some future point we could do something along those lines, perhaps taking advantage of C++ capabilities, such as passing maxval a policy for NaN checking, the default being no NaN checking.
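The policy idea above can be sketched in plain C++. This is only an illustration under stated assumptions: max_value, Ignore_nan and Check_nan are hypothetical names, a std::vector stands in for a view, and none of this is the VSIPL++ maxval interface.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy: never report a NaN (no per-element cost).
struct Ignore_nan
{
  static bool is_nan(float) { return false; }
};

// Hypothetical policy: test every element for NaN.
struct Check_nan
{
  static bool is_nan(float x) { return std::isnan(x); }
};

// Maximum of a non-empty sequence (hypothetical stand-in for maxval).
// With Ignore_nan, a NaN at index > 0 is skipped because 'NaN > result'
// is false; a NaN at index 0 survives because nothing compares greater.
// With Check_nan, the first NaN encountered is returned.
template <typename NanPolicy = Ignore_nan>
float max_value(std::vector<float> const& x)
{
  float result = x[0];
  for (std::size_t i = 1; i < x.size(); ++i)
  {
    if (NanPolicy::is_nan(x[i])) return x[i];
    if (x[i] > result) result = x[i];
  }
  return result;
}
```

Callers that want the check would write max_value<Check_nan>(v); the default instantiation keeps the inner loop identical to the unchecked one, so the cost is only paid when requested.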
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: nan.diff URL: From mark at codesourcery.com Wed Oct 25 20:29:54 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Wed, 25 Oct 2006 13:29:54 -0700 Subject: [vsipl++] [patch] Add functions for isfinite, isnan, and isnormal; use them from error_db In-Reply-To: <453FC534.2020308@codesourcery.com> References: <453FC534.2020308@codesourcery.com> Message-ID: <453FC942.8050305@codesourcery.com> Jules Bergmann wrote: > Deep inside maxval there is a loop: > > maxval = X.get(0); > for (i= 1 .. size) > if (X.get(i) > maxval) > maxval = X.get(i) > > If X.get(i) is a NaN, the comparison is false and the value is skipped > over. I agree that this is the right behavior by default. As you say, checking for this case would be too expensive. -- Mark Mitchell CodeSourcery mark at codesourcery.com (650) 331-3385 x713 From jules at codesourcery.com Thu Oct 26 16:22:44 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 26 Oct 2006 12:22:44 -0400 Subject: [patch] Test for threshold dispatch to SAL Message-ID: <4540E0D4.50003@codesourcery.com> These tests go along with the earlier SAL dispatch patch to vthresx and vthrx. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: threshold-test.diff URL: From jules at codesourcery.com Fri Oct 27 13:19:57 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 27 Oct 2006 09:19:57 -0400 Subject: [patch] Add evaluators for SAL vector comparison functions. Message-ID: <4542077D.10005@codesourcery.com> This patch adds dispatch to the SAL vector comparison functions (lvgtx, etc). 
It also extends the load_view and save_view functions in vsip_csl to work with distributed data (this was used for the fast convolution demo). Finally, it fixes a few miscellanea: - Fix the new assign() function in dispatch_assign to not strip the top-level 'const' from expression templates. This was preventing the math library evaluators from applying. - Fix tests using fast_block's to use the right include path. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sal-lvgt.diff URL: From jules at codesourcery.com Fri Oct 27 19:25:51 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 27 Oct 2006 15:25:51 -0400 Subject: [patch] Use QMtest CommandHost Message-ID: <45425D3F.2050200@codesourcery.com> This patch adds support for using QMTest's CommandHost target to run tests with a proxy command. This is currently used for PAS on Linux cluster testing. It could be used for general MCOE testing. Stefan, is this ok to commit? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qm-cmd.diff URL: From stefan at codesourcery.com Fri Oct 27 19:59:37 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Fri, 27 Oct 2006 15:59:37 -0400 Subject: [vsipl++] [patch] Use QMtest CommandHost In-Reply-To: <45425D3F.2050200@codesourcery.com> References: <45425D3F.2050200@codesourcery.com> Message-ID: <45426529.2090202@codesourcery.com> Jules Bergmann wrote: > This patch adds support for using QMTest's CommandHost target to run > tests with a proxy command. > > This is currently used for PAS on Linux cluster testing. It could be > used for general MCOE testing. > > Stefan, is this ok to commit? Yes, this looks good. 
(I find --with-qmtest-command not very descriptive -- as it sounds like the command to invoke qmtest itself -- but a) I don't have anything else to suggest and b) chances are that the group of potential users for this particular option is rather limited :-) )

> case "$host_cpu" in
> - (ia32|i686|x86_64) fftw3_f_simd="--enable-sse"
> + ia32|i686|x86_64) fftw3_f_simd="--enable-sse"
> fftw3_d_simd="--enable-sse2"
> ;;
> - (ppc*) fftw3_f_simd="--enable-altivec" ;;
> + ppc*) fftw3_f_simd="--enable-altivec" ;;
> esac
> AC_MSG_NOTICE([fftw3 config options: $fftw3_opts $fftw3_simd.])

I remember Nathan (ncm) introducing this '(a)' syntax, with some explanation about broken shells. How exactly did this fail ?

> Index: tests/GNUmakefile.inc.in
> ===================================================================
> --- tests/GNUmakefile.inc.in (revision 152549)
> +++ tests/GNUmakefile.inc.in (working copy)
> @@ -49,6 +49,7 @@
> sed -e "s|@CPPFLAGS_@|`$(tests_pkgconfig) --variable=cppflags`|" | \
> sed -e "s|@CXXFLAGS_@|`$(tests_pkgconfig) --variable=cxxflags`|" | \
> sed -e "s|@LIBS_@|`$(tests_pkgconfig) --libs`|" | \
> + sed -e "s|@QMTEST_TARGET_@|`$(tests_pkgconfig) --variable=qmtest_target`|" | \
> sed -e "s|@PAR_SERVICE_@|`$(tests_pkgconfig) --variable=par_service`|" \
> > tests/context-installed
> cd tests; \

Doesn't this require that you define the 'qmtest_target' variable in the vsipl++.pc.in template, too ?

Thanks,
Stefan

--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718

From jules at codesourcery.com Fri Oct 27 20:39:53 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 27 Oct 2006 16:39:53 -0400
Subject: [vsipl++] [patch] Use QMtest CommandHost
In-Reply-To: <45426529.2090202@codesourcery.com>
References: <45425D3F.2050200@codesourcery.com> <45426529.2090202@codesourcery.com>
Message-ID: <45426E99.6000900@codesourcery.com>

> Yes, this looks good.
(I find --with-qmtest-command not very descriptive --
> as it sounds like the command to invoke qmtest itself -- but a) I don't have
> anything else to suggest and b) chances are that the group of potential
> users for this particular option is rather limited :-) )

I agree, it does seem to imply it is the qmtest executable (but we have '--with-qmtest=QMTEST' for the qmtest executable).

Since the QMtest target class is called CommandHost, how about --with-qmtest-commandhost=XXX?

>
>> case "$host_cpu" in
>> - (ia32|i686|x86_64) fftw3_f_simd="--enable-sse"
>> + ia32|i686|x86_64) fftw3_f_simd="--enable-sse"
>> fftw3_d_simd="--enable-sse2"
>> ;;
>> - (ppc*) fftw3_f_simd="--enable-altivec" ;;
>> + ppc*) fftw3_f_simd="--enable-altivec" ;;
>> esac
>> AC_MSG_NOTICE([fftw3 config options: $fftw3_opts $fftw3_simd.])
>
> I remember Nathan (ncm) introducing this '(a)' syntax, with some explanation
> about broken shells. How exactly did this fail ?

It fails on Solaris.

It doesn't fail when I run configure directly, but it does fail when configure is run by the Makefile, if it detects that configure is out of date w.r.t. configure.ac.

I noticed similar problems with the atlas configure on gannon a while back. Running atlas configure directly was OK, but running atlas configure via the top-level configure made it very picky about syntax.

There must be a flag to enable/disable this ultra-picky mode for Solaris /bin/sh, but I don't know what it is.
> > >> Index: tests/GNUmakefile.inc.in >> =================================================================== >> --- tests/GNUmakefile.inc.in (revision 152549) >> +++ tests/GNUmakefile.inc.in (working copy) >> @@ -49,6 +49,7 @@ >> sed -e "s|@CPPFLAGS_@|`$(tests_pkgconfig) --variable=cppflags`|" | \ >> sed -e "s|@CXXFLAGS_@|`$(tests_pkgconfig) --variable=cxxflags`|" | \ >> sed -e "s|@LIBS_@|`$(tests_pkgconfig) --libs`|" | \ >> + sed -e "s|@QMTEST_TARGET_@|`$(tests_pkgconfig) --variable=qmtest_target`|" | \ >> sed -e "s|@PAR_SERVICE_@|`$(tests_pkgconfig) --variable=par_service`|" \ >> > tests/context-installed >> cd tests; \ > > Doesn't this require that you define the 'qmtest_target' variable in the vsipl++.pc.in > template, too ? Yes, good catch. I forgot to include that file in the patch. Attached. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qm-cmd.diff URL: From stefan at codesourcery.com Fri Oct 27 21:09:12 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Fri, 27 Oct 2006 17:09:12 -0400 Subject: [vsipl++] [patch] Use QMtest CommandHost In-Reply-To: <45426E99.6000900@codesourcery.com> References: <45425D3F.2050200@codesourcery.com> <45426529.2090202@codesourcery.com> <45426E99.6000900@codesourcery.com> Message-ID: <45427578.6040406@codesourcery.com> Jules Bergmann wrote: > >> Yes, this looks good. (I find --with-qmtest-command not very >> descriptive -- >> as it sounds like the command to invoke qmtest itself -- but a) I >> don't have >> anything other to suggest and b) chances are that the group of potential >> users for this particular option is rather limitted :-) ) > > I agree, it does seem to imply it is the qmtest executable (but we have > '--with-qmtest=QMTEST' for the qmtest executable). > > Since the QMtest target class is called CommandHost, how about > --with-qmtest-commandhost=XXX? Good ! 
>>> case "$host_cpu" in
>>> - (ia32|i686|x86_64) fftw3_f_simd="--enable-sse"
>>> + ia32|i686|x86_64) fftw3_f_simd="--enable-sse"
>>> fftw3_d_simd="--enable-sse2"
>>> ;;
>>> - (ppc*) fftw3_f_simd="--enable-altivec" ;;
>>> + ppc*) fftw3_f_simd="--enable-altivec" ;;
>>> esac
>>> AC_MSG_NOTICE([fftw3 config options: $fftw3_opts $fftw3_simd.])
>>
>> I remember Nathan (ncm) introducing this '(a)' syntax, with some
>> explanation
>> about broken shells. How exactly did this fail ?
>
> It fails on Solaris.
>
> It doesn't fail when I run configure directly, but it does fail when
> configure is run by the Makefile, if it detects that configure is out of
> date w.r.t. configure.ac.
>
> I noticed similar problems with the atlas configure on gannon a while
> back. Running atlas configure directly was OK, but running atlas
> configure via the top-level configure made it very picky about syntax.
>
> There must be a flag to enable/disable this ultra-picky mode for Solaris
> /bin/sh, but I don't know what it is.

Typically, Makefiles define the SHELL variable explicitly. Ours doesn't. Maybe it should? (I don't have experience with Solaris, but I have heard bad things about its default shell.)

>>> Index: tests/GNUmakefile.inc.in
>>> ===================================================================
>>> --- tests/GNUmakefile.inc.in (revision 152549)
>>> +++ tests/GNUmakefile.inc.in (working copy)
>>> @@ -49,6 +49,7 @@
>>> sed -e "s|@CPPFLAGS_@|`$(tests_pkgconfig)
>>> --variable=cppflags`|" | \
>>> sed -e "s|@CXXFLAGS_@|`$(tests_pkgconfig)
>>> --variable=cxxflags`|" | \
>>> sed -e "s|@LIBS_@|`$(tests_pkgconfig) --libs`|" | \
>>> + sed -e "s|@QMTEST_TARGET_@|`$(tests_pkgconfig)
>>> --variable=qmtest_target`|" | \
>>> sed -e "s|@PAR_SERVICE_@|`$(tests_pkgconfig)
>>> --variable=par_service`|" \
>>> > tests/context-installed
>>> cd tests; \
>>
>> Doesn't this require that you define the 'qmtest_target' variable in
>> the vsipl++.pc.in
>> template, too ?
>
> Yes, good catch.
I forgot to include that file in the patch. Attached. OK, that looks good. Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From don at codesourcery.com Mon Oct 30 10:14:41 2006 From: don at codesourcery.com (Don McCoy) Date: Mon, 30 Oct 2006 03:14:41 -0700 Subject: [patch] Scalable SAR benchmark Message-ID: <4545D091.1040307@codesourcery.com> The attached patch adds a new application -- a portion of the third Scalable Synthetic Compact Application (SSCA) Benchmark, SAR Sensor Processing, Knowledge Formation, and File IO. More information may be found on the HPCS website: http://www.highproductivity.org/SSCABmks.htm This implements Kernel 1, which produces images from raw radar data. Note that this code follows the Matlab example code obtained through the above site and is not optimized (beyond simple things such as creating all FFT objects and views at initialization time when the dimensions are known at compile time). At present, the makefile depends on having an installed version of VSIPL++ (in the default location, /usr/local). The install path should be updated along with the package suffix in order to run on different platforms. Build and run the application using 'make; make check'. For verification, the computed image is compared against the Matlab-generated image (which is of a regularly spaced grid of corner reflectors). All testing (so far) was performed using the serial-builtin-32 configuration, with version 1.2 of VSIPL++. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ssar.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: ssar.diff URL: From stefan at codesourcery.com Mon Oct 30 15:01:49 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 30 Oct 2006 10:01:49 -0500 Subject: [vsipl++] [patch] Scalable SAR benchmark In-Reply-To: <4545D091.1040307@codesourcery.com> References: <4545D091.1040307@codesourcery.com> Message-ID: <454613DD.50602@codesourcery.com> Don, This looks good. I like the heavily commented / documented code. That helps a lot in understanding what the code is doing ! I have some high-level / stylistic comments: Don McCoy wrote: > Index: apps/ssar/load_save.hpp > =================================================================== > --- apps/ssar/load_save.hpp (revision 0) > +++ apps/ssar/load_save.hpp (revision 0) > @@ -0,0 +1,114 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */ > + > +/** @file load_save.hpp > + @author Don McCoy > + @date 2006-10-26 > + @brief Extensions to allow type double to be used as the view > + data type while using float as the storage type on disk. I think it would be best to follow the same idiom we agreed on for view I/O (and which we now use for our matlab reader / writer), e.g. input_stream >> Decoder, float>(view); This would help us promote this idiom, and make documentation easier. > +*/ > + > +#ifndef LOAD_SAVE_HPP > +#define LOAD_SAVE_HPP > + > +#include > +#include > + > +using namespace vsip_csl; > + > +template > +void > +save_view( > + char* filename, This should be 'char const *'. > + vsip::const_Matrix, Block> view) > +{ > + vsip::Matrix > sp_view(view.size(0), view.size(1)); > + > + for (index_type i = 0; i < view.size(0); ++i) > + for (index_type j = 0; j < view.size(1); ++j) > + sp_view.put(i, j, static_cast >(view.get(i, j))); > + > + Save_view<2, complex >::save(filename, sp_view); Where is the Save_view template defined ? I couldn't find it anywhere. (I'm wondering whether this could be generalized to do the type cast during the streaming, to avoid the above extra copy.) 
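The idea of casting during streaming can be illustrated with standard C++ only. This is a rough sketch under stated assumptions: save_as_float and load_from_float are hypothetical helper names, a std::vector<double> stands in for the view, and the on-disk layout is simply raw floats in host byte order. It is not the vsip_csl Save_view/Load_view interface.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: narrow each double to float as it is written,
// so no second, single-precision copy of the whole data set is needed.
void save_as_float(std::ostream& out, std::vector<double> const& data)
{
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    float const f = static_cast<float>(data[i]);
    out.write(reinterpret_cast<char const*>(&f), sizeof f);
  }
}

// Hypothetical helper: widen each float to double as it is read.
// Stops early if the stream runs out of data.
std::vector<double> load_from_float(std::istream& in, std::size_t count)
{
  std::vector<double> data;
  data.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
  {
    float f;
    if (!in.read(reinterpret_cast<char*>(&f), sizeof f)) break;
    data.push_back(static_cast<double>(f));
  }
  return data;
}
```

Both helpers work with any stream, for example a std::stringstream opened in binary mode for an in-memory round trip. Writing one element per call keeps the sketch short; a real implementation would probably convert in fixed-size chunks to reduce the number of I/O calls.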
> +vsip::Matrix > > +load_view( > + char* filename, > + vsip::Domain<2> const& dom) > +{ > + vsip::Matrix > sp_view(dom[0].size(), dom[1].size()); > + sp_view = Load_view<2, complex >(filename, dom).view(); > + > + vsip::Matrix > view(dom[0].size(), dom[1].size()); > + > + for (index_type i = 0; i < dom[0].size(); ++i) > + for (index_type j = 0; j < dom[1].size(); ++j) > + view.put(i, j, static_cast >(sp_view.get(i, j))); Same comment here. There must be a way to load the view without this extra copy. I believe the matlab formatter allows that, too, IIRC. > Index: apps/ssar/diffview.cpp > =================================================================== > --- apps/ssar/diffview.cpp (revision 0) > +++ apps/ssar/diffview.cpp (revision 0) > @@ -0,0 +1,110 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */ > + > +/** @file diffview.cpp > + @author Don McCoy > + @date 2006-10-29 > + @brief Utility to compare VSIPL++ views to determine equality > +*/ > + > +#include > +#include > + > +#include > +#include > + > +#include > +#include > +#include > + > + > +using namespace vsip; > +using namespace vsip_csl; > +using namespace std; > + > + > +typedef enum > +{ > + COMPLEX_VIEW = 0, > + REAL_VIEW, > + INTEGER_VIEW > +} data_format_type; What's the reason this is a typedef, as opposed to enum data_format_type {...}; ? (This looks like C-style programming :-) ) > + > +static void compare(data_format_type format, > + char* infile, char* ref, length_type rows, length_type cols); Shouldn't these be 'char const *' (infile, ref) ? Replace this use of 'static' with an unnamed namespace to get the same effect. Though I'm not sure what the desired effect is, since this is the main source file anyway... > Index: apps/ssar/kernel1.hpp > =================================================================== > --- apps/ssar/kernel1.hpp (revision 0) > +++ apps/ssar/kernel1.hpp (revision 0) > @@ -0,0 +1,537 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. 
*/ > + > +/** @file kernel.hpp > + @author Don McCoy > + @date 2006-10-26 > + @brief VSIPL++ implementation of SSCA #3: Kernel 1, Image Formation > +*/ > + > +#include > + > +#include "load_save.hpp" > + > +#if 0 > +#define VERBOSE > +#define SAVE_VIEW(a, b) save_view(a, b) > +#else > +#define SAVE_VIEW(a, b) > +#endif > + > +// Files required to be in the data directory: > +#define SAR_DIMENSIONS "dims.txt" > +#define RAW_SAR_DATA "sar.view" > +#define FAST_TIME_FILTER "ftfilt.view" > +#define SLOW_TIME_WAVENUMBER "k.view" > +#define SLOW_TIME_COMPRESSED_APERTURE_POSITION "uc.view" > +#define SLOW_TIME_APERTURE_POSITION "u.view" > +#define SLOW_TIME_SPATIAL_FREQUENCY "ku.view" Can these become char const *SAR_DIMENSIONS = "dims.txt"; etc., instead ? (Let's not use macros more than necessary !) > Index: apps/ssar/viewtoraw.cpp > =================================================================== > --- apps/ssar/viewtoraw.cpp (revision 0) > +++ apps/ssar/viewtoraw.cpp (revision 0) > @@ -0,0 +1,121 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */ > + > +/** @file viewtoraw.cpp > + @author Don McCoy > + @date 2006-10-28 > + @brief Utility to convert VSIPL++ views to raw greyscale > +*/ > + > +#include > +#include > + > +#include > +#include > + > +#include > +#include > + > + > +using namespace vsip; > +using namespace vsip_csl; > +using namespace std; > + > + > +typedef enum > +{ > + COMPLEX_MAG = 0, > + COMPLEX_REAL, > + COMPLEX_IMAG, > + SCALAR_FLOAT, > + SCALAR_INTEGER > +} data_format_type; Same comment as above. > + > +static void convert_to_greyscale(data_format_type format, > + char* infile, char* outfile, length_type rows, length_type cols); Same comment(s) as above. > Index: apps/ssar/ssar.cpp > =================================================================== > --- apps/ssar/ssar.cpp (revision 0) > +++ apps/ssar/ssar.cpp (revision 0) > @@ -0,0 +1,93 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. 
*/ > + > +/** @file ssar.cpp > + @author Don McCoy > + @date 2006-10-26 > + @brief VSIPL++ implementation of HPCS Challenge Benchmarks > + Scalable Synthetic Compact Applications - > + SSCA #3: Sensor Processing and Knowledge Formation > +*/ > + > +#include > +#include > +#include This should be . Thanks, Stefan -- Stefan Seefeld CodeSourcery stefan at codesourcery.com (650) 331-3385 x718 From mark at codesourcery.com Mon Oct 30 17:23:50 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Mon, 30 Oct 2006 09:23:50 -0800 Subject: [vsipl++] [patch] Scalable SAR benchmark In-Reply-To: <4545D091.1040307@codesourcery.com> References: <4545D091.1040307@codesourcery.com> Message-ID: <45463526.7040507@codesourcery.com> Don McCoy wrote: > The attached patch adds a new application -- a portion of the third > Scalable Synthetic Compact Application (SSCA) Benchmark, SAR Sensor > Processing, Knowledge Formation, and File IO. Very exciting. Jules, when this is checked in, make sure we fire off a message to Rich, Jeremy, etc. (You might also mention to Rich that lots of progress is being made on the reference implementation, since this has been one of his hot buttons.) Thanks, -- Mark Mitchell CodeSourcery mark at codesourcery.com (650) 331-3385 x713 From don at codesourcery.com Mon Oct 30 19:53:25 2006 From: don at codesourcery.com (Don McCoy) Date: Mon, 30 Oct 2006 12:53:25 -0700 Subject: [vsipl++] [patch] Scalable SAR benchmark In-Reply-To: <454613DD.50602@codesourcery.com> References: <4545D091.1040307@codesourcery.com> <454613DD.50602@codesourcery.com> Message-ID: <45465835.5050701@codesourcery.com> Stefan Seefeld wrote: > Don, > > This looks good. I like the heavily commented / documented code. > That helps a lot in understanding what the code is doing ! > > Thank you. I should have mentioned that the comments come verbatim from the Matlab code. That, along with the other documentation from HPCS, made this project much easier. 
> > I think it would be best to follow the same idiom we agreed on for view I/O > (and which we now use for our matlab reader / writer), e.g. > > input_stream >> Decoder, float>(view); > > This would help us promote this idiom, and make documentation easier. > Good idea. I will look into this for the next revision. >> +template >> +void >> +save_view( >> + char* filename, >> > > This should be 'char const *'. > > Agreed. I recall now why it is not -- because the templates in vsip_csl/save_view.hpp (but not load_view.hpp) use char*. I fixed it for the present time using const_cast in order to avoid having to modify save_view.hpp. My reason for this is that I suspect I'll replace all of this code very soon in favor of something like you propose above. >> +typedef enum >> +{ >> + COMPLEX_VIEW = 0, >> + REAL_VIEW, >> + INTEGER_VIEW >> +} data_format_type; >> > > What's the reason this is a typedef, as opposed to > > enum data_format_type {...}; > > ? (This looks like C-style programming :-) ) > Old habits are hard to break? ;-) >> + >> +static void compare(data_format_type format, >> + char* infile, char* ref, length_type rows, length_type cols); >> > > Shouldn't these be 'char const *' (infile, ref) ? > > > Replace this use of 'static' with an unnamed namespace to get the > same effect. Though I'm not sure what the desired effect is, since > this is the main source file anyway... > See above re: 'old habits'. In my former life as a C programmer, I told myself to do this, even in top-level source because it may not always be the main source file. Thanks for the C++-y suggestion. I didn't know you could have an unnamed namespace -- sounds a bit paradoxical. 
:) >> +// Files required to be in the data directory: >> +#define SAR_DIMENSIONS "dims.txt" >> +#define RAW_SAR_DATA "sar.view" >> +#define FAST_TIME_FILTER "ftfilt.view" >> +#define SLOW_TIME_WAVENUMBER "k.view" >> +#define SLOW_TIME_COMPRESSED_APERTURE_POSITION "uc.view" >> +#define SLOW_TIME_APERTURE_POSITION "u.view" >> +#define SLOW_TIME_SPATIAL_FREQUENCY "ku.view" >> > > Can these become > > char const *SAR_DIMENSIONS = "dims.txt"; > > etc., instead ? (Let's not use macros more than necessary !) > > Sure. Done. Do our coding standards allow all-cap names in this case? >> +#include >> > > This should be . > > Done. > Thanks, > Stefan > Thanks for the feedback! Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ssar2.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ssar2.diff URL: From jules at codesourcery.com Mon Oct 30 20:00:34 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 30 Oct 2006 15:00:34 -0500 Subject: [vsipl++] [patch] Scalable SAR benchmark In-Reply-To: <4545D091.1040307@codesourcery.com> References: <4545D091.1040307@codesourcery.com> Message-ID: <454659E2.3090902@codesourcery.com> Don McCoy wrote: > The attached patch adds a new application -- a portion of the third > Scalable Synthetic Compact Application (SSCA) Benchmark, SAR Sensor > Processing, Knowledge Formation, and File IO. More information may be > found on the HPCS website: > > http://www.highproductivity.org/SSCABmks.htm > > This implements Kernel 1, which produces images from raw radar data. > Note that this code follows the Matlab example code obtained through the > above site and is not optimized (beyond simple things such as creating > all FFT objects and views at initialization time when the dimensions are > known at compile time). 
> > At present, the makefile depends on having an installed version of > VSIPL++ (in the default location, /usr/local). The install path should > be updated along with the package suffix in order to run on different > platforms. Build and run the application using 'make; make check'. For > verification, the computed image is compared against the > Matlab-generated image (which is of a regularly spaced grid of corner > reflectors). > > All testing (so far) was performed using the serial-builtin-32 > configuration, with version 1.2 of VSIPL++. Don, This looks good. I have several comments below, plus some general comments. Since this code isn't going into the core library, and since this is going to be in a flux as we optimize, let's do the following: - address the easy comments: - Definitely 1, 5, 8 - Perhaps 4, 6, 7 - Later: 2, 3, 9. - check in code as a baseline, - address the remaining comments as you perform optimizations. Does that sound OK? Also, I haven't looked at this in detail from a performance perspective yet. I suspect a big optimization will be to change from processing an entire matrix at time to processing a row or column at a time. Definitely for the fast time filter, bandwidth expansion, and the application of fs_ref. I'm not sure about the interpolation part though. -- Jules General comments: - Avoid returning views by value (both for builtin operations, like Fftm, and for user defined functions, like fft_shift, load_view, and ...). Do you think the by-value notation is easier to read? If so, let me know. I have a partially finished patch for return-block optimization that can make the by-value forms as efficient as by-reference. However, this would be for builtin operations only, not user defined ones. - Continue to move intermediate views out of Kernel1 member functions and replace them with Kernel1 member variables. 
- To avoid confusion, I think it would be better to have Kernel1 member functions work directly on member variables, instead of passing them as arguments. For example, digital_spotlighting should just use s_filt_ instead of having it passed in as a parameter. > ------------------------------------------------------------------------ > > Index: apps/ssar/load_save.hpp > =================================================================== > --- apps/ssar/load_save.hpp (revision 0) > +++ apps/ssar/load_save.hpp (revision 0) > @@ -0,0 +1,114 @@ > +/* Copyright (c) 2006 by CodeSourcery. All rights reserved. */ > + > +/** @file load_save.hpp > + @author Don McCoy > + @date 2006-10-26 > + @brief Extensions to allow type double to be used as the view > + data type while using float as the storage type on disk. > +*/ > + > +#ifndef LOAD_SAVE_HPP > +#define LOAD_SAVE_HPP > + > +#include > +#include > + > +using namespace vsip_csl; [1] In general, putting 'using namespace' decls in a header is considered bad form. Its effect depends on the current state of the vsip_csl namespace, which can cause subtle bugs. This is definitely forbidden in library headers. We should avoid it in the SSAR to set a good example (and potentially to save ourselves debugging time later). > + > +template > +void > +save_view( > + char* filename, > + vsip::const_Matrix, Block> view) > +{ > + vsip::Matrix > sp_view(view.size(0), view.size(1)); > + > + for (index_type i = 0; i < view.size(0); ++i) > + for (index_type j = 0; j < view.size(1); ++j) > + sp_view.put(i, j, static_cast >(view.get(i, j))); > + > + Save_view<2, complex >::save(filename, sp_view); > +} [2] For saving intermediate views for debugging, this is fine. It would be more general purpose to pass the disk value type as a template parameter. 
Then you could (almost) replace these three functions with a single function: template void save_view_as( char* filename, ViewT view) { typedef typename View_of_dim >::type view_type; view_type disk_view = impl::clone_view(view); disk_view = view_cast(view); Save_view::save(filename, disk_view); } I say "almost" because we don't have view_cast in a convenient place yet (it is currently in apps/sarsim and called cast_view). I'll fix that! However, eventually we need to set things up so that no memory allocations are necessary during "steady state" operation. All memory allocations that are necessary should be done when constructing a Kernel1 object. A simple way to do this is to pre-allocate views for staging data for load/store that have the right precision for the file on disk. In the case where we're processing double, but the file on disk is float, this does exactly what we want, with no overhead. However, if the file on disk is float, and we're processing float, this creates unnecessary overhead for the storage and an unnecessary copy. template class Save_view_as { Save_view_as(Domain const& dom) ... void operator()( char* filename, ViewT view) { io_view_ = view; save_file(filename, io_view_); } View_of_dim io_view_; }; // specialization for case where IoT and ViewValueT are the // same type and no intermediate view is required. template class Save_view_as { Save_view_as(Domain const&) ... void operator()( char* filename, ViewT view) { save_file(filename, view); } } > +vsip::Matrix > > +load_view( > + char* filename, > + vsip::Domain<2> const& dom) > +{ > + vsip::Matrix > sp_view(dom[0].size(), dom[1].size()); > + sp_view = Load_view<2, complex >(filename, dom).view(); > + > + vsip::Matrix > view(dom[0].size(), dom[1].size()); > + > + for (index_type i = 0; i < dom[0].size(); ++i) > + for (index_type j = 0; j < dom[1].size(); ++j) > + view.put(i, j, static_cast >(sp_view.get(i, j))); > + > + return view; > +} [3] Similar comments as for save_view. 
Also, it would be more efficient to return the result by-reference in a view passed as an argument. template void load_view( char* filename, ViewT view) Also, vsip_csl/load_view.hpp now has a load_view function with this signature. This wasn't there for the 1.2 release. Finally, in its current form as a non-template function, this should be in a .cpp file. If we use load_save.hpp in multiple compilation units, we would get multiple-definition errors at link time. Changing to a template function "avoids" this (however, that by itself should not be a sufficient reason to convert to a template function). > Index: apps/ssar/diffview.cpp > =================================================================== > + data_format_type format = COMPLEX_VIEW; > + if (argc == 6) > + { [4] For orthogonality, why not also accept "-c" to set format = COMPLEX_VIEW? > + if (0 == strncmp("-r", argv[1], 2)) > + format = REAL_VIEW; > + else if (0 == strncmp("-n", argv[1], 2)) > + format = INTEGER_VIEW; > + argv++; > + } > + > + compare(format, argv[1], argv[2], atoi(argv[3]), atoi(argv[4])); > + } > + > + return 0; > +} > + > + > +void > +compare(data_format_type format, > + char* infile, char* ref, length_type rows, length_type cols) > +{ > + if (format == REAL_VIEW) > + { > + typedef Matrix matrix_type; > + Domain<2> dom(rows, cols); > + > + matrix_type in(rows, cols); > + in = Load_view<2, scalar_f>(infile, dom).view(); > + > + matrix_type refv(rows, cols); > + refv = Load_view<2, scalar_f>(ref, dom).view(); > + > + cout << error_db(in, refv) << endl; > + } > + else if (format == INTEGER_VIEW) > + { > + typedef Matrix matrix_type; > + Domain<2> dom(rows, cols); > + > + matrix_type in(rows, cols); > + in = Load_view<2, scalar_i>(infile, dom).view(); > + > + matrix_type refv(rows, cols); > + refv = Load_view<2, scalar_i>(ref, dom).view(); > + > + cout << error_db(in, refv) << endl; > + } > + else // Using complex views. 
> + { > + typedef Matrix matrix_type; > + Domain<2> dom(rows, cols); > + > + matrix_type in(rows, cols); > + in = Load_view<2, cscalar_f>(infile, dom).view(); > + > + matrix_type refv(rows, cols); > + refv = Load_view<2, cscalar_f>(ref, dom).view(); > + > + cout << error_db(in, refv) << endl; > + } > +} You can cut down on duplicated code by making compare() a template function: template void compare(char* infile, char* ref, length_type rows, length_type cols) { typedef Matrix matrix_type; Domain<2> dom(rows, cols); matrix_type in(rows, cols); in = Load_view<2, T>(infile, dom).view(); matrix_type refv(rows, cols); refv = Load_view<2, T>(ref, dom).view(); cout << error_db(in, refv) << endl; } Then from main you can call it like so: if (format == REAL_VIEW) compare(argv[1], argv[2], atoi(argv[3]), atoi(argv[4])); else if (format == INTEGER_VIEW) compare(argv[1], argv[2], atoi(argv[3]), atoi(argv[4])); else compare >(argv[1], argv[2], atoi(argv[3]), atoi(argv[4])); > + > Index: apps/ssar/kernel1.hpp > =================================================================== > +// Files required to be in the data directory: > +#define SAR_DIMENSIONS "dims.txt" > +#define RAW_SAR_DATA "sar.view" > +#define FAST_TIME_FILTER "ftfilt.view" > +#define SLOW_TIME_WAVENUMBER "k.view" > +#define SLOW_TIME_COMPRESSED_APERTURE_POSITION "uc.view" > +#define SLOW_TIME_APERTURE_POSITION "u.view" > +#define SLOW_TIME_SPATIAL_FREQUENCY "ku.view" I agree with Stefan's comments here. In C++ it is good practice to use const variables instead of macros in cases like this. Eventually these could be 'char*' variables inside of main, so they can be set from the command line options. > + > + > +class Kernel1 Now is probably a good time to make Kernel1 a template class, with 'T' as a template parameter. That will make it easier to experiment with converting the precision back to float. 
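As a rough illustration of the two suggestions above (typed constants in place of the macros, and Kernel1 as a class template on T), here is a minimal sketch. It is not the code from the patch: std::vector stands in for the VSIPL++ view types, and the constructor, the members s_raw_ and k_, and the accessors are hypothetical names chosen only for the example.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

namespace
{
// Typed constants instead of #define. The unnamed namespace gives them
// internal linkage, which is what 'static' would do in C.
char const* const SAR_DIMENSIONS = "dims.txt";
char const* const RAW_SAR_DATA   = "sar.view";
}

// Sketch only: std::vector replaces vsip::Vector / vsip::Matrix here.
template <typename T>
class Kernel1
{
public:
  typedef T                             value_type;
  typedef std::vector<std::complex<T> > complex_vector_type;
  typedef std::vector<T>                real_vector_type;

  // All storage is allocated once, at construction time.
  explicit Kernel1(std::size_t n) : s_raw_(n), k_(n) {}

  std::size_t size() const { return s_raw_.size(); }
  real_vector_type const& wavenumber() const { return k_; }

private:
  complex_vector_type s_raw_;  // raw input samples (hypothetical member)
  real_vector_type    k_;      // wavenumber vector (hypothetical member)
};
```

With this shape, Kernel1<float> and Kernel1<double> can be instantiated side by side to compare precision, and because the member functions belong to a template they can stay in the header.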
> +{ > +public: > + typedef double T; > + typedef Matrix > complex_matrix_type; > + typedef Vector > complex_vector_type; > + typedef Matrix real_matrix_type; > + typedef Vector real_vector_type; > + typedef Fftm, complex, col> col_fftm_type; > + typedef Fftm, complex, row> row_fftm_type; > + typedef Fftm, complex, row, fft_inv> ifftm_type; > + > + Kernel1(length_type scale, length_type n, length_type mc, length_type m); > + ~Kernel1() {} > + > + void process_image(); > + > +private: > + void > + fast_time_filtering(complex_matrix_type s_raw, > + complex_vector_type fast_time_filter); [5] Does this function exist? > + > + void > + digital_spotlighting(complex_matrix_type s_filt, > + real_vector_type k, real_vector_type uc, real_vector_type u ); > + > + real_matrix_type > + interpolation(complex_matrix_type fs_spotlit, real_vector_type k, > + real_vector_type ku0); [6] return result by-reference in parameter. > + > + complex_matrix_type > + fft_shift(complex_matrix_type in); > + > + real_vector_type > + fft_shift(real_vector_type in); [7] First, fft_shift would be useful for other matlab conversion projects. Instead of being a member of Kernel1, they would be more useful as free functions. Why don't you put them into vsip_csl in a matlab_utils.hpp file? Second, it would be better to define these as template functions for several reasons: a) It is not guaranteed that they will always be called with a real_vector_type or a complex_matrix_type. For example, because it is implementation defined, there is no guarantee what block type a by-value Fftm object will return. If it returned a Matrix > then initializing fft_shift's arguments would require a temporary and a copy to initialize it. Similarly, once you start optimizing this to process data a row at a time, you'll want to apply fft_shift to subviews, which also have implementation defined block type. b) There's no reason to limit fft_shift to just complex matrices and real vectors, esp. 
if we want to reuse it in the future. Finally, it would be more efficient to return the result by-reference into an argument. I.e. fft_shift(in, out); instead of out = fft_shift(in); Because returning the result by-value requires a temporary and extra copy. Since you currently use fft_shift for out-of-place shifts, I would recommend an interface like that of signal-processing objects such as Fft: template Vector fft_shift( const_Vector in, Vector out) { ... } Where the return value is the 'out' parameter for convenience. Later, you might find an in-place version useful too. > + > +private: > + length_type scale_; > + length_type n_; > + length_type mc_; > + length_type m_; > + length_type nx_; > + length_type interp_sidelobes_; > + T range_factor_; > + T aspect_ratio_; > + T L_; > + T Y0_; > + T X0_; > + T Xc_; > + > + complex_matrix_type s_raw_; > + complex_vector_type fast_time_filter_; > + > + real_vector_type slow_time_wavenumber_; > + real_vector_type slow_time_compressed_aperture_position_; > + real_vector_type slow_time_aperture_position_; > + real_vector_type slow_time_spatial_frequency_; > + complex_matrix_type s_filt_; > + complex_matrix_type fs_spotlit_; > + real_vector_type ks_; > + real_vector_type ucs_; > + complex_matrix_type s_compr_; > + complex_matrix_type fs_; > + complex_matrix_type fs_padded_; > + complex_matrix_type s_padded_; > + real_vector_type us_; > + complex_matrix_type s_decompr_; > + real_matrix_type ku_; > + real_matrix_type k1_; > + real_matrix_type kx0_; > + real_matrix_type kx_; > + complex_matrix_type fs_ref_; > + complex_matrix_type fsm_; > + Vector icKX_; > + > + col_fftm_type col_fftm; > + row_fftm_type row_fftm; > + row_fftm_type row_fftm2; > + ifftm_type ifftm; [8] don't forget '_' suffix for these member variables. 
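To make the (in, out) interface suggested in [7] concrete, here is a minimal sketch of an out-of-place fft_shift as a free template function that returns its 'out' argument. It assumes std::vector in place of the VSIPL++ Vector and Block types, so it is a stand-in for the idea rather than the library interface, and it assumes the Matlab fftshift convention of rotating by n/2 elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the VSIPL++ version discussed in [7]:
// std::vector replaces Vector<T, Block>.
// Rotates 'in' by n/2 elements into 'out' and returns 'out'.
// 'in' and 'out' must be distinct objects (out-of-place only).
template <typename T>
std::vector<T>&
fft_shift(std::vector<T> const& in, std::vector<T>& out)
{
  std::size_t const n = in.size();
  out.resize(n);  // no reallocation when the caller pre-sized 'out'

  // Element i moves to position (i + n/2) mod n, so the second half of
  // 'in' ends up at the front of 'out'.
  for (std::size_t i = 0; i < n; ++i)
    out[(i + n / 2) % n] = in[i];

  return out;
}
```

Because the result is written into a caller-owned 'out', the same buffer can be reused on every call, which fits the goal of doing no allocation during steady-state processing.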
> +}; > +Kernel1::Kernel1(length_type scale, length_type n, length_type mc, > + length_type m) > +void > +Kernel1::process_image() > +void > +Kernel1::digital_spotlighting(complex_matrix_type s_filt, > + real_vector_type k, real_vector_type uc, real_vector_type u ) > +Kernel1::real_matrix_type > +Kernel1::interpolation(complex_matrix_type fs_spotlit, real_vector_type k, > + real_vector_type ku0) At the moment, these are all non-inline, non-template functions. These shouldn't be in a header file; they might end up in multiple object files, leading to link errors. Making Kernel1 a template class circumvents this problem. > Index: apps/ssar/viewtoraw.cpp > =================================================================== > +void > +convert_to_greyscale(data_format_type format, > + char* infile, char* outfile, length_type rows, length_type cols) > +{ > + typedef Matrix matrix_type; > + Domain<2> dom(rows, cols); > + > + matrix_type in(rows, cols); > + > + if (format == COMPLEX_MAG) > + in = mag(Load_view<2, cscalar_f>(infile, dom).view()); > + else if (format == COMPLEX_REAL) > + in = real(Load_view<2, cscalar_f>(infile, dom).view()); > + else if (format == COMPLEX_IMAG) > + in = imag(Load_view<2, cscalar_f>(infile, dom).view()); > + else if (format == SCALAR_FLOAT) > + in = Load_view<2, scalar_f>(infile, dom).view(); > + else if (format == SCALAR_INTEGER) > + in = Load_view<2, scalar_i>(infile, dom).view(); > + else > + cerr << "Error: format type " << format << " not supported." << endl; > + > + > + Index<2> idx; > + scalar_f minv = minval(in, idx); > + scalar_f maxv = maxval(in, idx); > + scalar_f scale = (maxv - minv ? 
maxv - minv : 1.f); > + > + Matrix outf(rows, cols); > + outf = (in - minv) * 255.f / scale; > + > + Matrix out(rows, cols); > + for (index_type i = 0; i < rows; ++i) > + for (index_type j = 0; j < cols; ++j) > + out.put(i, j, static_cast(outf.get(i, j))); [9] If we had view_cast in vsip or vsip_csl (currently it is part of sarsim), we could write a single line: out = view_cast((in - minv) * 255.f / scale); I'll move that sometime this week. > + > + save_view(outfile, out); > + > + // The min and max values are displayed to reveal the scale > + cout << infile << " [" << rows << " x " << cols << "] : " > + << "min " << minv << ", max " << maxv << endl; > +} > + > Index: apps/ssar/ssar.cpp > =================================================================== > +void > +process_ssar_options(int argc, char** argv, ssar_options& options) > +{ > + if (argc != 2) > + { > + cerr << "Usage: " << argv[0] << " " << endl; > + exit(-1); > + } > + > + if (chdir(argv[1]) < 0) > + { > + perror(argv[1]); > + exit(-1); > + } [10] I'm probably just being cranky, but I think it would be better to manually prepend the directory path to the filename, or to pass the filenames in as command line arguments. Those would make it easier to put the output files in another directory (which makes it slightly easier to clean up) and give us flexibility in the future. But, if it works, it works! I don't see a compelling reason to change this. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Tue Oct 31 00:23:49 2006 From: don at codesourcery.com (Don McCoy) Date: Mon, 30 Oct 2006 17:23:49 -0700 Subject: [vsipl++] [patch] Scalable SAR benchmark In-Reply-To: <454659E2.3090902@codesourcery.com> References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com> Message-ID: <45469795.8020207@codesourcery.com> Jules Bergmann wrote: > This looks good. I have several comments below, plus some general > comments. 
> > Since this code isn't going into the core library, and since this is > going to be in a flux as we optimize, let's do the following: > > - address the easy comments: > - Definitely 1, 5, 8 > - Perhaps 4, 6, 7 > - Later: 2, 3, 9. > I did 1, 5 and 8. Also, I converted it to a template class (back to a template class, that is) and eliminated the passing of member views. Ok to check in? > - Avoid returning views by value (both for builtin operations, like > Fftm, and for user defined functions, like fft_shift, load_view, > and ...). > > Do you think the by-value notation is easier to read? If so, let > me know. I have a partially finished patch for return-block > optimization that can make the by-value forms as efficient as > by-reference. However, this would be for builtin operations only, > not user defined ones. > I blithely followed the matlab code here. It is easier just in the sense that it can be combined into expressions more easily, but passing (in, out) and returning 'out' will be almost as good. > - Continue to move intermediate views out of Kernel1 member functions > and replace them with Kernel1 member variables. > I believe there are no more views to handle this way, specifically because the number of columns in the final image is not known until the interpolation phase (member nx_). I have an idea for how to fix this: If the bits to compute nx_ can be factored out, putting them into a separate class (sort of a SAR imaging pre-processor), it would allow the remaining views to be member variables, in addition to the last two inverse FFT's. I'll let you know if I run into a problem doing this. But if you know a better way... > - To avoid confusion, I think it would be better to have Kernel1 > member functions work directly on member variables, instead of > passing them as arguments. > > For example, digital_spotlighting should just use s_filt_ instead > of having it passed in as a parameter. > I'd debated this. 
I think I've changed my mind (and now agree with your suggestion). Thanks for the comments! -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ssar3.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ssar3.diff URL: From jules at codesourcery.com Tue Oct 31 12:56:52 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 31 Oct 2006 07:56:52 -0500 Subject: [vsipl++] [patch] Scalable SAR benchmark In-Reply-To: <45469795.8020207@codesourcery.com> References: <4545D091.1040307@codesourcery.com> <454659E2.3090902@codesourcery.com> <45469795.8020207@codesourcery.com> Message-ID: <45474814.1040503@codesourcery.com> > I did 1, 5 and 8. Also, I converted it to a template class (back to a > template class, that is) and eliminated the passing of member views. > > Ok to check in? Yes, please do! thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Tue Oct 31 16:30:41 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 31 Oct 2006 11:30:41 -0500 Subject: [patch] PAS binary-package Message-ID: <45477A31.9070108@codesourcery.com> Changes for building a PAS for Linux binary-package. This binary package automates testing that source packages will build with PAS for MCOE. Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: pas-bin-pkg.diff URL: From jules at codesourcery.com Tue Oct 31 16:53:31 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 31 Oct 2006 11:53:31 -0500 Subject: [patch] view_cast function for type conversions; rename is_nan; misc Message-ID: <45477F8B.30604@codesourcery.com> This patch adds a view_cast function to perform type conversions on views. For example, to convert a floating point matrix into a char matrix (suitable for grayscale display), you could: Matrix data(rows, cols); Matrix img (rows, cols); float minv = minval(data, idx); float maxv = maxval(data, idx); float scale = 255.f / (maxv - minv); img = view_cast((data - minv) * scale); It also renames the isnan functions to is_nan since isnan from math.h/cmath will typically be a macro. The previous version worked with GCC, whose cmath captures the isnan macro into a function. However it was broken with GreenHills. The new version works with both compilers. Finally, it includes some misc fixes. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: misc.diff URL: From assem at codesourcery.com Tue Oct 31 21:25:31 2006 From: assem at codesourcery.com (Assem Salama) Date: Tue, 31 Oct 2006 16:25:31 -0500 Subject: QR Solver Message-ID: <4547BF4B.8050607@codesourcery.com> Everyone, This patch implements the QR backend using Cvsipl. Thanks, Assem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: svn.diff.10312006.1.log URL:
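
For readers without a VSIPL++ build at hand, the grayscale conversion described in the view_cast message above can be approximated in plain C++. This is only a sketch under stated assumptions: std::vector stands in for the Matrix views, the element-wise static_cast plays the role of view_cast, the name to_grayscale is hypothetical, and the guard against a constant image is borrowed from the viewtoraw.cpp patch quoted earlier in the thread rather than from the view_cast example itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of VSIPL++.
// Maps 'data' linearly onto 0..255 and writes the result into 'img'.
inline void
to_grayscale(std::vector<float> const& data, std::vector<unsigned char>& img)
{
  img.resize(data.size());  // no reallocation if 'img' was pre-sized
  if (data.empty())
    return;

  float const minv = *std::min_element(data.begin(), data.end());
  float const maxv = *std::max_element(data.begin(), data.end());

  // Avoid dividing by zero when every sample has the same value.
  float const range = (maxv - minv != 0.f) ? maxv - minv : 1.f;

  // The cast truncates toward zero, as a float-to-char conversion would.
  for (std::size_t i = 0; i < data.size(); ++i)
    img[i] = static_cast<unsigned char>((data[i] - minv) * 255.f / range);
}
```

The (in, out) signature follows the by-reference style recommended in the review, so the output buffer can be allocated once and reused.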