[vsipl++] Lu Solver

Mon Oct 9 19:51:49 UTC 2006

Assem Salama wrote:
 > Everyone,
 >  This is the new lu solver that uses the cvsipl backend.

Assem,

Thanks, this looks good overall.  I have several comments on
specifics below.

Did you run the test suite against this yet?

				-- Jules

 > ------------------------------------------------------------------------
 >
 > Index: ref_impl/vsipl/solver_lu.hpp

First, I would like to use another name besides 'ref_impl' because
that implies the directory files are reference-implementation only.

Second, I would like to use the subdirectory name 'cvsip' instead of
'vsipl' to avoid confusion between VSIPL (the C API) and VSIPL++.
Also, we should use the name 'cvsip' instead of 'cvsipl' for
consistency.  We use the directory and namespace names 'vsip'.
C-VSIPL uses 'vsip_' as a prefix, etc.  If we use the name 'cvsipl' it
will be a source of confusion.  Please make sure all your uses in code
of the name vsip/csvip (i.e. especially in namespaces, class names,
and function names, but also in variable names, etc) avoid the 'l'.
Using the names "VSIPL" "C-VSIPL", etc is OK in comments.

 > ===================================================================
 > --- ref_impl/vsipl/solver_lu.hpp	(revision 0)
 > +++ ref_impl/vsipl/solver_lu.hpp	(revision 0)
 > @@ -0,0 +1,230 @@
 > +/* Copyright (c) 2005, 2006 by CodeSourcery, LLC.  All rights 
reserved. */

Update copyright, it should be 2006 and it should be "CodeSourcery" instead
of "CodeSourcery, LLC".

 > +
 > +/** @file    vsip/impl/lapack/solver_lu.hpp

[1] Update subdirectory name

 > +    @author  Assem Salama
 > +    @date    2006-04-13

[2] Update the date.

 > +    @brief   VSIPL++ Library: LU linear system solver using lapack.

[3] using cvsipl.

 > +
 > +*/
 > +
 > +#ifndef VSIP_REF_IMPL_SOLVER_LU_HPP
 > +#define VSIP_REF_IMPL_SOLVER_LU_HPP

[4] The ifdef guard should include the path.  If we were going to
keep this file in 'ref_impl/vsipl' the guard should be:

#ifndef VSIP_REF_IMPL_VSIPL_SOLVER_LU_HPP

 > +
 > +/***********************************************************************
 > +  Included Files
 > +***********************************************************************/
 > +
 > +#include <algorithm>
 > +
 > +#include <vsip/support.hpp>
 > +#include <vsip/matrix.hpp>
 > +#include <vsip/impl/math-enum.hpp>
 > +#include <vsip/impl/lapack.hpp>
 > +#include <vsip/impl/temp_buffer.hpp>
 > +#include <vsip/impl/solver_common.hpp>
 > +
 > +#include <vsip/ref_impl/vsipl/cvsipl_matrix.hpp>
 > +#include <vsip/ref_impl/vsipl/cvsipl_lu.hpp>
 > +
 > +
 > +
 > +/***********************************************************************
 > +  Declarations
 > +***********************************************************************/
 > +
 > +namespace vsip
 > +{
 > +
 > +namespace impl
 > +{
 > +
 > +/// LU factorization implementation class.  Common functionality
 > +/// for lud by-value and by-reference classes.
 > +
 > +template <typename T>
 > +class Lud_impl<T, Ref_impl_tag>
 > +  : Compile_time_assert<blas::Blas_traits<T>::valid>

[5] We need a Cvsip_traits equivalent of Blas_traits to determine of
C-VSIPL supports a value type.

 > +{
 > +  // BLAS/LAPACK require complex data to be in interleaved format.
 > +  typedef Layout<2, col2_type, Stride_unit_dense, Cmplx_inter_fmt> 
data_LP;
 > +  typedef Fast_block<2, T, data_LP> data_block_type;

[6] C-VSIPL supports both split and interleaved complex.  We should
take advantage of that.

If the split/interleave type used by C-VSIPL for solve and decompose
has to be consistent, we should determine split/interleave based on
the default for Dense.

If not, let's just pass split/interleave directly through.

[7] Although its not mentioned in the comment, BLAS/LAPACK also
requires data to be column-major.  That shouldn't be necessary for
C-VSIPL.  If possible, we should pass both row-major and column-major
data directly to C-VSIPL and let it sort it out.

 > +
 > +  // Constructors, copies, assignments, and destructors.
 > +public:
 > +  Lud_impl(length_type)
 > +    VSIP_THROW((std::bad_alloc));
 > +  Lud_impl(Lud_impl const&)
 > +    VSIP_THROW((std::bad_alloc));
 > +
 > +  Lud_impl& operator=(Lud_impl const&) VSIP_NOTHROW;
 > +  ~Lud_impl() VSIP_NOTHROW;
 > +
 > +  // Accessors.
 > +public:
 > +  length_type length()const VSIP_NOTHROW { return length_; }
 > +
 > +  // Solve systems.
 > +public:
 > +  template <typename Block>
 > +  bool decompose(Matrix<T, Block>) VSIP_NOTHROW;
 > +
 > +protected:
 > +  template <mat_op_type tr,
 > +	    typename    Block0,
 > +	    typename    Block1>
 > +  bool impl_solve(const_Matrix<T, Block0>, Matrix<T, Block1>)
 > +    VSIP_NOTHROW;
 > +
 > +  // Member data.
 > +private:
 > +  typedef std::vector<int, Aligned_allocator<int> > vector_type;
 > +
 > +  length_type  length_;			// Order of A.
 > +  vector_type  ipiv_;			// Additional info on Q
 > +
 > +  Matrix<T, data_block_type> data_;	// Factorized Cholesky matrix (A)
 > +  vsip::ref_impl::cvsipl::CVSIPL_Matrix<T>           cvsipl_data_;
 > +  vsip::ref_impl::cvsipl::CVSIPL_Lud<T>              cvsipl_lud_;
 > +};
 > +
 > +} // namespace vsip::impl
 > +
 > +
 > +/***********************************************************************
 > +  Definitions
 > +***********************************************************************/
 > +
 > +namespace impl
 > +{
 > +
 > +template <typename T>
 > +Lud_impl<T, Ref_impl_tag>::Lud_impl(
 > +  length_type length
 > +  )
 > +VSIP_THROW((std::bad_alloc))
 > +  : length_      (length),
 > +    ipiv_        (length_),
 > +    data_        (length_, length_),
 > +    cvsipl_data_ (data_.block().impl_data(), length_, length_),
 > +    cvsipl_lud_  (length_)
 > +{
 > +  assert(length_ > 0);
 > +}
 > +
 > +
 > +
 > +template <typename T>
 > +Lud_impl<T, Ref_impl_tag>::Lud_impl(Lud_impl const& lu)
 > +VSIP_THROW((std::bad_alloc))
 > +  : length_      (lu.length_),
 > +    ipiv_        (length_),
 > +    data_        (length_, length_),
 > +    cvsipl_data_ (data_.block().impl_data(), length_, length_),
 > +    cvsipl_lud_  (length_)
 > +{
 > +  data_ = lu.data_;
 > +  for (index_type i=0; i<length_; ++i)
 > +    ipiv_[i] = lu.ipiv_[i];
 > +}
 > +
 > +
 > +
 > +template <typename T>
 > +Lud_impl<T, Ref_impl_tag>::~Lud_impl()
 > +  VSIP_NOTHROW
 > +{
 > +}
 > +
 > +
 > +
 > +/// Form LU factorization of matrix A
 > +///
 > +/// Requires
 > +///   A to be a square matrix, either
 > +///
 > +/// FLOPS:
 > +///   real   : UPDATE
 > +///   complex: UPDATE
 > +
 > +template <typename T>
 > +template <typename Block>
 > +bool
 > +Lud_impl<T, Ref_impl_tag>::decompose(Matrix<T, Block> m)
 > +  VSIP_NOTHROW
 > +{
 > +  assert(m.size(0) == length_ && m.size(1) == length_);
 > +
 > +  assign_local(data_, m);
 > +
 > +  Ext_data<data_block_type> ext(data_.block());

[8] 'ext' isn't being used.

 > +
 > +  bool success = cvsipl_lud_.decompose(cvsipl_data_);
 > +
 > +
 > +  return success;
 > +}
 > +
 > +
 > +
 > +/// Solve Op(A) x = b (where A previously given to decompose)
 > +///
 > +/// Op(A) is
 > +///   A   if tr == mat_ntrans
 > +///   A^T if tr == mat_trans
 > +///   A'  if tr == mat_herm (valid for T complex only)
 > +///
 > +/// Requires
 > +///   B to be a (length, P) matrix
 > +///   X to be a (length, P) matrix
 > +///
 > +/// Effects:
 > +///   X contains solution to Op(A) X = B
 > +
 > +template <typename T>
 > +template <mat_op_type tr,
 > +	  typename    Block0,
 > +	  typename    Block1>
 > +bool
 > +Lud_impl<T, Ref_impl_tag>::impl_solve(
 > +  const_Matrix<T, Block0> b,
 > +  Matrix<T, Block1>       x)
 > +  VSIP_NOTHROW
 > +{
 > +  assert(b.size(0) == length_);
 > +  assert(b.size(0) == x.size(0) && b.size(1) == x.size(1));
 > +
 > +  vsip_mat_op trans;
 > +
 > +  Matrix<T, data_block_type> b_int(b.size(0), b.size(1));
 > +  assign_local(b_int, b);
 > +
 > +  if (tr == mat_ntrans)
 > +    trans = VSIP_MAT_NTRANS;
 > +  else if (tr == mat_trans)
 > +    trans = VSIP_MAT_TRANS;
 > +  else if (tr == mat_herm)
 > +  {
 > +    assert(Is_complex<T>::value);
 > +    trans = VSIP_MAT_HERM;
 > +  }
 > +
 > +  {
 > +    Ext_data<data_block_type> b_ext(b_int.block());
 > +
 > +    vsip::ref_impl::cvsipl::CVSIPL_Matrix<T>
 > +	      cvsipl_b_int(b_ext.data(), b.size(0),b.size(1));
 > +
 > +    cvsipl_lud_.solve(trans,cvsipl_b_int);
 > +
 > +  }
 > +  assign_local(x, b_int);
 > +
 > +  return true;
 > +}
 > +
 > +} // namespace vsip::impl
 > +
 > +} // namespace vsip
 > +
 > +
 > +#endif // VSIP_IMPL_LAPACK_SOLVER_LU_HPP

[9] Update guard name in comment.

 > Index: ref_impl/vsipl/cvsipl_support.hpp

[10] This file looks like it will have the core traits and function
definitions for using the C-VSIPL backend.  Similar to the ipp.hpp,
sal.hpp, and lapack.hpp files we use for those backends, I would
recommend calling it 'impl/vsip/cvsip/cvsip.hpp'

 > ===================================================================
 > --- ref_impl/vsipl/cvsipl_support.hpp	(revision 0)
 > +++ ref_impl/vsipl/cvsipl_support.hpp	(revision 0)

[11] All files should have the library header.

 > @@ -0,0 +1,195 @@
 > +#ifndef CVSIPL_SUPPORT_HPP
 > +#define CVSIPL_SUPPORT_HPP

[12] The guard should include the path VSIP_IMPL_CVSIP_CVSIP_SUPPORT_HPP

 > +
 > +extern "C" {
 > +#include <vsip.h>
 > +}
 > +#include <complex>
 > +
 > +namespace vsip
 > +{
 > +
 > +namespace ref_impl
 > +{

[13] The implementation namespace should always be 'impl', regardless of
whether the code is shared or optimization only.

 > +
 > +namespace cvsipl
 > +{
 > +

[14] Let's add a comment to describe what the class is doing:

// Traits class to define the C-VSIPL view type for a given
// value type T.

 > +  template <typename T>
 > +  struct CVSIPL_mview;

[15] To follow our class name convention, this should be: 'Cvsip_mview'.

 > +
 > +  template<> struct CVSIPL_mview<float>      { typedef vsip_mview_f 
  type; };
 > +  template<> struct CVSIPL_mview<double>     { typedef vsip_mview_d 
  type; };
 > +  template<> struct CVSIPL_mview<std::complex<float> >
 > +    { typedef vsip_cmview_f type; };
 > +  template<> struct CVSIPL_mview<std::complex<double> >
 > +    { typedef vsip_cmview_d type; };
 > +

[16] Add comment to this trait too

 > +  template <typename T>
 > +  struct CVSIPL_block;
 > +
 > +  template<> struct CVSIPL_block<float>      { typedef vsip_block_f 
  type; };
 > +  template<> struct CVSIPL_block<double>     { typedef vsip_block_d 
  type; };
 > +  template<> struct CVSIPL_block<std::complex<float> >
 > +    { typedef vsip_cblock_f type; };
 > +  template<> struct CVSIPL_block<std::complex<double> >
 > +    { typedef vsip_cblock_d type; };
 > +
 > +
 > +  template <typename T>
 > +  struct CVSIPL_Lud_object;
 > +
 > +  template <> struct CVSIPL_Lud_object<float>  { typedef vsip_lu_f 
type; };
 > +  template <> struct CVSIPL_Lud_object<double> { typedef vsip_lu_d 
type; };
 > +  template <> struct CVSIPL_Lud_object<std::complex<float> >
 > +    { typedef vsip_clu_f type; };
 > +  template <> struct CVSIPL_Lud_object<std::complex<double> >
 > +    { typedef vsip_clu_d type; };

[17] First, the 'Cvsip_mview', 'Cvsip_block', and 'Cvsip_lud_object'
classes above all look good.

However, they represent one approach to creating traits: one trait per
class.

Another approach is multiple traits per class.

Here such a class might look like:

	template <typename T>
	struct Cvsip_traits;

	template <>
	struct Cvsip_traits<float>
	{
	  typedef vsip_mview_f mview_type;
	  typedef vsip_block_f block_type;
           typedef vsip_lu_f    lu_solver_type;
	  ...
	};

The general tradeoffs are:
  - One trait per class gives you finer grain control, while multiple
    traits per class forces you to define all traits even if only one
    trait is unique.
  - One trait per class is more verbose to define.

In this particular usage, the first tradeoff doesn't by the one-trait
-per-class approach much because all the traits need to be uniquely
defined for each value type (i.e. C-VSIPL doesn't share the same types
between float and double data structures).

The approach you've taken is fine, but since there will be more traits
to add, I would consider changing over to a multiple-traits per class
approach.

 > +
 > +
 > +#define CVSIPL_BLOCKBIND(BT, T, ST, VF) \
 > +inline BT *vsip_blockbind(T *data, vsip_length N, vsip_memory_hint 
hint) \
 > +{ \
 > +  return VF((ST*)data, N, hint); \
 > +}

[18] I would remove the 'vsip_' prefix for these function names.  You
don't have to worry about name conflicts since the functions are
already part of the vsip::impl::cvsip namespace.  It just makes using
them more verbose than necessary.

If you want to maintain verbosity, you can use the 'vsip::impl' namespace
but not the 'vsip::impl::csvip' namespace.  Then refer to them as

	cvsip::blockbind(...)

[19] Macro names in the library need to start with VSIP_IMPL_ to avoid
conflicts with user code.  I.e. this should be VSIP_IMPL_..._BLOCKBIND

...

 > +
 > +CVSIPL_LUSOL(vsip_lu_f,  vsip_mview_f,  vsip_lusol_f)
 > +CVSIPL_LUSOL(vsip_lu_d,  vsip_mview_d,  vsip_lusol_d)
 > +CVSIPL_LUSOL(vsip_clu_f, vsip_cmview_f, vsip_clusol_f)
 > +CVSIPL_LUSOL(vsip_clu_d, vsip_cmview_d, vsip_clusol_d)

[20] If you're done with these macros, it is a good idea to undefine them.

#undef VISP_IMPL_...LUSOL
etc.

 > +
 > +}  // namespace cvsipl
 > +
 > +}  // namespace ref_impl
 > +
 > +}  // namespace vsip
 > +
 > +#endif // CVSIPL_SUPPORT_HPP
 > Index: ref_impl/vsipl/cvsipl_lu.hpp
 > ===================================================================
 > --- ref_impl/vsipl/cvsipl_lu.hpp	(revision 0)
 > +++ ref_impl/vsipl/cvsipl_lu.hpp	(revision 0)
 > @@ -0,0 +1,72 @@
 > +#ifndef CVSIPL_LU_HPP
 > +#define CVSIPL_LU_HPP
 > +
 > +#include <vsip/ref_impl/vsipl/cvsipl_support.hpp>
 > +#include <vsip/ref_impl/vsipl/cvsipl_matrix.hpp>
 > +
 > +namespace vsip
 > +{
 > +
 > +namespace ref_impl
 > +{
 > +
 > +namespace cvsipl
 > +{
 > +
 > +template <typename T>
 > +class CVSIPL_Lud;
 > +
 > +template <typename T>
 > +class CVSIPL_Lud

[21] This should be 'Non_copyable'.  If a copy was made, 
vsip_lud_destroy(lu_)
would get called twice.

 > +{
 > +  typedef typename CVSIPL_Lud_object<T>::type     lud_object_type;
 > +
 > +  public:
 > +    CVSIPL_Lud(int n);
 > +    ~CVSIPL_Lud();
 > +
 > +    int decompose(CVSIPL_Matrix<T> &a);
 > +    int solve(vsip_mat_op op, CVSIPL_Matrix<T> &xb);
 > +
 > +  private:
 > +    lud_object_type       *lu_;
 > +};
 > +
 > +template <typename T>
 > +CVSIPL_Lud<T>::CVSIPL_Lud(int n)
 > +{
 > +  vsip_lud_create(n, &lu_);
 > +}
 > +
 > +template <typename T>
 > +CVSIPL_Lud<T>::~CVSIPL_Lud()
 > +{
 > +  vsip_lud_destroy(lu_);
 > +}
 > +
 > +template <typename T>
 > +int CVSIPL_Lud<T>::decompose(CVSIPL_Matrix<T> &a)
 > +{
 > +  a.admit();

[22] Here's a case where you want to admit with update true:
This should be:

	a.admit(true);

(Assuming you add an update flag to Cvsip_matrix, as suggested below).

 > +  int ret = vsip_lud(lu_, a.get_view());
 > +  a.release();

[23] If vsip_lud did not modify 'a', this would also be a case where the
update flag should also be true.  Since you don't know what the user
will do with 'a' next, it would be bad form to scramble the values.

But, vsip_lud is allowed to modify 'a' and then later uses those
values while solving.  This make me doubt whether it is correct to
immediately release 'a' at this point.

Can you check the C-VSIPL spec on this?

 > +  return ret;
 > +}
 > +
 > +template <typename T>
 > +int CVSIPL_Lud<T>::solve(vsip_mat_op op, CVSIPL_Matrix<T> &xb)
 > +{

[24] Here update should be true for admit and release.

 > +  xb.admit();
 > +  int ret = vsip_lusol(lu_, op, xb.get_view());
 > +  xb.release();
 > +  return ret;
 > +}
 > +
 > +
 > +} // namespace cvsipl
 > +
 > +} // namespace ref_impl
 > +
 > +} // namespace vsip
 > +
 > +#endif // CVSIPL_LU_HPP
 > Index: ref_impl/vsipl/cvsipl_matrix.hpp
 > ===================================================================
 > --- ref_impl/vsipl/cvsipl_matrix.hpp	(revision 0)
 > +++ ref_impl/vsipl/cvsipl_matrix.hpp	(revision 0)
 > @@ -0,0 +1,81 @@
 > +#ifndef CVSIPL_MATRIX_HPP
 > +#define CVSIPL_MATRIX_HPP
 > +
 > +#include <vsip/ref_impl/vsipl/cvsipl_support.hpp>
 > +
 > +namespace vsip
 > +{
 > +
 > +namespace ref_impl
 > +{
 > +
 > +namespace cvsipl
 > +{
 > +
 > +template <typename T>
 > +class CVSIPL_Matrix;
 > +
 > +template <typename T>
 > +class CVSIPL_Matrix

[25] Should be Non_copyable.

 > +{
 > +  typedef typename CVSIPL_mview<T>::type       mview_type;
 > +  typedef typename CVSIPL_block<T>::type       block_type;
 > +
 > +  public:
 > +    CVSIPL_Matrix<T>(T *block, int m, int n);
 > +    CVSIPL_Matrix<T>(int m, int n);
 > +    ~CVSIPL_Matrix<T>();
 > +
 > +    mview_type *get_view() { return mview_; }
 > +    void admit() { vsip_blockadmit(mblock_, false); }
 > +    void release() { vsip_blockrelease(mblock_,false); }

[26] Always setting the update flags to false is most definitely
wrong.  If you don't care about what values you pass to C-VSIPL,
and you don't care about what values you get back, why bother
with the computation?

Always setting the update flags to true would be correct, but it would
cause unnecessary data copies in some situations.

You should pass update as an argument, with a default value of true.

 > +
 > +  private:
 > +    mview_type         *mview_;
 > +    block_type         *mblock_;
 > +    bool               local_data_;
 > +
 > +
 > +};
 > +
 > +
 > +template <typename T>
 > +CVSIPL_Matrix<T>::CVSIPL_Matrix(T *block, int m, int n)
 > +{
 > +  // block is allocated, just bind to it.
 > +  mblock_ = vsip_blockbind(block, m*n, VSIP_MEM_NONE);
 > +
 > +  // block must be dense
 > +  mview_ = vsip_mbind(mblock_, 0, 1, n, n, m);
 > +
 > +  local_data_ = false;
 > +}
 > +
 > +template <typename T>
 > +CVSIPL_Matrix<T>::CVSIPL_Matrix(int m, int n)
 > +{

[27] How/where is dimension-ordering handled?  The VSIPL++ LU solver object
creates a Cvsip_matrix for column-major VSIPL++ matrices.  Is Cvsip_matrix
implicitly column-major?

It would be better to pass dimensionality to Cvsip_matrix explicitly,
probably as a template parameter.

 > +  // create block
 > +  vsip_blockcreate(m*n, VSIP_MEM_NONE, &mblock_);
 > +
 > +  // block must be dense
 > +  mview_ = vsip_mbind(mblock_, 0, 1, n, n, m);
 > +
 > +  local_data_ = true;
 > +}
 > +
 > +template <typename T>
 > +CVSIPL_Matrix<T>::~CVSIPL_Matrix()
 > +{
 > +  // destroy everything!
 > +  if(local_data_) vsip_blockdestroy(mblock_);
 > +
 > +  vsip_mdestroy(mview_);
 > +}
 > +
 > +} // namespace cvsipl
 > +
 > +} // namespace ref_impl
 > +
 > +} // namespace vsip
 > +
 > +#endif // CVSIPL_MATRIX_HPP
 > Index: impl/solver-lu.hpp
 > ===================================================================
 > --- impl/solver-lu.hpp	(revision 151073)
 > +++ impl/solver-lu.hpp	(working copy)
 > @@ -28,6 +28,9 @@
 >  #ifdef VSIP_IMPL_HAVE_LAPACK
 >  #  include <vsip/impl/lapack/solver_lu.hpp>
 >  #endif

[28] We need to distinguish between the presence of C-VSIPL backends and
building the library in reference mode.  It's possible to use the
C-VSIPL backend with the optimized library.

[29] This guard should be:

#ifdef VSIP_IMPL_HAVE_CVSIP

 > +#ifdef VSIP_IMPL_HAVE_REF
 > +#  include <vsip/ref_impl/vsipl/solver_lu.hpp>
 > +#endif

 >
 >
 >
 > @@ -62,6 +65,10 @@
 >  template <typename T>
 >  struct Choose_lud_impl
 >  {

[30] This guard should be:

#ifdef VSIP_IMPL_IS_REF_IMPL

 > +#ifdef VSIP_IMPL_HAVE_REF
 > +  typedef Ref_impl_tag use_type;
 > +
 > +#else
 >    typedef typename Choose_solver_impl<
 >      Is_lud_impl_avail,
 >      T,
 > @@ -71,6 +78,8 @@
 >      Type_equal<type, None_type>::value,
 >      As_type<Error_no_solver_for_this_type>,
 >      As_type<type> >::type use_type;
 > +#endif
 > +
 >  };
 >
 >  } // namespace impl

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705