[vsipl++] [patch] double support for SAL LU solver

Sun May 7 01:35:55 UTC 2006

Don McCoy wrote:
> This was tested against C-SAL but without the portions of the tests 
> excercising the transpose options (when using the "old" functions).

Don,

This looks good.  Can you:

  - Move the reciprocal call from sal_matfbs to sal_matlud.  That way
    if multiple sal_matfbs calls are made (either because B/X have
    multiple columns, or because the LU object is used multiple times),
    vrecip will only get called once.

  - Create a typedef for the block_type of recip_.  That way the Ext_data
    for recip_ is guaranteed to have the correct block type if recip_
    ever changes.

  - a few more comments sprinkled below.

If these comments make sense, once you address them this looks good to 
check in.

How do we test this?  By manually disabling the the mat_trans and 
mat_herm cases?

				thanks,
				-- Jules

> + 
> + 
> + 
> + // "Legacy" SAL functions - The single-precision versions are listed
> + // in the Appendix of the SAL Reference manual.  Although the double-
> + // precision ones are still part of the normal API, we refer to both 
> + // sets of functions as legacy functions just for ease of naming.
> + 
> + // Legacy SAL LUD decomposition functions
> + #define VSIP_IMPL_SAL_LUD_DEC( T, SAL_T, SALFCN ) \
> + inline void                  \
> + sal_matlud(                  \
> +   T *c,                      \
> +   int *d, int n)             \
> + {                            \
> +   SALFCN((SAL_T*) c, d, n);  \

If you pass recip_ in, you can perform the reciprocal one time here.

> + }

> --- 285,308 ----
>   
>   protected:
>     template <mat_op_type tr,
> !             typename    Block0,
> !             typename    Block1>
>     bool impl_solve(const_Matrix<T, Block0>, Matrix<T, Block1>)
>       VSIP_NOTHROW;
>   
> +   length_type max_decompose_size();
> + 
>     // Member data.
>   private:
>     typedef std::vector<int, Aligned_allocator<int> > vector_type;
>   
> !   length_type  length_;                 // Order of A.
> !   vector_type  ipiv_;                   // Pivot table for Q. This gets
>                                           // generated from the decompose and
> !                                         // gets used in the solve
> !   Vector<T> recip_;                     // Vector of reciprocals used
> !                                         // with legacy solvers

Use a typedef for recip_'s block type.

> !   Matrix<T, data_block_type> data_;     // Factorized matrix (A)
>   };
>   
>   
> *************** Lud_impl<T,Mercury_sal_tag>::Lud_impl(
> *** 191,196 ****
> --- 320,326 ----
>   VSIP_THROW((std::bad_alloc))
>     : length_ (length),
>       ipiv_   (length_),
> +     recip_  (length_),
>       data_   (length_, length_)
>   {
>     assert(length_ > 0);
> *************** Lud_impl<T,Mercury_sal_tag>::Lud_impl(Lu
> *** 203,213 ****
> --- 333,347 ----
>   VSIP_THROW((std::bad_alloc))
>     : length_ (lu.length_),
>       ipiv_   (length_),
> +     recip_  (length_),
>       data_   (length_, length_)
>   {
>     data_ = lu.data_;
>     for (index_type i=0; i<length_; ++i)
> +   {
>       ipiv_[i] = lu.ipiv_[i];
> +     recip_.put(i, lu.recip_.get(i));
> +   }

Since recip_ is a vector, you could just say:

	recip_ = lu.recip_;

>   }
>   
>   

>     else 
> --- 464,498 ----
>       }
>       Ext_data<data_block_type>   a_ext((tr == mat_trans)?
>                                           data_int.block():data_.block());
> +     Ext_data<Dense<1, T> >  r_ext(recip_.block());
>   
>       // sal_mat_lud_sol only takes vectors, so, we have to do this for each
>       // column in the matrix
>       ptr_type b_ptr = b_ext.data();
>       ptr_type x_ptr = x_ext.data();
> !     for(index_type i=0;i<b.size(1);i++) 
> !     {
> ! #if VSIP_IMPL_SAL_USE_MAT_LUD
>         sal_mat_lud_sol(a_ext.data(), a_ext.stride(0),
>                         &ipiv_[0],
> !                       storage_type::offset(b_ptr,i*length_),
> !                       storage_type::offset(x_ptr,i*length_),
> !                       length_,trans);
> ! #else
> !       if (x_ext.stride(0) != 1)
> !         VSIP_IMPL_THROW(unimplemented(
> !           "Lud_impl<>::impl_solve - data must be dense (have unit stride)"));

This should either be an assertion, or removed.  x_ext refers to x_int, 
which is declared by the LU object to be column major.  Since we know 
the block is column major, the condition x_ext.stride(0) != 1 would 
indicate a bug in Ext_data (i.e. something impossible happened -> assert 
failure), as opposed to unsupported behavior (user tried to do something 
unsupported -> throw exception).

> !       if (tr == mat_ntrans)
> !         sal_matfbs(a_ext.data(), r_ext.data(), &ipiv_[0],
> !                    storage_type::offset(b_ptr, i*length_),
> !                    storage_type::offset(x_ptr, i*length_),
> !                    length_);
> !       else
> !         VSIP_IMPL_THROW(unimplemented(
> !           "Lud_impl<mat_op_type!=mat_ntrans>::impl_solve - unimplemented"));

Good!  Well, actually bad (SAL doesn't support mat_trans), but throwing 
an exception is the right thing to do.

> ! #endif
>       }
>   
>       assign_local(x, x_int);
>     }
>     else 

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705