[vsipl++] [patch] Scalable SAR benchmark

Mon Oct 30 20:00:34 UTC 2006

Don McCoy wrote:
 > The attached patch adds a new application -- a portion of the third
 > Scalable Synthetic Compact Application (SSCA) Benchmark, SAR Sensor
 > Processing, Knowledge Formation, and File IO.  More information may be
 > found on the HPCS website:
 >
 >    http://www.highproductivity.org/SSCABmks.htm
 >
 > This implements Kernel 1, which produces images from raw radar data.
 > Note that this code follows the Matlab example code obtained through the
 > above site and is not optimized (beyond simple things such as creating
 > all FFT objects and views at initialization time when the dimensions are
 > known at compile time).
 >
 > At present, the makefile depends on having an installed version of
 > VSIPL++ (in the default location, /usr/local).  The install path should
 > be updated along with the package suffix in order to run on different
 > platforms.  Build and run the application using 'make; make check'.  For
 > verification, the computed image is compared against the
 > Matlab-generated image (which is of a regularly spaced grid of corner
 > reflectors).
 >
 > All testing (so far) was performed using the serial-builtin-32
 > configuration, with version 1.2 of VSIPL++.

Don,

This looks good.  I have several comments below, plus some general
comments.

Since this code isn't going into the core library, and since this is
going to be in a flux as we optimize, let's do the following:

  - address the easy comments:
     - Definitely 1, 5, 8
     - Perhaps 4, 6, 7
     - Later: 2, 3, 9.

  - check in code as a baseline,

  - address the remaining comments as you perform optimizations.

Does that sound OK?

Also, I haven't looked at this in detail from a performance
perspective yet.  I suspect a big optimization will be to change from
processing an entire matrix at time to processing a row or column at a
time.  Definitely for the fast time filter, bandwidth expansion, and
the application of fs_ref.  I'm not sure about the interpolation part
though.

				-- Jules

General comments:

  - Avoid returning views by value (both for builtin operations, like
    Fftm, and for user defined functions, like fft_shift, load_view,
    and ...).

    Do you think the by-value notation is easier to read?  If so, let
    me know.  I have a partially finished patch for return-block
    optimization that can make the by-value forms as efficient as
    by-reference.  However, this would be for builtin operations only,
    not user defined ones.

  - Continue to move intermediate views out of Kernel1 member functions
    and replace them with Kernel1 member variables.

  - To avoid confusion, I think it would be better to have Kernel1
    member functions work directly on member variables, instead of
    passing them as arguments.

    For example, digital_spotlighting should just use s_filt_ instead
    of having it passed in as a parameter.

 > ------------------------------------------------------------------------
 >
 > Index: apps/ssar/load_save.hpp
 > ===================================================================
 > --- apps/ssar/load_save.hpp	(revision 0)
 > +++ apps/ssar/load_save.hpp	(revision 0)
 > @@ -0,0 +1,114 @@
 > +/* Copyright (c) 2006 by CodeSourcery.  All rights reserved. */
 > +
 > +/** @file    load_save.hpp
 > +    @author  Don McCoy
 > +    @date    2006-10-26
 > +    @brief   Extensions to allow type double to be used as the view
 > +             data type while using float as the storage type on disk.
 > +*/
 > +
 > +#ifndef LOAD_SAVE_HPP
 > +#define LOAD_SAVE_HPP
 > +
 > +#include <vsip_csl/load_view.hpp>
 > +#include <vsip_csl/save_view.hpp>
 > +
 > +using namespace vsip_csl;

[1] In general, putting 'using namespace' decls in a header is
considered bad form.  Its effect depends on the current state of the
vsip_csl namespace, which can cause subtle bugs.

This is definitely forbidden in library headers.

We should avoid it in the SSAR to set a good example (and potentially
to save ourselves debugging time later).

 > +
 > +template <typename Block>
 > +void
 > +save_view(
 > +  char* filename,
 > +  vsip::const_Matrix<complex<double>, Block> view)
 > +{
 > +  vsip::Matrix<complex<float> > sp_view(view.size(0), view.size(1));
 > +
 > +  for (index_type i = 0; i < view.size(0); ++i)
 > +    for (index_type j = 0; j < view.size(1); ++j)
 > +      sp_view.put(i, j, static_cast<complex<float> >(view.get(i, j)));
 > +
 > +  Save_view<2, complex<float> >::save(filename, sp_view);
 > +}

[2] For saving intermediate views for debugging, this is fine.  It
would be more general purpose to pass the disk value type as a
template parameter.  Then you could (almost) replace these three
functions with a single function:

	template <typename T,
	          typename ViewT>
	void
	save_view_as(
	  char* filename,
	  ViewT view)
	{
	  typedef
	    typename View_of_dim<ViewT::dim, T, Dense<ViewT::dim, T> >::type
	    view_type;

	  view_type disk_view = impl::clone_view<view_type>(view);

	  disk_view = view_cast<T>(view);

	  Save_view<ViewT::dim, T>::save(filename, disk_view);
	}

I say "almost" because we don't have have view_cast in a convenient
place yet (it is currently in apps/sarsim and called cast_view).  I'll
fix that!

However, eventually we need to set things up so that no memory
allocations are necessary during "steady state" operation.  All memory
allocations that are necessary should done when constructing a Kernel1
object.

A simple way to do this is to pre-allocate views for staging data
for load/store that have the right precision for the file on disk.
In the case where we're processing double, but the file on disk is
float, this does exactly what we want, with no overhead.  However,
if the file on disk is float, and we're processing float, this
creates unnecessary overhead for the storage and unnecessary copy.

	template <typename ViewT,
	          typename IoT,
	          typename ViewValueT = typename ViewT::value_type>
	class Save_view_as
	{
	  Save_view_as(Domain<ViewT::dim> const& dom)
	    ...

	  void operator()(
	    char* filename,
	    ViewT view)
	  {
	    io_view_ = view;
	    save_file(filename, io_view_);
	  }

	  View_of_dim<ViewT::dim, IoT> io_view_;
	};

	// specialization for case where IoT and ViewValueT are the
   	// same type and no intermediate view is required.
	template <typename ViewT,
	          typename IoT,
	          typename ViewValueT = typename ViewT::value_type>
	class Save_view_as
	{
	  Save_view_as(Domain<ViewT::dim> const&)
	    ...

	  void operator()(
	    char* filename,
	    ViewT view)
	  {
	    save_file(filename, view);
	  }
	}

 > +vsip::Matrix<complex<double> >
 > +load_view(
 > +  char* filename,
 > +  vsip::Domain<2> const& dom)
 > +{
 > +  vsip::Matrix<complex<float> > sp_view(dom[0].size(), dom[1].size());
 > +  sp_view = Load_view<2, complex<float> >(filename, dom).view();
 > +
 > +  vsip::Matrix<complex<double> > view(dom[0].size(), dom[1].size());
 > +
 > +  for (index_type i = 0; i < dom[0].size(); ++i)
 > +    for (index_type j = 0; j < dom[1].size(); ++j)
 > +      view.put(i, j, static_cast<complex<double> >(sp_view.get(i, j)));
 > +
 > +  return view;
 > +}

[3] Similar comments as for save_view.

Also, it would be more efficient to return the result by-reference
in a view passed as an argument.

	template <typename ViewT>
	void
	load_view(
	  char* filename,
	  ViewT view)

Also, vsip_csl/load_view.hpp now has a load_view function with this
signature.  This wasn't there for the 1.2 release.

Finally, in its current form as a non-template function, this should
be in a .cpp files.  If we use load_save.hpp in multiple compilation
units, we would get object defined multiple times errors.  Changing to
template function "avoids" this (however, that by itself should not
be a sufficient reason to convert to a template function).

 > Index: apps/ssar/diffview.cpp
 > ===================================================================

 > +    data_format_type format = COMPLEX_VIEW;
 > +    if (argc == 6)
 > +    {

[4] For orthogonality, why not also accept "-c" to set format = 
COMPLEX_VIEW?

 > +      if (0 == strncmp("-r", argv[1], 2))
 > +        format = REAL_VIEW;
 > +      else if (0 == strncmp("-n", argv[1], 2))
 > +        format = INTEGER_VIEW;
 > +      argv++;
 > +    }
 > +
 > +    compare(format, argv[1], argv[2], atoi(argv[3]), atoi(argv[4]));
 > +  }
 > +
 > +  return 0;
 > +}
 > +
 > +
 > +void
 > +compare(data_format_type format,
 > +  char* infile, char* ref, length_type rows, length_type cols)
 > +{
 > +  if (format == REAL_VIEW)
 > +  {
 > +    typedef Matrix<scalar_f> matrix_type;
 > +    Domain<2> dom(rows, cols);
 > +
 > +    matrix_type in(rows, cols);
 > +    in = Load_view<2, scalar_f>(infile, dom).view();
 > +
 > +    matrix_type refv(rows, cols);
 > +    refv = Load_view<2, scalar_f>(ref, dom).view();
 > +
 > +    cout << error_db(in, refv) << endl;
 > +  }
 > +  else if (format == INTEGER_VIEW)
 > +  {
 > +    typedef Matrix<scalar_i> matrix_type;
 > +    Domain<2> dom(rows, cols);
 > +
 > +    matrix_type in(rows, cols);
 > +    in = Load_view<2, scalar_i>(infile, dom).view();
 > +
 > +    matrix_type refv(rows, cols);
 > +    refv = Load_view<2, scalar_i>(ref, dom).view();
 > +
 > +    cout << error_db(in, refv) << endl;
 > +  }
 > +  else          // Using complex views.
 > +  {
 > +    typedef Matrix<cscalar_f> matrix_type;
 > +    Domain<2> dom(rows, cols);
 > +
 > +    matrix_type in(rows, cols);
 > +    in = Load_view<2, cscalar_f>(infile, dom).view();
 > +
 > +    matrix_type refv(rows, cols);
 > +    refv = Load_view<2, cscalar_f>(ref, dom).view();
 > +
 > +    cout << error_db(in, refv) << endl;
 > +  }
 > +}

You can cut down on duplicated code by making compare() a template
function:

    template <typename T>
    void
    compare(char* infile, char* ref, length_type rows, length_type cols)
    {
      typedef Matrix<T> matrix_type;
      Domain<2> dom(rows, cols);

      matrix_type in(rows, cols);
      in = Load_view<2, T>(infile, dom).view();

      matrix_type refv(rows, cols);
      refv = Load_view<2, T>(ref, dom).view();

      cout << error_db(in, refv) << endl;
    }

Then from main you can call it like so:

   if (format == REAL_VIEW)
     compare<float>(argv[1], argv[2], atoi(argv[3]), atoi(argv[4]));
   else if (format == INTEGER_VIEW)
     compare<int>(argv[1], argv[2], atoi(argv[3]), atoi(argv[4]));
   else
     compare<complex<float> >(argv[1], argv[2], atoi(argv[3]), 
atoi(argv[4]));

 > +

 > Index: apps/ssar/kernel1.hpp
 > ===================================================================

 > +// Files required to be in the data directory:
 > +#define SAR_DIMENSIONS                          "dims.txt"
 > +#define RAW_SAR_DATA                            "sar.view"
 > +#define FAST_TIME_FILTER                        "ftfilt.view"
 > +#define SLOW_TIME_WAVENUMBER                    "k.view"
 > +#define SLOW_TIME_COMPRESSED_APERTURE_POSITION  "uc.view"
 > +#define SLOW_TIME_APERTURE_POSITION             "u.view"
 > +#define SLOW_TIME_SPATIAL_FREQUENCY             "ku.view"

I agree with Stefan's comments here.  In C++ it is good practice to
use const variables instead of macros in cases like this.  Eventually
these could be 'char*' variables inside of main, so they can be set
from the command line options.

 > +
 > +
 > +class Kernel1

Now is probably a good time to make Kernel1 a template class, with 'T'
as a template parameter.  That will make it easier to experiment with
converting the precision back to float.

 > +{
 > +public:
 > +  typedef double T;
 > +  typedef Matrix<complex<T> > complex_matrix_type;
 > +  typedef Vector<complex<T> > complex_vector_type;
 > +  typedef Matrix<T> real_matrix_type;
 > +  typedef Vector<T> real_vector_type;
 > +  typedef Fftm<complex<T>, complex<T>, col> col_fftm_type;
 > +  typedef Fftm<complex<T>, complex<T>, row> row_fftm_type;
 > +  typedef Fftm<complex<T>, complex<T>, row, fft_inv> ifftm_type;
 > +
 > +  Kernel1(length_type scale, length_type n, length_type mc, 
length_type m);
 > +  ~Kernel1() {}
 > +
 > +  void process_image();
 > +
 > +private:
 > +  void
 > +  fast_time_filtering(complex_matrix_type s_raw,
 > +    complex_vector_type fast_time_filter);

[5] Does this function exist?

 > +
 > +  void
 > +  digital_spotlighting(complex_matrix_type s_filt,
 > +    real_vector_type k, real_vector_type uc, real_vector_type u );
 > +
 > +  real_matrix_type
 > +  interpolation(complex_matrix_type fs_spotlit, real_vector_type k,
 > +    real_vector_type ku0);

[6] return result by-reference in parameter.

 > +
 > +  complex_matrix_type
 > +  fft_shift(complex_matrix_type in);
 > +
 > +  real_vector_type
 > +  fft_shift(real_vector_type in);

[7] First, fft_shift would be useful for other matlab conversion
projects.  Instead of being a member of Kernel1, they would be more
useful as free functions.  Why don't you put them into vsip_csl in a
matlab_utils.hpp file.

Second, it would be better to define these as template functions for
several reasons:

  a) It is not guarenteed that they will always be called with a
     real_vector_type or a complex_matrix_type.  For example, because
     it is implemented defined, there is no guarentee what block type a
     by-value Fftm object will return.  If it returned a Matrix<T,
     Fast_block<...> > then initializing fft_shift's arguments would
     require a temporary and a copy to initialize it.

     Similarly, once you start optimizing this to process data a
     row at time, you'll want to apply fft_shift to subviews, which
     also have implementation defined block type.

  b) There's no reason to limit fft_shift to just complex matrices
     and real vectors, esp. if we want to reuse it in the future.

Finally, it would be more efficient to return the result by-reference
into an argument.  I.e.

	fft_shift(in, out);

instead of

	out = fft_shift(in);

Because returning the result by-value requires a temporary and extra
copy.

Since you currently use fft_shift for out-of-place shifts, I would
recommend an interface like that of signal-processing objects such as
Fft:

	template <typename T,
		  typename Block1,
		  typename Block2>
	Vector<T, Block2>
	fft_shift(
	  const_Vector<T, Block1> in,
	  Vector<T, Block2>       out)
	{
	  ...
	}

Where the return value is the 'out' parameter for convenience.

Later, you might find an in-place version useful too.

 > +
 > +private:
 > +  length_type scale_;
 > +  length_type n_;
 > +  length_type mc_;
 > +  length_type m_;
 > +  length_type nx_;
 > +  length_type interp_sidelobes_;
 > +  T range_factor_;
 > +  T aspect_ratio_;
 > +  T L_;
 > +  T Y0_;
 > +  T X0_;
 > +  T Xc_;
 > +
 > +  complex_matrix_type s_raw_;
 > +  complex_vector_type fast_time_filter_;
 > +
 > +  real_vector_type slow_time_wavenumber_;
 > +  real_vector_type slow_time_compressed_aperture_position_;
 > +  real_vector_type slow_time_aperture_position_;
 > +  real_vector_type slow_time_spatial_frequency_;
 > +  complex_matrix_type s_filt_;
 > +  complex_matrix_type fs_spotlit_;
 > +  real_vector_type ks_;
 > +  real_vector_type ucs_;
 > +  complex_matrix_type s_compr_;
 > +  complex_matrix_type fs_;
 > +  complex_matrix_type fs_padded_;
 > +  complex_matrix_type s_padded_;
 > +  real_vector_type us_;
 > +  complex_matrix_type s_decompr_;
 > +  real_matrix_type ku_;
 > +  real_matrix_type k1_;
 > +  real_matrix_type kx0_;
 > +  real_matrix_type kx_;
 > +  complex_matrix_type fs_ref_;
 > +  complex_matrix_type fsm_;
 > +  Vector<index_type> icKX_;
 > +
 > +  col_fftm_type col_fftm;
 > +  row_fftm_type row_fftm;
 > +  row_fftm_type row_fftm2;
 > +  ifftm_type ifftm;

[8] don't forget '_' suffix for these member variables.

 > +};

 > +Kernel1::Kernel1(length_type scale, length_type n, length_type mc,
 > +  length_type m)

 > +void
 > +Kernel1::process_image()

 > +void
 > +Kernel1::digital_spotlighting(complex_matrix_type s_filt,
 > +  real_vector_type k, real_vector_type uc, real_vector_type u )

 > +Kernel1::real_matrix_type
 > +Kernel1::interpolation(complex_matrix_type fs_spotlit, 
real_vector_type k,
 > +  real_vector_type ku0)

At the moment, these are all non-inline, non-template functions.
These shouldn't be in a header file, they might end up in multiple
object files, leading to link errors.

Making Kernel1 a tempalate class circumvents this problem.

 > Index: apps/ssar/viewtoraw.cpp
 > ===================================================================

 > +void
 > +convert_to_greyscale(data_format_type format,
 > +  char* infile, char* outfile, length_type rows, length_type cols)
 > +{
 > +  typedef Matrix<scalar_f> matrix_type;
 > +  Domain<2> dom(rows, cols);
 > +
 > +  matrix_type in(rows, cols);
 > +
 > +  if (format == COMPLEX_MAG)
 > +    in = mag(Load_view<2, cscalar_f>(infile, dom).view());
 > +  else if (format == COMPLEX_REAL)
 > +    in = real(Load_view<2, cscalar_f>(infile, dom).view());
 > +  else if (format == COMPLEX_IMAG)
 > +    in = imag(Load_view<2, cscalar_f>(infile, dom).view());
 > +  else if (format == SCALAR_FLOAT)
 > +    in = Load_view<2, scalar_f>(infile, dom).view();
 > +  else if (format == SCALAR_INTEGER)
 > +    in = Load_view<2, scalar_i>(infile, dom).view();
 > +  else
 > +    cerr << "Error: format type " << format << " not supported." << 
endl;
 > +
 > +
 > +  Index<2> idx;
 > +  scalar_f minv = minval(in, idx);
 > +  scalar_f maxv = maxval(in, idx);
 > +  scalar_f scale = (maxv - minv ? maxv - minv : 1.f);
 > +
 > +  Matrix<scalar_f> outf(rows, cols);
 > +  outf = (in - minv) * 255.f / scale;
 > +
 > +  Matrix<char> out(rows, cols);
 > +  for (index_type i = 0; i < rows; ++i)
 > +    for (index_type j = 0; j < cols; ++j)
 > +      out.put(i, j, static_cast<char>(outf.get(i, j)));

[9] If we had view_cast in vsip or vsip_csl (currently it is part of
sarsim), we could write a single line:

	out = view_cast<unsigned char>((in - minv) * 255.f / scale);

I'll move that somethime this week.

 > +
 > +  save_view(outfile, out);
 > +
 > +  // The min and max values are displayed to reveal the scale
 > +  cout << infile << " [" << rows << " x " << cols << "] : "
 > +       << "min " << minv << ", max " << maxv << endl;
 > +}
 > +
 > Index: apps/ssar/ssar.cpp
 > ===================================================================

 > +void
 > +process_ssar_options(int argc, char** argv, ssar_options& options)
 > +{
 > +  if (argc != 2)
 > +  {
 > +    cerr << "Usage: " << argv[0] << " <data dir>" << endl;
 > +    exit(-1);
 > +  }
 > +
 > +  if (chdir(argv[1]) < 0)
 > +  {
 > +    perror(argv[1]);
 > +    exit(-1);
 > +  }

[10] I'm probably just being cranky, but I think it would be better to
manually prepend the directory path to the filename, or to pass the
filenames in as command line arguments.  Those would make it easier to
put the output files in another directory (which makes it slightly
easier to clean up) and give us flexibility in the future.

But, if it works, it works!  I don't see a compelling reason to change
this.

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705