[vsipl++] [patch] pwarp

Thu Dec 13 16:44:24 UTC 2007

Jules Bergmann wrote:
> I need to put more thought into how handle
> some of the 8-bit math.  Also, the location of the SPU kernel (in vsip)
> makes it unsightly to depend on code from vsip_csl.

Right. We should think a bit about how to decouple that. In particular,
how to not make the task_manager.hpp file the central repository for all
tasks / task types.

> Ok to apply?

Yes. I have some suggestions below that you may address.

> Index: src/vsip/opt/cbe/pwarp_params.h
> ===================================================================
> --- src/vsip/opt/cbe/pwarp_params.h	(revision 0)
> +++ src/vsip/opt/cbe/pwarp_params.h	(revision 0)
> @@ -0,0 +1,43 @@
> +/* Copyright (c) 2007 by CodeSourcery.  All rights reserved.
> +
> +   This file is available for license from CodeSourcery, Inc. under the terms
> +   of a commercial license and under the GPL.  It is not part of the VSIPL++
> +   reference implementation and is not available under the BSD license.
> +*/
> +/** @file    vsip/opt/cbe/pwarp_params.h
> +    @author  Jules Bergmann
> +    @date    2007-11-19
> +    @brief   VSIPL++ Library: Parameters for PWarp kernels.
> +*/
> +
> +#ifndef VSIP_OPT_CBE_PWARP_PARAMS_H
> +#define VSIP_OPT_CBE_PWARP_PARAMS_H
> +
> +/***********************************************************************
> +  Definitions
> +***********************************************************************/
> +
> +// Structures used in DMAs should be sized in multiples of 128-bits
> +
> +typedef struct
> +{
> +  float P[9];			// perspective warp matrix
> +  int   pad[3];

I'm still not sure whether there are in fact size restrictions for task
parameters. Or may be they changed between SDK 2.1 and SDK 3.0.

> +
> +  unsigned long long ea_in;	// input block EA
> +  unsigned long long ea_out;	// output block EA
> +
> +  unsigned int in_row_0;	// input origin row
> +  unsigned int in_col_0;	// input origin column
> +  unsigned int in_rows;		// input number of rows
> +  unsigned int in_cols;		// input number of cols
> +  unsigned int in_stride_0;	// input stride to next row
> +
> +  unsigned int out_row_0;	// output origin row
> +  unsigned int out_col_0;	// output origin column
> +  unsigned int out_rows;	// output number of rows
> +  unsigned int out_cols;	// output number of cols
> +  unsigned int out_stride_0;	// output stride to next row
> +} Pwarp_params;
> +
> +#endif // VSIP_OPT_CBE_FFT_PARAMS_H

> Index: src/vsip/opt/cbe/ppu/task_manager.hpp
> ===================================================================
> --- src/vsip/opt/cbe/ppu/task_manager.hpp	(revision 188740)
> +++ src/vsip/opt/cbe/ppu/task_manager.hpp	(working copy)
> @@ -43,6 +43,7 @@
>  struct Fastconv_tag;
>  struct Fastconvm_tag;
>  struct Vmmul_tag;
> +struct Pwarp_tag;

As noted above, we should start thinking about alternate ways to
register task types, to avoid artificial dependencies.

(I may look into boost.python for inspiration. There are tricks to
register types and type conversions that seem to avoid single
bottlenecks...)

>  
>  
>  namespace cbe
> @@ -150,5 +151,6 @@
>  DEFINE_TASK(6, Fastconvm_tag, void(std::complex<float>, std::complex<float>), fconvm_c)
>  DEFINE_TASK(7, Fastconvm_tag, void(split_float_type, split_float_type), fconvm_split_c)
>  DEFINE_TASK(8, Vmmul_tag, std::complex<float>(std::complex<float>, std::complex<float>), vmmul_c)
> +DEFINE_TASK(9, Pwarp_tag, void(unsigned char, unsigned char), pwarp_ub)
>  
>  #endif

> Index: tests/vsip_csl/pwarp.cpp
> ===================================================================

> +template <typename T,
> +	  typename BlockT>
> +void
> +setup_p(
> +  Matrix<T, BlockT> P,
> +  int               i)
> +{
> +  switch (i) {
> +  case 0:
> +    P        = T();
> +    P.diag() = T(1);
> +
> +  case 1:
> +    P(0,0) = T(0.999982);    P(0,1) = T(0.000427585); P(0,2) = T(-0.180836);
> +    P(1,0) = T(-0.00207906); P(1,1) = T(0.999923);    P(1,2) = T(0.745001);
> +    P(2,0) = T(1.01958e-07); P(2,1) = T(8.99655e-08); P(2,2) = T(1);
> +    break;
> +
> +  case 2:
> +    P(0,0) = 8.28282751190698e-01; 
> +    P(0,1) = 2.26355321374407e-02;
> +    P(0,2) = -1.10504985681804e+01;
> +
> +    P(1,0) = -2.42950546474237e-01;
> +    P(1,1) = 8.98035288576380e-01;
> +    P(1,2) = 1.05162748265872e+02;
> +
> +    P(2,0) = -1.38973743578922e-04;
> +    P(2,1) = -9.01955477542629e-05;
> +    P(2,2) = 1;
> +    break;
> +  }
> +}

Where do these numbers come from ? And what mean the cases ? Could you
add some comments so casual users (such as myself) are less puzzled ? :-)

Finally, could you make suitable modifications to the affected makefiles
such that 'make install' does the right thing ? I checked in a small
patch earlier this morning, but that didn't take into account that
vsip_csl/img/ has subdirectories, so it isn't quite complete.

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718