From don at codesourcery.com  Mon May  1 05:48:57 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 30 Apr 2006 23:48:57 -0600
Subject: [patch] New benchmark - vector division
Message-ID: <4455A149.20504@codesourcery.com>

Here is a new benchmark for testing element-wise vector division.  Also 
attached are two performance graphs comparing multiplication and 
division - one shows mega flops per second and the other latency, or the 
number of microseconds per operation.

The graph showing flops per second is somewhat misleading for two 
reasons: both divide and multiply for real numbers are each counted as a 
"flop" even though they take a different number of clock cycles to 
perform.  Second, complex-complex division takes more operations (11, 
two of which are real-real divisions) than complex-complex 
multiplication (6).  This gives them a comparable flop count, even 
though the division takes roughly twice as long.
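
For reference, the 6 and 11 counts can be reproduced from the textbook formulas.  The sketch below is only an illustration of how the operations add up; the Cplx type and the cmul/cdiv names are made up here and are not taken from the benchmark or from the library.

```cpp
#include <cassert>

// Hypothetical plain complex type, used only for counting operations.
struct Cplx { double re, im; };

// Complex multiply: 4 real multiplies + 2 real add/sub = 6 real operations.
inline Cplx cmul(Cplx a, Cplx b)
{
  Cplx r = { a.re * b.re - a.im * b.im,     // 2 mul, 1 sub
             a.re * b.im + a.im * b.re };   // 2 mul, 1 add
  return r;
}

// Complex divide: 6 real multiplies + 3 real add/sub + 2 real divides
// = 11 real operations.
inline Cplx cdiv(Cplx a, Cplx b)
{
  double den = b.re * b.re + b.im * b.im;   // 2 mul, 1 add
  assert(den != 0.0);
  double re = a.re * b.re + a.im * b.im;    // 2 mul, 1 add
  double im = a.im * b.re - a.re * b.im;    // 2 mul, 1 sub
  Cplx r = { re / den, im / den };          // 2 div
  return r;
}
```

The two real divides in cdiv are the slow part, which is why equal flop counts do not translate into equal run times.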

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vd.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060430/e7bed83a/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vd.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060430/e7bed83a/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_div_real_cplx.png
Type: image/png
Size: 5783 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060430/e7bed83a/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_div_real_cplx_lat.png
Type: image/png
Size: 5160 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060430/e7bed83a/attachment-0001.png>

From jules at codesourcery.com  Mon May  1 12:10:10 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 01 May 2006 08:10:10 -0400
Subject: [vsipl++] [patch] New benchmark - vector division
In-Reply-To: <4455A149.20504@codesourcery.com>
References: <4455A149.20504@codesourcery.com>
Message-ID: <4455FAA2.3060106@codesourcery.com>

Don McCoy wrote:
> Here is a new benchmark for testing element-wise vector division.  Also 
> attached are two performance graphs comparing multiplication and 
> division - one shows mega flops per second and the other latency, or the 
> number of microseconds per operation.

Don, this looks good, please check it in. -- Jules
> 
> The graph showing flops per second is somewhat misleading for two 
> reasons: both divide and multiply for real numbers are each counted as a 
> "flop" even though they take a different number of clock cycles to 
> perform.

It does answer the question of whether a division FLOP is really the 
same as a multiply FLOP.  Depending on problem size, it looks like 1 div 
FLOP ~ 8 mul FLOPS.

>  Second, complex-complex division takes more operations (11, 
> two of which are real-real divisions) than complex-complex 
> multiplication (6).  This gives them a comparable flop count, even 
> though the division takes roughly twice as long.

Comparing complex-multiply MFLOPS vs complex-division MFLOPS is somewhat 
of an apples to oranges comparison.  The latency numbers, or 
alternatively measuring points per second, are a good way to look at it.
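
To make the difference between the two metrics concrete, here is a small sketch of how each could be derived from a timing.  The function names are invented for illustration and are not part of the benchmark harness.

```cpp
#include <cassert>

// Throughput in MFLOPS: floating-point operations per microsecond.
// The result depends on the op-count convention chosen for ops_per_point.
inline double mflops(double ops_per_point, double points, double seconds)
{
  assert(seconds > 0.0);
  return ops_per_point * points / (seconds * 1e6);
}

// Latency in microseconds per point.  No op-count convention is involved,
// so multiply and divide can be compared directly.
inline double usec_per_point(double points, double seconds)
{
  assert(points > 0.0);
  return seconds * 1e6 / points;
}
```

With made-up timings of 0.01 s for a million complex multiplies (6 ops each) and 0.02 s for a million complex divides (11 ops each), mflops() gives 600 versus 550, which looks comparable, while usec_per_point() gives 0.01 versus 0.02, which shows the factor of two.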

What machine/configuration are the results from?

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Mon May  1 15:59:49 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 01 May 2006 09:59:49 -0600
Subject: [vsipl++] [patch] New benchmark - vector division
In-Reply-To: <4455FAA2.3060106@codesourcery.com>
References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com>
Message-ID: <44563075.6020507@codesourcery.com>

Jules Bergmann wrote:
> Comparing complex-multiply MFLOPS vs complex-division MFLOPS is somewhat 
> of an apples to oranges comparison.  The latency numbers, or 
> alternatively measuring points per second, are a good way to look at it.
> 
That's what I was thinking as well.  We have a benchmark that uses both 
multiply and divide (CFAR), and it seemed odd to account for them as one 
operation each.

> What machine/configuration are the results from?
> 
Xeon 3.8 G w/ 2 M cache

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From don at codesourcery.com  Mon May  1 16:04:44 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 01 May 2006 10:04:44 -0600
Subject: [vsipl++] [patch] New benchmark - vector division
In-Reply-To: <44563075.6020507@codesourcery.com>
References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> <44563075.6020507@codesourcery.com>
Message-ID: <4456319C.7060906@codesourcery.com>

Don McCoy wrote:
> Jules Bergmann wrote:
> 
>> What machine/configuration are the results from?
>>
> Xeon 3.8 G w/ 2 M cache
> 
The configuration uses optimization flags from "SerialBuiltin". 
Otherwise --with-fft=builtin is set and --with-lapack not specified 
(defaulting to CLAPACK, no?).

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From jules at codesourcery.com  Mon May  1 16:42:07 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 01 May 2006 12:42:07 -0400
Subject: [vsipl++] [patch] New benchmark - vector division
In-Reply-To: <4456319C.7060906@codesourcery.com>
References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> <44563075.6020507@codesourcery.com> <4456319C.7060906@codesourcery.com>
Message-ID: <44563A5F.5000603@codesourcery.com>

Don McCoy wrote:
> Don McCoy wrote:
>> Jules Bergmann wrote:
>>
>>> What machine/configuration are the results from?
>>>
>> Xeon 3.8 G w/ 2 M cache
>>
> The configuration uses optimization flags from "SerialBuiltin". 
> Otherwise --with-fft=builtin is set and --with-lapack not specified 
> (defaulting to CLAPACK, no?).
> 

I think that is right:

no '--with-lapack' option results in NO lapack being used at all

plain '--with-lapack' results in configure searching for the presence of 
installed atlas, installed generic lapack, and then builtin atlas (using 
clapack).

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Mon May  1 16:43:32 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 01 May 2006 12:43:32 -0400
Subject: [vsipl++] [patch] New benchmark - vector division
In-Reply-To: <44563075.6020507@codesourcery.com>
References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> <44563075.6020507@codesourcery.com>
Message-ID: <44563AB4.4040802@codesourcery.com>

Don McCoy wrote:

>> What machine/configuration are the results from?
>>
> Xeon 3.8 G w/ 2 M cache
> 

Don,

You should try building with IPP to see how that affects performance.

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Tue May  2 00:43:27 2006
From: don at codesourcery.com (Don McCoy)
Date: Mon, 01 May 2006 18:43:27 -0600
Subject: [patch] HPEC CFAR Detection benchmark
Message-ID: <4456AB2F.801@codesourcery.com>

The attached patch implements the CFAR benchmark.  Briefly, this problem 
involves finding targets based on data within a three-dimensional cube 
of 'beam locations', 'range gates' and 'doppler bins'.  It does this by 
comparing the signal in a given cell to that of nearby cells in order to 
avoid false-detection of targets.  The range gate parameter is varied 
when considering 'nearby' cells.  A certain number of guard cells are 
skipped, resulting in a computation that sums the values from two thick 
slices of this data cube (one on either side of the slice for a 
particular range gate).  The HPEC PCA Kernel-Level benchmark paper has a 
diagram that shows one cell under consideration.  Please refer to it if 
needed.

The algorithm involves these basic steps:
   - compute the squares of all the values in the data cube
   - for each range gate:
     - sum the squares of desired values around the current range gate
     - compute the normalized power for each cell in the slice
     - search for values that exceed a certain threshold
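
The steps above can be sketched for a single range vector (one beam, one doppler bin) in plain C++.  This is a simplified illustration, not the code in the patch: the name cfar_1d is made up, it uses std::vector instead of VSIPL++ views, and it recomputes each window from scratch rather than updating a running sum.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the range-gate indices whose normalized power exceeds mu.
// c = CFAR gates per side, g = guard cells per side.
inline std::vector<std::size_t>
cfar_1d(std::vector<double> const& signal,
        std::size_t c, std::size_t g, double mu)
{
  std::size_t const n = signal.size();
  assert(c > 0 && 2 * (c + g) < n);

  // Step 1: square every sample.
  std::vector<double> pw(n);
  for (std::size_t j = 0; j < n; ++j)
    pw[j] = signal[j] * signal[j];

  std::vector<std::size_t> targets;
  for (std::size_t j = 0; j < n; ++j)
  {
    // Step 2: sum the squares on both sides, skipping the guard cells.
    // Near the ends only the gates that exist are counted.
    double sum = 0.0;
    std::size_t used = 0;
    for (std::size_t d = g + 1; d <= g + c; ++d)
    {
      if (j + d < n) { sum += pw[j + d]; ++used; }
      if (j >= d)    { sum += pw[j - d]; ++used; }
    }

    // Step 3: normalized power, guarding against a zero noise estimate.
    double noise = sum / used;
    if (noise == 0.0)
      noise = std::numeric_limits<double>::epsilon();

    // Step 4: threshold test.
    if (pw[j] / noise > mu)
      targets.push_back(j);
  }
  return targets;
}
```

For a flat vector of 20 ones with a single sample of 10.0 at gate 10, calling cfar_1d(signal, 3, 1, 50.0) reports only gate 10.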

Some of the code relates to boundary conditions (near either end of the 
'range gates' parameter), but otherwise it follows the above description.

For now, the original implementation used get/put (actually operator()) 
instead of using subviews and the element-wise operators.  Switching 
from one to the other resulted in about a 25% improvement in performance 
for the first set of data (see attached graph).  The other sets 
experienced improvement as well, to varying degrees.  I'd like to 
consider how we can improve the throughput further.  Switching the 
processing order may help possibly.  Thoughts are welcome.

The benchmark only varies the number of range gates based upon the four 
sets of parameters defined in the HPEC paper.  As the workload is 
equally dependent on each of the three dimensions, sweeping the other 
two would not add much value.

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cf.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060501/08091759/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cf.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060501/08091759/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cfar_optimized_mflops.png
Type: image/png
Size: 4346 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060501/08091759/attachment.png>

From jules at codesourcery.com  Tue May  2 15:17:16 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 02 May 2006 11:17:16 -0400
Subject: [patch] Minor benchmark fixes
Message-ID: <445777FC.7000100@codesourcery.com>

Patch applied.
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bm.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060502/64b07cd9/attachment.ksh>

From jules at codesourcery.com  Tue May  2 18:29:15 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 02 May 2006 14:29:15 -0400
Subject: [vsipl++] [patch] HPEC CFAR Detection benchmark
In-Reply-To: <4456AB2F.801@codesourcery.com>
References: <4456AB2F.801@codesourcery.com>
Message-ID: <4457A4FB.9040505@codesourcery.com>

Don McCoy wrote:
> The attached patch implements the CFAR benchmark.  Briefly, this problem 
> involves finding targets based on data within a three-dimensional cube 
> of 'beam locations', 'range gates' and 'doppler bins'.  It does this by 
> comparing the signal in a given cell to that of nearby cells in order to 
> avoid false-detection of targets.  The range gate parameter is varied 
> when considering 'nearby' cells.  A certain number of guard cells are 
> skipped, resulting in a computation that sums the values from two thick 
> slices of this data cube (one on either side of the slice for a 
> particular range gate).  The HPEC PCA Kernel-Level benchmark paper has a 
> diagram that shows one cell under consideration.  Please refer to it if 
> needed.
> 
> The algorithm involves these basic steps:
>   - compute the squares of all the values in the data cube
>   - for each range gate:
>     - sum the squares of desired values around the current range gate
>     - compute the normalized power for each cell in the slice
>     - search for values that exceed a certain threshold
> 
> Some of the code relates to boundary conditions (near either end of the 
> 'range gates' parameter), but otherwise it follows the above description.

Don,

Excellent description of the benchmark!  Can you put it into the file 
header as a comment?

> 
> For now, the original implementation used get/put (actually operator()) 
> instead of using subviews and the element-wise operators.  Switching 
> from one to the other resulted in about a 25% improvement in performance 
> for the first set of data (see attached graph).  The other sets 
> experienced improvement as well, to varying degrees.  I'd like to 
> consider how we can improve the throughput further.  Switching the 
> processing order may help possibly.  Thoughts are welcome.

General comments:
  - Avoid memory allocation/deallocation inside the compute loop.
    For example, the 'tl' temporary matrix in cfar_find_targets()
    is being allocated/deallocated multiple times during a single
    test_cfar() call.

    You could avoid this by moving cfar_find_targets() and cfar_detect()
    into t_cfar_base class, and then defining the temporary
    matrices/tensors as member variables.

  - When taking a slice of a matrix/tensor, use a subview instead of
    copying data.

    For example, the 'pow_slice' matrix currently looks like:

	Matrix<T> pow_slice = cpow(dom0, 0, dom2);
  	cfar_find_targets(... pow_slice ...);
	for (...)
	{
	  ...
	  pow_slice = cpow(dom0, j, dom2);
	  cfar_find_targets(... pow_slice ...);
	}

    As written, pow_slice is a separate matrix holding a copy of
    the slice from cpow.  Each iteration through the loop, new data
    is copied into pow_slice.

    The reason that pow_slice is a copy instead of a reference
    is because its block type (Dense<2, T>) is different from the block
    type returned by 'cpow(...)' (an impl-defined block type that I don't
    know off the top of my head, Subset_block maybe?).  To have
    pow_slice reference the data instead of copying it, it needs
    to have the same block type returned from 'cpow()'.

    You can use Tensor<T>::submatrix<2>::type to get the right type
    (where Tensor<T> is the type of cpow).  Because T is a template
    parameter here, the dependent name needs both 'typename' and
    'template':

	typename Tensor<T>::template submatrix<2>::type
	  pow0_slice = cpow(dom0, 0, dom2);
	cfar_find_targets(... pow0_slice ...);
	for (...)
	{
	  ...
	  typename Tensor<T>::template submatrix<2>::type
	    pow_slice = cpow(dom0, j, dom2);
	  cfar_find_targets(... pow_slice ...);
	}

    Note that you can't change what pow_slice refers to after
    you create it (i.e. change from '0' to 'j').  That's why this
    has 'pow0_slice' and 'pow_slice'.

    Of course, you could also do away with the explicit
    variable 'pow_slice' altogether:

	cfar_find_targets(... cpow(dom0, 0, dom2), ...);
	for (...)
	{
	  ...
	  cfar_find_targets(... cpow(dom0, j, dom2), ...);
	}

  - When iterating through each element in a matrix or tensor,
    try to arrange the variables to coincide with the dimension-order.

    For example, if you have a 3 by 3 row-major matrix:

	Matrix<T> mat(3, 3);

    The data will be laid out like this in memory:

	Address		Matrix element
	0		0,0
	1		0,1
	2		0,2
	3		1,0
	4		1,1
	5		1,2
	6		2,0
	7		2,1
	8		2,2

    If you iterate over the elements like so:

	for (index_type j=0; j<mat.size(1); ++j)
	  for (index_type i=0; i<mat.size(0); ++i)
	    .. use mat(i, j) ..

    The sequence of addresses you will be accessing from memory will
    look like:

	0, 3, 6, 1, 4, 7, 2, 5, 8

    This type of sequence makes poor utilization of the cache
    because cache lines may be flushed out before all their
    elements are used.

    For example, the access to location 0 would pull in other
    locations in the same cache line, such as location 1.  However
    if the matrix is large enough, one of the other accesses
    (3 or 6 in this case) might flush location 1 out of the cache
    before that element is accessed.

    Instead, if you iterate over the elements like this:

	for (index_type i=0; i<mat.size(0); ++i)
	  for (index_type j=0; j<mat.size(1); ++j)
	    .. use mat(i, j) ..

    You will get the nice sequence:

	0, 1, 2, 3, 4, 5, 6, 7, 8
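
The two address sequences above can be reproduced with a short stand-alone program.  This is only a sketch with invented helper names; it models a row-major matrix as a flat array and records the offsets each loop nest touches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of element (i, j) in a row-major matrix with 'cols' columns.
inline std::size_t
row_major_offset(std::size_t i, std::size_t j, std::size_t cols)
{
  return i * cols + j;
}

// Offsets visited when the column index j is the outer loop (strided).
inline std::vector<std::size_t>
visit_column_outer(std::size_t rows, std::size_t cols)
{
  std::vector<std::size_t> seq;
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      seq.push_back(row_major_offset(i, j, cols));
  return seq;
}

// Offsets visited when the row index i is the outer loop (unit stride).
inline std::vector<std::size_t>
visit_row_outer(std::size_t rows, std::size_t cols)
{
  std::vector<std::size_t> seq;
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      seq.push_back(row_major_offset(i, j, cols));
  return seq;
}
```

For a 3 by 3 matrix, visit_column_outer(3, 3) yields 0, 3, 6, 1, 4, 7, 2, 5, 8 and visit_row_outer(3, 3) yields 0 through 8 in order.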

> 
> The benchmark only varies the number of range gates based upon the four 
> sets of parameters defined in the HPEC paper.  As the workload is 
> equally dependent on each of the three dimensions, sweeping the other 
> two would not add much value.
> 
> Regards,

> + /***********************************************************************
> +   Support
> + ***********************************************************************/
> + 
> + template <typename T,
> +           typename Block1,
> +           typename Block2> 
> + inline
> + void
> + cfar_find_targets(
> +   const_Matrix<T, Block1> sum,       // Sum of values in Cfar gates
> +   length_type             gates,     // Total number of Cfar gates used
> +   const_Matrix<T, Block2> pow_slice, // A set of squared values of range gates
> +   const length_type       mu,        // Threshold for determining targets
> +   Matrix<index_type>      targets,   // All of the targets detected so far. 
> +   index_type&             next,      // the next empty slot in targets
> +   const length_type       j)         // Current range gate number. 
> + {
> +   if ( next >= targets.size(0) )  // full, nothing to do.
> +     return;
> + 
> +   // Compute the local noise estimate.  The inverse is calculated in advance
> +   // for efficiency.
> +   T inv_gates = (1.0 / gates);
> +   Matrix<T> tl = sum * inv_gates;
> + 
> +   // Make sure we don't divide by zero!  We take advantage of a
> +   // reduction function here, knowing the values are positive.
> +   Index<const_Matrix<T>::dim> idx;
> +   if ( minval(tl, idx) == T() )

Checking that minval == T() is actually overhead.  I.e. expanding it out:

	// compute minval
	for (i = ...)
	  for (k = ...)
	    if (tl(i, k) < minval) minval = ...

	// set 0
	if (minval == 0.0)
	  for (i = ...)
	    for (k = ...)
	      if (tl(i, k) == 0.0) ...

In effect it is looping through the matrix multiple times.  Just going 
through the matrix and looking for zeros should be less expensive.


> +   {
> +     for ( index_type k = 0; k < tl.size(1); ++k )
> +       for ( index_type i = 0; i < tl.size(0); ++i )

Since tl is row-major, you should reverse the loop nest.

> +         if ( tl(i,k) == 0.0 ) {
> +           tl(i,k) = Precision_traits<T>::eps;
> +           cout << "! " << i << " " << k << endl;
> +         }
> +   }
> + 
> +   // Compute the normalized power in the cell
> +   Matrix<T> normalized_power = pow_slice / tl;

Instead of using a separate matrix for normalized_power, you could update 
tl in-place:

	tl = pow_slice / tl;

> + 
> + 
> +   // If the normalized power is larger than mu record the coordinates.  The
> +   // list of target are held in a [N x 3] matrix, with each row containing 
> +   // the beam location, range gate and doppler bin location of each target. 
> +   //
> +   for ( index_type k = 0; k < tl.size(1); ++k )
> +     for ( index_type i = 0; i < tl.size(0); ++i )
> +     {
> +       if ( normalized_power(i,k) > mu )
> +       {
> +         targets(next,0) = i;
> +         targets(next,1) = j;
> +         targets(next,2) = k;
> +         if ( ++next == targets.size(0) )  // full, nothing else to do.
> +           return;
> +       }
> +     }

Looking at this entire function (cfar_find_targets), it could benefit 
from loop fusion.  It has 5 separate loops:

  - compute tl
  - (find minimum -- we can remove this)
  - replace zero values with eps
  - compute normalized power
  - look for detections.

Fusing these loops together would process each element start-to-finish, 
improving temporal locality.

Ignoring any vectorization potential, it would be more efficient to have 
a single loop:

	for (i = ...)
	  for (k = ...)
	  {
	    T t1 = sum(i, k) * inv_gates;
	    if (t1 == T())
	      t1 = eps;
	    T norm_power = pow_slice(i, k) / t1;
	    if (norm_power > mu)
	       ... record detection ...
	  }

It would be nice if we could write a high-level VSIPL++ expression that 
did the same thing.  Something like this might work:

	count = indexbool( pow_slice / max(sum * inv_gates, eps) > mu,
		  targets(Domain<1>(next, 1, targets.size() - next)));
	next += count;

It would be good to compare the performance of the explicit for loop 
with the VSIPL++ approach to see if VSIPL++ does a good job.
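
As a baseline for that comparison, the fused loop could look roughly like the following in plain C++.  This is a sketch under assumptions: the Detection struct and find_targets_fused name are hypothetical, the inputs are flat row-major [beams x bins] buffers rather than VSIPL++ matrices, and double is used in place of T.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical record of one detection.
struct Detection { std::size_t beam, gate, bin; };

// Single pass over the slice: noise estimate, zero guard, normalization
// and threshold test are all done per element.  Returns the number of
// detections appended to 'targets'.
inline std::size_t find_targets_fused(
  std::vector<double> const& sum,        // sums over the CFAR gates
  std::vector<double> const& pow_slice,  // squared values for this gate
  std::size_t beams, std::size_t bins,
  std::size_t gates_used, double mu, std::size_t gate,
  std::vector<Detection>& targets, std::size_t max_targets)
{
  assert(sum.size() == beams * bins);
  assert(pow_slice.size() == beams * bins);
  assert(gates_used > 0);

  double const inv_gates = 1.0 / gates_used;
  double const eps = std::numeric_limits<double>::epsilon();
  std::size_t added = 0;

  for (std::size_t i = 0; i < beams; ++i)    // row index outermost
    for (std::size_t k = 0; k < bins; ++k)
    {
      if (targets.size() >= max_targets)     // full, nothing else to do
        return added;
      double t = sum[i * bins + k] * inv_gates;
      if (t == 0.0)
        t = eps;                             // avoid dividing by zero
      if (pow_slice[i * bins + k] / t > mu)
      {
        Detection d = { i, gate, k };
        targets.push_back(d);
        ++added;
      }
    }
  return added;
}
```

Each element is read once and no temporary matrix is needed, which is the temporal-locality benefit described above.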

> + }
> + 
> + 
> + template <typename T,
> +           typename Block>
> + void
> + cfar_detect(
> +   Tensor<T, Block>   cube,
> +   Matrix<index_type> found,
> +   length_type        cfar_gates,
> +   length_type        guard_cells,
> +   length_type        mu)
> + {
> + // Description:
> + //   Main computational routine for the Cfar Kernel Benchmark. Determines 
> + //   targets by finding SNR signal data points that are greater than the 
> + //   noise threshold mu
> + //
> + // Inputs:
> + //    cube: [beams x gates x bins] The radar datacube
> + //
> + // Note: this function assumes that second dimension of input cube C  
> + // has length (range gates) greater than 2(cfar gates + guard cells).
> + // If this were not the case, then the parameters of the radar signal 
> + // processing would be flawed!

Can you put this comment near the assertion that checks it?


> + 
> +   length_type beams = cube.size(0);
> +   length_type gates = cube.size(1);
> +   length_type dbins = cube.size(2);
> +   test_assert( 2*(cfar_gates+guard_cells) < gates );
> + 
> +   Tensor<T> cpow = pow(cube, 2);
> + 
> +   Domain<1> dom0(beams);
> +   Domain<1> dom2(dbins);
> +   Matrix<T> sum(beams, dbins, T());
> +   for ( length_type lnd = guard_cells; lnd < guard_cells+cfar_gates; ++lnd )
> +     sum += cpow(dom0, 1+lnd, dom2);
> + 
> +   Matrix<T> pow_slice = cpow(dom0, 0, dom2);
> + 
> +   index_type next_found = 0;
> +   cfar_find_targets(sum, cfar_gates, pow_slice, mu, found, next_found, 0);
> + 
> +   for ( index_type j = 1; j < gates; ++j )
> +   {
> +     length_type gates_used = 0;
> +     length_type c = cfar_gates;
> +     length_type g = guard_cells;
> + 

You could move this 'if-then-else' statement outside of the loop.  This 
would result in multiple loops.  Since the majority of time is spent in 
case 3, keeping cases 1 & 2 and 4 & 5 together would be OK.  I.e.:
  - loop for cases 1 & 2
  - loop for case 3
  - loop for cases 4 & 5
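
The boundaries of those three loops follow from the case conditions in the patch.  A sketch of the arithmetic, with made-up type and function names (Range, Cfar_ranges, split_gates):

```cpp
#include <cassert>
#include <cstddef>

struct Range { std::size_t begin, end; };      // half-open [begin, end)
struct Cfar_ranges { Range left, middle, right; };

// Splits range gates 1 .. gates-1 into the three loops.  Gate 0 is left
// out because the patch handles it before entering the main loop.
// c = CFAR gates per side, g = guard cells per side.
inline Cfar_ranges split_gates(std::size_t gates, std::size_t c, std::size_t g)
{
  assert(2 * (c + g) < gates);
  std::size_t const lo = g + c + 1;        // first gate with a full left window
  std::size_t const hi = gates - (g + c);  // first gate with a partial right window
  Cfar_ranges r = { { 1, lo }, { lo, hi }, { hi, gates } };
  return r;
}
```

The assertion guarantees lo <= hi, so the middle range is never inverted.  For example, with gates = 20, c = 3 and g = 2 the loops cover [1, 6), [6, 15) and [15, 20).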

> +     // Case 1: No cell included on left side of CFAR; 
> +     // very close to left boundary 
> +     if ( j < (g + 1) ) 
> +     {
> +       gates_used = c;
> +       sum += cpow(dom0, j+g+c, dom2)   - cpow(dom0, j+g, dom2);
> +     }
> +     // Case 2: Some cells included on left side of CFAR;
> +     // close to left boundary 
> +     else if ( (j >= (g + 1)) & (j < (g + c + 1)) )
> +     {
> +       gates_used = c + j - (g + 1);
> +       sum += cpow(dom0, j+g+c, dom2)   - cpow(dom0, j+g, dom2) 
> +            + cpow(dom0, j-(g+1), dom2);
> +     }
> +     // Case 3: All cells included on left and right side of CFAR
> +     // somewhere in the middle of the range vector
> +     else if ( (j >= (g + c + 1)) & ((j + (g + c)) < gates) )
> +     {
> +       gates_used = 2 * c;
> +       sum += cpow(dom0, j+g+c, dom2)   - cpow(dom0, j+g, dom2) 
> +            + cpow(dom0, j-(g+1), dom2) - cpow(dom0, j-(c+g+1), dom2);
> +     }
> +     // Case 4: Some cells included on right side of CFAR;
> +     // close to right boundary
> +     else if ( (j + (g + c) >= gates) & ((j + g) < gates) )
> +     {
> +       gates_used = c + gates - (j + g);
> +       sum +=                           - cpow(dom0, j+g, dom2) 
> +            + cpow(dom0, j-(g+1), dom2) - cpow(dom0, j-(c+g+1), dom2);
> +     }
> +     // Case 5: No cell included on right side of CFAR; 
> +     // very close to right boundary 
> +     else if (j + g >= gates)
> +     {
> +       gates_used = c;
> +       sum += cpow(dom0, j-(g+1), dom2) - cpow(dom0, j-(c+g+1), dom2);
> +     }    
> +     else
> +     {
> +       cerr << "Error: fell through if statements in Cfar detection - " << 
> +         j << endl;
> +       test_assert(0);
> +     }
> + 
> +     pow_slice = cpow(dom0, j, dom2);
> +     cfar_find_targets(sum, gates_used, pow_slice, mu, found, next_found, j);
> +   }
> + }

> Index: benchmarks/loop.hpp
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/benchmarks/loop.hpp,v
> retrieving revision 1.17
> diff -c -p -r1.17 loop.hpp
> *** benchmarks/loop.hpp	13 Apr 2006 19:21:07 -0000	1.17
> --- benchmarks/loop.hpp	2 May 2006 00:26:12 -0000
> *************** Loop1P::sweep(Functor fcn)
> *** 286,292 ****
>   
>       float factor = goal_sec_ / time;
>       if (factor < 1.0) factor += 0.1 * (1.0 - factor);
> !     loop = (int)(factor * loop);
>   
>       if (factor >= 0.75 && factor <= 1.25)
>         break;
> --- 286,299 ----
>   
>       float factor = goal_sec_ / time;
>       if (factor < 1.0) factor += 0.1 * (1.0 - factor);
> !     if ( loop == (int)(factor * loop) )
> !       break;          // Avoid getting stuck when factor ~= 1 and loop is small
> !     else
> !       loop = (int)(factor * loop);
> !     if ( loop == 0 ) 
> !       loop = 1; 
> !     if ( loop == 1 )  // Quit if loop cannot get smaller
> !       break;

I was a little confused by this logic at first, but after considering 
it, it seems OK.

I've thought about always starting the loop count at 1 for calibration 
and only letting it grow.  If the new loop is ever smaller than the old 
one, that would end calibration (calibration would also end if 0.75 <= 
factor <= 1.25 as currently).  Do you think that would work?
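
A rough sketch of that grow-only idea, to make the question concrete.  This is not the code in loop.hpp; the calibrate name, the time_for callback and the max_rounds limit are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Grow-only calibration: start at one iteration and scale the loop count
// up until a run takes roughly goal_sec.  time_for(n) is assumed to return
// the measured time in seconds for n iterations.
template <typename TimeFn>
std::size_t calibrate(TimeFn time_for, double goal_sec, int max_rounds = 20)
{
  assert(goal_sec > 0.0);
  std::size_t loop = 1;
  for (int round = 0; round < max_rounds; ++round)
  {
    double const t = time_for(loop);
    if (t <= 0.0)                  // too fast to measure, try more work
    {
      loop *= 2;
      continue;
    }
    double const factor = goal_sec / t;
    if (factor >= 0.75 && factor <= 1.25)
      break;                       // close enough to the goal
    std::size_t const next = static_cast<std::size_t>(factor * loop);
    if (next <= loop)
      break;                       // would shrink or stall, stop here
    loop = next;
  }
  return loop;
}
```

With a timer that behaves like n / 1024.0 seconds and a one-second goal this settles on 1024 iterations after a single growth step; if a single iteration already takes longer than the goal it stays at 1.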


>   
>       if (factor >= 0.75 && factor <= 1.25)
>         break;


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue May  2 21:26:22 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 02 May 2006 17:26:22 -0400
Subject: [patch] Solver dispatch
Message-ID: <4457CE7E.30903@codesourcery.com>

This patch changes the dispatch for the LU and Cholesky solvers to work 
when the lapack backend is not available.  It updates the LU and Cholesky 
solver tests to only test types supported by a backend.

It also adds support to use Lapack bindings provided by the AMD Core 
Math Library (ACML) when --with-lapack=acml.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: solver-dispatch.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060502/3051b2e7/attachment.ksh>

From stefan at codesourcery.com  Wed May  3 15:27:23 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 03 May 2006 11:27:23 -0400
Subject: patch: sal/fft.hpp fix
Message-ID: <4458CBDB.9020508@codesourcery.com>

The attached patch replaces the incorrect use of SFINAE with a more
explicit form to indicate that the SAL backend doesn't support
long double types. Before this patch the SAL backend was always
skipped. Ok to check in ?

Regards,
		Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fft.hpp.diff
Type: text/x-patch
Size: 2399 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060503/3feaa50b/attachment.bin>

From jules at codesourcery.com  Wed May  3 15:36:09 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 03 May 2006 11:36:09 -0400
Subject: [vsipl++] patch: sal/fft.hpp fix
In-Reply-To: <4458CBDB.9020508@codesourcery.com>
References: <4458CBDB.9020508@codesourcery.com>
Message-ID: <4458CDE9.6070000@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch replaces the incorrect use of SFINAE with a more
> explicit form to indicate that the SAL backend doesn't support
> long double types. Before this patch the SAL backend was always
> skipped. Ok to check in ?

Stefan,

How would the SAL FFT react if a user accidentally tried to do an FFT on 
integral data?

Instead of listing the types that SAL doesn't support (long double), 
could you instead list the types that it does support?

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Wed May  3 15:57:50 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 03 May 2006 11:57:50 -0400
Subject: [vsipl++] patch: sal/fft.hpp fix
In-Reply-To: <4458CDE9.6070000@codesourcery.com>
References: <4458CBDB.9020508@codesourcery.com> <4458CDE9.6070000@codesourcery.com>
Message-ID: <4458D2FE.1060104@codesourcery.com>

Jules Bergmann wrote:
> Stefan Seefeld wrote:
>> The attached patch replaces the incorrect use of SFINAE with a more
>> explicit form to indicate that the SAL backend doesn't support
>> long double types. Before this patch the SAL backend was always
>> skipped. Ok to check in ?
> 
> Stefan,
> 
> How would the SAL FFT react if a user accidentally tried to do an FFT on 
> integral data?
> 
> Instead of listing the types that SAL doesn't support (long double), 
> could you instead list the types that it does support?

That's what I tried with my sfinae approach. I agree that having an inclusive
list is better than an exclusive (incomplete) list, and I'm still thinking about
how to do that, without having to duplicate all the evaluator logic for all
supported types.
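
One possible shape for an inclusive list is a small traits class that defaults to false and is specialized once per supported type, so a single generic evaluator can consult it.  The sketch below is hypothetical: the name Is_sal_fft_type is invented, and the set of types marked as supported is an assumption for illustration, not a statement of what SAL actually provides.

```cpp
#include <complex>

// Hypothetical trait: false unless a type is explicitly listed.
template <typename T>
struct Is_sal_fft_type
{ static bool const value = false; };

// Assumed supported set, for illustration only.
template <>
struct Is_sal_fft_type<float>
{ static bool const value = true; };

template <>
struct Is_sal_fft_type<double>
{ static bool const value = true; };

template <>
struct Is_sal_fft_type<std::complex<float> >
{ static bool const value = true; };

template <>
struct Is_sal_fft_type<std::complex<double> >
{ static bool const value = true; };
```

With this, long double and integral types are rejected by default, without having to enumerate them.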

Meanwhile, I'd like to specifically disable long double for sal since fftw
supports it, and thus it makes sense to have tests trying to run FFTs
with long double types.

Regards,
		Stefan


From jules at codesourcery.com  Wed May  3 16:57:48 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 03 May 2006 12:57:48 -0400
Subject: [vsipl++] patch: sal/fft.hpp fix
In-Reply-To: <4458D2FE.1060104@codesourcery.com>
References: <4458CBDB.9020508@codesourcery.com> <4458CDE9.6070000@codesourcery.com> <4458D2FE.1060104@codesourcery.com>
Message-ID: <4458E10C.4070907@codesourcery.com>

Stefan Seefeld wrote:

> Meanwhile, I'd like to specifically disable long double for sal since fftw
> supports it, and thus it makes sense to have tests trying to run FFTs
> with long double types.
> 


Ok, that patch looks fine then. -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Sat May  6 20:07:15 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 06 May 2006 16:07:15 -0400
Subject: [patch] Run-time external data access.
Message-ID: <445D01F3.7030500@codesourcery.com>

The attached patch implements and tests external data access with 
run-time selection of the layout.

In theory, it should work something like this:

	// Assume this is an operator() function for a class similar
	// to Fft.  It calls a backend (backend_) to do the work.
	// Since the backend is chosen at run-time (and is derived
	// from a virtual base class), we can't use the normal
	// Ext_data because it requires that layout be chosen
	// at compile-time.  Instead we need to use run-time ext_data.

	operator()(
	  const_Vector<T, Block0> in,
	  Vector<T, Block1> out)
	{

	  // First, determine layout of blocks:
	  Rt_layout<1> rtl_in  = block_layout(in.block());
	  Rt_layout<1> rtl_out = block_layout(out.block());

	  // Second, query the backend about what layout
	  // it can support.
	  // Backend will modify rtl_in and rtl_out.
	  //
	  // For example, it might:
	  //  - set strides to unit-stride if it only supports
	  //    unit-stride,
	  //  - set complex formats to match,
	  //  - set dimension-ordering,
	  //  - etc.
	  backend_->query_layout(rtl_in, rtl_out);

	  // Third, create run-time Ext_data objects
	  Rt_ext_data<Block0, 1> ext_in(in.block(), rtl_in);
	  Rt_ext_data<Block1, 1> ext_out(out.block(), rtl_out);

	  // Fourth, call functions in backend.
	  //
	  // Some knowledge may get encoded here.  In particular,
	  // because split- and interleaved- complex have
	  // different types, we need to call the appropriate
	  // backend function.  The backends could do this dispatch
	  // too.

	  // backends don't have functions with mixed split/interleaved
	  // arguments.
	  assert(rtl_in.complex == rtl_out.complex);
	  if (rtl_in.complex == cmplx_inter_fmt)
	  {
	    backend_->doit(ext_in.data().as_inter(),
	                   ext_in.stride(0),
	                   ext_out.data().as_inter(),
	                   ext_out.stride(0),
	                   out.size());
	  }
	  else // (rtl_in.complex == cmplx_split_fmt)
	  {
	    backend_->doit(ext_in.data().as_split(),
	                   ext_in.stride(0),
	                   ext_out.data().as_split(),
	                   ext_out.stride(0),
	                   out.size());
	  }
	}
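
To make the query_layout step concrete, here is a minimal self-contained
sketch.  It is only an illustration: Layout, Backend, Split_only_backend,
and Flexible_backend are hypothetical stand-ins, not the real Rt_layout
or backend classes from the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the real VSIPL++ types.
enum Complex_format { inter_fmt, split_fmt };

struct Layout
{
  bool           unit_stride;   // true if data must have stride 1
  Complex_format complex;       // interleaved or split complex storage
};

// Abstract backend, selected at run-time through a base-class pointer.
class Backend
{
public:
  virtual ~Backend() {}
  // Adjust the requested layouts to something this backend supports.
  virtual void query_layout(Layout& in, Layout& out) const = 0;
};

// A backend that only handles unit-stride, split-complex data.
class Split_only_backend : public Backend
{
public:
  void query_layout(Layout& in, Layout& out) const
  {
    in.unit_stride  = true;
    out.unit_stride = true;
    in.complex      = split_fmt;
    out.complex     = split_fmt;
  }
};

// A backend that accepts any stride, but requires input and output
// to share one complex format (it follows the input's format).
class Flexible_backend : public Backend
{
public:
  void query_layout(Layout& in, Layout& out) const
  {
    out.complex = in.complex;
  }
};
```

The caller starts from the layout the blocks already have and lets the
backend tighten it, so a copy is only needed where the block's layout
and the negotiated layout differ.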

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rtex.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060506/bc3016da/attachment.ksh>

From don at codesourcery.com  Sun May  7 00:47:21 2006
From: don at codesourcery.com (Don McCoy)
Date: Sat, 06 May 2006 18:47:21 -0600
Subject: [patch] double support for SAL LU solver
Message-ID: <445D4399.7000100@codesourcery.com>

This was tested against C-SAL but without the portions of the tests 
exercising the transpose options (when using the "old" functions).

Note: some lines were changed only in that tabs were replaced with spaces!

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lu.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060506/c6fd1c76/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lu.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060506/c6fd1c76/attachment-0001.ksh>

From jules at codesourcery.com  Sun May  7 01:35:55 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 06 May 2006 21:35:55 -0400
Subject: [vsipl++] [patch] double support for SAL LU solver
In-Reply-To: <445D4399.7000100@codesourcery.com>
References: <445D4399.7000100@codesourcery.com>
Message-ID: <445D4EFB.2030506@codesourcery.com>

Don McCoy wrote:
> This was tested against C-SAL but without the portions of the tests 
> exercising the transpose options (when using the "old" functions).

Don,

This looks good.  Can you:

  - Move the reciprocal call from sal_matfbs to sal_matlud.  That way
    if multiple sal_matfbs calls are made (either because B/X have
    multiple columns, or because the LU object is used multiple times),
    vrecip will only get called once.

  - Create a typedef for the block_type of recip_.  That way the Ext_data
    for recip_ is guaranteed to have the correct block type if recip_
    ever changes.

  - a few more comments sprinkled below.

If these comments make sense, once you address them this looks good to 
check in.
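
As a rough sketch of the first point (compute the reciprocals once at
decomposition time, then reuse them in every solve), assuming a trivial
diagonal system in place of the real LU data.  Diag_solver and its
members are hypothetical and are not part of the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: a diagonal "factorization" stands in
// for the real LU data so the caching pattern is easy to see.
class Diag_solver
{
public:
  Diag_solver() : recip_calls_(0) {}

  // Plays the role of sal_matlud: factor once, compute reciprocals here.
  void decompose(std::vector<double> const& diag)
  {
    recip_.resize(diag.size());
    for (std::size_t i = 0; i < diag.size(); ++i)
      recip_[i] = 1.0 / diag[i];
    ++recip_calls_;
  }

  // Plays the role of sal_matfbs: reuse the cached reciprocals.
  void solve(std::vector<double> const& b, std::vector<double>& x) const
  {
    assert(b.size() == recip_.size());
    x.resize(b.size());
    for (std::size_t i = 0; i < b.size(); ++i)
      x[i] = b[i] * recip_[i];
  }

  // Number of times the reciprocals were computed.
  int recip_calls() const { return recip_calls_; }

private:
  std::vector<double> recip_;
  int                 recip_calls_;
};
```

However many columns are solved, or however often the object is reused,
the reciprocal pass runs once per decomposition.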

How do we test this?  By manually disabling the mat_trans and 
mat_herm cases?

				thanks,
				-- Jules


> + 
> + 
> + 
> + // "Legacy" SAL functions - The single-precision versions are listed
> + // in the Appendix of the SAL Reference manual.  Although the double-
> + // precision ones are still part of the normal API, we refer to both 
> + // sets of functions as legacy functions just for ease of naming.
> + 
> + // Legacy SAL LUD decomposition functions
> + #define VSIP_IMPL_SAL_LUD_DEC( T, SAL_T, SALFCN ) \
> + inline void                  \
> + sal_matlud(                  \
> +   T *c,                      \
> +   int *d, int n)             \
> + {                            \
> +   SALFCN((SAL_T*) c, d, n);  \

If you pass recip_ in, you can perform the reciprocal one time here.

> + }

> --- 285,308 ----
>   
>   protected:
>     template <mat_op_type tr,
> !             typename    Block0,
> !             typename    Block1>
>     bool impl_solve(const_Matrix<T, Block0>, Matrix<T, Block1>)
>       VSIP_NOTHROW;
>   
> +   length_type max_decompose_size();
> + 
>     // Member data.
>   private:
>     typedef std::vector<int, Aligned_allocator<int> > vector_type;
>   
> !   length_type  length_;                 // Order of A.
> !   vector_type  ipiv_;                   // Pivot table for Q. This gets
>                                           // generated from the decompose and
> !                                         // gets used in the solve
> !   Vector<T> recip_;                     // Vector of reciprocals used
> !                                         // with legacy solvers

Use a typedef for recip_'s block type.

> !   Matrix<T, data_block_type> data_;     // Factorized matrix (A)
>   };
>   
>   
> *************** Lud_impl<T,Mercury_sal_tag>::Lud_impl(
> *** 191,196 ****
> --- 320,326 ----
>   VSIP_THROW((std::bad_alloc))
>     : length_ (length),
>       ipiv_   (length_),
> +     recip_  (length_),
>       data_   (length_, length_)
>   {
>     assert(length_ > 0);
> *************** Lud_impl<T,Mercury_sal_tag>::Lud_impl(Lu
> *** 203,213 ****
> --- 333,347 ----
>   VSIP_THROW((std::bad_alloc))
>     : length_ (lu.length_),
>       ipiv_   (length_),
> +     recip_  (length_),
>       data_   (length_, length_)
>   {
>     data_ = lu.data_;
>     for (index_type i=0; i<length_; ++i)
> +   {
>       ipiv_[i] = lu.ipiv_[i];
> +     recip_.put(i, lu.recip_.get(i));
> +   }

Since recip_ is a vector, you could just say:

	recip_ = lu.recip_;

>   }
>   
>   

>     else 
> --- 464,498 ----
>       }
>       Ext_data<data_block_type>   a_ext((tr == mat_trans)?
>                                           data_int.block():data_.block());
> +     Ext_data<Dense<1, T> >  r_ext(recip_.block());
>   
>       // sal_mat_lud_sol only takes vectors, so, we have to do this for each
>       // column in the matrix
>       ptr_type b_ptr = b_ext.data();
>       ptr_type x_ptr = x_ext.data();
> !     for(index_type i=0;i<b.size(1);i++) 
> !     {
> ! #if VSIP_IMPL_SAL_USE_MAT_LUD
>         sal_mat_lud_sol(a_ext.data(), a_ext.stride(0),
>                         &ipiv_[0],
> !                       storage_type::offset(b_ptr,i*length_),
> !                       storage_type::offset(x_ptr,i*length_),
> !                       length_,trans);
> ! #else
> !       if (x_ext.stride(0) != 1)
> !         VSIP_IMPL_THROW(unimplemented(
> !           "Lud_impl<>::impl_solve - data must be dense (have unit stride)"));

This should either be an assertion, or removed.  x_ext refers to x_int, 
which is declared by the LU object to be column major.  Since we know 
the block is column major, the condition x_ext.stride(0) != 1 would 
indicate a bug in Ext_data (i.e. something impossible happened -> assert 
failure), as opposed to unsupported behavior (user tried to do something 
unsupported -> throw exception).
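
A small sketch of that distinction, using hypothetical names
(checked_solve, Mat_op) and std::runtime_error in place of the
library's own exception machinery:

```cpp
#include <cassert>
#include <stdexcept>

enum Mat_op { op_ntrans, op_trans };

// Hypothetical solve entry point showing the two kinds of check.
// 'stride' is produced internally; 'op' comes from the caller.
inline void checked_solve(int stride, Mat_op op)
{
  // Internal invariant: library-owned column-major storage always has
  // unit stride, so a violation is a library bug, not a user error.
  assert(stride == 1);

  // User-visible limitation: report it with an exception.
  if (op != op_ntrans)
    throw std::runtime_error("solve: transpose not implemented");

  // ... the actual solve would go here ...
}
```

The assertion disappears in release builds (NDEBUG), which is fine for
a condition that can only be false if the library itself is broken; the
exception stays, because the user can trigger it.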

> !       if (tr == mat_ntrans)
> !         sal_matfbs(a_ext.data(), r_ext.data(), &ipiv_[0],
> !                    storage_type::offset(b_ptr, i*length_),
> !                    storage_type::offset(x_ptr, i*length_),
> !                    length_);
> !       else
> !         VSIP_IMPL_THROW(unimplemented(
> !           "Lud_impl<mat_op_type!=mat_ntrans>::impl_solve - unimplemented"));

Good!  Well, actually bad (SAL doesn't support mat_trans), but throwing 
an exception is the right thing to do.

> ! #endif
>       }
>   
>       assign_local(x, x_int);
>     }
>     else 


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Sun May  7 17:16:58 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 07 May 2006 13:16:58 -0400
Subject: [patch] Run-time external data access
Message-ID: <445E2B8A.3000705@codesourcery.com>

Add missing block_layout function.

Fix several hard-coded dimensions in rt_extdata.hpp (thanks Stefan)

Use block's dimension as default dimension for Rt_ext_data (thanks Stefan!)

Patch applied.

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rtex2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060507/6c9b876a/attachment.ksh>

From jules at codesourcery.com  Sun May  7 18:38:58 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 07 May 2006 14:38:58 -0400
Subject: [patch] Forcing a copy for run-time external data access.
Message-ID: <445E3EC2.4030506@codesourcery.com>

This patch adds support for a SYNC_IN_NOPRESERVE flag with Rt_ext_data. 
  It requires the block to be synchronized with the external data when 
the Rt_ext_data is created, and it requires that changes made to the 
external data are not reflected in the original block.  In short, it 
forces data to be copied, even if the block already has the requested 
layout.

The intention is to support FFT backends like SAL that need to 
reorganize data in-place for packing before performing real-to-complex 
FFTs.  The backend would communicate that it requires the input data to 
be copied so that it can pack as necessary.
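
To illustrate the intended semantics only (this is not the
implementation in the patch), here is a toy model.  Ext_access and the
lower-case flag names are hypothetical, and the sync_in case assumes
the block already has the requested layout so that direct access is
possible:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flags; they only model the behavior described above.
enum Sync_action { sync_in, sync_in_nopreserve };

// Toy accessor: either aliases the block's storage or owns a copy.
class Ext_access
{
public:
  Ext_access(std::vector<float>& block, Sync_action sync)
    : copy_(), data_(0)
  {
    if (sync == sync_in_nopreserve)
    {
      copy_ = block;                           // forced private copy
      data_ = copy_.empty() ? 0 : &copy_[0];
    }
    else
      data_ = block.empty() ? 0 : &block[0];   // direct access
  }

  float* data() { return data_; }

private:
  // Not copyable: data_ may point into copy_.
  Ext_access(Ext_access const&);
  Ext_access& operator=(Ext_access const&);

  std::vector<float> copy_;
  float*             data_;
};
```

With sync_in_nopreserve the backend can scribble on data() freely (for
example to pack in place) and the original block is left untouched.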

Applying this to the earlier example:

     operator()(
       const_Vector<T, Block0> in,
       Vector<T, Block1> out)
     {

       // First, determine layout of blocks:
       Rt_layout<1> rtl_in  = block_layout(in.block());
       Rt_layout<1> rtl_out = block_layout(out.block());

       // Second, query the backend about what layout
       // it can support.
       // Backend will modify rtl_in and rtl_out.
       //
       // For example, it might:
       //  - set strides to unit-stride if it only supports
       //    unit-stride,
       //  - set complex formats to match,
       //  - set dimension-ordering,
       //  - etc.
       backend_->query_layout(rtl_in, rtl_out);

       // Determine if backend needs to modify the input data
       // (for example, if performing a real-to-complex FFT requires
       // a special packing format).
       //
       // If backend does need to modify it, we'll use SYNC_IN_NOPRESERVE
       // which effectively forces a copy.
       sync_action_type in_sync = backend_->requires_copy(rtl_in)
                                ? SYNC_IN_NOPRESERVE
                                : SYNC_IN;

       // Third, create run-time Ext_data objects
       Rt_ext_data<Block0, 1> ext_in (in.block(),  rtl_in,  in_sync);
       Rt_ext_data<Block1, 1> ext_out(out.block(), rtl_out, SYNC_OUT);

       // Fourth, call functions in backend.
       //
       // Some knowledge may get encoded here.  In particular,
       // because split- and interleaved- complex have
       // different types, we need to call the appropriate
       // backend function.  The backends could do this dispatch
       // too.

       // backends don't have functions with mixed split/interleaved
       // arguments.
       assert(rtl_in.complex == rtl_out.complex);
       if (rtl_in.complex == cmplx_inter_fmt)
       {
         backend_->doit(ext_in.data().as_inter(),
                        ext_in.stride(0),
                        ext_out.data().as_inter(),
                        ext_out.stride(0),
                        out.size());
       }
       else // (rtl_in.complex == cmplx_split_fmt)
       {
         backend_->doit(ext_in.data().as_split(),
                        ext_in.stride(0),
                        ext_out.data().as_split(),
                        ext_out.stride(0),
                        out.size());
       }
     }

Stefan, this is a bit different than adding the 'force_copy' field to 
the Rt_layout that I was suggesting before.  However, it seems cleaner 
in that the 'force_copy' is not really a property of the layout.  Do you 
think this will work OK?


				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rtex3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060507/c91bb30f/attachment.ksh>

From don at codesourcery.com  Sun May  7 19:42:19 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 07 May 2006 13:42:19 -0600
Subject: [vsipl++] [patch] double support for SAL LU solver
In-Reply-To: <445D4EFB.2030506@codesourcery.com>
References: <445D4399.7000100@codesourcery.com> <445D4EFB.2030506@codesourcery.com>
Message-ID: <445E4D9B.7070100@codesourcery.com>

Jules Bergmann wrote:
> 
> This looks good.  Can you:
> 
>  - Move the reciprocal call from sal_matfbs to sal_matlud.  That way
>    if multiple sal_matfbs calls are made (either because B/X have
>    multiple columns, or because the LU object is used multiple times),
>    vrecip will only get called once.
> 
>  - Create a typedef for the block_type of recip_.  That way the Ext_data
>    for recip_ is guaranteed to have the correct block type if recip_
>    ever changes.
> 
>  - a few more comments sprinkled below.
> 
> If these comments make sense, once you address them this looks good to 
> check in.

Committed with suggested changes.  Thanks for catching those things.

> 
> How do we test this?  By manually disabling the mat_trans and 
> mat_herm cases?
> 

Exactly.


-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lu2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060507/fc2f1032/attachment.ksh>

From jules at codesourcery.com  Sun May  7 19:49:33 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 07 May 2006 15:49:33 -0400
Subject: [vsipl++] [patch] double support for SAL LU solver
In-Reply-To: <445E4D9B.7070100@codesourcery.com>
References: <445D4399.7000100@codesourcery.com> <445D4EFB.2030506@codesourcery.com> <445E4D9B.7070100@codesourcery.com>
Message-ID: <445E4F4D.1000904@codesourcery.com>

Don, Thanks for getting this checked in! -- Jules


Don McCoy wrote:
>   VSIP_THROW((std::bad_alloc))
>     : length_ (lu.length_),
>       ipiv_   (length_),
> +     recip_  (length_),
>       data_   (length_, length_)
>   {
>     data_ = lu.data_;
>     for (index_type i=0; i<length_; ++i)
> +   {
>       ipiv_[i] = lu.ipiv_[i];
> +     recip_ = lu.recip_;

The recip_ assignment should go outside the loop, right?

> +   }
>   }


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Sun May  7 19:54:52 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 07 May 2006 15:54:52 -0400
Subject: [patch] Extend rt_extdata test coverage for vectors and tensors
Message-ID: <445E508C.5080907@codesourcery.com>

Patch applied.
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rtex4.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060507/7ad43c22/attachment.ksh>

From don at codesourcery.com  Sun May  7 20:07:38 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 07 May 2006 14:07:38 -0600
Subject: [vsipl++] [patch] double support for SAL LU solver
In-Reply-To: <445E4F4D.1000904@codesourcery.com>
References: <445D4399.7000100@codesourcery.com> <445D4EFB.2030506@codesourcery.com> <445E4D9B.7070100@codesourcery.com> <445E4F4D.1000904@codesourcery.com>
Message-ID: <445E538A.5040507@codesourcery.com>

Jules Bergmann wrote:
> Don McCoy wrote:
> 
>>   VSIP_THROW((std::bad_alloc))
>>     : length_ (lu.length_),
>>       ipiv_   (length_),
>> +     recip_  (length_),
>>       data_   (length_, length_)
>>   {
>>     data_ = lu.data_;
>>     for (index_type i=0; i<length_; ++i)
>> +   {
>>       ipiv_[i] = lu.ipiv_[i];
>> +     recip_ = lu.recip_;
> 
> 
> The recip_ assignment should go outside the loop, right?
> 
>> +   }
>>   }
> 
> 
> 
Yes.  Corrected.


-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lu3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060507/5f4d81ed/attachment.ksh>

From don at codesourcery.com  Sun May  7 20:36:08 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 07 May 2006 14:36:08 -0600
Subject: [vsipl++] [patch] New benchmark - vector division
In-Reply-To: <4455FAA2.3060106@codesourcery.com>
References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com>
Message-ID: <445E5A38.3050901@codesourcery.com>

Jules Bergmann wrote:
> Don McCoy wrote:
> 
>> Here is a new benchmark for testing element-wise vector division.  
>> Also attached are two performance graphs comparing multiplication and 
>> division - one shows mega flops per second and the other latency, or 
>> the number of microseconds per operation.
> 
> 
> Don, This looks good, please check it in. -- Jules
> 
Checked in.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From stefan at codesourcery.com  Sun May  7 21:51:36 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Sun, 07 May 2006 17:51:36 -0400
Subject: [vsipl++] [patch] Forcing a copy for run-time external data access.
In-Reply-To: <445E3EC2.4030506@codesourcery.com>
References: <445E3EC2.4030506@codesourcery.com>
Message-ID: <445E6BE8.50600@codesourcery.com>

Jules Bergmann wrote:

> Stefan, this is a bit different than adding the 'force_copy' field to 
> the Rt_layout that I was suggesting before.  However, it seems cleaner 
> in that the 'force_copy' is not really a property of the layout.  Do you 
> think this will work OK?

Yes, it should work. I'll give it a try tonight. I already played with the
split/interleaved (non-)conversion earlier today, which seems to work fine. Yay !

Regards,
		Stefan


From don at codesourcery.com  Mon May  8 03:49:58 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 07 May 2006 21:49:58 -0600
Subject: [vsipl++] [patch] HPEC benchmark makefiles
In-Reply-To: <444E71AC.9070006@codesourcery.com>
References: <443D0393.50800@codesourcery.com> <444E71AC.9070006@codesourcery.com>
Message-ID: <445EBFE6.3030904@codesourcery.com>

Jules Bergmann wrote:
> Don McCoy wrote:
> 
>> The attached patch moves the HPEC Kernel benchmarks to their own 
>> directory in benchmarks/hpec-kernel/ and includes new makefiles for 
>> both developers and users.
...
> 
> Don,
> 
> This looks good.  Please check it in.  thanks, -- Jules
> 

Fixed a minor issue with src/vsip_csl/GNUmakefile.inc.in (it did not 
actually copy the correct files) and with 
benchmarks/hpec_kernel/make.standalone (to reference ../ for include 
files and main.o).

Committed.

> 
> Does gnumake allow variable names with "-" in them?  If so, this is OK.
> 
> If not, let's replace the "-" with a "_" (and update the GNUmakefile.in 
> 'norm_dir' function accordingly).
> 

Changed to 'benchmarks/hpec_kernel' as discussed.


Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hpec2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060507/adf5b7c9/attachment.ksh>

From jules at codesourcery.com  Mon May  8 13:16:28 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 08 May 2006 09:16:28 -0400
Subject: [patch] QR
Message-ID: <445F44AC.3090201@codesourcery.com>

This patch builds on the QR portion of Assem's earlier QR/SVD patch.  It 
has the following changes:

  - Explicitly disables using Qrd for full-QR (storage_type == qrd_saveq)
    when using SAL as the implementation.  Trying to create a Qrd object
    for qrd_saveq will throw an unimplemented exception.

  - Uses the dispatcher from LU and Cholesky.

  - Fix the use of Ext_data objects to avoid modifying the block being
    accessed during the Ext_data object's lifetime.

    When using an Ext_data object to access a block's data directly, you
    should not modify the block's values during the Ext_data object's
    lifetime.  When the block supports direct access, this happens to
    work OK, but when the block does not support direct access, Ext_data
    will copy data and changes made to the block will not be reflected in
    the copy.

    While we did choose the block types in Qrd with direct access in
    mind, we should keep the usage "correct" so that it doesn't get
    exported via cut-and-paste or subtly break in the future.

    (One test idea: we could build a special version of the library that
    forces all Ext_data objects to copy and see what breaks!)

  - Add support for split-complex

  - Added back assertions on input sizes to impl_prodq and impl_rsol.

    In general, assertions are a good thing.  Here they help enforce
    that input matrices from the user have the right shape.
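
The Ext_data lifetime rule described in the list above can be sketched
with a toy model.  Copying_access is hypothetical; it stands in for an
Ext_data on a block that does not support direct access, so it
snapshots the data when it is constructed:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an accessor for a block WITHOUT direct access:
// it copies the block's data at construction.
class Copying_access
{
public:
  explicit Copying_access(std::vector<float> const& block) : copy_(block) {}
  float const* data() const { return &copy_[0]; }
private:
  std::vector<float> copy_;
};

// Wrong: modifies the block while the accessor is alive.
inline float stale_read(std::vector<float>& block)
{
  Copying_access ext(block);
  block[0] = 42.0f;          // not reflected in ext's copy
  return ext.data()[0];      // still the old value
}

// Right: finish modifying the block, then create the accessor.
inline float fresh_read(std::vector<float>& block)
{
  block[0] = 42.0f;
  Copying_access ext(block);
  return ext.data()[0];
}
```

With a direct-access block both functions would return 42, which is
why the incorrect ordering can go unnoticed until the block type
changes.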

Also in this patch:

  - Updated QR tests to only cover types and storage types supported
    by the implementation.  In particular, avoids testing double
    precision and full-QR when using SAL.

  - Small configure.ac fix for FFTW3.  Adds an AC_SUBST for
    VSIP_IMPL_FFTW3.  Slight logic change when checking if
    FFTW3 is not enabled (was checking against empty string,
    now checks against "no").

A couple of questions:

Assem,

Is there any reason we are using matmgs_dqr instead of matmgs_dqrx? 
Likewise for the other SAL functions (magmgs_srhr, etc).  I don't see 
matmgs_dqr documented in the SAL reference manual (nor the other "non-x" 
functions documented).  It looks like the difference is the missing ESAL 
flags.

If the "non-x" functions are not documented, we should use the "x" 
functions instead.


Stefan,

Does the configure.ac bit for FFTW look OK?


				thanks,
				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qr.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060508/09837907/attachment.ksh>

From stefan at codesourcery.com  Mon May  8 20:44:57 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Mon, 08 May 2006 16:44:57 -0400
Subject: [vsipl++] [patch] Forcing a copy for run-time external data access.
In-Reply-To: <445E3EC2.4030506@codesourcery.com>
References: <445E3EC2.4030506@codesourcery.com>
Message-ID: <445FADC9.9090105@codesourcery.com>

The attached patch rewrites the 1D workspaces used to prepare
data to be 'sent' to the FFT backends. It now uses Jules' new
rt_extdata harness, to take advantage of the backend's handling
of split/interleaved, as well as non-unit strides, if possible.
Now the number of copies of the data blocks should be minimal.

As I'm still in the process of debugging the 2D and M cases,
I am sending in this partial patch in the hope that it is useful,
for example for benchmarking, to make sure the performance is
at least on par with what it was before the redesign.

Hopefully I'll be able to send out more patches later tonight...

Regards,
		Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fft1d.patch
Type: text/x-patch
Size: 46329 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060508/9cacf675/attachment.bin>

From jules at codesourcery.com  Mon May  8 22:19:05 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 08 May 2006 18:19:05 -0400
Subject: [vsipl++] [patch] Forcing a copy for run-time external data access.
In-Reply-To: <445FADC9.9090105@codesourcery.com>
References: <445E3EC2.4030506@codesourcery.com> <445FADC9.9090105@codesourcery.com>
Message-ID: <445FC3D9.5060207@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch rewrites the 1D workspaces used to prepare
> data to be 'sent' to the FFT backends. It now uses Jules' new
> rt_extdata harness, to take advantage of the backend's handling
> of split/interleaved, as well as non-unit strides, if possible.
> Now the number of copies of the data blocks should be minimal.
> 
> As I'm still in the process of debugging the 2D and M cases,
> I am sending in this partial patch in the hope that it is useful,
> for example for benchmarking, to make sure the performance is
> at least on par with what it was before the redesign.
> 
> Hopefully I'll be able to send out more patches later tonight...

Stefan,

This looks good.

I need to check how Rt_extdata handles requests for 1D 
stride_unit_dense.  In general it recognizes that stride_unit_dense is 
a stricter requirement than stride_unit (i.e. anything that is 
stride_unit_dense is also stride_unit, but not vice versa).  It should 
make an exception for 1D since there are no higher dimensions.

Alternatively, for 1D data you could just request stride_unit, since 
that is the minimal requirement.
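
Roughly, the check I have in mind looks like the sketch below.  The
names (Packing, satisfies, pack_other) are made up for illustration and
this is not the actual Rt_extdata logic:

```cpp
#include <cassert>

// Hypothetical packing tags, loosely named after the ones above.
// pack_other stands for any layout that is not unit-stride.
enum Packing { pack_other, pack_stride_unit, pack_stride_unit_dense };

// Returns true if data packed as 'have' already satisfies a request
// for 'want' in a 'dim'-dimensional block, so no copy is needed.
inline bool satisfies(Packing have, Packing want, int dim)
{
  if (have == want)
    return true;
  // Dense data is also unit-stride.
  if (have == pack_stride_unit_dense && want == pack_stride_unit)
    return true;
  // Proposed 1D exception: with no higher dimensions, unit-stride
  // data is necessarily dense.
  if (dim == 1 && have == pack_stride_unit
               && want == pack_stride_unit_dense)
    return true;
  return false;
}
```

Without the 1D case, a unit-stride vector would be copied needlessly
whenever a backend asks for stride_unit_dense.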

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Tue May  9 16:48:40 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 09 May 2006 12:48:40 -0400
Subject: [patch] Allow non-complex data to have split format
Message-ID: <4460C7E8.8010900@codesourcery.com>

Removes a few assumptions that non-complex data must have interleaved 
format.  Adds additional test coverage for those cases.

Patch applied.
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: split.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060509/64ddf1c7/attachment.ksh>

From stefan at codesourcery.com  Tue May  9 19:49:35 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 09 May 2006 15:49:35 -0400
Subject: patch: FFT 1D / 2D / M bug fixes and test enhancements.
Message-ID: <4460F24F.7070102@codesourcery.com>

The attached patch enhances the fft_be.cpp tests to cover all
1D, 2D, and M Fft variants, including non-square and non-unit-stride
matrices. Doing this revealed a number of (more or less subtle) bugs
in the various backends, which are now fixed.

(There is still one case that I didn't manage to fix: the c->r dft 2D
  case. If anybody wants to have a look, that would be appreciated.
  The relevant code is in fft/dft.hpp:386. The appropriate tests in fft_be.cpp
  are commented out for the moment.)

The only remaining issue, then, is the finalization of the 3D FFTs,
which is rather simple, as only fftw and dft support it.


Regards,
		Stefan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060509/ce8d84b1/attachment.ksh>

From stefan at codesourcery.com  Tue May  9 20:19:29 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 09 May 2006 16:19:29 -0400
Subject: patch: Fix bug in configure.ac that caused VSIP_IMPL_FFTW3 not always
 to be defined as requested
Message-ID: <4460F951.7070107@codesourcery.com>

Just what the subject implies. The variables are defined if either
'fftw3' or 'builtin' backend is selected.

Ok, to checkin ?

Thanks,	
		Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.ac.diff
Type: text/x-patch
Size: 1031 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060509/52336150/attachment.bin>

From jules at codesourcery.com  Wed May 10 01:24:26 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 09 May 2006 21:24:26 -0400
Subject: [vsipl++] patch: FFT 1D / 2D / M bug fixes and test enhancements.
In-Reply-To: <4460F24F.7070102@codesourcery.com>
References: <4460F24F.7070102@codesourcery.com>
Message-ID: <446140CA.6030007@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch enhances the fft_be.cpp tests to cover all
> 1D, 2D, and M Fft variants, including non-square and non-unit-stride
> matrices. Doing this revealed a number of (more or less subtle) bugs
> in the various backends, which are now fixed.
> 
> (There is still one case that I didn't manage to fix: the c->r dft 2D
>  case. If anybody wants to have a look, that would be appreciated.
>  The relevant code is in fft/dft.hpp:386. The appropriate tests in 
> fft_be.cpp
>  are commented out for the moment.)
> 
> The only remaining issue, then, is the finalization of the 3D FFTs,
> which is rather simple, as only fftw and dft support it.

Stefan,

This is great work!  Please check it in.

Things we need to do before 1.1 (but after check in):

  - Workspaces need to allocate temporary storage that exists for
    the life of the Fft object.

    To help manage split/interleaved, Rt_ext_data will convert a
    pointer to interleaved into a pointer for split (but not
    vice versa).

  - Add FFTW3 split support.

Minor things to think about after 1.1

  - Merging requires_copy into query_layout to reduce number of virtual
    function calls.  This is pretty far down the path of diminishing
    returns.

  - Naming convention for Fft/Fftm axes.  With the 'A' and '1-A', it
    would be good to have a convention that indicated whether an axis was
    Fftm-convention or Fft-convention.  Could be something as simple
    as Ax and Ay (i.e. Ax == 1 - Ay), but we should be able to do better.

Would you like me to take a look at the FFTW3 split support?

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Wed May 10 01:25:51 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 09 May 2006 21:25:51 -0400
Subject: [vsipl++] patch: Fix bug in configure.ac that caused VSIP_IMPL_FFTW3
 not always to be defined as requested
In-Reply-To: <4460F951.7070107@codesourcery.com>
References: <4460F951.7070107@codesourcery.com>
Message-ID: <4461411F.7040705@codesourcery.com>

Stefan Seefeld wrote:
> Just what the subject implies. The variables are defined if either
> 'fftw3' or 'builtin' backend is selected.
> 
> Ok, to checkin ?

Looks good to me. -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Wed May 10 03:11:48 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 09 May 2006 23:11:48 -0400
Subject: [vsipl++] patch: FFT 1D / 2D / M bug fixes and test enhancements.
In-Reply-To: <446140CA.6030007@codesourcery.com>
References: <4460F24F.7070102@codesourcery.com> <446140CA.6030007@codesourcery.com>
Message-ID: <446159F4.3050903@codesourcery.com>

Jules Bergmann wrote:

> This is great work!  Please check it in.

Thanks. It's checked in now (with the ChangeLog ! :-) ).

I will look into the task list tomorrow morning.

Thanks,
		Stefan


From jules at codesourcery.com  Wed May 10 03:43:17 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 09 May 2006 23:43:17 -0400
Subject: [vsipl++] patch: FFT 1D / 2D / M bug fixes and test enhancements.
In-Reply-To: <4460F24F.7070102@codesourcery.com>
References: <4460F24F.7070102@codesourcery.com>
Message-ID: <44616155.3080004@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch enhances the fft_be.cpp tests to cover all
> 1D, 2D, and M Fft variants, including non-square and non-unit-stride
> matrices. Doing this revealed a number of (more or less subtle) bugs
> in the various backends, which are now fixed.
> 
> (There is still one case that I didn't manage to fix: the c->r dft 2D
>  case. If anybody wants to have a look, that would be appreciated.
>  The relevant code is in fft/dft.hpp:386. The appropriate tests in 
> fft_be.cpp
>  are commented out for the moment.)
> 

Stefan,

It turns out the 1D complex->real DFT was broken.  It was using the 
wrong index/exponent when calling sin_cos().  The symmetry means the 
complex values wrap around, but the exponents still progress as normal.

I believe the 1D C->R tests weren't finding this because the ramp 
function only generates real inputs.  For the 2D C->R case, the initial 
1D C->C FFT generates values with non-zero imaginary parts which trips 
up the 1D C->R bug.
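
A minimal sketch of the indexing issue described above, assuming an unscaled inverse transform with a positive exponent and a packed input of n/2+1 complex values. The function name dft_c2r and the use of std::vector are illustrative only; this is not the code in fft/dft.hpp. The point is that the input value wraps around through conjugate symmetry while the exponent keeps using the running index k.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive complex->real DFT (hypothetical helper, not the VSIPL++ backend).
// 'half' holds the first n/2+1 values of a conjugate-symmetric spectrum.
std::vector<double>
dft_c2r(std::vector<std::complex<double> > const &half, std::size_t n)
{
  double const pi = std::acos(-1.0);
  std::vector<double> out(n);
  for (std::size_t t = 0; t < n; ++t)
  {
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t k = 0; k < n; ++k)
    {
      // The *value* wraps around: X[k] = conj(X[n-k]) for k > n/2 ...
      std::complex<double> value =
        (k <= n / 2) ? half[k] : std::conj(half[n - k]);
      // ... but the *exponent* still progresses with k, not with n-k.
      double angle = 2.0 * pi * double(k) * double(t) / double(n);
      sum += value * std::polar(1.0, angle);
    }
    out[t] = sum.real();
  }
  return out;
}
```

If the angle were computed from the wrapped index n-k instead, only the sign of the sine term would change, so inputs with zero imaginary parts would still produce the right answer. That is consistent with the real-valued ramp not exposing the problem.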

				-- Jules


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dft.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060509/8fccae19/attachment.ksh>

From stefan at codesourcery.com  Wed May 10 13:14:28 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Wed, 10 May 2006 09:14:28 -0400
Subject: patch: Preallocate FFT buffers in workspace
Message-ID: <4461E734.3030309@codesourcery.com>

The attached patch lets workspaces pre-allocate memory potentially to
be used in the call operators.
It also contains a fix for the 1D c->r dft failure, reported earlier
(thanks Jules !).

Ok to checkin ?

Thanks,
		Stefan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/15ddfba1/attachment.ksh>

From jules at codesourcery.com  Wed May 10 13:40:00 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 10 May 2006 09:40:00 -0400
Subject: [vsipl++] patch: Preallocate FFT buffers in workspace
In-Reply-To: <4461E734.3030309@codesourcery.com>
References: <4461E734.3030309@codesourcery.com>
Message-ID: <4461ED30.40101@codesourcery.com>

Stefan Seefeld wrote:
> The attached patch lets workspaces pre-allocate memory potentially to
> be used in the call operators.
> It also contains a fix for the 1D c->r dft failure, reported earlier
> (thanks Jules !).
> 
> Ok to checkin ?

This looks good, please check it in. -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Wed May 10 21:11:46 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 10 May 2006 15:11:46 -0600
Subject: [patch] Quickstart Guide update for FFT and LAPACK options
Message-ID: <44625712.8010206@codesourcery.com>

The attached patch needs verifying yet, but I was having some trouble 
building the docs.  Thought it would be good to put this up and make 
sure it was technically correct in the meantime.

Don


-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qs.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/a9026d86/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qs.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/a9026d86/attachment-0001.ksh>

From jules at codesourcery.com  Thu May 11 01:46:18 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 10 May 2006 21:46:18 -0400
Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK
 options
In-Reply-To: <44625712.8010206@codesourcery.com>
References: <44625712.8010206@codesourcery.com>
Message-ID: <4462976A.4020402@codesourcery.com>

Don,


Don McCoy wrote:
> The attached patch needs verifying yet, but I was having some trouble 
> building the docs.  Thought it would be good to put this up and make 
> sure it was technically correct in the meantime.
> 
> Don
> 
> 
> 
> ------------------------------------------------------------------------
> 
> 2006-05-10  Don McCoy  <don at codesourcery.com>
> 
> 	* doc/quickstart/quickstart.xml: Updated options for --enable-fft= and
> 	  --with-lapack to reflect recent additions.
> 
> 
> ------------------------------------------------------------------------
> 
> 
> Index: doc/quickstart/quickstart.xml
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/doc/quickstart/quickstart.xml,v
> retrieving revision 1.29
> diff -c -p -r1.29 quickstart.xml
> *** doc/quickstart/quickstart.xml	28 Apr 2006 23:25:43 -0000	1.29
> --- doc/quickstart/quickstart.xml	10 May 2006 21:00:24 -0000
> ***************
> *** 742,759 ****
>        </varlistentry>
>   
>        <varlistentry>
> !       <term><option>--with-fft=<replaceable>lib</replaceable></option></term>
>         <listitem>
>          <para>
>           Search for and use the FFT library indicated by
>           <replaceable>lib</replaceable> to perform FFTs.  Valid
> ! 	choices for <replaceable>lib</replaceable> include
> ! 	<option>fftw3</option>, <option>ipp</option>, and
> !         <option>sal</option>, which select the FFTW3, IPP, and SAL
> !         libraries respectively.  If no FFT library is to be used
> !         (disabling Sourcery VSIPL++'s FFT functionality),
> !         <option>none</option> should be chosen for
> !         <replaceable>lib</replaceable>.
>          </para>
>         </listitem>
>        </varlistentry>
> --- 742,764 ----
>        </varlistentry>
>   
>        <varlistentry>
> !       <term><option>--enable-fft=<replaceable>lib</replaceable></option></term>
>         <listitem>
>          <para>
>           Search for and use the FFT library indicated by
>           <replaceable>lib</replaceable> to perform FFTs.  Valid
> ! 	choices for <replaceable>lib</replaceable> include 
> ! 	<option>fftw3</option>, <option>ipp</option>, and 
> !         <option>sal</option>, which select FFTW3, IPP, and SAL
> !         libraries respectively.  A fourth option, <option>builtin</option>,
> !         selects the FFTW3 library that comes with Sourcery VSIPL++ (default).
> !         This option should be used if an existing FFTW3 library is not available.
> !         If no FFT library is to be used (disabling Sourcery VSIPL++'s FFT 
> !         functionality), <option>no_fft</option> should be chosen for
> !         <replaceable>lib</replaceable>.  Advanced uses may specify 

"Advanced users ..." might scare people off :)  How about say something 
like:

"Multiple libraries may be given as a comma separated list.  When 
performing an FFT, VSIPL++ will use the first library in the list that 
can support the FFT parameters.  For example, on Mercury systems 
<option>--enable-fft=sal,builtin</option> would use SAL's FFT when 
possible, falling back to VSIPL++'s builtin FFTW3 otherwise."
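
As a rough illustration of the "first library in the list that can support the FFT parameters" behavior, here is a hedged sketch. Fft_params, Backend, select_backend and both support rules are hypothetical stand-ins invented for this example; they are not the library's dispatch mechanism, and the rule attached to "sal" is made up purely to show a fallback.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one FFT request.
struct Fft_params
{
  std::size_t size;
  bool is_double;
};

// Hypothetical backend entry: a name plus a "can I handle this?" predicate.
struct Backend
{
  std::string name;
  bool (*supports)(Fft_params const &);
};

// Made-up rule for illustration: single-precision, power-of-two sizes only.
inline bool picky_supports(Fft_params const &p)
{
  return !p.is_double && p.size != 0 && (p.size & (p.size - 1)) == 0;
}

// Made-up rule for illustration: accepts any non-empty request.
inline bool general_supports(Fft_params const &p)
{
  return p.size != 0;
}

// Returns the name of the first backend, in list order, that supports
// the parameters, or an empty string if none does.
inline std::string
select_backend(std::vector<Backend> const &list, Fft_params const &p)
{
  for (std::size_t i = 0; i < list.size(); ++i)
    if (list[i].supports(p))
      return list[i].name;
  return std::string();
}
```

With a list built in the order sal, builtin, a request the first entry rejects falls through to the second, which mirrors the ordering semantics of the comma separated option.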

> !         more than one option separated by commans.  This causes VSIPL++
                                               ^ commas
> !         to attempt to use one FFT library before falling back to 
> !         another if necessary.  Example: --enable-fft=sal,builtin
>          </para>
>         </listitem>
>        </varlistentry>
> ***************
> *** 794,807 ****
>           Search for and use the LAPACK library indicated by
>           <replaceable>lib</replaceable> to perform linear algebra
>           (matrix-vector products and solvers).  Valid choices for
> !         <replaceable>lib</replaceable> include <option>mkl</option>,
> ! 	<option>atlas</option>, <option>generic</option>, and
> ! 	<option>builtin</option>.
>          </para>
>   
>          <para>
> !         <option>mkl</option> selects the Intel Math Kernel Library (MKL)
> ! 	to perform linear algebra if found.
>          </para>
>          <para>
>           <option>atlas</option> selects the ATLAS library
> --- 799,817 ----
>           Search for and use the LAPACK library indicated by
>           <replaceable>lib</replaceable> to perform linear algebra
>           (matrix-vector products and solvers).  Valid choices for

I think we should leave the text as is.  The '--with-lapack=mkl' option 
still works.  Internally configure tries to do the right thing by 
interpreting it as '--with-lapack="mkl7 mkl5"', searching first for MKL 
v7 or v8, then for MKL v5.

I will modify the configure.ac's builtin documentation to match the 
quickstart.

However, we should add 'acml' to the list of lapack libraries.

"<option>acml</option> selects the AMD Core Math Library (ACML) to 
perform linear algebra if found."

Also, if we document --with-mkl-prefix, we should document 
--with-acml-prefix too.

> !         <replaceable>lib</replaceable> include <option>mkl7</option>, 
> !         <option>mkl5</option>, <option>atlas</option>, 
> !         <option>generic</option>, <option>builtin</option>, and
> ! 	<option>fortran-builtin</option>.
>          </para>
>   
>          <para>
> !         <option>mkl7</option> selects the Intel Math Kernel Library (MKL)
> !         version 7.x or above to perform linear algebra if found.
> !        </para>
> !        <para>
> !         <option>mkl5</option> selects the Intel Math Kernel Library (MKL)
> !         version 5.x to perform linear algebra if found.
>          </para>
>          <para>
>           <option>atlas</option> selects the ATLAS library
> ***************
> *** 812,822 ****
>   	(-llapack) to perform linear algebra if found.
>          </para>
>          <para>
> !         <option>builtin</option> selects the builtin version of ATLAS
> !         to perform linear algebra.  This option requires building
> !         ATLAS which can take considerable time and is not supported
> !         on all platforms.  It is only recommended if MKL, ATLAS, or
> ! 	a generic LAPACK or not already installed on the platform.
>          </para>
>         </listitem>
>        </varlistentry>
> --- 822,842 ----
>   	(-llapack) to perform linear algebra if found.
>          </para>
>          <para>
> !         <option>builtin</option> selects the builtin version of 
> !         ATLAS/C-LAPACK to perform linear algebra.  This option 
> !         requires building ATLAS which can take considerable time 
> !         and is not supported on all platforms.  It is only recommended 
> !         if MKL, ATLAS, or a generic LAPACK or not already installed on 
> !         the platform.
> !        </para>
> !        <para>
> !         <option>fortran-builtin</option> selects the builtin version
> !         of ATLAS/F77-LAPACK to perform linear algebra.  Like the 
> !         <option>builtin</option>, this option requires building ATLAS
> !         as well.  In this case, it uses the FORTRAN version instead of
> !         the C version of LAPACK.  Note this option requires the g2c

Note this option requires *a fortran compiler* and the g2c library.

> !         library.  Use the <option>--with-g2c-path=</option> option if
> !         this library is not installed in a standard location.
>          </para>
>         </listitem>
>        </varlistentry>


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Thu May 11 02:16:43 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 10 May 2006 22:16:43 -0400
Subject: [patch] Fft by-reference of const_View and View
Message-ID: <44629E8B.9080605@codesourcery.com>

Stefan,

Here is a test that illustrates the failure I'm seeing with the fft.cpp 
test, along with a patch that fixes it.  Ok to commit?

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fft.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/6cbff94d/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: x-fft.cpp
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/6cbff94d/attachment-0001.ksh>

From jules at codesourcery.com  Thu May 11 03:11:27 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 10 May 2006 23:11:27 -0400
Subject: [patch] Build FFTW on GreenHills/PowerPC/MCOE
Message-ID: <4462AB5F.70009@codesourcery.com>

The following patch adds support for timers available on MCOE.

It also fixes several issues with the object file extension.

			-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cl.fftw
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/917722bc/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftw.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/917722bc/attachment-0001.ksh>

From jules at codesourcery.com  Thu May 11 03:14:30 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 10 May 2006 23:14:30 -0400
Subject: [patch] Disable use of Fortran in ATLAS
Message-ID: <4462AC16.4070607@codesourcery.com>

This patch adds a new '--disable-fortran' configure option to ATLAS.  It 
disables configure's probing of the Fortran API, and it disables 
building of the libf77blas wrapper library.

The top-level configure automatically configures ATLAS with 
--disable-fortran when using the builtin C Lapack (i.e. 
--with-lapack=builtin).

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: atlas.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/e61237d8/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cl.atlas
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/e61237d8/attachment-0001.ksh>

From don at codesourcery.com  Thu May 11 04:37:20 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 10 May 2006 22:37:20 -0600
Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK
 options
In-Reply-To: <4462976A.4020402@codesourcery.com>
References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com>
Message-ID: <4462BF80.7030701@codesourcery.com>

Patch with suggested changes is attached.

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qsg.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/a3948fed/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qsg.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/a3948fed/attachment-0001.ksh>

From don at codesourcery.com  Thu May 11 04:54:15 2006
From: don at codesourcery.com (Don McCoy)
Date: Wed, 10 May 2006 22:54:15 -0600
Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK
 options
In-Reply-To: <4462BF80.7030701@codesourcery.com>
References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> <4462BF80.7030701@codesourcery.com>
Message-ID: <4462C377.8080007@codesourcery.com>

Don McCoy wrote:
> Patch with suggested changes is attached.
> 

I caught a grammatical error and a leftover 'the', so I took the 
opportunity to reword that paragraph slightly.  Please disregard the 
previous patch. :)


-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qsg.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/5523b678/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: qsg.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060510/5523b678/attachment-0001.ksh>

From jules at codesourcery.com  Thu May 11 10:45:43 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 11 May 2006 06:45:43 -0400
Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK
 options
In-Reply-To: <4462C377.8080007@codesourcery.com>
References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> <4462BF80.7030701@codesourcery.com> <4462C377.8080007@codesourcery.com>
Message-ID: <446315D7.7090703@codesourcery.com>

Don McCoy wrote:
> Don McCoy wrote:
>> Patch with suggested changes is attached.
>>
> 
> I caught a grammatical error and a leftover 'the', so I took the 
> opportunity to reword that paragraph slightly.  Please disregard the 
> previous patch. :)

Don, this looks good to me! -- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From stefan at codesourcery.com  Thu May 11 12:09:45 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 11 May 2006 08:09:45 -0400
Subject: [vsipl++] [patch] Fft by-reference of const_View and View
In-Reply-To: <44629E8B.9080605@codesourcery.com>
References: <44629E8B.9080605@codesourcery.com>
Message-ID: <44632989.3010800@codesourcery.com>

Jules,

I'm sorry I didn't get back to you on this earlier. I was unable to reproduce
the failure, which might be because I'm using gcc 4.1, while you are using
3.4.x, right ?
I looked into the issue briefly when you told me initially about a potential
problem with passing const views, but put the issue aside when I noticed
that my tests indeed use const views, yet there was no error. I'm not
sure whether gcc 4.1 is too permissive here, i.e. whether somehow it
manages to create a temporary non-const view out of a const view. That's
something to follow up on later...


Jules Bergmann wrote:
> Stefan,
> 
> Here is a test that illustrates the failure I'm seeing with the fft.cpp 
> test, along with a patch that fixes it.  Ok to commit?

The patch looks good !

Thanks,
		Stefan


From jules at codesourcery.com  Thu May 11 14:56:44 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 11 May 2006 10:56:44 -0400
Subject: [patch] update release.sh
Message-ID: <446350AC.3070505@codesourcery.com>

These changes were made to release.sh for building the solaris snapshot.
release.sh is the script we used to build the 1.0 and subsequent 
snapshots.  It basically runs scripts/package.py after setting the 
environment.

Stefan, release.sh has the approximate paths we should be using for the 
source and binary packages.  There are some hacks, in particular 
gc6.6 and dot disappeared from cugel at some point.  I rebuilt dot in 
~jules/local/graphviz-2.6 and copied gc6.6 into ~jules/build-cugel. 
Perhaps we should build those packages "once and for all" in 
/home/vsiplxx after 1.1 to lock them down.  Also, for solaris, 
~jules/local/sun4/bin has 'pkg-config'.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rel.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060511/2ffb1bed/attachment.ksh>

From don at codesourcery.com  Fri May 12 02:50:24 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 11 May 2006 20:50:24 -0600
Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK
 options
In-Reply-To: <446315D7.7090703@codesourcery.com>
References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> <4462BF80.7030701@codesourcery.com> <4462C377.8080007@codesourcery.com> <446315D7.7090703@codesourcery.com>
Message-ID: <4463F7F0.6010306@codesourcery.com>

Jules Bergmann wrote:
> 
> 
> Don, this looks good to me! -- Jules
> 

Committed.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From jules at codesourcery.com  Fri May 12 12:22:16 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 12 May 2006 08:22:16 -0400
Subject: [patch] Parallel FFTM
Message-ID: <44647DF8.4080405@codesourcery.com>

This patch adds support for Fftm to work in parallel.  At the core, this 
required three minor changes:

  - Have fftm_facade call the workspace with 'view.local()' instead
    of view

    I.e. for by_reference, change call from

	workspace_.by_reference(..., in, out);

    to:

	workspace_.by_reference(..., in.local(), out.local());

  - Have the fftw3 backend use the rows/cols passed from the workspace
    to determine the number of FFTs to perform (instead of the saved
    value mult_).

    (Similar changes may need to be made for the other backends.  I
    will look at SAL next).

  - For by_value Fftm, use the input's map as the map for the output.
    (Also, since Fast_block apparently can't be distributed at the
    moment, use Dense block for distributed results).
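
To illustrate why the backend should take the row count from the data it is handed rather than from the saved value, here is a small sketch. It assumes a simple block distribution of rows over processes; local_rows and transform_rows are hypothetical helpers, and the doubling loop is only a placeholder for a per-row FFT.

```cpp
#include <cstddef>
#include <vector>

// Rows owned by process 'rank' under an assumed block distribution.
// Requires nproc > 0.
std::size_t
local_rows(std::size_t global_rows, std::size_t nproc, std::size_t rank)
{
  std::size_t base = global_rows / nproc;
  std::size_t extra = global_rows % nproc;
  return base + (rank < extra ? 1 : 0);
}

// Stand-in for a by-reference Fftm backend call.  The loop bound is the
// 'rows' argument describing the local data, not a count saved when the
// object was created for the global view.
std::size_t
transform_rows(std::vector<double> &local, std::size_t rows, std::size_t cols)
{
  std::size_t done = 0;
  for (std::size_t r = 0; r < rows; ++r)
  {
    for (std::size_t c = 0; c < cols; ++c)
      local[r * cols + c] *= 2.0;  // placeholder for a per-row FFT
    ++done;
  }
  return done;
}
```

With 10 global rows on 4 processes, each process holds 2 or 3 rows, so a loop bounded by the global count of 10 would run past the end of the local buffer.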

Also included in this patch:

  - Add input checking to fftm_facade.  In particular
     - check that input and output view sizes are correct.
     - check that maps for distributed data are supported.
    This is primarily a "usability" enhancement to detect incorrect usage
    at the root.

  - Fix -Wall warnings (unsigned vs signed comparison) in
    fftw3/fft_impl.hpp

  - (Non FFT related): have configure default to --with-lapack=probe
    if no --with-lapack option given.  (This is consistent with our
    MPI behavior).

  - (Non FFT related): add --with-test-level option to set
    VSIP_IMPL_TEST_LEVEL.


Stefan, OK to apply?

					-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fftm.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060512/19b45132/attachment.ksh>

From jules at codesourcery.com  Fri May 12 19:29:01 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Fri, 12 May 2006 15:29:01 -0400
Subject: [patch] Improvements for distributed split-complex and solver tests
 on mercury.
Message-ID: <4464E1FD.4000503@codesourcery.com>

Fixes some problems encountered when building a parallel library with 
split-complex.

Fixes for solver tests to run on Mercury.  All solver tests pass on 
Mercury with the exception of solver-lu.

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060512/3af0123c/attachment.ksh>

From stefan at codesourcery.com  Sun May 14 00:27:13 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Sat, 13 May 2006 20:27:13 -0400
Subject: patch: remove declarators for unused parameters.
Message-ID: <44667961.8010605@codesourcery.com>

The attached patch removes declarators for unused function parameters,
and fixes a wrong signature of a function forward-declaration.
The patch is checked in.

Regards,
		Stefan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060513/5f513f01/attachment.ksh>

From jules at codesourcery.com  Sun May 14 02:20:07 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sat, 13 May 2006 22:20:07 -0400
Subject: [patch] Last minute patches
Message-ID: <446693D7.1010606@codesourcery.com>

A collection of miscellaneous patches.  Patch applied.	-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060513/6775ed11/attachment.ksh>

From stefan at codesourcery.com  Sun May 14 05:52:45 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Sun, 14 May 2006 01:52:45 -0400
Subject: patch: Add full 3D FFT support.
Message-ID: <4466C5AD.5030102@codesourcery.com>

The attached patch adds full 3D FFT support (to the fftw3 and dft backends),
and cleans up some places I missed in the previous patch.

Regards,
		Stefan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060514/e9c6c185/attachment.ksh>

From jules at codesourcery.com  Sun May 14 06:55:38 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 14 May 2006 02:55:38 -0400
Subject: [patch] Cleanup for release.
Message-ID: <4466D46A.6010401@codesourcery.com>

This patch
  - fixes a bug with Rt_extdata incorrectly attempting to deallocate
    storage.
  - fixes a bug in the handling of column-major data for 2D and 3D
    FFTs with the FFTW3 backend.
  - adds checks on data layout for the 2D and 3D FFTW3 backends.
  - fixes the SAL FFTM backend to use the rows/cols of the data
    being processed to determine the number of FFTs to perform
    (as opposed to the size of the FFTM object when created).
    Necessary for distributed FFTMs.
  - Disables SAL's FFTM evaluator from trying to do long-double
    FFTMs
  - Cleans up fft.cpp test to make the ifdefs a little more manageable.

Patch applied.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From jules at codesourcery.com  Sun May 14 07:13:27 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Sun, 14 May 2006 03:13:27 -0400
Subject: [vsipl++] [patch] Cleanup for release.
In-Reply-To: <4466D46A.6010401@codesourcery.com>
References: <4466D46A.6010401@codesourcery.com>
Message-ID: <4466D897.3090901@codesourcery.com>

Doh!  Forgot the patch -- Jules

Jules Bergmann wrote:
> This patch
>  - fixes a bug with Rt_extdata incorrectly attempting to deallocate
>    storage.
>  - fixes a bug in the handling of column-major data for 2D and 3D
>    FFTs with the FFTW3 backend.
>  - adds checks on data layout for the 2D and 3D FFTW3 backends.
>  - fixes the SAL FFTM backend to use the rows/cols of the data
>    being processed to determine the number of FFTs to perform
>    (as opposed to the size of the FFTM object when created).
>    Necessary for distributed FFTMs.
>  - Disables SAL's FFTM evaluator from trying to do long-double
>    FFTMs
>  - Cleans up fft.cpp test to make the ifdefs a little more manageable.
> 
> Patch applied.
> 
>                 -- Jules
> 


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc3.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060514/66503b4f/attachment.ksh>

From jules at codesourcery.com  Mon May 15 18:56:56 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Mon, 15 May 2006 14:56:56 -0400
Subject: [patch]  Final 1.1 items
Message-ID: <4468CEF8.7000400@codesourcery.com>

FYI, I applied this patch yesterday in preparation for 1.1 release. -- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc5.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060515/6c43948d/attachment.ksh>

From assem at codesourcery.com  Fri May 19 16:01:45 2006
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 19 May 2006 12:01:45 -0400
Subject: Matlab IO
Message-ID: <446DEBE9.9020201@codesourcery.com>

Everyone,
  This patch adds support for Matlab M file output. I only did diff in 
src/vsip_csl  because I'm changing other stuff.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05192006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060519/e70725ee/attachment.ksh>

From mark at codesourcery.com  Fri May 19 16:18:17 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Fri, 19 May 2006 09:18:17 -0700
Subject: [vsipl++] Matlab IO
In-Reply-To: <446DEBE9.9020201@codesourcery.com>
References: <446DEBE9.9020201@codesourcery.com>
Message-ID: <446DEFC9.5090808@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch adds support for Matlab M file output. I only did diff in
> src/vsip_csl  because I'm chainging other stuff.

> +all:: lib/libvsip_csl.a

Is this a new library?  If so, why?  We can put this in the ordinary
VSIPL++ library.

> Index: matlabformatter.hpp
> ===================================================================
> RCS file: matlabformatter.hpp
> diff -N matlabformatter.hpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ matlabformatter.hpp	19 May 2006 15:59:06 -0000
> @@ -0,0 +1,103 @@

You're missing the usual header-file comments, copyright notice, etc.

> +  //template <template <typename,typename> class ViewT>

Do not check in commented-out code.  Be disciplined; review your
changes, and remove any #if 0'd or commented-out code.

> +      MatlabFormatter(ViewT v) : v_(v), view_name_("a")  {}

I don't think it makes sense to write out an imaginary view name.

Matlab m-files are code; there is a Matlab programming language.
Something like:

  a = [ ... ]

is a statement.  But, a view is just an expression, not a whole
statement.  Just write out the part inside the square brackets.

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From stefan at codesourcery.com  Fri May 19 16:22:22 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Fri, 19 May 2006 12:22:22 -0400
Subject: [vsipl++] Matlab IO
In-Reply-To: <446DEFC9.5090808@codesourcery.com>
References: <446DEBE9.9020201@codesourcery.com> <446DEFC9.5090808@codesourcery.com>
Message-ID: <446DF0BE.5060309@codesourcery.com>

Mark Mitchell wrote:
> Assem Salama wrote:
>> Everyone,
>>  This patch adds support for Matlab M file output. I only did diff in
>> src/vsip_csl  because I'm chainging other stuff.
> 
>> +all:: lib/libvsip_csl.a
> 
> Is this a new library?  If so, why?  We can put this in the ordinary
> VISPL++ library.

Actually, we had quite some discussion with Jules about this, and the
agreement was to have libvsip_csl.a. The patch for the GNUmakefile.inc.in
file actually came from my png patch, as there I compile png.cpp into
libvsip_csl.a. The present patch only contains a header...

Regards,
		Stefan


From mark at codesourcery.com  Fri May 19 16:25:40 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Fri, 19 May 2006 09:25:40 -0700
Subject: [vsipl++] Matlab IO
In-Reply-To: <446DF0BE.5060309@codesourcery.com>
References: <446DEBE9.9020201@codesourcery.com> <446DEFC9.5090808@codesourcery.com> <446DF0BE.5060309@codesourcery.com>
Message-ID: <446DF184.5080600@codesourcery.com>

Stefan Seefeld wrote:
> Mark Mitchell wrote:
>> Assem Salama wrote:
>>> Everyone,
>>>  This patch adds support for Matlab M file output. I only did diff in
>>> src/vsip_csl  because I'm chainging other stuff.
>>
>>> +all:: lib/libvsip_csl.a
>>
>> Is this a new library?  If so, why?  We can put this in the ordinary
>> VISPL++ library.
> 
> Actually, we had quite some discussion with Jules about this, and the
> agreement was to have libvsip_csl.a.

OK.  If this has already been settled, I'm happy.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From stefan at codesourcery.com  Fri May 19 16:35:16 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Fri, 19 May 2006 12:35:16 -0400
Subject: [vsipl++] Matlab IO
In-Reply-To: <446DEBE9.9020201@codesourcery.com>
References: <446DEBE9.9020201@codesourcery.com>
Message-ID: <446DF3C4.9020909@codesourcery.com>

Assem,

this looks good. I have some (mostly) stylistic comments:

Assem Salama wrote:

> Index: matlabformatter.hpp

For long composite names I'd find 'matlab_formatter.hpp' more readable.
Also, as there will be a binary matlab format, I think a more descriptive
name for this formatter would be 'Matlab_ascii_formatter'.

> ===================================================================
> RCS file: matlabformatter.hpp
> diff -N matlabformatter.hpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ matlabformatter.hpp	19 May 2006 15:59:06 -0000
> @@ -0,0 +1,103 @@
> +#ifndef VSIP_CSL_MATLABFORMATTER_HPP
> +#define VSIP_CSL_MATLABFORMATTER_HPP
> +
> +#include <string>
> +#include <vsip/support.hpp>
> +
> +/* Declare our classes that we will use for formatting stream output. Note that
> + * these classes will only work for ascii streams
> + */
> +namespace vsip_csl
> +{
> +
> +  //template <template <typename,typename> class ViewT>
> +  template <typename ViewT>
> +  class MatlabFormatter

Same argument here: Isn't this a 'Matlab_ascii_formatter' ? (Note our naming
conventions, i.e. underscore instead of CamelCase.)

> +  {
> +    /* Constructors */
> +    public:
> +      MatlabFormatter(ViewT v) : v_(v), view_name_("a")  {}
> +      MatlabFormatter(ViewT v,std::string name) 
> +        : v_(v), view_name_(name)  {}
> +
> +
> +      MatlabFormatter() {}

I think only providing the constructor taking two arguments is fine.


> +
> +      ~MatlabFormatter() {}
> +
> +    /* Accessors */
> +    public:
> +      ViewT get_view() { return v_; }
> +      std::string get_name() { return view_name_; }

As your class is really just a placeholder, why don't you make it a
struct with only two public members, and a constructor for convenience ?
There is no need to protect these members by accessor methods...

template <typename ViewT>
struct Matlab_text_formatter
{
    Matlab_text_formatter(ViewT v, std::string const &n) : view(v), name(n) {}
    ViewT view;
    std::string name;
};

> +template <typename T,
> +          typename Block0>
> +inline
> +std::ostream&
> +operator<<(
> +  std::ostream&		                          out,
> +  MatlabFormatter<vsip::Matrix<T,Block0> >        mf)
> +  VSIP_NOTHROW

I don't think we want VSIP_NOTHROW here. If you really do mean to assert
that no exception is thrown from within this operator, you should wrap
the whole body in a try block. But I'm not sure why this would be needed.

Also, I think the second argument should be
'MatlabFormatter<const_Matrix<T,Block0> > const &' (note the two consts).
Similarly for the other dimensions.

Do you have an environment to test the generated format ? We should make
sure that matlab actually accepts it.


Regards,
		Stefan


From mark at codesourcery.com  Fri May 19 16:47:58 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Fri, 19 May 2006 09:47:58 -0700
Subject: [vsipl++] Matlab IO
In-Reply-To: <446DF3C4.9020909@codesourcery.com>
References: <446DEBE9.9020201@codesourcery.com> <446DF3C4.9020909@codesourcery.com>
Message-ID: <446DF6BE.4090805@codesourcery.com>

Stefan Seefeld wrote:
> Assem,
> 
> this looks good. I have some (mostly) stylistic comments:
> 
> Assem Salama wrote:
> 
>> Index: matlabformatter.hpp
> 
> For long composite names I'd find 'matlab_formatter.hpp' more readable.
> Also, as there will be a binary matlab format, I think a more descriptive
> name for this formatter would be 'Matlab_ascii_formatter'.

"text" surely?  ASCII is a particular encoding.

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From stefan at codesourcery.com  Fri May 19 16:48:29 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Fri, 19 May 2006 12:48:29 -0400
Subject: [vsipl++] Matlab IO
In-Reply-To: <446DF6BE.4090805@codesourcery.com>
References: <446DEBE9.9020201@codesourcery.com> <446DF3C4.9020909@codesourcery.com> <446DF6BE.4090805@codesourcery.com>
Message-ID: <446DF6DD.4000000@codesourcery.com>

Mark Mitchell wrote:
> Stefan Seefeld wrote:
>> Assem,
>>
>> this looks good. I have some (mostly) stylistic comments:
>>
>> Assem Salama wrote:
>>
>>> Index: matlabformatter.hpp
>> For long composite names I'd find 'matlab_formatter.hpp' more readable.
>> Also, as there will be a binary matlab format, I think a more descriptive
>> name for this formatter would be 'Matlab_ascii_formatter'.
> 
> "text" surely?  ASCII is a particular encoding.

Yup. Sorry.

Regards,
		Stefan


From don at codesourcery.com  Sat May 20 00:14:54 2006
From: don at codesourcery.com (Don McCoy)
Date: Fri, 19 May 2006 18:14:54 -0600
Subject: [patch] HPEC CFAR Detection benchmark
In-Reply-To: <4456AB2F.801@codesourcery.com>
References: <4456AB2F.801@codesourcery.com>
Message-ID: <446E5F7E.5000803@codesourcery.com>

Attached is a revised version of the CFAR benchmark.  Suggestions for 
improving the efficiency have been incorporated (thanks Jules!) and 
support for parallel execution has been added.

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cf2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060519/d22542c8/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cf2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060519/d22542c8/attachment-0001.ksh>

From assem at codesourcery.com  Mon May 22 17:48:34 2006
From: assem at codesourcery.com (Assem Salama)
Date: Mon, 22 May 2006 13:48:34 -0400
Subject: Matlab IO
Message-ID: <4471F972.7070305@codesourcery.com>

Everyone,
  This patch allows a user to output a matrix using the Matlab binary 
format or the Matlab m file format. I'm still working on the vector 
binary format.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ChangeLog.05222006
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060522/46f28618/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05222006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060522/46f28618/attachment-0001.ksh>

From mark at codesourcery.com  Mon May 22 17:54:57 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Mon, 22 May 2006 10:54:57 -0700
Subject: [vsipl++] Matlab IO
In-Reply-To: <4471F972.7070305@codesourcery.com>
References: <4471F972.7070305@codesourcery.com>
Message-ID: <4471FAF1.3090603@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch allows a user to output a matrix using the Matlab binary
> format or the Matlab m file format.

Good!

I think Stefan should review this formally for check-in.

> +  // some structures to helps determine if a type is single precision
> +  template <typename T>
> +  struct Is_single
> +  { static bool const value = false; };
> +
> +  template <>
> +  struct Is_single<float>
> +  { static bool const value = true; };
> +
> +  template <>
> +  struct Is_single<std::complex<float> >
> +  { static bool const value = true; };

Depending on what you mean by "single-precision" this may not be the
right test.  For example, you might mean "32-bit IEEE", which is not
necessarily the same as "float"; on some systems "double" is 32-bits as
well, and on other systems "float" can be 16 bits, or not IEEE format.
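
If "32-bit IEEE" is what is meant, one possible test is sketched below.
The trait name Is_ieee_single is hypothetical (it is not in the patch);
it relies only on std::numeric_limits and CHAR_BIT from the standard
library.

```cpp
#include <climits>
#include <complex>
#include <limits>

// Hypothetical trait: true only when T is stored as a 32-bit IEEE 754
// (IEC 559) value, rather than merely being spelled 'float'.
template <typename T>
struct Is_ieee_single
{
  static bool const value =
    std::numeric_limits<T>::is_iec559 &&
    sizeof(T) * CHAR_BIT == 32;
};

// Complex types follow their scalar type.
template <typename T>
struct Is_ieee_single<std::complex<T> > : Is_ieee_single<T>
{};
```

On a system where "float" is not a 32-bit IEEE type this yields false for
float, so the writer could refuse to emit the data instead of writing a
file that Matlab would misread.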

Were you able to test that the format is correct by loading it into
Matlab (or Octave)?

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From assem at codesourcery.com  Mon May 22 17:57:30 2006
From: assem at codesourcery.com (Assem Salama)
Date: Mon, 22 May 2006 13:57:30 -0400
Subject: [vsipl++] Matlab IO
In-Reply-To: <4471FAF1.3090603@codesourcery.com>
References: <4471F972.7070305@codesourcery.com> <4471FAF1.3090603@codesourcery.com>
Message-ID: <4471FB8A.6040908@codesourcery.com>

Mark,
  Yes, I have tested the formatter with Octave and it works. I don't 
have Matlab but it seems like Octave is compatible in the file loading 
and saving.

Assem Salama

Mark Mitchell wrote:
> Assem Salama wrote:
>   
>> Everyone,
>>  This patch allows a user to output a matrix using the Matlab binary
>> format or the Matlab m file format.
>>     
>
> Good!
>
> I think Stefan should review this formally for check-in.
>
>   
>> +  // some structures to helps determine if a type is single precision
>> +  template <typename T>
>> +  struct Is_single
>> +  { static bool const value = false; };
>> +
>> +  template <>
>> +  struct Is_single<float>
>> +  { static bool const value = true; };
>> +
>> +  template <>
>> +  struct Is_single<std::complex<float> >
>> +  { static bool const value = true; };
>>     
>
> Depending on what you mean by "single-precision" this may not be the
> right test.  For example, you might mean "32-bit IEEE", which is not
> necessarily the same as "float"; on some systems "double" is 32-bits as
> well, and on other systems "float" can be 16 bits, or not IEEE format.
>
> Were you able to test that the format is correct by loading it into
> Matlab (or Octave)?
>
> Thanks,
>
>   


From assem at codesourcery.com  Tue May 23 16:47:15 2006
From: assem at codesourcery.com (Assem Salama)
Date: Tue, 23 May 2006 12:47:15 -0400
Subject: Matalb IO patch
Message-ID: <44733C93.1090204@codesourcery.com>

Everyone,
  This is a new patch with Stefan's suggestions.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05232006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060523/afb56a24/attachment.ksh>

From stefan at codesourcery.com  Tue May 23 17:17:03 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Tue, 23 May 2006 13:17:03 -0400
Subject: [vsipl++] Matalb IO patch
In-Reply-To: <44733C93.1090204@codesourcery.com>
References: <44733C93.1090204@codesourcery.com>
Message-ID: <4473438F.5020300@codesourcery.com>

Assem,

you are on the right track. However, the following issues aren't addressed yet:

* naming issues (redundant 'matlab_' prefixes, as well as some abbreviated
   names such as 'mtrx' and 'hdr', presumably for 'matrix' and 'header')

* dummy constructors with default values that don't appear to be useful

* operator<< not taking const reference for input argument type

* using string::c_str() instead of string::data() (and strcpy instead of strncpy)
   to access content

* using temporary buffer instead of writing directly to output

* Using a C-style cast to dump a compound type as a character buffer,
   which isn't portable (also see Mark's mail concerning float and double
   binary encoding).


Thanks,
		Stefan


From assem at codesourcery.com  Tue May 23 18:57:07 2006
From: assem at codesourcery.com (Assem Salama)
Date: Tue, 23 May 2006 14:57:07 -0400
Subject: Matlab IO patch
Message-ID: <44735B03.1080007@codesourcery.com>

Everyone,
  Some more changes according to Stefan's suggestions.

Assem
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05232006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060523/9bc5b3a0/attachment.ksh>

From assem at codesourcery.com  Wed May 24 10:26:18 2006
From: assem at codesourcery.com (Assem Salama)
Date: Wed, 24 May 2006 06:26:18 -0400
Subject: Matlab IO Patch
Message-ID: <447434CA.9030805@codesourcery.com>

Everyone,
  This patch adds support for tensors.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05242006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060524/b5bef138/attachment.ksh>

From assem at codesourcery.com  Wed May 24 16:54:06 2006
From: assem at codesourcery.com (Assem Salama)
Date: Wed, 24 May 2006 12:54:06 -0400
Subject: Matlab IO Patch
Message-ID: <44748FAE.1080702@codesourcery.com>

Everyone,
  Changes according to Stefan's suggestions.

Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05242006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060524/3adb0b3c/attachment.ksh>

From don at codesourcery.com  Thu May 25 06:58:05 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 25 May 2006 00:58:05 -0600
Subject: [patch] HPEC CFAR and SVD combined patches
Message-ID: <4475557D.2060200@codesourcery.com>

For convenience, I've put together these two recent patches along with 
some other minor changes needed for building the HPEC benchmarks on the 
Mercury system.

Relative to the previously posted patch, the changes to the CFAR 
benchmark were only to correct a usage of a non-const array size and 
(also for SVD) the modification needed to eliminate a sign-change warning 
on the r/wiob_per_point() functions (which was also done to FIR Bank).

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hb.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/b8aa56f5/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hb.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/b8aa56f5/attachment-0001.ksh>

From jules at codesourcery.com  Thu May 25 14:06:46 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 25 May 2006 10:06:46 -0400
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <44748FAE.1080702@codesourcery.com>
References: <44748FAE.1080702@codesourcery.com>
Message-ID: <4475B9F6.9080203@codesourcery.com>

Assem,

Can you take a look at generalizing the functions and structures to deal 
with arbitrary dimensions, rather than having separate instances for 
each?  Things like:

  - common struct for matlab header (instead of matrix, tensor, etc).

  - common write routine

I'm not sure how vector fits in, since it looks like matlab treats it as 
a special case of matrix.

				-- Jules

Assem Salama wrote:

> +
> +  struct matrix
> +  {
> +    data_element header;
> +    data_element array_flags_header;
> +    char array_flags[8];
> +    data_element dim_header;
> +    int32_t dim1;
> +    int32_t dim2;
> +    data_element array_name_header;
> +  };
> +
> +  struct tensor
> +  {
> +    data_element header;
> +    data_element array_flags_header;
> +    char array_flags[8];
> +    data_element dim_header;
> +    int32_t dim1;
> +    int32_t dim2;
> +    int32_t dim3;
> +    int32_t pad;
> +    data_element array_name_header;
> +  };

To generalize this, how about something like:

template <vsip::dimension_type Dim>
struct matlab_header
{
     data_element header;
     data_element array_flags_header;
     char array_flags[8];
     data_element dim_header;
     int32_t dim[Dim];
     int32_t pad[Dim%2];
     data_element array_name_header;
};

> +
> +  // some structures to helps determine if a type is single precision
> +  template <typename T>
> +  struct Is_single
> +  { static bool const value = false; };
> +
> +  template <>
> +  struct Is_single<float>
> +  { static bool const value = true; };
> +
> +  template <>
> +  struct Is_single<std::complex<float> >
> +  { static bool const value = true; };

If Is_single<complex<T> >::value is always the same as 
Is_single<T>::value, then the following is preferable since it avoids 
the duplicated entries for 'float' and 'complex<float>':

template <typename T>
struct Is_single<std::complex<T> >
   : Is_single<T>
{};

However, judging from how Is_single is used (to determine an enum to 
indicate the value type of elements in a view), we need something more 
general to deal with types other than float and double (i.e. we will 
want to read/write views of int, short, etc).  A traits class that 
converts a C++ type into a matlab enum would work well for this:

// General case.
template <typename T>
struct Matlab_type_traits;


// Complex types reduce to same value as scalar_type.
template <typename T>
struct Matlab_type_traits<std::complex<T> >
   : Matlab_type_traits<T>
{};

template <>
struct Matlab_type_traits<float>
{
   static int const data_type  = miSINGLE;
   static int const class_type = mxSINGLE_CLASS;
};

... double

... int

... etc

For int, we need some support from configure.ac to determine whether int 
is 32 bits (miINT32) or 64 bits (miINT64).
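
Filling in the float and double cases, the traits could look roughly like
the sketch below. The numeric values of the constants are assumptions
taken from the published MAT-file (Level 5) format description and should
be checked against the definitions in the patch.

```cpp
#include <complex>

namespace matlab
{
// Assumed values; verify against the MAT-file format documentation.
int const miSINGLE       = 7;
int const miDOUBLE       = 9;
int const mxDOUBLE_CLASS = 6;
int const mxSINGLE_CLASS = 7;

// General case: left undefined so unsupported types fail at compile time.
template <typename T>
struct Matlab_type_traits;

// Complex types reduce to the same values as their scalar type.
template <typename T>
struct Matlab_type_traits<std::complex<T> >
  : Matlab_type_traits<T>
{};

template <>
struct Matlab_type_traits<float>
{
  static int const data_type  = miSINGLE;
  static int const class_type = mxSINGLE_CLASS;
};

template <>
struct Matlab_type_traits<double>
{
  static int const data_type  = miDOUBLE;
  static int const class_type = mxDOUBLE_CLASS;
};
} // namespace matlab
```

The write routine can then use Matlab_type_traits<T>::data_type directly
instead of branching on Is_single.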

> +
> +  struct header
> +  {
> +    char description[116];
> +    char subsyt_data[8];
> +    char version[2];
> +    char endian[2];
> +  };
> +
> +  // constants for matlab binary format
> +
> +  // data types
> +  const int miINT8           = 1;

Coding standard point: we prefer to put the type before the 'const' 
(i.e. 'int const' instead of 'const int').  For simple types like 'int' 
they are equivalent, but for pointer and reference types, the location 
of the const changes the meaning, i.e. 'const int*' == 'int const*' != 
'int* const'.
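
A small compile-time check of that equivalence, as a sketch (it assumes a
compiler with static_assert and <type_traits>):

```cpp
#include <type_traits>

// 'const int*' and 'int const*' name the same type: pointer to const int.
static_assert(std::is_same<const int*, int const*>::value,
              "const on either side of 'int' is equivalent");

// 'int* const' is a different type: a const pointer to mutable int.
static_assert(!std::is_same<int const*, int* const>::value,
              "const after '*' applies to the pointer itself");
```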

> +  const int miUINT8          = 2;


We should generalize this function to work with arbitrary dimension 
views (i.e. vectors, matrices, and tensors).

> +// operator to write tensor to matlab file
> +template <typename T,
> +          typename Block0>
> +inline
> +std::ostream&
> +operator<<(
> +  std::ostream&                                               o,
> +  Matlab_bin_formatter<vsip::const_Tensor<T,Block0> >const&   mbf)
> +{
> +  typedef typename vsip::impl::Scalar_of<T>::type scalar_type;
> +  matlab::data_element temp_data_el;
> +  int    num_points = mbf.v.size(0)*mbf.v.size(1)*mbf.v.size(2);
> +  int    sz;
> +  matlab::tensor m_tensor;
> +
> +  memset(&m_tensor,0,sizeof(m_tensor));
> +
> +  // matrix data type
> +  m_tensor.header.type = matlab::miMATRIX;
> +  m_tensor.header.size = 1; // TEMP
> +
> +  // array flags
> +  m_tensor.array_flags_header.type = matlab::miUINT32;
> +  m_tensor.array_flags_header.size = 8;
> +  if(vsip::impl::Is_complex<T>::value) 
> +    m_tensor.array_flags[1] |= 0x8; // Complex
> +  if(matlab::Is_single<T>::value)
> +    m_tensor.array_flags[0] = matlab::mxSINGLE_CLASS; // single precision
> +  else
> +    m_tensor.array_flags[0] = matlab::mxDOUBLE_CLASS; // double precision
> +  
> +  // dimension sizes
> +  m_tensor.dim_header.type = matlab::miINT32;
> +  m_tensor.dim_header.size = 12;
> +  m_tensor.dim1 = mbf.v.size(0);
> +  m_tensor.dim2 = mbf.v.size(1);
> +  m_tensor.dim3 = mbf.v.size(2);
> +
> +  // array name
> +  m_tensor.array_name_header.type = matlab::miINT8;
> +  m_tensor.array_name_header.size = mbf.view_name.length();
> +
> +
> +  // calculate size
> +  sz = sizeof(m_tensor)-8;
> +  sz += mbf.view_name.length();
> +  sz += (8-mbf.view_name.length())&0x7;
> +  sz += 8; // 8 bytes of header for real data
> +  if(vsip::impl::Is_complex<T>::value) sz += 8; // 8 more for complex data
> +  sz += num_points*sizeof(T);
> +  m_tensor.header.size = sz;
> +
> +  o.write(reinterpret_cast<char*>(&m_tensor),sizeof(m_tensor));
> +
> +  // write array name
> +  o.write(mbf.view_name.c_str(),mbf.view_name.length());
> +  // pad

Can you be more specific about the padding requirements?  I.e.

	// Pad this array to 8-byte boundary.
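
The padding rule could also be pulled into a small helper so the size
calculation and the write loop share it. This is only a sketch; pad_to_8
is a hypothetical name, not something in the patch.

```cpp
#include <cstddef>

// Hypothetical helper: number of zero bytes needed after 'length' bytes
// to reach the next 8-byte boundary (0 if already aligned).
inline std::size_t
pad_to_8(std::size_t length)
{
  return (8 - (length % 8)) % 8;
}
```

For unsigned lengths this gives the same result as the patch's
(8-length)&0x7 expression.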

> +  { 
> +    char c=0;
> +    for(int i=0;i < ((8-mbf.view_name.length())&0x7);i++) o.write(&c,1);
> +  }
> +
> +  // write real data
> +  if(matlab::Is_single<T>::value)
> +    temp_data_el.type = matlab::miSINGLE;
> +  else
> +    temp_data_el.type = matlab::miDOUBLE;
> +
> +  temp_data_el.size = sizeof(scalar_type)*num_points;
> +  o.write(reinterpret_cast<char*>(&temp_data_el),sizeof(temp_data_el));
> +
> +  {
> +    scalar_type real_data;
> +

Instead of explicitly handling each dimension, you could use the 
Index/Extent bits here.  The performance is slightly worse, but it 
shouldn't be noticeable on top of performing IO an element at a time.

To improve performance, we should check if data is in right format 
(dense, col-major for non-complex; dense, col-major, split for complex) 
for using Ext_data.  If it is, we can write the data with one or two 
large writes.
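
As a rough illustration of a dimension-independent loop, the sketch below
steps an index through all positions in column-major order. It uses
std::vector rather than the library's Index/Extent types so that it
stands alone; next_col_major is a hypothetical name, and all extents are
assumed to be non-zero.

```cpp
#include <cstddef>
#include <vector>

// Advance 'idx' to the next position in column-major order (the first
// dimension varies fastest).  Returns false once every position has
// been visited and 'idx' has wrapped back to all zeros.
inline bool
next_col_major(std::vector<std::size_t>&       idx,
               std::vector<std::size_t> const& extent)
{
  for (std::size_t d = 0; d < idx.size(); ++d)
  {
    if (++idx[d] < extent[d])
      return true;   // no carry needed
    idx[d] = 0;      // carry into the next dimension
  }
  return false;
}
```

A single do/while loop around this replaces the nested loops for
vectors, matrices and tensors alike.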

> +    // Matlab wants data in col major format
> +    for(vsip::length_type i=0;i<mbf.v.size(2);i++) {
> +      for(vsip::length_type j=0;j<mbf.v.size(1);j++) {
> +        for(vsip::length_type k=0;k<mbf.v.size(0);k++) {
> +          real_data = vsip::impl::fn::impl_real(mbf.v.get(k,j,i));
> +          o.write(reinterpret_cast<char*>(&real_data),sizeof(real_data));
> +	}
> +      }
> +    }
> +  }
> +
> +  if(!vsip::impl::Is_complex<T>::value) return o; // we are done here
> +
> +  // write imaginary data
> +  if(matlab::Is_single<T>::value)
> +    temp_data_el.type = matlab::miSINGLE;
> +  else
> +    temp_data_el.type = matlab::miDOUBLE;
> +
> +  temp_data_el.size = sizeof(scalar_type)*num_points;
> +  o.write(reinterpret_cast<char*>(&temp_data_el),sizeof(temp_data_el));
> +
> +  {
> +    scalar_type imag_data;
> +
> +    // Matlab wants data in col major format
> +    for(vsip::length_type i=0;i<mbf.v.size(2);i++) {
> +      for(vsip::length_type j=0;j<mbf.v.size(1);j++) {
> +        for(vsip::length_type k=0;k<mbf.v.size(0);k++) {
> +          imag_data = vsip::impl::fn::impl_imag(mbf.v.get(k,j,i));
> +          o.write(reinterpret_cast<char*>(&imag_data),sizeof(imag_data));
> +	}
> +      }
> +    }
> +  }
> +
> +  return o;
> +}
> +

> +
> +// operator to write vector to matlab file
> +template <typename T,
> +          typename Block0>
> +inline
> +std::ostream&
> +operator<<(
> +  std::ostream&                                               o,
> +  Matlab_bin_formatter<vsip::const_Vector<T,Block0> > const&  mbf)
> +{

This function will go away as we merge the write functions together 
(although it looks like handling vectors will require some special case 
logic).

> +  // A vector is treated like a mx1 matrix
> +  vsip::Matrix<T> m(1,mbf.v_.size(0));
> +  m.row(0) = mbf.v_;
> +  return o << Matlab_bin_formatter<vsip::Matrix<T> >(m,mbf.view_name_);
> +}
> +


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From mark at codesourcery.com  Thu May 25 14:46:44 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 25 May 2006 07:46:44 -0700
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <4475B9F6.9080203@codesourcery.com>
References: <44748FAE.1080702@codesourcery.com> <4475B9F6.9080203@codesourcery.com>
Message-ID: <4475C354.5030302@codesourcery.com>

Jules Bergmann wrote:

>    int32_t pad[Dim%2];
>    data_element array_name_header;

Zero-element arrays are not valid ISO C++.  However, if data_element is
a 64-bit type, then it will be aligned to an 8-byte boundary on most
systems.

> For int, we need some support from configure.ac to determine whether int is 32 bits (miINT32) or 64 bits (miINT64). 

Why not just:

template <>
struct Matlab_type_traits<int>
{
  static int const data_type = (sizeof (int) == 4) ? miINT32 : miINT64;
};


-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Thu May 25 15:03:14 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 25 May 2006 11:03:14 -0400
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <4475C354.5030302@codesourcery.com>
References: <44748FAE.1080702@codesourcery.com> <4475B9F6.9080203@codesourcery.com> <4475C354.5030302@codesourcery.com>
Message-ID: <4475C732.7000108@codesourcery.com>

Mark Mitchell wrote:
> Jules Bergmann wrote:
> 
>>    int32_t pad[Dim%2];
>>    data_element array_name_header;
> 
> Zero-element arrays are not valid ISO C++.  However, if data_element is
> a 64-bit type, then it will be aligned to an 8-byte boundary on most
> systems.

Thanks Mark for pointing that out, I was worried that zero-element 
arrays might not work :(

data_element is a struct containing two 32-bit types, will that be 
aligned to 64-bits?

If we rely on compiler to do the padding for us, we definitely should 
capture this with an assertion so that we know if it ever breaks.

Alternatively, we could use a union to do the padding, something like:

   union
   {
     int32_t dim[Dim];
     int32_t pad[Dim + Dim%2];
   };

> 
>> For int, we need some support from configure.ac to determine whether int is 32 bits (miINT32) or 64 bits (miINT64). 
> 
> Why not just:
> 
> template <>
> struct Matlab_type_traits<int>
> {
>   static int const data_type = (sizeof (int) == 4) ? miINT32 : miINT64;
> };

Good suggestion, thanks!

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From mark at codesourcery.com  Thu May 25 15:28:18 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 25 May 2006 08:28:18 -0700
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <4475C732.7000108@codesourcery.com>
References: <44748FAE.1080702@codesourcery.com> <4475B9F6.9080203@codesourcery.com> <4475C354.5030302@codesourcery.com> <4475C732.7000108@codesourcery.com>
Message-ID: <4475CD12.60606@codesourcery.com>

Jules Bergmann wrote:
> Mark Mitchell wrote:
>> Jules Bergmann wrote:
>>
>>>    int32_t pad[Dim%2];
>>>    data_element array_name_header;
>>
>> Zero-element arrays are not valid ISO C++.  However, if data_element is
>> a 64-bit type, then it will be aligned to an 8-byte boundary on most
>> systems.
> 
> Thanks Mark for pointing that out, I was worried that zero-element
> arrays might not work :(
> 
> data_element is a struct containing two 32-bit types, will that be
> aligned to 64-bits?

No...  There's a GCC extension you can use (__attribute__((aligned))), but
that won't work on all compilers.  You could use:

  int32_t dim[2 * ((Dim + 1) / 2)];

which puts the padding into the array.
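
Put together with the assertion idea from earlier in the thread, the
header might look like the sketch below. The data_element layout is an
assumption (the thread only says it holds two 32-bit values), and the
expected sizes assume the compiler inserts no extra padding; if it does,
the static_asserts fail at compile time, which is the point.

```cpp
#include <cstdint>

// Assumed layout: two 32-bit fields, as described in the thread.
struct data_element
{
  std::int32_t type;
  std::int32_t size;
};

template <int Dim>
struct matlab_header
{
  data_element header;
  data_element array_flags_header;
  char         array_flags[8];
  data_element dim_header;
  std::int32_t dim[2 * ((Dim + 1) / 2)];  // rounded up to an even count
  data_element array_name_header;
};

// Catch any compiler-inserted padding that would break the file layout.
static_assert(sizeof(matlab_header<2>) == 48, "unexpected matrix header size");
static_assert(sizeof(matlab_header<3>) == 56, "unexpected tensor header size");
```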

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Thu May 25 15:30:18 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 25 May 2006 11:30:18 -0400
Subject: [vsipl++] [patch] HPEC CFAR and SVD combined patches
In-Reply-To: <4475557D.2060200@codesourcery.com>
References: <4475557D.2060200@codesourcery.com>
Message-ID: <4475CD8A.4000306@codesourcery.com>

Don McCoy wrote:
> For convenience, I've put together these two recent patches along with 
> some other minor changes needed for building the HPEC benchmarks on the 
> Mercury system.
> 
> Relative to the previously posted patch, the changes to the CFAR 
> benchmark were only to correct a usage of a non-const array size and 
> (also for SVD) the modification needed to eliminate a sign-change warning 
> on the r/wiob_per_point() functions (which was also done to FIR Bank).

Don,

This looks good, please check it in (with one exception, see below).

				-- Jules

> 	* src/vsip/signal-window.cpp: Added instantiations of three versions
> 	  of vsip::impl::cost(Block, LP) to include these in the library
> 	  (because of the way the GreenHills compiler behaves).


What benchmark failed to build because of these missing instantiations? 
  We may need to fix this another way (making vsip::impl::cost inline is 
the most likely solution) if the set of instantiations needs to grow 
further.

> Index: src/vsip/signal-window.cpp
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/src/vsip/signal-window.cpp,v
> retrieving revision 1.8
> diff -c -p -r1.8 signal-window.cpp
> *** src/vsip/signal-window.cpp	14 May 2006 20:57:05 -0000	1.8
> --- src/vsip/signal-window.cpp	25 May 2006 06:43:17 -0000
> *************** kaiser( length_type len, scalar_f beta )
> *** 231,235 ****
> --- 231,241 ----
>   #pragma instantiate bool vsip::impl::data_access::is_direct_ok<impl::Fast_block<1, complex<float>, impl::Layout<1, row1_type, impl::Stride_unit_dense, impl::Cmplx_split_fmt>, Local_map>, impl::Rt_layout<1> >(const impl::Fast_block<1, complex<float>, impl::Layout<1, row1_type, impl::Stride_unit_dense, impl::Cmplx_split_fmt>, Local_map> &, const impl::Rt_layout<1>&)
>   #endif
>   
> + #pragma instantiate int vsip::impl::cost<vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unknown, vsip::impl::Cmplx_inter_fmt>, vsip::impl::Component_block<vsip::Dense<1, std::complex<float>, vsip::tuple<0, 1, 2>, vsip::Local_map>, vsip::impl::Real_extractor> >(  vsip::impl::Component_block<vsip::Dense<1, std::complex<float>, vsip::tuple<0, 1, 2>, vsip::Local_map>, vsip::impl::Real_extractor> const& block, vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unknown, vsip::impl::Cmplx_inter_fmt> const& layout)
> + 
> + #pragma instantiate int vsip::impl::cost<vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unit_dense, vsip::impl::Cmplx_inter_fmt>, vsip::Dense<1, std::complex<float>, vsip::tuple<0, 1, 2>, vsip::Local_map> >(const vsip::Dense<1, std::complex<float>, vsip::tuple<0, 1, 2>, vsip::Local_map> & block, const vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unit_dense, vsip::impl::Cmplx_inter_fmt> & layout)
> + 
> + #pragma instantiate int vsip::impl::cost<vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unit_dense, vsip::impl::Cmplx_inter_fmt>, vsip::impl::Fast_block<1, std::complex<float>, vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unit_dense, vsip::impl::Cmplx_inter_fmt>, vsip::Local_map> >(const vsip::impl::Fast_block<1, std::complex<float>, vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unit_dense, vsip::impl::Cmplx_inter_fmt>, vsip::Local_map> & block, const vsip::impl::Layout<1, vsip::tuple<0, 1, 2>, vsip::impl::Stride_unit_dense, vsip::impl::Cmplx_inter_fmt> & layout)
> + 
>   } // namespace vsip


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Thu May 25 16:30:24 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 25 May 2006 10:30:24 -0600
Subject: [vsipl++] [patch] HPEC CFAR and SVD combined patches
In-Reply-To: <4475CD8A.4000306@codesourcery.com>
References: <4475557D.2060200@codesourcery.com> <4475CD8A.4000306@codesourcery.com>
Message-ID: <4475DBA0.9040106@codesourcery.com>

Jules Bergmann wrote:
> 
> What benchmark failed to build because of these missing instantiations? 
>  We may need to fix this another way (making vsip::impl::cost inline is 
> the most likely solution) if the set of instantiations needs to grow 
> further.
> 
It was the FIR Bank benchmark initially, but the standard FFT one 
complained about these missing symbols as well.


-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712


From jules at codesourcery.com  Thu May 25 16:36:23 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 25 May 2006 12:36:23 -0400
Subject: [patch] Fix quickstart doc typos, fix configure --disable-fft
Message-ID: <4475DD07.8020007@codesourcery.com>

This patch fixes several typos in the quickstart section on mercury 
configuration.

It also fixes configure.ac to recognize --disable-fft as a synonym for 
--enable-fft= (with no backend).

Stefan, does this look OK?  Or should we make --disable-fft a synonym 
for --enable-fft=no_fft ?

It also includes several 1.1 bits (document the tag in VERSIONS, minor 
scripts changes).

				-- Jules
-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: misc.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/ac7e11a7/attachment.ksh>

From mark at codesourcery.com  Thu May 25 16:42:43 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 25 May 2006 09:42:43 -0700
Subject: [vsipl++] [patch] Fix quickstart doc typos, fix configure --disable-fft
In-Reply-To: <4475DD07.8020007@codesourcery.com>
References: <4475DD07.8020007@codesourcery.com>
Message-ID: <4475DE83.5020803@codesourcery.com>

Jules Bergmann wrote:

> It also fixes configure.ac to recognize --disable-fft as a synonym for
> --enable-fft= (with no backend).

Personally, I'd make --disable-fft an error.  Or, if that bothers you,
I'd make it a synonym for no_fft.

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From jules at codesourcery.com  Thu May 25 17:07:08 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Thu, 25 May 2006 13:07:08 -0400
Subject: [vsipl++] [patch] Fix quickstart doc typos, fix configure --disable-fft
In-Reply-To: <4475DE83.5020803@codesourcery.com>
References: <4475DD07.8020007@codesourcery.com> <4475DE83.5020803@codesourcery.com>
Message-ID: <4475E43C.3020800@codesourcery.com>

Mark Mitchell wrote:
> Jules Bergmann wrote:
> 
>> It also fixes configure.ac to recognize --disable-fft as a synonym for
>> --enable-fft= (with no backend).
> 
> Personally, I'd make --disable-fft an error.  Or, if that bothers you,
> I'd make it a synonym for no_fft.
> 

That doesn't bother me.  If we do that, we should rename the option to 
'--with-fft', since the current '--enable-fft' implies that it could be 
disabled.

At a high-level, configure would behave like:

  - by default, the library provides FFT support, using the builtin FFT
    backend and perhaps SAL if we detect its existence

    i.e. on most platforms the following option is implied

	--with-fft=builtin

    and on mercury platforms the following option is implied

	--with-fft=sal,builtin

  - users that want to use a different backend can specify it manually:

    for example:

	--with-fft=ipp

    or

	--with-fft=ipp,fftw3

    Optionally, we can have configure infer which backends to use
    from the options indicating the presence of libraries
    such as IPP.

  - power users that want to disable fft can do so either by specifying
    no backends:

	--with-fft=

    (which causes FFT usage to fail at compile-time)

    or by specifying the empty backend:

	--with-fft=no_fft


How does this sound?

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From don at codesourcery.com  Thu May 25 19:03:56 2006
From: don at codesourcery.com (Don McCoy)
Date: Thu, 25 May 2006 13:03:56 -0600
Subject: [vsipl++] [patch] HPEC CFAR and SVD combined patches
In-Reply-To: <4475CD8A.4000306@codesourcery.com>
References: <4475557D.2060200@codesourcery.com> <4475CD8A.4000306@codesourcery.com>
Message-ID: <4475FF9C.5050409@codesourcery.com>

Jules Bergmann wrote:
> This looks good, please check it in (with one exception, see below).
> 
>                 -- Jules
> 
>>     * src/vsip/signal-window.cpp: Added instantiations of three versions
>>       of vsip::impl::cost(Block, LP) to include these in the library
>>       (because of the way the GreenHills compiler behaves).
> 
> 
> 
> What benchmark failed to build because of these missing instantiations? 
>  We may need to fix this another way (making vsip::impl::cost inline is 
> the most likely solution) if the set of instantiations needs to grow 
> further.
> 

Making that function inline worked.  Checked in.

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hb2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/538eb37d/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hb2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/538eb37d/attachment-0001.ksh>

From mark at codesourcery.com  Thu May 25 19:35:14 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Thu, 25 May 2006 12:35:14 -0700
Subject: [vsipl++] [patch] Fix quickstart doc typos, fix configure --disable-fft
In-Reply-To: <4475E43C.3020800@codesourcery.com>
References: <4475DD07.8020007@codesourcery.com> <4475DE83.5020803@codesourcery.com> <4475E43C.3020800@codesourcery.com>
Message-ID: <447606F2.20205@codesourcery.com>

Jules Bergmann wrote:

>  - power users that want to disable fft can do so either by specifying
>    no backends:
> 
>     --with-fft=
> 
>    (which causes FFT usage to fail at compile-time)
> 
>    or by specifying the empty backend:
> 
>     --with-fft=no_fft
> 
> 
> How does this sound?

That sounds fine to me.  However, if you want to have the mode where FFT
fails at compile-time, then I think --disable-fft is fine; I didn't
realize that might be useful.  In other words, I'm also OK with the
original proposal, now that I understand it better.  I have no real
preference.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From assem at codesourcery.com  Fri May 26 00:19:12 2006
From: assem at codesourcery.com (Assem Salama)
Date: Thu, 25 May 2006 20:19:12 -0400
Subject: Matlab IO Patch
Message-ID: <44764980.2000205@codesourcery.com>

Everyone,
  This new patch is a partial rewrite of the previous patches and is 
more generic. The reading side is almost done; I'm still cleaning up 
that code and making it more generic.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05252006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/99e75822/attachment.ksh>

From stefan at codesourcery.com  Fri May 26 00:50:40 2006
From: stefan at codesourcery.com (Stefan Seefeld)
Date: Thu, 25 May 2006 20:50:40 -0400
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <44764980.2000205@codesourcery.com>
References: <44764980.2000205@codesourcery.com>
Message-ID: <447650E0.60003@codesourcery.com>

Assem Salama wrote:

[...]

> +  template <int Dim>
> +  struct view_header
> +  {
> +    data_element header;
> +    data_element array_flags_header;
> +    char array_flags[8];
> +    data_element dim_header;
> +    int32_t dim[Dim + Dim%2];

It would be good to add a little comment with the rationale for
this 'Dim + Dim % 2' expression.
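
As a rough illustration of the likely rationale (assuming each dimension
is stored as a 4-byte integer and the dimension array has to end on an
8-byte boundary), rounding the entry count up to an even number keeps
the array a multiple of 8 bytes.  The names Padded_dim_count, Dim_array
and dim_array_bytes below are made up for this sketch and are not part
of the patch:

```cpp
#include <stdint.h>
#include <cstddef>

// Round the number of 4-byte dimension entries up to an even count, so
// that the array occupies a whole number of 8-byte units.
template <int Dim>
struct Padded_dim_count
{
  static int const value = Dim + Dim % 2;
};

// Hypothetical stand-in for the 'dim' member of view_header.
template <int Dim>
struct Dim_array
{
  int32_t dim[Padded_dim_count<Dim>::value];
};

// Size in bytes of the padded dimension array for a Dim-dimensional view.
template <int Dim>
std::size_t dim_array_bytes()
{
  return sizeof(Dim_array<Dim>);
}
```

For Dim = 1 and Dim = 3 one padding entry is added; for Dim = 2 none is
needed.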

> +    data_element array_name_header;
> +  };
> +
> +  // some structures to helps determine if a type is single precision
> +  template <typename T>
> +  struct Is_single
> +  { static bool const value = false; };

By the way, wouldn't it make sense to add these Is_single helpers to
metaprogramming.hpp ?


> +
> +  template <>
> +  struct Is_single<float>
> +  { static bool const value = true; };
> +
> +  template <>
> +  struct Is_single<std::complex<float> >
> +  { static bool const value = true; };
> +
> +  // a generic reader that allows us to read a generic type and cast to another
> +  template<typename T1,typename T2>
> +  struct Generic_reader
> +  {
> +    // the read function
> +    template <typename T,
> +	      typename Block0>
> +    void read(std::istream& is,vsip::Matrix<T,Block0> m)
> +    {
> +      for(int i=0;i<m.size(1);i++) {
> +        for(int j=0;j<m.size(0);j++) {
> +          is.read(reinterpret_cast<char*>(&data),sizeof(data));
> +	  converted_data = data;
> +	  m.put(j,i,converted_data);
> +	}
> +      }
> +    }
> +
> +    T1 data;
> +    T2 converted_data;
> +  };

You define three distinct types here (T, T1, and T2). I can see the
need for two (if you are reading in a record of floats, but want to
create a block of double, say), what about the third ?
And a related question: what is the use of 'data' and 'converted_data'
outside the read() function ? Why do you use a struct, instead of
simply a function ? For example:

template <typename T1, typename T2, typename BlockT>
inline void
read_matlab_binary(std::istream &is, Matrix<T2, BlockT> m)
{
   for(int i=0;i<m.size(1);i++)
     for(int j=0;j<m.size(0);j++)
     {
       T1 data;
       is.read(reinterpret_cast<char*>(&data),sizeof(data));
       m.put(j,i,T2(data));
     }
}


which would be used as

Matrix<double> m(rows, cols);
read_matlab_binary<float>(input, m);

[...]

> +  template <typename ViewT>
> +  struct Matlab_bin_formatter
> +  {
> +    Matlab_bin_formatter(ViewT v,std::string const& name) :
> +      v(v), view_name(name)  {}
> +
> +    ViewT v;
> +    std::string view_name;

What about 'view' and 'name' instead of 'v' and 'view_name' ?
Also, this whole struct could be moved into the 'matlab' namespace,
maybe as 'Binary_formatter'...

> +
> +  };
> +
> +  struct Matlab_bin_hdr

...and this too, as 'Binary_header'.

Regards,
		Stefan


From assem at codesourcery.com  Fri May 26 01:09:55 2006
From: assem at codesourcery.com (Assem Salama)
Date: Thu, 25 May 2006 21:09:55 -0400
Subject: Funny
Message-ID: <44765563.3000101@codesourcery.com>

I think this is so us :)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: funny.JPG
Type: image/jpeg
Size: 34761 bytes
Desc: not available
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060525/ff300578/attachment.jpe>

From assem at codesourcery.com  Fri May 26 20:09:38 2006
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 26 May 2006 16:09:38 -0400
Subject: Matlab IO Patch
Message-ID: <44776082.9060406@codesourcery.com>

Everyone,
  This patch adds the support for reading in views from a matlab binary 
file. Note that this will only work if dims and types match. It doesn't 
create a matrix suitable for the current data set.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cvs.diff.05262006.1.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060526/55541f00/attachment.ksh>

From mark at codesourcery.com  Fri May 26 20:23:50 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Fri, 26 May 2006 13:23:50 -0700
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <44776082.9060406@codesourcery.com>
References: <44776082.9060406@codesourcery.com>
Message-ID: <447763D6.8040407@codesourcery.com>

Assem Salama wrote:

> +  // helper struct to get the imaginary part of a view.
> +  template <typename ViewT,
> +            bool IsComplex =
> +	      vsip::impl::Is_complex<typename ViewT::value_type>::value>
> +  struct Subview_helper;

If this is going to come up in other situations (and I bet it will!) I
think we should just add "impl_real" and "impl_imag" to all of our
views.  For complex views, these would just forward to the standard
real/imag functions; for other views, they would return the view unchanged.
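
A minimal sketch of the idea, using a made-up Toy_view that owns its
data (real VSIPL++ views reference a block instead, and their real/imag
subviews do not copy).  Only the names impl_real and impl_imag come from
the suggestion above; everything else is an assumption for illustration:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a view; it simply owns a vector of values.
template <typename T>
struct Toy_view
{
  std::vector<T> data;

  // Non-complex case: both accessors return the view unchanged.
  Toy_view impl_real() const { return *this; }
  Toy_view impl_imag() const { return *this; }
};

// Complex case: forward to the real and imaginary components.
template <typename T>
struct Toy_view<std::complex<T> >
{
  std::vector<std::complex<T> > data;

  Toy_view<T> impl_real() const
  {
    Toy_view<T> r;
    for (std::size_t i = 0; i < data.size(); ++i)
      r.data.push_back(data[i].real());
    return r;
  }

  Toy_view<T> impl_imag() const
  {
    Toy_view<T> r;
    for (std::size_t i = 0; i < data.size(); ++i)
      r.data.push_back(data[i].imag());
    return r;
  }
};
```

Generic code can then call v.impl_real() and v.impl_imag() without
checking Is_complex first.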

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From assem at codesourcery.com  Fri May 26 20:25:51 2006
From: assem at codesourcery.com (Assem Salama)
Date: Fri, 26 May 2006 16:25:51 -0400
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <447763D6.8040407@codesourcery.com>
References: <44776082.9060406@codesourcery.com> <447763D6.8040407@codesourcery.com>
Message-ID: <4477644F.2020100@codesourcery.com>

Mark Mitchell wrote:
> Assem Salama wrote:
>
>   
>> +  // helper struct to get the imaginary part of a view.
>> +  template <typename ViewT,
>> +            bool IsComplex =
>> +	      vsip::impl::Is_complex<typename ViewT::value_type>::value>
>> +  struct Subview_helper;
>>     
>
> If this is going to come up in other situations (and I bet it will!) I
> think we should just add "impl_real" and "impl_imag" to all of our
> views.  For complex views, these would just forward to the standard
> real/imag functions; for other views, they would return the view unchanged.
>
>   
I agree!!! I have already had to pull this trick in many different places.


From jules at codesourcery.com  Tue May 30 20:18:24 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Tue, 30 May 2006 16:18:24 -0400
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <44776082.9060406@codesourcery.com>
References: <44776082.9060406@codesourcery.com>
Message-ID: <447CA890.5030605@codesourcery.com>

Assem Salama wrote:
> Everyone,
>  This patch adds the support for reading in views from a matlab binary 
> file. Note that this will only work if dims and types match. It doesn't 
> create a matrix suitable for the current data set.
> 
> Thanks,
> Assem Salama
> 
> 
> ------------------------------------------------------------------------
> 

> Index: matlab.hpp
> ===================================================================
> RCS file: matlab.hpp

This file is in the src/vsip_csl directory, correct?

> diff -N matlab.hpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ matlab.hpp	26 May 2006 20:03:54 -0000
> @@ -0,0 +1,287 @@
> +#ifndef VSIP_CSL_MATLAB_HPP
> +#define VSIP_CSL_MATLAB_HPP
> +
> +#include <iostream>
> +#include <vsip/impl/metaprogramming.hpp>
> +#include <vsip/impl/fns_elementwise.hpp>
> +#include <vsip/impl/length.hpp>
> +#include <vsip/impl/domain-utils.hpp>
> +
> +namespace vsip_csl
> +{
> +
> +namespace matlab
> +{
> +  struct data_element
> +  {
> +    int32_t type;
> +    int32_t size;
> +  };
> +
> +  template <int Dim>
> +  struct view_header
> +  {
> +    data_element header;
> +    data_element array_flags_header;
> +    char array_flags[8];
> +    data_element dim_header;
> +    int32_t dim[Dim + Dim%2]; //the dim has to be alligned to an 8 byte boundary
                                                   ^ spelling: 'aligned'
> +    data_element array_name_header;
> +  };
> +
> +  // helper struct to get the imaginary part of a view.
> +  template <typename ViewT,
> +            bool IsComplex =
> +	      vsip::impl::Is_complex<typename ViewT::value_type>::value>
> +  struct Subview_helper;
> +
> +  template <typename ViewT>
> +  struct Subview_helper<ViewT,true>
> +  {
> +    typedef typename ViewT::realview_type realview_type;
> +    typedef typename ViewT::imagview_type imagview_type;
> +
> +    static realview_type real(ViewT v) { return v.real(); }
> +    static imagview_type imag(ViewT v) { return v.imag(); }
> +  };
> +
> +  template <typename ViewT>
> +  struct Subview_helper<ViewT,false>
> +  {
> +    typedef ViewT realview_type;
> +    typedef ViewT imagview_type;
> +
> +    static realview_type real(ViewT v) { return v; }
> +    static imagview_type imag(ViewT v) { return v; }
> +  };
> +
> +
> +  // generic readers that allows us to read a generic type and cast to another
> +  
> +  // the read function for real
> +  template <typename T1,
> +            typename T2,
> +	    typename Block0,
> +	    template <typename,typename> class View>
> +  void read(std::istream& is,View<T2,Block0> v,vsip::impl::Bool_type<true>)
> +  {

Since you use the value 'View<T2, Block0>::dim' several times below,
it would be helpful to create a variable for it:

	dimension_type const dim = View<T2, Block0>::dim;

> +    vsip::Index<View<T2,Block0>::dim> my_index;
> +    vsip::impl::Length<View<T2,Block0>::dim> v_extent = extent(v);
> +    int num_points = 1;
> +    typedef typename vsip::impl::Scalar_of<T2>::type scalar_type;
> +    T1 data;
> +
> +    // get num_points
> +    for(int i=0;i<View<T2,Block0>::dim;i++) num_points *= v.size(i);

When iterating over dimensions, you should use 'dimension_type' instead
of int.  I.e.

	for (dimension_type i=0; i<View<...>::dim; ++i)

View<...>::dim is of type 'dimension_type'.  Using an int will generally
work, but will cause gcc to issue warnings about comparison between
signed and unsigned values when compiled with '-Wall -W'.

Also,

	length_type num_points = v.size();

should work.  View::size() with no arguments returns the total size of 
the view.  Also, it returns a value of type 'length_type', not 'int'.

> +
> +    // set index to 0
> +    for(int i=0;i<View<T2,Block0>::dim;i++) my_index[i] = 0;

This shouldn't be necessary, the constructor for Index should initialize 
all of its values to 0.


> +
> +    // read all the points
> +    for(int i=0;i<num_points;i++) {
> +      is.read(reinterpret_cast<char*>(&data),sizeof(data));
> +      put(v,my_index,scalar_type(data));
> +
> +      // increment index
> +      my_index = vsip::impl::next(v_extent,my_index);

We need to extend next to take dimension-ordering into account.  Right 
now it is row-major by default.  I'll post something for this.
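
To show what a column-major traversal means here, a rough sketch with
plain arrays standing in for vsip::Index and vsip::impl::Length.  The
name next_col_major is hypothetical, and this is not the interface that
vsip::impl::next would necessarily get:

```cpp
#include <cstddef>

// Advance idx to the next position within extent, varying the FIRST
// dimension fastest (column-major order).  After the last element the
// index wraps around to all zeros.
template <std::size_t Dim>
void next_col_major(std::size_t const (&extent)[Dim],
                    std::size_t (&idx)[Dim])
{
  for (std::size_t d = 0; d < Dim; ++d)
  {
    if (++idx[d] < extent[d]) return;  // no carry needed
    idx[d] = 0;                        // carry into the next dimension
  }
}
```

For a 2x3 extent this visits (0,0), (1,0), (0,1), (1,1), (0,2), (1,2),
which is the order matlab stores matrix elements in.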

> +    }
> +
> +  }
> +
> +  // the read function for complex
> +  template <typename T1,
> +            typename T2,
> +	    typename Block0,
> +	    template <typename,typename> class View>
> +  void read(std::istream& is,View<T2,Block0> v,vsip::impl::Bool_type<false>)
> +  {
> +    vsip::Index<View<T2,Block0>::dim> my_index;
> +    vsip::impl::Length<View<T2,Block0>::dim> v_extent = extent(v);
> +    int num_points = 1;
> +    typename Subview_helper<View<T2,Block0> >::imagview_type imag_view =
> +      Subview_helper<View<T2,Block0> >::imag(v);
> +    typedef typename vsip::impl::Scalar_of<T2>::type scalar_type;
> +    T1 data;
> +
> +    // get num_points
> +    for(int i=0;i<View<T2,Block0>::dim;i++) num_points *= v.size(i);
> +
> +    // read all the points
> +    for(int i=0;i<num_points;i++) {
> +      is.read(reinterpret_cast<char*>(&data),sizeof(data));
> +      put(imag_view,my_index,scalar_type(data));
> +
> +      // increment index
> +      my_index = vsip::impl::next(v_extent,my_index);
> +    }
> +  }

These two variants of read are pretty much the same, except the second 
one puts values into the imaginary subview.

If you move the call to Subview_helper into operator>> below, you can 
eliminate one of the definitions.


> +
> +  struct header
> +  {
> +    char description[116];
> +    char subsyt_data[8];
> +    char version[2];
> +    char endian[2];
> +  };
> +
> +  // constants for matlab binary format
> +
> +  // data types
> +  static int const miINT8           = 1;
> +  static int const miUINT8          = 2;
> +  static int const miINT16          = 3;
> +  static int const miUINT16         = 4;
> +  static int const miINT32          = 5;
> +  static int const miUINT32         = 6;
> +  static int const miSINGLE         = 7;
> +  static int const miDOUBLE         = 9;
> +  static int const miINT64          = 12;
> +  static int const miUINT64         = 13;
> +  static int const miMATRIX         = 14;
> +  static int const miCOMPRESSED     = 15;
> +  static int const miUTF8           = 16;
> +  static int const miUTF16          = 17;
> +  static int const miUTF32          = 18;
> +  
> +  // class types
> +  static int const mxCELL_CLASS     = 1;
> +  static int const mxSTRUCT_CLASS   = 2;
> +  static int const mxOBJECT_CLASS   = 3;
> +  static int const mxCHAR_CLASS     = 4;
> +  static int const mxSPARSE_CLASS   = 5;
> +  static int const mxDOUBLE_CLASS   = 6;
> +  static int const mxSINGLE_CLASS   = 7;
> +  static int const mxINT8_CLASS     = 8;
> +  static int const mxUINT8_CLASS    = 9;
> +  static int const mxINT16_CLASS    = 10;
> +  static int const mxUINT16_CLASS   = 11;
> +  static int const mxINT32_CLASS    = 12;
> +  static int const mxUINT32_CLASS   = 13;
> +
> +  // matlab header traits
> +  template <int size,bool is_signed,bool is_int>
> +  struct Matlab_header_traits

You should leave the base case undefined.  That way attempts to use a 
size we haven't defined (such as size=5 or something) will be caught as 
compile errors.
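
A cut-down sketch of that arrangement, showing only two of the
specializations.  The constants are copied from the patch; the rest is
trimmed for illustration:

```cpp
// Data-type and class codes, as in the patch.
static int const miINT32        = 5;
static int const miDOUBLE       = 9;
static int const mxDOUBLE_CLASS = 6;
static int const mxINT32_CLASS  = 12;

// Primary template: declared but deliberately never defined, so using
// an unsupported combination fails to compile.
template <int size, bool is_signed, bool is_int>
struct Matlab_header_traits;

template <>
struct Matlab_header_traits<4, true, true> // int
{
  static int const value_type = miINT32;
  static int const class_type = mxINT32_CLASS;
};

template <>
struct Matlab_header_traits<8, true, false> // double
{
  static int const value_type = miDOUBLE;
  static int const class_type = mxDOUBLE_CLASS;
};

// Matlab_header_traits<5, true, true>::class_type would be rejected by
// the compiler, because the primary template is an incomplete type.
```

With this, the run-time 'assert(m_view.array_flags[0] != 0)' check in
operator<< should no longer be needed.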

> +  { 
> +    static int const value_type = 0;
> +    static int const class_type = 0;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<1, true, true> // char
> +  { 
> +    static int const value_type = miINT8;
> +    static int const class_type = mxINT8_CLASS; 
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<1, false, true> // unsigned char
> +  { 
> +    static int const value_type = miUINT8;
> +    static int const class_type = mxUINT8_CLASS; 
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<2, true, true> // short
> +  { 
> +    static int const value_type = miINT16;
> +    static int const class_type = mxINT16_CLASS; 
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<2, false, true> // unsigned short
> +  { 
> +    static int const value_type = miUINT16;
> +    static int const class_type = mxUINT16_CLASS; 
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<4, true, true> // int
> +  { 
> +    static int const value_type= miINT32;
> +    static int const class_type= mxINT32_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<4, false, true> // unsigned int
> +  { 
> +    static int const value_type= miUINT32;
> +    static int const class_type= mxUINT32_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<4, true, false> // float
> +  { 
> +    static int const value_type= miSINGLE;
> +    static int const class_type= mxSINGLE_CLASS;
> +  };
> +
> +  template <>
> +  struct Matlab_header_traits<8, true, false> // double
> +  { 
> +    static int const value_type= miDOUBLE;
> +    static int const class_type= mxDOUBLE_CLASS;
> +  };
> +
> +  // matlab desired layouts
> +  template <template <typename,typename> class View>
> +  struct Matlab_desired_LP;
> +
> +  template<> struct Matlab_desired_LP<vsip::const_Vector>
> +  { typedef vsip::impl::Layout<1,vsip::col1_type,
> +                     vsip::impl::Stride_unit_dense,vsip::impl::Cmplx_split_fmt>
> +      type; 
> +  };
> +
> +  template<> struct Matlab_desired_LP<vsip::const_Matrix>
> +  { typedef vsip::impl::Layout<2,vsip::col2_type,
> +                     vsip::impl::Stride_unit_dense,vsip::impl::Cmplx_split_fmt>
> +      type; 
> +  };
> +  
> +  template<> struct Matlab_desired_LP<vsip::const_Tensor>
> +  { typedef vsip::impl::Layout<3,vsip::col3_type,
> +                     vsip::impl::Stride_unit_dense,vsip::impl::Cmplx_split_fmt>
> +      type; 
> +  };
> +
> +  template<> struct Matlab_desired_LP<vsip::Vector>
> +  { typedef vsip::impl::Layout<1,vsip::col1_type,
> +                     vsip::impl::Stride_unit_dense,vsip::impl::Cmplx_split_fmt>
> +      type; 
> +  };
> +
> +  template<> struct Matlab_desired_LP<vsip::Matrix>
> +  { typedef vsip::impl::Layout<2,vsip::col2_type,
> +                     vsip::impl::Stride_unit_dense,vsip::impl::Cmplx_split_fmt>
> +      type; 
> +  };
> +  
> +  template<> struct Matlab_desired_LP<vsip::Tensor>
> +  { typedef vsip::impl::Layout<3,vsip::col3_type,
> +                     vsip::impl::Stride_unit_dense,vsip::impl::Cmplx_split_fmt>
> +      type; 
> +  };
> +
> +  // helper function to return the real and imaginary part of a pointer
> +  
> +  template<typename T>
> +  inline T* get_real_ptr(std::pair<T*,T*> ptr,vsip::impl::Bool_type<true>) 
> +    { return ptr.first; }
> +  template<typename T>
> +  inline T* get_real_ptr(T* ptr,vsip::impl::Bool_type<false>)
> +    { return ptr; }

You shouldn't need the Bool_type<...> variable to disambiguate these 
overloads.
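
For instance, a sketch with the tag parameter dropped.  The two argument
types are already distinct, so overload resolution picks the right one
on its own:

```cpp
#include <utility>

// Split-complex storage: a pair of pointers (real part, imaginary part).
template <typename T>
inline T* get_real_ptr(std::pair<T*, T*> ptr) { return ptr.first; }

// Real storage: a single pointer.
template <typename T>
inline T* get_real_ptr(T* ptr) { return ptr; }

template <typename T>
inline T* get_imag_ptr(std::pair<T*, T*> ptr) { return ptr.second; }

// For real storage this returns the same pointer, as in the patch.
template <typename T>
inline T* get_imag_ptr(T* ptr) { return ptr; }
```

A call such as get_real_ptr<scalar_type>(m_ext.data()) should then
resolve to the matching overload without the Bool_type argument.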

> +
> +  template<typename T>
> +  inline T* get_imag_ptr(std::pair<T*,T*> ptr,vsip::impl::Bool_type<true>) 
> +    { return ptr.second; }
> +  template<typename T>
> +  inline T* get_imag_ptr(T* ptr,vsip::impl::Bool_type<false>)
> +    { return ptr; }
> +
> +
> +
> +} // namesapce matlab
> +
> +} // namespace vsip_csl
> +
> +#endif // VSIP_CSL_MATLAB_HPP
> Index: matlab_bin_formatter.hpp
> ===================================================================
> RCS file: matlab_bin_formatter.hpp
> diff -N matlab_bin_formatter.hpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ matlab_bin_formatter.hpp	26 May 2006 20:03:54 -0000
> @@ -0,0 +1,353 @@
> +/* Copyright (c) 2005, 2006 by CodeSourcery.  All rights reserved. */
> +
> +/** @file    vsip_csl/matlab_bin_formatter.hpp
> +    @author  Assem Salama
> +    @date    2006-05-22
> +    @brief   VSIPL++ CodeSourcery Library: Matlab binary formatter
> +*/
> +
> +#ifndef VSIP_CSL_MATLAB_BIN_FORMATTER_HPP
> +#define VSIP_CSL_MATLAB_BIN_FORMATTER_HPP
> +
> +#include <stdint.h>
> +#include <string>
> +#include <limits>
> +#include <vsip_csl/matlab.hpp>
> +#include <vsip/impl/fns_scalar.hpp>
> +#include <vsip/impl/fns_elementwise.hpp>
> +#include <vsip/impl/metaprogramming.hpp>
> +#include <vsip/impl/view_traits.hpp>
> +#include <vsip/impl/extdata.hpp>
> +
> +namespace vsip_csl
> +{
> +
> +  template <typename ViewT>
> +  struct Matlab_bin_formatter
> +  {
> +    Matlab_bin_formatter(ViewT v,std::string const& name) :
> +      v(v), view_name(name)  {}
> +
> +    ViewT v;
> +    std::string view_name;
> +
> +  };
> +
> +  struct Matlab_bin_hdr
> +  {
> +    Matlab_bin_hdr(std::string const& descr, std::string const& end) : 
> +      description(descr),version("MATLAB 5.0 : "),endian(end) {}
> +    Matlab_bin_hdr(std::string const& descr) : 
> +      description(descr),version("MATLAB 5.0 : "),endian("MI") {}
> +    Matlab_bin_hdr() : 
> +      description(" "),version("MATLAB 5.0 : "),endian("MI") {}
> +
> +    // description
> +    std::string version;
> +    std::string description;
> +    std::string endian;
> +
> +  };
> +} // namespace vsip_csl
> +
> +/****************************************************************************
> + * Definitions
> + ***************************************************************************/
> +
> +namespace vsip_csl
> +{
> +
> +// operator to write matlab header
> +inline
> +std::ostream&
> +operator<<(
> +  std::ostream&           o,
> +  Matlab_bin_hdr const&   h)
> +{
> +  matlab::header m_hdr;
> +
> +  // set hdr to spaces
> +  memset(&(m_hdr),' ',sizeof(m_hdr));
> +  strncpy(m_hdr.description, h.version.data(), h.version.length());
> +  strncpy(m_hdr.description+h.version.length(), h.description.data(),
> +    h.description.length());
> +  m_hdr.version[1] = 0x01; m_hdr.version[0] = 0x00;
> +  m_hdr.endian[0]=h.endian[0];
> +  m_hdr.endian[1]=h.endian[1];
> +
> +  // write header
> +  o.write(reinterpret_cast<char*>(&m_hdr),sizeof(m_hdr));
> +
> +  return o;
> +}
> +// operator to write a view to a matlab file
> +template <typename T,
> +          typename Block0,
> +	  template <typename,typename> class const_View>
> +inline
> +std::ostream&
> +operator<<(
> +  std::ostream&                                       o,
> +  Matlab_bin_formatter<const_View<T,Block0> > const&  mbf)
> +{
> +  typedef typename vsip::impl::Scalar_of<T>::type scalar_type;
> +  matlab::data_element temp_data_element;
> +  int    sz;
> +  matlab::view_header<vsip::impl::Dim_of_view<const_View>::dim > m_view;
> +  int    num_points = 1;
> +  int    v_dims = vsip::impl::Dim_of_view<const_View>::dim;
> +
> +  memset(&m_view,0,sizeof(m_view));
> +
> +  // matrix data type
> +  m_view.header.type = matlab::miMATRIX;
> +  m_view.header.size = 1; // TEMP
> +
> +  // array flags
> +  m_view.array_flags_header.type = matlab::miUINT32;
> +  m_view.array_flags_header.size = 8;
> +  if(vsip::impl::Is_complex<T>::value) 
> +    m_view.array_flags[1] |= 0x8; // Complex
> +
> +  // fill in class
> +  m_view.array_flags[0] = 
> +    matlab::Matlab_header_traits<sizeof(scalar_type),
> +                  std::numeric_limits<scalar_type>::is_signed,
> +                  std::numeric_limits<scalar_type>::is_integer>::class_type;
> +
> +  // make sure we found a matching trait
> +  assert(m_view.array_flags[0] != 0);
> +  
> +  // dimension sizes
> +  m_view.dim_header.type = matlab::miINT32;
> +  m_view.dim_header.size = v_dims*4; // 4 bytes per dimension
> +  // fill in dimension
> +  for(int i =0;i<v_dims;i++)
> +  {
> +    m_view.dim[i] = mbf.v.size(i);
> +    num_points *= mbf.v.size(i);
> +  }
> +
> +  // if this view is a vector, we need to make second dimension a one
> +  if(v_dims == 1)
> +  {
> +    m_view.dim_header.size += 4;
> +    m_view.dim[1] = 1;
> +  }
> +
> +  // array name
> +  m_view.array_name_header.type = matlab::miINT8;
> +  m_view.array_name_header.size = mbf.view_name.length();
> +
> +
> +  // calculate size
> +  sz = sizeof(m_view)-8;
> +  sz += mbf.view_name.length();
> +  sz += (8-mbf.view_name.length())&0x7;
> +  sz += 8; // 8 bytes of header for real data
> +  if(vsip::impl::Is_complex<T>::value) sz += 8; // 8 more for complex data
> +  sz += num_points*sizeof(T);
> +  m_view.header.size = sz;
> +
> +  o.write(reinterpret_cast<char*>(&m_view),sizeof(m_view));
> +
> +  // write array name
> +  o.write(mbf.view_name.c_str(),mbf.view_name.length());
> +  // pad
> +  { 
> +    char c=0;
> +    for(int i=0;i < ((8-mbf.view_name.length())&0x7);i++) o.write(&c,1);
> +  }
> +
> +  // write real part
> +  {
> +    
> +
> +    vsip::impl::Ext_data<Block0,
> +	                 typename matlab::Matlab_desired_LP<const_View>::type >

This Ext_data usage is good, but there are two problems:

First, if data isn't in the right format (dense, column-major, split if 
complex), this is going to require a memory allocation to reformat the 
data.  It is possible that transforming the data then performing IO with 
large blocks is more efficient than performing IO on small blocks, but 
the memory allocation/deallocation isn't good.  My guess is that we 
could get by with this (IO routines are mostly going to be done outside 
of the computation loop, and what embedded system is going to save data 
in matlab format anyways?).

However, to be safe, we should avoid this memory allocation.  We can do 
this by checking the cost of the Ext_data.

	// If cost == 0, no memory allocation will be done
	if (vsip::impl::Ext_data_cost<Block0,
                typename matlab::Matlab_desired_LP<const_View>::type >
	       ::value == 0)
	{
	   ... current code ...
	}
	else
	{
	   explicit loop, similar to read functionality.
	}


Second, assuming we were to leave this as is without the cost check, 
there is an identical Ext_data object below.  If data is not in the 
right format, it would do an allocation and copy as well, which would be 
overhead.

> +	     
> +	     m_ext(mbf.v.block());
> +
> +    temp_data_element.type = matlab::Matlab_header_traits<sizeof(scalar_type),
> +                  std::numeric_limits<scalar_type>::is_signed,
> +                  std::numeric_limits<scalar_type>::is_integer>::value_type;
> +
> +    temp_data_element.size = num_points*sizeof(scalar_type);
> +    o.write(reinterpret_cast<char*>(&temp_data_element),
> +              sizeof(temp_data_element));
> +    o.write(reinterpret_cast<char*>
> +         (matlab::get_real_ptr<scalar_type>(m_ext.data(),
> +            vsip::impl::Bool_type<vsip::impl::Is_complex<T>::value>())),
> +              num_points*sizeof(scalar_type));
> +  }
> +
> +  if(!vsip::impl::Is_complex<T>::value) return o; //we are done here
> +
> +  // write imaginary part
> +  {
> +    
> +
> +    vsip::impl::Ext_data<Block0,
> +	                 typename matlab::Matlab_desired_LP<const_View>::type >
> +	     
> +	     m_ext(mbf.v.block());
> +
> +    temp_data_element.type = matlab::Matlab_header_traits<sizeof(scalar_type),
> +                  std::numeric_limits<scalar_type>::is_signed,
> +                  std::numeric_limits<scalar_type>::is_integer>::value_type;
> +
> +    temp_data_element.size = num_points*sizeof(scalar_type);
> +    o.write(reinterpret_cast<char*>(&temp_data_element),
> +              sizeof(temp_data_element));
> +    o.write(reinterpret_cast<char*>
> +         (matlab::get_imag_ptr<scalar_type>(m_ext.data(),
> +            vsip::impl::Bool_type<vsip::impl::Is_complex<T>::value>())),
> +              num_points*sizeof(scalar_type));
> +  }
> +
> +
> +  return o;
> +}
> +
> +// operator to read matlab header
> +inline
> +std::istream&
> +operator>>(
> +  std::istream&           o,
> +  Matlab_bin_hdr          h)
> +{
> +  matlab::header m_hdr;
> +
> +  // read header
> +  o.read(reinterpret_cast<char*>(&m_hdr),sizeof(m_hdr));
> +
> +  h.version[1] = m_hdr.version[1];
> +  h.version[0] = m_hdr.version[0];
> +  h.endian[1] = m_hdr.endian[1];
> +  h.endian[0] = m_hdr.endian[0];
> +
> +  return o;
> +}
> +
> +// operator to read view from matlab file
> +template <typename T,
> +          typename Block0,
> +	  template <typename,typename> class View>
> +inline
> +std::istream&
> +operator>>(
> +  std::istream&                                       is,
> +  Matlab_bin_formatter<View<T,Block0> >               mbf)
> +{
> +  matlab::data_element temp_data_element;
> +  matlab::view_header<vsip::impl::Dim_of_view<View>::dim> m_view;
> +  typedef typename vsip::impl::Scalar_of<T>::type scalar_type;
> +  int v_dim = vsip::impl::Dim_of_view<View>::dim;
> +
> +
> +  // read header
> +  is.read(reinterpret_cast<char*>(&m_view),sizeof(m_view));
> +
> +  // is this complex?
> +  if(vsip::impl::Is_complex<T>::value)
> +    assert(m_view.array_flags[1]&0x8);

This should be an exception.  Something like "Attempting to read 
non-complex matlab view into complex VSIPL++ view".

> +
> +  // is this the same class?
> +  assert(m_view.array_flags[0] == 
> +            (matlab::Matlab_header_traits<sizeof(scalar_type),
> +                  std::numeric_limits<scalar_type>::is_signed,
> +                  std::numeric_limits<scalar_type>::is_integer>::class_type));
> +
> +  // do dimensions agree?
> +  if(v_dim == 1) m_view.dim_header.size -= 4; // special case for vectors
> +  assert(v_dim == (m_view.dim_header.size/4));

Likewise.  "Attempting to read N-dimensional matlab data into 
M-dimensional VSIPL++ view"

> +
> +  for(int i=0;i<v_dim;i++)
> +    assert(mbf.v.size(i) == m_view.dim[i]);

Likewise.
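
A minimal sketch of how the checks above might throw instead of asserting. The names check_format and check_dimensions, the size-mismatch message, and the use of std::runtime_error are assumptions made for illustration; the library may well prefer its own exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: throw instead of assert when the file contents
// disagree with the destination view.
inline void check_format(bool ok, std::string const& what)
{
  if (!ok) throw std::runtime_error(what);
}

// Hypothetical stand-in for the dimension checks in operator>>.
// file_dim and view_dim are the dimensionalities; the size arrays
// hold the extent of each dimension.
inline void check_dimensions(
  int file_dim, std::size_t const* file_size,
  int view_dim, std::size_t const* view_size)
{
  if (file_dim != view_dim)
  {
    std::ostringstream msg;
    msg << "Attempting to read " << file_dim
        << "-dimensional matlab data into " << view_dim
        << "-dimensional VSIPL++ view";
    throw std::runtime_error(msg.str());
  }
  for (int i = 0; i < view_dim; ++i)
    check_format(file_size[i] == view_size[i],
                 "Matlab data and VSIPL++ view differ in size");
}
```

Unlike assert, these checks stay active when NDEBUG is defined, so a mismatched file is still reported in an optimized build.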

> +
> +  // read array name
> +  if(m_view.array_name_header.type & 0xffff0000)
> +  {
> +    // array name is short
> +
> +    int length = m_view.array_name_header.type >> 16;
> +    /*
> +    strncpy(mbf.view_name.data(),
> +            reinterpret_cast<char*>(&m_view.array_name_header.size),
> +	    length);
> +    mbf.view_name[length] = 0;
> +    */
> +  }
> +  else
> +  {
> +    int length = m_view.array_name_header.size;
> +    char c;
> +    char c_array[128];

How is 128 chosen?  Is that specified as the maximum size in the Matlab 
file spec?  If so, an assertion would be in order 'assert(length < 
128)'.  (Strict less-than since we put a \0 at the end).

If not, should we dynamically allocate c_array?
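
One possible shape for the dynamically allocated variant, as a sketch only. The function name read_array_name and the max_length limit are hypothetical; the limit is an arbitrary sanity bound, not a value taken from the Matlab file spec. The padding arithmetic matches the (8-length)&0x7 expression in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read an array name of `length` bytes, then skip
// the padding that rounds the field up to a multiple of 8 bytes.
inline std::string read_array_name(
  std::istream& is, std::size_t length, std::size_t max_length = 4096)
{
  // Reject implausible lengths before allocating anything.
  if (length > max_length)
    throw std::runtime_error("Matlab array name is implausibly long");

  std::string name(length, '\0');
  if (length > 0)
    is.read(&name[0], static_cast<std::streamsize>(length));
  if (!is)
    throw std::runtime_error("Unexpected end of Matlab file in array name");

  // Skip padding up to the next 8-byte boundary.
  std::size_t const padding = (8 - length % 8) % 8;
  is.ignore(static_cast<std::streamsize>(padding));
  return name;
}
```

Because the buffer is a std::string sized from the header, no fixed 128-byte array or explicit terminator is needed.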

> +    // the name is longer than 4 bytes
> +    //o.read(mbf.view_name.data(),length);
> +    is.read(c_array,length);
> +    c_array[length] = 0;
> +    // read padding
> +    for(int i=0;i<((8-length)&0x7);i++) is.read(&c,1);
> +  }

Here's where we can reduce the number of read functions:

Something like:


	// read real data header
	is.read(reinterpret_cast<char*>(&temp_data_element),
	        sizeof(temp_data_element));
	matlab::read(is, temp_data_element.type,
	             Subview_helper<View<T,Block0> >::real(mbf.v));

	if (vsip::impl::Is_complex<T>::value)
	{
	  // read imaginary data header
	  is.read(reinterpret_cast<char*>(&temp_data_element),
	          sizeof(temp_data_element));
	  matlab::read(is, temp_data_element.type,
	               Subview_helper<View<T,Block0> >::imag(mbf.v));
	}

To do this we would need to move the run-time->compile-time type 
dispatch for matlab into a separate function.

	template <typename ViewT>
	void
	read(
	  std::istream& is,
	  int const     m_type,
	  ViewT         v)
	{
	  if      (m_type == matlab::miINT8)  read<int8_t>(is, v);
	  else if (m_type == matlab::miUINT8) read<uint8_t>(is, v);
	  ...
	}
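
A self-contained sketch of that dispatch, using a std::vector as a stand-in for the VSIPL++ view. The sketch namespace, read_as, and the element-count parameter are assumptions for illustration. The type codes are the values as I understand them from the MAT-file format documentation and should be checked against the spec. Byte order is assumed to match the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

namespace sketch
{
// Data type codes (assumed values; verify against the MAT-file spec).
enum { miINT8 = 1, miUINT8 = 2, miINT16 = 3, miUINT16 = 4,
       miINT32 = 5, miUINT32 = 6, miSINGLE = 7, miDOUBLE = 9 };

// Read `count` elements stored as FileT and convert each one to T.
template <typename FileT, typename T>
void read_as(std::istream& is, std::size_t count, std::vector<T>& out)
{
  out.resize(count);
  for (std::size_t i = 0; i < count; ++i)
  {
    FileT tmp;
    is.read(reinterpret_cast<char*>(&tmp), sizeof(tmp));
    if (!is) throw std::runtime_error("Unexpected end of Matlab data");
    out[i] = static_cast<T>(tmp);
  }
}

// Map the run-time type code onto the compile-time FileT parameter.
template <typename T>
void read(std::istream& is, int m_type, std::size_t count,
          std::vector<T>& out)
{
  switch (m_type)
  {
    case miINT8:   read_as<std::int8_t>(is, count, out);   break;
    case miUINT8:  read_as<std::uint8_t>(is, count, out);  break;
    case miINT16:  read_as<std::int16_t>(is, count, out);  break;
    case miUINT16: read_as<std::uint16_t>(is, count, out); break;
    case miINT32:  read_as<std::int32_t>(is, count, out);  break;
    case miUINT32: read_as<std::uint32_t>(is, count, out); break;
    case miSINGLE: read_as<float>(is, count, out);         break;
    case miDOUBLE: read_as<double>(is, count, out);        break;
    default:
      throw std::runtime_error("Unsupported Matlab data type");
  }
}
} // namespace sketch
```

With the dispatch in one place, the real and imaginary parts can share the same call, and an unknown type code is reported rather than silently read as double.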

	
> +
> +  // read data, we will go in this loop twice if we have complex data
> +  for (int i=0;i <= vsip::impl::Is_complex<T>::value;i++)
> +  {
> +
> +    // read data header
> +    is.read(reinterpret_cast<char*>(&temp_data_element),
> +            sizeof(temp_data_element));
> +
> +    // Because we don't know how the data was stored, we need to instantiate
> +    // generic_reader which can read a type and cast into a different one
> +    if(temp_data_element.type == matlab::miINT8) 
> +    {
> +      if(i==0)matlab::read<int8_t>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<int8_t>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else if(temp_data_element.type == matlab::miUINT8) 
> +    {
> +      if(i==0)matlab::read<uint8_t>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<uint8_t>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else if(temp_data_element.type == matlab::miINT16) 
> +    {
> +      if(i==0)matlab::read<int16_t>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<int16_t>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else if(temp_data_element.type == matlab::miUINT16) 
> +    {
> +      if(i==0)matlab::read<uint16_t>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<uint16_t>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else if(temp_data_element.type == matlab::miINT32) 
> +    {
> +      if(i==0)matlab::read<int32_t>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<int32_t>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else if(temp_data_element.type == matlab::miUINT32) 
> +    {
> +      if(i==0)matlab::read<uint32_t>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<uint32_t>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else if(temp_data_element.type == matlab::miSINGLE) 
> +    {
> +      if(i==0)matlab::read<float>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<float>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +    else
> +    {
> +      if(i==0)matlab::read<double>(is,mbf.v,vsip::impl::Bool_type<true>());
> +      else    matlab::read<double>(is,mbf.v,vsip::impl::Bool_type<false>());
> +    }
> +
> +  }
> +
> +}
> +
> +
> +
> +} // namespace vsip_csl
> +
> +#endif // VSIP_CSL_MATLAB_BIN_FORMATTER_HPP
> Index: matlab_text_formatter.hpp
> ===================================================================
> RCS file: matlab_text_formatter.hpp

Text formatter looks good.

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705


From mark at codesourcery.com  Tue May 30 20:25:14 2006
From: mark at codesourcery.com (Mark Mitchell)
Date: Tue, 30 May 2006 13:25:14 -0700
Subject: [vsipl++] Matlab IO Patch
In-Reply-To: <447CA890.5030605@codesourcery.com>
References: <44776082.9060406@codesourcery.com> <447CA890.5030605@codesourcery.com>
Message-ID: <447CAA2A.3040009@codesourcery.com>

Jules Bergmann wrote:

>> +    int length = m_view.array_name_header.size;
>> +    char c;
>> +    char c_array[128];
> 
> How is 128 chosen?  Is that specified as the maximum size in the Matlab
> file spec?  If so, an assertion would be in order 'assert(length <
> 128)'.  (Strict less-than since we put a \0 at the end).

I think this should be an exception.  One should never trust an external
data format to be correct; the file might be corrupt, etc.  One should
only use assertions for things that will always be true unless the
system (hardware, compiler, OS) or the program itself is broken.  Never
trust users or programs you didn't write.

-- 
Mark Mitchell
CodeSourcery
mark at codesourcery.com
(650) 331-3385 x713


From assem at codesourcery.com  Wed May 31 19:06:05 2006
From: assem at codesourcery.com (Assem Salama)
Date: Wed, 31 May 2006 15:06:05 -0400
Subject: ATLAS undefines
Message-ID: <447DE91D.6030903@codesourcery.com>

Everyone,
  As per Jules's request, this is the output of make when trying to 
compile convolution.cpp in the tests dir. The BLAS that I got with 
CLAPACK has functions similar to these, but without the cblas_ prefix 
and without the _sub suffix.

Thanks,
Assem Salama
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: make.log
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060531/ce62985d/attachment.ksh>

From jules at codesourcery.com  Wed May 31 20:36:15 2006
From: jules at codesourcery.com (Jules Bergmann)
Date: Wed, 31 May 2006 16:36:15 -0400
Subject: [vsipl++] ATLAS undefines
In-Reply-To: <447DE91D.6030903@codesourcery.com>
References: <447DE91D.6030903@codesourcery.com>
Message-ID: <447DFE3F.4030508@codesourcery.com>

Assem,

Thanks for posting this.

It looks like we're trying to use the CBLAS bindings with 
CLAPACK/SRC/BLAS.  Unfortunately, looking at the source, that library 
provides a Fortran API, with a few variations (the complex dot-product 
Fortran functions have been converted to C "subroutines" that return the 
result by reference).  I suspect that if you tried to build other tests 
you would see linker errors for functions like cblas_trsm, etc.

For this, we should take an approach similar to how we handled ACML:

  - Have configure define VSIP_IMPL_USE_CBLAS = 4 when using
    CLAPACK/SRC/BLAS

  - In lapack.hpp, when VSIP_IMPL_USE_CBLAS == 4,
     - wrap the dot-product functions to have a CBLAS interface and
       define VSIP_IMPL_USE_CBLAS_DOT = 1.

       This should be done in a separate header file, similar to
       acml_cblas.hpp.

     - Use Fortran API for other BLAS functions
       (VSIP_IMPL_USE_CBLAS_OTHER = 0).
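
The dot-product wrapping step might look roughly like the sketch below. To keep it self-contained, f77_ddot and f77_zdotu are local stand-ins for the Fortran-style routines; the real symbol names, their integer type, and their exact argument order in CLAPACK/SRC/BLAS are assumptions that need checking against that source. The stand-ins handle positive increments only.

```cpp
#include <cassert>
#include <complex>

namespace sketch
{
// Stand-in for the Fortran-style real dot product (arguments passed
// by pointer, result returned by value).
inline double f77_ddot(int* n, double* x, int* incx, double* y, int* incy)
{
  double sum = 0.0;
  for (int i = 0; i < *n; ++i)
    sum += x[i * *incx] * y[i * *incy];
  return sum;
}

// Stand-in for the complex dot product converted to a C "subroutine":
// the result comes back through the first argument.
inline void f77_zdotu(std::complex<double>* ret, int* n,
                      std::complex<double>* x, int* incx,
                      std::complex<double>* y, int* incy)
{
  std::complex<double> sum(0.0, 0.0);
  for (int i = 0; i < *n; ++i)
    sum += x[i * *incx] * y[i * *incy];
  *ret = sum;
}

// CBLAS-style wrappers: arguments by value, const input pointers.
inline double cblas_ddot(int n, double const* x, int incx,
                         double const* y, int incy)
{
  return f77_ddot(&n, const_cast<double*>(x), &incx,
                  const_cast<double*>(y), &incy);
}

inline void cblas_zdotu_sub(int n, void const* x, int incx,
                            void const* y, int incy, void* dotu)
{
  typedef std::complex<double> C;
  f77_zdotu(static_cast<C*>(dotu), &n,
            const_cast<C*>(static_cast<C const*>(x)), &incx,
            const_cast<C*>(static_cast<C const*>(y)), &incy);
}
} // namespace sketch
```

The const_casts are there only because the Fortran-style signatures take non-const pointers; the inputs are not modified.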

Does that sound OK?

				-- Jules

Assem Salama wrote:
> Everyone,
>  As per Jules's request, this is the output of make when trying to 
> compile convolution.cpp in the tests dir. The BLAS that I got with 
> CLAPACK has functions similar to these, but without the cblas_ prefix 
> and without the _sub suffix.
> 
> Thanks,
> Assem Salama
> 
> 
> ------------------------------------------------------------------------
> 
> g++ -g -O2 -I../src -I/drive2/assem/work/checkout/vpp/tests/../src -I/include/atlas -I/include/fftw3  -I/drive2/assem/work/checkout/vpp/vendor/atlas/include -I/drive2/assem/work/build/vpp_temp2/vendor/fftw/include  -o convolution.exe convolution.o -L/lib/atlas -L/lib/fftw3  -L/drive2/assem/work/build/vpp_temp2/vendor/atlas/lib -L/drive2/assem/work/build/vpp_temp2/vendor/fftw/lib -L/drive2/assem/work/build/vpp_temp2/vendor/clapack -L/drive2/assem/work/build/vpp_temp2/lib -L../src/vsip -lvsip -llapack -lF77 -lcblas  -lfftw3f -lfftw3 -lfftw3l   || rm -f convolution.exe
> convolution.o: In function `dot':
> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:180: undefined reference to `cblas_ddot'
> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:217: undefined reference to `cblas_zdotu_sub'
> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:179: undefined reference to `cblas_sdot'
> /drive2/assem/work/checkout/vpp/tests/../src/vsip/impl/lapack.hpp:216: undefined reference to `cblas_cdotu_sub'
> collect2: ld returned 1 exit status


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705