From don at codesourcery.com Mon May 1 05:48:57 2006 From: don at codesourcery.com (Don McCoy) Date: Sun, 30 Apr 2006 23:48:57 -0600 Subject: [patch] New benchmark - vector division Message-ID: <4455A149.20504@codesourcery.com> Here is a new benchmark for testing element-wise vector division. Also attached are two performance graphs comparing multiplication and division - one shows mega flops per second and the other latency, or the number of microseconds per operation. The graph showing flops per second is somewhat misleading for two reasons: both divide and multiply for real numbers are each counted as a "flop" even though they take a different number of clock cycles to perform. Second, complex-complex division takes more operations (11, two of which are real-real divisions) than complex-complex multiplication (6). This gives them a comparable flop count, even though the division takes roughly twice as long. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: vd.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: vd.diff URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mul_div_real_cplx.png Type: image/png Size: 5783 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mul_div_real_cplx_lat.png Type: image/png Size: 5160 bytes Desc: not available URL: From jules at codesourcery.com Mon May 1 12:10:10 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 01 May 2006 08:10:10 -0400 Subject: [vsipl++] [patch] New benchmark - vector division In-Reply-To: <4455A149.20504@codesourcery.com> References: <4455A149.20504@codesourcery.com> Message-ID: <4455FAA2.3060106@codesourcery.com> Don McCoy wrote: > Here is a new benchmark for testing element-wise vector division. 
Also > attached are two performance graphs comparing multiplication and > division - one shows mega flops per second and the other latency, or the > number of microseconds per operation. Don, This looks good, please check it in. -- Jules > > The graph showing flops per second is somewhat misleading for two > reasons: both divide and multiply for real numbers are each counted as a > "flop" even though they take a different number of clock cycles to > perform. It does answer the question of whether a division FLOP is really the same as a multiply FLOP. Depending on problem size, it looks like 1 div FLOP ~ 8 mul FLOPS. > Second, complex-complex division takes more operations (11, > two of which are real-real divisions) than complex-complex > multiplication (6). This gives them a comparable flop count, even > though the division takes roughly twice as long. Comparing complex-multiply MFLOPS vs complex-division MFLOPS is somewhat of an apples to oranges comparison. The latency numbers, or alternatively measuring points per second, are a good way to look at it. What machine/configuration are the results from? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Mon May 1 15:59:49 2006 From: don at codesourcery.com (Don McCoy) Date: Mon, 01 May 2006 09:59:49 -0600 Subject: [vsipl++] [patch] New benchmark - vector division In-Reply-To: <4455FAA2.3060106@codesourcery.com> References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> Message-ID: <44563075.6020507@codesourcery.com> Jules Bergmann wrote: > Comparing complex-multiply MFLOPS vs complex-division MFLOPS is somewhat > of an apples to oranges comparison. The latency numbers, or > alternatively measuring points per second, are a good way to look at it. > That's what I was thinking as well. We have a benchmark that uses both multiply and divide (CFAR), it seemed odd to account for them as one operation each. 
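For reference, the operation counts quoted in this thread (6 for a complex multiply, 11 for a complex divide, two of which are real divisions) can be reproduced with the textbook formulas. The sketch below is a standalone illustration, not code from the benchmark patch; the names OpCount, cmul and cdiv are made up here, and the benchmark itself may count operations differently.

```cpp
#include <cassert>
#include <complex>

// Tally of real operations used by one complex operation.
// (Hypothetical helper type, used only for this illustration.)
struct OpCount
{
  int mul;
  int add;
  int div;
  int total() const { return mul + add + div; }
};

// (a+bi)(c+di) = (ac - bd) + (ad + bc)i
// 4 real multiplies + 2 real add/subtracts = 6 operations.
inline std::complex<double>
cmul(std::complex<double> x, std::complex<double> y, OpCount& ops)
{
  double a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  ops.mul += 4;
  ops.add += 2;
  return std::complex<double>(a * c - b * d, a * d + b * c);
}

// (a+bi)/(c+di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
// 6 real multiplies + 3 real add/subtracts + 2 real divides = 11 operations.
inline std::complex<double>
cdiv(std::complex<double> x, std::complex<double> y, OpCount& ops)
{
  double a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  double den = c * c + d * d;  // 2 multiplies, 1 add
  assert(den != 0.0);          // division by complex zero is not handled here
  ops.mul += 6;
  ops.add += 3;
  ops.div += 2;
  return std::complex<double>((a * c + b * d) / den, (b * c - a * d) / den);
}
```

Counting each of these as one "flop" gives the comparable flop totals mentioned above, even though the two real divisions are likely to dominate the run time of cdiv.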
> What machine/configuration are the results from? > Xeon 3.8 G w/ 2 M cache -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From don at codesourcery.com Mon May 1 16:04:44 2006 From: don at codesourcery.com (Don McCoy) Date: Mon, 01 May 2006 10:04:44 -0600 Subject: [vsipl++] [patch] New benchmark - vector division In-Reply-To: <44563075.6020507@codesourcery.com> References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> <44563075.6020507@codesourcery.com> Message-ID: <4456319C.7060906@codesourcery.com> Don McCoy wrote: > Jules Bergmann wrote: > >> What machine/configuration are the results from? >> > Xeon 3.8 G w/ 2 M cache > The configuration uses optimization flags from "SerialBuiltin". Otherwise --with-fft=builtin is set and --with-lapack not specified (defaulting to CLAPACK, no?). -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From jules at codesourcery.com Mon May 1 16:42:07 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 01 May 2006 12:42:07 -0400 Subject: [vsipl++] [patch] New benchmark - vector division In-Reply-To: <4456319C.7060906@codesourcery.com> References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> <44563075.6020507@codesourcery.com> <4456319C.7060906@codesourcery.com> Message-ID: <44563A5F.5000603@codesourcery.com> Don McCoy wrote: > Don McCoy wrote: >> Jules Bergmann wrote: >> >>> What machine/configuration are the results from? >>> >> Xeon 3.8 G w/ 2 M cache >> > The configuration uses optimization flags from "SerialBuiltin". > Otherwise --with-fft=builtin is set and --with-lapack not specified > (defaulting to CLAPACK, no?). > I think that is right: no '--with-lapack' option results in NO lapack being used at all plain '--with-lapack' results in configure searching for the presence of installed atlas, installed generic lapack, and then builtin atlas (using clapack). 
-- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Mon May 1 16:43:32 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 01 May 2006 12:43:32 -0400 Subject: [vsipl++] [patch] New benchmark - vector division In-Reply-To: <44563075.6020507@codesourcery.com> References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> <44563075.6020507@codesourcery.com> Message-ID: <44563AB4.4040802@codesourcery.com> Don McCoy wrote: >> What machine/configuration are the results from? >> > Xeon 3.8 G w/ 2 M cache > Don, You should try building with IPP to see how that affects performance. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Tue May 2 00:43:27 2006 From: don at codesourcery.com (Don McCoy) Date: Mon, 01 May 2006 18:43:27 -0600 Subject: [patch] HPEC CFAR Detection benchmark Message-ID: <4456AB2F.801@codesourcery.com> The attached patch implements the CFAR benchmark. Briefly, this problem involves finding targets based on data within a three-dimensional cube of 'beam locations', 'range gates' and 'doppler bins'. It does this by comparing the signal in a given cell to that of nearby cells in order to avoid false-detection of targets. The range gate parameter is varied when considering 'nearby' cells. A certain number of guard cells are skipped, resulting in a computation that sums the values from two thick slices of this data cube (one on either side of the slice for a particular range gate). The HPEC PCA Kernel-Level benchmark paper has a diagram that shows one cell under consideration. Please refer to it if needed. 
The algorithm involves these basic steps:
- compute the squares of all the values in the data cube
- for each range gate:
  - sum the squares of desired values around the current range gate
  - compute the normalized power for each cell in the slice
  - search for values that exceed a certain threshold
Some of the code relates to boundary conditions (near either end of the 'range gates' parameter), but otherwise it follows the above description. For now, the original implementation used get/put (actually operator()) instead of using subviews and the element-wise operators. Switching from one to the other resulted in about a 25% improvement in performance for the first set of data (see attached graph). The other sets experienced improvement as well, to varying degrees. I'd like to consider how we can improve the throughput further. Switching the processing order may help possibly. Thoughts are welcome. The benchmark only varies the number of range gates based upon the four sets of parameters defined in the HPEC paper. As the workload is equally dependent on each of the three dimensions, sweeping the other two would not add much value. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cf.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cf.diff URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cfar_optimized_mflops.png Type: image/png Size: 4346 bytes Desc: not available URL: From jules at codesourcery.com Tue May 2 15:17:16 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 02 May 2006 11:17:16 -0400 Subject: [patch] Minor benchmark fixes Message-ID: <445777FC.7000100@codesourcery.com> Patch applied. 
-- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bm.diff URL: From jules at codesourcery.com Tue May 2 18:29:15 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 02 May 2006 14:29:15 -0400 Subject: [vsipl++] [patch] HPEC CFAR Detection benchmark In-Reply-To: <4456AB2F.801@codesourcery.com> References: <4456AB2F.801@codesourcery.com> Message-ID: <4457A4FB.9040505@codesourcery.com> Don McCoy wrote: > The attached patch implements the CFAR benchmark. Briefly, this problem > involves finding targets based on data within a three-dimensional cube > of 'beam locations', 'range gates' and 'doppler bins'. It does this by > comparing the signal in a given cell to that of nearby cells in order to > avoid false-detection of targets. The range gate parameter is varied > when considering 'nearby' cells. A certain number of guard cells are > skipped, resulting in a computation that sums the values from two thick > slices of this data cube (one on either side of the slice for a > particular range gate). The HPEC PCA Kernel-Level benchmark paper has a > diagram that shows one cell under consideration. Please refer to it if > needed. > > The algorithm involves these basic steps: > - compute the squares of all the values in the data cube > - for each range gate: > - sum the squares of desired values around the current range gate > - compute the normalized power for each cell in the slice > - search for values that exceed a certain threshold > > Some of the code relates to boundary conditions (near either end of the > 'range gates' parameter), but otherwise it follows the above description. Don, Excellent description of the benchmark! Can you put it into the file header as a comment? > > For now, the original implementation used get/put (actually operator()) > instead of using subviews and the element-wise operators. 
Switching > from one to the other resulted in about a 25% improvement in performance > for the first set of data (see attached graph). The other sets > experienced improvement as well, to varying degrees. I'd like to > consider how we can improve the throughput further. Switching the > processing order may help possibly. Thoughts are welcome. General comments: - Avoid memory allocation/deallocation inside the compute loop. For example, the 't1' temporary matrix in cfar_find_targets() is being allocated/deallocated multiple times during a single test_cfar() call. You could avoid this by moving cfar_find_targets() and cfar_detect() into t_cfar_base class, and then defining the temporary matrices/tensors as member variables. - When taking a slice of a matrix/tensor, use a subview instead of copying data. For example, the 'pow_slice' matrix currently looks like: Matrix pow_slice = cpow(dom0, 0, dom2); cfar_find_targets(... pow_slice ...); for (...) { ... pow_slice = cpow(dom0, j, dom2); cfar_find_targets(... pow_slice ...); } As written, pow_slice is a separate matrix holding a copy of the slice from cpow. Each iteration through the loop, new data is copied into pow_slice. The reason that pow_slice is a copy instead of a reference is because its block type (Dense<2, T>) is different from the block type returned by 'cpow(...)' (an impl-defined block type that I don't know off the top of my head, Subset_block maybe?). To have pow_slice reference the data instead of copying it, it needs to have the same block type returned from 'cpow()'. You can use Tensor::submatrix<2>::type to get the right type (where Tensor is the type of cpow): typename Tensor::submatrix<2>::type pow0_slice = cpow(dom0, 0, dom2); cfar_find_targets(... pow0_slice ...); for (...) { ... typename Tensor::submatrix<2>::type pow_slice = cpow(dom0, j, dom2); cfar_find_targets(... pow_slice ...); } Note that you can't change what pow_slice refers to after you create it (i.e. change from '0' to 'j'). 
That's why this has 'pow0_slice' and 'pow_slice'. Of course, you could also do away with the explicit variable 'pow_slice' altogether: cfar_find_targets(... cpow(dom0, 0, dom2), ...); for (...) { ... cfar_find_targets(... cpow(dom0, j, dom2), ...); } - When iterating through each element in a matrix or tensor, try to arrange the variables to coincide with the dimension-order. For example, if you have a 3 by 3 row-major matrix: Matrix mat(3, 3); The data will be laid out like this in memory: Address Matrix element 0 0,0 1 0,1 2 0,2 3 1,0 4 1,1 5 1,2 6 2,0 7 2,1 8 2,2 If you iterate over the elements like so: for (index_type j=0; j > The benchmark only varies the number of range gates based upon the four > sets of parameters defined in the HPEC paper. As the workload is > equally dependent on each of the three dimensions, sweeping the other > two would not add much value. > > Regards, > + /*********************************************************************** > + Support > + ***********************************************************************/ > + > + template + typename Block1, > + typename Block2> > + inline > + void > + cfar_find_targets( > + const_Matrix sum, // Sum of values in Cfar gates > + length_type gates, // Total number of Cfar gates used > + const_Matrix pow_slice, // A set of squared values of range gates > + const length_type mu, // Threshold for determining targets > + Matrix targets, // All of the targets detected so far. > + index_type& next, // the next empty slot in targets > + const length_type j) // Current range gate number. > + { > + if ( next >= targets.size(0) ) // full, nothing to do. > + return; > + > + // Compute the local noise estimate. The inverse is calculated in advance > + // for efficiency. > + T inv_gates = (1.0 / gates); > + Matrix tl = sum * inv_gates; > + > + // Make sure we don't divide by zero! We take advantage of a > + // reduction function here, knowing the values are positive. 
> + Index::dim> idx; > + if ( minval(tl, idx) == T() ) Checking that minval == T() is actually overhead. I.e. expanding it out: // compute minval for (i = ...) for (k = ...) if (t1(i, k) < minval) minval = ... // set 0 if (val == 0.0) for (i = ...) for (k = ...) if (t1(i, k) == 0.0) ... In effect it is looping through the matrix multiple times. Just going through the matrix and looking for zeros should be less expensive. > + { > + for ( index_type k = 0; k < tl.size(1); ++k ) > + for ( index_type i = 0; i < tl.size(0); ++i ) Since t1 is row-major, you should reverse the loop nest. > + if ( tl(i,k) == 0.0 ) { > + tl(i,k) = Precision_traits::eps; > + cout << "! " << i << " " << k << endl; > + } > + } > + > + // Compute the normalized power in the cell > + Matrix normalized_power = pow_slice / tl; Instead of using a separate matrix for normalize_power, you could update t1 in-place: t1 = pow_slice / t1; > + > + > + // If the normalized power is larger than mu record the coordinates. The > + // list of target are held in a [N x 3] matrix, with each row containing > + // the beam location, range gate and doppler bin location of each target. > + // > + for ( index_type k = 0; k < tl.size(1); ++k ) > + for ( index_type i = 0; i < tl.size(0); ++i ) > + { > + if ( normalized_power(i,k) > mu ) > + { > + targets(next,0) = i; > + targets(next,1) = j; > + targets(next,2) = k; > + if ( ++next == targets.size(0) ) // full, nothing else to do. > + return; > + } > + } Looking at this entire function (cfar_find_targets), it could benefit from loop fusion. It has 5 separate loops: - compute t1 - (find minimum -- we can remove this) - replace zero values with eps - compute normalized power - look for detections. Fusing these loops together would process each element start-to-finish, improving temporal locality. Ignoring any vectorization potential, it would be more efficient to have a single loop: for (i = ...) for (k = ...) 
{ T t1 = sum(i, k) * inv_gates; if (t1 == T()) t1 = eps; T norm_power = pow_slice(i, k) / t1; if (norm_power > mu) ... record detection ... } It would be nice if we could write a high-level VSIPL++ expression that did the same thing. Something like this might work: count = indexbool( pow_slice / max(sum * inv_gates, eps) > mu, targets(Domain<1>(next, 1, targets.size() - next))); next += count; It would be good to compare the performance of the explicit for loop with the VSIPL++ approach to see if VSIPL++ does a good job. > + } > + > + > + template + typename Block> > + void > + cfar_detect( > + Tensor cube, > + Matrix found, > + length_type cfar_gates, > + length_type guard_cells, > + length_type mu) > + { > + // Description: > + // Main computational routine for the Cfar Kernel Benchmark. Determines > + // targets by finding SNR signal data points that are greater than the > + // noise threshold mu > + // > + // Inputs: > + // cube: [beams x gates x bins] The radar datacube > + // > + // Note: this function assumes that second dimension of input cube C > + // has length (range gates) greater than 2(cfar gates + guard cells). > + // If this were not the case, then the parameters of the radar signal > + // processing would be flawed! Can you put this comment near the assertion that checks it? 
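The fused loop suggested earlier in this review (noise estimate, zero guard, normalized power and threshold test done per element in one pass) can be sketched in plain C++ without any VSIPL++ types. Everything below is an assumption made for illustration: the Detection struct, the find_targets_fused name, the use of row-major std::vector storage, a float threshold, and std::numeric_limits epsilon standing in for the patch's Precision_traits eps.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One detected target: beam location, range gate, doppler bin.
// (Hypothetical type for this sketch.)
struct Detection
{
  std::size_t beam;
  std::size_t gate;
  std::size_t bin;
};

// sum and pow_slice are row-major [rows x cols] arrays for one range gate.
// Returns the number of detections appended to targets by this call.
inline std::size_t
find_targets_fused(const std::vector<float>& sum,
                   const std::vector<float>& pow_slice,
                   std::size_t rows, std::size_t cols,
                   std::size_t gates_used, float mu, std::size_t gate,
                   std::vector<Detection>& targets, std::size_t max_targets)
{
  assert(sum.size() == rows * cols);
  assert(pow_slice.size() == rows * cols);
  assert(gates_used > 0);

  const float inv_gates = 1.0f / static_cast<float>(gates_used);
  const float eps = std::numeric_limits<float>::epsilon();
  std::size_t found = 0;

  // Outer loop over rows, inner loop over columns: this matches
  // row-major storage, so memory is walked with unit stride.
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t k = 0; k < cols; ++k)
    {
      if (targets.size() >= max_targets)  // target list is full
        return found;
      float noise = sum[i * cols + k] * inv_gates;  // local noise estimate
      if (noise == 0.0f)                            // avoid dividing by zero
        noise = eps;
      float norm_power = pow_slice[i * cols + k] / noise;
      if (norm_power > mu)
      {
        Detection det = {i, gate, k};
        targets.push_back(det);
        ++found;
      }
    }
  return found;
}
```

Each element is read once and handled start to finish, which is the temporal-locality argument made in the review; whether this beats a vectorized element-wise expression would still need to be measured.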
> + > + length_type beams = cube.size(0); > + length_type gates = cube.size(1); > + length_type dbins = cube.size(2); > + test_assert( 2*(cfar_gates+guard_cells) < gates ); > + > + Tensor cpow = pow(cube, 2); > + > + Domain<1> dom0(beams); > + Domain<1> dom2(dbins); > + Matrix sum(beams, dbins, T()); > + for ( length_type lnd = guard_cells; lnd < guard_cells+cfar_gates; ++lnd ) > + sum += cpow(dom0, 1+lnd, dom2); > + > + Matrix pow_slice = cpow(dom0, 0, dom2); > + > + index_type next_found = 0; > + cfar_find_targets(sum, cfar_gates, pow_slice, mu, found, next_found, 0); > + > + for ( index_type j = 1; j < gates; ++j ) > + { > + length_type gates_used = 0; > + length_type c = cfar_gates; > + length_type g = guard_cells; > + You could move this 'if-then-else' statement outside of the loop. This would result in multiple loops. Since the majority of time is spent in case 3, keeping cases 1 & 2 and 4 & 5 together would be OK. I.e.: - loop for cases 1 & 2 - loop for case 3 - loop for cases 4 & 5 > + // Case 1: No cell included on left side of CFAR; > + // very close to left boundary > + if ( j < (g + 1) ) > + { > + gates_used = c; > + sum += cpow(dom0, j+g+c, dom2) - cpow(dom0, j+g, dom2); > + } > + // Case 2: Some cells included on left side of CFAR; > + // close to left boundary > + else if ( (j >= (g + 1)) & (j < (g + c + 1)) ) > + { > + gates_used = c + j - (g + 1); > + sum += cpow(dom0, j+g+c, dom2) - cpow(dom0, j+g, dom2) > + + cpow(dom0, j-(g+1), dom2); > + } > + // Case 3: All cells included on left and right side of CFAR > + // somewhere in the middle of the range vector > + else if ( (j >= (g + c + 1)) & ((j + (g + c)) < gates) ) > + { > + gates_used = 2 * c; > + sum += cpow(dom0, j+g+c, dom2) - cpow(dom0, j+g, dom2) > + + cpow(dom0, j-(g+1), dom2) - cpow(dom0, j-(c+g+1), dom2); > + } > + // Case 4: Some cells included on right side of CFAR; > + // close to right boundary > + else if ( (j + (g + c) >= gates) & ((j + g) < gates) ) > + { > + gates_used = c + 
gates - (j + g); > + sum += - cpow(dom0, j+g, dom2) > + + cpow(dom0, j-(g+1), dom2) - cpow(dom0, j-(c+g+1), dom2); > + } > + // Case 5: No cell included on right side of CFAR; > + // very close to right boundary > + else if (j + g >= gates) > + { > + gates_used = c; > + sum += cpow(dom0, j-(g+1), dom2) - cpow(dom0, j-(c+g+1), dom2); > + } > + else > + { > + cerr << "Error: fell through if statements in Cfar detection - " << > + j << endl; > + test_assert(0); > + } > + > + pow_slice = cpow(dom0, j, dom2); > + cfar_find_targets(sum, gates_used, pow_slice, mu, found, next_found, j); > + } > + } > Index: benchmarks/loop.hpp > =================================================================== > RCS file: /home/cvs/Repository/vpp/benchmarks/loop.hpp,v > retrieving revision 1.17 > diff -c -p -r1.17 loop.hpp > *** benchmarks/loop.hpp 13 Apr 2006 19:21:07 -0000 1.17 > --- benchmarks/loop.hpp 2 May 2006 00:26:12 -0000 > *************** Loop1P::sweep(Functor fcn) > *** 286,292 **** > > float factor = goal_sec_ / time; > if (factor < 1.0) factor += 0.1 * (1.0 - factor); > ! loop = (int)(factor * loop); > > if (factor >= 0.75 && factor <= 1.25) > break; > --- 286,299 ---- > > float factor = goal_sec_ / time; > if (factor < 1.0) factor += 0.1 * (1.0 - factor); > ! if ( loop == (int)(factor * loop) ) > ! break; // Avoid getting stuck when factor ~= 1 and loop is small > ! else > ! loop = (int)(factor * loop); > ! if ( loop == 0 ) > ! loop = 1; > ! if ( loop == 1 ) // Quit if loop cannot get smaller > ! break; I was a little confused by this logic at first, but after considering it, it seems OK. I've thought about always starting the loop count at 1 for calibration and only letting it grow. If the new loop is ever smaller than the old one, that would end calibration (calibration would also end if 0.75 <= factor <= 1.25, as it does currently). Do you think that would work? 
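The grow-only calibration idea floated above can be sketched as a small standalone function. This is only one reading of the proposal, not code from loop.hpp: the function name calibrate_grow_only, the time_for callback, and the max_loop cap are assumptions introduced here, and the sketch stops when the new count is no larger than the old one (which also covers the "smaller" case in the proposal).

```cpp
#include <cassert>
#include <functional>

// Start at loop = 1 and only let the count grow.
// time_for(n) is assumed to return the measured seconds for n iterations.
// max_loop is a hypothetical safety cap, not part of the original proposal.
inline unsigned
calibrate_grow_only(double goal_sec,
                    const std::function<double(unsigned)>& time_for,
                    unsigned max_loop = 1u << 30)
{
  assert(goal_sec > 0.0);
  unsigned loop = 1;
  for (;;)
  {
    double time = time_for(loop);
    assert(time > 0.0);
    double factor = goal_sec / time;

    // Close enough to the goal: same window as the existing code.
    if (factor >= 0.75 && factor <= 1.25)
      break;

    double scaled = factor * loop;
    if (scaled >= static_cast<double>(max_loop))
      return max_loop;  // clamp rather than overflow the cast below

    unsigned next = static_cast<unsigned>(scaled);
    if (next <= loop)   // count would shrink or stall: stop calibrating
      break;
    loop = next;
  }
  return loop;
}
```

One consequence of this design is that a single iteration that already exceeds the goal time leaves the count at 1, so the "loop cannot get smaller" special case from the patch is no longer needed.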
> > if (factor >= 0.75 && factor <= 1.25) > break; -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Tue May 2 21:26:22 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 02 May 2006 17:26:22 -0400 Subject: [patch] Solver dispatch Message-ID: <4457CE7E.30903@codesourcery.com> This patch changes the dispatch for the LU and cholesky solvers to work when the lapack backend is not available. It updates the LU cholesky solver tests to only test types supported by a backend. It also adds support to use Lapack bindings provided by the AMD Core Math Library (ACML) when --with-lapack=acml. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: solver-dispatch.diff URL: From stefan at codesourcery.com Wed May 3 15:27:23 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 03 May 2006 11:27:23 -0400 Subject: patch: sal/fft.hpp fix Message-ID: <4458CBDB.9020508@codesourcery.com> The attached patch removes the incorrect use of SFINAE by a more explicit form to indicate that the SAL backend doesn't support long double types. Before this patch the SAL backend was always skipped. Ok to check in ? Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: fft.hpp.diff Type: text/x-patch Size: 2399 bytes Desc: not available URL: From jules at codesourcery.com Wed May 3 15:36:09 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 03 May 2006 11:36:09 -0400 Subject: [vsipl++] patch: sal/fft.hpp fix In-Reply-To: <4458CBDB.9020508@codesourcery.com> References: <4458CBDB.9020508@codesourcery.com> Message-ID: <4458CDE9.6070000@codesourcery.com> Stefan Seefeld wrote: > The attached patch removes the incorrect use of SFINAE by a more > explicit form to indicate that the SAL backend doesn't support > long double types. 
Before this patch the SAL backend was always > skipped. Ok to check in ? Stefan, How would the SAL FFT react if a user accidentally tried to do an FFT on integral data? Instead of listing the types that SAL doesn't support (long double), could you instead list the types that it does support? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From stefan at codesourcery.com Wed May 3 15:57:50 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 03 May 2006 11:57:50 -0400 Subject: [vsipl++] patch: sal/fft.hpp fix In-Reply-To: <4458CDE9.6070000@codesourcery.com> References: <4458CBDB.9020508@codesourcery.com> <4458CDE9.6070000@codesourcery.com> Message-ID: <4458D2FE.1060104@codesourcery.com> Jules Bergmann wrote: > Stefan Seefeld wrote: >> The attached patch removes the incorrect use of SFINAE by a more >> explicit form to indicate that the SAL backend doesn't support >> long double types. Before this patch the SAL backend was always >> skipped. Ok to check in ? > > Stefan, > > How would the SAL FFT react if a user accidentally tried to do an FFT on > integral data? > > Instead of listing the types that SAL doesn't support (long double), > could you instead list the types that it does support? That's what I tried with my sfinae approach. I agree that having an inclusive list is better than an exclusive (incomplete) list, and I'm still thinking about how to do that, without having to duplicate all the evaluator logic for all supported types. Meanwhile, I'd like to specifically disable long double for sal since fftw supports it, and thus it makes sense to have tests trying to run FFTs with long double types. 
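One way to express an inclusive list of supported types, as asked for above, is a trait that defaults to "unsupported" and is specialized only for the types a backend handles. The sketch below is an assumption-laden illustration: Hypothetical_backend_tag, Is_fft_type_supported and can_dispatch are names invented here, and the two enabled types are placeholders, since the thread does not state which value types SAL actually supports.

```cpp
#include <complex>
#include <type_traits>

// Stand-in tag for some FFT backend (hypothetical; not a VSIPL++ type).
struct Hypothetical_backend_tag {};

// Default: a backend supports nothing unless it says so explicitly.
template <typename Backend, typename T>
struct Is_fft_type_supported : std::false_type {};

// Inclusive list for the hypothetical backend. The choice of float and
// complex<float> is a placeholder for illustration only.
template <>
struct Is_fft_type_supported<Hypothetical_backend_tag, float>
  : std::true_type {};

template <>
struct Is_fft_type_supported<Hypothetical_backend_tag, std::complex<float> >
  : std::true_type {};

// A dispatcher could consult this before selecting the backend.
template <typename Backend, typename T>
constexpr bool can_dispatch()
{
  return Is_fft_type_supported<Backend, T>::value;
}
```

With this shape, long double and integral types are rejected without being named, and adding a newly supported type is a one-line specialization rather than a change to the evaluator logic.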
Regards, Stefan From jules at codesourcery.com Wed May 3 16:57:48 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 03 May 2006 12:57:48 -0400 Subject: [vsipl++] patch: sal/fft.hpp fix In-Reply-To: <4458D2FE.1060104@codesourcery.com> References: <4458CBDB.9020508@codesourcery.com> <4458CDE9.6070000@codesourcery.com> <4458D2FE.1060104@codesourcery.com> Message-ID: <4458E10C.4070907@codesourcery.com> Stefan Seefeld wrote: > Meanwhile, I'd like to specifically disable long double for sal since fftw > supports it, and thus it makes sense to have tests trying to run FFTs > with long double types. > Ok, that patch looks fine then. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Sat May 6 20:07:15 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sat, 06 May 2006 16:07:15 -0400 Subject: [patch] Run-time external data access. Message-ID: <445D01F3.7030500@codesourcery.com> The attached patch implements and tests external data access with run-time selection of the layout. In theory, it should work something like this: // Assume this is an operator() function for a class similar // to Fft. It calls a backend (backend_) to do the work. // Since the backend is chosen at run-time (and is derived // from a virtual base class), we can't use the normal // Ext_data because it requires that layout be chosen // at compile-time. Instead we need to use run-time ext_data. operator()( const_Vector in, Vector out) { // First, determine layout of blocks: Rt_layout<1> rtl_in = block_layout(in.block()); Rt_layout<1> rtl_out = block_layout(out.block()); // Second, query the backend about what layout // it can support. // Backend will modify rtl_in and rtl_out. // // For example, it might: // - set strides to unit-stride if it only supports // unit-stride, // - set complex formats to match, // - set dimension-ordering, // - etc. 
backend_->query_layout(rtl_in, rtl_out); // Third, create run-time Ext_data objects Rt_ext_data ext_in(in.block(), rtl_in); Rt_ext_data ext_out(out.block(), rtl_out); // Fourth, call functions in backend. // // Some knowledge may get encoded here. In particular, // because split- and interleaved- complex have // different types, we need to call the appropriate // backend function. The backends could do this dispatch // too. // backends don't have functions with mixed split/interleaved // arguments. assert(rtl_in.complex == rtl_out.complex); if (rtl_in.complex == cmplx_inter_fmt) { backend_->doit(ext_in.data().as_inter(), ext_in.stride(0), ext_out.data().as_inter(), ext_out.stride(0), out.size()); } else // (rtl_in.complex == cmplx_split_fmt) { backend_->doit(ext_in.data().as_split(), ext_in.stride(0), ext_out.data().as_split(), ext_out.stride(0), out.size()); } } -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rtex.diff URL: From don at codesourcery.com Sun May 7 00:47:21 2006 From: don at codesourcery.com (Don McCoy) Date: Sat, 06 May 2006 18:47:21 -0600 Subject: [patch] double support for SAL LU solver Message-ID: <445D4399.7000100@codesourcery.com> This was tested against C-SAL but without the portions of the tests exercising the transpose options (when using the "old" functions). Note: some lines were changed only in that tabs were replaced with spaces! Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lu.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: lu.diff URL: From jules at codesourcery.com Sun May 7 01:35:55 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sat, 06 May 2006 21:35:55 -0400 Subject: [vsipl++] [patch] double support for SAL LU solver In-Reply-To: <445D4399.7000100@codesourcery.com> References: <445D4399.7000100@codesourcery.com> Message-ID: <445D4EFB.2030506@codesourcery.com> Don McCoy wrote: > This was tested against C-SAL but without the portions of the tests > exercising the transpose options (when using the "old" functions). Don, This looks good. Can you: - Move the reciprocal call from sal_matfbs to sal_matlud. That way if multiple sal_matfbs calls are made (either because B/X have multiple columns, or because the LU object is used multiple times), vrecip will only get called once. - Create a typedef for the block_type of recip_. That way the Ext_data for recip_ is guaranteed to have the correct block type if recip_ ever changes. - a few more comments sprinkled below. If these comments make sense, once you address them this looks good to check in. How do we test this? By manually disabling the mat_trans and mat_herm cases? thanks, -- Jules > + > + > + > + // "Legacy" SAL functions - The single-precision versions are listed > + // in the Appendix of the SAL Reference manual. Although the double- > + // precision ones are still part of the normal API, we refer to both > + // sets of functions as legacy functions just for ease of naming. > + > + // Legacy SAL LUD decomposition functions > + #define VSIP_IMPL_SAL_LUD_DEC( T, SAL_T, SALFCN ) \ > + inline void \ > + sal_matlud( \ > + T *c, \ > + int *d, int n) \ > + { \ > + SALFCN((SAL_T*) c, d, n); \ If you pass recip_ in, you can perform the reciprocal one time here. > + } > --- 285,308 ---- > > protected: > template ! typename Block0, > ! typename Block1> > bool impl_solve(const_Matrix, Matrix) > VSIP_NOTHROW; > > + length_type max_decompose_size(); > + > // Member data. 
> private: > typedef std::vector > vector_type; > > ! length_type length_; // Order of A. > ! vector_type ipiv_; // Pivot table for Q. This gets > // generated from the decompose and > ! // gets used in the solve > ! Vector recip_; // Vector of reciprocals used > ! // with legacy solvers Use a typedef for recip_'s block type. > ! Matrix data_; // Factorized matrix (A) > }; > > > *************** Lud_impl::Lud_impl( > *** 191,196 **** > --- 320,326 ---- > VSIP_THROW((std::bad_alloc)) > : length_ (length), > ipiv_ (length_), > + recip_ (length_), > data_ (length_, length_) > { > assert(length_ > 0); > *************** Lud_impl::Lud_impl(Lu > *** 203,213 **** > --- 333,347 ---- > VSIP_THROW((std::bad_alloc)) > : length_ (lu.length_), > ipiv_ (length_), > + recip_ (length_), > data_ (length_, length_) > { > data_ = lu.data_; > for (index_type i=0; i + { > ipiv_[i] = lu.ipiv_[i]; > + recip_.put(i, lu.recip_.get(i)); > + } Since recip_ is a vector, you could just say: recip_ = lu.recip_; > } > > > else > --- 464,498 ---- > } > Ext_data a_ext((tr == mat_trans)? > data_int.block():data_.block()); > + Ext_data > r_ext(recip_.block()); > > // sal_mat_lud_sol only takes vectors, so, we have to do this for each > // column in the matrix > ptr_type b_ptr = b_ext.data(); > ptr_type x_ptr = x_ext.data(); > ! for(index_type i=0;i ! { > ! #if VSIP_IMPL_SAL_USE_MAT_LUD > sal_mat_lud_sol(a_ext.data(), a_ext.stride(0), > &ipiv_[0], > ! storage_type::offset(b_ptr,i*length_), > ! storage_type::offset(x_ptr,i*length_), > ! length_,trans); > ! #else > ! if (x_ext.stride(0) != 1) > ! VSIP_IMPL_THROW(unimplemented( > ! "Lud_impl<>::impl_solve - data must be dense (have unit stride)")); This should either be an assertion, or removed. x_ext refers to x_int, which is declared by the LU object to be column major. Since we know the block is column major, the condition x_ext.stride(0) != 1 would indicate a bug in Ext_data (i.e. 
something impossible happened -> assert failure), as opposed to unsupported behavior (user tried to do something unsupported -> throw exception). > ! if (tr == mat_ntrans) > ! sal_matfbs(a_ext.data(), r_ext.data(), &ipiv_[0], > ! storage_type::offset(b_ptr, i*length_), > ! storage_type::offset(x_ptr, i*length_), > ! length_); > ! else > ! VSIP_IMPL_THROW(unimplemented( > ! "Lud_impl::impl_solve - unimplemented")); Good! Well, actually bad (SAL doesn't support mat_trans), but throwing an exception is the right thing to do. > ! #endif > } > > assign_local(x, x_int); > } > else -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Sun May 7 17:16:58 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 07 May 2006 13:16:58 -0400 Subject: [patch] Run-time external data access Message-ID: <445E2B8A.3000705@codesourcery.com> Add missing block_layout function. Fix several hard-coded dimensions in rt_extdata.hpp (thanks Stefan) Use block's dimension as default dimension for Rt_ext_data (thanks Stefan!) Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rtex2.diff URL: From jules at codesourcery.com Sun May 7 18:38:58 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 07 May 2006 14:38:58 -0400 Subject: [patch] Forcing a copy for run-time external data access. Message-ID: <445E3EC2.4030506@codesourcery.com> This patch adds support for a SYNC_IN_NOPRESERVE flag with Rt_ext_data. It requires the block to be synchronized with the external data when the Rt_ext_data is created, and it requires that changes made to the external data are not reflected in the original block. In short, it forces data to be copied, even if the block already has the requested layout. 
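To make the difference between SYNC_IN and SYNC_IN_NOPRESERVE concrete, here is a minimal, self-contained sketch. It does not use the real Rt_ext_data: Toy_ext_data, toy_sync_action, pack_in_place, and block_survives are hypothetical names invented for illustration, and the toy assumes the block already has the requested layout, so that plain SYNC_IN is free to alias the block's storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the VSIPL++ types.
enum toy_sync_action { TOY_SYNC_IN, TOY_SYNC_IN_NOPRESERVE };

// Gives pointer access to a block's data.  With TOY_SYNC_IN it aliases the
// block's storage; with TOY_SYNC_IN_NOPRESERVE it always works on a copy.
class Toy_ext_data
{
public:
  Toy_ext_data(std::vector<float>& block, toy_sync_action sync)
    : copy_(sync == TOY_SYNC_IN_NOPRESERVE ? block : std::vector<float>()),
      data_(sync == TOY_SYNC_IN_NOPRESERVE ? copy_.data() : block.data()),
      size_(block.size())
  {}

  // data_ may point into copy_, so copying the object would be unsafe.
  Toy_ext_data(Toy_ext_data const&) = delete;
  Toy_ext_data& operator=(Toy_ext_data const&) = delete;

  float*      data()       { return data_; }
  std::size_t size() const { return size_; }

private:
  std::vector<float> copy_;  // declared first: initialized before data_
  float*             data_;
  std::size_t        size_;
};

// Stands in for a backend that reorganizes its input in place.
void pack_in_place(Toy_ext_data& ext)
{
  for (std::size_t i = 0; i < ext.size(); ++i)
    ext.data()[i] *= 2.0f;
}

// Returns true if the block still holds its original values after the
// "backend" has scribbled over the external data.
bool block_survives(toy_sync_action sync)
{
  std::vector<float> block = {1.0f, 2.0f, 3.0f};
  Toy_ext_data ext(block, sync);
  pack_in_place(ext);
  return block[0] == 1.0f && block[1] == 2.0f && block[2] == 3.0f;
}
```

In this toy, block_survives(TOY_SYNC_IN) is false because the packing is visible in the block, while block_survives(TOY_SYNC_IN_NOPRESERVE) is true because the backend only ever touched the copy.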
The intention is to support FFT backends like SAL that need to reorganize
data in-place for packing before performing real-to-complex FFTs. The
backend would communicate that it requires the input data to be copied so
that it can pack as necessary.

Applying this to the earlier example:

operator()(
  const_Vector in,
  Vector out)
{
  // First, determine layout of blocks:
  Rt_layout<1> rtl_in = block_layout(in.block());
  Rt_layout<1> rtl_out = block_layout(out.block());

  // Second, query the backend about what layout
  // it can support.
  // Backend will modify rtl_in and rtl_out.
  //
  // For example, it might:
  // - set strides to unit-stride if it only supports
  //   unit-stride,
  // - set complex formats to match,
  // - set dimension-ordering,
  // - etc.
  backend_->query_layout(rtl_in, rtl_out);

  // Determine if backend needs to modify the input data
  // (for example, if performing a real-to-complex FFT requires
  // a special packing format).
  //
  // If backend does need to modify it, we'll use SYNC_IN_NOPRESERVE
  // which effectively forces a copy.
  sync_action_type in_sync = backend_->requires_copy(rtl_in)
                           ? SYNC_IN_NOPRESERVE
                           : SYNC_IN;

  // Third, create run-time Ext_data objects
  Rt_ext_data ext_in (in.block(), rtl_in, in_sync);
  Rt_ext_data ext_out(out.block(), rtl_out, SYNC_OUT);

  // Fourth, call functions in backend.
  //
  // Some knowledge may get encoded here. In particular,
  // because split- and interleaved- complex have
  // different types, we need to call the appropriate
  // backend function. The backends could do this dispatch
  // too.

  // backends don't have functions with mixed split/interleaved
  // arguments.
  assert(rtl_in.complex == rtl_out.complex);

  // Data pointers and strides come from the Rt_ext_data objects,
  // not from the layouts.
  if (rtl_in.complex == cmplx_inter_fmt)
  {
    backend_->doit(ext_in.data().as_inter(), ext_in.stride(0),
                   ext_out.data().as_inter(), ext_out.stride(0),
                   out.size());
  }
  else // (rtl_in.complex == cmplx_split_fmt)
  {
    backend_->doit(ext_in.data().as_split(), ext_in.stride(0),
                   ext_out.data().as_split(), ext_out.stride(0),
                   out.size());
  }
}

Stefan, this is a bit different than adding the 'force_copy' field to
the Rt_layout that I was suggesting before. However, it seems cleaner
in that the 'force_copy' is not really a property of the layout. Do you
think this will work OK?

-- Jules

--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rtex3.diff
URL:
From don at codesourcery.com Sun May 7 19:42:19 2006
From: don at codesourcery.com (Don McCoy)
Date: Sun, 07 May 2006 13:42:19 -0600
Subject: [vsipl++] [patch] double support for SAL LU solver
In-Reply-To: <445D4EFB.2030506@codesourcery.com>
References: <445D4399.7000100@codesourcery.com> <445D4EFB.2030506@codesourcery.com>
Message-ID: <445E4D9B.7070100@codesourcery.com>

Jules Bergmann wrote:
>
> This looks good. Can you:
>
> - Move the reciprocal call from sal_matfbs to sal_matlud. That way
> if multiple sal_matfbs calls are made (either because B/X have
> multiple columns, or because the LU object is used multiple times),
> vrecip will only get called once.
>
> - Create a typedef for the block_type of recip_. That way the Ext_data
> for recip_ is guaranteed to have the correct block type if recip_
> ever changes.
>
> - a few more comments sprinkled below.
>
> If these comments make sense, once you address them this looks good to
> check in.

Committed with suggested changes. Thanks for catching those things.

>
> How do we test this? By manually disabling the mat_trans and
> mat_herm cases?
>

Exactly.
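The first review point (compute the reciprocals once at decomposition time, then reuse them in every solve) can be sketched with a small stand-alone LU solver. This is only an illustration under stated assumptions: Toy_lu and all of its members are hypothetical names, it is not the SAL or VSIPL++ interface, and unlike the real solver it does no pivoting, so it only suits matrices that do not need row exchanges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy solver for illustration; not the SAL/VSIPL++ interface.
// Doolittle LU without pivoting on a row-major n x n matrix.
class Toy_lu
{
public:
  explicit Toy_lu(std::size_t n)
    : n_(n), lu_(n * n), recip_(n), recip_passes_(0)
  {}

  // Factor A in place and cache the reciprocals of U's diagonal.
  void decompose(std::vector<double> const& a)
  {
    assert(a.size() == n_ * n_);
    lu_ = a;
    for (std::size_t k = 0; k < n_; ++k)
    {
      recip_[k] = 1.0 / lu_[k * n_ + k];       // computed once, here
      for (std::size_t i = k + 1; i < n_; ++i)
      {
        lu_[i * n_ + k] *= recip_[k];          // multiplier l(i,k)
        for (std::size_t j = k + 1; j < n_; ++j)
          lu_[i * n_ + j] -= lu_[i * n_ + k] * lu_[k * n_ + j];
      }
    }
    ++recip_passes_;
  }

  // Solve A x = b using the cached reciprocals; no divisions are needed.
  std::vector<double> solve(std::vector<double> b) const
  {
    assert(b.size() == n_);
    // Forward substitution with unit-diagonal L.
    for (std::size_t i = 1; i < n_; ++i)
      for (std::size_t j = 0; j < i; ++j)
        b[i] -= lu_[i * n_ + j] * b[j];
    // Back substitution with U, multiplying by the stored reciprocals.
    for (std::size_t i = n_; i-- > 0; )
    {
      for (std::size_t j = i + 1; j < n_; ++j)
        b[i] -= lu_[i * n_ + j] * b[j];
      b[i] *= recip_[i];
    }
    return b;
  }

  // Number of times the reciprocal vector has been computed.
  std::size_t recip_passes() const { return recip_passes_; }

private:
  std::size_t         n_;
  std::vector<double> lu_;
  std::vector<double> recip_;
  std::size_t         recip_passes_;
};
```

However many right-hand sides are solved after one decompose() call, recip_passes() stays at 1, which is the saving the review comment is after.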
-- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lu2.diff URL: From jules at codesourcery.com Sun May 7 19:49:33 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 07 May 2006 15:49:33 -0400 Subject: [vsipl++] [patch] double support for SAL LU solver In-Reply-To: <445E4D9B.7070100@codesourcery.com> References: <445D4399.7000100@codesourcery.com> <445D4EFB.2030506@codesourcery.com> <445E4D9B.7070100@codesourcery.com> Message-ID: <445E4F4D.1000904@codesourcery.com> Don, Thanks for getting this checked in! -- Jules Don McCoy wrote: > VSIP_THROW((std::bad_alloc)) > : length_ (lu.length_), > ipiv_ (length_), > + recip_ (length_), > data_ (length_, length_) > { > data_ = lu.data_; > for (index_type i=0; i + { > ipiv_[i] = lu.ipiv_[i]; > + recip_ = lu.recip_; The recip_ assignment should go outside the loop, right? > + } > } -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Sun May 7 19:54:52 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 07 May 2006 15:54:52 -0400 Subject: [patch] Extend rt_extdata test coverage for vectors and tensors Message-ID: <445E508C.5080907@codesourcery.com> Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: rtex4.diff URL: From don at codesourcery.com Sun May 7 20:07:38 2006 From: don at codesourcery.com (Don McCoy) Date: Sun, 07 May 2006 14:07:38 -0600 Subject: [vsipl++] [patch] double support for SAL LU solver In-Reply-To: <445E4F4D.1000904@codesourcery.com> References: <445D4399.7000100@codesourcery.com> <445D4EFB.2030506@codesourcery.com> <445E4D9B.7070100@codesourcery.com> <445E4F4D.1000904@codesourcery.com> Message-ID: <445E538A.5040507@codesourcery.com> Jules Bergmann wrote: > Don McCoy wrote: > >> VSIP_THROW((std::bad_alloc)) >> : length_ (lu.length_), >> ipiv_ (length_), >> + recip_ (length_), >> data_ (length_, length_) >> { >> data_ = lu.data_; >> for (index_type i=0; i> + { >> ipiv_[i] = lu.ipiv_[i]; >> + recip_ = lu.recip_; > > > The recip_ assignment should go outside the loop, right? > >> + } >> } > > > Yes. Corrected. -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lu3.diff URL: From don at codesourcery.com Sun May 7 20:36:08 2006 From: don at codesourcery.com (Don McCoy) Date: Sun, 07 May 2006 14:36:08 -0600 Subject: [vsipl++] [patch] New benchmark - vector division In-Reply-To: <4455FAA2.3060106@codesourcery.com> References: <4455A149.20504@codesourcery.com> <4455FAA2.3060106@codesourcery.com> Message-ID: <445E5A38.3050901@codesourcery.com> Jules Bergmann wrote: > Don McCoy wrote: > >> Here is a new benchmark for testing element-wise vector division. >> Also attached are two performance graphs comparing multiplication and >> division - one shows mega flops per second and the other latency, or >> the number of microseconds per operation. > > > Don, This looks good, please check it in. -- Jules > Checked in.
-- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From stefan at codesourcery.com Sun May 7 21:51:36 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Sun, 07 May 2006 17:51:36 -0400 Subject: [vsipl++] [patch] Forcing a copy for run-time external data access. In-Reply-To: <445E3EC2.4030506@codesourcery.com> References: <445E3EC2.4030506@codesourcery.com> Message-ID: <445E6BE8.50600@codesourcery.com> Jules Bergmann wrote: > Stefan, this is a bit different than adding the 'force_copy' field to > the Rt_layout that I was suggesting before. However, it seems cleaner > in that the 'force_copy' is not really a property of the layout. Do you > think this will work OK? Yes, it should work. I'll give it a try tonight. I already played with the split/interleaved (non-)conversion earlier today, which seems to work fine. Yay ! Regards, Stefan From don at codesourcery.com Mon May 8 03:49:58 2006 From: don at codesourcery.com (Don McCoy) Date: Sun, 07 May 2006 21:49:58 -0600 Subject: [vsipl++] [patch] HPEC benchmark makefiles In-Reply-To: <444E71AC.9070006@codesourcery.com> References: <443D0393.50800@codesourcery.com> <444E71AC.9070006@codesourcery.com> Message-ID: <445EBFE6.3030904@codesourcery.com> Jules Bergmann wrote: > Don McCoy wrote: > >> The attached patch moves the HPEC Kernel benchmarks to their own >> directory in benchmarks/hpec-kernel/ and includes new makefiles for >> both developers and users. ... > > Don, > > This looks good. Please check it in. thanks, -- Jules > Fixed a minor issue with src/vsip_csl/GNUmakefile.inc.in (it did not actually copy the correct files) and with benchmarks/hpec_kernel/make.standalone (to reference ../ for include files and main.o). Committed. > > Does gnumake allow variable names with "-" in them? If so, this is OK. > > If not, let's replace the "-" with a "_" (and update the GNUmakefile.in > 'norm_dir' function accordingly). > Changed to 'benchmarks/hpec_kernel' as discussed. 
Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: hpec2.diff URL: From jules at codesourcery.com Mon May 8 13:16:28 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 08 May 2006 09:16:28 -0400 Subject: [patch] QR Message-ID: <445F44AC.3090201@codesourcery.com> This patch builds on the QR portion of Assem's earlier QR/SVD patch. It has the following changes: - Explicitly disables using Qrd for full-QR (storage_type == qrd_saveq) when using SAL as the implementation. Trying to create a Qrd object for qrd_saveq will throw an unimplemented exception. - Uses the dispatcher from LU and Cholesky. - Fix the use of Ext_data objects to avoid modifying the block being accessed during the Ext_data object's lifetime. When using an Ext_data object to access a block's data directly, you should not modify the block's values during the Ext_data object's lifetime. When the block supports direct access, this happens to work OK, but when the block does not support direct access, Ext_data will copy data and changes made to the block will not be reflected in the copy. While we did choose the block types in Qrd with direct access in mind, we should keep the usage "correct" so that it doesn't get exported via cut-and-paste or subtly break in the future. (One test idea: we could build a special version of the library that forces all Ext_data objects to copy and see what breaks!) - Add support for split-complex - Added back assertions on input sizes to impl_prodq and impl_rsol. In general, assertions are a good thing. Here they help enforce that input matrices from the user have the right shape. Also in this patch: - Updated QR tests to only cover types and storage types supported by the implementation. In particular, avoids testing double precision and full-QR when using SAL. - Small configure.ac fix for FFTW3. 
Adds an AC_SUBST for VSIP_IMPL_FFTW3. Slight logic change when checking if FFTW3 is not enabled (was checking against empty string, now checks against "no"). A couple of questions Assem, Is there any reason we are using matmgs_dqr instead of matmgs_dqrx? Likewise for the other SAL functions (magmgs_srhr, etc). I don't see matmgs_dqr documented in the SAL reference manual (nor the other "non-x" functions documented). It looks like the difference is the missing ESAL flags. If the "non-x" functions are not documented, we should use the "x" functions instead. Stefan, Does the configure.ac bit for FFTW look OK? thanks, -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qr.diff URL: From stefan at codesourcery.com Mon May 8 20:44:57 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Mon, 08 May 2006 16:44:57 -0400 Subject: [vsipl++] [patch] Forcing a copy for run-time external data access. In-Reply-To: <445E3EC2.4030506@codesourcery.com> References: <445E3EC2.4030506@codesourcery.com> Message-ID: <445FADC9.9090105@codesourcery.com> The attached patch rewrites the 1D workspaces used to prepare data to be 'sent' to the FFT backends. It now uses Jules' new rt_extdata harness, to take advantage of the backend's handling of split/interleaved, as well as non-unit strides, if possible. Now the number of copies of the data blocks should be minimal. As I'm still in the process of debugging 2D and M cases, I send in this partial patch, in the hope that it is useful such as for benchmarking, to make sure the performance is at least on par with what it used to be before the redesign. Hopefully I'm able to send out more patches later tonight... Regards, Stefan -------------- next part -------------- A non-text attachment was scrubbed... 
Name: fft1d.patch Type: text/x-patch Size: 46329 bytes Desc: not available URL: From jules at codesourcery.com Mon May 8 22:19:05 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 08 May 2006 18:19:05 -0400 Subject: [vsipl++] [patch] Forcing a copy for run-time external data access. In-Reply-To: <445FADC9.9090105@codesourcery.com> References: <445E3EC2.4030506@codesourcery.com> <445FADC9.9090105@codesourcery.com> Message-ID: <445FC3D9.5060207@codesourcery.com> Stefan Seefeld wrote: > The attached patch rewrites the 1D workspaces used to prepare > data to be 'sent' to the FFT backends. It now uses Jules' new > rt_extdata harness, to take advantage of the backend's handling > of split/interleaved, as well as non-unit strides, if possible. > Now the number of copies of the data blocks should be minimal. > > As I'm still in the process of debugging 2D and M cases, > I send in this partial patch, in the hope that it is useful > such as for benchmarking, to make sure the performance is > at least on par with what it used to be before the redesign. > > Hopefully I'm able to send out more patches later tonight... Stefan, This looks good. I need to check how Rt_extdata handles requests for 1D stride_unit_dense. In general it recognizes that stride_unit_dense is a stricter requirement than stride_unit (i.e. anything that is stride_unit_dense is also stride_unit, but not vice versa). It should make an exception for 1D since there are no higher dimensions. Alternatively, for 1D data you could just request stride_unit, since that is the minimal requirement.
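The relationship between the two packing requirements can be illustrated with a small stand-alone check. This is a sketch under assumptions, not the library's logic: Toy_layout and both predicates are hypothetical names, the layout is taken to be row-major, and "unit stride" is read as stride 1 in the minor dimension while "dense" additionally means no gaps between rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout description (not a VSIPL++ type).  Row-major:
// dimension 0 is the major dimension, the last one is the minor dimension.
struct Toy_layout
{
  std::vector<std::size_t>    size;    // extent of each dimension
  std::vector<std::ptrdiff_t> stride;  // element stride of each dimension
};

// Unit stride: consecutive elements of the minor dimension are adjacent.
bool is_stride_unit(Toy_layout const& l)
{
  return l.stride.back() == 1;
}

// Dense: unit stride in the minor dimension and no padding between rows,
// planes, etc., so every stride equals the product of the lower extents.
bool is_stride_unit_dense(Toy_layout const& l)
{
  std::ptrdiff_t expected = 1;
  for (std::size_t d = l.size.size(); d-- > 0; )
  {
    if (l.stride[d] != expected)
      return false;
    expected *= static_cast<std::ptrdiff_t>(l.size[d]);
  }
  return true;
}
```

For a 4x8 matrix whose rows are padded to 10 elements, is_stride_unit holds but is_stride_unit_dense does not. For any 1D layout the two predicates test the same single stride, which is why requesting the weaker one loses nothing in 1D.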
-- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Tue May 9 16:48:40 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 09 May 2006 12:48:40 -0400 Subject: [patch] Allow non-complex data to have split format Message-ID: <4460C7E8.8010900@codesourcery.com> Removes a few assumptions that non-complex data must have interleaved format. Adds additional test coverage for those cases. Patch applied. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: split.diff URL: From stefan at codesourcery.com Tue May 9 19:49:35 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 09 May 2006 15:49:35 -0400 Subject: patch: FFT 1D / 2D / M bug fixes and test enhancements. Message-ID: <4460F24F.7070102@codesourcery.com> The attached patch enhances the fft_be.cpp tests to cover all 1D, 2D, and M Fft variants, inclusively non-square and non-unit-stride matrices. Doing this revealed a number of (more or less subtle) bugs in the various backends, which are now fixed. (There is still one case that I didn't manage to fix: the c->r dft 2D case. If anybody wants to have a look, that would be appreciated. The relevant code is in fft/dft.hpp:386. The appropriate tests in fft_be.cpp are commented out for the moment.) The only remaining issue, then, is the finalization of the 3D FFTs, which is rather simple, as only fftw and dft support it. Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: patch URL: From stefan at codesourcery.com Tue May 9 20:19:29 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 09 May 2006 16:19:29 -0400 Subject: patch: Fix bug in configure.ac that caused VSIP_IMPL_FFTW3 not always to be defined as requested Message-ID: <4460F951.7070107@codesourcery.com> Just what the subject implies. The variables are defined if either 'fftw3' or 'builtin' backend is selected. Ok, to checkin ? Thanks, Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.ac.diff Type: text/x-patch Size: 1031 bytes Desc: not available URL: From jules at codesourcery.com Wed May 10 01:24:26 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 09 May 2006 21:24:26 -0400 Subject: [vsipl++] patch: FFT 1D / 2D / M bug fixes and test enhancements. In-Reply-To: <4460F24F.7070102@codesourcery.com> References: <4460F24F.7070102@codesourcery.com> Message-ID: <446140CA.6030007@codesourcery.com> Stefan Seefeld wrote: > The attached patch enhances the fft_be.cpp tests to cover all > 1D, 2D, and M Fft variants, inclusively non-square and non-unit-stride > matrices. Doing this revealed a number of (more or less subtle) bugs > in the various backends, which are now fixed. > > (There is still one case that I didn't manage to fix: the c->r dft 2D > case. If anybody wants to have a look, that would be appreciated. > The relevant code is in fft/dft.hpp:386. The appropriate tests in > fft_be.cpp > are commented out for the moment.) > > The only remaining issue, then, is the finalization of the 3D FFTs, > which is rather simple, as only fftw and dft support it. Stefan, This is great work! Please check it in. Things we need to do before 1.1 (but after check in): - Workspaces need to allocate temporary storage that exists for the life of the Fft object. To help manage with split/interleaved, Rt_ext_data will convert a pointer to interleaved into a pointer for split (but not vice versa).
- Add FFTW3 split support. Minor things to think about after 1.1 - Merging requires_copy into query_layout to reduce number of virtual function calls. This is pretty far down the path of diminishing returns. - Naming convention for Fft/Fftm axes. With the 'A' and '1-A', it would be good to have a convention that indicated whether an axis was Fftm-convention or Fft-convention. Could be something as simple as Ax and Ay (i.e. Ax == 1 - Ay), but we should be able to do better. Would you like me to take a look at the FFTW3 split support? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Wed May 10 01:25:51 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 09 May 2006 21:25:51 -0400 Subject: [vsipl++] patch: Fix bug in configure.ac that caused VSIP_IMPL_FFTW3 not always to be defined as requested In-Reply-To: <4460F951.7070107@codesourcery.com> References: <4460F951.7070107@codesourcery.com> Message-ID: <4461411F.7040705@codesourcery.com> Stefan Seefeld wrote: > Just what the subject implies. The variables are defined if either > 'fftw3' or 'builtin' backend is selected. > > Ok, to checkin ? Looks good to me. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From stefan at codesourcery.com Wed May 10 03:11:48 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Tue, 09 May 2006 23:11:48 -0400 Subject: [vsipl++] patch: FFT 1D / 2D / M bug fixes and test enhancements. In-Reply-To: <446140CA.6030007@codesourcery.com> References: <4460F24F.7070102@codesourcery.com> <446140CA.6030007@codesourcery.com> Message-ID: <446159F4.3050903@codesourcery.com> Jules Bergmann wrote: > This is great work! Please check it in. Thanks. It's checked in now (with the ChangeLog ! :-) ). I will look into the task list tomorrow morning.
Thanks, Stefan From jules at codesourcery.com Wed May 10 03:43:17 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Tue, 09 May 2006 23:43:17 -0400 Subject: [vsipl++] patch: FFT 1D / 2D / M bug fixes and test enhancements. In-Reply-To: <4460F24F.7070102@codesourcery.com> References: <4460F24F.7070102@codesourcery.com> Message-ID: <44616155.3080004@codesourcery.com> Stefan Seefeld wrote: > The attached patch enhances the fft_be.cpp tests to cover all > 1D, 2D, and M Fft variants, inclusively non-square and non-unit-stride > matrices. Doing this revealed a number of (more or less subtle) bugs > in the various backends, which are now fixed. > > (There is still one case that I didn't manage to fix: the c->r dft 2D > case. If anybody wants to have a look, that would be appreciated. > The relevant code is in fft/dft.hpp:386. The appropriate tests in > fft_be.cpp > are commented out for the moment.) > Stefan, It turns out the 1D complex->real DFT was broken. It was using the wrong index/exponent when calling sin_cos(). The symmetry means the complex values wrap around, but the exponents still progress as normal. I believe the 1D C->R tests weren't finding this because the ramp function only generates real inputs. For the 2D C->R case, the initial 1D C->C FFT generates values with non-zero imaginary parts which trips up the 1D C->R bug. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dft.diff URL: From stefan at codesourcery.com Wed May 10 13:14:28 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Wed, 10 May 2006 09:14:28 -0400 Subject: patch: Preallocate FFT buffers in workspace Message-ID: <4461E734.3030309@codesourcery.com> The attached patch lets workspaces pre-allocate memory potentially to be used in the call operators. It also contains a fix for the 1D c->r dft failure, reported earlier (thanks Jules !). 
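The 1D c->r issue described earlier in this thread (values wrap around through conjugate symmetry, but the exponent index keeps counting) can be sketched with a naive reference DFT. This is an illustration only, not the code in fft/dft.hpp: toy_dft_r2c and toy_dft_c2r are hypothetical names, and the sign convention (negative exponent forward, positive exponent and no scaling for c->r) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> toy_cplx;

// Naive forward real->complex DFT.  Only the first n/2+1 outputs are
// returned; the rest follow from X[n-k] == conj(X[k]).
std::vector<toy_cplx> toy_dft_r2c(std::vector<double> const& x)
{
  std::size_t const n = x.size();
  double const two_pi = 2.0 * std::acos(-1.0);
  std::vector<toy_cplx> half(n / 2 + 1);
  for (std::size_t k = 0; k <= n / 2; ++k)
  {
    toy_cplx sum(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j)
    {
      double const phi = -two_pi * double(j * k) / double(n);
      sum += x[j] * toy_cplx(std::cos(phi), std::sin(phi));
    }
    half[k] = sum;
  }
  return half;
}

// Naive complex->real DFT from the half spectrum (unnormalized, so a
// round trip through toy_dft_r2c scales the signal by n).
std::vector<double> toy_dft_c2r(std::vector<toy_cplx> const& half,
                                std::size_t n)
{
  double const two_pi = 2.0 * std::acos(-1.0);
  std::vector<double> out(n);
  for (std::size_t j = 0; j < n; ++j)
  {
    toy_cplx sum(0.0, 0.0);
    for (std::size_t k = 0; k < n; ++k)
    {
      // The *value* wraps around: the upper half is the conjugate mirror.
      toy_cplx const v = (k <= n / 2) ? half[k] : std::conj(half[n - k]);
      // The *exponent* does not wrap: it uses k itself, not n - k.
      double const phi = two_pi * double(j * k) / double(n);
      sum += v * toy_cplx(std::cos(phi), std::sin(phi));
    }
    out[j] = sum.real();
  }
  return out;
}
```

A round trip on a signal whose spectrum has non-zero imaginary parts exercises the wrapped half; an input made only of real values, like the ramp-derived test data mentioned above, would not.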
Ok to checkin ? Thanks, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From jules at codesourcery.com Wed May 10 13:40:00 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 10 May 2006 09:40:00 -0400 Subject: [vsipl++] patch: Preallocate FFT buffers in workspace In-Reply-To: <4461E734.3030309@codesourcery.com> References: <4461E734.3030309@codesourcery.com> Message-ID: <4461ED30.40101@codesourcery.com> Stefan Seefeld wrote: > The attached patch lets workspaces pre-allocate memory potentially to > be used in the call operators. > It also contains a fix for the 1D c->r dft failure, reported earlier > (thanks Jules !). > > Ok to checkin ? This looks good, please check it in. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From don at codesourcery.com Wed May 10 21:11:46 2006 From: don at codesourcery.com (Don McCoy) Date: Wed, 10 May 2006 15:11:46 -0600 Subject: [patch] Quickstart Guide update for FFT and LAPACK options Message-ID: <44625712.8010206@codesourcery.com> The attached patch needs verifying yet, but I was having some trouble building the docs. Thought it would be good to put this up and make sure it was technically correct in the meantime. Don -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qs.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: qs.diff URL: From jules at codesourcery.com Thu May 11 01:46:18 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 10 May 2006 21:46:18 -0400 Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK options In-Reply-To: <44625712.8010206@codesourcery.com> References: <44625712.8010206@codesourcery.com> Message-ID: <4462976A.4020402@codesourcery.com> Don, Don McCoy wrote: > The attached patch needs verifying yet, but I was having some trouble > building the docs. Thought it would be good to put this up and make > sure it was technically correct in the meantime. > > Don > > > > ------------------------------------------------------------------------ > > 2006-05-10 Don McCoy > > * doc/quickstart/quickstart.xml: Updated options for --enable-fft= and > --with-lapack to reflect recent additions. > > > ------------------------------------------------------------------------ > > > Index: doc/quickstart/quickstart.xml > =================================================================== > RCS file: /home/cvs/Repository/vpp/doc/quickstart/quickstart.xml,v > retrieving revision 1.29 > diff -c -p -r1.29 quickstart.xml > *** doc/quickstart/quickstart.xml 28 Apr 2006 23:25:43 -0000 1.29 > --- doc/quickstart/quickstart.xml 10 May 2006 21:00:24 -0000 > *************** > *** 742,759 **** > > > > ! > > > Search for and use the FFT library indicated by > lib to perform FFTs. Valid > ! choices for lib include > ! , , and > ! , which select the FFTW3, IPP, and SAL > ! libraries respectively. If no FFT library is to be used > ! (disabling Sourcery VSIPL++'s FFT functionality), > ! should be chosen for > ! lib. > > > > --- 742,764 ---- > > > > ! > > > Search for and use the FFT library indicated by > lib to perform FFTs. Valid > ! choices for lib include > ! , , and > ! , which select FFTW3, IPP, and SAL > ! libraries respectively. A fourth option, , > ! selects the FFTW3 library that comes with Sourcery VSIPL++ (default). > ! 
This option should be used if an existing FFTW3 library is not available. > ! If no FFT library is to be used (disabling Sourcery VSIPL++'s FFT > ! functionality), should be chosen for > ! lib. Advanced uses may specify "Advanced users ..." might scare people off :) How about saying something like: "Multiple libraries may be given as a comma separated list. When performing an FFT, VSIPL++ will use the first library in the list that can support the FFT parameters. For example, on Mercury systems would use SAL's FFT when possible, falling back to VSIPL++'s builtin FFTW3 otherwise." > ! more than one option separated by commans. This causes VSIPL++ ^ commas > ! to attempt to use one FFT library before falling back to > ! another if necessary. Example: --enable-fft=sal,builtin > > > > *************** > *** 794,807 **** > Search for and use the LAPACK library indicated by > lib to perform linear algebra > (matrix-vector products and solvers). Valid choices for I think we should leave the text as is. The '--with-lapack=mkl' option still works. Internally configure tries to do the right thing by interpreting it as '--with-lapack="mkl7 mkl5"', searching first for MKL v7 or v8, then for MKL v5. I will modify the configure.ac's builtin documentation to match the quickstart. However, we should add 'acml' to the list of lapack libraries. " selects the AMD Core Math Library (ACML) to perform linear algebra if found." Also, if we document --with-mkl-prefix, we should document --with-acml-prefix too. > ! lib include , > ! , , > ! , , and > ! . > > > > ! selects the Intel Math Kernel Library (MKL) > ! version 7.x or above to perform linear algebra if found. > ! > !
> ! selects the Intel Math Kernel Library (MKL) > ! version 5.x to perform linear algebra if found. > > > selects the ATLAS library > *************** > *** 812,822 **** > (-llapack) to perform linear algebra if found. > > > ! selects the builtin version of ATLAS > ! to perform linear algebra. This option requires building > ! ATLAS which can take considerable time and is not supported > ! on all platforms. It is only recommended if MKL, ATLAS, or > ! a generic LAPACK or not already installed on the platform. > > > > --- 822,842 ---- > (-llapack) to perform linear algebra if found. > > > ! selects the builtin version of > ! ATLAS/C-LAPACK to perform linear algebra. This option > ! requires building ATLAS which can take considerable time > ! and is not supported on all platforms. It is only recommended > ! if MKL, ATLAS, or a generic LAPACK or not already installed on > ! the platform. > ! > ! > ! selects the builtin version > ! of ATLAS/F77-LAPACK to perform linear algebra. Like the > ! , this option requires building ATLAS > ! as well. In this case, it uses the FORTRAN version instead of > ! the C version of LAPACK. Note this option requires the g2c Note this option requires *a fortran compiler* and the g2c library. > ! library. Use the option if > ! this library is not installed in a standard location. > > > -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Thu May 11 02:16:43 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 10 May 2006 22:16:43 -0400 Subject: [patch] Fft by-reference of const_View and View Message-ID: <44629E8B.9080605@codesourcery.com> Stefan, Here is a test that illustrates the failure I'm seeing with the fft.cpp test, along with a patch that fixes it. Ok to commit? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fft.diff URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: x-fft.cpp URL: From jules at codesourcery.com Thu May 11 03:11:27 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 10 May 2006 23:11:27 -0400 Subject: [patch] Build FFTW on GreenHills/PowerPC/MCOE Message-ID: <4462AB5F.70009@codesourcery.com> The following patch adds support for timers available on MCOE. It also fixes several issues with the object file extension. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cl.fftw URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fftw.diff URL: From jules at codesourcery.com Thu May 11 03:14:30 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Wed, 10 May 2006 23:14:30 -0400 Subject: [patch] Disable use of Fortran in ATLAS Message-ID: <4462AC16.4070607@codesourcery.com> This patch adds a new '--disable-fortran' configure option to ATLAS. It disables configure's probing of the Fortran API, and it disables building of the libf77blas wrapper library. The top-level configure automatically configures ATLAS with --disable-fortran when using the builtin C Lapack (i.e. --with-lapack=builtin). -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: atlas.diff URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: cl.atlas URL: From don at codesourcery.com Thu May 11 04:37:20 2006 From: don at codesourcery.com (Don McCoy) Date: Wed, 10 May 2006 22:37:20 -0600 Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK options In-Reply-To: <4462976A.4020402@codesourcery.com> References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> Message-ID: <4462BF80.7030701@codesourcery.com> Patch with suggested changes is attached. Regards, -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qsg.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qsg.diff URL: From don at codesourcery.com Thu May 11 04:54:15 2006 From: don at codesourcery.com (Don McCoy) Date: Wed, 10 May 2006 22:54:15 -0600 Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK options In-Reply-To: <4462BF80.7030701@codesourcery.com> References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> <4462BF80.7030701@codesourcery.com> Message-ID: <4462C377.8080007@codesourcery.com> Don McCoy wrote: > Patch with suggested changes is attached. > I caught a grammatical error and a leftover 'the', so I took the opportunity to reword that paragraph slightly. Please disregard the previous patch. :) -- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: qsg.changes URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: qsg.diff URL: From jules at codesourcery.com Thu May 11 10:45:43 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 11 May 2006 06:45:43 -0400 Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK options In-Reply-To: <4462C377.8080007@codesourcery.com> References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> <4462BF80.7030701@codesourcery.com> <4462C377.8080007@codesourcery.com> Message-ID: <446315D7.7090703@codesourcery.com> Don McCoy wrote: > Don McCoy wrote: >> Patch with suggested changes is attached. >> > > I caught a grammatical error and a leftover 'the', so I took the > opportunity to reword that paragraph slightly. Please disregard the > previous patch. :) Don, this looks good to me! -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From stefan at codesourcery.com Thu May 11 12:09:45 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Thu, 11 May 2006 08:09:45 -0400 Subject: [vsipl++] [patch] Fft by-reference of const_View and View In-Reply-To: <44629E8B.9080605@codesourcery.com> References: <44629E8B.9080605@codesourcery.com> Message-ID: <44632989.3010800@codesourcery.com> Jules, I'm sorry I didn't get back to you on this earlier. I was unable to reproduce the failure, which might be because I'm using gcc 4.1, while you are using 3.4.x, right ? I looked into the issue briefly when you told me initially about a potential problem with passing const views, but put the issue aside when I noticed that my tests indeed use const views, yet there was no error. I'm not sure whether gcc 4.1 is too permissive here, i.e. whether somehow it manages to create a temporary non-const view out of a const view. That's something to follow up on later... Jules Bergmann wrote: > Stefan, > > Here is a test that illustrates the failure I'm seeing with the fft.cpp > test, along with a patch that fixes it. Ok to commit? The patch looks good ! 
Thanks, Stefan From jules at codesourcery.com Thu May 11 14:56:44 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Thu, 11 May 2006 10:56:44 -0400 Subject: [patch] update release.sh Message-ID: <446350AC.3070505@codesourcery.com> These changes were made to release.sh for building the solaris snapshot. release.sh is the script we used to build the 1.0 and subsequent snapshots. It basically runs scripts/package.py after setting the environment. Stefan, release.sh has the approximate paths we should be using for the source and binary packages. There are some hacks, in particular gc6.6 and dot disappeared from cugel at some point. I rebuilt dot in ~jules/local/graphviz-2.6 and copied gc6.6 into ~jules/build-cugel. Perhaps we should build those packages "once and for all" in /home/vsiplxx after 1.1 to lock them down. Also, for solaris, ~jules/local/sun4/bin has 'pkg-config'. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rel.diff URL: From don at codesourcery.com Fri May 12 02:50:24 2006 From: don at codesourcery.com (Don McCoy) Date: Thu, 11 May 2006 20:50:24 -0600 Subject: [vsipl++] [patch] Quickstart Guide update for FFT and LAPACK options In-Reply-To: <446315D7.7090703@codesourcery.com> References: <44625712.8010206@codesourcery.com> <4462976A.4020402@codesourcery.com> <4462BF80.7030701@codesourcery.com> <4462C377.8080007@codesourcery.com> <446315D7.7090703@codesourcery.com> Message-ID: <4463F7F0.6010306@codesourcery.com> Jules Bergmann wrote: > > > Don, this looks good to me! -- Jules > Committed. 
-- Don McCoy don (at) CodeSourcery (888) 776-0262 / (650) 331-3385, x712 From jules at codesourcery.com Fri May 12 12:22:16 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 12 May 2006 08:22:16 -0400 Subject: [patch] Parallel FFTM Message-ID: <44647DF8.4080405@codesourcery.com> This patch adds support for Fftm to work in parallel. At the core, this required three minor changes: - Have fftm_facade call the workspace with 'view.local()' instead of view I.e. for by_reference, change call from workspace_.by_reference(..., in, out); to: workspace_.by_reference(..., in.local(), out.local()); - Have the fftw3 backend use the rows/cols passed from the workspace to determine the number of FFTs to perform (instead of the saved value mult_). (Similar changes may need to be made for the other backends. I will look at SAL next). - For by_value Fftm, use the input's map as the map for the output. (Also, since Fast_block apparently can't be distributed at the moment, use Dense block for distributed results). Also included in this patch: - Add input checking to fftm_facade. In particular - check that input and output view sizes are correct. - check that maps for distributed data are supported. This is primarily a "usability" enhancement to detect incorrect usage at the root. - Fix Wall warnings (unsigned vs signed comparison) in fftw3/fft_impl.hpp - (Non FFT related): have configure default to --with-lapack=probe if no --with-lapack option given. (This is consistent with our MPI behavior). - (Non FFT related): add --with-test-level option to set VSIP_IMPL_TEST_LEVEL. Stefan, OK to apply? -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fftm.diff URL: From jules at codesourcery.com Fri May 12 19:29:01 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Fri, 12 May 2006 15:29:01 -0400 Subject: [patch] Improvements for distributed split-complex and solver tests on mercury. Message-ID: <4464E1FD.4000503@codesourcery.com> Fixes some problems encountered when building a parallel library with split-complex. Fixes for solver tests to run on Mercury. All solver tests pass on mercury with exception of solver-lu. -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: misc.diff URL: From stefan at codesourcery.com Sun May 14 00:27:13 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Sat, 13 May 2006 20:27:13 -0400 Subject: patch: remove declarators for unused parameters. Message-ID: <44667961.8010605@codesourcery.com> The attached patch removes declarators for unused function parameters, and fixes a wrong signature of a function forward-declaration. The patch is checked in. Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From jules at codesourcery.com Sun May 14 02:20:07 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sat, 13 May 2006 22:20:07 -0400 Subject: [patch] Last minute patches Message-ID: <446693D7.1010606@codesourcery.com> A collection of miscellaneous patches. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: misc2.diff URL: From stefan at codesourcery.com Sun May 14 05:52:45 2006 From: stefan at codesourcery.com (Stefan Seefeld) Date: Sun, 14 May 2006 01:52:45 -0400 Subject: patch: Add full 3D FFT support. 
Message-ID: <4466C5AD.5030102@codesourcery.com> The attached patch adds full 3D FFT support (to the fftw3 and dft backends), and cleans up some places I missed in the previous patch. Regards, Stefan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From jules at codesourcery.com Sun May 14 06:55:38 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 14 May 2006 02:55:38 -0400 Subject: [patch] Cleanup for release. Message-ID: <4466D46A.6010401@codesourcery.com> This patch - fixes a bug with Rt_extdata incorrectly attempting to deallocate storage. - fixes a bug in the handling of column-major data for 2D and 3D FFTs with the FFTW3 backend. - adds checks on data layout for the 2D and 3D FFTW3 backends. - fixes the SAL FFTM backend to use the rows/cols of the data being processed to determine the number of FFTs to perform (as opposed to the size of the FFTM object when created). Necessary for distributed FFTMs. - Disables SAL's FFTM evaluator from trying to do long-double FFTMs - Cleans up fft.cpp test to make the ifdefs a little more manageable. Patch applied. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 From jules at codesourcery.com Sun May 14 07:13:27 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Sun, 14 May 2006 03:13:27 -0400 Subject: [vsipl++] [patch] Cleanup for release. In-Reply-To: <4466D46A.6010401@codesourcery.com> References: <4466D46A.6010401@codesourcery.com> Message-ID: <4466D897.3090901@codesourcery.com> Doh! Forgot the patch -- Jules Jules Bergmann wrote: > This patch > - fixes a bug with Rt_extdata incorrectly attempting to deallocate > storage. > - fixes a bug in the handling of column-major data for 2D and 3D > FFTs with the FFTW3 backend. > - adds checks on data layout for the 2D and 3D FFTW3 backends. 
> - fixes the SAL FFTM backend to use the rows/cols of the data > being processed to determine the number of FFTs to perform > (as opposed to the size of the FFTM object when created). > Necessary for distributed FFTMs. > - Disables SAL's FFTM evaluator from trying to do long-double > FFTMs > - Cleans up fft.cpp test to make the ifdefs a little more manageable. > > Patch applied. > > -- Jules > -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: misc3.diff URL: From jules at codesourcery.com Mon May 15 18:56:56 2006 From: jules at codesourcery.com (Jules Bergmann) Date: Mon, 15 May 2006 14:56:56 -0400 Subject: [patch] Finaly 1.1 items Message-ID: <4468CEF8.7000400@codesourcery.com> FYI, I applied this patch yesterday in preparation for 1.1 release. -- Jules -- Jules Bergmann CodeSourcery jules at codesourcery.com (650) 331-3385 x705 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: misc5.diff URL: From assem at codesourcery.com Fri May 19 16:01:45 2006 From: assem at codesourcery.com (Assem Salama) Date: Fri, 19 May 2006 12:01:45 -0400 Subject: Matlab IO Message-ID: <446DEBE9.9020201@codesourcery.com> Everyone, This patch adds support for Matlab M file output. I only did diff in src/vsip_csl because I'm changing other stuff. Thanks, Assem Salama -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cvs.diff.05192006.1.log URL: From mark at codesourcery.com Fri May 19 16:18:17 2006 From: mark at codesourcery.com (Mark Mitchell) Date: Fri, 19 May 2006 09:18:17 -0700 Subject: [vsipl++] Matlab IO In-Reply-To: <446DEBE9.9020201@codesourcery.com> References: <446DEBE9.9020201@codesourcery.com> Message-ID: <446DEFC9.5090808@codesourcery.com> Assem Salama wrote: > Everyone, > This patch adds support for Matlab M file output. 
I only did diff in > src/vsip_csl because I'm changing other stuff. > +all:: lib/libvsip_csl.a Is this a new library? If so, why? We can put this in the ordinary VSIPL++ library. > Index: matlabformatter.hpp > =================================================================== > RCS file: matlabformatter.hpp > diff -N matlabformatter.hpp > --- /dev/null 1 Jan 1970 00:00:00 -0000 > +++ matlabformatter.hpp 19 May 2006 15:59:06 -0000 > @@ -0,0 +1,103 @@ You're missing the usual header-file comments, copyright notice, etc. > + //template