[vsipl++] Fast convolution (slow!)

Jules Bergmann jules at codesourcery.com
Wed Aug 6 18:52:45 UTC 2008


Gaetano,

> In my application I have to perform convolution of a sequence with N
> points with a sequence of M points where N ~ 1E6 points, M ~ 1K
> points.  Having a startup time of 1 hour for my application is too
> much, is there a way to serialize on disk an instance of
> "vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd,
> vsip::by_reference>"?

I agree, a startup time of 1 hour is excessive!  It turns out VSIPL++
uses FFTW3's PATIENT planning mode when number of times == 0 (since
'0' indicates the FFT object will be used an infinite number of
times).  As you're seeing, PATIENT can be very expensive.

If you set number of times to 15, VSIPL++ will use FFTW3's MEASURE
planning mode, which should be much faster than the PATIENT planning
mode, while still delivering reasonable performance.

For example, for a 1K and 1M FFTs (measuring MFLOP/s on a ~3.6 GHz
Xeon):

	ESTIMATE	MEASURE 	PATIENT
	(times == 1)	(times == 15)	(times == 0)
1K	4959		6355		7482
32K	3985		5370		6488
1M	 687		1735

While FFTW3 allows plans to be pre-computed and stored to disk, there
is no way to access the functionalty from VSIPL++ at present.

> 
> Also take a look at the following two tests:
> 
> Test1.
> 
> typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference> fwd_fft_type;
> fwd_fft_type fwd_fft(1048576, 1.0);   <==  25 minutes
> 
> 
> Test2.
> 
> vsip::Vector<T> inputs(1048576);
> typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference, 1> fwd_fft_type;
> fwd_fft_type fwd_fft(1048576, 1.0);  <== 10 ms (as expected)
> fwd_fft(inputs);                     <== 91 ms (I expected here around 25 mins)
> 
> I think something wrong is going on.

No, that's reasonable.  It is the power of heuristics doing well
against a brute force search. PATIENT (number of times == 0) does a
thorough search of the space of possible FFTs, while ESTIMATE (number
of times == 1) uses simple heuristics.

Presumably, if you measured the FFT time for Test1, it would be faster
than 91 ms.  If it wasn't, it would indicate that the ESTIMATE
heuristics happened to pick the best case heuristically, leaving
PATIENT with nothing better to find.

				-- Jules


> 
> 
> Regards
> Gaetano Mendola

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705



More information about the vsipl++ mailing list