[vsipl++] Fast convolution (slow!)
Jules Bergmann
jules at codesourcery.com
Wed Aug 6 18:52:45 UTC 2008
Gaetano,
> In my application I have to perform convolution of a sequence with N
> points with a sequence of M points where N ~ 1E6 points, M ~ 1K
> points. Having a startup time of 1 hour for my application is too
> much, is there a way to serialize on disk an instance of
> "vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd,
> vsip::by_reference>"?
I agree, a startup time of 1 hour is excessive! It turns out VSIPL++
uses FFTW3's PATIENT planning mode when number of times == 0 (since
'0' indicates the FFT object will be used an infinite number of
times). As you're seeing, PATIENT can be very expensive.
If you set number of times to 15, VSIPL++ will use FFTW3's MEASURE
planning mode, which should be much faster than the PATIENT planning
mode, while still delivering reasonable performance.
For example, for a 1K and 1M FFTs (measuring MFLOP/s on a ~3.6 GHz
Xeon):
ESTIMATE MEASURE PATIENT
(times == 1) (times == 15) (times == 0)
1K 4959 6355 7482
32K 3985 5370 6488
1M 687 1735
While FFTW3 allows plans to be pre-computed and stored to disk, there
is no way to access the functionalty from VSIPL++ at present.
>
> Also take a look at the following two tests:
>
> Test1.
>
> typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference> fwd_fft_type;
> fwd_fft_type fwd_fft(1048576, 1.0); <== 25 minutes
>
>
> Test2.
>
> vsip::Vector<T> inputs(1048576);
> typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference, 1> fwd_fft_type;
> fwd_fft_type fwd_fft(1048576, 1.0); <== 10 ms (as expected)
> fwd_fft(inputs); <== 91 ms (I expected here around 25 mins)
>
> I think something wrong is going on.
No, that's reasonable. It is the power of heuristics doing well
against a brute force search. PATIENT (number of times == 0) does a
thorough search of the space of possible FFTs, while ESTIMATE (number
of times == 1) uses simple heuristics.
Presumably, if you measured the FFT time for Test1, it would be faster
than 91 ms. If it wasn't, it would indicate that the ESTIMATE
heuristics happened to pick the best case heuristically, leaving
PATIENT with nothing better to find.
-- Jules
>
>
> Regards
> Gaetano Mendola
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
More information about the vsipl++
mailing list