[vsipl++] Fast convolution (slow!)

Wed Aug 6 13:33:38 UTC 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jules Bergmann wrote:
> Gaetano Mendola wrote:
>> Hi all,
>> I'm evaluating VSIPL++ for an application heavily based on convolutions.
> 
> Gaetano,
> 
> Thank you for evaluating VSIPL++!  We'll try to answer your questions
> as best we can to make sure your evaluation is a success.
> 
> VSIPL++ separates the creation of an FFT object (the 'fwd_fft_type
> fwd_fft') from the invocation (the 'fwd_fft(inputs, outputs') to allow
> potentially expensive setup computations to occur "outside" the
> computation inner loop.  For example, a simple FFT backend might
> pre-compute twiddle factors, while a more elaborate backend might
> determine which FFT choices (radix, DIF vs DIT, etc) is best suited
> for the architecture.  The intent is that setup computations will be
> amortized over a large number of FFT invocations.
> 
> That is occurring here.  The FFT backend use by VSIPL++ (persumably
> FFTW3) is spending considerable time planning the FFTs (20 sec per
> FFT) to give an efficient execution.
> 
> If you really are performing the convolution only once
> (i.e. preventing the setup time from being amortized), you might
> consider setting the "number of times" template parameter to 1.  This
> will tell the FFT BE to spend less effort planning.  This will slow
> the FFT itself, but the overall time should be smaller.  The default
> number of times is 0 corresponds to "infinite", which results in a
> large planning effort.
> 
>       typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd,
>                         vsip::by_reference,
>                         1 // Number of times
>                         > fwd_fft_type;
> 
> You might also consider using another FFT backend library besides
> FFTW3.  IPP may spend less time planning, without sacrificing
> performance.  However, IPP may have poorer performance on
> non-power-of-2 FFTs.
> 
> Also, to be clear, what FFT library are you using?  Is it the default
> provided by VSIPL++ (FFTW3), or another one?

I do believe I'm using the FFTW3, I'm linking with:

- -lfftw3 -lfftw3f -lfftw3l

My application will do lot of convolution, so the best would be let the
vsip::Fft::CTOR make a full plan, but this is not feasible with the large N I
have to use. I have done some test with power of 2 and this is what I get:

N: 2 -> 001 ms
N: 4 -> 000 ms
N: 8 -> 000 ms
N: 16 -> 000 ms
N: 32 -> 013 ms
N: 64 -> 035 ms
N: 128 -> 083 ms
N: 256 -> 140 ms
N: 512 -> 324 ms
N: 1024 -> 678 ms
N: 2048 -> 1.477 sec [1477 ms]
N: 4096 -> 3.529 sec [3529 ms]
N: 8192 -> 8.569 sec [8569 ms]
N: 16384 -> 19.157 sec [19157 ms]
N: 32768 -> 23.527 sec [23527 ms]
N: 65536 -> 53.429 sec [53429 ms]
N: 131072 -> 1:34.803 [94803 ms]
N: 262144 -> 3:34.710 [214710 ms]
N: 524288 -> 9:30.746 [570746 ms]
N: 1048576 -> 25:24.579 [1524579 ms]
N: 2097152 -> 56:42.435 [3402435 ms]
N: 4194304 -> 2:10:8.423 [7808423 ms]

In my application I have to perform convolution of a sequence with N points with
a sequence of M points where N ~ 1E6 points, M ~ 1K points.

Having a startup time of 1 hour for my application is too much, is there a way to
serialize on disk an instance of "vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference>"?

Also take a look at the following two tests:

Test1.

typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference> fwd_fft_type;
fwd_fft_type fwd_fft(1048576, 1.0);   <==  25 minutes

Test2.

vsip::Vector<T> inputs(1048576);
typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference, 1> fwd_fft_type;
fwd_fft_type fwd_fft(1048576, 1.0);  <== 10 ms (as expected)
fwd_fft(inputs);                     <== 91 ms (I expected here around 25 mins)

I think something wrong is going on.

Regards
Gaetano Mendola

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFImagy7UpzwH2SGd4RAiATAJ4qHhFC0A5Aw7IFpzJUwCvcJoP3igCguuCL
y1Nj6TOitedpsXMG3eFsijY=
=Vbwi
-----END PGP SIGNATURE-----