[vsipl++] Fast convolution (slow!)
Gaetano Mendola
gmendola at mbigroup.it
Wed Aug 6 13:33:38 UTC 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jules Bergmann wrote:
> Gaetano Mendola wrote:
>> Hi all,
>> I'm evaluating VSIPL++ for an application heavily based on convolutions.
>
> Gaetano,
>
> Thank you for evaluating VSIPL++! We'll try to answer your questions
> as best we can to make sure your evaluation is a success.
>
> VSIPL++ separates the creation of an FFT object (the 'fwd_fft_type
> fwd_fft') from the invocation (the 'fwd_fft(inputs, outputs') to allow
> potentially expensive setup computations to occur "outside" the
> computation inner loop. For example, a simple FFT backend might
> pre-compute twiddle factors, while a more elaborate backend might
> determine which FFT choices (radix, DIF vs DIT, etc) is best suited
> for the architecture. The intent is that setup computations will be
> amortized over a large number of FFT invocations.
>
> That is occurring here. The FFT backend use by VSIPL++ (persumably
> FFTW3) is spending considerable time planning the FFTs (20 sec per
> FFT) to give an efficient execution.
>
> If you really are performing the convolution only once
> (i.e. preventing the setup time from being amortized), you might
> consider setting the "number of times" template parameter to 1. This
> will tell the FFT BE to spend less effort planning. This will slow
> the FFT itself, but the overall time should be smaller. The default
> number of times is 0 corresponds to "infinite", which results in a
> large planning effort.
>
> typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd,
> vsip::by_reference,
> 1 // Number of times
> > fwd_fft_type;
>
> You might also consider using another FFT backend library besides
> FFTW3. IPP may spend less time planning, without sacrificing
> performance. However, IPP may have poorer performance on
> non-power-of-2 FFTs.
>
> Also, to be clear, what FFT library are you using? Is it the default
> provided by VSIPL++ (FFTW3), or another one?
I do believe I'm using the FFTW3, I'm linking with:
- -lfftw3 -lfftw3f -lfftw3l
My application will do lot of convolution, so the best would be let the
vsip::Fft::CTOR make a full plan, but this is not feasible with the large N I
have to use. I have done some test with power of 2 and this is what I get:
N: 2 -> 001 ms
N: 4 -> 000 ms
N: 8 -> 000 ms
N: 16 -> 000 ms
N: 32 -> 013 ms
N: 64 -> 035 ms
N: 128 -> 083 ms
N: 256 -> 140 ms
N: 512 -> 324 ms
N: 1024 -> 678 ms
N: 2048 -> 1.477 sec [1477 ms]
N: 4096 -> 3.529 sec [3529 ms]
N: 8192 -> 8.569 sec [8569 ms]
N: 16384 -> 19.157 sec [19157 ms]
N: 32768 -> 23.527 sec [23527 ms]
N: 65536 -> 53.429 sec [53429 ms]
N: 131072 -> 1:34.803 [94803 ms]
N: 262144 -> 3:34.710 [214710 ms]
N: 524288 -> 9:30.746 [570746 ms]
N: 1048576 -> 25:24.579 [1524579 ms]
N: 2097152 -> 56:42.435 [3402435 ms]
N: 4194304 -> 2:10:8.423 [7808423 ms]
In my application I have to perform convolution of a sequence with N points with
a sequence of M points where N ~ 1E6 points, M ~ 1K points.
Having a startup time of 1 hour for my application is too much, is there a way to
serialize on disk an instance of "vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference>"?
Also take a look at the following two tests:
Test1.
typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference> fwd_fft_type;
fwd_fft_type fwd_fft(1048576, 1.0); <== 25 minutes
Test2.
vsip::Vector<T> inputs(1048576);
typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference, 1> fwd_fft_type;
fwd_fft_type fwd_fft(1048576, 1.0); <== 10 ms (as expected)
fwd_fft(inputs); <== 91 ms (I expected here around 25 mins)
I think something wrong is going on.
Regards
Gaetano Mendola
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFImagy7UpzwH2SGd4RAiATAJ4qHhFC0A5Aw7IFpzJUwCvcJoP3igCguuCL
y1Nj6TOitedpsXMG3eFsijY=
=Vbwi
-----END PGP SIGNATURE-----
More information about the vsipl++
mailing list