[vsipl++] SIMD threshold with loop fusion
Jules Bergmann
jules at codesourcery.com
Wed May 16 03:43:25 UTC 2007
>> [1] This looks good. However, do you think faux-SIMD should have the
>> same "API" as the real SIMD functions below?
>>
>> For example, AltiVec vgt returns 0xFFFFFFFF or 0x00000000 for each
>> position. That can be used as a mask. (What does SSE do?)
> SSE is the same thing because there is a website that has a
> cross-reference for altivec and sse instructions.
>>
>> Since faux SIMD returns 1 or 0, it can't be used as mask. A generic
>> routine that uses vgt may not work with faux-simd if it expects
>> vgt/vlt to return a value valid for a mask.
> Why not? I use normal bit operations on the return values. If I and '1'
> with another value, I get the value, right?
It depends on whether the 'and' is bitwise or logical.
For example, if you do something like
  mask = simd::vgt(a, b);
  result = simd::band(mask, a);
For AltiVec and SSE, this does the right thing because mask[i] is
0xffffffff when a[i] > b[i].
For faux-simd, mask[0] is 0x00000001 when a[0] > b[0]. That will pull
just the lowest-order bit out of a[0], not the entire value.
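To make the difference concrete, here is a minimal scalar sketch. The names faux_vgt, mask_vgt and band are hypothetical stand-ins written for this illustration, not the actual simd:: functions, and it assumes 32-bit unsigned elements.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the library's API.

// Faux-SIMD style comparison as described above: yields 1 or 0.
constexpr std::uint32_t faux_vgt(std::uint32_t a, std::uint32_t b)
{
  return a > b ? 1u : 0u;
}

// AltiVec/SSE style comparison: yields all ones or all zeros.
constexpr std::uint32_t mask_vgt(std::uint32_t a, std::uint32_t b)
{
  return a > b ? 0xFFFFFFFFu : 0x00000000u;
}

// Bitwise 'and' of a mask with a value.
constexpr std::uint32_t band(std::uint32_t mask, std::uint32_t value)
{
  return mask & value;
}

// An all-ones mask passes the whole value through.
static_assert(band(mask_vgt(0x12345678u, 0u), 0x12345678u) == 0x12345678u,
              "full mask keeps the entire value");

// A 0/1 result keeps only the lowest-order bit, which is 0 here.
static_assert(band(faux_vgt(0x12345678u, 0u), 0x12345678u) == 0x00000000u,
              "0/1 result keeps only bit 0");
```

Both comparisons agree that a > b, but only the all-ones form is usable as a mask with a bitwise 'and'.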
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705