Vector product in CUDA to emulate ippMul_32fc

Hi all,

I’m doing some work with CUDA and need to rewrite a routine that was implemented with an IPP function, “ippMul_32fc”. For that purpose I wrote this kernel:


__global__ void
cudaMul_32fc(const float2* aIn1,
             const float2* aIn2,
             float2* aOut,
             const unsigned int aSize) {
    // One thread per complex element.
    const unsigned int myPos = blockIdx.x * blockDim.x + threadIdx.x;

    if (myPos < aSize) {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        aOut[myPos].x = aIn1[myPos].x * aIn2[myPos].x - aIn1[myPos].y * aIn2[myPos].y;
        aOut[myPos].y = aIn1[myPos].x * aIn2[myPos].y + aIn1[myPos].y * aIn2[myPos].x;
    }
}

Is that the best way to implement it?
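
For comparison, here is a sketch of one possible variant: each thread reads both operands once as whole float2 values into registers and writes the result once. The names cudaMul_32fc_regs and mulOnDevice are made up for illustration, the block size of 256 is an arbitrary assumption, and error checking is left out. Whether this is actually faster than the kernel above would need to be measured.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel from the post: load each operand once
// as a whole float2, compute in registers, store the result once.
__global__ void cudaMul_32fc_regs(const float2* aIn1,
                                  const float2* aIn2,
                                  float2* aOut,
                                  const unsigned int aSize)
{
    const unsigned int myPos = blockIdx.x * blockDim.x + threadIdx.x;

    if (myPos < aSize) {
        const float2 a = aIn1[myPos];   // one 8-byte read per operand
        const float2 b = aIn2[myPos];

        // (a.x + i a.y) * (b.x + i b.y)
        float2 r;
        r.x = a.x * b.x - a.y * b.y;
        r.y = a.x * b.y + a.y * b.x;

        aOut[myPos] = r;                // one 8-byte write
    }
}

// Hypothetical host wrapper (not part of IPP or CUDA): copies the inputs to
// the device, launches the kernel, and copies the product back.
// Error checking of the CUDA calls is omitted for brevity.
std::vector<float2> mulOnDevice(const std::vector<float2>& a,
                                const std::vector<float2>& b)
{
    assert(a.size() == b.size());
    const unsigned int n = static_cast<unsigned int>(a.size());
    if (n == 0) {
        return std::vector<float2>();
    }

    const size_t bytes = n * sizeof(float2);
    float2* dA = nullptr;
    float2* dB = nullptr;
    float2* dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);

    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Assumed launch configuration: 256 threads per block, rounded-up grid.
    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    cudaMul_32fc_regs<<<blocks, threads>>>(dA, dB, dOut, n);

    std::vector<float2> out(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

Calling mulOnDevice with a single pair (1 + 2i) and (3 + 4i) should give -5 + 10i. Because every thread reads its own element completely before writing, this variant also gives the right result when aOut is the same buffer as one of the inputs, which the two-statement version does not guarantee.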


Maybe someone missed this because of the summer holidays :D