Hi all,
I’m doing some work with CUDA and found myself having to rewrite a routine implemented
in terms of an IPP function, “ippMul_32fc”. For that purpose I wrote this kernel:
[codebox]
__global__ void
cudaMul_32fc(const float2* aIn1,
             const float2* aIn2,
             float2* aOut,
             const unsigned int aSize) {
    // One thread per element; the bounds check handles a partial last block.
    const unsigned int myPos = blockIdx.x * blockDim.x + threadIdx.x;
    if (myPos < aSize) {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        aOut[myPos].x = aIn1[myPos].x * aIn2[myPos].x - aIn1[myPos].y * aIn2[myPos].y;
        aOut[myPos].y = aIn1[myPos].x * aIn2[myPos].y + aIn1[myPos].y * aIn2[myPos].x;
    }
}
[/codebox]
Is this the best way to implement it?
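For comparison, a possible variant, offered only as a sketch (the names `cudaMul_32fc_v2` and `mul32fcOnDevice` are hypothetical, and it has not been checked against IPP's output). It reads each `float2` operand once into registers and uses `cuCmulf` from `cuComplex.h`, where `cuFloatComplex` is a typedef of `float2`. Reading both operands before writing also keeps the result correct if `aOut` aliases an input (in-place use). The kernel above does not guarantee that, because it writes `aOut[myPos].x` and then reads `aIn1[myPos].x` again. A grid-stride loop lets any grid size cover all elements.

```cuda
#include <cuComplex.h>     // cuFloatComplex (typedef of float2), cuCmulf
#include <cuda_runtime.h>

// Element-wise complex product: aOut[i] = aIn1[i] * aIn2[i].
// Each operand is loaded once (one 8-byte load per float2) and the
// result is written with a single 8-byte store.
__global__ void
cudaMul_32fc_v2(const cuFloatComplex* aIn1,
                const cuFloatComplex* aIn2,
                cuFloatComplex* aOut,
                const unsigned int aSize) {
    // Grid-stride loop: works for any grid size.
    for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < aSize;
         i += blockDim.x * gridDim.x) {
        const cuFloatComplex a = aIn1[i];   // read both inputs first,
        const cuFloatComplex b = aIn2[i];   // so in-place calls stay correct
        aOut[i] = cuCmulf(a, b);            // (a.x*b.x - a.y*b.y, a.x*b.y + a.y*b.x)
    }
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, copies the result back. Returns false on any CUDA error.
bool mul32fcOnDevice(const cuFloatComplex* hIn1, const cuFloatComplex* hIn2,
                     cuFloatComplex* hOut, unsigned int n) {
    if (n == 0) return true;                // a zero-sized grid is not a valid launch
    const size_t bytes = n * sizeof(cuFloatComplex);
    cuFloatComplex *dIn1 = nullptr, *dIn2 = nullptr, *dOut = nullptr;
    bool ok = cudaMalloc(&dIn1, bytes) == cudaSuccess &&
              cudaMalloc(&dIn2, bytes) == cudaSuccess &&
              cudaMalloc(&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dIn1, hIn1, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dIn2, hIn2, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const unsigned int block = 256;
        const unsigned int grid = (n + block - 1) / block;
        cudaMul_32fc_v2<<<grid, block>>>(dIn1, dIn2, dOut, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn1);                         // cudaFree(nullptr) is a no-op
    cudaFree(dIn2);
    cudaFree(dOut);
    return ok;
}
```

An element-wise product like this moves 24 bytes per element for 6 floating-point operations, so it is most likely memory-bound. What matters is single, coalesced 8-byte loads and stores rather than the arithmetic. If the data already lives on the device, the transfers inside a wrapper like `mul32fcOnDevice` would dominate, so in practice the kernel would be launched directly on device buffers.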
K.