OpenCL code from an existing CUDA kernel: pointwise multiplication

Hi all,

I consider myself a newbie at GPU programming, so…

I developed some CUDA code for audio convolution.

It’s quite straightforward: you take the FFT of both inputs, multiply them pointwise, and then take the inverse FFT.
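Written out, those three steps are the convolution theorem. As a sketch, with $x$ the audio signal, $h$ the impulse response, $\odot$ the pointwise product and $N$ the transform length (these symbols are assumptions for illustration, not names from the code):

```latex
y = \mathcal{F}^{-1}\bigl(\mathcal{F}(x) \odot \mathcal{F}(h)\bigr),
\qquad
Y[k] = X[k]\,H[k], \quad k = 0, \dots, N-1
```

On its own this gives a circular convolution; for a linear one, both inputs are zero-padded to at least $\mathrm{len}(x) + \mathrm{len}(h) - 1$ samples. cuFFT transforms are unnormalized, so a forward plus inverse pass scales the result by $N$ unless that factor is divided out somewhere.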

Now I need to port it to OpenCL.

I’m mostly done except for the pointwise multiplication. I have to write the kernel myself, and I’m trying to “emulate” the CUDA implementation, but…

It’s kind of complicated, because writing kernels in OpenCL seems far more complex than in CUDA.

Can someone give me some advice?

Here is the original code.

////////////////////////////////////////////////////////////////////////////////
// Complex operations
////////////////////////////////////////////////////////////////////////////////

// Complex multiplication
static __device__ __host__ inline cufftComplex ComplexMul(cufftComplex a, cufftComplex b)
{
	cufftComplex c;
	c.x = a.x * b.x - a.y * b.y;
	c.y = a.x * b.y + a.y * b.x;
	return c;
}

// Complex pointwise multiplication
// Based on ComplexPointwiseMulAndScale but without scaling... It creates more problems than it solves...
static __global__ void ComplexPointwiseMul(cufftComplex* a, const cufftComplex* b, int size)
{
	const int numThreads = blockDim.x * gridDim.x;
	const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
	for (int i = threadID; i < size; i += numThreads)
		a[i] = ComplexMul(a[i], b[i]);
}

Best regards.

That’s quite a subjective view ;-) It all depends on what you’re used to.

This should basically do it:

typedef float2 Complex;

Complex ComplexMul(Complex a, Complex b)
{
        Complex c;
        c.x = a.x * b.x - a.y * b.y;
        c.y = a.x * b.y + a.y * b.x;
        return c;
}

__kernel void ComplexPointwiseMul(__global Complex* a, __global const Complex* b, int size)
{
        const int numThreads = get_local_size(0) * get_num_groups(0);
        for (int i = get_global_id(0); i < size; i += numThreads)
                a[i] = ComplexMul(a[i], b[i]);
}
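To check the ported kernel against known values, it can help to run the same loop on the CPU. The following is only a sketch in plain C: the `Complex` struct stands in for `float2`, and `complex_mul` and `pointwise_mul_emulated` are hypothetical names, not part of OpenCL or of the code above. It runs the kernel body once per emulated work-item, which also shows that the strided loop touches every element exactly once, whatever the global size is.

```c
#include <assert.h>

/* Hypothetical stand-in for OpenCL's float2 / CUDA's cufftComplex. */
typedef struct { float x, y; } Complex;

/* Same arithmetic as ComplexMul in the kernels: (a.x + i a.y) * (b.x + i b.y). */
Complex complex_mul(Complex a, Complex b)
{
    Complex c;
    c.x = a.x * b.x - a.y * b.y;
    c.y = a.x * b.y + a.y * b.x;
    return c;
}

/* Sequential emulation of the kernel: gid plays the role of get_global_id(0),
   global_size the role of get_local_size(0) * get_num_groups(0). */
void pointwise_mul_emulated(Complex *a, const Complex *b, int size, int global_size)
{
    assert(global_size > 0);
    for (int gid = 0; gid < global_size; ++gid)        /* one pass per work-item */
        for (int i = gid; i < size; i += global_size)  /* strided loop from the kernel */
            a[i] = complex_mul(a[i], b[i]);
}
```

Calling `pointwise_mul_emulated(a, b, size, 2)` on a small array and comparing against the buffer read back from the device is one way to validate the OpenCL version; for example, (1 + 2i)(3 + 4i) should come out as -5 + 10i.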

There are a lot of nice tutorials and tools for converting CUDA to OpenCL. Take a look at, e.g., this presentation or this program.

Thank you for your answer.
I get the point about “habits”, but I think that as long as nothing like NVCC exists for OpenCL, it will be at least a little more difficult for us :)
Anyway, now I’m facing new errors such as:
Undefined symbols for architecture i386:
  "_oclLoadProgSource", referenced from:
      _OCLconv in OCLconv.cpp.o
  "_oclGetFirstDev", referenced from:
      _OCLconv in OCLconv.cpp.o
  "_oclLogBuildInfo", referenced from:
      _OCLconv in OCLconv.cpp.o
  "_oclLogPtx", referenced from:
      _OCLconv in OCLconv.cpp.o
ld: symbol(s) not found for architecture i386

I linked my code with -framework OpenCL since I’m a Mac user. Am I doing something wrong? I’m including oclUtils.h.

EDIT: Solved, I swapped two cpp files, forgive me…

In fact, the external compiler in the form of NVCC and the custom build rules you have to set up for it are exactly the points I dislike most about CUDA. In OpenCL, the compiler is part of the driver, and you don’t need any custom build rules.

Anyway, speaking of “something like NVCC” for OpenCL, check out this topic about clcc.