C++ support in CUDA 2.2

So one can't have classes inside kernel code?

However, the following code works well on CUDA 2.2. So if I write kernels like this, does that mean I am in unsupported land?

#include <stdio.h>

class base
{
protected:
    int data[100]; /* __device__ qualification FAILS for data members of a class */

public:
    __device__ __host__ int store(int i, int d)
    {
        if (i < 100)
        {
            data[i] = d;
        }
        return 0;
    }
};

/*
 * __device__ and __host__ qualifiers work only for FUNCTIONS.
 *
 * Using them in front of the "class" specifier does NOT work.
 *
 * To use a C++ object in a kernel, one needs to put __device__ in front of
 * every function in that class. The data members must NOT carry any
 * qualifier; the compiler errors out otherwise.
 * The data members reside in the same space where the object is declared.
 */

class sample : public base
{
public:
    __device__ sample()
    {
        for (int i = 0; i < 100; i++)
            data[i] = i;
    }

    __device__ __host__ int fetch(int i)
    {
        return data[i];
    }
};

__global__ void mykernel(float *result)
{
    __shared__ sample d; /* the inherited data[100] member occupies SHARED memory here */
    int sum = 0;
    for (int i = 0; i < 100; i++)
        sum += d.fetch(i);
    *result = (float)sum;
    d.store(0, sum);
}

int main(void)
{
    float *result;
    float data;
    cudaMalloc((void **)&result, sizeof(float));
    mykernel<<<1, 1>>>(result);
    cudaThreadSynchronize();
    cudaMemcpy(&data, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", data);
    return 0;
}

yep, totally unsupported

the programming guide isn't entirely correct; templates are officially supported for device code
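e.g. a trivial device-side template (a sketch, names made up) builds fine:

template <typename T>
__device__ T clamp_to(T v, T lo, T hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

template <typename T>
__global__ void clamp_kernel(T *data, T lo, T hi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = clamp_to(data[i], lo, hi);
}

/* T is deduced from the launch arguments:
 *   clamp_kernel<<<grid, block>>>(d_ptr, 0.0f, 1.0f, n);
 */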

Thanks for the quick clarification…

We were just wondering if we could use that strategy for porting some software.

Your answer clears it all up. We will go the C way. Thanks!

In practice, simple classes with public/private data and methods work fine.

It's pointers that are the sticking point; that rules out pretty much any C++ library, virtual inheritance, etc.

But I’ve found that simple classes are enough for a lot of nice techniques… especially when used as functors with templates. I even use template metaprogramming a lot in CUDA for a PRNG.
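For instance, the functor pattern looks something like this (a toy sketch, names made up):

struct Scale
{
    float factor; /* plain data member, no qualifier needed */

    __device__ float operator()(float x) const { return factor * x; }
};

template <typename Op>
__global__ void apply(float *data, int n, Op op)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = op(data[i]); /* the functor gets inlined into the kernel */
}

/* host side:
 *   Scale s; s.factor = 2.0f;
 *   apply<<<blocks, threads>>>(d_data, n, s);
 */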

There's a small gotcha where device emulation mode doesn't like private data or methods. That's likely a bug; I haven't tested it since CUDA 2.0. The workaround is to just do something simple like:

#ifndef __DEVICE_EMULATION__
private:
#endif
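In context it looks something like this (a toy example):

class counter
{
#ifndef __DEVICE_EMULATION__
private:              /* hidden in the real build... */
#endif
    int count;        /* ...but left public under emulation to dodge the bug */

public:
    __device__ void reset() { count = 0; }
    __device__ void bump()  { ++count; }
    __device__ int  value() { return count; }
};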

Study all of Mark Harris's code, like the scan examples and CUDPP. He's really good at using templates to generalize and parameterize kernels. I learned a lot from him (and it's not even GPU-specific).
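To give the flavor of it (a toy sketch under my own names, not his actual code): the block size goes in as a compile-time constant, so the compiler can fully unroll the reduction loop.

template <unsigned int BLOCK>
__global__ void reduce(const float *in, float *out)
{
    __shared__ float s[BLOCK];
    unsigned int tid = threadIdx.x;

    s[tid] = in[blockIdx.x * BLOCK + tid];
    __syncthreads();

    /* BLOCK is known at compile time, so this loop unrolls completely */
    for (unsigned int stride = BLOCK / 2; stride > 0; stride >>= 1)
    {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = s[0];
}

/* the host picks the instantiation at runtime:
 *   switch (threads) { case 256: reduce<256><<<grid, 256>>>(in, out); break; ... }
 */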

And Tim, you’re up way too late again.

SPWorley,

Thanks! Yeah, it looks like small things work… but you never know when they will break, since the official stance is "unsupported". :-(

this is fixed with the 2.3 release

@sarnath, that class fails with the int data[100] because you're then effectively asking for dynamic device memory allocation, and that just ain't gonna happen. This is true even if it were going to be local memory.

This is where memory becomes a gotcha: when the fundamentals become invalid, it affects everything.

Maybe a workaround is to statically allocate the local memory yourself and then pass it to your class in the constructor. That's ugly too, but it's likely to work.
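Something along these lines (a rough sketch, made-up names):

class wrapper
{
    int *data; /* points at storage the kernel allocated; not owned by the class */

public:
    __device__ wrapper(int *buf) : data(buf)
    {
        for (int i = 0; i < 100; i++)
            data[i] = i;
    }

    __device__ int fetch(int i) { return data[i]; }
};

__global__ void mykernel2(float *result)
{
    __shared__ int buf[100]; /* the kernel owns the allocation */
    wrapper w(buf);          /* the class just wraps the pointer */

    int sum = 0;
    for (int i = 0; i < 100; i++)
        sum += w.fetch(i);
    *result = (float)sum;
}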

@SPW,

The code works perfectly on CUDA 2.2. I don't understand what dynamic memory you are talking about.

While compiling, the compiler correctly reports 400+ bytes of shared memory per block…

So things work… but since the official stance is "unsupported", I won't code anything based on that.