Best Method for Integrating CUDA Code with C++ Code?

I’ve only recently started using CUDA, and I’m trying to figure out the best way to organise my code with a view to this becoming a large project with much C++ and CUDA interaction.

My first attempt involves the following files:

//file.cu

__global__ void calculate(int* args)
{
}

extern "C"
void calculateKernel(int* args, int blocksPerGrid, int threadsPerBlock)
{
	calculate<<<blocksPerGrid, threadsPerBlock>>>(args);
}

//file.cuh

extern "C"
void calculateKernel(int* args, int blocksPerGrid, int threadsPerBlock);

//file.h

#include "file.cuh"

void function()
{
	int *args = nullptr;  // placeholder: must point to device memory (e.g. from cudaMalloc) in real code
	calculateKernel(args, 256, 1000);
}

This all works perfectly, but it seems like a lot of files and a lot of redundant information. Ideally I would like to set blocksPerGrid and threadsPerBlock in my C++ code, since they depend on the data I will be passing in. However, writing an extra wrapper function for every kernel, taking exactly the same parameters plus those two, seems like quite a bit of overhead. Adding a parameter to a kernel also means changing the kernel, the wrapper’s definition and declaration, and every call site.

So my question is: are there any techniques for avoiding all this extra code, given that I will be creating a lot of kernels that need to be called from C++?
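For context, the data-dependent part of the launch configuration is usually just a ceiling division of the element count by the chosen block size. That is plain integer arithmetic, so it can live entirely on the C++ side. Below is a minimal sketch that assumes one thread per element. `LaunchConfig` and `launchConfigFor` are hypothetical names, not part of the CUDA API.

```cpp
#include <cstddef>

// Hypothetical host-side launch configuration: plain C++, no CUDA headers needed.
struct LaunchConfig {
    int blocksPerGrid;
    int threadsPerBlock;
};

// One thread per element: blocksPerGrid = ceil(n / threadsPerBlock), computed
// with integer arithmetic. Assumes n > 0 and 0 < threadsPerBlock <= 1024
// (the per-block limit on devices of compute capability 2.0 and later).
LaunchConfig launchConfigFor(std::size_t n, int threadsPerBlock = 256)
{
    const std::size_t tpb = static_cast<std::size_t>(threadsPerBlock);
    const int blocks = static_cast<int>((n + tpb - 1) / tpb);
    return LaunchConfig{blocks, threadsPerBlock};
}
```

The two fields can then be passed to a wrapper such as calculateKernel. Because the last block may be only partly used, the kernel itself should compare its global thread index against n before touching memory.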

What you describe is exactly what I do, and I haven’t found anything better yet. One way to avoid having to change the code in too many places would be to wrap up the arguments to calculateKernel in a struct. Then, when you add a parameter, you at least don’t have to change the function declaration and definition lines. With default arguments in the struct’s constructor you could even avoid having to change all existing calls to that kernel elsewhere in the code (assuming the default value for the new argument is appropriate).
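Here is a minimal sketch of the struct idea. It assumes the struct is defined in file.cuh, so that both the C++ code and file.cu see it. `CalculateArgs`, its fields, and the later-added `scale` parameter are hypothetical. The wrapper is declared without `extern "C"`, on the assumption that both sides are compiled as C++ (nvcc compiles the host code in .cu files as C++). Only the host-visible part is compiled here, and the nvcc-only definition is shown as a comment.

```cpp
// file.cuh (sketch): shared by the C++ code and file.cu.

// Hypothetical argument bundle for the `calculate` kernel. The constructor's
// default argument means a newly added parameter does not break existing calls.
struct CalculateArgs
{
    CalculateArgs(int* d, int n, float s = 1.0f)
        : data(d), count(n), scale(s) {}

    int*  data;   // must point to device memory when the kernel runs
    int   count;  // number of elements in data
    float scale;  // parameter added later; the default keeps old callers compiling
};

// Declared once; adding a field to CalculateArgs does not change this line.
void calculateKernel(const CalculateArgs& args, int blocksPerGrid, int threadsPerBlock);

// file.cu would then contain something like (requires nvcc):
//   __global__ void calculate(CalculateArgs args) { /* ... */ }
//
//   void calculateKernel(const CalculateArgs& args, int blocksPerGrid, int threadsPerBlock)
//   {
//       calculate<<<blocksPerGrid, threadsPerBlock>>>(args);  // struct copied by value
//   }
```

An existing call such as `calculateKernel(CalculateArgs(args, n), blocks, threads)` keeps compiling after `scale` is added. Only callers that want a non-default value need to change.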

With the recent improvements in C++ support in nvcc, something even nicer might be possible. I wonder whether a kernel can be launched from a class member function. I haven’t tried it, but supposing it can, two ideas spring to mind.

// SomeClass.h

class SomeClass
{
public:
	void method1();
	void runKernel();

private:
	int blocksPerGrid;
	int threadsPerBlock;
	int *args;
};
// file.cc

#include "SomeClass.h"

void SomeClass::method1()
...
// file.cu

#include "SomeClass.h"

__global__ void calculate(int* args)
{
}

void SomeClass::runKernel()
{
	calculate<<<blocksPerGrid, threadsPerBlock>>>(args);
}

Again, I have no idea whether that would compile with nvcc, but it would be cool if it did :) Of course, in large apps SomeClass is likely to contain strings, maps, Boost shared pointers, etc., and nvcc has historically had notoriously bad support for compiling even the header files of those libraries. So as nice as it is, I probably wouldn’t use this method in production code until several consecutive versions of nvcc have been shown to reliably compile all of Boost and the STL.

The second, related idea I had was to wrap the kernel call in a bare-bones functor-like class declared in file.cuh and defined in file.cu. Thinking more about it, I don’t see how it is any different from the simple C-style function call; it just has more syntax and even more code to manage when you add arguments. Throw that idea away.