user-customized kernel code with CUDA 4.0

adamjmac · March 23, 2011, 11:55pm

I’m writing a particle physics engine as a DLL; the user writes a driver program that runs their simulation. I would like users to be able to easily write a custom interaction that calculates the force between two particles, for example:

__device__ float spring_func(float dr) { return -25 * dr; }

and then pass it to the DLL like this:

physics.addInteraction(particleset1, particleset2, spring_func);

physics.simulate();

Internally, my kernel would accept spring_func as a parameter and execute it on certain pairs of particles.
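One common way to let a kernel "accept a function as a parameter" is to make the kernel a template over a functor type, so the call is resolved and inlined at compile time. Below is a minimal sketch with hypothetical names (`SpringForce`, `pair_force`, `pair_forces`). The `#ifndef __CUDACC__` guard exists only so the host-side parts also build as plain C++. The catch for a prebuilt DLL is that every user functor type needs its own kernel instantiation, so the kernel template would have to be compiled by NVCC together with the user's code.

```cpp
// Lets the sketch also build as plain C++ (no CUDA) for testing.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical user functor equivalent to spring_func.
struct SpringForce {
    __host__ __device__ float operator()(float dr) const { return -25.0f * dr; }
};

// Force for one pair; F is any type with float operator()(float) const.
template <typename F>
__host__ __device__ float pair_force(const F& f, float dr) { return f(dr); }

#ifdef __CUDACC__
// One thread per pair; the functor is passed by value as a kernel argument.
template <typename F>
__global__ void pair_forces(F f, const float* dr, float* force, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) force[i] = pair_force(f, dr[i]);
}
// Launch: pair_forces<<<(n + 255) / 256, 256>>>(SpringForce(), d_dr, d_force, n);
#endif
```

The trade-off is that the force law is fixed when the kernel is compiled, which suits a header-only or user-compiled library better than a DLL loaded at run time.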

My question is, what would be the best way of going about this?

Currently, the user’s test program contains no device code: it loads the DLL at run time, so it can be compiled without NVCC. With custom device code, though, I suppose there’s no way around requiring the user to compile their simulation with NVCC (and the CUDA headers), correct?

I’ve read that device function pointers cannot be used from host code, but that you can keep a table of functions on the device and index into it later. I guess the user would have to add their custom function to a device array and pass its index to the DLL. Would this be the best approach for CUDA 3.2?
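A sketch of the table approach, assuming a Fermi-class GPU (device function pointers require compute capability 2.0 or higher) and hypothetical names (`force_fn`, `drag_func`, `interactions`, `SPRING`, `DRAG`, `pair_forces`). One caveat: CUDA 3.2 and 4.0 have no device-code linker (separate compilation arrived in CUDA 5.0), so the table, the user's functions, and the kernel that indexes the table must all be compiled in the same .cu file. In practice that still means the user's code goes through NVCC together with the kernel source.

```cpp
// Lets the sketch also build as plain C++ (no CUDA) for testing.
#ifndef __CUDACC__
#define __device__
#endif

// Signature shared by every interaction in the table.
typedef float (*force_fn)(float dr);

// User-supplied force laws (drag_func is a hypothetical second entry).
__device__ float spring_func(float dr) { return -25.0f * dr; }
__device__ float drag_func(float dr)   { return -0.5f * dr; }

// Table in device memory; the DLL only ever sees an integer index into it.
__device__ force_fn interactions[] = { spring_func, drag_func };

// Indices the user would pass to addInteraction (hypothetical).
enum { SPRING = 0, DRAG = 1 };

#ifdef __CUDACC__
// Each thread looks up the user's function by index and calls it.
__global__ void pair_forces(int id, const float* dr, float* force, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) force[i] = interactions[id](dr[i]);
}
#endif
```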

With virtual functions in CUDA 4.0, I would expect to be able to write a base functor class in the DLL with a virtual operator(), and let the user write a custom class that inherits from it and implements operator(), to be called by a kernel inside the DLL. I don’t have 4.0 yet, but can I expect this approach to work?
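A sketch of the virtual-functor design, assuming CUDA 4.0 on a compute capability 2.x device and hypothetical names (`Interaction`, `UserSpring`, `make_user_spring`, `pair_forces`). The CUDA C Programming Guide does not allow an object of a class with virtual functions to be passed as a `__global__` argument, and an object built on the host would carry a host vtable pointer anyway, so the sketch constructs the user's object on the device with `new` (device-side `new` also requires CUDA 4.0 and compute capability 2.x) and passes a pointer to it. As with the function table, it assumes both kernels are compiled together.

```cpp
// Lets the sketch also build as plain C++ (no CUDA) for testing.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Base class the DLL would ship in its header.
struct Interaction {
    __host__ __device__ virtual float operator()(float dr) const = 0;
    __host__ __device__ virtual ~Interaction() {}
};

// User-written force law.
struct UserSpring : public Interaction {
    __host__ __device__ float operator()(float dr) const { return -25.0f * dr; }
};

// What the DLL's kernel does per pair: a virtual call through the base class.
__host__ __device__ inline float pair_force(const Interaction& f, float dr) { return f(dr); }

#ifdef __CUDACC__
// Build the object on the device so its vtable pointer is a device address.
__global__ void make_user_spring(Interaction** out) { *out = new UserSpring(); }

__global__ void pair_forces(Interaction** f, const float* dr, float* force, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) force[i] = pair_force(**f, dr[i]);
}
#endif
```

The host would `cudaMalloc` room for one `Interaction*`, launch `make_user_spring<<<1, 1>>>` once, pass the same pointer to every `pair_forces` launch, and finally release the object with a small kernel that calls `delete`.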

Thanks in advance

adamjmac · March 26, 2011, 2:45pm

For anyone else who can’t customize kernel code at run time: I solved this by implementing a general, assembly-like interpreter. The user enters their equation as a list of instructions, and the kernel evaluates them one by one. The instructions are stored in constant memory, and in my case it is, surprisingly, exactly as fast as hard-coding the equation.
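To make the "list of instructions" concrete, here is one way `spring_func` (`-25 * dr`) could be encoded for an interpreter like the one below. The `Instruction` layout, the enum values, `MAX_FUNCSTEPS`, and `upload_program` are assumptions; the opcode and operand names are the ones the interpreter uses. Constant memory suits this data because all threads read the same instruction at the same time, which the constant cache serves as a broadcast.

```cpp
// Assumed instruction format: an opcode, an operand selector, and a literal.
enum { OP_SET, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_STO, OP_RET };
enum { OBJ_CUR, OBJ_VALUE, OBJ_DR, OBJ_DV, OBJ_RND, OBJ_REG0 };

struct Instruction {
    int   op;   // OP_* opcode
    int   obj;  // OBJ_* operand; OBJ_REG0 + k names register k
    float val;  // literal, used only when obj == OBJ_VALUE
};

const int MAX_FUNCSTEPS = 16;  // assumed capacity of the constant-memory program

// spring_func, -25 * dr, as an instruction list:
//   cur = dr; cur *= -25; return cur;
const Instruction spring_program[] = {
    { OP_SET, OBJ_DR,    0.0f   },
    { OP_MUL, OBJ_VALUE, -25.0f },
    { OP_RET, OBJ_CUR,   0.0f   },
};

#ifdef __CUDACC__
// Device copy that the interpreter reads.
__constant__ Instruction f[MAX_FUNCSTEPS];

// Host side: upload the user's program before launching the force kernel.
cudaError_t upload_program(const Instruction* prog, int count)
{
    if (count > MAX_FUNCSTEPS) return cudaErrorInvalidValue;
    return cudaMemcpyToSymbol(f, prog, count * sizeof(Instruction));
}
#endif
```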

This can (and will) be extended to support if statements and other math functions such as sqrt.

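// Assumed definitions (not shown in the original post; the numbering is illustrative).
enum { OP_SET, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_STO, OP_RET };
enum { OBJ_CUR, OBJ_VALUE, OBJ_DR, OBJ_DV, OBJ_RND, OBJ_REG0 };  // OBJ_REG0 + k selects reg[k]

#define MAX_FUNCSTEPS 32  // assumed program capacity

struct Instruction {
	int op;     // OP_* opcode
	int obj;    // OBJ_* operand selector
	float val;  // literal used when obj == OBJ_VALUE
};

// The user's instruction list, stored in constant memory.
__constant__ Instruction f[MAX_FUNCSTEPS];

// Evaluates the instruction list for one particle pair (function name is hypothetical).
__device__ float evaluate(float dr, float dv, float rnd)
{
	float reg[8];       // scratch registers written by OP_STO
	float cur = 0.0f;   // accumulator; initialized so a list not starting with OP_SET is well-defined
	float obj;

	for (int i = 0; i < MAX_FUNCSTEPS; ++i) {
		if (f[i].op == OP_RET) {
			return cur;
		} else if (f[i].op == OP_STO) {
			// Store the accumulator in register (obj - OBJ_REG0).
			reg[ f[i].obj - OBJ_REG0 ] = cur;
		} else {
			// Fetch the operand.
			switch (f[i].obj) {
				case OBJ_CUR:   obj = cur;      break;
				case OBJ_VALUE: obj = f[i].val; break;
				case OBJ_DR:    obj = dr;       break;
				case OBJ_DV:    obj = dv;       break;
				case OBJ_RND:   obj = rnd;      break;
				default:        obj = reg[ f[i].obj - OBJ_REG0 ]; break;
			}
			// Apply the operation to the accumulator.
			switch (f[i].op) {
				case OP_SET: cur = obj;  break;
				case OP_ADD: cur += obj; break;
				case OP_SUB: cur -= obj; break;
				case OP_MUL: cur *= obj; break;
				case OP_DIV: cur /= obj; break;
			}
		}
	}
	return cur;  // no OP_RET within MAX_FUNCSTEPS: return instead of falling off the end
}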
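The extensions mentioned above could slot into the same loop. A hedged sketch, assuming two hypothetical opcodes: `OP_SQRT`, a unary op that ignores its operand, and `OP_SKIPGE`, which skips the next instruction when `cur >= obj`. A skip-next conditional is just one simple way to get an if; a jump to an explicit instruction index would be the more general choice. `run` is written `__host__ __device__` so the same code can be checked on the CPU, and `sqrt_abs` encodes `sqrt(|dr|)` as an example.

```cpp
#include <math.h>

// Lets the sketch also build as plain C++ (no CUDA) for testing.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical extended opcode set; OP_SQRT and OP_SKIPGE are the new entries.
enum { OP_SET, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_STO, OP_RET, OP_SQRT, OP_SKIPGE };
enum { OBJ_CUR, OBJ_VALUE, OBJ_DR, OBJ_DV, OBJ_RND, OBJ_REG0 };

struct Instruction { int op; int obj; float val; };

const int NUM_REGS = 8;

__host__ __device__ float run(const Instruction* f, int steps, float dr, float dv, float rnd)
{
    float reg[NUM_REGS] = {0};
    float cur = 0.0f;
    for (int i = 0; i < steps; ++i) {
        const Instruction& in = f[i];
        if (in.op == OP_RET) return cur;
        if (in.op == OP_STO) { reg[in.obj - OBJ_REG0] = cur; continue; }
        float obj;
        switch (in.obj) {
            case OBJ_CUR:   obj = cur;    break;
            case OBJ_VALUE: obj = in.val; break;
            case OBJ_DR:    obj = dr;     break;
            case OBJ_DV:    obj = dv;     break;
            case OBJ_RND:   obj = rnd;    break;
            default:        obj = reg[in.obj - OBJ_REG0]; break;
        }
        switch (in.op) {
            case OP_SET:    cur = obj;  break;
            case OP_ADD:    cur += obj; break;
            case OP_SUB:    cur -= obj; break;
            case OP_MUL:    cur *= obj; break;
            case OP_DIV:    cur /= obj; break;
            case OP_SQRT:   cur = sqrtf(cur); break;     // unary: operand ignored
            case OP_SKIPGE: if (cur >= obj) ++i; break;  // "if": skip the next instruction
        }
    }
    return cur;
}

// sqrt(|dr|) using the conditional skip.
const Instruction sqrt_abs[] = {
    { OP_SET,    OBJ_DR,    0.0f  },
    { OP_SKIPGE, OBJ_VALUE, 0.0f  },  // dr >= 0: skip the negation
    { OP_MUL,    OBJ_VALUE, -1.0f },
    { OP_SQRT,   OBJ_CUR,   0.0f  },
    { OP_RET,    OBJ_CUR,   0.0f  },
};
```

Because `OP_SKIPGE` depends on per-thread data, threads in a warp can take different paths at that instruction. The result is still correct, but a divergent warp executes both paths one after the other.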