Working with CUDA and class methods

I am trying to figure out how to write neat CUDA C++ code, but I am finding it hard. The biggest problem is that I don’t know how to integrate CUDA into a class. What I usually do is write a __global__ kernel and then launch it from inside a class method.

This is cumbersome, however, because I have to pass all the class parameters to the __global__ kernel as arguments.

When I found out it was possible to pass a class object as an argument to a kernel, I was delighted. I was also delighted to find out that I could have __device__ methods as part of the class.

However, I still need a large number of __global__ kernels just to trigger these __device__ class methods. When I found out about dynamic parallelism I was elated, but I quickly found out that dynamic parallelism only works with __global__ kernels, which cannot be class methods. Or at least I haven’t found a way of doing this yet.

Is there something I am missing? How does one neatly integrate CUDA into C++ class methods?

Kind regards

The best way I have found so far is:

main.cpp:

#include <stdio.h>
#include "Hello.cuh"


int main() {
	Hello hello();
	hello.pr();
	return 0;
}

Hello.cuh:

#pragma once
#include "device_launch_parameters.h"
#include <string>

using namespace std;

struct Hello_p
{
	int i;
	__device__ void d_pr()
	{
		printf("Yolo %d",i);
	}
};

class Hello
{
public:
	Hello_p parameters;
	Hello()
	{
		parameters.i = 10;
	}
	void pr();
};

Hello.cu:

#include "Hello.cuh"

__global__ void g_pr(Hello_p h) {
   h.d_pr();
}

void Hello::pr()
{
	g_pr<<<1,2>>>(parameters);
	cudaDeviceSynchronize();
}

Can this be done better, so that I don’t need a separate struct that contains my class parameters and a separate __global__ function for each method? Is this the best approach to combining CUDA and C++?

If you want class methods that are callable from host code and do something meaningful on the device, they will need to launch a __global__ function somewhere. (The only exception I can think of would be persistent thread kernels, and let’s not go there.) And parameters that are set up in host code will need to be passed to the device, in one fashion or another.
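One way to keep the number of __global__ functions down is a single generic kernel that runs whatever callable it is handed, with each host method passing a device lambda that calls the matching __device__ method. What follows is only a sketch under assumptions, not something taken from the code in this thread: it relies on nvcc's extended lambda support (the `--extended-lambda` flag), the host methods must stay public for extended lambdas to be allowed in them, and `run_on_device`, `finish` and `d_pr_double` are hypothetical names made up for illustration.

```cuda
// Build (assumption): nvcc --extended-lambda hello_lambda.cu
#include <cstdio>
#include <cuda_runtime.h>

// The only __global__ function: it calls the callable it receives by value.
template <typename F>
__global__ void run_on_device(F f) { f(); }

// Hypothetical helper: report a launch error first, then any execution error.
static cudaError_t finish() {
    cudaError_t err = cudaGetLastError();
    return err != cudaSuccess ? err : cudaDeviceSynchronize();
}

class Hello {
public:
    int i;
    Hello() : i(10) {}

    __device__ void d_pr() const { printf("Yolo %d\n", i); }
    // Hypothetical second method, to show that no extra kernel is needed.
    __device__ void d_pr_double() const { printf("Yolo %d\n", 2 * i); }

    cudaError_t pr() const {
        Hello self = *this;  // copy of the object, captured by value
        run_on_device<<<1, 2>>>([self] __device__ () { self.d_pr(); });
        return finish();
    }

    cudaError_t pr_double() const {
        Hello self = *this;
        run_on_device<<<1, 2>>>([self] __device__ () { self.d_pr_double(); });
        return finish();
    }
};

int main() {
    Hello hello;
    cudaError_t err = hello.pr();
    if (err == cudaSuccess) err = hello.pr_double();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

There is still a __global__ function involved, but only one, and the object is still copied to the device by value on every launch.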

I don’t think you need a separate struct that contains your class parameters, but I may simply be misinterpreting your question.
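As a rough sketch of what I mean, the class itself can be passed to the kernel by value, so the member `i` and the __device__ method live directly in `Hello` and the `Hello_p` struct goes away. This assumes the class stays trivially copyable (no virtual functions, no host pointers that the device would have to dereference). The error-returning `pr()` and the single-file layout are my own choices for illustration, not something from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

class Hello {
public:
    int i;
    Hello() : i(10) {}

    // Runs on the device, reading the copy of the object held by the kernel.
    __device__ void d_pr() const { printf("Yolo %d\n", i); }

    // Host-side entry point; defined after the kernel below.
    cudaError_t pr() const;
};

// The kernel receives the whole object by value; no parameter struct needed.
__global__ void g_pr(Hello h) {
    h.d_pr();
}

cudaError_t Hello::pr() const {
    g_pr<<<1, 2>>>(*this);                 // copies *this to the device
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // waits and catches execution errors
}

int main() {
    Hello hello;  // note: no parentheses
    cudaError_t err = hello.pr();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With one block of two threads, each thread calls d_pr() on its own copy of the object.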

At the moment, I’m having trouble getting your code to compile. Not sure if you intended it to be compilable. Maybe I am doing something stupid. I suspect you are on Windows. I am on Linux. If I change this:

Hello hello();
hello.pr();

to this:

Hello hello;
hello.pr();

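it compiles and runs fine. With the empty parentheses, Hello hello(); is parsed as the declaration of a function named hello that returns a Hello, not as the definition of an object, so hello.pr() has no object to call pr() on.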