How to pass a pointer to class pointers into a CUDA kernel

Hi All,

I am exploring task parallelism with CUDA. I want to run 32 independent tasks in parallel on the GPU, with each task mapped to one thread block. Each task runs up to 20 iterations and has its own memory space.

In my program, a memory (work) space is a class containing several pointers to int arrays. The kernel therefore needs access to 32 class pointers, each pointing to the workspace of one task.

I understand that the host needs to pass 32 class pointers into the CUDA kernel. But when I try to pass an array of 32 class pointers (a pointer to pointers) into the kernel, I run into many problems.
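Since the question is specifically about a pointer to pointers, here is a minimal sketch of how that layout can be made to work, assuming a hypothetical `Workspace` class with two int arrays `a` and `b` (the names, `kLen`, and the trivial kernel body are illustrative only). The key point is that every level has to live in device memory: the int arrays, each `Workspace` object, and the array of 32 `Workspace*` that the kernel receives. Each block then selects its own task with `blockIdx.x`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workspace class: a few pointers to int arrays, as in the question.
struct Workspace {
    int* a;
    int* b;
    int  len;
};

constexpr int kTasks = 32;   // one task per thread block
constexpr int kLen   = 256;  // hypothetical array length per task
constexpr int kIters = 20;   // iterations per task

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); return -1; } } while (0)

// ws: device array of device pointers, each to a device-resident Workspace.
__global__ void runTasks(Workspace** ws) {
    Workspace* w = ws[blockIdx.x];            // this block's task
    for (int it = 0; it < kIters; ++it) {
        for (int i = threadIdx.x; i < w->len; i += blockDim.x)
            w->b[i] += w->a[i];
        __syncthreads();                       // keep iterations in lockstep within the block
    }
}

int run_example() {
    int hostA[kLen];
    for (int i = 0; i < kLen; ++i) hostA[i] = i;

    Workspace staged[kTasks];   // host copies whose members hold device addresses
    Workspace* devObj[kTasks];  // host array holding the device addresses of the objects
    for (int t = 0; t < kTasks; ++t) {
        staged[t].len = kLen;
        // Level 1: the int arrays themselves.
        CHECK(cudaMalloc(&staged[t].a, kLen * sizeof(int)));
        CHECK(cudaMalloc(&staged[t].b, kLen * sizeof(int)));
        CHECK(cudaMemcpy(staged[t].a, hostA, sizeof(hostA), cudaMemcpyHostToDevice));
        CHECK(cudaMemset(staged[t].b, 0, kLen * sizeof(int)));
        // Level 2: the object, whose members now hold device pointers.
        CHECK(cudaMalloc(&devObj[t], sizeof(Workspace)));
        CHECK(cudaMemcpy(devObj[t], &staged[t], sizeof(Workspace), cudaMemcpyHostToDevice));
    }
    // Level 3: the array of 32 pointers that the kernel receives.
    Workspace** devPtrs;
    CHECK(cudaMalloc(&devPtrs, sizeof(devObj)));
    CHECK(cudaMemcpy(devPtrs, devObj, sizeof(devObj), cudaMemcpyHostToDevice));

    runTasks<<<kTasks, 128>>>(devPtrs);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    int bad = 0, hostB[kLen];
    for (int t = 0; t < kTasks; ++t) {
        CHECK(cudaMemcpy(hostB, staged[t].b, sizeof(hostB), cudaMemcpyDeviceToHost));
        for (int i = 0; i < kLen; ++i)
            if (hostB[i] != kIters * i) ++bad;   // each iteration adds a[i] == i
        CHECK(cudaFree(staged[t].a));
        CHECK(cudaFree(staged[t].b));
        CHECK(cudaFree(devObj[t]));
    }
    CHECK(cudaFree(devPtrs));
    return bad;   // 0 means every task produced the expected result
}

int main() {
    int rc = run_example();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc == 0 ? 0 : 1;
}
```

The host keeps the `staged` copies because it needs the device addresses stored in them to read results back and free the arrays; those device pointers cannot be dereferenced on the host.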

I am not sure whether this is a reasonable way to explore task parallelism in CUDA. Does anyone have related experience or comments? Thanks.

Up, Up, Up…

Alright, take everything I say with a grain of salt, since I have never done what you are trying to do.

What I would do is copy the classes' memory over to the GPU and go from there. The main problem arises if your classes contain pointers: copying the class does not copy the data it points to. It is the same problem you get when you memcpy a std::vector: you copy the vector object itself, not the elements it points to.
If the data is stored inside your class and there is no indirection, copying it to the card normally should be fine, and you can then reinterpret_cast your pointers and be good to go. Be aware that the CUDA runtime is a C API that copies raw bytes, so stick to plain, struct-like classes (for example, no virtual functions).
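A sketch of the no-indirection case, assuming the arrays can have a fixed, hypothetical size `kLen` and live inside the class (`FlatWorkspace` is an illustrative name). Because the object contains no pointers, one `cudaMemcpy` of all 32 objects copies all of the data, and the kernel takes a plain `FlatWorkspace*` (an array of objects) instead of a pointer to pointers, so no reinterpret_cast is needed in this version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kTasks = 32;   // one task per thread block
constexpr int kLen   = 256;  // hypothetical fixed array length
constexpr int kIters = 20;   // iterations per task

// Hypothetical workspace storing its arrays inline: no pointers inside,
// so a byte-for-byte copy of the object is a complete copy of its data.
struct FlatWorkspace {
    int a[kLen];
    int b[kLen];
};

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); return -1; } } while (0)

__global__ void runTasks(FlatWorkspace* ws) {
    FlatWorkspace& w = ws[blockIdx.x];        // this block's task
    for (int it = 0; it < kIters; ++it) {
        for (int i = threadIdx.x; i < kLen; i += blockDim.x)
            w.b[i] += w.a[i] + (int)blockIdx.x;
        __syncthreads();
    }
}

int run_example() {
    static FlatWorkspace host[kTasks];        // 64 KB total, static to keep it off the stack
    for (int t = 0; t < kTasks; ++t)
        for (int i = 0; i < kLen; ++i) { host[t].a[i] = i; host[t].b[i] = 0; }

    FlatWorkspace* dev;
    CHECK(cudaMalloc(&dev, sizeof(host)));
    CHECK(cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice));   // one copy for all tasks

    runTasks<<<kTasks, 128>>>(dev);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost));   // waits for the kernel
    CHECK(cudaFree(dev));

    int bad = 0;
    for (int t = 0; t < kTasks; ++t)
        for (int i = 0; i < kLen; ++i)
            if (host[t].b[i] != kIters * (i + t)) ++bad;
    return bad;   // 0 means every task produced the expected result
}

int main() {
    int rc = run_example();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc == 0 ? 0 : 1;
}
```

The trade-off is that array sizes must be known at compile time; if they vary per task, the pointer-based layout and a deep copy are needed instead.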

Can you give more insight into your actual problem? In the end, passing only pointers won't be a solution: unless you use managed memory, the graphics card cannot directly access data in host memory (and even with managed memory, copies happen under the hood). So your classes need to be copied into the card's memory.
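For comparison, a minimal sketch of the managed-memory route mentioned above, using `cudaMallocManaged` and the same hypothetical `Workspace` layout (names and sizes are illustrative). Managed pointers are valid on both host and device, so the host can build the array of 32 class pointers directly and pass it to the kernel, while the driver migrates the data behind the scenes. The host must synchronize (here with `cudaDeviceSynchronize`) before reading results.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kTasks = 32;   // one task per thread block
constexpr int kLen   = 256;  // hypothetical array length per task
constexpr int kIters = 20;   // iterations per task

struct Workspace {           // hypothetical, same shape as the question's class
    int* a;
    int* b;
    int  len;
};

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); return -1; } } while (0)

__global__ void runTasks(Workspace** ws) {
    Workspace* w = ws[blockIdx.x];            // this block's task
    for (int it = 0; it < kIters; ++it) {
        for (int i = threadIdx.x; i < w->len; i += blockDim.x)
            w->b[i] += w->a[i];
        __syncthreads();
    }
}

int run_example() {
    // Every level is managed, so the host can fill it in with ordinary pointer code.
    Workspace** ws;
    CHECK(cudaMallocManaged(&ws, kTasks * sizeof(Workspace*)));
    for (int t = 0; t < kTasks; ++t) {
        CHECK(cudaMallocManaged(&ws[t], sizeof(Workspace)));
        ws[t]->len = kLen;
        CHECK(cudaMallocManaged(&ws[t]->a, kLen * sizeof(int)));
        CHECK(cudaMallocManaged(&ws[t]->b, kLen * sizeof(int)));
        for (int i = 0; i < kLen; ++i) { ws[t]->a[i] = i; ws[t]->b[i] = 0; }
    }

    runTasks<<<kTasks, 128>>>(ws);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());           // required before the host touches managed data

    int bad = 0;
    for (int t = 0; t < kTasks; ++t) {
        for (int i = 0; i < kLen; ++i)
            if (ws[t]->b[i] != kIters * i) ++bad;
        CHECK(cudaFree(ws[t]->a));
        CHECK(cudaFree(ws[t]->b));
        CHECK(cudaFree(ws[t]));
    }
    CHECK(cudaFree(ws));
    return bad;   // 0 means every task produced the expected result
}

int main() {
    int rc = run_example();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc == 0 ? 0 : 1;
}
```

This keeps the host code close to ordinary C++, at the cost of less control over when data actually moves between host and device.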

Thank you. I tried copying all the classes to the GPU, and it works.