Hi,
I’ve been trying to dynamically allocate a class array on the CPU, allocate the corresponding memory on the GPU, and copy the array over.
I then pass a pointer to the GPU array as the kernel’s argument and try to call the class’s methods, but every method call fails (no errors are reported during compilation).
Here is an example of the code:
#include <iostream>
using namespace std;
#define THREADS_NUMBER 2
#define BLOCKS_NUMBER 1
class MyClass{
  private:
    double member;
  public:
    __device__ __host__ MyClass() { member = 0; }
    __device__ __host__ void add(double value) { member += value; }
    __device__ __host__ double get_member() { return member; }
};

__global__ void kernel(MyClass** class_array){
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    (*class_array)[tid].add(5.);
    printf("%lf \n", (*class_array)[tid].get_member());
}

int main(){
    MyClass* classarray = new MyClass[BLOCKS_NUMBER*THREADS_NUMBER]();
    MyClass* dev_classarray;
    size_t size = BLOCKS_NUMBER*THREADS_NUMBER*sizeof(MyClass);
    cudaMalloc(&dev_classarray, size);
    cudaMemcpy(dev_classarray, classarray, size, cudaMemcpyHostToDevice);
    kernel<<<BLOCKS_NUMBER,THREADS_NUMBER>>>(&dev_classarray);
    cudaMemcpy(classarray, dev_classarray, size, cudaMemcpyDeviceToHost);
    for (int s = 0; s < BLOCKS_NUMBER*THREADS_NUMBER; s++){
        cout << classarray[s].get_member() << endl;
    }
    return 0;
}
The same code works just fine with an identical host function in place of the kernel (using a for loop over the array instead of the thread grid).
Any ideas as to where the problem is?
Thank you
Any time you are having trouble with CUDA code, it’s good practice to employ proper CUDA error checking, and also to run your code with cuda-memcheck. Even if you don’t understand the error output it generates, it will be useful to others you ask for help.
I suggest doing this before asking others for help. Not sure what proper CUDA error checking is? Google “proper CUDA error checking”, take the first hit, read it, and apply it to your code.
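As a sketch of what that looks like (the macro name here is my own, not part of the CUDA runtime API):

```cuda
#include <cstdio>
#include <cstdlib>

// Wrap every CUDA runtime call; print the error string and bail on failure.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",                \
                    cudaGetErrorString(err), __FILE__, __LINE__);          \
            exit(EXIT_FAILURE);                                           \
        }                                                                  \
    } while (0)

// Usage in this code would look like:
//   CUDA_CHECK(cudaMalloc(&dev_classarray, size));
//   kernel<<<BLOCKS_NUMBER,THREADS_NUMBER>>>(dev_classarray);
//   CUDA_CHECK(cudaGetLastError());        // catches launch errors
//   CUDA_CHECK(cudaDeviceSynchronize());   // catches execution errors
```

Kernel launches need the two-step check at the end because a launch returns no status directly: cudaGetLastError reports launch failures, and cudaDeviceSynchronize surfaces faults that occur while the kernel runs.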
There is certainly a problem here:
kernel<<<BLOCKS_NUMBER,THREADS_NUMBER>>>(&dev_classarray);
^
The address of dev_classarray is a location in host memory, and that sort of construct is almost never usable in ordinary CUDA device code. Attempting to dereference that pointer in device code will result in a device code execution fault: the device is not allowed to dereference a pointer into host memory space when that space was allocated with an ordinary host allocation method (excepting Power9 coherency/ATS, which is presumably not at issue here).
There is an additional usage error indicating a lack of comprehension around the use of double pointers and cudaMalloc, but we can leave that aside for now.
It’s not clear why you wish to pass a double-pointer to your kernel. For the functionality you have shown here, single pointer usage should suffice:
$ cat t129.cu
#include <iostream>
#include <stdio.h>
using namespace std;
#define THREADS_NUMBER 2
#define BLOCKS_NUMBER 1
class MyClass{
  private:
    double member;
  public:
    __device__ __host__ MyClass() { member = 0; }
    __device__ __host__ void add(double value) { member += value; }
    __device__ __host__ double get_member() { return member; }
};

__global__ void kernel(MyClass* class_array){
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    class_array[tid].add(5.);
    printf("%f \n", class_array[tid].get_member());
}

int main(){
    MyClass* classarray = new MyClass[BLOCKS_NUMBER*THREADS_NUMBER]();
    MyClass* dev_classarray;
    size_t size = BLOCKS_NUMBER*THREADS_NUMBER*sizeof(MyClass);
    cudaMalloc(&dev_classarray, size);
    cudaMemcpy(dev_classarray, classarray, size, cudaMemcpyHostToDevice);
    kernel<<<BLOCKS_NUMBER,THREADS_NUMBER>>>(dev_classarray);
    cudaMemcpy(classarray, dev_classarray, size, cudaMemcpyDeviceToHost);
    for (int s = 0; s < BLOCKS_NUMBER*THREADS_NUMBER; s++){
        cout << classarray[s].get_member() << endl;
    }
    return 0;
}
$ nvcc -o t129 t129.cu
$ cuda-memcheck ./t129
========= CUDA-MEMCHECK
5.000000
5.000000
5
5
========= ERROR SUMMARY: 0 errors
$
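As an aside, if a double-pointer kernel parameter really were required, the pointer itself would also have to live in device memory: cudaMalloc storage for a MyClass*, copy the device array’s address into it, and pass that device location to the kernel. A sketch only, reusing the names from the original question:

```cuda
// Sketch: making the original MyClass** kernel signature work.
// The extra indirection buys nothing here; it just illustrates the idea.
MyClass*  dev_classarray;   // device array of objects
MyClass** dev_ptr;          // device storage holding the array pointer

cudaMalloc(&dev_classarray, size);
cudaMalloc(&dev_ptr, sizeof(MyClass*));
cudaMemcpy(dev_classarray, classarray, size, cudaMemcpyHostToDevice);
// Copy the value of the device pointer itself into device memory.
cudaMemcpy(dev_ptr, &dev_classarray, sizeof(MyClass*), cudaMemcpyHostToDevice);

// Now (*class_array)[tid] in the kernel dereferences device memory only.
kernel<<<BLOCKS_NUMBER,THREADS_NUMBER>>>(dev_ptr);
```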