How to make device member function directly access class member?

szhangcj · June 16, 2023, 9:48am

I first define a C++ class that allocates the data I will use in the subsequent computation. I then want a device member function to access the private member variables directly, but I get an illegal memory access error. My goal is to have one device member function call another member function of the class that is also a device function. I hope this is clear. Thank you!

#include <cstddef>
#include <cstdint>   // int32_t
#include <cstdio>    // fprintf, printf
#include <cstdlib>   // exit
#include <memory>
#include <utility>
#include <iostream>

#include <cuda_runtime.h>
#include <cuda/std/atomic>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}


class static_map {
public:
	void initialize() {
		capacity = 32;
		gpuErrchk(cudaMalloc((void**)&vals, capacity * sizeof(int32_t)));
		gpuErrchk(cudaMalloc((void**)&keys, capacity * sizeof(cuda::atomic<int32_t>)));
	}

	__device__ void init() {
		int tid = threadIdx.x + blockDim.x * blockIdx.x;

		keys[tid].store(1);
		vals[tid] = 1;
	}

	__device__ void access() {
		int tid = threadIdx.x + blockDim.x * blockIdx.x;
		
		printf("Thread id %d, local key %d\n", tid, keys[tid].load());
	}


private:
	cuda::atomic<int32_t>* keys;
	int32_t* vals;
	int capacity;
};

__global__ void call_device_kernel(static_map* map) {
	int tid = threadIdx.x + blockDim.x * blockIdx.x; 
	map->init();
}

int main() {
	static_map* map = new static_map();
	map->initialize();

	int block_size = 32;
	int grid_size = 1;
	call_device_kernel<<<grid_size, block_size>>>(map);
	gpuErrchk(cudaDeviceSynchronize());
}

striker159 · June 16, 2023, 11:49am

You need to transfer map to the device and pass the device pointer to your kernel.
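
A minimal sketch of what this reply suggests, under stated assumptions: simple_map is a hypothetical, simplified stand-in for the static_map in the question (a plain int32_t array instead of cuda::atomic keys), and init_kernel is likewise a made-up name. The object is filled in on the host, copied into device memory with cudaMemcpy, and the kernel receives a pointer to that device copy.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for static_map: holds a device pointer and a size.
struct simple_map {
    int32_t* vals;   // points to device memory
    int capacity;

    __device__ void init() {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        if (tid < capacity) vals[tid] = tid;  // reads members through `this`
    }
};

// d_map must itself point to device memory, because the kernel
// dereferences it to read vals and capacity.
__global__ void init_kernel(simple_map* d_map) {
    d_map->init();
}

int main() {
    // 1. Build the object on the host; only the array is on the device.
    simple_map h_map;
    h_map.capacity = 32;
    cudaMalloc(&h_map.vals, h_map.capacity * sizeof(int32_t));

    // 2. Allocate space for the object on the device and copy it over.
    simple_map* d_map = nullptr;
    cudaMalloc(&d_map, sizeof(simple_map));
    cudaMemcpy(d_map, &h_map, sizeof(simple_map), cudaMemcpyHostToDevice);

    // 3. Pass the device pointer to the kernel.
    init_kernel<<<1, 32>>>(d_map);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // 4. Copy the array back to check that the kernel wrote to it.
    int32_t h_vals[32];
    cudaMemcpy(h_vals, h_map.vals, sizeof(h_vals), cudaMemcpyDeviceToHost);
    printf("vals[5] = %d\n", h_vals[5]);

    cudaFree(d_map);
    cudaFree(h_map.vals);
    return 0;
}
```

The host copy h_map is kept around here so that the array pointer is still available for the copy back and for cudaFree.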

szhangcj · June 17, 2023, 8:10am

I also checked the cuCollections implementation, and I believe it can use device functions to access class members directly without passing a device pointer. Please check this link: cuCollections/include/cuco/static_map.cuh at dev · NVIDIA/cuCollections · GitHub

The slots_ pointer is also a class member variable whose memory is allocated on the device. I am not sure what the difference between the two cases is.

template <typename CG>
__device__ iterator next_slot(CG const& g, iterator s) noexcept
{
  uint32_t index = s - slots_;
  return &slots_[(index + g.size()) % capacity_];
}

striker159 · June 17, 2023, 9:59am

Your problem has nothing to do with device functions or member accesses. new static_map() allocates ordinary host memory, which cannot be accessed from a kernel.

szhangcj · June 17, 2023, 10:02am

What confuses me is that I allocate device memory in the initialize function. Doesn't that mean that vals and keys are both pointers to device memory?

striker159 · June 17, 2023, 10:52am

Yes, vals and keys are device pointers.
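
To make the distinction in the replies above concrete: the arrays live on the device, but the object that stores the pointers to them was created with new and lives on the host. One alternative to copying the object with cudaMemcpy, sketched here with hypothetical names (simple_map_view, fill_kernel), is to pass a small struct by value, so the kernel gets its own copy of the pointers as a launch argument. This may be why the cuCollections snippet can read slots_ without an explicit device pointer to the object, but that is an assumption and is not confirmed anywhere in this thread.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical lightweight "view": just the device pointer and the size.
// It is trivially copyable, so it can be passed to a kernel by value.
struct simple_map_view {
    int32_t* vals;   // device pointer
    int capacity;

    __device__ void init() {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        if (tid < capacity) vals[tid] = 1;
    }
};

// The view is copied into the kernel's parameter space at launch,
// so no host memory is dereferenced on the device.
__global__ void fill_kernel(simple_map_view view) {
    view.init();
}

int main() {
    simple_map_view view;
    view.capacity = 32;
    cudaMalloc(&view.vals, view.capacity * sizeof(int32_t));

    fill_kernel<<<1, 32>>>(view);  // passed by value, not by pointer
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the array back and sum it to check every element was written.
    int32_t h_vals[32];
    cudaMemcpy(h_vals, view.vals, sizeof(h_vals), cudaMemcpyDeviceToHost);
    int sum = 0;
    for (int i = 0; i < 32; ++i) sum += h_vals[i];
    printf("sum = %d\n", sum);

    cudaFree(view.vals);
    return 0;
}
```

Passing by value suits small structs that only hold pointers and sizes. A class with a destructor that frees the device memory would need more care, since the by-value copies share the same pointers.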
