How can a device member function directly access class members?

I first define a C++ class that allocates the data I will use in a later computation. I then want a device member function to access the private member variables directly, but I get an illegal memory access error. My goal is to have a device function call another member function of the class that is also a device function. I hope this is clear. Thank you!

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <utility>
#include <iostream>

#include <cuda_runtime.h>
#include <cuda/atomic>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

class static_map {
public:
	// Host function: allocates the key and value arrays in device memory.
	void initialize() {
		capacity = 32;
		gpuErrchk(cudaMalloc((void**)&vals, capacity * sizeof(int32_t)));
		gpuErrchk(cudaMalloc((void**)&keys, capacity * sizeof(cuda::atomic<int32_t>)));
	}

	// Device function: each thread writes one value.
	__device__ void init() {
		int tid = threadIdx.x + blockDim.x * blockIdx.x;

		vals[tid] = 1;
	}

	// Device function: each thread reads and prints one key.
	__device__ void access() {
		int tid = threadIdx.x + blockDim.x * blockIdx.x;
		printf("Thread id %d, local key %d\n", tid, keys[tid].load());
	}

private:
	cuda::atomic<int32_t>* keys;
	int32_t* vals;
	int capacity;
};

__global__ void call_device_kernel(static_map* map) {
	int tid = threadIdx.x + blockDim.x * blockIdx.x;
}

int main() {
	// The static_map object itself is allocated in host memory here.
	static_map* map = new static_map();

	int block_size = 32;
	int grid_size = 1;
	call_device_kernel<<<grid_size, block_size>>>(map);
}

You need to transfer map to the device and pass the device pointer to your kernel.
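To make that suggestion concrete, here is a minimal sketch of copying the object itself to the device and handing the kernel a device pointer. It uses a hypothetical, simplified `simple_map` struct rather than the `static_map` from the question, the names `init_kernel` and `run_with_device_copy` are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical, simplified stand-in for the static_map in the question.
struct simple_map {
    int32_t* vals;      // device pointer, set on the host by cudaMalloc
    int      capacity;

    __device__ void init() {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        if (tid < capacity) vals[tid] = 1;
    }
};

__global__ void init_kernel(simple_map* d_map) {
    d_map->init();  // legal: d_map points to device memory
}

// Builds the map on the host, copies the object to the device, runs the
// kernel, and returns the sum of the values written by the device.
int run_with_device_copy() {
    simple_map h_map;
    h_map.capacity = 32;
    cudaMalloc((void**)&h_map.vals, h_map.capacity * sizeof(int32_t));

    // Allocate room for the object on the device and copy it there.
    simple_map* d_map = nullptr;
    cudaMalloc((void**)&d_map, sizeof(simple_map));
    cudaMemcpy(d_map, &h_map, sizeof(simple_map), cudaMemcpyHostToDevice);

    init_kernel<<<1, 32>>>(d_map);
    cudaDeviceSynchronize();

    // Copy the values back to verify what the kernel wrote.
    int32_t h_vals[32];
    cudaMemcpy(h_vals, h_map.vals, sizeof(h_vals), cudaMemcpyDeviceToHost);
    int sum = 0;
    for (int i = 0; i < 32; ++i) sum += h_vals[i];

    cudaFree(h_map.vals);
    cudaFree(d_map);
    return sum;
}

int main() {
    printf("sum = %d\n", run_with_device_copy());
    return 0;
}
```

The copy works because the struct is trivially copyable: the bytes that land on the device include the `vals` pointer, which already refers to device memory. A class with a non-trivial constructor or virtual functions would need more care.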

I also checked the cuCollections implementation, and I believe it can use device functions to access class members directly without passing a device pointer. Please see this link: cuCollections/include/cuco/static_map.cuh at dev · NVIDIA/cuCollections · GitHub

There, slots_ is also a class member variable that points to memory allocated on the device. I am not sure what the difference between the two cases is.

template <typename CG>
    __device__ iterator next_slot(CG const& g, iterator s) noexcept
    {
      uint32_t index = s - slots_;
      return &slots_[(index + g.size()) % capacity_];
    }

Your problem has nothing to do with device functions or member accesses. new static_map() allocates ordinary host memory which cannot be accessed from a kernel.
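An alternative that avoids a device-side copy of the object is to pass a small object to the kernel by value, so that its members travel with the kernel arguments. The sketch below assumes a hypothetical `map_view` struct that only carries the device pointer and the capacity. As far as I understand, this is similar in spirit to the lightweight view objects cuCollections hands to its kernels, but none of the names here are part of the cuco API.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical lightweight "view": it owns nothing and only carries the
// device pointer and the capacity.
struct map_view {
    int32_t* vals;
    int      capacity;

    __device__ void init() {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        if (tid < capacity) vals[tid] = tid;
    }
};

// The view is taken by value, so its members are copied to the device
// together with the other kernel arguments at launch time.
__global__ void init_view_kernel(map_view view) {
    view.init();
}

// Fills the array through the view and returns the last element.
int run_with_view() {
    map_view view;
    view.capacity = 32;
    cudaMalloc((void**)&view.vals, view.capacity * sizeof(int32_t));

    init_view_kernel<<<1, 32>>>(view);
    cudaDeviceSynchronize();

    int32_t last = -1;
    cudaMemcpy(&last, view.vals + 31, sizeof(int32_t), cudaMemcpyDeviceToHost);
    cudaFree(view.vals);
    return last;
}

int main() {
    printf("last value = %d\n", run_with_view());
    return 0;
}
```

Here the kernel never dereferences a pointer to the object, only the `vals` pointer stored inside it, which is why no host memory is touched on the device.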

What confuses me is that I allocate device memory in the initialize function. Does that mean vals and keys are both pointers to device memory?

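Yes, vals and keys are device pointers. The pointer variables themselves, however, are members of the static_map object, which new placed in host memory, so the kernel cannot read them through map.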
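If you want to check for yourself which kind of memory a pointer refers to, the runtime can be asked with `cudaPointerGetAttributes`. The following is a rough sketch that assumes CUDA 11 or later, where an ordinary host pointer is reported as unregistered instead of producing an error. The helper name `is_device_pointer` is made up for illustration.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Returns true if the runtime classifies ptr as plain device memory.
bool is_device_pointer(const void* ptr) {
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        // Older toolkits report an error for ordinary host pointers;
        // clear it and treat the pointer as not being device memory.
        cudaGetLastError();
        return false;
    }
    return attr.type == cudaMemoryTypeDevice;
}

int main() {
    // Allocated with cudaMalloc, like vals and keys in the question.
    int32_t* vals = nullptr;
    cudaMalloc((void**)&vals, 32 * sizeof(int32_t));

    // Allocated with new, like the static_map object in the question.
    int32_t* host_obj = new int32_t[32];

    printf("vals is a device pointer:     %s\n", is_device_pointer(vals) ? "yes" : "no");
    printf("host_obj is a device pointer: %s\n", is_device_pointer(host_obj) ? "yes" : "no");

    delete[] host_obj;
    cudaFree(vals);
    return 0;
}
```

Applied to the code in the question, the same check would be expected to say "yes" for vals and keys and "no" for map.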