How to use class in CUDA C++?

cudaMancpy · May 29, 2018, 12:07pm

Hello,

I usually use CUDA/C.
but now, I am trying to use CUDA/C++.
Using CUDA/C++ is so hard to me…
because, there is no enough direction…

How to use class in CUDA/C++?
It means how to generate object using class on device memory?

Please, let me know.

this is my code. but there are a few errors.

GPU_attribute_handler.cuh

#ifndef __GPU__attribute_handler_HPP__
#define __GPU__attribute_handler_HPP__

#include <cuda_runtime.h>

class gpu_attribute_handler {
private:
	size_t firstnode_id;			
	size_t lastnode_id;				

	char * attribute_ptr;			

	bool boundtype;					
	bool writefile;				

public:
	__device__ gpu_attribute_handler() {
		firstnode_id = 0;
	}

	__device__ ~gpu_attribute_handler() {
	}

	__device__ void set(int *temp, size_t SIZE) {
		cudaMalloc((void**)&attribute_ptr, SIZE);
		cudaMemcpy(attribute_ptr, temp, SIZE, cudaMemcpyHostToDevice);
	}
	
	__device__ void get_result(int *temp, size_t SIZE) {
		cudaMemcpy(temp, attribute_ptr, SIZE, cudaMemcpyDeviceToHost);
	}

	template<typename valuetype>
	__device__ valuetype get_value(size_t nodenumber) {
		valuetype * value_ptr = (valuetype *)attribute_ptr;
		return value_ptr[nodenumber - firstnode_id];
	}

	template<typename valuetype>
	__device__ void set_value(size_t nodenumber, valuetype value) {
		valuetype * value_ptr = (valuetype *)attribute_ptr;
		value_ptr[nodenumber - firstnode_id] = value;
	}

	template<typename valuetype>
	__device__ void add_value(size_t nodenumber, valuetype value) {
		valuetype * value_ptr = (valuetype *)attribute_ptr;
		value_ptr[nodenumber - firstnode_id] += value;
	}

	__device__ void set_boundtype(bool _boundtype) {
		boundtype = _boundtype;
	}

	__device__ void set_writefile(bool _writefile) {
		writefile = _writefile;
	}
};

#endif

kernel.cu

#include "cuda_runtime.h"
#include "GPU_attribute_handler.cuh"
#include <stdio.h>
#include <stdlib.h>
#pragma warning(disable : 4996)

#define SIZE 5

__global__ void addKernel(gpu_attribute_handler a, gpu_attribute_handler b, gpu_attribute_handler c) {
	int tid = blockDim.x * blockIdx.x + threadIdx.x;

	if (tid < SIZE) {
		int sum = a.get_value<float>(tid) + b.get_value<float>(tid);
		c.set_value<float>(tid, sum);
		
	}
}

class test {
private:	
	gpu_attribute_handler dev_a, dev_b, dev_c;
	cudaError_t err = cudaSuccess;
	int *a, *b, *c;

public:
	test() {
	}
	~test() {
	}
	
	void allocate() {
		a = (int*)malloc(SIZE * sizeof(int));
		b = (int*)malloc(SIZE * sizeof(int));
		c = (int*)malloc(SIZE * sizeof(int));
	}

	void initialization() {
		for (int i = 0; i < SIZE; i++) {
			a[i] = i + 1;
			b[i] = i + 1;
		}
	}

	void copyData() {
		dev_a.set(a, SIZE);
		dev_b.set(b, SIZE);
		dev_c.set(c, SIZE);
	}

	void operation() {
		addKernel << <256, 256 >> > (dev_a, dev_b, dev_c);
	}

	void syncToHost() {
		dev_c.get_result(c, SIZE);
	}

	void printValues() {
		for (int i = 0; i < SIZE; i++) {
			printf("%d = %d\n", i+1, c[i]);
		}
	}
};

int main() {
	test go;

	go.allocate();
	go.initialization();
	go.copyData();
	go.operation();
	go.syncToHost();
	go.printValues();
}

These are error messages.
calling a host function(“cudaMalloc”) from a device function(“gpu_attribute_handler::set”) is not allowed /* line 25 /
identifier “cudaMalloc” is undefined in device code / line 25 */

Same as line 26 and line 30 (cudaMalloc and cudaMemcpy)

cbuchner1 · May 29, 2018, 1:11pm

Regarding your compilation problems: Just read carefully what the compiler tells you.

You can’t call a host function (such as cudaMalloc, cudaMemcpy) from a device function.

A possible fix is to remove the device attribute from member functions set() and get_result(), possibly declaring it a host function instead.

You have to initiate all your memory allocations and transfers of device memory on the host, e.g. by calling set() from host code on a gpu_attribute_handler object.

Note that you can’t have the same object instance on both the CPU and the GPU. An object you create on the host resides in host memory. An object you create in a GPU kernel resides in device memory.

Your dev_a, dev_b, dev_c instances are thusly labelled incorrectly. They are host side object instances, living on the CPU’s stack frame.

You could copy an object’s contents into an object instance on device memory (provided that the object’s memory layout on both host and device is full compatible).

CUDA unified memory can assist in making the object accessible on CPU and GPU in the same memory space (maybe incurring some extra overhead). cudaMallocManaged() is the API functions to use in this case. But then, this may involve using custom allocators (e.g. with new/delete operator overloading) for your objects, which is quite an advanced C++ language feature.

Topic		Replies	Views
How to allocate class? CUDA Programming and Performance	4	1727	February 8, 2019
OOP Class Design with Device Variables CUDA Programming and Performance cuda	1	1363	August 5, 2021
Passing an object to the device by using zero-copy memory CUDA Programming and Performance	5	2683	July 29, 2013
question about calling CUDA kernels using a class CUDA Programming and Performance	5	14564	July 12, 2010
Invalid Device Pointer CUDA Programming and Performance	9	24470	January 15, 2009
Using cudamalloc in device function CUDA Programming and Performance	4	7706	September 27, 2013
c++ Project file management using CUDA Where to put the CUDA code? CUDA Programming and Performance	3	3063	August 30, 2011
c++ class and GPU how to include class in GPU ? CUDA Programming and Performance	3	1530	January 18, 2011
C++ support 2.2 CUDA Programming and Performance	7	1632	July 2, 2009
Cuda passing a class to a kernel CUDA Programming and Performance	2	1248	July 5, 2022

How to use class in CUDA C++?

Related topics