How to run a member function of an object in a kernel?

I want to create around 1000 objects on the host, copy them to the device, and launch 1000 threads that each run one object's member function in parallel. How can I do this? I am trying the following code. It compiles, but when I copy the results from device to host I get an error: cudaErrorIllegalAddress

#include <cuda_runtime.h>
#include <iostream>

using namespace std;

class AA{
double* a_d;
double* a_h; 
unsigned int mem_size;
int n;
	AA(double* a_h_,int n_){
		mem_size = sizeof(double) * n;

	void InitializeOnGPU(){
		cudaError_t error;
		error = cudaMalloc((void **) &a_d, mem_size);
		error = cudaMemcpy(a_d, a_h, mem_size, cudaMemcpyHostToDevice);
		//cudaError_t error2;
		//error2 = cudaMemcpy(a_h, a_d, mem_size, cudaMemcpyDeviceToHost);

	__device__ void RunOnGPU(){
		for (int ic=0;ic<n;ic++){

	void CopySolutionFromGPU2CPU(){
		cudaError_t error;
		error = cudaMemcpy(a_h, a_d, mem_size, cudaMemcpyDeviceToHost);

	void PrintSolutions(){
		for(int ic=0;ic<n;ic++){


__global__ void RunObjFun(AA** AAs_d){
	int bx = blockIdx.x;
	int tx = threadIdx.x;
	int Id_t=blockDim.x*bx+tx;

int main(){

int n=3;
double* a_h=new double[n];
int ncase=10;
AA** AAs_h=new AA*[ncase];
for (int ic=0;ic<ncase;ic++){
	AAs_h[ic]=new AA(a_h,n);

AA** AAs_d;
unsigned int mem_size_As = sizeof(AA)*ncase;
cudaError_t error;
error = cudaMalloc((void **) &AAs_d, mem_size_As);
error = cudaMemcpy(AAs_d, AAs_h, mem_size_As, cudaMemcpyHostToDevice);


for (int ic=0;ic<ncase;ic++){
return 0;

In the future, please format your code with the code formatting button in the toolbar (it looks like </>): select your code, then click that button.

You’ve got the single-pointer and double-pointer usage of AAs_d and AAs_h mixed up. It’s not clear why you need double pointers, and using them this way significantly increases your code complexity: it requires deep copies, which can be challenging for beginners.

Just to give you one example of your logical errors:

AA** AAs_d;                                // double pointer, which therefore points to an array of pointers
unsigned int mem_size_As = sizeof(AA)*ncase;     // this size is wrong: an array of pointers needs sizeof(AA*)*ncase, and each pointer in the array then needs its own additional allocation
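
To make the cost of the double-pointer route concrete, here is a minimal sketch of the deep copy it would need. It is not the original poster's class: the `AA` struct is a hypothetical, stripped-down stand-in, the doubling loop in `RunOnGPU` is placeholder work, and `run_pointer_array_demo` is an illustrative helper name. The key point is that the array copied to the device must hold device addresses, not the host addresses returned by `new`.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical, stripped-down stand-in for the AA class in the question.
struct AA {
    double* a_d;  // device buffer
    int n;        // number of elements in a_d
    __device__ void RunOnGPU() {
        for (int ic = 0; ic < n; ic++) a_d[ic] *= 2.0;  // placeholder work
    }
};

__global__ void RunObjFun(AA** AAs_d, int ncase) {
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < ncase) AAs_d[id]->RunOnGPU();
}

// Returns element 0 of object 0 after the kernel, or -1.0 on any CUDA error.
double run_pointer_array_demo() {
    const int n = 3, ncase = 10;
    std::vector<double> a_h(n, 1.0);
    std::vector<AA*> objs(ncase, nullptr);      // host array of DEVICE object addresses
    std::vector<double*> bufs(ncase, nullptr);  // host record of each device data buffer
    bool ok = true;

    for (int ic = 0; ic < ncase; ic++) {
        AA tmp{nullptr, n};  // staged on the host, then copied to the device
        // Allocation 1: the data buffer the object points to.
        ok = ok && cudaMalloc((void**)&tmp.a_d, sizeof(double) * n) == cudaSuccess;
        ok = ok && cudaMemcpy(tmp.a_d, a_h.data(), sizeof(double) * n,
                              cudaMemcpyHostToDevice) == cudaSuccess;
        // Allocation 2: the object itself, one per pointer in the array.
        ok = ok && cudaMalloc((void**)&objs[ic], sizeof(AA)) == cudaSuccess;
        ok = ok && cudaMemcpy(objs[ic], &tmp, sizeof(AA),
                              cudaMemcpyHostToDevice) == cudaSuccess;
        bufs[ic] = tmp.a_d;
    }

    // Allocation 3: the array of pointers, sized with sizeof(AA*), not sizeof(AA).
    AA** AAs_d = nullptr;
    ok = ok && cudaMalloc((void**)&AAs_d, sizeof(AA*) * ncase) == cudaSuccess;
    ok = ok && cudaMemcpy(AAs_d, objs.data(), sizeof(AA*) * ncase,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) RunObjFun<<<(ncase + 127) / 128, 128>>>(AAs_d, ncase);
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;

    double first = -1.0;
    ok = ok && cudaMemcpy(&first, bufs[0], sizeof(double),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    for (int ic = 0; ic < ncase; ic++) {
        cudaFree(bufs[ic]);
        cudaFree(objs[ic]);
    }
    cudaFree(AAs_d);
    return ok ? first : -1.0;
}

int main() {
    printf("first element after kernel: %g\n", run_pointer_array_demo());
    return 0;
}
```

Every object costs two device allocations plus a staging copy, and the host has to keep track of all the device addresses so it can copy results back and free them.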

If you are willing to solve this problem with an array of your objects, i.e.

AA* AAs_h=new AA[ncase];
AA* AAs_d;

Your coding effort will be much easier. I’m not saying those changes alone will fix your code. I’m saying that writing proper code to do what you want will be much easier from that kind of array-of-objects starting point.
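
As a rough illustration of that array-of-objects starting point, here is a minimal sketch. It assumes a simplified, hypothetical `AA` that holds only a device pointer and a length and has an implicit default constructor (needed for `new AA[ncase]`). The doubling loop and the helper name `run_object_array_demo` are placeholders, not the original poster's code. Because each object contains only a device pointer and an `int`, a single flat `cudaMemcpy` of the whole array is enough.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical simplified version of the AA class from the question.
struct AA {
    double* a_d = nullptr;  // device buffer owned by this object
    int n = 0;              // number of elements in a_d

    // Host side: allocate the device buffer and upload n_ values.
    cudaError_t InitializeOnGPU(const double* a_h, int n_) {
        n = n_;
        cudaError_t err = cudaMalloc((void**)&a_d, sizeof(double) * n);
        if (err != cudaSuccess) return err;
        return cudaMemcpy(a_d, a_h, sizeof(double) * n, cudaMemcpyHostToDevice);
    }

    // Device side: placeholder work (doubles each element).
    __device__ void RunOnGPU() {
        for (int ic = 0; ic < n; ic++) a_d[ic] *= 2.0;
    }
};

// Single pointer: AAs_d points to a contiguous array of objects.
__global__ void RunObjFun(AA* AAs_d, int ncase) {
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < ncase) AAs_d[id].RunOnGPU();
}

// Returns element 0 of object 0 after the kernel, or -1.0 on any CUDA error.
double run_object_array_demo() {
    const int n = 3, ncase = 10;
    std::vector<double> a_h(n, 1.0);
    bool ok = true;

    AA* AAs_h = new AA[ncase];
    for (int ic = 0; ic < ncase; ic++)
        ok = ok && AAs_h[ic].InitializeOnGPU(a_h.data(), n) == cudaSuccess;

    // Here sizeof(AA) * ncase is the right size: the array holds objects.
    AA* AAs_d = nullptr;
    ok = ok && cudaMalloc((void**)&AAs_d, sizeof(AA) * ncase) == cudaSuccess;
    ok = ok && cudaMemcpy(AAs_d, AAs_h, sizeof(AA) * ncase,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) RunObjFun<<<(ncase + 127) / 128, 128>>>(AAs_d, ncase);
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;

    // The host copies of the objects still hold the device buffer addresses.
    double first = -1.0;
    ok = ok && cudaMemcpy(&first, AAs_h[0].a_d, sizeof(double),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    for (int ic = 0; ic < ncase; ic++) cudaFree(AAs_h[ic].a_d);
    cudaFree(AAs_d);
    delete[] AAs_h;
    return ok ? first : -1.0;
}

int main() {
    printf("first element after kernel: %g\n", run_object_array_demo());
    return 0;
}
```

The bounds check in the kernel matters once the grid is rounded up to whole blocks, for example when launching for around 1000 objects.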

Thank you. I didn’t want to implement a default constructor, so I used a double pointer to dynamically allocate memory for each object. But your way makes things easier.