Hello,
I have searched the forums and tried most of the suggested solutions, but I am still forced to ask this rather trivial question: how do you pass a structure as an argument to a kernel when the structure holds a pointer?
Suppose I have a simple structure like this:
struct test_pointer{
double *sample;
};
I have tried
test_pointer *test_object;
test_pointer *device_test_object;
cudaMalloc( (void **)&(device_test_object->sample) , sizeof(double)*5 );
test_object = (test_pointer*) calloc( 1 , sizeof(double) );
test_object->sample[0] = 1.0;
test_object->sample[1] = 2.0;
test_object->sample[2] = 3.0;
test_object->sample[3] = 4.0;
test_object->sample[4] = 5.0;
cudaMemcpy( device_test_object , test_object , sizeof(double)*5, cudaMemcpyHostToDevice );
OR
cudaMemcpy( device_test_object->sample , test_object->sample , sizeof(double)*5, cudaMemcpyHostToDevice );
I am probably missing a few things here. Any comments would be appreciated.
Thanks,
Sj
raghu
March 21, 2011, 6:32am
2
You have not called cudaMalloc for the device_test_object structure itself. You can allocate memory for a structure containing a pointer as follows:
test_pointer *test_object;
test_pointer *device_test_object;
STEP 1:
malloc for test_object
malloc for test_object->sample
fill up values for test_object->sample
STEP 2:
cudaMalloc for device_test_object
cudaMalloc for device_test_object->sample
cudaMemcpy from test_object->sample to device_test_object->sample
cudaMemcpy from test_object to device_test_object
Now you can use device_test_object in your kernel just as you would access it in a normal function.
So you actually want to copy this structure to device memory, not pass it as a function argument?
Your underlying problems have nothing to do with CUDA or passing structures and everything to do with improper usage of pointers. I would strongly recommend spending some time with some learning material on pointers in C. While you are doing that, contemplate this:
double * sample;
sample = (double *) calloc( 5, sizeof(double) );
sample[0] = 1.0;
sample[1] = 2.0;
sample[2] = 3.0;
sample[3] = 4.0;
sample[4] = 5.0;
double * device_sample;
cudaMalloc( (void **)&(device_sample) , sizeof(double) * size_t(5) );
cudaMemcpy( device_sample , sample , sizeof(double) * size_t(5), cudaMemcpyHostToDevice );
test_pointer * test_object;
test_object = (test_pointer *) calloc( 1 , sizeof(test_pointer) );
test_object->sample = device_sample;
test_pointer * device_test_object;
cudaMalloc( (void **)&(device_test_object) , sizeof(test_pointer) );
cudaMemcpy( device_test_object , test_object , sizeof(test_pointer), cudaMemcpyHostToDevice );
which is probably much closer to what it seems you are trying to achieve.
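On the earlier question of passing the structure as a function argument rather than copying it to device memory: a minimal sketch, assuming the structure is small, is to pass it to the kernel by value. The pointer member still has to hold a device address, but the structure itself then needs no cudaMalloc or cudaMemcpy of its own. The names scale_kernel and scale_on_device are hypothetical and exist only for this illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct test_pointer {
    double *sample;  // must hold a device address when used inside a kernel
};

// The structure is received by value: the pointer inside it travels with the
// other kernel arguments, so the structure needs no device allocation.
__global__ void scale_kernel(test_pointer obj, int n, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        obj.sample[i] *= factor;
    }
}

// Hypothetical helper: multiplies n values by factor on the GPU.
// Returns 0 on success and 1 if any CUDA call fails.
int scale_on_device(double *host_data, int n, double factor)
{
    test_pointer obj;            // lives on the host
    obj.sample = NULL;
    size_t bytes = n * sizeof(double);

    // cudaMalloc writes the device address into the host-resident member.
    if (cudaMalloc((void **)&obj.sample, bytes) != cudaSuccess) return 1;

    cudaError_t err = cudaMemcpy(obj.sample, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 64;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(obj, n, factor);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess) {
        // This copy waits for the kernel to finish before reading the data.
        err = cudaMemcpy(host_data, obj.sample, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(obj.sample);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    double data[5] = {1.0, 2.0, 3.0, 4.0, 5.0};
    if (scale_on_device(data, 5, 2.0) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 5; i++) {
        printf("%f\n", data[i]);
    }
    return 0;
}
```

Passing by value avoids the second allocation, at the cost of copying the structure on every launch. For a structure holding a single pointer that cost should be negligible.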
Regarding the steps raghu listed above: the last cudaMemcpy call in STEP 2 (from test_object to device_test_object) will copy host pointers into device memory. That will not work. In fact, several stages in STEP 2 won't work, because device_test_object points to device memory and an expression such as device_test_object->sample cannot be evaluated on the host.
Suppose I cudaMalloc device_test_object as
cudaMalloc( (void **)&(device_test_object) , MEMSIZE );
What would MEMSIZE be for this structure object? Almost everything I have tried leads to segmentation faults.
Thanks.
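As a rough sketch of the sizes involved: the structure holds only a pointer, so the size to allocate for it is sizeof(test_pointer), however many doubles sample points to. The array is sized separately. The helpers struct_bytes and sample_bytes below are hypothetical names used only for illustration, and the code is host-only, so it needs no GPU to run.

```cuda
#include <cstdio>
#include <cstddef>

struct test_pointer {
    double *sample;
};

// Bytes needed for `count` structures. This covers the structures only,
// not the arrays their sample members point to.
size_t struct_bytes(size_t count)
{
    return count * sizeof(test_pointer);
}

// Bytes needed for an array of n doubles, allocated separately.
size_t sample_bytes(size_t n)
{
    return n * sizeof(double);
}

int main()
{
    // The exact numbers depend on the platform's pointer and double sizes.
    printf("one test_pointer: %zu bytes\n", struct_bytes(1));
    printf("five samples:     %zu bytes\n", sample_bytes(5));
    return 0;
}
```

The segmentation faults are likely unrelated to the size passed to cudaMalloc. A more probable cause is dereferencing device_test_object on the host, as in device_test_object->sample, before or after it has been given a device address.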
raghu
March 21, 2011, 9:07am
7
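Here is the complete code to allocate memory for a structure containing a pointer:
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#define MAXLIMIT 64

struct test_pointer {
    double *sample;
};

__global__ void myKernel(test_pointer *device_test_pointer)
{
    /* sample holds a device address, so it can be dereferenced here */
    device_test_pointer->sample[threadIdx.x] = 10.0 + threadIdx.x;
}

int main()
{
    test_pointer *host_test_object;
    test_pointer *device_test_object;
    double *host_sample, *device_sample;

    /* Fill the array on the host */
    host_sample = (double *) malloc(MAXLIMIT * sizeof(double));
    for (int i = 0; i < MAXLIMIT; i++)
    {
        host_sample[i] = (double) i;
    }

    /* Allocate the array on the device and copy the data over */
    cudaMalloc((void **)&device_sample, sizeof(double) * MAXLIMIT);
    cudaMemcpy(device_sample, host_sample, sizeof(double) * MAXLIMIT, cudaMemcpyHostToDevice);

    /* Host-side copy of the structure; its member holds the DEVICE address */
    host_test_object = (test_pointer *) malloc(sizeof(test_pointer));
    host_test_object->sample = device_sample;

    /* Allocate the structure on the device and copy it (a single pointer) over */
    cudaMalloc((void **)&device_test_object, sizeof(test_pointer));
    cudaMemcpy(device_test_object, host_test_object, sizeof(test_pointer), cudaMemcpyHostToDevice);

    myKernel<<<1, MAXLIMIT>>>(device_test_object);
    cudaThreadSynchronize();  /* deprecated in later CUDA releases in favor of cudaDeviceSynchronize() */

    /* Read the results back through the device array pointer kept on the host */
    cudaMemcpy(host_sample, device_sample, sizeof(double) * MAXLIMIT, cudaMemcpyDeviceToHost);
    for (int i = 0; i < MAXLIMIT; i++)
    {
        printf("%lf\n", host_sample[i]);
    }

    /* Release device and host memory */
    cudaFree(device_test_object);
    cudaFree(device_sample);
    free(host_test_object);
    free(host_sample);
    return 0;
}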
Since you are using double, compile the code with the nvcc option -arch=sm_13 (double precision requires compute capability 1.3 or higher).
That works! Thanks for all the replies, and to Raghu for that code snippet, which is precisely what I needed.
If it is not too much trouble, may I ask what a good reference for these topics would be?
Thanks