Passing a structure with a pointer

How do you pass a structure with a pointer in it to a kernel?

Hello,

Having searched the forums and tried most of the solutions there, I am forced to ask this rather trivial question: how would you pass a structure as an argument to a kernel when the structure holds a pointer?

Suppose I have a simple structure such as:

struct test_pointer{

  double *sample;

};

I have tried

test_pointer *test_object;
test_pointer *device_test_object;

cudaMalloc( (void **)&(device_test_object->sample) , sizeof(double)*5 );

test_object = (test_pointer*) calloc( 1 , sizeof(double) );

test_object->sample[0] = 1.0;
test_object->sample[1] = 2.0;
test_object->sample[2] = 3.0;
test_object->sample[3] = 4.0;
test_object->sample[4] = 5.0;

cudaMemcpy( device_test_object , test_object , sizeof(double)*5, cudaMemcpyHostToDevice );

OR

cudaMemcpy( device_test_object->sample , test_object->sample , sizeof(double)*5, cudaMemcpyHostToDevice );

I am probably missing a few things here; any comments would be appreciated.

Thanks,

Sj

You have not done a cudaMalloc for the device_test_object structure. You can allocate memory for a structure containing a pointer as below:

test_pointer *test_object;

test_pointer *device_test_object;

STEP 1:

malloc for test_object

malloc for test_object->sample

fill up values for test_object->sample

STEP 2:

cudaMalloc for device_test_object

cudaMalloc for device_test_object->sample

cudaMemcpy from test_object->sample to device_test_object->sample

cudaMemcpy from test_object to device_test_object

Now you can use device_test_object in your kernel just as you would in an ordinary function.
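STEP 1 is ordinary host-side C. A minimal sketch of it, using the hypothetical helper names make_host_object and free_host_object: unlike the question's calloc( 1 , sizeof(double) ), it sizes the first allocation from the struct type and allocates sample before writing to it.

```c
#include <stdlib.h>

/* Same layout as in the question. */
struct test_pointer {
    double *sample;
};

/* Hypothetical helper: STEP 1 on the host.
   Allocates the struct, then the array it points to, then fills the array. */
struct test_pointer *make_host_object(size_t n)
{
    struct test_pointer *obj = malloc(sizeof *obj);   /* room for the struct itself */
    if (obj == NULL)
        return NULL;

    obj->sample = malloc(n * sizeof *obj->sample);     /* room for n doubles */
    if (obj->sample == NULL) {
        free(obj);
        return NULL;
    }

    for (size_t i = 0; i < n; i++)
        obj->sample[i] = (double)(i + 1);             /* 1.0, 2.0, ... as in the question */
    return obj;
}

/* Hypothetical helper: releases the array first, while obj is still valid. */
void free_host_object(struct test_pointer *obj)
{
    if (obj != NULL) {
        free(obj->sample);
        free(obj);
    }
}
```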

So you actually want to copy this structure to device memory, not pass it as a function argument?
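Passing the structure itself as an argument, rather than a pointer to it, works because arguments are copied by value: the callee gets its own copy of the struct, and the copied pointer still refers to the same buffer. A host-only C sketch of that behaviour (scale_samples is a hypothetical name, not part of the original code):

```c
#include <stddef.h>

struct test_pointer {
    double *sample;
};

/* Hypothetical function: receives a copy of the struct (pass by value).
   The copy's pointer still refers to the caller's buffer, so writes
   through it are visible to the caller. */
void scale_samples(struct test_pointer obj, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        obj.sample[i] *= factor;

    obj.sample = NULL;   /* changes only the local copy; the caller's struct is untouched */
}
```

CUDA kernel arguments are also passed by value, so a kernel declared as `__global__ void myKernel(test_pointer obj)` can be launched with a host-side struct directly, provided obj.sample already holds a device pointer obtained from cudaMalloc.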

Your underlying problems have nothing to do with CUDA or passing structures and everything to do with improper usage of pointers. I would strongly recommend spending some time with some learning material on pointers in C. While you are doing that, contemplate this:

double * sample;

sample = (double *) calloc( 5, sizeof(double) );

sample[0] = 1.0;  

sample[1] = 2.0;  

sample[2] = 3.0;

sample[3] = 4.0;  

sample[4] = 5.0;      

double * device_sample;

cudaMalloc( (void **)&(device_sample) , sizeof(double) * size_t(5) );

cudaMemcpy( device_sample , sample , sizeof(double) * size_t(5), cudaMemcpyHostToDevice );

test_pointer * test_object;

test_object = (test_pointer *) calloc( 1 , sizeof(test_pointer) );

test_object->sample = device_sample;

test_pointer * device_test_object;

cudaMalloc( (void **)&(device_test_object) , sizeof(test_pointer) );

cudaMemcpy( device_test_object , test_object , sizeof(test_pointer), cudaMemcpyHostToDevice );

which is probably much closer to what it seems you are trying to achieve.
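The order of operations above is what matters: first copy the data, then store the address of that copy in the struct, and only then copy the struct. A host-only analogue of the same sequence, in which malloc/memcpy stand in for cudaMalloc/cudaMemcpy (deep_copy is a hypothetical name):

```c
#include <stdlib.h>
#include <string.h>

struct test_pointer {
    double *sample;
};

/* Hypothetical host-only analogue of the cudaMalloc/cudaMemcpy sequence.
   Returns a new struct whose sample points to a fresh copy of the n values,
   or NULL on allocation failure. */
struct test_pointer *deep_copy(const struct test_pointer *src, size_t n)
{
    /* 1. allocate and fill the new data buffer (cf. device_sample) */
    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src->sample, n * sizeof *buf);

    /* 2. build a staging struct whose pointer refers to the new buffer
          (cf. test_object->sample = device_sample) */
    struct test_pointer staged = { buf };

    /* 3. allocate the destination struct and copy the staged struct into it
          (cf. device_test_object) */
    struct test_pointer *dst = malloc(sizeof *dst);
    if (dst == NULL) {
        free(buf);
        return NULL;
    }
    memcpy(dst, &staged, sizeof staged);
    return dst;
}
```

The copy owns its own buffer, so changing copy->sample[i] leaves the source untouched, just as the device array is independent of the host array in the CUDA version.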

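The last cudaMemcpy call in STEP 2 copies a structure that holds a host pointer into device memory. That will not work, because the kernel would then dereference a host address. In fact several stages in STEP 2 won't work: for instance, the cudaMalloc for device_test_object->sample and the cudaMemcpy to device_test_object->sample both access device_test_object, a device pointer, from host code.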
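The underlying reason is that copying a struct copies only the address stored in its pointer member, never the data behind it. A host-only C sketch of that shallow copy (shallow_copy is a hypothetical name); cudaMemcpy with sizeof(test_pointer) bytes behaves the same way, except that the destination is device memory, where the copied host address is in general not valid:

```c
#include <string.h>

struct test_pointer {
    double *sample;
};

/* Hypothetical function: byte-for-byte copy of the struct.
   Only the address in sample is copied, not the doubles it points to. */
struct test_pointer shallow_copy(const struct test_pointer *src)
{
    struct test_pointer dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

After the copy, both structs point at the same buffer, so a write through one is visible through the other.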

Suppose I cudaMalloc it as

cudaMalloc( (void **)&(device_test_object) , MEMSIZE );

what would MEMSIZE be for this structure object? Almost everything I tried leads to segmentation faults…

Thanks.

sizeof(test_pointer)
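sizeof(test_pointer) covers only the pointer member (plus any padding), no matter how many doubles sample will eventually point to; the array needs its own allocation of n * sizeof(double) bytes. A small C sketch with two hypothetical helpers making that split explicit:

```c
#include <stddef.h>

struct test_pointer {
    double *sample;
};

/* Hypothetical helper: bytes needed for the struct itself.
   On common platforms this equals sizeof(double *). */
size_t struct_bytes(void)
{
    return sizeof(struct test_pointer);
}

/* Hypothetical helper: bytes needed for the n doubles that sample points to. */
size_t sample_bytes(size_t n)
{
    return n * sizeof(double);
}
```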

Here is the complete code to allocate memory for a structure containing a pointer:

#include <stdio.h>
#include <stdlib.h>

#define MAXLIMIT 64

struct test_pointer {
    double *sample;
};

__global__ void myKernel(test_pointer *device_test_pointer)
{
    // Both device_test_pointer and its sample member point to device memory
    device_test_pointer->sample[threadIdx.x] = 10.0 + threadIdx.x;
}

int main()
{
    test_pointer *host_test_object;
    test_pointer *device_test_object;
    double *host_sample, *device_sample;

    host_sample = (double *) malloc(MAXLIMIT * sizeof(double));
    for (int i = 0; i < MAXLIMIT; i++)
    {
        host_sample[i] = (double) i;
    }

    // 1. Allocate the array on the device and copy the data into it
    cudaMalloc((void **)&device_sample, sizeof(double) * MAXLIMIT);
    cudaMemcpy(device_sample, host_sample, sizeof(double) * MAXLIMIT, cudaMemcpyHostToDevice);

    // 2. Build the struct on the host, storing the device pointer in it
    host_test_object = (test_pointer *) malloc(sizeof(test_pointer));
    host_test_object->sample = device_sample;

    // 3. Copy the struct itself to the device
    cudaMalloc((void **)&device_test_object, sizeof(test_pointer));
    cudaMemcpy(device_test_object, host_test_object, sizeof(test_pointer), cudaMemcpyHostToDevice);

    myKernel<<<1, MAXLIMIT>>>(device_test_object);
    cudaDeviceSynchronize();   // wait for the kernel to finish

    // Read the results back through the device pointer kept on the host
    cudaMemcpy(host_sample, device_sample, sizeof(double) * MAXLIMIT, cudaMemcpyDeviceToHost);

    for (int i = 0; i < MAXLIMIT; i++)
    {
        printf("%lf\n", host_sample[i]);
    }

    cudaFree(device_test_object);
    cudaFree(device_sample);
    free(host_test_object);
    free(host_sample);
    return 0;
}

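Since you are using double, compile the code with the nvcc option -arch=sm_13, because double precision requires compute capability 1.3 or higher. (Current toolkits no longer support sm_13; every architecture they do support has double precision, so the flag is not needed there.)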

That works! Thanks for all the replies, and to Raghu for that code snippet; that is precisely what I needed…

If it is not too much trouble, may I ask what a good reference for these topics would be?

Thanks