Struct in CUDA: can I use this struct in CUDA?

Hi everybody.

I have a problem, please help me.

I have this “struct”, and it works perfectly in C++, but I don’t know whether it will work in CUDA.

struct Mystruct
{
  int n;
  double f;

public:
  Mystruct(const double& d)
  {
    double dn;
    f = modf(d, &dn);
    n = static_cast<int>(dn);
  }

  Mystruct(const int& n, const double& f)
    : n(n), f(f) {}
};

Thank you very much.

I was trying to use this struct but I could not, so I think that CUDA does not support this kind of thing.

We don’t officially support C++ in CUDA kernel code. Some stuff works (e.g. templates), but we don’t recommend using these features in production code.

Thanks, Simon Green :yes:

CUDA kernel code only supports the C language.

That struct should work fine. What do you mean, more specifically, by ‘it doesn’t work’?
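For completeness: the struct as posted is fine as plain data, but constructing one *inside* a kernel would need CUDA qualifiers on the constructors. A sketch of what that could look like; the `#ifndef __CUDACC__` stub is my own addition so the same code also builds (and can be checked) as plain host C++:

```cpp
#include <cmath>

// When not compiling with nvcc, stub out the CUDA qualifiers so the
// same source still builds as plain C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

struct Mystruct
{
  int n;
  double f;

public:
  // __host__ __device__ lets the constructor run in both host code
  // and kernel code.
  __host__ __device__ Mystruct(const double& d)
  {
    double dn;
    f = modf(d, &dn);          // fractional part into f, integral into dn
    n = static_cast<int>(dn);
  }

  __host__ __device__ Mystruct(const int& n, const double& f)
    : n(n), f(f) {}
};
```

On the host, `Mystruct m(3.25)` gives `m.n == 3` and `m.f == 0.25`, same as the original C++ version.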

Hi,

I face a similar problem but with C.

I have written codes in C + CUDA with passing arrays between host and device.

Now I want to use “struct” as the data structure but it is giving me a lot of errors.

E.g. if M is a struct and I use commands like (just for instance):

CUDA_SAFE_CALL(cudaMalloc((void**)&M, structsize));
CUDA_SAFE_CALL(cudaMemset(M, 1, structsize));
CUDA_SAFE_CALL(cudaMemcpy(Md, Mh, structsize, cudaMemcpyHostToDevice));
CUDA_SAFE_CALL(cudaFree(M));

etc.

it gives me errors in the places where I try to access the elements of the struct on the device, and where I declare M to be a struct.

If M is a pointer to a struct, it gives me many more compilation errors… many of them hint that CUDA is expecting some “C++ class” instead of a struct. Whatever I do, the problem is not solved, and I am not very familiar with C++ either.

No amount of ordinary C “struct” handling resolves this. Can someone please post an example of a struct being passed between the device and the host, allocation on host and device, etc.?

Thanks a lot.

Aditi

Does this help?

[codebox]#include <stdio.h>

struct SWhatever
{
    float a;
    unsigned b;

    __device__ float Get() { return a * b; }
};

__global__ void test(SWhatever *data, float *result)
{
    result[threadIdx.x] = data[threadIdx.x].Get();
}

int main()
{
    const unsigned N = 128;

    SWhatever *host_data = new SWhatever[N];
    for (unsigned i = 0; i < N; ++i)
    {
        host_data[i].a = i;
        host_data[i].b = i * i;
    }

    SWhatever *device_data;
    cudaMalloc((void**)&device_data, N * sizeof(SWhatever));
    cudaMemcpy(device_data, host_data, N * sizeof(SWhatever), cudaMemcpyHostToDevice);

    float *device_result;
    cudaMalloc((void**)&device_result, N * sizeof(float));

    test<<<1, N>>>(device_data, device_result);

    float *host_result = new float[N];
    cudaMemcpy(host_result, device_result, N * sizeof(float), cudaMemcpyDeviceToHost);

    for (unsigned i = 0; i < N; ++i)
    {
        if (i * i * i != host_result[i])
        {
            printf("Error at index %u. Should be %f but is %f.\n", i, 1.f * i * i * i, host_result[i]);
        }
    }

    cudaFree(device_result);
    cudaFree(device_data);
    delete[] host_data;     // new[] allocations need delete[]
    delete[] host_result;
}[/codebox]

Hi,

Thanks for your reply. My code scheme is not very different from yours, but it still gives me the error. The only difference is that I have a dynamic array of integers within the struct.

I am pasting my code here. It is a simple 4x4 matrix multiplication code (embarrassingly parallel) that I am using to experiment with using structs with CUDA. After the code, I have pasted the error messages that I get when I compile it.

Please note one thing: in transferring the data between device and host (cudaMemcpy), if I use “matrixsize (i.e. the size of the array within the struct)”, there are no errors for lines 261, 266, 271 and 292. But if I use “sizeof(Matrix)” as in the version below, those lines give the following error: “error: incomplete type is not allowed”.

I am totally perplexed. Will be thankful if someone can look into it and point out the possible reasons of errors.


Hi again,

I just ran through a few of your compiler issues and none of them are CUDA related. Without sounding too much like a jerk, I would recommend that you work on your C/C++ skills a bit more before going on, especially w.r.t. pointer usage. If concepts like that are giving you trouble, then working with CUDA is not going to be a pleasant experience. Grab a book, make sure you understand the language, and you’re going to be fine.

This code snippet is great! Thank you so much for sharing.

I just have one question, are you allowed to use pointers in the struct?

Would something like this be legal:

struct Vertices {
    int *a;  // set to point to arrays in main
    int *b;

    __device__ int Geta(int i) { return a[i]; }
    __device__ int Getb(int i) { return b[i]; }
};

__global__ void test(Vertices *data, int *result)
{
    int idx = threadIdx.x;
    result[threadIdx.x] = data->Geta(idx) + data->Getb(idx);
}

I can call Geta/Getb with no problem from the kernel, except that when a and b are pointers, Geta and Getb just return 0.

Is the kernel not able to work with pointers?

Are you sure you’re initializing your Vertex objects using DEVICE pointers?
It’s a really common mistake to accidentally use host pointers when doing manual structure initialization on the host.

You are right, and I was making that mistake.

How I got around it was to copy the arrays a and b to device pointers d_a and d_b with the usual cudaMemcpy, then call a kernel function with Vertices *data, d_a and d_b, and set data->a = d_a and data->b = d_b inside it.

This works fine now, but seems a bit ad-hoc. Is this the correct way of doing it, or there are better ways?

Thank you!
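A common alternative (just a sketch; `host_a`, `host_b` and `N` stand in for whatever your real host arrays and length are) is to fill the device pointers into a host-side copy of the struct and then cudaMemcpy the whole struct, which avoids the extra pointer-patching kernel:

```cuda
Vertices h_v;                     // host-side staging copy of the struct
cudaMalloc((void**)&h_v.a, N * sizeof(int));
cudaMalloc((void**)&h_v.b, N * sizeof(int));
cudaMemcpy(h_v.a, host_a, N * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(h_v.b, host_b, N * sizeof(int), cudaMemcpyHostToDevice);

Vertices *d_v;                    // device copy of the struct itself
cudaMalloc((void**)&d_v, sizeof(Vertices));
cudaMemcpy(d_v, &h_v, sizeof(Vertices), cudaMemcpyHostToDevice);
// d_v->a and d_v->b now hold device addresses, so the kernel can
// dereference them safely.
```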

Generally, the approach is to avoid more than one level of pointer indirection if at all possible. This usually has the side-effect of forcing you to structure your data in a flatter way that allows for more memory coalescing.
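For the Vertices example above, one way to flatten it (a sketch, not the only option) is to pass the arrays to the kernel directly rather than through a struct of pointers:

```cuda
// Each thread reads from two flat device arrays. With no struct and no
// pointer chasing, adjacent threads touch adjacent elements, so the
// loads coalesce.
__global__ void test_flat(const int *a, const int *b, int *result)
{
    int idx = threadIdx.x;
    result[idx] = a[idx] + b[idx];
}
```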

I have also used a struct in CUDA and it worked fine.

What I did was create the structure on the host, allocate memory on the device, copy the structure from host to device, and use it as a parameter at kernel launch.

One more thing: you have used double as a data member in the structure, which is not supported by all devices, so use float in place of double.
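You can also check at runtime whether a device supports double precision; it needs compute capability 1.3 or later (and compiling with -arch=sm_13 or later). A sketch:

```cuda
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);   // query device 0
// Double precision requires compute capability >= 1.3.
bool has_double = (prop.major > 1) ||
                  (prop.major == 1 && prop.minor >= 3);
```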

This might sound obvious, but have you checked that you don’t have circular references and are trying to calculate sizeof(Matrix) before the proper header is actually #included ?

Is that because templates are not optimized as well?