I am trying to get the following code segment to create an array of CUDA arrays on the host. In device emulation mode the code works as expected. When I tried this approach on the card, it first gave me a warning that it could not determine the object type and was assuming a global variable. When I run the program, all of the resulting matrix calculations come out as 0. Does anyone know what could cause this problem and how to fix it?
…
float **d_input = (float **)malloc(MatrixLength * sizeof(float *));
int i;
Are you passing d_input to the kernel as an array of pointers? Since d_input is a pointer itself, it is in host memory. You need a device-memory copy of it, say d_d_input (probably pick a better name), and you need to copy the pointers in d_input into it.
About the warning messages, there doesn’t seem to be a way to suppress them or to tell the compiler that the pointer does point to global memory.
I have tried to create the pointer array and then copy it to a device variable, but I have not been able to get it to work. Would you be willing to show me the syntax for creating the pointer array on the device?
No error checking, and I probably made typos, but here you go.
float **h_array_list, **d_array_list;
// allocate array lists
h_array_list = (float **)malloc(num_arrays * sizeof(float *));
cudaMalloc((void**)&d_array_list, num_arrays * sizeof(float *));
// allocate arrays on the device
for (int i = 0; i < num_arrays; i++)
    cudaMalloc((void **)&h_array_list[i], data_size);
// copy array list to the device
cudaMemcpy(d_array_list, h_array_list, num_arrays * sizeof(float*), cudaMemcpyHostToDevice);
// allocate array list on the host
float **array_list;
array_list = (float**)malloc(num_arrays * sizeof(float*));
// allocate arrays on the host
for (int i = 0; i < num_arrays; i++)
    array_list[i] = (float *)malloc(data_size);
// ****fill out data here
// populate data arrays on the device
for (int i = 0; i < num_arrays; i++)
    cudaMemcpy(h_array_list[i], array_list[i], data_size, cudaMemcpyHostToDevice);
As you can see, it is quite a bit of a pain to do this. If at all possible, use 2D structures in memory allocated by cudaMallocPitch. I do use the nasty ** construction at one point in my code where a kernel sums up a variable number of arrays, but I only do so because the memory for each array is allocated separately by different classes. Thinking about it now, I should have written a little memory manager class that would dole out rows of a 2D memory area so that I wouldn’t have to do the ugly ** mess.
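For what it's worth, here is a minimal sketch of what such a summing kernel can look like when it consumes the device-side pointer list from the code above. The kernel name, signature, and launch configuration are my own assumptions for illustration, not the code I actually use:

// Hypothetical kernel: element-wise sum of num_arrays device arrays,
// each holding num_elements floats, into d_out.
__global__ void sum_arrays(float **d_array_list, int num_arrays, int num_elements, float *d_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_elements)
        return;

    float acc = 0.0f;
    for (int j = 0; j < num_arrays; j++)
        acc += d_array_list[j][i];   // each entry is a pointer to device memory

    d_out[i] = acc;
}

// launch, with num_elements = data_size / sizeof(float):
// sum_arrays<<<(num_elements + 255) / 256, 256>>>(d_array_list, num_arrays, num_elements, d_out);

The important point is that the kernel receives d_array_list (the device copy of the pointer list), not h_array_list.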
Thanks, it is finally working. I was also wondering if you might be able to help me with the 2D memory approach to eliminate all of the pointers. When I tried to switch the code to 2D arrays, the resulting output was incorrect. Is there a good example of using 2D arrays that I could look at to see where I am going wrong?
Just allocate the multiple arrays as one 2D allocation using cudaMallocPitch. Then you can access element i of array j with "array_data[j*width + i]", where width is the pitch returned by cudaMallocPitch divided by the size of an element of array_data (because the pitch is returned in bytes).
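In rough outline, and only as a sketch (num_arrays, num_elements, and the variable names are my assumptions, matching the earlier code):

float *array_data;
size_t pitch_bytes;

// one 2D allocation: num_arrays rows, each num_elements floats wide
cudaMallocPitch((void **)&array_data, &pitch_bytes, num_elements * sizeof(float), num_arrays);

// width in elements, since the pitch comes back in bytes
int width = pitch_bytes / sizeof(float);

// copy each host array into its row (array_list[j] are the host arrays from earlier)
for (int j = 0; j < num_arrays; j++)
    cudaMemcpy(array_data + j * width, array_list[j], num_elements * sizeof(float), cudaMemcpyHostToDevice);

// in a kernel, element i of array j is then array_data[j * width + i]

Passing a single pointer plus the width to the kernel replaces the whole pointer-list dance above.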