Problems with creating an array of CUDA pointers

captainp · February 28, 2008, 5:45pm

I am trying to get the following code segment to create an array of CUDA arrays on the host. In device emulation mode the code works as expected. When I tried this approach on the card, the compiler first gave me a warning that it could not determine the object type and was assuming a global variable. When I run the program, all of the resulting matrix calculations are 0. Does anyone know what could cause this problem and how to fix it?

…
float **d_input = malloc(MatrixLength * sizeof(float));
int i;

for (i = 0; i < NumberM; i++) {
    CUDA_SAFE_CALL(cudaMalloc((void**) &d_input[i], MatrixLength * MatrixWidth * sizeof(float)));
}

for (i = 0; i < NumberM; i++) {
    CUDA_SAFE_CALL(cudaMemcpy(d_input[i], h_input[i], MatrixLength * MatrixWidth * sizeof(float),
                              cudaMemcpyHostToDevice));
}

…

MisterAnderson42 · February 28, 2008, 6:31pm

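Are you passing d_input to the kernel as an array of pointers? The array d_input itself was allocated with malloc, so it lives in host memory and the kernel cannot dereference it, even though the pointers stored in it refer to device memory. You need a second pointer array in device memory, d_d_input (probably pick a better name), and you need to copy the pointers in d_input to it.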

About the warning messages, there doesn’t seem to be a way to suppress them or to tell the compiler that the pointer does point to global memory.

captainp · March 3, 2008, 8:01pm

I have tried to create the pointer array and then copy it to the device, but have not been able to get it to work. Would you be willing to show me the syntax for making the pointer array on the device?

MisterAnderson42 · March 3, 2008, 9:17pm

No error checking, and I probably made typos, but here you go.

float **h_array_list, **d_array_list;

// allocate array lists: h_array_list is a host array that will hold device
// pointers, d_array_list is the copy of that array in device memory
h_array_list = (float**)malloc(num_arrays * sizeof(float *));
cudaMalloc((void**)&d_array_list, num_arrays * sizeof(float *));

// allocate arrays on the device
for (int i = 0; i < num_arrays; i++)
   cudaMalloc((void**)&h_array_list[i], data_size);

// copy array list to the device
cudaMemcpy(d_array_list, h_array_list, num_arrays * sizeof(float*), cudaMemcpyHostToDevice);

// allocate array list on the host
float **array_list;
array_list = (float**)malloc(num_arrays * sizeof(float*));

// allocate arrays on the host
for (int i = 0; i < num_arrays; i++)
   array_list[i] = (float*)malloc(data_size);

// ****fill out data here

// populate data arrays on the device (destination first: device <- host)
for (int i = 0; i < num_arrays; i++)
   cudaMemcpy(h_array_list[i], array_list[i], data_size, cudaMemcpyHostToDevice);

As you can see, it is quite a bit of a pain to do this. If at all possible, use 2D structures in memory allocated by cudaMallocPitch. I do use the nasty ** construction at one point in my code where a kernel sums up a variable number of arrays, but I only do so because the memory for each array is allocated separately by different classes. Thinking about it now, I should have written a little memory manager class that would dole out rows of a 2D memory area so that I wouldn’t have to do the ugly ** mess.
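For illustration, here is a minimal sketch of a kernel that sums a variable number of separately allocated arrays through such a pointer list. The kernel name sum_arrays, the sizes, and the test data are made up for the example, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = sum over all arrays a of arrays[a][i].
__global__ void sum_arrays(float **arrays, int num_arrays, int n, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = 0.0f;
        for (int a = 0; a < num_arrays; a++)
            acc += arrays[a][i];   // the pointer table itself is in device memory
        out[i] = acc;
    }
}

int main()
{
    const int num_arrays = 3, n = 4;   // made-up sizes
    const size_t data_size = n * sizeof(float);

    // Host-side table of device pointers, plus its device-side copy.
    float **h_array_list = (float **)malloc(num_arrays * sizeof(float *));
    float **d_array_list;
    cudaMalloc((void **)&d_array_list, num_arrays * sizeof(float *));

    float h_data[n];
    for (int a = 0; a < num_arrays; a++) {
        for (int i = 0; i < n; i++) h_data[i] = (float)(a + 1);  // array a holds a+1
        cudaMalloc((void **)&h_array_list[a], data_size);
        cudaMemcpy(h_array_list[a], h_data, data_size, cudaMemcpyHostToDevice);
    }
    cudaMemcpy(d_array_list, h_array_list, num_arrays * sizeof(float *),
               cudaMemcpyHostToDevice);

    float *d_out;
    cudaMalloc((void **)&d_out, data_size);
    sum_arrays<<<1, 32>>>(d_array_list, num_arrays, n, d_out);

    float h_out[n];
    cudaMemcpy(h_out, d_out, data_size, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++)
        printf("out[%d] = %f\n", i, h_out[i]);   // each element is 1 + 2 + 3 = 6

    for (int a = 0; a < num_arrays; a++) cudaFree(h_array_list[a]);
    cudaFree(d_array_list);
    cudaFree(d_out);
    free(h_array_list);
    return 0;
}
```

Note that the kernel receives d_array_list, the device copy of the table, and never h_array_list.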

captainp · March 6, 2008, 6:42pm

Thanks, it is finally working. I was also wondering if you could help me with the 2D memory approach to eliminate all of the pointers. When I tried to switch the code to 2D arrays, the resulting output was incorrect. Is there a good example of using 2D arrays that I could look at to see where I am going wrong?

MisterAnderson42 · March 6, 2008, 7:46pm

Just allocate the multiple arrays as a single block using cudaMallocPitch. Then you can access element i of array j with “array_data[j*width + i]”, where width is the pitch returned from cudaMallocPitch divided by the size of an element of array_data (because the pitch is returned in bytes).
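As a rough sketch of that approach, the example below allocates three arrays of five floats as one pitched block, fills them in a kernel, and copies them back with cudaMemcpy2D. The kernel name fill and the sizes are invented for the example, and error checking is left out. The row address is computed with a byte offset, which is equivalent to the j*width + i form whenever the pitch is a multiple of the element size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Row j of the pitched allocation starts j * pitch BYTES after the base.
__global__ void fill(float *data, size_t pitch, int width, int height)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // element within an array
    int j = blockIdx.y;                             // which array (row)
    if (i < width && j < height) {
        float *row = (float *)((char *)data + j * pitch);
        row[i] = 10.0f * j + i;
    }
}

int main()
{
    const int width = 5, height = 3;   // 3 arrays of 5 floats (made-up sizes)
    float *d_data;
    size_t pitch;                      // in bytes, chosen by the runtime
    cudaMallocPitch((void **)&d_data, &pitch, width * sizeof(float), height);

    dim3 block(32);
    dim3 grid((width + 31) / 32, height);
    fill<<<grid, block>>>(d_data, pitch, width, height);

    // cudaMemcpy2D skips the row padding, so the host copy is tightly packed.
    float h_data[height][width];
    cudaMemcpy2D(h_data, width * sizeof(float), d_data, pitch,
                 width * sizeof(float), height, cudaMemcpyDeviceToHost);

    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++)
            printf("%5.1f ", h_data[j][i]);
        printf("\n");
    }
    cudaFree(d_data);
    return 0;
}
```

A common source of wrong output with pitched memory is mixing up the two widths: the pitch in bytes addresses rows on the device, while the logical width in bytes is what gets copied per row.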

edisongustavo · May 21, 2008, 7:00pm

Hi, I have the same problem. I tried modifying your code MisterAnderson to work with a kernel, but I can’t make it work.

When I read back the result of the kernel I’m getting random values, which means either:

1. I'm not writing the values correctly inside the kernel, or

2. I'm not getting the results from the kernel correctly

The modified code is compiling, and I have 10 arrays with 2 elements each (each element is a float).

The code is this:

#include <stdio.h>

__global__ void kernel(float ** input, int num_arrays, int num_sub_arrays) {
    const int index = blockIdx.x * blockDim.x + threadIdx.x;

    if (index < num_arrays) {
        for (int i = 0; i < num_sub_arrays; i++) {
            input[index][i] = 10 + i;
        }
    }
}

int main() {
    int num_arrays = 10;
    int num_sub_arrays = 2;
    int data_size = num_sub_arrays * sizeof(float);

    float **h_array_list, **d_array_list;

    // allocate array lists
    h_array_list = (float**)malloc(num_arrays * sizeof(float *));
    cudaMalloc((void**)&d_array_list, num_arrays * sizeof(float *));

    // allocate arrays on the device
    for (int i = 0; i < num_arrays; i++)
        cudaMalloc((void**)&h_array_list[i], data_size);

    // copy array list to the device
    cudaMemcpy(d_array_list, h_array_list, num_arrays * sizeof(float*), cudaMemcpyHostToDevice);

    // allocate array list on the host
    float **array_list;
    array_list = (float**)malloc(num_arrays * sizeof(float*));

    // allocate arrays on the host
    for (int i = 0; i < num_arrays; i++)
        array_list[i] = (float*)malloc(data_size);

    // ****fill out data here

    dim3 grid(num_arrays,num_sub_arrays);
    dim3 block(8,8);
    kernel<<<grid, block>>>(d_array_list, num_arrays, num_sub_arrays);

    // populate data arrays on the device
    for (int i = 0; i < num_arrays; i++)
        cudaMemcpy(h_array_list[i], array_list[i], data_size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < num_arrays; i++) {
        float * listSubArrays = array_list[i];
        printf("i = %i\n", i);
        for (int j = 0; j < num_sub_arrays; j++) {
            float theFloat = listSubArrays[j];
            printf("\t myFloat = %f\n", theFloat);
        }
    }
}
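One likely cause of the random values in the listing above is the copy-back loop. cudaMemcpy takes the destination first, so for cudaMemcpyDeviceToHost the destination should be the host buffer array_list[i] and the source the device pointer h_array_list[i]. With the two swapped, the host buffers never receive the kernel output and still hold whatever malloc returned. The sketch below is the same program with that one call corrected, a 1D launch, and a hypothetical CHECK macro (not part of the original code) so that a failing call is reported instead of being silently ignored.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void kernel(float **input, int num_arrays, int num_sub_arrays)
{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < num_arrays)
        for (int i = 0; i < num_sub_arrays; i++)
            input[index][i] = 10 + i;
}

// Hypothetical helper: print the error string and exit on any runtime error.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); \
    exit(1); } } while (0)

int main()
{
    const int num_arrays = 10, num_sub_arrays = 2;
    const size_t data_size = num_sub_arrays * sizeof(float);

    // Host table of device pointers and its copy in device memory.
    float **h_array_list = (float **)malloc(num_arrays * sizeof(float *));
    float **d_array_list;
    CHECK(cudaMalloc((void **)&d_array_list, num_arrays * sizeof(float *)));
    for (int i = 0; i < num_arrays; i++)
        CHECK(cudaMalloc((void **)&h_array_list[i], data_size));
    CHECK(cudaMemcpy(d_array_list, h_array_list, num_arrays * sizeof(float *),
                     cudaMemcpyHostToDevice));

    // Host buffers that receive the results.
    float **array_list = (float **)malloc(num_arrays * sizeof(float *));
    for (int i = 0; i < num_arrays; i++)
        array_list[i] = (float *)malloc(data_size);

    // One thread per array is enough for this kernel.
    kernel<<<1, 32>>>(d_array_list, num_arrays, num_sub_arrays);
    CHECK(cudaGetLastError());

    // Destination (host) first, source (device) second.
    for (int i = 0; i < num_arrays; i++)
        CHECK(cudaMemcpy(array_list[i], h_array_list[i], data_size,
                         cudaMemcpyDeviceToHost));

    for (int i = 0; i < num_arrays; i++)
        for (int j = 0; j < num_sub_arrays; j++)
            printf("array %d, element %d = %f\n", i, j, array_list[i][j]);
    return 0;
}
```

With the copy direction fixed, every array should read back as 10.0 followed by 11.0.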

schildirulez · April 20, 2009, 2:27pm

I have the same problem. Does someone know a solution for this?

Or is there a better way to do it? I need up to 20 pointers (the count can be hard coded) that point to data fields of variable length.

Thanks.
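When the number of pointers has a small, fixed upper bound, one possible alternative is to put the pointers in a struct and pass that struct to the kernel by value, so that no separate device-side pointer array is needed. The sketch below assumes the struct fits within the kernel parameter size limit (a few kilobytes on current GPUs, considerably less on the earliest ones). The names FieldTable and scale_fields and the sizes are invented for the example, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_FIELDS 20   // hard-coded upper bound on the number of pointers

// Hypothetical struct: passed to the kernel by value, so the pointer table
// travels with the launch and needs no cudaMalloc/cudaMemcpy of its own.
struct FieldTable {
    float *data[MAX_FIELDS];    // device pointers
    int    length[MAX_FIELDS];  // number of floats in each field
    int    count;               // how many entries are in use
};

// One block per field; the threads of a block stride over that field.
__global__ void scale_fields(FieldTable t, float factor)
{
    int f = blockIdx.x;
    if (f < t.count)
        for (int i = threadIdx.x; i < t.length[f]; i += blockDim.x)
            t.data[f][i] *= factor;
}

int main()
{
    FieldTable t = {};
    t.count = 3;

    float ones[8];
    for (int i = 0; i < 8; i++) ones[i] = 1.0f;

    for (int f = 0; f < t.count; f++) {
        t.length[f] = 2 * (f + 1);   // lengths 2, 4, 6
        cudaMalloc((void **)&t.data[f], t.length[f] * sizeof(float));
        cudaMemcpy(t.data[f], ones, t.length[f] * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    scale_fields<<<t.count, 64>>>(t, 3.0f);

    for (int f = 0; f < t.count; f++) {
        float out[8];
        cudaMemcpy(out, t.data[f], t.length[f] * sizeof(float),
                   cudaMemcpyDeviceToHost);
        printf("field %d: length %d, last element %f\n",
               f, t.length[f], out[t.length[f] - 1]);
        cudaFree(t.data[f]);
    }
    return 0;
}
```

The trade-off is that the table is copied on every launch and its maximum size is fixed at compile time, which seems acceptable for around 20 entries.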
