Array addressing problem: error occurs when trying to access the elements of an array of 2D arrays

rudolph.gatt · March 22, 2010, 9:08pm

I need to hold references to multiple 2D arrays. I store a single 2D array as float* Filtered and access it as Filtered[(y*WIDTH)+x]. Now I want to pass multiple 2D arrays to a CUDA kernel. I tried the following:

Allocation on device:
float* d_allFiltered[2][15];
for (int i = 0; i < 2; i++)
	for (int j = 0; j < 15; j++)
		cudaMalloc((void**)&d_allFiltered[i][j], energySizex * energySizey * sizeof(float));

Calling kernel (passing 15 elements):
ComputeSum<<< dimGrid, dimBlock, 0 >>>(d_allFiltered[image], d_SUM1mapAll, sizex, sizey);

Kernel:
__global__ void ComputeSum(float** Filtered, float* SUM1mapAll, unsigned int sizex, unsigned int sizey)
{
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

	if ((x < sizex) && (y < sizey))
	{
		SUM1mapAll[(y * sizex) + x] = 0;
		for (int i = 0; i < 15; i++)
			SUM1mapAll[(y * sizex) + x] += Filtered[i][(y * sizex) + x];
	}
}

When a thread tries to access “Filtered[i][(y * sizex) + x]”, an error is triggered. Can someone either tell me what I am doing wrong, or suggest a better way to keep references to multiple 2D arrays?
Thanks

LSChien · March 23, 2010, 12:54am

try

float *d_allFiltered[2][15];
for (int i = 0; i < 2; i++)
	for (int j = 0; j < 15; j++)
		cudaMalloc((void**)&d_allFiltered[i][j], energySizex * energySizey * sizeof(float));

// copy the host table of device pointers into device memory
float **d_pointerArray;
cudaMalloc((void**)&d_pointerArray, 2*15*sizeof(float*) );
cudaMemcpy( d_pointerArray, d_allFiltered, 2*15*sizeof(float*), cudaMemcpyHostToDevice);

// pass the device address of this image's 15 pointers (kernel parameter must be float**);
// d_pointerArray must not be dereferenced on the host
ComputeSum<<< dimGrid, dimBlock, 0 >>>(d_pointerArray + image*15, d_SUM1mapAll, sizex, sizey);
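
For anyone following along, the pointer-table idea above can be put together end to end roughly as follows. This is only a sketch under assumptions: the kernel parameter is taken to be `float** Filtered`, `NUM_FILTERS` and the host helper `sumPlanes` are hypothetical names that do not appear in the thread, the 16x16 block size is an arbitrary choice, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NUM_FILTERS 15  // planes per image, as in the thread (name is hypothetical)

// Filtered points to a table of NUM_FILTERS device pointers that itself
// lives in device memory, so Filtered[i] is a valid read inside the kernel.
__global__ void ComputeSum(float** Filtered, float* SUM1mapAll,
                           unsigned int sizex, unsigned int sizey)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < sizex && y < sizey) {
        float sum = 0.0f;
        for (int i = 0; i < NUM_FILTERS; i++)
            sum += Filtered[i][y * sizex + x];
        SUM1mapAll[y * sizex + x] = sum;
    }
}

// Hypothetical host helper: uploads NUM_FILTERS planes, sums them on the
// device, and returns the per-pixel sums. Error checks omitted for brevity.
std::vector<float> sumPlanes(const std::vector<std::vector<float> >& planes,
                             unsigned int sizex, unsigned int sizey)
{
    size_t bytes = (size_t)sizex * sizey * sizeof(float);

    // Host array whose elements are device addresses.
    float* h_ptrs[NUM_FILTERS];
    for (int j = 0; j < NUM_FILTERS; j++) {
        cudaMalloc((void**)&h_ptrs[j], bytes);
        cudaMemcpy(h_ptrs[j], planes[j].data(), bytes, cudaMemcpyHostToDevice);
    }

    // The same table, copied into device memory so the kernel can read it.
    float** d_ptrs;
    cudaMalloc((void**)&d_ptrs, NUM_FILTERS * sizeof(float*));
    cudaMemcpy(d_ptrs, h_ptrs, NUM_FILTERS * sizeof(float*), cudaMemcpyHostToDevice);

    float* d_sum;
    cudaMalloc((void**)&d_sum, bytes);

    dim3 block(16, 16);
    dim3 grid((sizex + block.x - 1) / block.x, (sizey + block.y - 1) / block.y);
    ComputeSum<<<grid, block>>>(d_ptrs, d_sum, sizex, sizey);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> result((size_t)sizex * sizey);
    cudaMemcpy(result.data(), d_sum, bytes, cudaMemcpyDeviceToHost);

    for (int j = 0; j < NUM_FILTERS; j++) cudaFree(h_ptrs[j]);
    cudaFree(d_ptrs);
    cudaFree(d_sum);
    return result;
}
```

The point of the second cudaMalloc/cudaMemcpy pair is that the kernel receives an address in device memory for the table itself, not just for the individual planes.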

rudolph.gatt · March 23, 2010, 9:33am

Passing a single array works:

ComputeSum<<< dimGrid, dimBlock, 0 >>>(d_allFiltered[i][j], d_SUM1mapAll, sizex, sizey);

But I get an error when I try to pass a number of arrays:

ComputeSum<<< dimGrid, dimBlock, 0 >>>(d_allFiltered[i], d_SUM1mapAll, sizex, sizey);

LSChien · March 23, 2010, 4:47pm

d_allFiltered[i][j] in

ComputeSum<<< dimGrid, dimBlock, 0 >>>(d_allFiltered[i][j], d_SUM1mapAll, sizex, sizey);

holds an address in device memory, i.e. a device pointer. d_allFiltered[i][j] itself is a variable in host memory, but its value is copied to the device as a kernel argument.

So it works.

However, d_allFiltered[i] in

ComputeSum<<< dimGrid, dimBlock, 0 >>>(d_allFiltered[i], d_SUM1mapAll, sizex, sizey);

is a pointer array in host memory:

d_allFiltered[i][0], d_allFiltered[i][1], … d_allFiltered[i][14]

is a contiguous array, and each element is an address in device memory.

When d_allFiltered[i] is passed, only the host address of that array is copied as the kernel argument; the 15 pointers it refers to stay in host memory. So inside the kernel Filtered[1] is not d_allFiltered[i][1]: reading it means dereferencing a host address, which device code cannot normally access.

rudolph.gatt · March 24, 2010, 3:13pm

Thanks, managed to allocate a single big array instead of using pointer-to-pointers.
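
The single-allocation route might look something like the sketch below. It is not the code actually used in this thread, just an illustration under assumptions: `ComputeSumFlat`, `sumPlanesFlat`, and `numPlanes` are hypothetical names, plane i is assumed to start at offset i * sizex * sizey in one contiguous buffer, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// All planes live in one device buffer, so the kernel needs only a single
// device pointer and computes each plane's offset itself.
__global__ void ComputeSumFlat(const float* allFiltered, float* SUM1mapAll,
                               unsigned int numPlanes,
                               unsigned int sizex, unsigned int sizey)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < sizex && y < sizey) {
        size_t pixel = (size_t)y * sizex + x;
        size_t planeSize = (size_t)sizex * sizey;
        float sum = 0.0f;
        for (unsigned int i = 0; i < numPlanes; i++)
            sum += allFiltered[i * planeSize + pixel];
        SUM1mapAll[pixel] = sum;
    }
}

// Hypothetical host helper: 'planes' holds numPlanes * sizex * sizey floats,
// plane after plane. Returns the per-pixel sums. Error checks omitted.
std::vector<float> sumPlanesFlat(const std::vector<float>& planes,
                                 unsigned int numPlanes,
                                 unsigned int sizex, unsigned int sizey)
{
    size_t planeBytes = (size_t)sizex * sizey * sizeof(float);

    float* d_all;   // one allocation for every plane
    cudaMalloc((void**)&d_all, numPlanes * planeBytes);
    cudaMemcpy(d_all, planes.data(), numPlanes * planeBytes, cudaMemcpyHostToDevice);

    float* d_sum;
    cudaMalloc((void**)&d_sum, planeBytes);

    dim3 block(16, 16);
    dim3 grid((sizex + block.x - 1) / block.x, (sizey + block.y - 1) / block.y);
    ComputeSumFlat<<<grid, block>>>(d_all, d_sum, numPlanes, sizex, sizey);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> result((size_t)sizex * sizey);
    cudaMemcpy(result.data(), d_sum, planeBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_all);
    cudaFree(d_sum);
    return result;
}
```

With this layout there is no table of pointers to copy to the device, and several images can share the buffer by adding a further offset of image * numPlanes * sizex * sizey.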