How to deal with a 2D array in a kernel? Error: "invalid configuration arguments"

I’ve been struggling lately with assigning and retrieving 2D arrays between CPU and GPU.

With lots of help from ‘cudesnick’ (thank you!), I now know how to manipulate multi-dimensional arrays on the GPU.

With that, my next test is the following: I’d like to (sort of) initialize a 2D array using a kernel.

Assume that I know how to retrieve a 2D array from the device (please read through my other conversations, especially those with ‘cudesnick’).

I did the following and got “Cuda error: kernel invocation: invalid configuration arguments.”

Can I not pass **c directly to a kernel?

Many thanks in advance again.

=============================

__global__ void MatAdd(int **c)
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    c[i][j] = 1000;
}

#define X 400
#define Y 300

int main()
{
    // allocate the array of X row pointers on the device
    int **c_d;
    CUDA_SAFE_CALL( cudaMalloc((void **)&c_d, X*sizeof(int *)) );

    // launch a single block of X x Y threads
    dim3 dimBlock(X,Y);
    MatAdd<<<1,dimBlock>>>(c_d);

    // block until the device has completed
    cudaThreadSynchronize();

    // check if kernel execution generated an error
    checkCUDAError("kernel invocation");

    return 0;
}

There is a limit of 512 threads per block on most hardware (1024 on compute 2.x). You are trying to launch a single block of 400 × 300 = 120,000 threads, which is illegal.
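For anyone who wants to check the limit on their own card instead of relying on the numbers quoted here, a minimal sketch follows. It assumes device 0 is the GPU of interest, and `blockShapeIsLegal` is a made-up helper name for illustration, not part of the CUDA API. It only checks the x and y block dimensions and the total thread count, which is what matters for a 400 × 300 block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): true if a 2D block of shape
// (bx, by) fits the per-dimension and total-thread limits in `prop`.
bool blockShapeIsLegal(const cudaDeviceProp &prop, unsigned bx, unsigned by)
{
    return bx >= 1 && by >= 1 &&
           bx <= (unsigned)prop.maxThreadsDim[0] &&
           by <= (unsigned)prop.maxThreadsDim[1] &&
           (unsigned long long)bx * by <=
               (unsigned long long)prop.maxThreadsPerBlock;
}

int main()
{
    // Query the limits of device 0 (assumed to be the GPU in use).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max block dims: %d x %d x %d\n", prop.maxThreadsDim[0],
           prop.maxThreadsDim[1], prop.maxThreadsDim[2]);

    // The block shape from the question versus a small, conservative one.
    printf("400 x 300 block legal? %s\n",
           blockShapeIsLegal(prop, 400, 300) ? "yes" : "no");
    printf("16 x 16 block legal?   %s\n",
           blockShapeIsLegal(prop, 16, 16) ? "yes" : "no");
    return 0;
}
```

Running a check like this before the launch turns the "invalid configuration" error into something you can see coming.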

Hmmm, what was I thinking? Thanks for your reply.

When I try small X and Y, I don’t see any problem.

The failures with big numbers made me think that passing **c to a kernel was not allowed.

I have now fixed it in the following way so that I can deal with large sizes:

#define X 1500
#define Y 2500
#define Z 50

int threadsPerBlockx = 30; int threadsPerBlocky = 30;
int blocksPerGridx = (X + threadsPerBlockx - 1) / threadsPerBlockx;
int blocksPerGridy = (Y + threadsPerBlocky - 1) / threadsPerBlocky;
int total_threads = threadsPerBlockx * threadsPerBlocky;
int total_blocks = blocksPerGridx * blocksPerGridy;

dim3 blocks(blocksPerGridx,blocksPerGridy);
dim3 threads(threadsPerBlockx,threadsPerBlocky);

MatAdd<<<blocks, threads>>>(c_d);

So I’d like to make sure that the way I was dealing with the multi-dimensional (double-pointer) array is not a problem… right?
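A few things look worth double-checking, so here is a hedged sketch rather than a definitive answer. With more than one block, the kernel needs `blockIdx` and `blockDim` to form a global index, plus a bounds guard because the grid is rounded up past the array edge. A 30 × 30 block is 900 threads, which only fits the 1024-thread limit mentioned in the reply above, so the sketch uses 16 × 16 instead. The posted code also only allocates the array of row pointers, so the sketch assumes each row still needs its own `cudaMalloc`. The names `blocksFor`, `CHECK` and `rows_h` are made up for illustration, and `cudaDeviceSynchronize` is used on the assumption of a CUDA 4.0 or later toolkit.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define X 1500
#define Y 2500

// Illustrative error-check macro (stands in for CUDA_SAFE_CALL).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Each thread sets one element; the guard skips threads that fall
// outside the array because the grid was rounded up.
__global__ void MatAdd(int **c, int nx, int ny)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < nx && j < ny)
        c[i][j] = 1000;
}

// Ceiling division: number of blocks needed to cover n elements.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    // Host-side table holding the device address of every row.
    int **rows_h = (int **)malloc(X * sizeof(int *));
    if (rows_h == NULL) return 1;
    for (int i = 0; i < X; ++i)
        CHECK(cudaMalloc((void **)&rows_h[i], Y * sizeof(int)));

    // Device-side copy of that table; this is what the kernel receives.
    int **c_d;
    CHECK(cudaMalloc((void **)&c_d, X * sizeof(int *)));
    CHECK(cudaMemcpy(c_d, rows_h, X * sizeof(int *),
                     cudaMemcpyHostToDevice));

    dim3 threads(16, 16);  // 256 threads per block
    dim3 blocks(blocksFor(X, threads.x), blocksFor(Y, threads.y));
    MatAdd<<<blocks, threads>>>(c_d, X, Y);
    CHECK(cudaGetLastError());       // catches launch configuration errors
    CHECK(cudaDeviceSynchronize());  // catches errors during execution

    // Spot check: read back the last element of the last row.
    int last = 0;
    CHECK(cudaMemcpy(&last, rows_h[X - 1] + (Y - 1), sizeof(int),
                     cudaMemcpyDeviceToHost));
    printf("c[%d][%d] = %d\n", X - 1, Y - 1, last);

    for (int i = 0; i < X; ++i)
        cudaFree(rows_h[i]);
    cudaFree(c_d);
    free(rows_h);
    return 0;
}
```

One allocation per row keeps the `c[i][j]` syntax, but it means X separate `cudaMalloc` calls. A single flat allocation of X*Y ints indexed as `c[i*Y + j]` is a common alternative if the double pointer is not a hard requirement.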
