syoon
November 16, 2010, 11:28pm
1
I’ve been struggling lately with how to assign and retrieve a 2D array between CPU and GPU.
With lots of help from ‘cudesnick’ (thank you), I now know how to manipulate multi-dimensional arrays on the GPU.
My next test is the following: I’d like to (sort of) initialize a 2D array using a kernel.
Assume that I know how to retrieve the 2D array from the device (please read through my other conversations, especially those with ‘cudesnick’).
I did the following and got “Cuda error: kernel invocation: invalid configuration arguments.”
Can I not pass **c directly to a kernel?
Many thanks in advance again.
=============================
__global__ void MatAdd(int **c)
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    c[i][j] = 1000;
}
#define X 400
#define Y 300
int main(){
    …
    int **c_d;
    CUDA_SAFE_CALL( cudaMalloc((void **)&c_d, X*sizeof(int *)) );
    dim3 dimBlock(X,Y);
    MatAdd<<<1,dimBlock>>>(c_d);
    // block until the device has completed
    cudaThreadSynchronize();
    // check if kernel execution generated an error
    checkCUDAError("kernel invocation");
    …
}
avidday
November 17, 2010, 4:32am
3
There is a limit of 512 threads per block on most hardware (1024 on compute 2.x). You are trying to launch 400 × 300 = 120,000 threads in a single block, which is illegal.
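The limit can also be read from the device instead of being assumed. The following is a minimal sketch, assuming device 0 is the one in use; `blockFits` is a hypothetical helper written for this illustration, not part of the CUDA API. It checks a block shape against the per-dimension limits and the total thread limit reported by `cudaGetDeviceProperties`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a bx-by-by block respects the device limits.
static bool blockFits(const cudaDeviceProp &prop, int bx, int by)
{
    return bx <= prop.maxThreadsDim[0] &&
           by <= prop.maxThreadsDim[1] &&
           (long long)bx * by <= prop.maxThreadsPerBlock;
}

int main()
{
    cudaDeviceProp prop;
    // Assumption: device 0 is the device the kernel will run on.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query CUDA device 0\n");
        return 1;
    }
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("400x300 block fits: %s\n", blockFits(prop, 400, 300) ? "yes" : "no");
    printf("16x16 block fits:   %s\n", blockFits(prop, 16, 16) ? "yes" : "no");
    return 0;
}
```

A block shape that fails this check is what produces the "invalid configuration" error at launch time; the work then has to be spread over several blocks in a grid.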
syoon
November 17, 2010, 6:57pm
5
Hmmm, what was I thinking? Thanks for your reply…
When I try small X and Y, I don’t see any problem.
The problems with the large values made me think that passing **c to a kernel was not allowed.
I have now fixed it in the following way so that I can deal with large values…
#define X 1500
#define Y 2500
#define Z 50
…
int threadsPerBlockx = 30; int threadsPerBlocky = 30;
int blocksPerGridx = (X + threadsPerBlockx - 1) / threadsPerBlockx;
int blocksPerGridy = (Y + threadsPerBlocky - 1) / threadsPerBlocky;
int total_threads = threadsPerBlockx * threadsPerBlocky;
int total_blocks = blocksPerGridx * blocksPerGridy;
dim3 blocks(blocksPerGridx,blocksPerGridy);
dim3 threads(threadsPerBlockx,threadsPerBlocky);
MatAdd<<<blocks, threads>>>(c_d);
…
So I’d like to confirm that the way I was handling the multi-dimensional array through double pointers is not the problem… right?
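For reference, here is a hedged sketch of how a multi-block launch with an `int **` argument could look. It is not the code from this thread: how the rows behind `c_d` are allocated is elided above, so the sketch assumes one contiguous device buffer plus a table of row pointers copied to the device. Two further changes are assumptions of the sketch. The kernel takes the extents `nx` and `ny` and computes global indices from `blockIdx`, since a kernel that uses only `threadIdx` would write the same small tile from every block. The block is 16×16 (256 threads), because 30×30 is 900 threads, which is over the 512 limit mentioned above for pre-2.x hardware. `ceilDiv` is a hypothetical helper.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define X 1500
#define Y 2500

// Hypothetical helper: integer division rounded up.
static int ceilDiv(int n, int d) { return (n + d - 1) / d; }

__global__ void MatAdd(int **c, int nx, int ny)
{
    // Global indices: block offset plus thread offset within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    // The grid is rounded up, so threads past the edge must do nothing.
    if (i < nx && j < ny)
        c[i][j] = 1000;
}

int main()
{
    // One contiguous device buffer holding all X*Y elements.
    int *data_d = NULL;
    if (cudaMalloc((void **)&data_d, (size_t)X * Y * sizeof(int)) != cudaSuccess)
        return 1;

    // Row-pointer table built on the host; entries are device addresses.
    int **rows_h = (int **)malloc(X * sizeof(int *));
    for (int i = 0; i < X; ++i)
        rows_h[i] = data_d + (size_t)i * Y;

    // Copy the table to the device so the kernel can dereference c[i].
    int **c_d = NULL;
    cudaMalloc((void **)&c_d, X * sizeof(int *));
    cudaMemcpy(c_d, rows_h, X * sizeof(int *), cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks(ceilDiv(X, threads.x), ceilDiv(Y, threads.y));
    MatAdd<<<blocks, threads>>>(c_d, X, Y);

    // cudaDeviceSynchronize replaces the older cudaThreadSynchronize.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("Cuda error: %s\n", cudaGetErrorString(err));

    // Copy the last row back and print its last element as a spot check.
    int *row_h = (int *)malloc(Y * sizeof(int));
    cudaMemcpy(row_h, rows_h[X - 1], Y * sizeof(int), cudaMemcpyDeviceToHost);
    printf("c[%d][%d] = %d\n", X - 1, Y - 1, row_h[Y - 1]);

    free(row_h);
    free(rows_h);
    cudaFree(c_d);
    cudaFree(data_d);
    return err == cudaSuccess ? 0 : 1;
}
```

Passing `int **` to a kernel is fine as long as both the pointer table and the rows it points to live in device memory. Keeping the data in one contiguous buffer also leaves the option of indexing it directly as `data[i * ny + j]`, which avoids the extra pointer load per access.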