I am reading sample code for CUDA kernels that operate on arrays, and they usually look like this:
#define N 50
__global__ void doThis(int *array){
    int tid = blockIdx.x;  // one block per array element
    if (tid < N) {
        // do something with array[tid]
    }
}
which is fine when you know that your array has N elements.
But what can I do when the number of elements is not known in advance, only at run time? How does the kernel know the number of elements?
My solution would be to pass it as an argument, but that implies:
- calculate the number of elements (in a host variable)
- copy this value to a device variable
- pass this variable as an argument to the kernel
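To make the three steps concrete, here is a minimal sketch of what I have in mind. The helper name `runDoThis`, the extra kernel parameter `n`, and the "add 1" body are placeholders of mine, not taken from any sample; it assumes `n > 0` and launches one block per element, as the sample kernel does.

```cuda
#include <cuda_runtime.h>

// The element count reaches the kernel through a pointer to a device-side int.
__global__ void doThis(int *array, const int *n) {
    int tid = blockIdx.x;              // one block per element, as in the sample
    if (tid < *n) {
        array[tid] += 1;               // placeholder for "do something"
    }
}

// Hypothetical host helper: runs doThis in place on h_array (n elements, n > 0).
// Step 1 (computing n on the host) is left to the caller.
// Returns 0 on success, 1 if any CUDA call fails.
int runDoThis(int *h_array, int n) {
    int *d_array = NULL;
    int *d_n = NULL;
    size_t bytes = (size_t)n * sizeof(int);
    int failed = 0;

    failed |= cudaMalloc((void **)&d_array, bytes) != cudaSuccess;
    failed |= cudaMalloc((void **)&d_n, sizeof(int)) != cudaSuccess;
    if (!failed) {
        failed |= cudaMemcpy(d_array, h_array, bytes,
                             cudaMemcpyHostToDevice) != cudaSuccess;
        // Step 2: copy the host-side count into a device variable.
        failed |= cudaMemcpy(d_n, &n, sizeof(int),
                             cudaMemcpyHostToDevice) != cudaSuccess;
        // Step 3: pass that device variable to the kernel as an argument.
        doThis<<<n, 1>>>(d_array, d_n);
        failed |= cudaGetLastError() != cudaSuccess;
        // This copy also waits for the kernel to finish.
        failed |= cudaMemcpy(h_array, d_array, bytes,
                             cudaMemcpyDeviceToHost) != cudaSuccess;
    }
    cudaFree(d_array);                 // freeing a null pointer is a no-op
    cudaFree(d_n);
    return failed;
}
```

The host would call `runDoThis(data, count)` once `count` is known at run time. The extra allocation and copy for `d_n` is the part that feels heavy to me.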
Is this the only way?