My problem is that I have a kernel with 64 blocks, and each block of threads needs to know some index value (each thread in the n-th block has to know the n-th index). Can I pass the array of indexes as a kernel function parameter? Will these indexes be in shared or local memory?
Any pointer you pass as a kernel argument has to point to global (device) memory, so an array passed that way lives in global memory, not in shared or local memory. If these indices are constant over the life of one kernel, then the best approach is probably to store them in constant memory. For 64 integers, constant memory (which has a cache and a broadcast mechanism) should be the fastest way for every thread in a given block to get the value it needs.
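For comparison, here is a rough sketch of the plain global-memory route, where the index array is copied to the device and passed to the kernel as a pointer. The kernel name useIndices, the 32 threads per block, and the placeholder index values are all made up for illustration; only the 64 blocks come from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kNumBlocks = 64;        // from the question
constexpr int kThreadsPerBlock = 32;  // assumption, pick whatever your kernel needs

// Hypothetical kernel: every thread reads the index belonging to its block;
// thread 0 writes it back out so the host can inspect it.
__global__ void useIndices(const int* blockIndices, int* out)
{
    int myIndex = blockIndices[blockIdx.x];  // n-th block reads n-th index
    if (threadIdx.x == 0) {
        out[blockIdx.x] = myIndex;
    }
}

int main()
{
    int hostIndices[kNumBlocks];
    for (int i = 0; i < kNumBlocks; ++i) {
        hostIndices[i] = 10 * i;  // placeholder values
    }

    // The kernel can only dereference device pointers, so copy the array over.
    int* devIndices = nullptr;
    int* devOut = nullptr;
    cudaMalloc(&devIndices, sizeof(hostIndices));
    cudaMalloc(&devOut, sizeof(hostIndices));
    cudaMemcpy(devIndices, hostIndices, sizeof(hostIndices), cudaMemcpyHostToDevice);

    useIndices<<<kNumBlocks, kThreadsPerBlock>>>(devIndices, devOut);

    int hostOut[kNumBlocks];
    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("block 5 read index %d\n", hostOut[5]);

    cudaFree(devIndices);
    cudaFree(devOut);
    return 0;
}
```

This works, but every block goes through a global memory load for its index, which is why constant memory is likely the better fit for a small read-only table like this.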
Yes, constant memory is probably best suited to your problem. For example, you can try the following:
At file scope:
__constant__ int lookupArray[64];
In your host function:
int hostArray[64];
setTheHostArray(...);
cudaMemcpyToSymbol(lookupArray, hostArray, 64 * sizeof(int), 0);
callKernel<<<X,Y>>>();
In the kernel, the name “lookupArray” will then be visible to you as if it were a global read-only constant array, so each thread can read lookupArray[blockIdx.x].
Search for cudaMemcpyToSymbol in the Reference Manual if you want details on what it does.
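Putting the pieces above together, a complete program might look roughly like the sketch below. The kernel name readLookup, the output buffer, the 32 threads per block, and the placeholder values are assumptions for illustration; lookupArray and the cudaMemcpyToSymbol call follow the snippet above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kNumBlocks = 64;        // from the question
constexpr int kThreadsPerBlock = 32;  // assumption

// File-scope constant memory array, one entry per block.
__constant__ int lookupArray[kNumBlocks];

// Hypothetical kernel: all threads in a block read the same constant-memory
// entry, which is the access pattern the constant cache broadcasts well.
__global__ void readLookup(int* out)
{
    int myIndex = lookupArray[blockIdx.x];
    if (threadIdx.x == 0) {
        out[blockIdx.x] = myIndex;
    }
}

int main()
{
    int hostArray[kNumBlocks];
    for (int i = 0; i < kNumBlocks; ++i) {
        hostArray[i] = 10 * i;  // placeholder values
    }

    // Copy the host array into the constant-memory symbol (offset 0,
    // host-to-device is the default direction).
    cudaMemcpyToSymbol(lookupArray, hostArray, sizeof(hostArray));

    int* devOut = nullptr;
    cudaMalloc(&devOut, sizeof(hostArray));

    readLookup<<<kNumBlocks, kThreadsPerBlock>>>(devOut);

    int hostOut[kNumBlocks];
    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("block 5 read index %d\n", hostOut[5]);

    cudaFree(devOut);
    return 0;
}
```

Note that the kernel takes no argument for the table at all; it just refers to the file-scope symbol by name.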
I tried passing an array by value to a CUDA kernel and it crashed. Apparently the compiler tried to be smart and wanted to pass this as a pointer reference - which of course did not work with a host memory pointer. Duh ;)
If you really need to do that, pack your array into a struct. In C and C++ you cannot pass an array by value (an array argument always decays to a pointer), and in the case of CUDA you would be passing a host pointer to a device kernel, hence the crash.
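A minimal sketch of the struct trick, in case it helps. The struct name IndexTable, the kernel name useTable, and the placeholder values are invented for this example. The struct is copied by value into the kernel's parameter space at launch, so no host pointer is ever dereferenced on the device. At 64 ints (256 bytes) it should sit comfortably under the kernel parameter size limit, which has traditionally been 4 KB.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kNumBlocks = 64;        // from the question
constexpr int kThreadsPerBlock = 32;  // assumption

// Hypothetical wrapper: a struct containing an array is copied as a whole,
// unlike a bare array argument, which decays to a pointer.
struct IndexTable {
    int values[kNumBlocks];
};

__global__ void useTable(IndexTable table, int* out)
{
    int myIndex = table.values[blockIdx.x];  // read from the by-value copy
    if (threadIdx.x == 0) {
        out[blockIdx.x] = myIndex;
    }
}

int main()
{
    IndexTable table;
    for (int i = 0; i < kNumBlocks; ++i) {
        table.values[i] = 10 * i;  // placeholder values
    }

    int* devOut = nullptr;
    cudaMalloc(&devOut, kNumBlocks * sizeof(int));

    // The whole struct is passed by value as a kernel argument.
    useTable<<<kNumBlocks, kThreadsPerBlock>>>(table, devOut);

    int hostOut[kNumBlocks];
    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("block 5 read index %d\n", hostOut[5]);

    cudaFree(devOut);
    return 0;
}
```

This only makes sense for small tables; for anything larger, the constant memory or global memory routes are the way to go.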