I am working on an NVIDIA Tesla GPU. I have a 1D array and want to assign one element to each thread, so the number of threads equals the array size. Whatever thread/block/grid configuration I use, I can only access the first 8 elements, never the rest.
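
To make the intended layout concrete, here is a minimal sketch of a one-thread-per-element mapping. It is not the code that shows the problem: the kernel name `writeIndex`, the array size of 1000, the `int` element type, and the block size of 256 are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own global index into one element.
__global__ void writeIndex(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the partial last block
        data[i] = i;
}

int main()
{
    const int n = 1000;                    // assumed array size
    const size_t bytes = n * sizeof(int);  // bytes for the whole array
    const int threadsPerBlock = 256;       // assumed block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up

    int *host = (int *)malloc(bytes);
    int *dev = nullptr;
    if (host == nullptr || cudaMalloc(&dev, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    // One thread per element: blocks * threadsPerBlock >= n.
    writeIndex<<<blocks, threadsPerBlock>>>(dev, n);

    cudaError_t err = cudaGetLastError();  // reports launch-configuration errors
    if (err == cudaSuccess)                // blocking copy also reports kernel errors
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count elements that were not written by their thread.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != i) ++mismatches;
    printf("%d of %d elements mismatched\n", mismatches, n);

    cudaFree(dev);
    free(host);
    return mismatches == 0 ? 0 : 1;
}
```

With a layout like this, checking the value returned by each CUDA call is one way to tell a failed launch or copy apart from an indexing mistake.
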
I have written several CUDA programs with similar and different data structures on other platforms and never ran into anything like this. What am I missing?
Thanks in advance,