I can access only the first 8 elements of the array, not every element


I am working on an NVIDIA Tesla. I have a 1D array and I would like to assign every element to a thread, so that the number of threads equals the array size. Whatever thread/block/grid structure I use, I can only access the first 8 elements, never the rest.

I have written several CUDA programs with similar and different data structures on other platforms and never ran into anything like this. What am I missing?

Some code might be useful. “Hello my program doesn’t work, how do I fix it?” is not an easy question to answer without at least a modicum of detail…

This is really basic: I am just trying to access the data. I even tried some silly thread structures, and the result is always the same.

int *hostX = (int *)malloc(sizeof(int) * N);
for (i = 0; i < N; i++){
    hostX[i] = i;
}

int *deviceX = NULL;
CUDA_SAFE_CALL(cudaMalloc((void **) &deviceX, N));

CUDA_SAFE_CALL(cudaMemcpy(deviceX, hostX, N, cudaMemcpyHostToDevice));

//dim3 block(N/8, 1, 1); // whatever I put in
//dim3 threads(N, 1, 1); // whatever I put in
dim3 threads(N, 1, 1); // whatever I put in, let's leave it this time

access<<<1, threads>>> (deviceX);

__global__ void access(int *x){
    printf("I am reading x[%d] = %d\n", threadIdx.x, x[threadIdx.x]); // I change array/thread index according to the block structure
}

How about N * sizeof(int)?

You got that right in the host allocation, so why not in the memcpy too?

Same with the cudaMalloc call: you only allocate N bytes instead of sizeof(int)*N bytes.
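
For anyone landing here later, the two fixes suggested above could look roughly like the sketch below. Both cudaMalloc and cudaMemcpy take sizes in bytes, so with 4-byte ints a size of N covers only N/4 elements; that would match seeing 8 valid elements if N were 32, but the thread never states N, so that part is a guess. The kernel name incrementAll, the helper countMismatches, the extra length parameter n, the block size of 256, and the increment-and-check step are all assumptions added for illustration; the original kernel only printed, and CUDA_SAFE_CALL is replaced here by plain error-code checks.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread increments one element, so the host
// can verify afterwards that all n elements were reachable on the device.
__global__ void incrementAll(int *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n) {                                    // guard the partial last block
        x[i] += 1;
    }
}

// Hypothetical helper: returns how many elements did not come back as i + 1,
// or -1 if an allocation or a CUDA call failed. Expects n > 0.
int countMismatches(int n)
{
    size_t bytes = (size_t)n * sizeof(int);  // sizes are in bytes, not elements

    int *hostX = (int *)malloc(bytes);
    if (hostX == NULL) return -1;
    for (int i = 0; i < n; i++) hostX[i] = i;

    int *deviceX = NULL;
    if (cudaMalloc((void **)&deviceX, bytes) != cudaSuccess) {
        free(hostX);
        return -1;
    }

    // Copy all n ints to the device, not just the first n bytes.
    cudaError_t err = cudaMemcpy(deviceX, hostX, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        incrementAll<<<blocks, threads>>>(deviceX, n);
        err = cudaGetLastError();                  // catches launch errors
    }
    if (err == cudaSuccess) {
        // This copy runs after the kernel has finished on the default stream.
        err = cudaMemcpy(hostX, deviceX, bytes, cudaMemcpyDeviceToHost);
    }

    int mismatches = 0;
    if (err != cudaSuccess) {
        mismatches = -1;
    } else {
        for (int i = 0; i < n; i++) {
            if (hostX[i] != i + 1) mismatches++;
        }
    }

    cudaFree(deviceX);
    free(hostX);
    return mismatches;
}

int main(void)
{
    int bad = countMismatches(1000);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Splitting the work across several blocks also avoids the per-block thread limit, which a single-block launch like <<<1, N>>> would hit for larger N.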

Sooo simple! Thanks :)