Hi all!
I'm having trouble with two-dimensional thread blocks: it seems that I can't use the threadIdx.y coordinate. The following example shows the issue.
bi_thread_block.cu:
#define SIZE 10
#include <stdio.h>

// Kernel definition
__global__ void add(int* device)
{
    int i = 5*threadIdx.y + threadIdx.x;
    device[i] = i;
}

int main()
{
    int A[SIZE] = {0};
    int *devPtrA;
    int memsize = SIZE * sizeof(int);
    cudaMalloc((void**)&devPtrA, memsize);
    cudaMemcpy(devPtrA, A, memsize, cudaMemcpyHostToDevice);
    for (int i=0; i<SIZE; i++)
        printf("A[%d]=%d\n",i,A[i]);
    printf("\n");
    add<<<2, 5>>>(devPtrA);
    cudaMemcpy(A, devPtrA, memsize, cudaMemcpyDeviceToHost);
    for (int i=0; i<SIZE; i++)
        printf("A[%d]=%d\n",i,A[i]);
    cudaFree(devPtrA);
    return 0;
}
Command I use to compile:
nvcc -o bi_thread_block bi_thread_block.cu
Output of ./bi_thread_block:
A[0]=0
A[1]=0
A[2]=0
A[3]=0
A[4]=0
A[5]=0
A[6]=0
A[7]=0
A[8]=0
A[9]=0

A[0]=0
A[1]=1
A[2]=2
A[3]=3
A[4]=4
A[5]=0
A[6]=0
A[7]=0
A[8]=0
A[9]=0
Only the first 5 elements are modified by "add"; the other 5 are not. I've also tried using only the y coordinate, launching add<<<10, 1>>>(devPtrA) and changing "add" to:
__global__ void add(int* device)
{
    int i = threadIdx.y;
    device[i] = i;
}
but that doesn't work either. Does anyone have an idea what is going wrong?
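A likely cause, worth checking: when the `<<<...>>>` arguments are plain integers, they describe a one-dimensional grid and a one-dimensional block, so `threadIdx.y` is always 0. `add<<<2, 5>>>` therefore launches 2 blocks of 5 threads along x, and both blocks write indices 0 to 4. `add<<<10, 1>>>` launches 10 single-thread blocks that all write `device[0]`. The sketch below passes a `dim3` as the block size so that `threadIdx.y` actually varies. It assumes a CUDA-capable device, and the names `add2d`, `addY` and `run` are hypothetical, not from the original program.

```cuda
#include <stdio.h>
#include <assert.h>

#define SIZE 10

// 2D indexing: in a 5x2 block, thread (x, y) writes element y*5 + x.
__global__ void add2d(int* device)
{
    int i = blockDim.x * threadIdx.y + threadIdx.x;  // row-major index
    if (i < SIZE)
        device[i] = i;
}

// y-only indexing: only works if the block's y extent covers SIZE.
__global__ void addY(int* device)
{
    int i = threadIdx.y;
    if (i < SIZE)
        device[i] = i;
}

// Hypothetical helper: launches one kernel with a single block of the
// given shape and copies the SIZE results back into out.
static cudaError_t run(bool yOnly, dim3 threads, int* out)
{
    int* devPtr = NULL;
    size_t memsize = SIZE * sizeof(int);
    cudaError_t err = cudaMalloc((void**)&devPtr, memsize);
    if (err != cudaSuccess)
        return err;
    err = cudaMemset(devPtr, 0, memsize);
    if (err == cudaSuccess) {
        if (yOnly)
            addY<<<1, threads>>>(devPtr);   // 1 block, shape given by threads
        else
            add2d<<<1, threads>>>(devPtr);
        err = cudaGetLastError();           // catches launch-configuration errors
    }
    if (err == cudaSuccess)                 // cudaMemcpy waits for the kernel
        err = cudaMemcpy(out, devPtr, memsize, cudaMemcpyDeviceToHost);
    cudaFree(devPtr);
    return err;
}

int main()
{
    int A[SIZE], B[SIZE];
    // Block of 5 threads in x and 2 in y: threadIdx.y takes values 0 and 1.
    if (run(false, dim3(5, 2), A) != cudaSuccess) return 1;
    // Block of 1 thread in x and SIZE in y: threadIdx.y takes values 0..SIZE-1.
    if (run(true, dim3(1, SIZE), B) != cudaSuccess) return 1;
    for (int i = 0; i < SIZE; i++) {
        printf("A[%d]=%d  B[%d]=%d\n", i, A[i], i, B[i]);
        assert(A[i] == i && B[i] == i);  // every index written exactly once
    }
    return 0;
}
```

Using `blockDim.x` instead of the hard-coded 5 keeps the index correct if the block width changes. The `if (i < SIZE)` guard is a defensive habit for launches that create more threads than elements.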
Thanks a lot!
Giacomo