I tried a small program that indexes an array with a negative offset:
__global__ void exec(int *arr_ptr, int size, int *result) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    *result = arr_ptr[-2];
}

void run(int *arr_dev, int size, int *result) {
    cudaStream_t stream = 0;
    int *arr_ptr = arr_dev + 5;
    dim3 threads(1, 1, 1);
    dim3 grid(1, 1);
    exec<<<grid, threads, 0, stream>>>(arr_ptr, size, result);
}
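To spell out what I expect the access to do: since the pointer passed to the kernel is arr_dev + 5, reading arr_ptr[-2] should be the same as reading arr_dev[3], which is inside the allocation as long as the array holds at least 5 elements. Below is a minimal host-only sketch of the same pointer arithmetic. The names read_like_kernel and demo are hypothetical helpers written for illustration, and the buffer contents (0, 1, 2, ...) are an assumption, not my actual test data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side analogue of the kernel's read: offset the base pointer by 5,
// then index with -2, mirroring what run() and exec() do on the device.
// `base` must point to at least 5 ints so that base + 5 is a valid pointer.
int read_like_kernel(const int *base) {
    const int *arr_ptr = base + 5;  // same offset as in run()
    return arr_ptr[-2];             // same access as in exec(): *(base + 3)
}

// Hypothetical helper: fills a buffer with 0, 1, 2, ... and performs the read.
int demo(std::size_t size) {
    assert(size >= 5);  // base + 5 must not go beyond one-past-the-end
    std::vector<int> arr(size);
    for (std::size_t i = 0; i < size; ++i) {
        arr[i] = static_cast<int>(i);
    }
    return read_like_kernel(arr.data());
}
```

On the host, demo(10) returns the value stored at index 3, so the negative index by itself stays in bounds. This only illustrates the arithmetic; it says nothing about how the device handles the same access.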
When I tested this code on the older Tesla architecture (a Tesla T10 processor), it ran without any errors. But when I tried it on the Fermi architecture (a Tesla M2090), it crashed with a segmentation fault.
On debugging it (cuda-gdb), the code generated
at
On memchecking it (cuda-memcheck), it generated
Does anyone know the reason for this problem and how to fix it?