Can't I use negative array indexing on Fermi (Tesla M2090)?

I tried a small program that indexes an array with a negative offset:

__global__ void exec(int *arr_ptr, int size, int *result) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    *result = arr_ptr[-2];
}

void run(int *arr_dev, int size, int *result) {
    cudaStream_t stream = 0;
    int *arr_ptr = arr_dev + 5;
    dim3 threads(1, 1, 1);
    dim3 grid(1, 1);
    exec<<<grid, threads, 0, stream>>>(arr_ptr, size, result);
}

When I tested this code on the older Tesla architecture (a Tesla T10 processor), it ran without any errors. But when I tried it on the Fermi architecture (Tesla M2090), it produced a segmentation fault.

When I debugged it with cuda-gdb, the code generated

at

When I ran it under cuda-memcheck, it generated

Does anyone know the reason for this problem and how to solve it?

It’s awesome that you could dig deeper with cuda-gdb and cuda-memcheck. I could not reproduce this on a Quadro Fermi card.
Can you post the full program, i.e., also the host part of your code? It matters because how much device memory you’re allocating could be making a difference here. Thanks.
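
In case it helps, here is a minimal sketch of what a complete repro could look like. The host side is entirely an assumption, since the post only shows run(): the array size (16), the CUDA_CHECK macro and the read_negative_index helper are made up for illustration. The negative index itself should be legal as long as arr_ptr - 2 still lies inside the allocation (here arr_ptr[-2] is just arr_dev[3]), so the sketch also checks every runtime call and makes sure result points to device memory rather than host memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Same access pattern as the kernel in the question.
__global__ void exec(const int *arr_ptr, int *result) {
    *result = arr_ptr[-2];
}

// Hypothetical helper: fills a device array with 0..size-1, offsets the
// pointer by 5, and returns what the kernel reads at index -2.
// Assumes size >= 6 so that arr_dev + 5 stays inside the allocation.
int read_negative_index(int size) {
    int *host = (int *)malloc(size * sizeof(int));
    if (host == NULL) {
        fprintf(stderr, "host allocation failed\n");
        exit(EXIT_FAILURE);
    }
    for (int i = 0; i < size; ++i) host[i] = i;

    int *arr_dev = NULL;
    int *result_dev = NULL;  // device memory, not a host pointer
    CUDA_CHECK(cudaMalloc((void **)&arr_dev, size * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void **)&result_dev, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(arr_dev, host, size * sizeof(int),
                          cudaMemcpyHostToDevice));

    exec<<<1, 1>>>(arr_dev + 5, result_dev);
    CUDA_CHECK(cudaGetLastError());       // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    int result = -1;
    CUDA_CHECK(cudaMemcpy(&result, result_dev, sizeof(int),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(arr_dev));
    CUDA_CHECK(cudaFree(result_dev));
    free(host);
    return result;
}

int main() {
    // With the fill pattern above, arr_ptr[-2] is arr_dev[3], i.e. 3.
    printf("arr_ptr[-2] = %d\n", read_negative_index(16));
    return 0;
}
```

If something like this runs cleanly on the M2090 but your program does not, the difference is probably on the host side, for example in how arr_dev and result are allocated.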