Kernel "unspecified launch failure", out-of-range load


I have a problem that I cannot figure out. When I run the following kernel under Nsight Visual Studio Edition, I get an out-of-range load exception when it tries to access im_mem[index].


__global__ void kernelMemoryKrasch(const unsigned int *grid,
                                   unsigned char *im_mem,
                                   uint4 *response,
                                   int max_size)
{
  // Global thread id: one thread per element of grid/response.
  int tid = blockDim.x*blockIdx.x + threadIdx.x;
  unsigned int output;
  if(tid < max_size)
  {
    // Index into im_mem is read from global memory.
    unsigned int index = grid[tid];
    output = 0;
    for(int n = 0; n < 16; n++)
    {
      // The reads of im_mem[index] are where the exception is reported.
      output = im_mem[index];
      output += im_mem[index];
      output += im_mem[index];
      output += im_mem[index];
    }

    response[tid].x = output;
    response[tid].y = output;
    response[tid].z = output;
    response[tid].w = output;
  }
}

In this test all the global data is set to zero, so every thread is actually reading from im_mem[0] at all times.
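
One way to double-check that claim before the launch is to scan a host copy of grid and confirm that no entry points past the end of im_mem. The sketch below is plain host-side C++ that should compile unchanged inside a .cu file; the function name firstOutOfRangeIndex and the parameter im_mem_size are hypothetical and do not appear in the original program.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: `grid` is a host copy of the device index buffer and
// `im_mem_size` is the number of bytes allocated for im_mem.
// Returns the position of the first entry that would read past the end of
// im_mem, or -1 if every entry is in range.
long firstOutOfRangeIndex(const std::vector<unsigned int>& grid,
                          std::size_t im_mem_size)
{
    for (std::size_t i = 0; i < grid.size(); ++i) {
        if (grid[i] >= im_mem_size) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

If this returns -1 for the data that is uploaded, the indices themselves are fine, and the next things to check would be the allocation sizes and whether the device pointers passed to the kernel are still valid.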

I have a Quadro FX 1800M, compute capability 1.3 (I have also run the same test program on a GTX 680, with the same result).

gridDim = 8025
blockDim = 256
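
For reference, this launch configuration works out to 8025 * 256 = 2,054,400 threads. The sketch below computes how many bytes grid and response must hold to cover every thread, under two assumptions that the post does not state: that max_size equals the total thread count, and that unsigned int is 4 bytes (so a uint4 is 16 bytes). The struct and function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical summary of what one launch touches in global memory.
struct LaunchFootprint {
    std::size_t totalThreads;   // gridDim * blockDim
    std::size_t gridBytes;      // one unsigned int read per thread
    std::size_t responseBytes;  // one uint4 (four unsigned ints) written per thread
};

// Assumes max_size == gridDim * blockDim, i.e. every launched thread is active.
LaunchFootprint launchFootprint(std::size_t gridDim, std::size_t blockDim)
{
    const std::size_t threads = gridDim * blockDim;
    return LaunchFootprint{
        threads,
        threads * sizeof(unsigned int),
        threads * 4 * sizeof(unsigned int)
    };
}
```

Under those assumptions, launchFootprint(8025, 256) gives 8,217,600 bytes for grid and 32,870,400 bytes for response. Comparing these figures with the sizes actually passed to cudaMalloc is a quick way to rule out an undersized buffer.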

ptxas : info : 144 bytes gmem, 2176 bytes cmem[0], 16 bytes cmem[14]
ptxas : info : Compiling entry function '_Z18kernelMemoryKraschPKjPhP5uint4i' for 'sm_13'
ptxas : info : Used 5 registers, 32 bytes smem, 8 bytes cmem[1]

If I lower the total number of threads to around 40 % of the original, it seems fine, which makes me think I’m running out of stack space. However, I have compiled it for sm_30 and it reports a 0-byte stack frame…

Running CUDA 5.5, driver version 331.82.

I hope someone has an explanation for me =)