Obvious error cannot be detected through cudaGetLastError()

Hi,

Usually the question is “Please find the problem”. In my case I know what the problem is, but I don’t know how to build an error check that shows me there is a problem. Consider the following code:

```cuda
#include <cstdlib>
#include <cstdio>

#include <cuda_runtime.h>
#include <device_launch_parameters.h>

inline void checkCuda(cudaError_t result)
{
  printf("%s: %s\n", cudaGetErrorName(result), cudaGetErrorString(result));
}

__global__
void kernel(int *y)
{
  // write to element 1024 (index 1023)
  y[1023] = 1;
}

int main(void)
{
  int *d;
  // only 1 element allocated
  checkCuda( cudaMalloc((void**)&d, 1*sizeof(int)) );
  kernel <<<1,1>>> (d);
  checkCuda( cudaDeviceSynchronize() );
  checkCuda( cudaGetLastError() );
  checkCuda( cudaFree(d) );
  return EXIT_SUCCESS;
}
```

Obviously, the kernel tries to access an element that was never allocated, so I would expect something like a “bad memory access” error. But on my system (CUDA 11.4, RTX 2070 with driver 471.11, VS2019) I get cudaSuccess four times.
Did I miss something obvious?

Thank you in advance.

Runtime detection of out-of-bounds accesses is not precise to any specific boundary, such as the end of your allocation. If you access far enough out of bounds, you will trigger a runtime error (for example, change your 1023 to a very large number), but there are no guarantees that any out-of-bounds access will always trigger a runtime error. As far as I know, the same principle is true for host code.

As a simple example, when I change the 1023 to 10230000, the runtime does detect an out-of-bounds error.
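
The question's checkCuda() prints every result and never stops the program. A stricter pattern, sketched here under the assumption that you want to abort on the first failure, reports only failures (with file and line), checks cudaGetLastError() immediately after the launch to catch launch-configuration errors, and checks cudaDeviceSynchronize() afterwards to catch errors raised while the kernel ran, such as the one seen with the large index. CHECK_CUDA and the index command-line argument are hypothetical additions, not from the thread. Note that this does not make the y[1023] write detectable; it only ensures an error is reported whenever the runtime actually raises one.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: report a failing CUDA call with file/line and exit.
// Unlike the question's checkCuda(), it stays silent on success.
#define CHECK_CUDA(call)                                                  \
  do {                                                                    \
    cudaError_t err_ = (call);                                            \
    if (err_ != cudaSuccess) {                                            \
      std::fprintf(stderr, "%s:%d: %s (%s)\n", __FILE__, __LINE__,        \
                   cudaGetErrorName(err_), cudaGetErrorString(err_));     \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

__global__ void kernel(int *y, size_t index)
{
  // Out of bounds for any index >= 1: only one int is allocated.
  y[index] = 1;
}

int main(int argc, char **argv)
{
  // Hypothetical argument: index to write, defaulting to 1023 as in the question.
  size_t index = (argc > 1) ? std::strtoull(argv[1], nullptr, 10) : 1023;

  int *d = nullptr;
  CHECK_CUDA(cudaMalloc(&d, sizeof(int)));

  kernel<<<1, 1>>>(d, index);
  CHECK_CUDA(cudaGetLastError());       // launch/configuration errors
  CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

  CHECK_CUDA(cudaFree(d));
  std::printf("no error reported for index %zu\n", index);
  return EXIT_SUCCESS;
}
```

For example, after `nvcc repro.cu -o repro`, running `repro 10230000` would be expected to stop at the cudaDeviceSynchronize() check on a system that behaves like the one described above, whereas `repro` with the default index of 1023 may still finish without reporting anything.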

To reliably catch such errors, run your code under compute-sanitizer.
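
compute-sanitizer needs no code changes and checks every access. If you additionally want the program itself to flag this particular bug, one option not discussed in the thread is to pass the element count to the kernel and check indices explicitly with a device-side assert(); when the check fails, cudaDeviceSynchronize() returns cudaErrorAssert. This is a sketch: inBounds and the extra kernel parameters are hypothetical, it only catches the accesses you choose to check, and device-side assert() is compiled out when NDEBUG is defined (which may be the case in a Release build).

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: true if idx addresses one of the n allocated elements.
__host__ __device__ inline bool inBounds(size_t idx, size_t n)
{
  return idx < n;
}

// The kernel receives the element count and checks the index it uses.
__global__ void kernel(int *y, size_t n, size_t index)
{
  assert(inBounds(index, n));  // on failure the kernel aborts with cudaErrorAssert
  if (inBounds(index, n)) {    // keeps the write safe if NDEBUG removes the assert
    y[index] = 1;
  }
}

int main()
{
  const size_t n = 1;  // only one element allocated, as in the question
  int *d = nullptr;
  if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return EXIT_FAILURE;

  kernel<<<1, 1>>>(d, n, 1023);
  cudaError_t err = cudaDeviceSynchronize();
  // With the assert enabled, this is expected to report cudaErrorAssert.
  std::printf("%s: %s\n", cudaGetErrorName(err), cudaGetErrorString(err));

  cudaFree(d);
  return err == cudaSuccess ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A fired device-side assert leaves the CUDA context unusable, so later calls such as cudaFree() will also return errors; the sketch does not check them.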


> there are no guarantees that any out-of-bounds access will always trigger a runtime error. As far as I know the same principle is true for host code.

Not every out-of-bounds access is an access violation as far as page-based memory protection is concerned. Tools like valgrind can detect pretty much every out-of-bounds access in host code (at considerable cost in run time).
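
A small arithmetic model makes the page argument concrete. Page-based protection can only fault when an access reaches a page that is not mapped; an out-of-bounds access that stays on an already-mapped page, for instance the same page as the allocation itself, is invisible to it. The sketch assumes a page-aligned base address of 0x10000 and a 4 KiB page purely for illustration; the thread does not say how the GPU driver actually lays out or pages cudaMalloc allocations, and pageOf/samePage are hypothetical helpers.

```cuda
#include <cstddef>
#include <cstdint>

// Index of the page containing a given byte address.
constexpr std::uintptr_t pageOf(std::uintptr_t addr, std::size_t pageSize)
{
  return addr / pageSize;
}

// True if base + offsetBytes lies on the same page as base. Such an access
// can never trip page-based protection, whether or not it is in bounds.
constexpr bool samePage(std::uintptr_t base, std::size_t offsetBytes,
                        std::size_t pageSize)
{
  return pageOf(base, pageSize) == pageOf(base + offsetBytes, pageSize);
}
```

Under these assumptions, samePage(0x10000, 1023 * sizeof(int), 4096) is true: the write lands 4092 bytes past the base, on the same page, so no page-based check can flag it. samePage(0x10000, 10230000 * sizeof(int), 4096) is false (about 40.9 MB past the base), which only means the access could fault, since the target page may still happen to be mapped. This is why tools such as compute-sanitizer or valgrind track individual allocations rather than relying on page protection.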
