Hi,
Usually the question is “Please find the problem”. In my case I know the problem, but I don’t know how to build an error check that tells me there is one. Consider the following code:
#include <cstdlib>
#include <cstdio>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
// Print the name and description of a CUDA runtime status code.
inline void checkCuda(cudaError_t result)
{
    printf("%s: %s\n", cudaGetErrorName(result), cudaGetErrorString(result));
}

__global__
void kernel(int *y)
{
    // write to the 1024th element (index 1023)
    y[1023] = 1;
}

int main(void)
{
    int *d;
    // only 1 element allocated
    checkCuda( cudaMalloc((void**)&d, 1*sizeof(int)) );
    kernel <<<1,1>>> (d);
    checkCuda( cudaDeviceSynchronize() );
    checkCuda( cudaGetLastError() );
    checkCuda( cudaFree(d) );
    return EXIT_SUCCESS;
}
Obviously the kernel writes to an element that was never allocated, so I expected an error along the lines of “illegal memory access”. But on my system (CUDA 11.4, RTX 2070 with 471.11, VS2019) I get cudaSuccess four times.
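For comparison, a manual bounds check does make the overrun visible. The following is only a minimal sketch under stated assumptions: the element count `n`, the `index` parameter, the device flag `d_fault`, and the helper `outOfBounds` are all hypothetical names that do not appear in my code above. The kernel refuses the write and raises the flag when the index is outside the buffer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical guarded kernel: performs the write only if index is inside
// the buffer of n elements; otherwise it sets *fault and returns.
__global__ void checkedKernel(int *y, int n, int index, int *fault)
{
    if (index < 0 || index >= n) {
        atomicExch(fault, 1);   // record the violation instead of writing
        return;
    }
    y[index] = 1;
}

// Hypothetical helper: allocates n ints, launches the guarded kernel, and
// returns 1 if the index was out of bounds, 0 if it was in bounds,
// or -1 if any CUDA runtime call failed.
int outOfBounds(int n, int index)
{
    int *d = nullptr;
    int *d_fault = nullptr;
    int h_fault = 0;

    if (cudaMalloc((void**)&d, n * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void**)&d_fault, sizeof(int)) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    cudaError_t err = cudaMemset(d_fault, 0, sizeof(int));
    if (err == cudaSuccess) {
        checkedKernel<<<1, 1>>>(d, n, index, d_fault);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_fault, d_fault, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_fault);
    cudaFree(d);
    return err == cudaSuccess ? h_fault : -1;
}

int main(void)
{
    // Same situation as above: 1 element allocated, index 1023 requested.
    printf("1 element, index 1023: fault = %d\n", outOfBounds(1, 1023));
    // Control case: the index is valid for a 1024-element buffer.
    printf("1024 elements, index 1023: fault = %d\n", outOfBounds(1024, 1023));
    return EXIT_SUCCESS;
}
```

This only catches indices the kernel checks explicitly, so it is not a general answer. My assumption is that the runtime reports an illegal access only when the address falls outside memory that is mapped for the process, which a small overrun like this one may not do. The CUDA 11 toolkits also ship `compute-sanitizer`, whose memcheck tool is meant to flag out-of-bounds accesses without changes to the kernel.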
Did I miss something obvious?
Thank you in advance.