Hello, it looks like the compute-sanitizer bounds check for shared memory is off by 1024 bytes.
Platform: Windows 10
Toolkit: 12.2 Update 2
CC: 8.6
Build: nvcc -arch sm_86 -G -g -o task1.exe task1.cu
Test: compute-sanitizer task1.exe
Code:
__global__ void k(int *s)
{
    __shared__ int a[1];
    s[0] = a[1];      // does not detect this
    s[0] = a[256];    // does not detect this
    // s[0] = a[257]; // detects only this
}

int main() {
    int *devMem = nullptr;
    cudaMalloc((void**)&devMem, 32);
    k<<<1,1>>>(devMem);
    return 0;
}
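It looks suspiciously like the 1 KB of reserved shared memory on CC 8.0+: a[257], the first access that gets reported, starts 1028 bytes past the start of a, which is exactly sizeof(a) + 1024.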