Hi,
I’ve encountered a puzzling issue with unified memory on certain GPUs and was hoping to get some help. A reproducer is attached, but essentially the test performs various allocations using cudaMallocManaged and writes to some of these buffers on both the CPU and the GPU.
The test app eventually hits the error below when calling cudaDeviceSynchronize() immediately after launching test_kernel:
CUDA Runtime Error: an illegal memory access was encountered at test.cpp:69
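For readers without the attachment, the overall shape of the test is roughly as follows. This is a simplified sketch, not the attached reproducer: the buffer size, the kernel body, the launch configuration, and the CHECK_CUDA macro are all placeholders assumed for illustration, and this minimal version is not expected to trigger the fault on its own.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro (an assumption, not taken from test.cpp).
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA Runtime Error: %s at %s:%d\n",     \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Placeholder kernel body: each thread writes one element of a managed buffer.
__global__ void test_kernel(int *buf, size_t n)
{
    size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {
        buf[i] += 1;  // GPU-side write
    }
}

int main()
{
    const size_t n = 1 << 20;  // placeholder size; the real test uses many allocations
    int *buf = nullptr;
    CHECK_CUDA(cudaMallocManaged(&buf, n * sizeof(int)));

    // CPU-side write, done before any kernel is in flight.
    for (size_t i = 0; i < n; ++i) {
        buf[i] = 0;
    }

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    test_kernel<<<blocks, threads>>>(buf, n);
    CHECK_CUDA(cudaGetLastError());       // catches launch-time errors
    CHECK_CUDA(cudaDeviceSynchronize());  // asynchronous kernel faults surface here

    CHECK_CUDA(cudaFree(buf));
    return 0;
}
```

Because kernel launches are asynchronous, an illegal access inside the kernel is only reported by the next synchronizing call, which is why the runtime error points at the cudaDeviceSynchronize() line rather than at the kernel itself.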
I’ve run the test app under cuda-gdb and get the following:
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x8fd6a8 (test.cpp:58)
Thread 1 "test" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 527, block (26703,0,0), thread (0,0,0), device 0, sm 0, warp 8, lane 0]
The pattern and sizes of the allocations seem to be important for reproducing the issue.
I’ve only seen this on Quadro M6000 24 GB GPUs. I can reproduce the issue on several machines, so I don’t believe this is due to a faulty card.
I’ve also tried upgrading to the latest CUDA driver (530.30.02).
Are there any known issues with UVM on Maxwell cards that would explain this?
Environment:
- Driver versions:
  - 510.47.04
  - 530.30.02
- OS: CentOS Linux release 7.9.2009
- GPUs (on different machines):
  - Quadro M6000 24GB
- CUDA Toolkits:
  - 11.2.2
  - 11.6.2
  - 11.8
- GCC version: 9.3.1 20200408 (Red Hat 9.3.1-2)
Thanks,
cuda_uvm_issue.zip (2.8 KB)