I reserved 1 GB of device heap memory and launched ~10K threads. Each thread malloc'ed and freed only 4 bytes. The program didn’t crash when run standalone, but whenever I ran it under initcheck the following error message appeared:
========= Error: process didn’t terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found
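For reference, the setup described above can be sketched roughly as follows. This is a minimal reconstruction, not the exact program: the kernel name `mallocFreeKernel`, the 40 x 256 launch configuration (10240 threads), and the failure counter are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed launch configuration: 40 blocks * 256 threads = 10240 threads (~10K).
constexpr int kBlocks = 40;
constexpr int kThreadsPerBlock = 256;

// Hypothetical kernel: each thread allocates 4 bytes from the device heap,
// writes to them, and frees them again.
__global__ void mallocFreeKernel(int *failures)
{
    int *p = static_cast<int *>(malloc(sizeof(int)));
    if (p == nullptr) {
        atomicAdd(failures, 1);  // count failed device-side allocations
        return;
    }
    *p = threadIdx.x;            // write only, so no uninitialized read occurs
    free(p);
}

int main()
{
    // Reserve 1 GB for the device heap. This has to be set before the first
    // launch of a kernel that calls malloc.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1ULL << 30);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int *dFailures = nullptr;
    err = cudaMalloc(&dFailures, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(dFailures, 0, sizeof(int));

    mallocFreeKernel<<<kBlocks, kThreadsPerBlock>>>(dFailures);

    // Catch both launch errors and errors raised while the kernel runs.
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        cudaFree(dFailures);
        return 1;
    }

    int hFailures = 0;
    cudaMemcpy(&hFailures, dFailures, sizeof(int), cudaMemcpyDeviceToHost);
    printf("failed device mallocs: %d\n", hFailures);

    cudaFree(dFailures);
    return 0;
}
```

Something like this can be built with `nvcc` and run under `cuda-memcheck --tool initcheck ./a.out` to compare against the standalone run.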
Any thoughts on this? Thanks!
PS: version info