initcheck and malloc/free don't get along

Dear experts,
I reserved 1 GB of heap memory for the device and launched ~10K threads; each thread mallocs and frees only 4 bytes. The program doesn't crash when run standalone, but whenever I run it under initcheck, the following error message pops up:

========= Error: process didn’t terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found

Any thoughts on this? Thanks!

PS: version info
nvcc 7.5
cuda-memcheck 7.0
driver 8.0
runtime 7.5
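For reference, the setup described above can be sketched as a small reproducer. This is a hypothetical reconstruction, not the original program: it assumes the 1 GB heap was reserved with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)`, and the kernel name `mallocFreeKernel`, the `CHECK` macro, and the 40 x 256 launch configuration (10,240 threads) are illustrative choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical launch configuration: 40 blocks x 256 threads = 10,240 threads (~10K).
constexpr int kThreadsPerBlock = 256;
constexpr int kNumBlocks = 40;
// 1 GB device heap, matching the size described in the post.
constexpr size_t kHeapBytes = size_t(1) << 30;

// Each thread allocates 4 bytes from the device heap, writes to them, and frees them.
__global__ void mallocFreeKernel(int* failures)
{
    int* p = static_cast<int*>(malloc(sizeof(int)));  // 4 bytes per thread
    if (p == nullptr) {
        atomicAdd(failures, 1);  // device-side malloc returns NULL when the heap is exhausted
        return;
    }
    *p = static_cast<int>(threadIdx.x);  // touch the allocation
    free(p);
}

// Hypothetical helper: print the failing call and bail out of main on any CUDA error.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            return 1;                                                \
        }                                                            \
    } while (0)

int main()
{
    // The heap limit must be set before launching any kernel that calls malloc/free.
    CHECK(cudaDeviceSetLimit(cudaLimitMallocHeapSize, kHeapBytes));

    int* dFailures = nullptr;
    CHECK(cudaMalloc(&dFailures, sizeof(int)));
    CHECK(cudaMemset(dFailures, 0, sizeof(int)));

    mallocFreeKernel<<<kNumBlocks, kThreadsPerBlock>>>(dFailures);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    int hFailures = 0;
    CHECK(cudaMemcpy(&hFailures, dFailures, sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("threads: %d, failed allocations: %d\n",
                kNumBlocks * kThreadsPerBlock, hFailures);

    CHECK(cudaFree(dFailures));
    return 0;
}
```

Build it with `nvcc -o repro repro.cu`, then compare a plain `./repro` run against `cuda-memcheck --tool initcheck ./repro`.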

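Your cuda-memcheck version (7.0) should match your runtime version (7.5), i.e. use the cuda-memcheck that ships with the same CUDA toolkit as the runtime you build against.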
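To confirm which runtime a binary actually uses (and what the installed driver supports), a small check like the following can help. It is a sketch that relies on `cudaRuntimeGetVersion` and `cudaDriverGetVersion`, which report versions encoded as `1000 * major + 10 * minor` (e.g. 7050 for 7.5); the helper names `versionMajor` and `versionMinor` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Decode CUDA's integer version format: 1000 * major + 10 * minor.
int versionMajor(int v) { return v / 1000; }
int versionMinor(int v) { return (v % 1000) / 10; }

int main()
{
    int runtimeVersion = 0;
    int driverVersion = 0;

    // Version of the CUDA runtime library this program is running against.
    if (cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        std::fprintf(stderr, "cudaRuntimeGetVersion failed\n");
        return 1;
    }
    // Latest CUDA version supported by the installed driver (0 if no driver is installed).
    if (cudaDriverGetVersion(&driverVersion) != cudaSuccess) {
        std::fprintf(stderr, "cudaDriverGetVersion failed\n");
        return 1;
    }

    std::printf("runtime %d.%d, driver supports up to %d.%d\n",
                versionMajor(runtimeVersion), versionMinor(runtimeVersion),
                versionMajor(driverVersion), versionMinor(driverVersion));
    return 0;
}
```

With the setup listed in the question, the runtime line should read 7.5, which is the cuda-memcheck version to pair it with.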