I’m facing a strange error. Our CUDA code has been working correctly for many months; recently we decided to improve its speed by optimizing memory transfers. I changed the memory allocation from C++'s new to cudaMallocHost, and now cudaMemcpy fails with the error “invalid argument”.
cuda-memcheck gives a stack trace along with:
========= Program hit cudaErrorInvalidValue (error 11) due to “invalid argument” on CUDA API call to cudaMemcpy.
Nothing else has changed; if I change the memory allocation back to new, everything works correctly.
I’ve tested the sample code from this blog: https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/ and it runs successfully, so I’m sure my hardware setup is ok.
Can someone please suggest what could be wrong, or a way to detect what’s wrong?
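One way to narrow this down might be to check the return value of every runtime call, so that the first failing call is reported instead of a later cudaMemcpy. The following is only a minimal sketch, assuming plain float buffers; the CUDA_CHECK macro, the names h_buf and d_buf, and the buffer size are hypothetical placeholders and are not taken from our actual code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): report file/line and abort
// on the first runtime call that does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

int main() {
    const size_t n = 1 << 20;                // placeholder element count
    const size_t bytes = n * sizeof(float);  // sizes are given in bytes

    float* h_buf = nullptr;  // pinned host buffer (was: new float[n])
    float* d_buf = nullptr;  // device buffer

    // cudaMallocHost takes the address of the pointer and a byte count,
    // unlike new, which takes an element count.
    CUDA_CHECK(cudaMallocHost(reinterpret_cast<void**>(&h_buf), bytes));
    CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_buf), bytes));

    for (size_t i = 0; i < n; ++i) h_buf[i] = 1.0f;

    // Same byte count and matching direction flag for each copy.
    CUDA_CHECK(cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_buf));
    // Pinned memory is released with cudaFreeHost, not delete[].
    CUDA_CHECK(cudaFreeHost(h_buf));
    return 0;
}
```

If a sketch like this runs cleanly but the real code still fails, comparing the pointer, byte count, and direction flag passed to the failing cudaMemcpy against this pattern may show where the two differ.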