cudaMemcpy returns cudaErrorInvalidValue after switching to pinned memory

Hello,
I’m facing a strange error. Our CUDA code has worked correctly for many months. Recently we decided to improve its speed by optimizing memory transfers, so I changed the host memory allocation from C++'s new to cudaMallocHost. Now cudaMemcpy fails with the error “invalid argument”.

cuda-memcheck prints a stack trace along with:
========= Program hit cudaErrorInvalidValue (error 11) due to “invalid argument” on CUDA API call to cudaMemcpy.

Nothing else has changed. If I switch the allocation back to new, everything works correctly.
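
For illustration, the change described above has roughly the following shape. This is only a sketch under assumptions, not our actual code: the element type, the buffer size, the names h_data and d_data, and the CUDA_CHECK macro are all placeholders invented for this post. It checks the return value of every runtime call, which should show whether cudaMallocHost itself fails before cudaMemcpy is ever reached.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from our code base): report the failing call
// with file and line, then abort.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                         __LINE__, #call, cudaGetErrorString(err_));     \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

int main() {
    const size_t n = 1 << 20;               // assumed element count
    const size_t bytes = n * sizeof(float); // assumed element type

    float *h_data = nullptr;
    float *d_data = nullptr;

    // Before the change: float *h_data = new float[n];
    // After the change: pinned (page-locked) host allocation.
    CUDA_CHECK(cudaMallocHost((void **)&h_data, bytes));
    CUDA_CHECK(cudaMalloc((void **)&d_data, bytes));

    for (size_t i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Host-to-device and device-to-host copies of the same byte count.
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaFreeHost(h_data)); // pinned memory is released with cudaFreeHost, not delete[]

    std::printf("transfers completed\n");
    return 0;
}
```

With a check on every call, the first failing line is reported directly instead of surfacing later in cudaMemcpy.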

I’ve tested the sample code from this blog: https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/ and it runs successfully, so I’m sure my hardware setup is ok.

Can someone please suggest what could be wrong, or a way to detect what’s wrong?