What is the correct way to use cudaMallocHost to create a local array representing the GPU data?

Suppose I allocate buffers of the same size on the host and on the GPU like this:

T* cpuSide = NULL;
T* gpuSide = NULL;

size_t size = 1024;

cpuSide = new T[size]; // bad method according to the link below

cudaStatus = cudaMalloc((void**)&gpuSide, size * sizeof(T));

This seems to work fine. I can run cudaMemcpy between these buffers without problems.

However, according to this Stack Overflow thread, that is the wrong way to allocate host memory: cuda - cudaMemcpy() calls to streams

They suggest we should instead do:

T* cpuSide = NULL;
T* gpuSide = NULL;

size_t size = 1024;

cudaStatus = cudaMallocHost((void**)&cpuSide, size * sizeof(T));

cudaStatus = cudaMalloc((void**)&gpuSide, size * sizeof(T));

However, when I do this, I end up with memory errors on usage:
0xC0000005: Access violation reading location 0x00000002042003EC.

I presume this is not creating my array correctly on the local host. So what do I need to fix? Thanks for any help.

Calling the new-based allocation “the wrong way” is a bit extreme. It is perfectly valid. In some situations, using cudaMallocHost instead may be necessary (e.g. to achieve copy/compute overlap) or preferred for other reasons.

There is not enough information here to diagnose the problem. The two variants (with new and with cudaMallocHost) should be roughly equivalent from a “legal access” perspective, so something else is going on. If you are on Windows and requesting a large amount of space via cudaMallocHost (which you aren’t), that can be an issue. Also, be sure you are actually checking those cudaStatus results. If you still need help, provide a complete example, along with the CUDA version, the GPU, and the OS you are running on.

Thanks. I just realized the cause: I had left a delete[] cpuAllocation in the code and hadn’t replaced it after switching to cudaMallocHost. My mistake. I will stick with cudaMallocHost, as it works and sounds safer going forward. Thanks.