What is the correct way to use cudaMallocHost to create a local array representing the GPU data?

If I am creating memory equally on the host and GPU like this:

T* cpuSide = NULL;
T* gpuSide = NULL;

size_t size = 1024;

cpuSide = new T[size]; // discouraged according to the link below

cudaError_t cudaStatus = cudaMalloc((void**)&gpuSide, size * sizeof(T));

This seems to work fine. cudaMemcpy calls between the two buffers succeed.

However, as per this link, this is the wrong way to create host memory: cuda - cudaMemcpy() calls to streams - Stack Overflow

They suggest we should instead do:

T* cpuSide = NULL;
T* gpuSide = NULL;

size_t size = 1024;

cudaError_t cudaStatus = cudaMallocHost((void**)&cpuSide, size * sizeof(T));

cudaStatus = cudaMalloc((void**)&gpuSide, size * sizeof(T));

However, when I do this, I end up with memory errors on usage:
0xC0000005: Access violation reading location 0x00000002042003EC.

I presume this is not creating my array correctly on the local host. So what do I need to fix? Thanks for any help.

Calling it the "wrong way" is a bit extreme. Allocating the host buffer with new is perfectly valid. In some situations, using cudaMallocHost instead may be necessary (e.g. to achieve copy/compute overlap) or preferred for other reasons.
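As a rough illustration of the copy/compute overlap case, here is a minimal sketch of stream-based copies. The function name roundTripAsync and its parameters are made up for this example. It assumes the host buffer came from cudaMallocHost; with a pageable buffer (for example one from new) the same calls are still legal, but the copies are generally not able to overlap with other work.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: copies `count` floats host -> device -> host on `stream`
// and returns the first error encountered, or cudaSuccess.
// `pinnedHost` is assumed to have been allocated with cudaMallocHost.
static cudaError_t roundTripAsync(float* pinnedHost, float* device,
                                  size_t count, cudaStream_t stream)
{
    const size_t bytes = count * sizeof(float);

    // Queue the upload; the call returns without waiting for the copy.
    cudaError_t status = cudaMemcpyAsync(device, pinnedHost, bytes,
                                         cudaMemcpyHostToDevice, stream);
    if (status != cudaSuccess) return status;

    // Kernel launches on `stream` would go here; work queued on other
    // streams is what can overlap with these copies.

    // Queue the download behind the upload on the same stream.
    status = cudaMemcpyAsync(pinnedHost, device, bytes,
                             cudaMemcpyDeviceToHost, stream);
    if (status != cudaSuccess) return status;

    // Block the host until everything queued on `stream` has finished.
    return cudaStreamSynchronize(stream);
}
```

The stream itself would come from cudaStreamCreate and be released with cudaStreamDestroy once it is no longer needed.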

There is not enough information here to diagnose the problem. The two variants (with new and with cudaMallocHost) should be roughly equivalent from a “legal access” perspective, so something else is going on. If you are on Windows and requesting a large amount of space via cudaMallocHost (which you aren’t), that can be an issue. Also, be sure you are actually checking those cudaStatus results. If you still need help, provide a complete example, along with the CUDA version, the GPU, and the OS you are running on.
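On checking the cudaStatus results, something along these lines is one way to do it. This is only a sketch: cudaOk and allocatePair are hypothetical helper names, not part of CUDA, and float stands in for the T in the question.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): prints the error string for a failed
// runtime call and returns false so the caller can bail out.
static bool cudaOk(cudaError_t status, const char* what)
{
    if (status != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        return false;
    }
    return true;
}

// Allocates a pinned host buffer and a device buffer of `count` floats each.
// On failure, nothing is left allocated and both pointers are null.
static bool allocatePair(float** cpuSide, float** gpuSide, size_t count)
{
    *cpuSide = nullptr;
    *gpuSide = nullptr;

    if (!cudaOk(cudaMallocHost((void**)cpuSide, count * sizeof(float)),
                "cudaMallocHost")) {
        *cpuSide = nullptr;
        return false;
    }
    if (!cudaOk(cudaMalloc((void**)gpuSide, count * sizeof(float)),
                "cudaMalloc")) {
        cudaFreeHost(*cpuSide);  // undo the host allocation
        *cpuSide = nullptr;
        *gpuSide = nullptr;
        return false;
    }
    return true;
}
```

If allocatePair returns false, the message on stderr shows which call failed and why, which is usually the first thing worth knowing before looking at the access itself.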


Thanks. I just realized the crash was caused by a leftover delete[] cpuAllocation that I had not replaced after switching to cudaMallocHost. My mistake. I will stick with cudaMallocHost, as it works and sounds safer going forward. Thanks.
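For anyone hitting the same crash: memory from cudaMallocHost has to be released with cudaFreeHost, just as new[] pairs with delete[] and cudaMalloc pairs with cudaFree. One way to make the pairing hard to get wrong is to tie the deleter to the pointer type. The sketch below is only an illustration under that assumption; PinnedDeleter, PinnedArray and makePinnedArray are made-up names, not CUDA APIs.

```cuda
#include <cstddef>
#include <memory>
#include <type_traits>
#include <cuda_runtime.h>

// Deleter for memory obtained from cudaMallocHost.
// delete[] must never be used on such a pointer.
struct PinnedDeleter {
    void operator()(void* p) const { cudaFreeHost(p); }
};

// Owning pointer to a pinned host array; frees itself with cudaFreeHost.
template <typename T>
using PinnedArray = std::unique_ptr<T[], PinnedDeleter>;

// Hypothetical helper: allocates `count` elements of pinned host memory.
// Returns an empty pointer if the allocation fails.
// No constructors run, so the elements start out uninitialized.
template <typename T>
PinnedArray<T> makePinnedArray(size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw pinned buffers are only suitable for trivially copyable types");
    void* raw = nullptr;
    if (cudaMallocHost(&raw, count * sizeof(T)) != cudaSuccess)
        return PinnedArray<T>();
    return PinnedArray<T>(static_cast<T*>(raw));
}
```

With this, auto cpuSide = makePinnedArray<float>(1024); can be indexed like a normal array, cpuSide.get() can be passed to cudaMemcpy, and there is no manual delete[] or cudaFreeHost call left to forget.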
