Grace Hopper Superchip programming without unified memory


I’m trying to find out whether existing CUDA code can be ported naturally to the Grace Hopper Superchip, which is built around unified memory.
How do cudaMalloc and cudaMemcpy behave on the Grace Hopper Superchip?
Namely, can we program the Grace Hopper Superchip without using unified memory, just as we would an ordinary discrete GPU?

Yes, you can program the GH superchip in a fashion similar to ordinary CUDA programming. cudaMalloc and cudaMemcpy will work the same way as they do on a system with an x86 family CPU.

Thank you.
So cudaMalloc allocates memory in the HBM of the Hopper GPU, and cudaMemcpy copies data between the DDR memory of the Grace CPU and the HBM of the Hopper GPU. Is this correct?

Yes, correct. It works exactly the same way it does today.

You may not be aware of this, but CUDA already supports ARM CPUs. Grace is an ARM CPU.
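
For illustration, here is a minimal sketch of the explicit-copy pattern discussed above: allocate device memory with cudaMalloc, copy in with cudaMemcpy, launch a kernel, and copy back. The kernel `scale` and the helper `scale_on_device` are hypothetical names made up for this example, and the sketch assumes nothing Grace Hopper specific; it is ordinary CUDA runtime code of the kind that should carry over unchanged.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiply each element by a factor.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: scales a host vector on the GPU using explicit
// allocation and copies (no unified/managed memory).
// Returns 0 on success, nonzero if any CUDA call fails.
int scale_on_device(std::vector<float>& host, float factor) {
    const int n = static_cast<int>(host.size());
    if (n == 0) return 0;  // nothing to do; avoids a zero-block launch
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* dev = nullptr;
    // Device allocation (GPU memory).
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;

    // Host (CPU memory) to device (GPU memory).
    if (cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return 2;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, factor, n);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev);
        return 3;
    }

    // Device to host; this blocking copy also waits for the kernel to finish.
    if (cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(dev);
        return 4;
    }

    cudaFree(dev);
    return 0;
}

int main() {
    std::vector<float> host(1 << 20, 1.0f);
    const int rc = scale_on_device(host, 2.0f);
    if (rc != 0) {
        std::fprintf(stderr, "CUDA step %d failed\n", rc);
        return EXIT_FAILURE;
    }
    std::printf("host[0] = %f\n", host[0]);
    return EXIT_SUCCESS;
}
```

This should build with a plain `nvcc example.cu -o example` (the file name is arbitrary). Per the answers above, the same source is expected to behave on Grace Hopper as it does on an x86 host with a discrete GPU.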
