According to Confidential Compute on NVIDIA Hopper H100 → Running a Confidential Compute Application on the GPU → Developer Considerations, memory allocated by cudaMallocHost is handled by UVM, like cudaMallocManaged.
When going from a CPU buffer to a pinned GPU buffer, the CUDA UMD will encrypt the data, stage it into a bounce buffer, and have the GPU decrypt it and pull it into the TCB.
UVM will only be triggered if the pointer was allocated via cudaHostAlloc/cudaMallocHost/cudaMallocManaged, in which case the UVM driver performs the encrypt + bounce + DMA + decrypt sequence.
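For illustration, here is a minimal sketch of that path using the pinned-allocation APIs named above (the buffer size and the omitted error checking are my own simplifications; under CC the encrypt/bounce/decrypt steps happen transparently inside the driver, so the application code looks the same as on a non-CC system):

```cpp
#include <cuda_runtime.h>

int main() {
    const size_t nbytes = 1 << 20;  // 1 MiB, arbitrary

    float *h_buf = nullptr;
    float *d_buf = nullptr;

    cudaMallocHost(&h_buf, nbytes);  // pinned host allocation, UVM-handled under CC
    cudaMalloc(&d_buf, nbytes);      // device allocation inside the TCB

    // H2D copy: under CC the driver encrypts into a bounce buffer in
    // unprotected memory; the GPU DMAs it in and decrypts it inside the TCB.
    cudaMemcpy(d_buf, h_buf, nbytes, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```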
By the way, may I ask @rnertney: is there a large performance difference between using the CUDA UMD path and using UVM? That would suggest whether app developers should use malloc or cudaHostAlloc for CPU-side memory allocation for DMA under CC.
I do not have access to H100 hardware, so I cannot evaluate it myself.
UVM handles memory automatically and performs migrations based on page faults. A carefully coded application with manual data movement (i.e., without UVM) can outperform UVM in more complicated data-movement scenarios. However, UVM is very powerful and makes code much easier to write. Take a look at the intro here: https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
Speed comparisons against standard, explicit data movement will vary with the complexity of your movement flows; UVM is pretty close to ideal in many use cases.
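To make the UVM style from that intro concrete, here is a minimal managed-memory sketch (the kernel name `scale`, the array size, and the launch configuration are illustrative, not from this thread): a single pointer is used on both CPU and GPU, and pages migrate on fault instead of via explicit cudaMemcpy.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: scales an array in place on the GPU.
__global__ void scale(float *x, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr;

    // One pointer, valid on both CPU and GPU; no explicit cudaMemcpy needed.
    cudaMallocManaged(&x, n * sizeof(float));

    for (int i = 0; i < n; ++i) x[i] = 1.0f;      // pages resident on the CPU

    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // pages fault and migrate to the GPU
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);                  // CPU access migrates pages back
    cudaFree(x);
    return 0;
}
```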
malloc + cudaHostRegister() is a forbidden API combination because the GPU's DMA engines cannot directly access VM memory once the CPU has isolated it. In HCC mode, all memory being moved into the GPU needs to be allocated through an API with a cu/cuda prefix so that our driver can intercept and encrypt it.
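A minimal sketch of the contrast (the exact error cudaHostRegister returns under HCC is not specified in this thread, so the failure check below is an assumption; the buffer size is arbitrary):

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t nbytes = 1 << 20;

    // Forbidden under HCC: the driver never saw this allocation, so it cannot
    // intercept and encrypt transfers, and the GPU cannot DMA into memory the
    // CPU has isolated.
    void *p = malloc(nbytes);
    cudaError_t err = cudaHostRegister(p, nbytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        // Assumed failure mode under HCC; verify the actual error on real hardware.
        printf("cudaHostRegister: %s\n", cudaGetErrorString(err));
    } else {
        cudaHostUnregister(p);  // would succeed on a non-CC system
    }
    free(p);

    // Supported: allocate through the CUDA API so the driver owns the mapping
    // and can route transfers through the encrypted bounce-buffer path.
    void *h = nullptr;
    cudaMallocHost(&h, nbytes);
    cudaFreeHost(h);
    return 0;
}
```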