Questions about efficient memory management for TensorRT on TX2

Hi Robert,

Thanks for your reply. This clarifies things a little and I believe it solves my problem.

I tried cudaHostAlloc with the cudaHostAllocMapped flag. I then obtain a device pointer using cudaHostGetDevicePointer, and both pointers turn out to be identical, which is pretty cool.

// Allocate mapped (zero-copy) host memory, then look up its device-side alias.
NV_CUDA_CHECK(cudaHostAlloc(&trt_output_cpu_ptr, output_size, cudaHostAllocMapped));
NV_CUDA_CHECK(cudaHostGetDevicePointer(&trt_output_gpu, trt_output_cpu_ptr, 0));
std::cout << "[" << std::this_thread::get_id() << "] Allocated output cpu_ptr "
          << trt_output_cpu_ptr << " and obtained gpu_ptr " << trt_output_gpu << std::endl;
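
For completeness, here is a self-contained version of that allocation flow I put together while testing (the buffer size and the CHECK macro are placeholders standing in for my real values and NV_CUDA_CHECK; the cudaDeviceGetAttribute call is just an optional sanity check that the device supports mapped host memory):

#include <cuda_runtime.h>
#include <cstdio>

#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            return 1;                                              \
        }                                                          \
    } while (0)

int main() {
    // Optional sanity check: can device 0 map page-locked host memory?
    int can_map = 0;
    CHECK(cudaDeviceGetAttribute(&can_map, cudaDevAttrCanMapHostMemory, 0));
    if (!can_map) {
        fprintf(stderr, "Device cannot map host memory\n");
        return 1;
    }

    // Allocate pinned host memory that is mapped into the device address
    // space, then fetch the device-side alias of the same allocation.
    const size_t output_size = 1 << 20;  // placeholder size
    void* cpu_ptr = nullptr;
    void* gpu_ptr = nullptr;
    CHECK(cudaHostAlloc(&cpu_ptr, output_size, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer(&gpu_ptr, cpu_ptr, 0));

    // On a unified-memory system like the TX2 the two pointers coincide.
    printf("cpu_ptr = %p, gpu_ptr = %p, identical = %d\n",
           cpu_ptr, gpu_ptr, cpu_ptr == gpu_ptr);

    CHECK(cudaFreeHost(cpu_ptr));
    return 0;
}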

My application ensures that the GPU will only read the input memory after the CPU has finished writing it. Conversely, I use cudaStreamSynchronize to make sure the GPU has completed its work before I start reading the results on the CPU. Do I still need additional calls, e.g. cudaDeviceSynchronize, to deal with possible caching issues?
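
To make the ordering concrete, here is a minimal sketch of the scheme I have in mind, with an illustrative kernel standing in for the actual TensorRT enqueue call (all names here are made up for the example):

#include <cuda_runtime.h>

// Illustrative stand-in for the TensorRT inference call.
__global__ void infer_stub(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// in_cpu/in_gpu and out_cpu/out_gpu are the host/device aliases of two
// mapped allocations obtained as above.
void run_inference(cudaStream_t stream,
                   float* in_cpu, float* in_gpu,
                   float* out_cpu, float* out_gpu, int n) {
    // 1. The CPU finishes writing the input before any GPU work is enqueued.
    for (int i = 0; i < n; ++i)
        in_cpu[i] = static_cast<float>(i);

    // 2. Enqueue the GPU work on the stream; it reads the mapped input and
    //    writes the mapped output (in my application this is the TensorRT
    //    execution context enqueue).
    infer_stub<<<(n + 255) / 256, 256, 0, stream>>>(in_gpu, out_gpu, n);

    // 3. Block until the stream has drained; only after this returns does
    //    the CPU read the results.
    cudaStreamSynchronize(stream);

    float first_result = out_cpu[0];  // read back on the CPU
    (void)first_result;
}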