I have a Jetson Orin Nano on which I'm running a model with TensorRT.
The most time-consuming operation I've observed is transferring a torch Tensor to the CUDA device with .to("cuda:0").
By profiling with torch.profiler, I've seen that about 97% of the overhead comes from the tensor copy. However, the Jetson Orin has memory physically shared between the CPU and the GPU, so in principle the copy could be avoided, since the GPU can access the same memory.
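For reference, this is roughly how I measured it (a minimal sketch; the input shape is a placeholder, not my real pipeline):

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(1, 3, 224, 224)  # placeholder host tensor, not my actual input

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x.to("cuda:0")         # the transfer under investigation
    torch.cuda.synchronize()   # make sure the copy has actually finished

# aten::copy_ / Memcpy HtoD dominates the table in my runs
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```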
Is it possible to change the tensor's device without copying the data? If so, how?
Update: I now have a better understanding of CUDA memory management.
It seems CuPy supports unified (managed) memory allocation, and also conversion to PyTorch tensors.
If the CuPy-to-PyTorch conversion involves no copy, I think this solution could work, but I haven't tested it yet.
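Here is an untested sketch of what I have in mind, assuming CuPy's managed-memory allocator and zero-copy DLPack conversion behave as documented (torch.from_dlpack needs PyTorch >= 1.10; older versions have torch.utils.dlpack.from_dlpack):

```python
import cupy as cp
import torch

# Route CuPy allocations through cudaMallocManaged, so buffers live in
# unified memory that both the CPU and the Orin's integrated GPU can access.
pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
cp.cuda.set_allocator(pool.malloc)

# Allocate the input buffer in unified memory.
cp_arr = cp.zeros((1, 3, 224, 224), dtype=cp.float32)

# DLPack should hand the same buffer to PyTorch without a copy;
# the resulting tensor reports device cuda:0.
t = torch.from_dlpack(cp_arr)

# If the conversion is really zero-copy, both point at the same memory.
print(t.device, t.data_ptr() == cp_arr.data.ptr)
```

If t.data_ptr() matches cp_arr.data.ptr, the tensor shares the managed buffer and the .to("cuda:0") copy should disappear entirely.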