Transfer a tensor to the GPU without copying on a Jetson device

I have a Jetson Orin Nano on which I'm running a model with TensorRT.
I've found that the most time-consuming operation is transferring a torch Tensor to the CUDA device with to("cuda:0").

Profiling with torch.profiler shows that 97% of the overhead comes from copying the tensor. However, the Jetson Orin has memory physically shared between the CPU and the GPU, so in principle the copy could be avoided, since the GPU already has access to the same memory.
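For reference, a minimal sketch of how such a transfer can be profiled with torch.profiler (the tensor shape here is an arbitrary placeholder, not from the original post):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder input; replace with the real preprocessed image tensor.
x = torch.randn(1, 3, 640, 640)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    # The host-to-device copy under investigation.
    y = x.to("cuda:0") if torch.cuda.is_available() else x

# The copy typically shows up as aten::to / aten::copy_ in this table.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```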

Is it possible to move the tensor to the GPU without copying it? If so, how?

Hi,

It's possible, but you will need to read the image into a shareable buffer (e.g., unified memory or pinned memory).

Based on the topic below, unified memory does not appear to be supported in PyTorch.
You can check with the PyTorch team for the latest status.

Thanks.

Thank you for the response,

I now have a better understanding of CUDA memory management.

It seems CuPy supports unified memory allocation as well as conversion to PyTorch tensors.
If the CuPy-to-PyTorch conversion involves no copy, I think this solution could work, but I haven't tested it yet.

I'll update here when I have time to test it.

Hi,

Thanks for sharing this info.
Looking forward to your test results.
