Transfer rate from GPU to CPU with PyTorch on Xavier NX

I’m trying to run a segmentation network whose output is an 8x3x224x224 tensor.
Copying that result to the CPU takes ~500 ms, which seems excessive (this is the time spent in the .cpu() call, as measured by cProfile).
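
For scale, here is a rough back-of-the-envelope calculation of what that timing would imply. It assumes the tensor is float32 (4 bytes per element), which is not stated above, and the variable names are only illustrative.

```python
# Output tensor shape from the post: 8 x 3 x 224 x 224
BATCH, CHANNELS, HEIGHT, WIDTH = 8, 3, 224, 224
BYTES_PER_ELEMENT = 4    # assumption: float32 output
MEASURED_SECONDS = 0.5   # ~500 ms reported for .cpu()

num_elements = BATCH * CHANNELS * HEIGHT * WIDTH
size_bytes = num_elements * BYTES_PER_ELEMENT

# Transfer rate implied if the whole 500 ms were spent copying
implied_rate_mb_s = size_bytes / MEASURED_SECONDS / 1e6

print(f"{size_bytes} bytes ({size_bytes / 2**20:.2f} MiB)")
print(f"implied rate: {implied_rate_mb_s:.1f} MB/s")
# prints:
# 4816896 bytes (4.59 MiB)
# implied rate: 9.6 MB/s
```

Under that assumption the copy would be running at under 10 MB/s for a tensor of less than 5 MiB, which is why the number looks wrong to me.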

Is there a way to reduce this?


Have you maximized the device performance?
Doing so should improve the data transfer performance:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
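
If the time does not change much, it may also be worth checking whether the ~500 ms is really transfer time. CUDA kernels are launched asynchronously, so the forward pass can return before the GPU has finished, and the .cpu() call then waits for the pending work before copying. A profiler such as cProfile would attribute that wait to .cpu(). Below is a minimal sketch of timing the two steps separately with torch.cuda.synchronize(). The small Conv2d model is only a hypothetical stand-in for your segmentation network, and the script falls back to the CPU when CUDA is not available.

```python
import time

import torch

# Use the GPU when present; fall back to CPU so the script still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def sync():
    """Wait for all queued GPU work to finish (no-op on CPU)."""
    if device.type == "cuda":
        torch.cuda.synchronize()


# Hypothetical stand-in for the real segmentation network.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1).to(device).eval()
batch = torch.randn(8, 3, 224, 224, device=device)

with torch.no_grad():
    model(batch)  # warm-up run, excluded from the timing
    sync()

    t0 = time.perf_counter()
    result = model(batch)
    sync()  # make sure inference has really finished
    t1 = time.perf_counter()

    result_cpu = result.cpu()  # only the copy is timed here
    t2 = time.perf_counter()

print(f"inference: {(t1 - t0) * 1e3:.1f} ms")
print(f"transfer:  {(t2 - t1) * 1e3:.1f} ms")
```

If most of the time shows up under "inference" rather than "transfer", the bottleneck is the network itself and not the GPU to CPU copy.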

