GPU not utilized when using PyTorch and torch2trt

Hi, I installed trt_pose and ran the sample Python code
(NVIDIA Jetson: JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT).
However, it took too long to run.
I checked GPU usage with jtop while the code was running,
and found that the Jetson Nano doesn’t use the GPU while it runs.

I’m using:
Python 3.6.9
PyTorch 1.8
torchvision 0.9.0
JetPack 4.6
CUDA 10.2.300
OpenCV 4.5.4 (compiled with CUDA: YES)
TensorRT 8.0.1.6
8 GB of swap memory

When I measured the running time with the Python code below,

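import time
import torch
from torch2trt import TRTModule

# OPTIMIZED_MODEL and data are assumed to be defined earlier in the sample
model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()

print(50.0 / (t1 - t0))  # iterations per second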

it reported a value of 5 to 6 iterations per second.

I think it should be more than 10,
but it’s lower than that.

I think that’s because it’s not using the GPU.

How can I solve this problem?
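
For reference, here is a minimal sketch of the same throughput measurement with a few warm-up calls excluded from the timing, since the first calls can include one-time initialization. The helper name measure_throughput and its parameters are hypothetical and not part of trt_pose or torch2trt; the synchronize argument is assumed to be something like torch.cuda.current_stream().synchronize.

```python
import time

def measure_throughput(run_once, synchronize=None, iterations=50, warmup=10):
    """Return calls per second of run_once, excluding warm-up calls."""
    sync = synchronize if synchronize is not None else (lambda: None)
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    sync()  # wait for pending work before starting the clock
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    sync()  # wait for queued work to finish before stopping the clock
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return iterations / elapsed

# Hypothetical usage with the names from the code above:
#   fps = measure_throughput(lambda: model_trt(data),
#                            synchronize=torch.cuda.current_stream().synchronize)
```

If the number reported this way is clearly higher than the one from the original loop, part of the gap is start-up cost rather than steady-state inference time.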

Hi,

First, have you maximized the device performance?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Next, do you need the swap memory to make inference work?
Since swap memory is backed by disk, it may add some overhead to the pipeline.
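
One rough way to see whether swap is actually in use while the sample runs is to read /proc/meminfo. The sketch below assumes a Linux system with the standard SwapTotal and SwapFree fields; the function names are hypothetical.

```python
def swap_used_kib(meminfo_text):
    """Return swap in use (KiB) from text in /proc/meminfo format."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # first token is the numeric value
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)

def read_swap_used_kib(path="/proc/meminfo"):
    """Read the live value on a Linux system."""
    with open(path) as f:
        return swap_used_kib(f.read())
```

Calling read_swap_used_kib() before and during inference gives a quick indication of whether the model is spilling into swap.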

Thanks.

Hi, thank you for replying.
After I maximized the performance and disabled the extra swap memory,
it got a little better, but it is still too slow.

The benchmark score estimated with the code below

t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))

is now 7 to 8, but I think that is still low,
and when I check with jtop,
it still shows no GPU usage.

Hi,

Thanks for reporting this.

We are going to reproduce this internally
and will share more information later.

Hi,

We tested the trt_pose example on JetPack 4.6 with a Nano.
It seems that the torch2trt library requires TensorRT 7.1 for plugin compatibility.

Did you run the sample on JetPack 4.6?
If so, have you modified the plugin implementations (GroupNormPlugin and InterpolatePlugin)?
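
To confirm which TensorRT release the Python environment sees, a small check like the following may help. It is only a sketch: the helper names are hypothetical, and the default of (7, 1) simply reflects the version mentioned above, not a documented torch2trt requirement.

```python
def parse_version(version):
    """Turn a dotted version string such as '8.0.1.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def matches_release(version, required=(7, 1)):
    """True if the leading components of version equal required."""
    return parse_version(version)[:len(required)] == tuple(required)

def installed_tensorrt_version():
    """Return the installed TensorRT version string, or None if not importable."""
    try:
        import tensorrt
    except ImportError:
        return None
    return tensorrt.__version__
```

For the setup in this thread, matches_release("8.0.1.6") is False, which is consistent with the mismatch described above.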

Thanks
