Hello,
I’m using a Jetson AGX Xavier with TensorRT to run object detection with RetinaNet for my master’s thesis. Unfortunately, I’m facing a couple of issues. My pipeline is mmdetection → ONNX → TensorRT 6, and all tests run in FP16 mode with batch size 1.
I’m having trouble getting above 50 fps, no matter how I adjust the image size, detection head, or backbone. After some research I found that the GPU is clocked down during image loading and the clock doesn’t ramp back up fast enough for the inference. The two workarounds I have tested so far are either running inference on the same image multiple times and timing only the last pass, or setting a minimum GPU frequency via /sys/devices/17000000.gv11b/devfreq/17000000.gv11b/min_freq. The first workaround makes the evaluation really slow. With the second, I’m not sure whether the 30 W limit is still enforced: can the power consumption exceed 30 W in this case? Is there any other way to ramp up the GPU clock faster when inference starts?
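For reference, the first workaround looks roughly like this (a minimal sketch; `infer` is a placeholder for my actual TensorRT execution call, not a real API):

```python
import time

def timed_inference(infer, image, warmup=10):
    """Run `infer` several times so the GPU clock ramps up,
    then time only the final pass. `infer` is a placeholder
    for the actual TensorRT execution call."""
    for _ in range(warmup):
        infer(image)
    start = time.perf_counter()
    result = infer(image)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

This gives stable timings, but repeating every image `warmup + 1` times is what makes the full evaluation so slow.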
Is there some way to reliably measure the power consumption? I have seen the /sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0/in_power*_input files in the Jetson documentation (Welcome — Jetson Linux Developer Guide 34.1 documentation), which should report the power consumption of the GPU, CPU, and SOC in mW. However, during inference in 30 W mode with default clock settings, the three files show about 5000 in total. 5 W seems far too little. So, are the values measured in units of 10 mW instead of 1 mW? But then 50 W would be too much for the 30 W mode.
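For completeness, this is roughly how I read and sum the three rails (a minimal sketch; that the three channels map to GPU, CPU, and SOC, and that the unit is mW, are my assumptions from the documentation):

```python
from pathlib import Path

# INA3221 power-monitor node on the AGX Xavier (path from the docs).
# Assumption: the in_power*_input channels are GPU, CPU, and SOC,
# each reported as an integer, nominally in milliwatts.
POWER_DIR = Path("/sys/bus/i2c/drivers/ina3221x/1-0040/iio:device0")

def read_total_power_mw(power_dir=POWER_DIR):
    """Sum all in_power*_input readings (nominally milliwatts)."""
    total = 0
    for f in sorted(power_dir.glob("in_power*_input")):
        total += int(f.read_text().strip())
    return total
```

During inference this returns about 5000, which is where my unit confusion comes from.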
Is there any more detailed documentation on what is meant by the “optimized power budget” in the documentation (Welcome — Jetson Linux Developer Guide 34.1 documentation)? In particular, I would like to know how the budget is shared between the CPU and GPU.
Thank you in advance for any support.
Environment
TensorRT Version: 6.0.1
GPU Type: Volta (Jetson AGX Xavier)
Nvidia Driver Version: JetPack 4.3
CUDA Version: 10.0
CUDNN Version: 7.6.3
Operating System + Version: JetPack 4.3
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.3.0
Baremetal or Container (if container which image + tag): Baremetal