Extremely slow inference with MMDetection on Jetson Xavier NX

yanisdu94800 · June 15, 2022, 3:33pm

Hi everyone,

I’m currently trying to run a very basic script on my Jetson Xavier NX to do object detection on a video with MMDetection. Whichever model I test, inference takes about 1 second per frame on average (0.7 s for the best one I tried), which is extremely slow and far below the speed advertised on the mmdet website (~50 fps).

I also tested the MMDetection demo scripts (video_demo.py and video_gpuacc.py) and tried converting my mmdet model to a TensorRT model (fp16 and int8 tested), but I still get approximately the same results.

I really don’t know what I’m missing …

Please note that I previously worked with YOLOv3 on Darknet and had no such problem.
My code can be seen below.

Environment

Python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
CUDA available: True
GPU 0: Xavier
CUDA_HOME: /usr/local/cuda-11.4
NVCC: Cuda compilation tools, release 11.4, V11.4.166
GCC: aarch64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 11.4
  - NVCC architecture flags: -gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_87,code=sm_87
  - CuDNN 8.3.2
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.11.1
OpenCV: 4.5.5
MMCV: 1.5.2
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.4
MMDetection: 2.25.0+ca11860

My code

from mmdet.apis import init_detector, inference_detector
import mmcv
import cv2

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
# Build the detector once and load the checkpoint onto the GPU
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# wrap_fp16_model(model)

def main():
    video_reader = mmcv.VideoReader("/home/thalesgroup/Thales/medias/video_sample.mp4")
    # Create the resizable display window once, not on every frame
    cv2.namedWindow('Processed video', 0)

    for frame in mmcv.track_iter_progress(video_reader):
        # Run detection on the frame, then draw the results on it
        result = inference_detector(model, frame)
        frame = model.show_result(frame, result)
        # Display the annotated frame, waiting 1 ms for key events
        mmcv.imshow(frame, 'Processed video', 1)

if __name__ == '__main__':
    main()

Any help or ideas are welcome, thanks!
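
Since the loop above decodes, infers, draws and displays each frame in sequence, the 1 s per frame could come from any of those stages. A minimal sketch of a per-stage timer follows. The helper name `time_stages`, its parameters and the stand-in stages are hypothetical and not part of MMDetection; in the script above the stages would wrap `inference_detector`, `model.show_result` and `mmcv.imshow`. The first frames are skipped because the first CUDA calls usually include one-off initialisation cost.

```python
import time
from collections import defaultdict


def time_stages(frames, stages, warmup=2, clock=time.perf_counter):
    """Return mean seconds per frame for each named stage.

    `stages` is an ordered list of (name, callable) pairs; each callable
    receives the previous stage's output. The first `warmup` frames are
    processed but excluded from the averages.
    """
    totals = defaultdict(float)
    counted = 0
    for index, item in enumerate(frames):
        for name, stage in stages:
            start = clock()
            item = stage(item)
            elapsed = clock() - start
            if index >= warmup:
                totals[name] += elapsed
        if index >= warmup:
            counted += 1
    if counted == 0:
        return {}
    return {name: total / counted for name, total in totals.items()}


if __name__ == "__main__":
    # Stand-in stages that only sleep; replace them with the real calls.
    def fake_inference(frame):
        time.sleep(0.002)
        return frame

    def fake_display(frame):
        time.sleep(0.001)
        return frame

    report = time_stages(range(10), [("inference", fake_inference),
                                     ("display", fake_display)])
    for name, seconds in report.items():
        print(f"{name}: {seconds * 1000:.1f} ms/frame")
```

If most of the time turns out to be in the draw or display stage rather than in inference, the model itself may not be the bottleneck.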

AastaLLL · June 16, 2022, 2:49am

Hi,

Have you maximized the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

yanisdu94800 · June 16, 2022, 7:44am

Hello! Thanks for your answer. I just tried it, but it doesn’t change anything...

AastaLLL · June 17, 2022, 2:33am

Hi,

Could you share how you convert the model into TensorRT?
Do you use the flow .pth -> .onnx -> .trt?

Thanks.

yanisdu94800 · June 17, 2022, 7:50am

Hi, I’m using a project that simplifies the conversion from an MMDet model to a TensorRT model:

I’m not an expert on the subject, I’ve only just begun, so I don’t know if this really works...
Is there another simple way to convert a model to TensorRT (preferably int8)?
Thanks a lot for your help.

yanisdu94800 · June 21, 2022, 8:48am

Edit:
I tried Detectron2 models and got similar results (~1 FPS), far below what is expected. I checked GPU usage with tegrastats and the GPU does seem to be used (it peaks at 100% every second).
Thanks to anyone who will take the time to help me.
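
A peak at 100% once per second is not the same as a GPU that is busy all the time. A minimal sketch of how logged tegrastats output could be summarised to tell the two apart is shown below. It assumes the GPU load appears in a `GR3D_FREQ <n>%` field, optionally followed by `@<MHz>`; the sample lines are shortened illustrations written for this example, not output captured from the device in this thread, and the function names are hypothetical.

```python
import re
from statistics import mean

# Matches the GPU load field, with or without a trailing "@<MHz>" part.
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_loads(lines):
    """Extract GPU utilisation percentages from tegrastats output lines."""
    loads = []
    for line in lines:
        match = GR3D_PATTERN.search(line)
        if match:
            loads.append(int(match.group(1)))
    return loads


def summarize(loads, busy_threshold=90):
    """Return (mean load, fraction of samples at or above busy_threshold)."""
    if not loads:
        return 0.0, 0.0
    busy = sum(1 for load in loads if load >= busy_threshold)
    return mean(loads), busy / len(loads)


# Illustrative, shortened lines (assumed format, not real measurements).
SAMPLE_LINES = [
    "RAM 3100/7773MB CPU [20%@1420,15%@1420] EMC_FREQ 5% GR3D_FREQ 100%@1109",
    "RAM 3100/7773MB CPU [85%@1420,10%@1420] EMC_FREQ 1% GR3D_FREQ 0%@1109",
    "RAM 3101/7773MB CPU [90%@1420,12%@1420] EMC_FREQ 1% GR3D_FREQ 3%@1109",
    "RAM 3101/7773MB CPU [88%@1420,11%@1420] EMC_FREQ 1% GR3D_FREQ 0%@1109",
]

if __name__ == "__main__":
    average, busy_fraction = summarize(gpu_loads(SAMPLE_LINES))
    print(f"mean GPU load: {average:.1f}%, busy samples: {busy_fraction:.0%}")
```

A low mean with only occasional busy samples would suggest the GPU is mostly waiting on something else, such as CPU-side pre- or post-processing; a consistently high mean would point back at the model or its precision settings.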

AastaLLL · June 27, 2022, 4:37am

Hi,

We also have a sample for running Detectron2 inference with TensorRT.
Would you mind giving it a try to see if the performance improves?

Thanks.

system · July 20, 2022, 2:05am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
