Hi everyone,
I’m currently trying to run a very basic script on my Jetson Xavier NX to do object detection on a video with MMDetection. Whatever model I test, inference takes about 1 second per frame on average (0.7 s for the best one I checked), which is extremely slow and far below the speed advertised on the mmdet website (~50 fps).
I also tested the MMDetection demo scripts (video_demo.py and video_gpuacc.py) and tried converting my mmdet model to a TensorRT model (fp16 and int8 tested), but I still get approximately the same results.
I really don’t know what I’m missing …
Please note that I previously worked on YoloV3 with Darknet and I had no problem like this.
My code can be seen below.
Environment
Python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
CUDA available: True
GPU 0: Xavier
CUDA_HOME: /usr/local/cuda-11.4
NVCC: Cuda compilation tools, release 11.4, V11.4.166
GCC: aarch64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
- GCC 9.4
- C++ Version: 201402
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: NO AVX
- CUDA Runtime 11.4
- NVCC architecture flags: -gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_87,code=sm_87
- CuDNN 8.3.2
- Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.11.1
OpenCV: 4.5.5
MMCV: 1.5.2
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.4
MMDetection: 2.25.0+ca11860
My code
from mmdet.apis import init_detector, inference_detector
import mmcv
import cv2

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# wrap_fp16_model(model)


def main():
    video_reader = mmcv.VideoReader("/home/thalesgroup/Thales/medias/video_sample.mp4")
    for frame in mmcv.track_iter_progress(video_reader):
        result = inference_detector(model, frame)
        frame = model.show_result(frame, result)
        cv2.namedWindow('Processed video', 0)
        mmcv.imshow(frame, 'Processed video', 1)


if __name__ == '__main__':
    main()
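To narrow down where the time per frame actually goes, the loop above could be split into separately timed stages (inference, drawing, display). The following is only a minimal sketch: `time_stages`, the `state` dict and the `sync` parameter are hypothetical names introduced here for illustration, not part of MMDetection or mmcv. The first few frames are skipped because they typically include one-time CUDA/cuDNN initialization, and passing `torch.cuda.synchronize` as `sync` should ensure that queued GPU work is counted in the stage that launched it.

```python
import time
from collections import defaultdict


def time_stages(frames, stages, warmup=5, sync=None):
    """Return the mean time in seconds per frame for each named stage.

    frames : iterable of inputs (e.g. video frames)
    stages : list of (name, fn) pairs; each fn receives a per-frame
             `state` dict (hypothetical helper structure) that starts
             as {"frame": frame} and may be updated in place
    warmup : number of initial frames excluded from the averages
    sync   : optional zero-argument callable run before each clock read,
             e.g. torch.cuda.synchronize, so pending GPU work is included
    """
    totals = defaultdict(float)
    counted = 0
    for i, frame in enumerate(frames):
        state = {"frame": frame}
        for name, fn in stages:
            if sync is not None:
                sync()
            start = time.perf_counter()
            fn(state)
            if sync is not None:
                sync()
            elapsed = time.perf_counter() - start
            if i >= warmup:
                totals[name] += elapsed
        if i >= warmup:
            counted += 1
    if counted == 0:
        return {}
    return {name: total / counted for name, total in totals.items()}


# Possible usage with the script above (assumes `model` and `video_reader`
# already exist, and that torch is importable):
#
#   import torch
#   stages = [
#       ("inference", lambda s: s.update(result=inference_detector(model, s["frame"]))),
#       ("draw", lambda s: s.update(frame=model.show_result(s["frame"], s["result"]))),
#       ("display", lambda s: mmcv.imshow(s["frame"], 'Processed video', 1)),
#   ]
#   print(time_stages(video_reader, stages, sync=torch.cuda.synchronize))
```

If the "inference" figure alone is close to 1 second, the bottleneck is the model itself; if "draw" or "display" dominate, the slowdown comes from the visualization rather than the detector. Video decoding happens inside the frame iterator and is not measured by this sketch.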
Any help or ideas are welcome, thanks!