Darknet slower using Jetpack 4.4 (cuDNN 8.0.0 / CUDA 10.2) than Jetpack 4.3 (cuDNN 7.6.3 / CUDA 10.0)

Hi

This might be an issue with darknet itself, so I’ve also opened an issue there: https://github.com/AlexeyAB/darknet/issues/5426 . If it turns out to be a darknet problem, just let me know and I will close this topic.

Problem

When I run darknet with yolov3-tiny on my Jetson Nano with the latest JetPack 4.4 DP, I get worse performance than with JetPack 4.3. My guess is that it is related to cuDNN 8.0.0 vs 7.x.

I get 6.6 FPS with Jetpack 4.4 DP whereas I get 16.3 FPS with Jetpack 4.3

Complete procedure to reproduce

jetpack 4.4 (cuDNN: 8.0.0 CUDA 10.2) with CUDNN=1 flag:

darknet build with:

GPU=1
CUDNN=1
OPENCV=1
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]
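For anyone repeating the comparison across several flag combinations, the build flags above can be switched by script instead of by hand. This is only a minimal sketch: `set_make_flags` is a hypothetical helper, and the sample text is a stand-in for the top of a Makefile, not a copy of the real darknet one, so check it against your own checkout.

```python
import re

def set_make_flags(makefile_text, settings):
    """Return makefile_text with each `NAME=...` line replaced by `NAME=value`.

    Hypothetical helper: it assumes every flag sits on its own line,
    starting at column 0, in the form NAME=value.
    """
    for name, value in settings.items():
        pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
        makefile_text, count = pattern.subn(f"{name}={value}", makefile_text)
        if count == 0:
            raise KeyError(f"{name} not found in Makefile text")
    return makefile_text

# Stand-in Makefile header (an assumption, not the real file).
sample = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\n"
updated = set_make_flags(sample, {"GPU": 1, "CUDNN": 1, "OPENCV": 1})
print(updated)
```

The pattern requires `=` directly after the flag name, so setting `CUDNN` leaves a `CUDNN_HALF` line untouched.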

result for yolov3-tiny ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights <videoinput>

CUDA-version: 10020 (10020), cuDNN: 8.0.0, GPU count: 1  
OpenCV version: 4.1.1

FPS:6.3
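The number after `CUDA-version:` is CUDA's integer version encoding (1000 × major + 10 × minor), so 10020 means CUDA 10.2 and 10000 means CUDA 10.0. A small sketch for decoding the header lines quoted in this post; `parse_header` is a hypothetical helper and the regular expressions only assume the line format shown here.

```python
import re

def decode_cuda_version(code):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return code // 1000, (code % 1000) // 10

def parse_header(line):
    """Extract the CUDA version and, if present, the cuDNN version from a header line."""
    cuda = re.search(r"CUDA-version:\s*(\d+)", line)
    cudnn = re.search(r"cuDNN:\s*([\d.]+)", line)
    return {
        "cuda": decode_cuda_version(int(cuda.group(1))) if cuda else None,
        "cudnn": cudnn.group(1) if cudnn else None,
    }

with_cudnn = parse_header("CUDA-version: 10020 (10020), cuDNN: 8.0.0, GPU count: 1")
without_cudnn = parse_header("CUDA-version: 10020 (10020), GPU count: 1")
print(with_cudnn)
print(without_cudnn)
```

A header without a `cuDNN:` field yields `None` for that entry, which makes it easy to check whether a given binary was really built with CUDNN=1.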

cuDNN: 8.0.0 CUDA 10.2 with CUDNN=0 flag (jetpack 4.4):

darknet build with:

GPU=1
CUDNN=0
OPENCV=1
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]

result for yolov3-tiny ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights <videoinput>

CUDA-version: 10020 (10020), GPU count: 1  
OpenCV version: 4.1.1

FPS:13.3

cuDNN: 7.6.3 CUDA 10 with CUDNN=1 flag (jetpack 4.4):

GPU=1
CUDNN=1
OPENCV=1
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]

result for yolov3-tiny ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights <videoinput>

CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1  
OpenCV version: 4.1.1

FPS:16.4

cuDNN: 7.6.3 CUDA 10 with CUDNN=0 flag (jetpack 4.3):

GPU=1
CUDNN=0
OPENCV=1
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]

result for yolov3-tiny ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights <videoinput>

CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1  
OpenCV version: 4.1.1

FPS:13.3
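To put the four runs side by side, the FPS figures from the logs above can be compared against a single baseline. This is a minimal sketch: the values are the per-run numbers quoted above, the runs are keyed by the cuDNN/CUDA versions reported in each log, and picking the cuDNN 7.6.3 run with CUDNN=1 as the baseline is an arbitrary choice.

```python
# FPS values copied from the runs above, keyed by (library stack, build flag).
results = {
    ("cuDNN 8.0.0 / CUDA 10.2", "CUDNN=1"): 6.3,
    ("cuDNN 8.0.0 / CUDA 10.2", "CUDNN=0"): 13.3,
    ("cuDNN 7.6.3 / CUDA 10.0", "CUDNN=1"): 16.4,
    ("cuDNN 7.6.3 / CUDA 10.0", "CUDNN=0"): 13.3,
}

def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0

# Baseline: the fastest configuration in this post (assumed choice).
baseline = results[("cuDNN 7.6.3 / CUDA 10.0", "CUDNN=1")]

for (stack, flag), fps in results.items():
    change = percent_change(fps, baseline)
    print(f"{stack:24s} {flag}: {fps:5.1f} FPS ({change:+.1f}% vs baseline)")
```

With these numbers the cuDNN 8.0.0 build with CUDNN=1 comes out roughly 62% below the baseline, while both CUDNN=0 builds are identical to each other, which points at cuDNN rather than CUDA itself.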

Thanks

Hi,

Thanks for your report.
We are going to try to reproduce this issue and will update you with more information later.

Thanks.

Hi,

May I know the setting of this experiment:

cuDNN: 7.6.3 CUDA 10 with CUDNN=1 flag (jetpack 4.4):

How did you install cuDNN 7.6.3 and CUDA 10.0 with JetPack 4.4?
Thanks.

Hi, Thanks.

This is a mistake, I meant JetPack 4.3. Somehow I can’t edit the post.

cuDNN: 7.6.3 CUDA 10 with CUDNN=1 flag (jetpack 4.3):

Basically, I flashed JetPack 4.3 and ran the tests, then erased everything, flashed JetPack 4.4, and ran the same tests.

Hi,

So the performance without cuDNN is similar,
but with cuDNN it drops from 16 FPS to 6 FPS. Is that correct?

We are trying to reproduce this issue
and will update you with more information later.

Thanks.

Yes, this is what I experienced: a 10 FPS drop.

Many thanks!

Hi,

Just want to keep you updated.

This issue can be reproduced in our environment with Xavier.
We are going to check which cuDNN call causes the performance drop.

Thanks.

Good to know! Good luck! No problem, for now I’m sticking with JetPack 4.3 ;-)

Hi, I plan to test darknet on a Xavier NX. I have found that only JetPack 4.4 DP is available for the Xavier NX. Is it possible that this issue also applies to the Xavier NX?

Hi,

We are working on this.
Will keep this topic updated once we solve this.

Thanks.

I am likewise experiencing a ~40% performance drop using a different detection model, with an EfficientNet backbone and a customized Retina head.

Same problem with PyTorch 1.4.
I followed this page to install PyTorch:

I am testing YOLOv5:

and found that the inference time on JetPack 4.4 is about 0.25 s,
while on JetPack 4.3 it is about 0.14 s.
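For comparison with the FPS figures earlier in the thread, these latencies can be converted to throughput. A minimal sketch, assuming each figure is the time for a single image (batch size 1), which the post does not state explicitly:

```python
def fps_from_latency(seconds_per_frame):
    """Throughput in frames per second for a given per-frame latency in seconds."""
    return 1.0 / seconds_per_frame

# Inference times quoted above, in seconds (assumed to be per image).
latency_jp44 = 0.25
latency_jp43 = 0.14

# How many times longer one inference takes on JetPack 4.4.
slowdown = latency_jp44 / latency_jp43

print(f"JetPack 4.4: {fps_from_latency(latency_jp44):.1f} FPS")
print(f"JetPack 4.3: {fps_from_latency(latency_jp43):.1f} FPS")
print(f"Slowdown: {slowdown:.2f}x")
```

Under that assumption, JetPack 4.4 takes close to 1.8 times as long per inference, which is the same direction as the darknet results, though a smaller gap.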