YOLOv3 on Jetson Nano with GPU=1 really slow and freezes

Hi everyone!

I just received my Jetson Nano and wanted to get YOLOv3 running! But I can’t get it to work yet and I’d appreciate some help. Below I detail what I did, and there is more detail on my setup at the end.

{Installation instructions}

After following the {Setup Details} (see them at the end of this post), I followed the setup instructions at
https://pjreddie.com/darknet/

I compiled with the original Makefile and ran YOLOv3 on the test image:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

It runs, but slowly (~90 seconds).

After that, I changed the Makefile (GPU=1, CUDNN=1, OPENCV=1) and recompiled to use the GPU. I ran the same test:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

but this time it is extremely slow, and it freezes and restarts (or the process gets killed) at layer 9.
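That freeze-then-kill pattern while the network is loading often points to the kernel’s out-of-memory (OOM) killer. One quick way to check after darknet dies (a sketch; reading the kernel log may require sudo depending on kernel settings):

```shell
# Look for OOM-killer activity in the kernel log after darknet dies.
# (dmesg may require sudo on some systems.)
dmesg 2>/dev/null | grep -iE "out of memory|oom-kill|killed process" \
  || echo "no OOM events found in the kernel log"
```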

I read a lot about possible causes:

  • I changed the ARCH setting in the Makefile, to fix a possible incompatibility between CUDA and the GPU:
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53] \
      -gencode arch=compute_62,code=[sm_62,compute_62]

  • I applied the fix for a minor known darknet issue:
https://github.com/pjreddie/darknet/issues/1141
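For context on that ARCH change: the Nano’s integrated GPU is a 128-core Maxwell part with compute capability 5.3, so for the Nano alone the first gencode line should be sufficient (the compute_62 entry targets the TX2’s Pascal GPU; keeping it only lengthens the build). A minimal Makefile fragment, assuming a Nano-only build:

```makefile
# Minimal ARCH for the Jetson Nano's Maxwell GPU (compute capability 5.3)
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]
```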

{Setup Details}
  • nv-jetson-nano-sd-card-image-r32.2.1
  • I’m using a 5V/4A power supply

  • I ran install_basics.sh to add the CUDA directories to the PATH and LD_LIBRARY_PATH variables:

https://jkjung-avt.github.io/setting-up-nano/

  • I added 8 GB of swap memory:

$ sudo fallocate -l 8G /mnt/8GB.swap
$ sudo mkswap /mnt/8GB.swap
$ sudo swapon /mnt/8GB.swap
To make the swap persistent across reboots, I added this line to /etc/fstab:

/mnt/8GB.swap none swap sw 0 0

  • I switched to high-power mode:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
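As a sanity check that the swap and power-mode steps took effect (a quick sketch; nvpmodel only exists on Jetson boards):

```shell
# Confirm the swap file is active and check overall memory headroom
free -h
swapon --show
# Query the current power mode (Jetson-only tool; mode 0 is MAXN)
command -v nvpmodel >/dev/null && sudo nvpmodel -q \
  || echo "nvpmodel not found (not running on a Jetson)"
```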

hello eimarinb.telefonica,

please also check Topic 1050377 for the deep learning inference benchmark instructions.
thanks

Thanks, @JerryChang!
A couple of updates:

SSD-Mobilenet-V2
I couldn’t follow step 2: I tried to apply the patch from the terminal, but it wouldn’t apply, so I tried the next steps anyway and got an error. Can you walk me through how to apply the patch?

Image classification

ResNet-50
Average over 10 runs is 28.1001 ms

Inception V4
Average over 10 runs is 96.5391 ms

VGG-19
Average over 10 runs is 101.042 ms

U-Net Segmentation
error

Pose Estimation
Average over 10 runs is 71.6836 ms

Tiny YOLOv3
Inference time per image: 31.2 ms (for 5 test images provided)

Hi,

Sorry for the late update.
AFAIK, Darknet uses OpenCV as the camera interface, which is slow due to its CPU-memory implementation.

We have several YOLO samples for Jetson systems, and they are worth trying first:
Pure TensorRT: /usr/src/tensorrt/samples/python/yolov3_onnx/
Integrated with Deepstream: /opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo
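For the pure-TensorRT sample, the typical flow is roughly the following (a sketch; the script names are from the JetPack 4.x TensorRT sample directory, so please verify them locally):

```shell
# Run the pure-TensorRT YOLOv3 sample if it is installed
# (script names as shipped in the JetPack 4.x TensorRT samples; verify locally)
SAMPLE=/usr/src/tensorrt/samples/python/yolov3_onnx
if [ -d "$SAMPLE" ]; then
  cd "$SAMPLE"
  python3 yolov3_to_onnx.py      # convert Darknet cfg/weights to ONNX
  python3 onnx_to_tensorrt.py    # build a TensorRT engine and run inference
else
  echo "TensorRT YOLOv3 sample not found at $SAMPLE"
fi
```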

For your reference, here is a discussion about reaching 20 FPS with the YOLOv3 model on the Nano:
https://devtalk.nvidia.com/default/topic/1064871/deepstream-sdk/deepstream-gst-nvstreammux-change-width-and-height-doesn-t-affect-fps/post/5392823/#5392823

Thanks.

Thanks! I’ll check it out now.