Hi everyone!

I just received my Jetson Nano and wanted to get YOLOv3 running! But I can’t get it to work yet and I’d appreciate some help. I describe what I did below, and add more detail on my setup at the end.

{Installation instructions}

After following the {Setup Details} (see the end of the post), I followed the setup instructions on

I compiled with the original Makefile and ran YOLOv3 on the test image:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

It runs slowly (~90 seconds)

After that, I changed the Makefile (GPU=1, CUDNN=1, OPENCV=1) and recompiled to use the GPU. I ran the same test:

./darknet detect cfg/yolov3.cfg yolov3.weights

but this time it is extremely slow, and it freezes and restarts (or the process gets killed) at layer 9.
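For reference, the Makefile change (GPU, CUDNN and OPENCV set to 1) can be scripted instead of edited by hand. This is only a minimal sketch: `enable_gpu_flags` is a made-up helper name, and it assumes the stock darknet Makefile layout where each flag sits on its own line as `NAME=0`, and GNU sed (for `-i`).

```shell
# Hypothetical helper, not part of darknet: flip the three build flags in place.
# Assumes lines of the form GPU=0, CUDNN=0, OPENCV=0 at the top of the Makefile.
enable_gpu_flags() {
    makefile="$1"
    sed -i \
        -e 's/^GPU=0/GPU=1/' \
        -e 's/^CUDNN=0/CUDNN=1/' \
        -e 's/^OPENCV=0/OPENCV=1/' \
        "$makefile"
}

# Usage, from the darknet source directory:
#   enable_gpu_flags Makefile
#   make clean && make
```

Running `make clean` before `make` avoids mixing objects from the CPU-only build with the GPU build.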

I read a lot on possible causes:

  • I changed this, to fix possible incompatibility between CUDA and the GPU
ARCH= -gencode arch=compute_53,code=[sm_53,compute_53] \
      -gencode arch=compute_62,code=[sm_62,compute_62]

-I fixed a possible minor issue in darknet
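Since the GPU run freezes or gets killed at layer 9, one more cause that may be worth ruling out is the kernel's out-of-memory killer. The sketch below is only a guess at a useful check: `was_oom_killed` is a hypothetical helper, and the patterns it looks for are the usual Linux OOM messages, which can differ between kernel versions.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds if an
# OOM-killer message mentions the given process name.
was_oom_killed() {
    grep -i -E 'out of memory|oom-kill|killed process' | grep -q "$1"
}

# Usage right after the crash (dmesg may need sudo on some setups):
#   dmesg | was_oom_killed darknet && echo "darknet was killed by the OOM killer"
```

If the check matches, memory pressure rather than the CUDA architecture flags would be the more likely culprit.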

{Setup Details}
-I’m using a 5V/4A charger

    • I ran install_basics.sh to add the CUDA paths to the PATH and LD_LIBRARY_PATH variables

-I added 8 GB of swap memory

$ sudo fallocate -l 8G /mnt/8GB.swap
$ sudo mkswap /mnt/8GB.swap
$ sudo swapon /mnt/8GB.swap
$ echo "/mnt/8GB.swap none swap sw 0 0" | sudo tee -a /etc/fstab
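To confirm the swap file is actually active, the total can be read back from `/proc/swaps`. A small sketch, assuming the standard `/proc/swaps` layout (header line, then one row per swap area with the size in KiB in the third column); `swap_total_kib` is a made-up name.

```shell
# Hypothetical helper: sum the Size column (KiB) of a /proc/swaps-style file,
# skipping the header line. Defaults to the real /proc/swaps.
swap_total_kib() {
    awk 'NR > 1 { total += $3 } END { print total + 0 }' "${1:-/proc/swaps}"
}

# Usage:
#   swap_total_kib        # roughly 8388604 for an 8G swap file
#   swapon --show         # human-readable view of the same data
#   free -h
```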

-I switched to high power mode

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
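The power mode can be read back with `nvpmodel -q` to make sure the switch took effect. The parsing below is a sketch that assumes the query output contains a line of the form `NV Power Mode: <name>`; `current_power_mode` is a hypothetical helper.

```shell
# Hypothetical helper: extract the mode name from `nvpmodel -q` style output
# on stdin (assumes a line like "NV Power Mode: MAXN").
current_power_mode() {
    sed -n 's/^NV Power Mode: *//p'
}

# Usage on the Nano:
#   sudo nvpmodel -q | current_power_mode
#   sudo jetson_clocks --show
```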

Please also check Topic 1050377 for instructions on the deep learning inference benchmark.

Thanks!
A couple of updates

I couldn’t follow step 2. I tried to patch the file via the terminal, but couldn’t apply the patch. I tried the next steps anyway and got an error. Can you walk me through how to apply the patch?

Image classification

Average over 10 runs is 28.1001 ms

Inception V4
Average over 10 runs is 96.5391 ms

Average over 10 runs is 101.042 ms

U-Net Segmentation

Pose Estimation
Average over 10 runs is 71.6836 ms

Tiny YOLOv3
Inference time per image: 31.2 ms (for 5 test images provided)
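To compare these latencies with frame-rate targets, a per-image time in milliseconds converts to frames per second as 1000 / ms. A quick sketch (`ms_to_fps` is just an illustrative name), which ignores any pre/post-processing or camera overhead:

```shell
# Hypothetical helper: convert a per-image latency in ms to frames per second.
ms_to_fps() {
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'
}

ms_to_fps 31.2   # prints 32.1
```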


AFAIK, Darknet uses OpenCV as the camera interface, which is slow because frames go through CPU memory.

We have several YOLO samples for Jetson systems that are worth trying first:
Pure TensorRT: /usr/src/tensorrt/samples/python/yolov3_onnx/
Integrated with Deepstream: /opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo
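Whether these sample directories are present depends on which JetPack components are installed, so it may help to check before going further. A minimal sketch; `check_sample_dirs` is a hypothetical helper and the two paths are the ones quoted above.

```shell
# Hypothetical helper: report which of the given directories exist.
check_sample_dirs() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "found: $d"
        else
            echo "missing: $d"
        fi
    done
}

check_sample_dirs \
    /usr/src/tensorrt/samples/python/yolov3_onnx \
    /opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo
```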

Here is a discussion on reaching 20 FPS with a YOLOv3 model on Nano, for your reference:


Thanks! I’ll check it out now.