[Please help / Beginner] Person detection on Jetson Nano in images / performance / general

Sorry, I am a total beginner. I just received my Jetson Nano today.

What I want to do is detect persons in a given picture.

I want to start/run the detection from the Linux console. So far I am doing it like this:

> ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

The result is:
CUDA-version: 10020 (10020), cuDNN: 8.0.0, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV isn’t used - data augmentation will be slow
0 : compute_capability = 530, cudnn_half = 0, GPU: NVIDIA Tegra X1
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 320 x 320 x 3 -> 320 x 320 x 32 0.177 BF
1 conv 64 3 x 3/ 2 320 x 320 x 32 -> 160 x 160 x 64 0.944 BF
2 conv 32 1 x 1/ 1 160 x 160 x 64 -> 160 x 160 x 32 0.105 BF
3 conv 64 3 x 3/ 1 160 x 160 x 32 -> 160 x 160 x 64 0.944 BF
4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 160 x 160 x 64 0.002 BF
5 conv 128 3 x 3/ 2 160 x 160 x 64 -> 80 x 80 x 128 0.944 BF
6 conv 64 1 x 1/ 1 80 x 80 x 128 -> 80 x 80 x 64 0.105 BF
7 conv 128 3 x 3/ 1 80 x 80 x 64 -> 80 x 80 x 128 0.944 BF
8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 80 x 80 x 128 0.001 BF
9 conv 64 1 x 1/ 1 80 x 80 x 128 -> 80 x 80 x 64 0.105 BF
10 conv 128 3 x 3/ 1 80 x 80 x 64 -> 80 x 80 x 128 0.944 BF
11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 80 x 80 x 128 0.001 BF
12 conv 256 3 x 3/ 2 80 x 80 x 128 -> 40 x 40 x 256 0.944 BF
13 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
14 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
16 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
17 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
19 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
20 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
22 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
23 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
25 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
26 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
28 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
29 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
31 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
32 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
34 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
35 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 40 x 40 x 256 0.000 BF
37 conv 512 3 x 3/ 2 40 x 40 x 256 -> 20 x 20 x 512 0.944 BF
38 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
39 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
41 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
42 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
44 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
45 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
47 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
48 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
50 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
51 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
53 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
54 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
56 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
57 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
59 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
60 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 20 x 20 x 512 0.000 BF
62 conv 1024 3 x 3/ 2 20 x 20 x 512 -> 10 x 10 x1024 0.944 BF
63 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
64 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 10 x 10 x1024 0.000 BF
66 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
67 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 10 x 10 x1024 0.000 BF
69 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
70 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 10 x 10 x1024 0.000 BF
72 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
73 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 10 x 10 x1024 0.000 BF
75 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
76 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
77 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
78 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
79 conv 512 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 512 0.105 BF
80 conv 1024 3 x 3/ 1 10 x 10 x 512 -> 10 x 10 x1024 0.944 BF
81 conv 255 1 x 1/ 1 10 x 10 x1024 -> 10 x 10 x 255 0.052 BF
82 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
83 route 79 -> 10 x 10 x 512
84 conv 256 1 x 1/ 1 10 x 10 x 512 -> 10 x 10 x 256 0.026 BF
85 upsample 2x 10 x 10 x 256 -> 20 x 20 x 256
86 route 85 61 -> 20 x 20 x 768
87 conv 256 1 x 1/ 1 20 x 20 x 768 -> 20 x 20 x 256 0.157 BF
88 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
89 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
90 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
91 conv 256 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
92 conv 512 3 x 3/ 1 20 x 20 x 256 -> 20 x 20 x 512 0.944 BF
93 conv 255 1 x 1/ 1 20 x 20 x 512 -> 20 x 20 x 255 0.104 BF
94 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
95 route 91 -> 20 x 20 x 256
96 conv 128 1 x 1/ 1 20 x 20 x 256 -> 20 x 20 x 128 0.026 BF
97 upsample 2x 20 x 20 x 128 -> 40 x 40 x 128
98 route 97 36 -> 40 x 40 x 384
99 conv 128 1 x 1/ 1 40 x 40 x 384 -> 40 x 40 x 128 0.157 BF
100 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
101 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
102 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
103 conv 128 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
104 conv 256 3 x 3/ 1 40 x 40 x 128 -> 40 x 40 x 256 0.944 BF
105 conv 255 1 x 1/ 1 40 x 40 x 256 -> 40 x 40 x 255 0.209 BF
106 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 38.981
avg_outputs = 315056
Allocate additional workspace_size = 29.49 MB
Loading weights from yolov3.weights…
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
data/dog.jpg: Predicted in 1251.257000 milli-seconds.
bicycle: 100%
dog: 99%
truck: 89%
car: 31%
Not compiled with OpenCV, saving to predictions.png instead
uwe72@uwe72-desktop:~/darknet$
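Since the goal is only the person class, one option is to post-process the detector's text output instead of changing the network. The AlexeyAB darknet fork (which the log above suggests) has an `-ext_output` flag that prints bounding boxes as text; the helper below is a minimal sketch that filters such output for person lines. The exact output format is an assumption and may differ between darknet versions.

```python
import re

def person_detections(darknet_output):
    """Filter darknet -ext_output text for 'person' detections.

    Expects lines roughly of the form (format assumed from the
    AlexeyAB fork, may vary between versions):
        person: 98%  (left_x:  190   top_y:  100   width:   88   height:  284)
    Returns a list of (confidence_percent, (x, y, w, h)) tuples.
    """
    pattern = re.compile(
        r"person:\s*(\d+)%\s*\(left_x:\s*(-?\d+)\s+top_y:\s*(-?\d+)"
        r"\s+width:\s*(\d+)\s+height:\s*(\d+)\)"
    )
    results = []
    for line in darknet_output.splitlines():
        m = pattern.search(line)
        if m:
            conf, x, y, w, h = map(int, m.groups())
            results.append((conf, (x, y, w, h)))
    return results
```

This keeps the detector itself unchanged; you would pipe the console output of a run like the one above into the script and ignore every class except person.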

It takes around 10 seconds in total.

Is there a way to speed this up?

It would be nice if the whole run only took 1-2 seconds.

Should I follow a different approach in general?

Here is my yolov3.cfg:

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=320
height=320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

Here is part of the Makefile:

GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=0
AVX=0
OPENMP=0
LIBSO=1
ZED_CAMERA=0 # ZED SDK 3.0 and above
ZED_CAMERA_v2_8=0 # ZED SDK 2.X

Is there a way to skip the first part, which takes very long?

Done! Loaded 107 layers from weights-file
Because the actual creation of the prediction file only takes 1.2 seconds, which would be fast enough for me. (data/dog.jpg: Predicted in 1251.257000 milli-seconds.)
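One common way to avoid reloading the weights for every image: when darknet's `detect` is started without an image argument, it loads the network once and then prompts `Enter Image Path:` in a loop, so the slow loading step is paid only once. The sketch below drives that mode from Python by feeding several image paths over stdin; the prompt text and the `-dont_show` flag are assumptions based on the AlexeyAB fork and upstream darknet, and may differ in other versions.

```python
import subprocess

def detect_many(darknet_bin, cfg, weights, image_paths):
    """Run darknet once and feed it several image paths on stdin.

    Without an image argument, darknet loads the weights a single
    time and then prompts 'Enter Image Path:' after each prediction,
    so the ~9 s loading cost is not repeated per image.
    (Flag '-dont_show' assumed from the AlexeyAB fork.)
    """
    proc = subprocess.run(
        [darknet_bin, "detect", cfg, weights, "-dont_show"],
        input="\n".join(image_paths) + "\n",
        capture_output=True,
        text=True,
    )
    return proc.stdout

def split_predictions(output_text):
    """Split the combined stdout into one chunk per image, using the
    'Enter Image Path:' prompt as the separator."""
    chunks = [c.strip() for c in output_text.split("Enter Image Path:")]
    return [c for c in chunks if c]
```

Usage would be something like `split_predictions(detect_many("./darknet", "cfg/yolov3.cfg", "yolov3.weights", ["data/dog.jpg", "data/person.jpg"]))`, giving one prediction block per image while loading the weights only once.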

Hi,

First, please maximize the device performance:

sudo nvpmodel -m 0
sudo jetson_clocks

Then we recommend using our DeepStream sample for better performance:

/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/

You can get around 2 fps for end-to-end usage with a real camera.
If a 416 network input and periodic inference are acceptable, here is a change that reaches 20 fps for YOLOv3 on Nano:

Thanks.
