Thank you @dusty_nv, the documentation is exactly what I was looking for. Sorry I didn’t find it on my own.
The performance is still far from 20-25 FPS.
Here is some information about my setup:
sudo nvpmodel -q
NVPM WARN: fan mode is not set!
NV Power Mode: MAXN
0
sudo jetson_clocks --show
SOC family:tegra210 Machine:NVIDIA Jetson Nano Developer Kit
Online CPUs: 0-3
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu1: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu2: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu3: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
GPU MinFreq=921600000 MaxFreq=921600000 CurrentFreq=921600000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MAXN
My modified detectnet-camera.py:
#!/usr/bin/python
import time
import cv2
from imutils.video import VideoStream
import numpy as np
import jetson.inference
import jetson.utils
import argparse
import sys

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
                                 formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.detectNet.Usage())
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load (see below for options)")
parser.add_argument("--overlay", type=str, default="box,labels,conf", help="detection overlay flags (e.g. --overlay=box,labels,conf)\nvalid combinations are: 'box', 'labels', 'conf', 'none'")
parser.add_argument("--threshold", type=float, default=0.5, help="minimum detection threshold to use")
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use (e.g. CSI camera 0)\nor for V4L2 cameras, the /dev/video device to use.\nby default, MIPI CSI camera 0 will be used.")
parser.add_argument("--width", type=int, default=1280, help="desired width of camera stream (default is 1280 pixels)")
parser.add_argument("--height", type=int, default=720, help="desired height of camera stream (default is 720 pixels)")

try:
    opt = parser.parse_known_args()[0]
except:
    print("")
    parser.print_help()
    sys.exit(0)

# load the object detection network
net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)

# open the IP camera stream (the RTSP variant is kept for reference)
#camera = VideoStream("rtsp://admin:xx@192.168.0.13:554//h264Preview_01_sub").start()
camera = VideoStream("rtmp://192.168.0.13/bcs/channel0_sub.bcs?channel=0&stream=1&user=admin&password=xx").start()
display = jetson.utils.glDisplay()
time.sleep(1)

while True:
    # grab a frame, resize it and convert BGR -> RGBA before copying it to CUDA memory
    image = camera.read()
    image = cv2.resize(image, (300, 300))
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGBA).astype(np.float16)
    img = jetson.utils.cudaFromNumpy(img)

    # run detection (width, height, overlay disabled)
    detections = net.Detect(img, image.shape[1], image.shape[0], False)
    print("detected {:d} objects in image".format(len(detections)))
    for detection in detections:
        print(detection)

    # render the frame and print the timing report
    display.RenderOnce(img, 300, 300)
    display.SetTitle("{:s} | Network {:.0f} ms".format(opt.network, net.GetNetworkTime()))
    net.PrintProfilerTimes()
And the result (terminal output):
charly@nano:~/Projects$ cd /home/charly/Projects ; env /usr/bin/python3 /home/charly/.vscode-oss/extensions/ms-python.python-2020.6.88468/pythonFiles/lib/python/debugpy/launcher 38117 -- /home/charly/Projects/video-surv/jetson/camera.py
jetson.inference.__init__.py
jetson.inference -- initializing Python 3.6 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 3.6 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 3.6 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 3.6 binding initialization
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyDetectNet_Init()
jetson.inference -- detectNet loading network using argv command line params
jetson.inference -- detectNet.__init__() argv[0] = '/home/charly/Projects/video-surv/jetson/camera.py'
detectNet -- loading detection network model from:
-- prototxt networks/ped-100/deploy.prototxt
-- model networks/ped-100/snapshot_iter_70800.caffemodel
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- mean_binary NULL
-- class_labels networks/ped-100/class_labels.txt
-- threshold 0.500000
-- batch_size 1
[TRT] TensorRT version 7.1.0
[TRT] loading NVIDIA plugins...
[TRT] Plugin creator registration succeeded - ::GridAnchor_TRT
[TRT] Plugin creator registration succeeded - ::NMS_TRT
[TRT] Plugin creator registration succeeded - ::Reorg_TRT
[TRT] Plugin creator registration succeeded - ::Region_TRT
[TRT] Plugin creator registration succeeded - ::Clip_TRT
[TRT] Plugin creator registration succeeded - ::LReLU_TRT
[TRT] Plugin creator registration succeeded - ::PriorBox_TRT
[TRT] Plugin creator registration succeeded - ::Normalize_TRT
[TRT] Plugin creator registration succeeded - ::RPROI_TRT
[TRT] Plugin creator registration succeeded - ::BatchedNMS_TRT
[TRT] Could not register plugin creator: ::FlattenConcat_TRT
[TRT] Plugin creator registration succeeded - ::CropAndResize
[TRT] Plugin creator registration succeeded - ::DetectionLayer_TRT
[TRT] Plugin creator registration succeeded - ::Proposal
[TRT] Plugin creator registration succeeded - ::ProposalLayer_TRT
[TRT] Plugin creator registration succeeded - ::PyramidROIAlign_TRT
[TRT] Plugin creator registration succeeded - ::ResizeNearest_TRT
[TRT] Plugin creator registration succeeded - ::Split
[TRT] Plugin creator registration succeeded - ::SpecialSlice_TRT
[TRT] Plugin creator registration succeeded - ::InstanceNormalization_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel.1.1.GPU.FP16.engine
[TRT] loading network profile from engine cache... /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel.1.1.GPU.FP16.engine
[TRT] device GPU, /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel loaded
[TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TRT] Deserialize required 3905137 microseconds.
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (SPATIAL)
-- dim #1 512 (SPATIAL)
-- dim #2 1024 (SPATIAL)
[TRT] binding -- index 1
-- name 'coverage'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (SPATIAL)
-- dim #1 32 (SPATIAL)
-- dim #2 64 (SPATIAL)
[TRT] binding -- index 2
-- name 'bboxes'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4 (SPATIAL)
-- dim #1 32 (SPATIAL)
-- dim #2 64 (SPATIAL)
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=512 w=1024) size=6291456
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=1 c=1 h=32 w=64) size=8192
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=1 c=4 h=32 w=64) size=32768
device GPU, /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel initialized.
detectNet -- number object classes: 1
detectNet -- maximum bounding boxes: 2048
detectNet -- loaded 1 class info entries
detectNet -- number of object classes: 1
jetson.utils -- PyDisplay_New()
jetson.utils -- PyDisplay_Init()
[OpenGL] glDisplay -- X screen 0 resolution: 1920x1080
[OpenGL] glDisplay -- display device initialized
jetson.utils -- cudaFromNumpy() ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 2 = 4
detected 0 objects in image
[OpenGL] creating 300x300 texture
[cuda] registered 1440000 byte openGL texture for interop access (300x300)
[TRT] ----------------------------------------------
[TRT] Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT] ----------------------------------------------
[TRT] Pre-Process CPU 0.08500ms CUDA 2.84552ms
[TRT] Network CPU 127.37260ms CUDA 124.45849ms
[TRT] Post-Process CPU 0.33917ms CUDA 0.49391ms
[TRT] Total CPU 127.79677ms CUDA 127.79791ms
[TRT] ----------------------------------------------
[TRT] note -- when processing a single image, run 'sudo jetson_clocks' before
to disable DVFS for more accurate profiling/timing measurements
jetson.utils -- freeing CUDA mapped memory
jetson.utils -- cudaFromNumpy() ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 2 = 4
detected 0 objects in image
[TRT] ----------------------------------------------
[TRT] Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT] ----------------------------------------------
[TRT] Pre-Process CPU 0.08797ms CUDA 3.88193ms
[TRT] Network CPU 130.04083ms CUDA 126.09641ms
[TRT] Post-Process CPU 0.33975ms CUDA 0.49729ms
[TRT] Total CPU 130.46855ms CUDA 130.47562ms
[TRT] ----------------------------------------------
jetson.utils -- freeing CUDA mapped memory
jetson.utils -- cudaFromNumpy() ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 2 = 4
detected 0 objects in image
[TRT] ----------------------------------------------
[TRT] Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT] ----------------------------------------------
[TRT] Pre-Process CPU 0.07651ms CUDA 2.88812ms
[TRT] Network CPU 130.97610ms CUDA 128.01839ms
[TRT] Post-Process CPU 0.32193ms CUDA 0.47964ms
[TRT] Total CPU 131.37454ms CUDA 131.38614ms
[TRT] ----------------------------------------------
jetson.utils -- freeing CUDA mapped memory
jetson.utils -- cudaFromNumpy() ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy() ndarray dim 2 = 4
detected 0 objects in image
[TRT] ----------------------------------------------
[TRT] Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT] ----------------------------------------------
[TRT] Pre-Process CPU 0.08698ms CUDA 4.26156ms
[TRT] Network CPU 132.05255ms CUDA 127.73360ms
[TRT] Post-Process CPU 0.31933ms CUDA 0.31958ms
[TRT] Total CPU 132.45886ms CUDA 132.31474ms
[TRT] ----------------------------------------------
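For reference, the four timing reports above can be turned into an approximate frame rate. This is only a rough sketch: it uses the "Total CUDA" values copied from the log and ignores the time spent in camera.read(), cv2.resize() and rendering, so the real loop rate can only be equal or lower. The variable names are mine, not part of jetson.inference.

```python
# "Total CUDA" times in milliseconds, copied from the four timing reports above.
total_cuda_ms = [127.79791, 130.47562, 131.38614, 132.31474]

# Average latency of one Detect() call and the frame rate it allows at best.
mean_ms = sum(total_cuda_ms) / len(total_cuda_ms)
fps = 1000.0 / mean_ms

print("mean latency: {:.1f} ms -> about {:.1f} FPS".format(mean_ms, fps))
```

So the network alone limits the loop to roughly 7.7 FPS, well below the 20-25 FPS I am aiming for.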
One more detail: I’m running the Jetson Nano with a TV as the display. I don’t think this matters.
I will take a look at DeepStream, but for the moment I want to get this working first, so that I learn from the ground up.
Converting the input image to float16 or float32 didn’t change anything.
I want to reach 20-25 FPS so that I can analyze two images per second for each camera, to reduce the chance that someone gets through without being detected.
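To put that target in numbers, here is a small sketch of the arithmetic. It assumes that all cameras share a single detector in turn and that each camera needs two analyzed images per second; the helper names frame_budget_ms and cameras_served are hypothetical and only used for this illustration.

```python
def frame_budget_ms(target_fps):
    """Maximum time (ms) one frame may take to sustain target_fps."""
    return 1000.0 / target_fps


def cameras_served(total_fps, images_per_second_per_camera=2):
    """Number of cameras one detector can cover (assumes round-robin sharing)."""
    return int(total_fps // images_per_second_per_camera)


# Targets from this post, plus the ~7.66 FPS implied by the ~130 ms timing reports.
for label, rate in (("target low", 20), ("target high", 25), ("measured", 7.66)):
    print("{}: {:.2f} FPS, {:.1f} ms per frame, {} cameras".format(
        label, rate, frame_budget_ms(rate), cameras_served(rate)))
```

Under these assumptions, 20-25 FPS means a budget of 40-50 ms per frame and covers 10 to 12 cameras, while the current rate covers about 3.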
Thank you for taking the time to help me.