OpenCV + detectNet in Python

Hello, I need your help again!
I’m trying to use detectnet-camera.py with an IP camera read through OpenCV.
I use OpenCV 3.2.0 like this:

camera = cv2.VideoCapture("rtsp://admin:xxx@192.168.0.13:554//h264Preview_01_sub")
image = camera.read()[1]
img = cv2.cvtColor(image, cv2.COLOR_BGR2RGBA).astype(np.float32)
img = jetson.utils.cudaFromNumpy(img)
detections = net.Detect(img, image.shape[1], image.shape[0], opt.overlay)

image is a NumPy array with shape (480, 640, 3),
so I convert the color to RGBA and the dtype to float32, and copy it into CUDA memory.
But I run into a problem when it’s time to run the network:

detections = net.Detect(img, image.shape[1], image.shape[0], opt.overlay)
Exception: jetson.inference – detectNet.Detect() failed to parse args tuple

There is no further information explaining what is going wrong.

I’m also looking for Python documentation for jetson.utils and jetson.inference, but I found nothing.

Thank you again for your help !!!

Have a great day

Hi,

You should be able to feed OpenCV image into jetson_inference like this:

The main difference is that there is no overlay argument in net.Detect:

predictions = net.Detect(input_image, input_image.shape[1], input_image.shape[0])
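
For reference, a minimal sketch of the array preparation discussed in this thread, using NumPy only. The helper name `bgr_to_rgba_float32` is made up for illustration; it is intended to produce the same HxWx4 float32 array as `cv2.cvtColor(image, cv2.COLOR_BGR2RGBA).astype(np.float32)`, which is what would then be handed to `jetson.utils.cudaFromNumpy()`. The zero-filled frame is a stand-in for a real camera image.

```python
import numpy as np

def bgr_to_rgba_float32(bgr):
    """Convert an HxWx3 uint8 BGR frame (OpenCV layout) to HxWx4 float32 RGBA."""
    if bgr.ndim != 3 or bgr.shape[2] != 3:
        raise ValueError("expected an HxWx3 array, got shape {}".format(bgr.shape))
    height, width = bgr.shape[:2]
    rgba = np.empty((height, width, 4), dtype=np.float32)
    rgba[..., :3] = bgr[..., ::-1]   # reverse channel order: BGR -> RGB
    rgba[..., 3] = 255.0             # fully opaque alpha channel
    return rgba

# Stand-in for camera.read()[1]: a 480x640 frame that is pure blue in BGR.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 0] = 255
rgba = bgr_to_rgba_float32(frame)
print(rgba.shape, rgba.dtype)
```

Note that the width is `shape[1]` and the height is `shape[0]`, which matches the argument order in the `net.Detect()` call above.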

Thanks.

Hello, I hadn’t noticed that about the overlay argument.
I will try, thank you!

Hello AastaLLL. Your solution works! Thank you!
But there is another problem…

The performance is worse than expected.
I use ssd-mobilenet-v2.
Before detection I resize the image to 300x300.
But the detection takes 115 ms (about 8.7 fps).
In the NVIDIA benchmark, ssd-mobilenet-v2 at 300x300 is listed at 39 fps.

To estimate the time I use:

start = time.time()
detections = net.Detect(…)
print((time.time() - start) * 1000)

It is approx. 115 ms.
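
A single measurement like this tends to be noisy. Below is a minimal sketch of a steadier measurement using only the standard library: a few untimed warm-up calls, then an average over several runs. `benchmark` is a hypothetical helper, and `fake_detect` merely stands in for the real `net.Detect(...)` call.

```python
import time

def benchmark(fn, warmup=3, runs=20):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()                      # untimed calls so one-off setup cost is excluded
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

def fake_detect():
    # Placeholder workload; replace with the real net.Detect(...) call.
    time.sleep(0.005)

mean_ms = benchmark(fake_detect)
print("{:.1f} ms per call, {:.1f} fps".format(mean_ms, 1000.0 / mean_ms))
```

The frame rate is simply 1000 divided by the time per call in milliseconds, so 115 ms corresponds to roughly 8.7 fps.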

Thank you !

Hi @Charly,

Use net.PrintProfilerTimes() for a more accurate breakdown of the timing information. net.Detect() also performs image overlay, which you can disable by passing overlay='none' option to net.Detect().

Also, you may need to run sudo nvpmodel -m 0 and sudo jetson_clocks beforehand, since you are only processing one image and the processor clocks don’t have time to spin up. So these commands will set them to the maximum.

The 39FPS benchmark was with the 37-class Oxford PETS dataset, not the 90-class MS COCO model. But you should still see 20-25FPS with MS COCO SSD-Mobilenet-v2 model on Jetson Nano.

Hello @dusty_nv.
Thank you for your quick response.

I will try your suggestions.
In fact I have 10 IP cameras. I want to read one image from each and detect whether there is a person in any of the images. So, can I pass a batch of images to net.Detect()?

I just want to detect persons, maybe cars. Which pre-trained model do you recommend? Or should I train a custom model?

I’m desperately looking for documentation for the Python jetson library; maybe with documentation I will reach my goal without help!

Thank you so much. I very much appreciate your help.

jetson-inference doesn’t do batched processing. If you have 10 IP cameras, I would recommend looking into DeepStream, it is ideal for multi-camera video analytics.
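
Without batching, a simple fallback is to run the cameras through the detector one after another. The following is only a sketch of that idea, not something jetson-inference provides: `poll_round_robin`, `grab` and `detect` are hypothetical names, and the lambdas stand in for real frame capture and for a call such as `net.Detect(...)`.

```python
def poll_round_robin(cameras, grab, detect, person_label="person"):
    """Run one detection per camera in turn; return the cameras that saw a person.

    cameras: iterable of camera names
    grab:    callable(name) -> frame, or None if no frame is available
    detect:  callable(frame) -> list of detected label strings
    """
    hits = []
    for name in cameras:
        frame = grab(name)
        if frame is None:
            continue  # skip cameras that failed to deliver a frame
        if person_label in detect(frame):
            hits.append(name)
    return hits

# Stand-ins: "frames" are just strings, and detection is a lookup table.
fake_frames = {"cam0": "empty street", "cam1": "person at door", "cam2": None}
fake_labels = {"empty street": [], "person at door": ["person", "car"]}

hits = poll_round_robin(
    ["cam0", "cam1", "cam2"],
    grab=lambda name: fake_frames[name],
    detect=lambda frame: fake_labels[frame],
)
print(hits)
```

With this approach the time per pass grows linearly with the number of cameras, which is why a dedicated multi-stream pipeline is the better fit for many cameras.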

There are some pre-trained models like this from the Transfer Learning Toolkit: https://developer.nvidia.com/transfer-learning-toolkit
They are compatible with DeepStream.

The jetson-inference documentation is from the tutorial, and the Python API is here: Python: package jetson.inference

Thank you @dusty_nv, the documentation is exactly what I was looking for. Sorry I didn’t find it on my own.

The performance is still far from 20-25 FPS.

So here is some more information:

sudo nvpmodel -q
NVPM WARN: fan mode is not set!
NV Power Mode: MAXN
0
sudo jetson_clocks --show
SOC family:tegra210  Machine:NVIDIA Jetson Nano Developer Kit
Online CPUs: 0-3
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0 
cpu1: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0 
cpu2: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0 
cpu3: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0 
GPU MinFreq=921600000 MaxFreq=921600000 CurrentFreq=921600000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MAXN
detectnet-camera.py modified
#!/usr/bin/python
import time
import cv2
from imutils.video import VideoStream
import numpy as np
import jetson.inference
import jetson.utils

import argparse
import sys


parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.", 
						   formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.detectNet.Usage())

parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load (see below for options)")
parser.add_argument("--overlay", type=str, default="box,labels,conf", help="detection overlay flags (e.g. --overlay=box,labels,conf)\nvalid combinations are:  'box', 'labels', 'conf', 'none'")
parser.add_argument("--threshold", type=float, default=0.5, help="minimum detection threshold to use") 
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use (e.g. CSI camera 0)\nor for V4L2 cameras, the /dev/video device to use.\nby default, MIPI CSI camera 0 will be used.")
parser.add_argument("--width", type=int, default=1280, help="desired width of camera stream (default is 1280 pixels)")
parser.add_argument("--height", type=int, default=720, help="desired height of camera stream (default is 720 pixels)")

try:
	opt = parser.parse_known_args()[0]
except:
	print("")
	parser.print_help()
	sys.exit(0)

net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)

#camera =  VideoStream("rtsp://admin:xx@192.168.0.13:554//h264Preview_01_sub").start()
camera =  VideoStream("rtmp://192.168.0.13/bcs/channel0_sub.bcs?channel=0&stream=1&user=admin&password=xx").start()
display = jetson.utils.glDisplay()
time.sleep(1)

while True:
	image = camera.read()
	image = cv2.resize(image, (300, 300))
	img = cv2.cvtColor(image, cv2.COLOR_BGR2RGBA).astype(np.float16)
	img = jetson.utils.cudaFromNumpy(img)
	detections = net.Detect(img, image.shape[1], image.shape[0], False)

	print("detected {:d} objects in image".format(len(detections)))

	for detection in detections:
		print(detection)

	display.RenderOnce(img, 300, 300)

	display.SetTitle("{:s} | Network {:.0f} ms".format(opt.network, net.GetNetworkTime()))

	net.PrintProfilerTimes()

And the result:

Terminal output
charly@nano:~/Projects$  cd /home/charly/Projects ; env /usr/bin/python3 /home/charly/.vscode-oss/extensions/ms-python.python-2020.6.88468/pythonFiles/lib/python/debugpy/launcher 38117 -- /home/charly/Projects/video-surv/jetson/camera.py 
jetson.inference.__init__.py
jetson.inference -- initializing Python 3.6 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 3.6 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 3.6 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 3.6 binding initialization
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyDetectNet_Init()
jetson.inference -- detectNet loading network using argv command line params
jetson.inference -- detectNet.__init__() argv[0] = '/home/charly/Projects/video-surv/jetson/camera.py'

detectNet -- loading detection network model from:
          -- prototxt     networks/ped-100/deploy.prototxt
          -- model        networks/ped-100/snapshot_iter_70800.caffemodel
          -- input_blob   'data'
          -- output_cvg   'coverage'
          -- output_bbox  'bboxes'
          -- mean_pixel   0.000000
          -- mean_binary  NULL
          -- class_labels networks/ped-100/class_labels.txt
          -- threshold    0.500000
          -- batch_size   1

[TRT]   TensorRT version 7.1.0
[TRT]   loading NVIDIA plugins...
[TRT]   Plugin creator registration succeeded - ::GridAnchor_TRT
[TRT]   Plugin creator registration succeeded - ::NMS_TRT
[TRT]   Plugin creator registration succeeded - ::Reorg_TRT
[TRT]   Plugin creator registration succeeded - ::Region_TRT
[TRT]   Plugin creator registration succeeded - ::Clip_TRT
[TRT]   Plugin creator registration succeeded - ::LReLU_TRT
[TRT]   Plugin creator registration succeeded - ::PriorBox_TRT
[TRT]   Plugin creator registration succeeded - ::Normalize_TRT
[TRT]   Plugin creator registration succeeded - ::RPROI_TRT
[TRT]   Plugin creator registration succeeded - ::BatchedNMS_TRT
[TRT]   Could not register plugin creator:  ::FlattenConcat_TRT
[TRT]   Plugin creator registration succeeded - ::CropAndResize
[TRT]   Plugin creator registration succeeded - ::DetectionLayer_TRT
[TRT]   Plugin creator registration succeeded - ::Proposal
[TRT]   Plugin creator registration succeeded - ::ProposalLayer_TRT
[TRT]   Plugin creator registration succeeded - ::PyramidROIAlign_TRT
[TRT]   Plugin creator registration succeeded - ::ResizeNearest_TRT
[TRT]   Plugin creator registration succeeded - ::Split
[TRT]   Plugin creator registration succeeded - ::SpecialSlice_TRT
[TRT]   Plugin creator registration succeeded - ::InstanceNormalization_TRT
[TRT]   completed loading NVIDIA plugins.
[TRT]   detected model format - caffe  (extension '.caffemodel')
[TRT]   desired precision specified for GPU: FASTEST
[TRT]   requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]   native precisions detected for GPU:  FP32, FP16
[TRT]   selecting fastest native precision for GPU:  FP16
[TRT]   attempting to open engine cache file /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel.1.1.GPU.FP16.engine
[TRT]   loading network profile from engine cache... /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel.1.1.GPU.FP16.engine
[TRT]   device GPU, /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel loaded
[TRT]   Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TRT]   Deserialize required 3905137 microseconds.
[TRT]   device GPU, CUDA engine context initialized with 3 bindings
[TRT]   binding -- index   0
               -- name    'data'
               -- type    FP32
               -- in/out  INPUT
               -- # dims  3
               -- dim #0  3 (SPATIAL)
               -- dim #1  512 (SPATIAL)
               -- dim #2  1024 (SPATIAL)
[TRT]   binding -- index   1
               -- name    'coverage'
               -- type    FP32
               -- in/out  OUTPUT
               -- # dims  3
               -- dim #0  1 (SPATIAL)
               -- dim #1  32 (SPATIAL)
               -- dim #2  64 (SPATIAL)
[TRT]   binding -- index   2
               -- name    'bboxes'
               -- type    FP32
               -- in/out  OUTPUT
               -- # dims  3
               -- dim #0  4 (SPATIAL)
               -- dim #1  32 (SPATIAL)
               -- dim #2  64 (SPATIAL)
[TRT]   binding to input 0 data  binding index:  0
[TRT]   binding to input 0 data  dims (b=1 c=3 h=512 w=1024) size=6291456
[TRT]   binding to output 0 coverage  binding index:  1
[TRT]   binding to output 0 coverage  dims (b=1 c=1 h=32 w=64) size=8192
[TRT]   binding to output 1 bboxes  binding index:  2
[TRT]   binding to output 1 bboxes  dims (b=1 c=4 h=32 w=64) size=32768
device GPU, /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel initialized.
detectNet -- number object classes:   1
detectNet -- maximum bounding boxes:  2048
detectNet -- loaded 1 class info entries
detectNet -- number of object classes:  1
jetson.utils -- PyDisplay_New()
jetson.utils -- PyDisplay_Init()
[OpenGL] glDisplay -- X screen 0 resolution:  1920x1080
[OpenGL] glDisplay -- display device initialized
jetson.utils -- cudaFromNumpy()  ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 2 = 4
detected 0 objects in image
[OpenGL]   creating 300x300 texture
[cuda]   registered 1440000 byte openGL texture for interop access (300x300)

[TRT]   ----------------------------------------------
[TRT]   Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT]   ----------------------------------------------
[TRT]   Pre-Process   CPU  0.08500ms  CUDA  2.84552ms
[TRT]   Network       CPU 127.37260ms  CUDA 124.45849ms
[TRT]   Post-Process  CPU  0.33917ms  CUDA  0.49391ms
[TRT]   Total         CPU 127.79677ms  CUDA 127.79791ms
[TRT]   ----------------------------------------------

[TRT]   note -- when processing a single image, run 'sudo jetson_clocks' before
                to disable DVFS for more accurate profiling/timing measurements

jetson.utils -- freeing CUDA mapped memory
jetson.utils -- cudaFromNumpy()  ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 2 = 4
detected 0 objects in image

[TRT]   ----------------------------------------------
[TRT]   Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT]   ----------------------------------------------
[TRT]   Pre-Process   CPU  0.08797ms  CUDA  3.88193ms
[TRT]   Network       CPU 130.04083ms  CUDA 126.09641ms
[TRT]   Post-Process  CPU  0.33975ms  CUDA  0.49729ms
[TRT]   Total         CPU 130.46855ms  CUDA 130.47562ms
[TRT]   ----------------------------------------------

jetson.utils -- freeing CUDA mapped memory
jetson.utils -- cudaFromNumpy()  ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 2 = 4
detected 0 objects in image

[TRT]   ----------------------------------------------
[TRT]   Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT]   ----------------------------------------------
[TRT]   Pre-Process   CPU  0.07651ms  CUDA  2.88812ms
[TRT]   Network       CPU 130.97610ms  CUDA 128.01839ms
[TRT]   Post-Process  CPU  0.32193ms  CUDA  0.47964ms
[TRT]   Total         CPU 131.37454ms  CUDA 131.38614ms
[TRT]   ----------------------------------------------

jetson.utils -- freeing CUDA mapped memory
jetson.utils -- cudaFromNumpy()  ndarray dim 0 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 1 = 300
jetson.utils -- cudaFromNumpy()  ndarray dim 2 = 4
detected 0 objects in image

[TRT]   ----------------------------------------------
[TRT]   Timing Report /usr/local/bin/networks/ped-100/snapshot_iter_70800.caffemodel
[TRT]   ----------------------------------------------
[TRT]   Pre-Process   CPU  0.08698ms  CUDA  4.26156ms
[TRT]   Network       CPU 132.05255ms  CUDA 127.73360ms
[TRT]   Post-Process  CPU  0.31933ms  CUDA  0.31958ms
[TRT]   Total         CPU 132.45886ms  CUDA 132.31474ms
[TRT]   ----------------------------------------------
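
To turn timing reports like the ones above into a frame rate, the `Total` lines can be averaged. This is a small sketch using only the standard library; the regular expression assumes the exact line layout shown in this log, and `mean_total_cuda_ms` is an illustrative name.

```python
import re

# The four "Total" lines copied from the timing reports in this log.
REPORT = """\
[TRT]   Total         CPU 127.79677ms  CUDA 127.79791ms
[TRT]   Total         CPU 130.46855ms  CUDA 130.47562ms
[TRT]   Total         CPU 131.37454ms  CUDA 131.38614ms
[TRT]   Total         CPU 132.45886ms  CUDA 132.31474ms
"""

TOTAL_RE = re.compile(r"Total\s+CPU\s+([\d.]+)ms\s+CUDA\s+([\d.]+)ms")

def mean_total_cuda_ms(log_text):
    """Average the CUDA column of every 'Total' line in a timing report."""
    values = [float(m.group(2)) for m in TOTAL_RE.finditer(log_text)]
    if not values:
        raise ValueError("no 'Total' lines found")
    return sum(values) / len(values)

mean_ms = mean_total_cuda_ms(REPORT)
print("{:.1f} ms per frame, {:.1f} fps".format(mean_ms, 1000.0 / mean_ms))
```

For these four reports the average is roughly 130 ms per frame, which is under 8 fps.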

One more detail: I’m running the Jetson Nano on a TV screen. I don’t think this matters.
I will take a look at DeepStream, but for the moment I want to make this work, to learn from the beginning.
Converting the input image to float16 or float32 didn’t change anything.

I want to reach 20-25 FPS to be able to analyze two images per second for each camera, to reduce the chance that someone gets through without being detected.
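
The arithmetic behind that target, as a quick check: 10 cameras at 2 frames per second each is 20 inferences per second, so each inference may take at most 50 ms. The variable names are only for illustration, and the 130 ms figure is a rounded value taken from the timing reports above.

```python
cameras = 10            # number of IP cameras mentioned earlier in the thread
frames_per_camera = 2   # desired analysed frames per second, per camera

required_fps = cameras * frames_per_camera   # total inferences per second
budget_ms = 1000.0 / required_fps            # time available per inference

measured_ms = 130.0     # roughly what the timing reports show
achieved_per_camera = (1000.0 / measured_ms) / cameras

print("required: {} fps, budget: {:.0f} ms per frame".format(required_fps, budget_ms))
print("at {:.0f} ms per frame: {:.2f} frames/s per camera".format(measured_ms, achieved_per_camera))
```

At about 130 ms per frame, each of the 10 cameras would be analysed less than once per second, well short of the goal.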

Thank you for taking the time to help me.

I see, it is actually loading the pednet model which is slower than SSD-Mobilenet. pednet is based on an older neural network architecture.

Can you try running your script with --model=ssd-mobilenet-v2 option?

When you run it, beneath detectNet -- loading detection network model from: in the log, make sure it is loading ssd-mobilenet-v2. For some reason, it appears your extra command-line settings at the beginning (debugpy/launcher, etc.) may be confusing my parsing.
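
One way this kind of mismatch can arise, sketched with the standard library only: argparse fills in a default for `--network`, but the raw argument list that is also handed to `detectNet` never contains that flag, so anything that parses the raw list on its own will not see the default. The helper `argv_with_network` is hypothetical, and the idea that the library reads the network name from `argv` is an assumption based on the "loading network using argv command line params" line in the log.

```python
import argparse

def argv_with_network(argv, network):
    """Return a copy of argv that explicitly names the network.

    If argv already carries --network or --model, it is returned unchanged.
    """
    has_flag = any(a.startswith(("--network", "--model")) for a in argv[1:])
    if has_flag:
        return list(argv)
    return list(argv) + ["--network={}".format(network)]

parser = argparse.ArgumentParser()
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2")

# Simulate launching the script with no options, e.g. from an IDE debugger.
raw_argv = ["camera.py"]
opt = parser.parse_known_args(raw_argv[1:])[0]

print(opt.network)   # the argparse default is applied here...
print(raw_argv)      # ...but the raw argument list still lacks the flag

fixed_argv = argv_with_network(raw_argv, opt.network)
print(fixed_argv)
# fixed_argv could then be passed instead of sys.argv, e.g.
# net = jetson.inference.detectNet(opt.network, fixed_argv, opt.threshold)
```

Either way, the "loading detection network model from:" lines in the log remain the reliable check of which model was actually loaded.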

Hello @dusty_nv

You were right.
The default model is ssd-mobilenet-v2.
I saw pednet in the logs but did not realize the wrong model was being loaded.

The whole process now takes 40 ms, i.e. 25 FPS.

Thank you for your help. I hope I didn’t waste too much of your time!

I will go further with the pre-trained models from the Transfer Learning Toolkit for person detection.

Have a good day, and thanks again !

No problem, glad you got it working!