HTTP image object detection for HikVision DVRs

Good day

I’ve written a small Python application that uses the HTTP images from my HikVision DVR’s 8x 1080p channels to do object detection.

The basic flow is:

  • Get images from all 8 channels asynchronously, decode them to numpy arrays, convert those to CUDA memory using cudaFromNumpy, and keep them in memory for the detection routine
  • Run the 8 frames through detectNet using SSD-Mobilenet-v2
  • Finally, start a recording on a channel if a person is detected (this still needs to be implemented)
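
The last step is not implemented yet, but the decision itself could look roughly like the sketch below. This is only an illustration under assumptions: the function name `channels_to_record`, the `(label, confidence)` tuple format and the 0.5 threshold are all hypothetical and are not taken from the repo or from jetson-inference.

```python
def channels_to_record(detections_by_channel, label="person", min_confidence=0.5):
    """Return the sorted channel numbers that have at least one matching detection.

    detections_by_channel maps a channel number to a list of
    (label, confidence) tuples. This format is an assumption made for
    the sketch, not the output format of any particular detector.
    """
    return sorted(
        channel
        for channel, detections in detections_by_channel.items()
        if any(name == label and score >= min_confidence for name, score in detections)
    )


# Hypothetical results for four channels after one detection pass.
detections = {
    1: [("car", 0.91)],
    2: [("person", 0.83), ("dog", 0.40)],
    3: [("person", 0.30)],  # below the threshold, so ignored
    4: [],
}
print(channels_to_record(detections))  # [2]
```

Filtering on a confidence threshold here is also one cheap way to cut down false positives before a recording is triggered.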

My GitHub repo for this is JetsonSecVision.

Currently the whole routine (grab frames, then detect) takes about 2 to 3 seconds. Writing to disk when a person is detected will add some time on top of that.

My question is: is there a quicker way to achieve similar functionality?

  1. Perhaps I could take every nth RTSP frame from all 8 channels, add it to a queue, and then process the queue?

    • This seems like the most complex option, since my tests have shown that RTSP feeds are not very stable in the long term. I could be wrong.
  2. Would re-training the model to detect only persons make the network faster and thereby increase the overall speed?

    • This seems worth exploring regardless, since the model does give a bunch of false positives.
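
To make option 1 a bit more concrete, here is a minimal sketch of sampling every nth frame into a bounded queue. It uses only the Python standard library. The function name `enqueue_every_nth` and the drop-oldest policy are assumptions for illustration, plain integers stand in for decoded frames, and reading from RTSP itself is left out.

```python
import queue


def enqueue_every_nth(frames, n, out_queue):
    """Put every n-th frame on out_queue and return how many were kept.

    When the queue is full, the oldest queued frame is discarded so the
    detector always works on recent frames. This sketch assumes a single
    producer and no concurrent consumer; with a consumer thread the
    get_nowait() call below would also need to handle queue.Empty.
    """
    kept = 0
    for index, frame in enumerate(frames):
        if index % n != 0:
            continue  # skip frames that are not on the sampling grid
        try:
            out_queue.put_nowait(frame)
        except queue.Full:
            out_queue.get_nowait()  # drop the oldest frame
            out_queue.put_nowait(frame)
        kept += 1
    return kept


# Integers stand in for frames: keep every 3rd of 10, queue holds at most 2.
pending = queue.Queue(maxsize=2)
kept = enqueue_every_nth(range(10), 3, pending)
print(kept, pending.qsize())  # 4 2
```

Bounding the queue keeps latency from growing when detection is slower than capture, at the cost of silently skipping frames.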

Or perhaps someone has a better suggestion I could explore?

Thanks in advance
Ohan Smit

My silly implementation has gained a moderate speed increase by just moving the ClientSession to the main loop…

It now takes 1.1s for the whole routine :D
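
For anyone reading along, the change amounts to something like the sketch below. It assumes the ClientSession in question is aiohttp's; the helper names `fetch_frame`, `grab_all` and `run` and the example URL are made up for illustration and are not copied from the repo.

```python
import asyncio


async def fetch_frame(session, url):
    """Download one snapshot and return its raw bytes."""
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.read()


async def grab_all(session, urls):
    """Request all channels concurrently; results keep the order of urls."""
    return await asyncio.gather(*(fetch_frame(session, url) for url in urls))


async def run(urls, cycles):
    import aiohttp  # third-party; imported here so the helpers above have no dependency on it

    # One session for the whole run, so connections are reused
    # instead of being reopened on every grab cycle.
    async with aiohttp.ClientSession() as session:
        for _ in range(cycles):
            frames = await grab_all(session, urls)
            print(f"got {len(frames)} frames")  # decoding and detection would go here


# Hypothetical usage; replace the URL with the DVR's real snapshot URLs.
# asyncio.run(run(["http://dvr.example/channel/1.jpg"], cycles=3))
```

The point is only where the session is created: once, outside the loop, rather than once per cycle.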


I’ve added YOLOv4 support via TensorRT.

A run takes 2.6 seconds, but it is a lot more accurate than jetson-inference with SSD-Mobilenet-v2.

I’m okay with person detection in under 3 seconds with full YOLOv4 :)

The detection itself takes only 1.75s, so for 8 images that’s about 4.6 FPS.


Can you please tell me whether the Jetson Nano is powerful enough to detect at least a person/cat/dog/car live from 7 cameras streaming MJPEG at 1080p@5fps?
Or will there be big delays?

You could try that with tensorrt_demos and perhaps yolov4-tiny.

7 cameras @ 5fps is 35FPS in total.

The full yolov4-416 currently does 4.76fps consistently, so you would need to use another model like yolov4-tiny-288 (FP16) @ 36.6FPS to achieve this goal.
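
The budget check behind those numbers can be written down as a short sketch. The function names are made up for illustration, and the model throughputs are simply the figures quoted above, not new measurements.

```python
def required_fps(cameras, fps_per_camera):
    """Total frames per second the detector has to process."""
    return cameras * fps_per_camera


def keeps_up(model_fps, cameras, fps_per_camera):
    """True if the model's throughput covers the combined camera frame rate."""
    return model_fps >= required_fps(cameras, fps_per_camera)


print(required_fps(7, 5))    # 35
print(keeps_up(4.76, 7, 5))  # False: full yolov4-416
print(keeps_up(36.6, 7, 5))  # True: yolov4-tiny-288 (FP16)
```

This only compares raw inference throughput; fetching and decoding the frames take time as well, so the real margin is smaller than 36.6 against 35 suggests.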

