Real-time parking and traffic management

Hello all,

During the Onboarding Q&A session it was recommended that we use the forum to explain what we want to develop, so that we can get feedback from developers working on similar projects. So, I will use this opportunity to describe our project.

We need a proof of concept in which people and vehicles are detected in video streams from existing cameras. Detection must cover both stationary and moving objects. From each object's location in the video stream, we will calculate its geolocation and display the detected object as an icon on a map. In short, the proof of concept is the groundwork for city AI parking and traffic management.

Our company's strong point is location retrieval from any type of video stream and video camera. For this we use our own lens and camera distortion model. Here is the video of a plugin that showcases our technology, implemented in the Milestone XProtect video surveillance system. The recording is from one of our online meetings. There are a couple of features to notice here that are unique to our technology. The first is the use of a JPEG image as a map to control the PTZ camera direction. The second is the use of a live video stream to control the PTZ camera direction: moving the mouse over the live video stream points the PTZ camera at that location. Two live streams are used in this example to control one PTZ camera. And the last one is the AI integration: when vehicles are detected in the video streams, their geolocations are calculated and they are displayed on the map as icons. When a person is detected (in this case, me), an icon is also displayed on the map and the PTZ camera is activated for automatic tracking.

My idea is to use NVIDIA Metropolis services on Jetson, with several video streams as inputs and Intelligent Video Analytics used for people and vehicle detection. From the detected pixel locations we will calculate object geolocations and stream them as metadata to the end user (a cloud or in-house server app). Based on this real-time data, the server or cloud app will do its thing (tracking, heat maps, etc.).
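To make this concrete, here is a minimal sketch of the kind of per-detection metadata record we have in mind; the field names and the values are illustrative assumptions on my part, not a finalized schema:

```python
import json
import time

def detection_to_message(track_id, obj_class, lat, lon, width_cm, height_cm):
    """Pack one detection into the metadata record streamed to the server/cloud app.
    Field names are illustrative only, not a finalized schema."""
    return json.dumps({
        "timestamp": time.time(),   # capture time of the frame
        "track_id": track_id,       # per-object tracking id from the detector
        "class": obj_class,         # "person" or "vehicle"
        "geo": {"lat": lat, "lon": lon},
        "size_cm": {"width": width_cm, "height": height_cm},
    })

# Example: one detected vehicle, ready to publish to the server app
msg = detection_to_message(17, "vehicle", 45.815, 15.982, 180.0, 150.0)
print(msg)
```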

Feel free to contact me with any questions, and if you are interested, www.3visiond.com has some more info.

Best regards,

Hrvoje Bilić

Thanks for sharing. Do you want multi-camera tracking on Jetson or on a dGPU server? We currently have MTMC for dGPU: NVIDIA Multi-Camera Tracking AI Workflow

Hello Kesong,
we want to use Jetson for detection only, just like in the multi-camera tracking example here (Tracking Objects Across Multiple Cameras Made Easy with Metropolis Microservices | NVIDIA On-Demand). Our system architecture is almost the same as the one in the video, but we do one thing differently.
When mapping pixels to the physical world, we actually calculate the location of the detected object in centimetres with respect to the camera mounting location. This also enables us to calculate the detected object's width and height in centimetres.
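As a rough illustration of why the camera-relative distance matters for sizing: under an ideal pinhole model, the metric extent follows directly from the pixel extent and the distance. Our real model also handles lens distortion; this is just the textbook relation:

```python
def metric_size_cm(pixel_extent, distance_cm, focal_length_px):
    """Ideal pinhole relation: metric extent = pixel extent * distance / focal length.
    Assumes the object is roughly fronto-parallel to the camera; illustrative only."""
    return pixel_extent * distance_cm / focal_length_px

# Example: a 220 px tall bounding box, 800 cm from the camera, f = 1000 px
print(metric_size_cm(220, 800.0, 1000.0))  # 176.0 cm, a plausible person height
```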
Furthermore, outdoor applications require 3D space to be embedded in the video stream. Our background is in the entertainment industry, where we used ultra-wide-angle cameras to track stage performers with moving heads. Most of the time there are stairs on the stage, and those stairs are visible in the video stream while we track. Once the performer starts to walk up the stairs, the tracking height must be adjusted automatically. This means the vision system must be able to combine 3D data with the live video stream. We have managed to invent a new camera model that enables us to overlay 3D data on the video stream.
Why am I talking about 3D? It seems to me that embedding 3D data for each detected object is the right way to go. For example, the ONVIF standard has supported geolocation for a number of years now, and only recently have new cameras arrived on the market with the ability to send metadata with geolocation tags. One can assume that, in the future, video cameras or some other device/service will export location data for each detected object, either in centimetres or as latitude and longitude. So why not embed that data right now? If the data is there, it will be used; if it is not there, it will not be used.
Best regards,
Hrvoje

Thanks for sharing. It seems your use case needs 3D object detection. I am wondering how the camera gets the geolocation. Maybe there is a model to predict it within the camera?

No, we are calculating the geolocation from pixels. In this video surveillance example, an old, discontinued Axis 3707-PE model with no AI is used, so the calculation is done on the PC side.
I could write a book on this topic, but in short, I have invented a system for transforming a real-world video camera into a pinhole camera model. Any camera, whether panoramic, fitted with ultra-wide-angle lenses, or even thermal, can be transformed into a pinhole camera model. After that, it is all trigonometry and geometry. No chessboard, and no intrinsic or extrinsic camera parameters are used, and we don't even "undistort" the image. We undistort only the point of interest; for example, the bottom-middle point of a detected person's bounding box is the location of the person's feet, so we undistort just that point.
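Our camera model itself is proprietary, but the "trigonometry and geometry" that follows it is standard. A minimal sketch, assuming the feet point has already been undistorted into an ideal pinhole model with known focal length (in pixels), mounting height, and downward tilt:

```python
import math

def feet_point_to_ground_cm(u, v, cx, cy, f_px, cam_height_cm, tilt_rad):
    """Intersect the viewing ray of an (already undistorted) feet pixel with the
    ground plane. Ideal pinhole model, camera tilted down by tilt_rad; returns
    (forward, lateral) offsets in cm relative to the point below the camera."""
    # Ray direction in camera coordinates (z forward, y down, x right)
    x = (u - cx) / f_px
    y = (v - cy) / f_px
    # Rotate the ray by the camera tilt (pitch) into world coordinates
    cos_t, sin_t = math.cos(tilt_rad), math.sin(tilt_rad)
    wy = y * cos_t + sin_t    # downward component of the ray
    wz = -y * sin_t + cos_t   # horizontal forward component of the ray
    if wy <= 0:
        raise ValueError("Ray does not hit the ground (point above horizon)")
    s = cam_height_cm / wy    # scale so the ray descends exactly cam_height_cm
    return s * wz, s * x      # (forward, lateral) in cm

# Example: feet at pixel (900, 700) in a 1280x720 image, f = 800 px,
# camera mounted 400 cm high, tilted 25 degrees down
print(feet_point_to_ground_cm(900, 700, 640, 360, 800, 400, math.radians(25)))
```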
The video posted here gives an example of how we calculate geo coordinates. At 3:13 I am tracking a person's feet with the mouse pointer over the live video feed from the Axis 3707. Notice that the image is distorted (the line over the garage doors shows the level of distortion). The mouse pointer location in pixels is used to calculate the real-world 3D coordinates of the lady's feet, relative to the camera mounting position. Since the video camera's geolocation is known, the geolocation of the person is easily calculated. So, first we calculate the 3D location of the person in centimetres from its location in the video stream, then we calculate its geolocation, and then, since we know the geolocation of the PTZ camera, we calculate the amount of pan, tilt and zoom needed to point the PTZ camera at the person's location.
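Continuing the sketch above, the chain from camera-relative centimetres to geolocation to PTZ pan/tilt could look like this (flat-earth approximation around the camera, hypothetical coordinates, zoom omitted):

```python
import math

EARTH_R_CM = 6_371_000 * 100  # mean Earth radius in cm

def offset_to_geo(cam_lat, cam_lon, east_cm, north_cm):
    """Convert a small east/north offset in cm from a camera with known
    geolocation into lat/lon (flat-earth approximation, fine at camera scale)."""
    dlat = math.degrees(north_cm / EARTH_R_CM)
    dlon = math.degrees(east_cm / (EARTH_R_CM * math.cos(math.radians(cam_lat))))
    return cam_lat + dlat, cam_lon + dlon

def pan_tilt_to_target(ptz_lat, ptz_lon, ptz_height_cm, tgt_lat, tgt_lon):
    """Pan (degrees from north, clockwise) and tilt (degrees below horizontal)
    needed to point a PTZ camera at a ground-level target."""
    north_cm = math.radians(tgt_lat - ptz_lat) * EARTH_R_CM
    east_cm = math.radians(tgt_lon - ptz_lon) * EARTH_R_CM * math.cos(math.radians(ptz_lat))
    pan = math.degrees(math.atan2(east_cm, north_cm)) % 360
    ground_dist = math.hypot(east_cm, north_cm)
    tilt = math.degrees(math.atan2(ptz_height_cm, ground_dist))
    return pan, tilt

# Example: detection 3.6 m north and 1.6 m east of a fixed camera at (45.8150, 15.9820)
lat, lon = offset_to_geo(45.8150, 15.9820, 160, 360)
# A nearby PTZ camera mounted 4 m high points itself at that geolocation
print(pan_tilt_to_target(45.81490, 15.98190, 400, lat, lon))
```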
I invented this system simply because it was not possible to use the chessboard method for camera calibration in our application that tracks stage performers with moving heads. This system has enabled us to use GoPro Max Lens Mode for stage tracking and to achieve tracking accuracy of more than 99%. There are a couple of videos on our page https://visionspot.eu/ that show how the same camera model works for stage tracking.
We can take any type of camera, transform it to a pinhole camera model, and use it for tracking, distance measuring, or whatever…

Best regards,
Hrvoje