Hello AI World - now supports Python and onboard training with PyTorch!

Hi all, just merged a large set of updates and new features into jetson-inference master:

  • Python API support for imageNet, detectNet, and camera/display utilities
  • Python examples for processing static images and live camera streaming
  • Support for interacting with numpy ndarrays from CUDA
  • Onboard re-training of ResNet-18 models with PyTorch
  • Example datasets: 800MB Cat/Dog and 1.5GB PlantCLEF
  • Camera-based tool for collecting and labeling custom datasets
  • Text UI tool for selecting/downloading pre-trained models
  • New pre-trained image classification models (on 1000-class ImageNet ILSVRC)
    • ResNet-18, ResNet-50, ResNet-101, ResNet-152
    • VGG-16, VGG-19
    • Inception-v4
  • New pre-trained object detection models (on 90-class MS-COCO)
    • SSD-Mobilenet-v1
    • SSD-Mobilenet-v2
    • SSD-Inception-v2
  • API Reference documentation for C++ and Python
  • Command line usage info for all examples, run with --help
  • Output of network profiler times, including pre/post-processing
  • Improved font rasterization using system TTF fonts

Screencast video - Realtime Object Detection in 10 Lines of Python Code on Jetson Nano

https://www.youtube.com/watch?v=bcM5AQSAzUY

Here’s an object detection example in 10 lines of Python code using SSD-Mobilenet-v2 (90-class MS-COCO) with TensorRT, which runs at 25FPS on Jetson Nano on a live camera stream with OpenGL visualization:

import jetson.inference
import jetson.utils

# load the detection network with TensorRT
net = jetson.inference.detectNet("ssd-mobilenet-v2")
camera = jetson.utils.gstCamera()   # open the default camera
display = jetson.utils.glDisplay()  # create an OpenGL window

while display.IsOpen():
	img, width, height = camera.CaptureRGBA()    # capture a frame into CUDA memory
	detections = net.Detect(img, width, height)  # detect objects (overlays boxes in-place)
	display.RenderOnce(img, width, height)       # render the frame
	display.SetTitle("Object Detection | Network {:.0f} FPS".format(1000.0 / net.GetNetworkTime()))
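For clarity, the FPS figure in the title bar is derived from GetNetworkTime(), which reports the per-frame inference time in milliseconds. A minimal sketch of the conversion (network_fps is an illustrative helper, not part of the jetson API):

```python
def network_fps(network_time_ms):
    """Convert a per-frame inference time in milliseconds to frames per second."""
    return 1000.0 / network_time_ms

# e.g. an inference time of 40 ms per frame corresponds to 25 FPS
print(network_fps(40.0))  # → 25.0
```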

Thanks to all the beta testers of the new features from here on the forums!

Project Link…https://github.com/dusty-nv/jetson-inference/
Model Mirror…https://github.com/dusty-nv/jetson-inference/releases

Just unpacked it. Just got one thing to say. I want it all ------DAMN!!!

Hi,

Yay Python!

I have run into a minor snag. I chose to install just PyTorch v1.1.0 for Python 3.6 during the initial install. In the next step, when I tried to run imagenet-console.py, I got an error telling me that a Python 2.7 .h file was missing (sorry, I didn't grab the full error message). I then ran the C++ example without a problem before going back to the PyTorch installer and selecting the missing package.
I again ran the following:

cd jetson-inference/build
./install-pytorch.sh
make
sudo make install

and then tested it out as follows

$ ./imagenet-console.py --network=googlenet orange_0.jpg output_0.jpg
jetson.inference.__init__.py
Traceback (most recent call last):
  File "./imagenet-console.py", line 24, in <module>
    import jetson.inference
  File "/usr/lib/python2.7/dist-packages/jetson/inference/__init__.py", line 4, in <module>
    from jetson_inference_python import *
ImportError: libjetson-utils.so: cannot open shared object file: No such file or directory

Should I set up a new image and start over selecting both Python packages first time through?

Any input greatly appreciated :)

Thanks!

Hi, can you try running ‘sudo make install’ and then ‘sudo ldconfig’? Thanks.

That worked a treat!

Thanks!

import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2")
camera = jetson.utils.gstCamera()
display = jetson.utils.glDisplay()

while display.IsOpen():
	img, width, height = camera.CaptureRGBA()
	detections = net.Detect(img, width, height)
	display.RenderOnce(img, width, height)
	display.SetTitle("Object Detection | Network {:.0f} FPS".format(1000.0 / net.GetNetworkTime()))

display.SetTitle doesn't show the "Object Detection | Network {:.0f} FPS" text for me.
Should this text be printed in the image instead, and how can I change the text color?

Hmm that is strange, that text shows up in the window’s title bar for me.

You can see an example of text rendering in imagenet-camera.py. To change the color, modify the font.White argument to one of the colors from here (or pass in your own RGB tuple): https://rawgit.com/dusty-nv/jetson-inference/python/docs/html/python/jetson.utils.html#cudaFont

Hi,
I figured out what it was. My host is a Windows laptop.
When I started the program directly from PuTTY, it showed only the image with the object detection in the VNC viewer, because the window doesn't get a title bar that way.
Instead, I had to start a desktop like xfce4 from PuTTY, then open an XTerm console from that desktop in the VNC viewer and start the program there. Then it shows up with its own title bar.
It works at 20 FPS.

Hi Dusty-NV.

I have carefully followed your instructions, and everything looks fine except for the below.

  1. I can't change the camera source from the default CSI camera to a USB camera (Logitech C920) or a video file in the provided sample codes.

  2. I tried changing the --camera flag to /dev/video0, but no luck.

If you want, I can post the error details here.

Your help is highly appreciated.

Thank you

Hi Saddam, did you try using /dev/video1?

Can you post the error and also the output of this:

$ sudo apt-get install v4l-utils
$ v4l2-ctl --list-formats-ext

Well,
after some tests I have to say that the FPS shown are not the true figure…

The code should be like this to get the real FPS:

import jetson.inference
import jetson.utils
import time

net = jetson.inference.detectNet("ssd-mobilenet-v2")
camera = jetson.utils.gstCamera()
display = jetson.utils.glDisplay()

while display.IsOpen():
	prev_time = time.time()

	img, width, height = camera.CaptureRGBA()
	detections = net.Detect(img, width, height)
	display.RenderOnce(img, width, height)

	frame_rate = round(1 / (time.time() - prev_time), 2)
	display.SetTitle("Object Detection | {} FPS".format(frame_rate))

In fact, allocating the image into CUDA memory takes 40-70 ms, and for this reason the real FPS is 12-14.
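As an aside, single-frame deltas like the one above jitter a lot from frame to frame; an exponentially smoothed average of the frame time gives a steadier end-to-end figure. A generic sketch (FPSMeter is a hypothetical helper, independent of jetson.utils):

```python
import time

class FPSMeter:
    """Exponentially smoothed frames-per-second meter for a render loop."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha   # smoothing factor (higher = more responsive)
        self.avg_dt = None   # smoothed frame time in seconds
        self.last = None     # timestamp of the previous tick

    def tick(self, now=None):
        """Call once per frame; returns the smoothed FPS (None on the first call)."""
        now = time.time() if now is None else now
        if self.last is not None:
            dt = now - self.last
            self.avg_dt = dt if self.avg_dt is None else \
                (1 - self.alpha) * self.avg_dt + self.alpha * dt
        self.last = now
        return None if self.avg_dt is None else 1.0 / self.avg_dt

meter = FPSMeter()
# inside the capture/detect/render loop you would call: fps = meter.tick()
```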

Hi simone.rinaldi, there shouldn't be any CUDA memory allocated during the main loop, as it should all be pre-allocated; however, I will look into it to make sure. As indicated in the status bar text and terminal output, the framerate given is for the network time; depending on your camera, the global framerate may be lower. The visualization code that draws the bounding boxes and renders the image adds overhead. It is provided primarily for testing purposes, since the device can typically be deployed to headless systems without a display.

I'm using a Logitech C270 that outputs video at 720p30.
Anyway, I was able to use net.Detect with an IP camera, as shown here:
https://devtalk.nvidia.com/default/topic/1063275/jetson-nano/-python-how-to-convert-opencv-frame-to-cuda-memory/
The result is always the same: the conversion from numpy to CUDA takes 40-70 ms, dropping frames.

I understand what you say about "the framerate given is for the network time…", but the network requires data formatted in a particular way, so the preparation of the data cannot be considered separately from the network execution time.

PS: the IP camera is a Dahua IPC-HFW1431S (4K at 25fps) configured to 1080p25.

Using an IP camera has extra overhead for networking and depacketization, and in the case of this compressed camera, for decoding and going through OpenCV. The example numpy-to-CUDA routine wasn't intended for realtime use, since the incoming numpy array can be of arbitrary dimensions and format and requires extra data conversion. If you want a path with less overhead, you should eliminate the use of OpenCV, which suboptimally copies the memory and stores it in a numpy array. You can allocate the CUDA memory from Python with the jetson.utils.cudaAllocMapped() function.

See my reply to your other thread about modifying the gstCamera pipeline so the video doesn’t need to go through OpenCV and numpy: https://devtalk.nvidia.com/default/topic/1063275/jetson-nano/-python-how-to-convert-opencv-frame-to-cuda-memory/post/5384856/#5384856

Pre-processing the data in CUDA into the planar NCHW format that the DNN expects does not take that long, on average around 0.5 milliseconds; you can see this in the Timing Report in the console. What is taking you the extra time is the use of the OpenCV capture, which stores the image in a CPU numpy array, and the subsequent numpy conversion. You can also set the camera to a lower resolution, since the object detection DNN downsamples it anyway.
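For anyone curious, the planar NCHW layout mentioned above simply regroups interleaved HWC pixels into one contiguous plane per channel. A toy pure-Python sketch of the reordering (the real pre-processing runs as a CUDA kernel and also applies mean subtraction and scaling; hwc_to_chw is illustrative only):

```python
def hwc_to_chw(image):
    """Convert an interleaved H x W x C pixel list into planar C x H x W."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][ch] for x in range(w)] for y in range(h)]
            for ch in range(c)]

# a 1x2 RGB image: two interleaved pixels become three single-channel planes
img = [[[255, 0, 0], [0, 255, 0]]]
print(hwc_to_chw(img))  # → [[[255, 0]], [[0, 255]], [[0, 0]]]
```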

OK folks, the pytorch dev branch with new segmentation models and Python bindings for segNet have been merged into master.

The docs have been updated, see here:

https://github.com/dusty-nv/jetson-inference/blob/master/docs/segnet-console-2.md
https://github.com/dusty-nv/jetson-inference/blob/master/docs/segnet-camera-2.md

Let me know if you encounter any issues with using the updated master branch, thanks.

I'm having issues running the live camera output when working through the Hello AI World exercises on JupyterLab. I am running the commands through the terminal, launched the way the GitHub pages describe (Ubuntu desktop -> right click -> Open Terminal). It works perfectly on Ubuntu, outputting the live camera object detection and segmentation exercises, but I cannot seem to get the same live camera output on JupyterLab.

I’ve not tried these through JupyterLab - the camera apps in Hello AI World create an OpenGL display on the Jetson. Do you have a display directly connected to your Nano, or are you trying to view them remotely over the network (headless)?

I've tried it on a monitor display, where it works perfectly with the OpenGL display it creates on the Ubuntu OS. What I'm trying to see is whether a similar output can be generated in headless mode. Thanks for the reply and help!

Hi lramos13, viewing the OpenGL video headlessly over SSH forwarding isn't supported by the project. Even if it were to work, it would display the video very slowly. Such an approach would typically use video compression and RTP/RTSP streaming instead. There is a gstEncoder class included with jetson-utils that works with RTP, but admittedly I have not used it for that in some time.

I have a really basic question: I have a pretrained network that expects a 224x224 image, but I can't use it until I figure out how to crop and resize the 1280x720 camera image to the dimensions the network expects. I've been searching through the docs, but there is no information on how to prepare the input image, or on why the aspect ratio doesn't seem to matter.
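Not an official answer, but the usual recipe is a center crop to a square followed by a resize to 224x224 (many classification networks simply squash the frame during pre-processing, which is why the aspect ratio appears not to matter). A sketch of the crop-box arithmetic for an illustrative 1280x720 frame (center_crop_box is a hypothetical helper, not part of the jetson API):

```python
def center_crop_box(width, height):
    """Return (x0, y0, x1, y1) of the largest centered square in a frame."""
    side = min(width, height)
    x0 = (width - side) // 2
    y0 = (height - side) // 2
    return (x0, y0, x0 + side, y0 + side)

print(center_crop_box(1280, 720))  # → (280, 0, 1000, 720)
# the resulting 720x720 crop would then be resized to 224x224 before inference
```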