Hello AI World - new object detection training and video interfaces

Hi everyone,

Happy to have just merged a significant set of updates and new features to jetson-inference master:

note: API changes from this update are intended to be backwards-compatible, so previous code should still run.

The same code can now run from images/video/cameras and encoded network streams (RTP/RTSP):

import jetson.inference
import jetson.utils

import argparse
import sys

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.")

parser.add_argument("input_URI", type=str, default="", nargs='?', help="URI of the input stream")
parser.add_argument("output_URI", type=str, default="", nargs='?', help="URI of the output stream")
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load (see below for options)")

opt = parser.parse_known_args()[0]

# load the object detection network
net = jetson.inference.detectNet(opt.network, sys.argv)

# create video sources & outputs
input = jetson.utils.videoSource(opt.input_URI, argv=sys.argv)
output = jetson.utils.videoOutput(opt.output_URI, argv=sys.argv)

# process frames until EOS or the user exits
while True:
	img = input.Capture()
	detections = net.Detect(img)
	output.Render(img)
	output.SetStatus("{:s} | Network {:.0f} FPS".format(opt.network, net.GetNetworkFPS()))

	# exit on input/output EOS
	if not input.IsStreaming() or not output.IsStreaming():
		break

For more info, see these new pages from the repo:

Thanks to everyone from the forums and GitHub who helped to test these updates in advance!

Hi dusty_nv

Thx for this great tutorial, which worked almost perfectly for me on a Jetson Nano.
I trained the mobile_ssd network successfully on some classes of typical household items, and when I feed pictures into the re-trained model, they are mostly interpreted correctly. However, I run into an issue at the end when trying to run the re-trained model on a live feed from my USB cam. When I run the command:

detectnet --model=models/household_two/ssd-mobilenet.onnx --labels=models/household_two/labels.txt --v4l2:///dev/video0 --input-blob=input_0 --output-cvg=scores --output-bbox=boxes

the following errors appear

URI -- invalid resource or file path:  blob=input_0
[video]  videoOptions -- failed to parse input resource URI (blob=input_0)
[video]  videoSource -- failed to parse command line options
detectnet:  failed to create input stream

However, when running

video-viewer v4l2:///dev/video0

my USB video stream pops up normally. My version of TensorRT is 7.1.3

I would be grateful for any hint. I hope this topic was not already brought up somewhere else; if it was, I missed it in my search.

All the best

Hi @m_fy, I think it’s confused by the --v4l2 flag (which isn’t an actual flag, so it shouldn’t have the -- in front).

Also, the camera device is expected to be a positional argument (not a flag), so it should be listed after all of the flags. Like this:

detectnet --model=models/household_two/ssd-mobilenet.onnx --labels=models/household_two/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes v4l2:///dev/video0
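
To illustrate the difference between flags and positional arguments, here is a minimal sketch that reuses the argparse setup from the Python example at the top of this thread. It is only an approximation: the detectnet program has its own command-line parser, and the helper name parse_uris is made up for this example.

```python
import argparse

def parse_uris(argv):
    """Return (input_URI, output_URI, unrecognized) for a list of arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("input_URI", type=str, default="", nargs="?")
    parser.add_argument("output_URI", type=str, default="", nargs="?")
    # parse_known_args() keeps unrecognized "--" options instead of failing
    opt, unknown = parser.parse_known_args(argv)
    return opt.input_URI, opt.output_URI, unknown

flags = ["--model=models/household_two/ssd-mobilenet.onnx", "--input-blob=input_0"]

# URI written as a positional argument after the flags
print(parse_uris(flags + ["v4l2:///dev/video0"]))

# URI mistakenly prefixed with "--": it is treated as an unknown flag
print(parse_uris(flags + ["--v4l2:///dev/video0"]))
```

In the first call the camera URI ends up in input_URI. In the second call input_URI stays empty and the mistyped argument is returned with the unrecognized flags.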

Hi @dusty_nv, thx for your fast reply.

With your hint, the model starts working.
However, the live feed does not come up yet; there seems to be an issue with the argument cvg=scores:

URI -- invalid resource or file path: cvg=scores
[video] videoOptions -- failed to parse output resource URI (cvg=scores)
[video] videoOutput -- failed to parse command line options
detectnet: failed to create output stream

May I ask for a hint here again?

Try adding a display://0 to the end of your command line:

I also just patched this command-line parsing error in the code (where you see URI -- invalid resource or file path: cvg=scores). So if you run these commands, it should be fixed (without having to specify display://0):

$ cd jetson-inference
$ git pull origin master
$ cd build
$ cmake ../
$ make
$ sudo make install

Dear @dusty_nv,

Works like a charm.

Thx again for this fast and good support

Dear @dusty_nv

Today I found some time again to run through your tutorials. So far I have been able to do a lot of cool stuff using the hints given.

But I have some issues with the tutorial https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect-detection.md.
While capturing the images works fine, I receive the following error when starting the training:

  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 99, in __call__
    boxes[:, 0] /= width
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

To me, the data as well as the linkage to mobilenet_v1_ssd seem to work,

2020-07-31 21:14:52 - Prepare training datasets.
2020-07-31 21:14:52 - VOC Labels read from file: ('BACKGROUND', 'cat litter')
2020-07-31 21:14:52 - Stored labels into file models/debris/labels.txt.
2020-07-31 21:14:52 - Train dataset size: 9
2020-07-31 21:14:52 - Prepare Validation datasets.
2020-07-31 21:14:52 - VOC Labels read from file: ('BACKGROUND', 'cat litter')
2020-07-31 21:14:52 - Validation dataset size: 8
2020-07-31 21:14:52 - Build network.
2020-07-31 21:14:53 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2020-07-31 21:14:53 - Took 0.50 seconds to load the model.
2020-07-31 21:15:06 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2020-07-31 21:15:06 - Uses CosineAnnealingLR scheduler.
2020-07-31 21:15:06 - Start training from epoch 0.

Therefore I would be glad for a hint on how to proceed. Googling this error leads me to some potential workarounds, but I would first like to know whether the error is due to a wrong input or a bug in transforms.py.

Hi @M_Fy, can you provide the command line that you used to run train_ssd.py, and also the contents of your dataset’s labels.txt file?

You might also want to add a print(boxes) and print(len(boxes)) at this line of transforms.py to help debug:

It might be that the bounding box data got malformed, or that there are too few or zero bounding boxes for some classes.
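
A rough sketch of that kind of check, separate from the print statements suggested above. It assumes the boxes reach the transform as a NumPy array built from a Python list of [xmin, ymin, xmax, ymax] rows. The helper name check_boxes is hypothetical and not part of train_ssd.py or transforms.py.

```python
import numpy as np

def check_boxes(boxes):
    """Return a short description of a problem with `boxes`, or None if it looks fine."""
    boxes = np.asarray(boxes, dtype=np.float32)
    if boxes.size == 0:
        return "no bounding boxes"
    if boxes.ndim != 2 or boxes.shape[1] != 4:
        return "unexpected shape {}".format(boxes.shape)
    return None

print(check_boxes([[87., 0., 128., 37.]]))  # one box in the expected N x 4 layout
print(check_boxes([]))                       # an image without any boxes

# An empty list becomes a 1-dimensional array, so 2D indexing raises IndexError
empty = np.array([], dtype=np.float32)
try:
    empty[:, 0] /= 1280.0
except IndexError as err:
    print("IndexError:", err)
```

If the arrays are built this way, an image without boxes would trigger the same kind of IndexError as in the traceback, which makes it a case worth ruling out.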

Hi @dusty_nv

I used the following command

aimobile@aimobile:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=/home/aimobile/datasets/data_debris --model-dir=models/debris

Since I have already done several trials that all failed with the same error, my latest labels.txt contains just one class.
(BACKGROUND was added as expected when starting python3 train_ssd.py.)

cat litter

Dear @dusty_nv

It seems I have a bounding box with a zero in it during trial 1:

  mode = random.choice(self.sample_options)
[[ 994.  419. 1043.  474.]
 [1399.  357. 1476.  405.]
 [1230.  341. 1305.  411.]
 [ 908.  312.  962.  368.]]
[[136. 448. 182. 495.]
 [188. 456. 258. 528.]
 [150. 387. 216. 443.]
 [ 64. 296. 117. 354.]
 [434. 392. 566. 484.]
 [588. 284. 674. 378.]]
[[ 87.   0. 128.  37.]]
[[ 97. 154. 218. 242.]]
[[19. 58. 58. 97.]]

Executing python3 train_ssd.py another time, I receive the following output:

  mode = random.choice(self.sample_options)
[[1717.  750. 1791.  812.]
 [1644.  773. 1692.  829.]
 [1520.  630. 1582.  702.]]
[[ 449. 1557.  516. 1611.]
 [ 621. 1423.  695. 1496.]
 [ 198. 1601.  279. 1664.]
 [ 156. 1530.  243. 1591.]
 [ 153. 1589.  204. 1643.]]
[[2366.  571. 2415.  626.]
 [1933.  509. 2010.  557.]
 [2104.  493. 2179.  563.]
 [2447.  464. 2501.  520.]]
[[557.   4. 629.  69.]]
[[118. 480. 220. 548.]
 [509. 241. 554. 296.]
 [540. 311. 576. 345.]
 [627. 272. 696. 317.]]
/home/aimobile/.local/lib/python3.6/site-packages/torch/nn/_reduction.py:44: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
    return self._process_data(data)
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 64, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Hmm…would you be able to send me a tarball of your dataset so I could give it a try next week and debug?

Dear Dusty,

Great tutorial and step-by-step guide. Honestly, it is great work.

I am trying to replicate your FPS on the Jetson Nano with segnet (fcn_resnet18), but unfortunately I have not been able to match it. You wrote that fcn-resnet18-voc-512x320 (Pascal VOC, 512x320) can run on the Jetson Nano at 34 FPS using FP16. If I am not wrong, I only get 12 FPS; see the Timing Report below. I am using JetPack 4.4, pytorch 1.6.0, torchvision 0.7.0 (with the fcn_resnet18 modifications).

I used the clock boost commands:

sudo nvpmodel -m 0
sudo jetson_clocks

Timing Report:

[image] loaded ‘images/object_0.jpg’ (500 x 333, 3 channels)

[TRT] ------------------------------------------------
[TRT] Timing Report networks/FCN-ResNet18-Pascal-VOC-512x320/fcn_resnet18.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.08245ms CUDA 0.92099ms
[TRT] Network CPU 36.64665ms CUDA 35.67771ms
[TRT] Post-Process CPU 0.53996ms CUDA 0.54005ms
[TRT] Visualize CPU 0.09438ms CUDA 7.76937ms
[TRT] Total CPU 37.36343ms CUDA 44.90812ms
[TRT] ------------------------------------------------

I managed to retrain the fcn_resnet18 segnet with my own dataset (only 3 classes) and got the Timing Report below. The Post-Process step takes much longer than with the original network.

segNet – loaded 3 class colors
[image] loaded ‘test_images/barcode_0.jpg’ (500 x 281, 3 channels)
[TRT] ------------------------------------------------
[TRT] Timing Report networks/qbar_white_1/qbar.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.07667ms CUDA 0.45859ms
[TRT] Network CPU 24.04648ms CUDA 23.49146ms
[TRT] Post-Process CPU 45.18449ms CUDA 45.37104ms
[TRT] Visualize CPU 0.05125ms CUDA 5.77688ms
[TRT] Total CPU 69.35889ms CUDA 75.09797ms
[TRT] ------------------------------------------------

I have 2 questions:

  • How can I get the same FPS results with a pre-trained network as in this table?
  • How can I speed up the post-process step to get timing like that of the pre-trained network?

Yours sincerely,


Hi @dusty_nv.

Please find the tarball of my dataset attached. Tried it today on a different Nano but results are still the same

file.tar.gz (1.9 MB)

Best regards


Hi @regard001, for the pre-trained model, you got 27 FPS. Also, are you only processing one image? If so, the processor clock frequencies may not have had time to fully spin up.

I’m not entirely sure why it would be taking so much longer, except if (a) this timing data was only for one image or (b) the model you trained is of a higher resolution. What is the input resolution of your model?

Also, you might want to try “point” filtering mode instead of “linear” to see if the processing time of that is also increased.

OK thank you - I found the issue, it was that image 20200731-211422 did not have any bounding box annotations.

I have checked in a fix where train_ssd.py will check to make sure each image has >0 annotations (or else the image will be ignored). So if you pull the latest, or delete 20200731-211422 from your ImageSets, it should work. I was able to run the training on it.
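
For anyone who wants to look for such images before training, here is a small sketch. It assumes a standard Pascal VOC layout where each image has an XML file containing one object element per bounding box. The function name find_unannotated is made up for this example and is not the check that was added to train_ssd.py.

```python
import os
import xml.etree.ElementTree as ET

def find_unannotated(annotations_dir):
    """Return the image IDs whose VOC XML file contains no <object> entries."""
    missing = []
    for name in sorted(os.listdir(annotations_dir)):
        if not name.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(annotations_dir, name)).getroot()
        # each annotated bounding box is stored as one <object> element
        if not root.findall("object"):
            missing.append(os.path.splitext(name)[0])
    return missing
```

Calling find_unannotated() on the folder that holds the dataset's XML files (typically named Annotations in a VOC dataset, which is an assumption here) returns the IDs that have no boxes.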

Thank you for the answers. I tried “point” filtering mode, and it makes it a little faster (2-3 FPS); for bigger images the difference is more significant.

I calculated the FPS like this: 1/(Total CPU in seconds + Total GPU in seconds).

The pre-trained network was tested only on one image.

With the re-trained network, only one image was processed at a time, but in a sequence. Maybe it is faster with batch > 1. The input resolution of my model is 320x320 (when training I used the --resolution flag, but the default is also 320). I have one more concern: I want to use the model in FP16, but when the model is loaded it says:

[TRT] binding -- index 0
-- name 'input_0'
-- type FP32
-- in/out INPUT

[TRT] binding -- index 1
-- name 'output_0'
-- type FP32
-- in/out OUTPUT

Is it possible that the first and the last layers make the processing slower because of FP32?

Dear @dusty_nv.
Pulling latest changes worked for me and solved my issues. Thx a lot for the fix.

Best regards

@regard001, the times are not summed; the GPU time runs in parallel with the CPU time. So you can just use the CPU time. The CPU launches the GPU kernels, then waits for the GPU to finish.
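
As a quick worked example of that arithmetic, using the Total rows from the two timing reports posted earlier (the helper name fps is just for illustration):

```python
def fps(total_ms):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / total_ms

# "Total" rows from the two timing reports posted above
pretrained_cpu, pretrained_cuda = 37.36343, 44.90812
retrained_cpu = 69.35889

print("pre-trained, CPU total only:    {:.1f} FPS".format(fps(pretrained_cpu)))
print("pre-trained, CPU + CUDA summed: {:.1f} FPS".format(fps(pretrained_cpu + pretrained_cuda)))
print("re-trained, CPU total only:     {:.1f} FPS".format(fps(retrained_cpu)))
```

Summing the two totals is what gives the roughly 12 FPS figure, while the CPU total alone corresponds to roughly 27 FPS for the pre-trained model.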

The input and output layers will be FP32, even if the model is using FP16. TensorRT will convert internally so the user doesn’t need to deal with FP16 from the CPU. So it is still using FP16.

I forgot to mention earlier, the filtering is also dependent on the raw size of the image you are processing (for example if your test image is 1024x512, it will take longer to post-process than a 512x256 image). This is because the overlay/mask processing is applied to the original size of the image being processed.