Hello AI World - new object detection training and video interfaces

dusty_nv · July 15, 2020, 8:25pm

Hi everyone,

Happy to have just merged a significant set of updates and new features to jetson-inference master:

Re-training SSD-Mobilenet Object Detection tutorial with PyTorch
Support for collection of object detection datasets and bounding-box labeling in camera-capture tool
videoSource and videoOutput APIs for C++/Python that support multiple types of video streams:
- MIPI CSI cameras
- V4L2 cameras
- RTP / RTSP
- Videos & Images
- Image sequences
- OpenGL windows
Unified the -console and -camera samples to process both images and video streams
Support for uchar3/uchar4/float3/float4 images (default is now uchar3 as opposed to float4)
Replaced opaque Python memory capsule with jetson.utils.cudaImage object
- See Image Capsules in Python for more info
- Images are now subscriptable/indexable from Python to directly access the pixel data
- Numpy ndarray conversion now supports uchar3/uchar4/float3/float4 formats
cudaConvertColor() automated colorspace conversion function (RGB, BGR, YUV, Bayer, grayscale, etc.)
Python CUDA bindings for cudaResize(), cudaCrop(), cudaNormalize(), cudaOverlay()
- See Image Manipulation with CUDA and cuda-examples.py for examples of using these
Transitioned to using Python3 by default since Python 2.7 is now past EOL
DIGITS tutorial is now marked as deprecated (replaced by PyTorch transfer learning tutorial)
Logging can now be controlled/disabled from the command line (e.g. --log-level=verbose)

note: API changes from this update are intended to be backwards-compatible, so previous code should still run.

The same code can now run from images/video/cameras and encoded network streams (RTP/RTSP):

import jetson.inference
import jetson.utils

import argparse
import sys

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.")

parser.add_argument("input_URI", type=str, default="", nargs='?', help="URI of the input stream")
parser.add_argument("output_URI", type=str, default="", nargs='?', help="URI of the output stream")
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load (see below for options)")

try:
	opt = parser.parse_known_args()[0]
except:
	print("")
	parser.print_help()
	sys.exit(0)

# load the object detection network
net = jetson.inference.detectNet(opt.network, sys.argv)

# create video sources & outputs
input = jetson.utils.videoSource(opt.input_URI, argv=sys.argv)
output = jetson.utils.videoOutput(opt.output_URI, argv=sys.argv)

# process frames until EOS or the user exits
while True:
	img = input.Capture()
	detections = net.Detect(img)
	output.Render(img)
	output.SetStatus("{:s} | Network {:.0f} FPS".format(opt.network, net.GetNetworkFPS()))
	net.PrintProfilerTimes()

	if not input.IsStreaming() or not output.IsStreaming():
		break
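
The input_URI and output_URI positionals in the script above are plain strings whose protocol prefix selects the stream type. As a rough illustration only (this is not how jetson.utils parses URIs), the sketch below uses Python's urllib.parse to pull the protocol out of a few URI strings. v4l2:///dev/video0 and display://0 appear later in this thread; csi://0, rtp://@:1234 and the file name are assumed examples, and describe_uri and STREAM_TYPES are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical mapping from protocol prefix to the stream types listed in the post.
STREAM_TYPES = {
    "csi": "MIPI CSI camera",
    "v4l2": "V4L2 camera",
    "rtp": "RTP stream",
    "rtsp": "RTSP stream",
    "display": "OpenGL window",
    "file": "video, image, or image sequence on disk",
}

def describe_uri(uri):
    """Return (protocol, description) for a stream URI string (illustration only)."""
    scheme = urlparse(uri).scheme
    if not scheme:
        # assumption: a bare path without a protocol is treated as a file
        scheme = "file"
    return scheme, STREAM_TYPES.get(scheme, "unknown")

for uri in ["v4l2:///dev/video0", "display://0", "csi://0", "rtp://@:1234", "my_video.mp4"]:
    print(uri, "->", describe_uri(uri))
```

The point is only that the same script can be aimed at a camera, a file, or a network stream by changing the string passed on the command line.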

For more info, see these new pages from the repo:

Thanks to everyone from the forums and GitHub who helped to test these updates in advance!

M_Fy · July 23, 2020, 3:27pm

Hi dusty_nv

Thx for this great tutorial, which worked almost perfectly for me on a Jetson Nano.
I trained the mobile_ssd network successfully on some classes of typical household items, and when I feed pictures into the re-trained model, they are mostly interpreted correctly. However, I face an issue at the end when trying to run the re-trained model using the live feed from my USB cam. When I run the command:

detectnet --model=models/household_two/ssd-mobilenet.onnx --labels=models/household_two/labels.txt --v4l2:///dev/video0 --input-blob=input_0 --output-cvg=scores --output-bbox=boxes

the following errors appear

URI -- invalid resource or file path:  blob=input_0
[video]  videoOptions -- failed to parse input resource URI (blob=input_0)
[video]  videoSource -- failed to parse command line options
detectnet:  failed to create input stream

However, when running

video-viewer v4l2:///dev/video0

my USB video stream pops up normally. My version of TensorRT is 7.1.3

I am happy for any hint and I hope this topic was not brought up somewhere else already - then I missed that in my search

All the best
M_FY

dusty_nv · July 23, 2020, 4:49pm

Hi @m_fy, I think it’s confused by the --v4l2 flag (which isn’t an actual flag, it shouldn’t have the -- in front).

Also the camera device is expected to be a positional argument (not a flag), so it should be listed after all of the flags. Like this:

detectnet --model=models/household_two/ssd-mobilenet.onnx --labels=models/household_two/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes v4l2:///dev/video0
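
The difference between a flag and a positional argument can be seen with Python's argparse, set up the way the detection script earlier in this thread does it. This is only an analogy: the detectnet binary has its own C++ command-line parser, and the parser below is a cut-down stand-in. Arguments starting with -- that the parser does not know are set aside as unknown flags, so --v4l2:///dev/video0 never reaches input_URI.

```python
import argparse

# cut-down version of the parser from the detection script in the first post
parser = argparse.ArgumentParser()
parser.add_argument("input_URI", type=str, default="", nargs="?")
parser.add_argument("output_URI", type=str, default="", nargs="?")

flags = ["--input-blob=input_0", "--output-cvg=scores", "--output-bbox=boxes"]

# camera URI written as a flag: it is set aside with the other unknown flags
bad_opt, bad_unknown = parser.parse_known_args(flags + ["--v4l2:///dev/video0"])

# camera URI written as a positional argument after the flags
good_opt, good_unknown = parser.parse_known_args(flags + ["v4l2:///dev/video0"])

print("as flag:       input_URI =", repr(bad_opt.input_URI))
print("as positional: input_URI =", repr(good_opt.input_URI))
```

In the first call the input URI stays at its empty default, which mirrors why the stream could not be opened.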

M_Fy · July 23, 2020, 5:24pm

Hi @dusty_nv, thx for your fast reply.

With your hint, the model starts working.
However, the live feed still does not appear; there seems to be an issue with the argument cvg=scores:

URI -- invalid resource or file path: cvg=scores
[video] videoOptions -- failed to parse output resource URI (cvg=scores)
[video] videoOutput -- failed to parse command line options
detectnet: failed to create output stream

May I ask for a hint here again ?

dusty_nv · July 23, 2020, 5:37pm

Try adding a display://0 to the end of your command line:

I also just patched this command-line parsing error in the code (where you see URI -- invalid resource or file path: cvg=scores). So if you run these commands, it should be fixed (without having to specify display://0):

$ cd jetson-inference
$ git pull origin master
$ cd build
$ cmake ../
$ make
$ sudo make install

M_Fy · July 23, 2020, 7:35pm

Dear @dusty_nv ,

display://0

works like a charm.

Thx again for this fast and good support

M_Fy · July 31, 2020, 7:32pm

Dear @dusty_nv

Today I found some time again to run through your tutorials. So far I have already been able to do a lot of cool stuff using the hints given.

But I have some issues with the tutorial jetson-inference/pytorch-collect-detection.md (at master in dusty-nv/jetson-inference on GitHub).
While capturing the images works fine, I receive the following error when starting the training:

  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 99, in __call__
    boxes[:, 0] /= width
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

To me, the data as well as the linkage to mobilenet_v1_ssd seem to work,

2020-07-31 21:14:52 - Prepare training datasets.
2020-07-31 21:14:52 - VOC Labels read from file: ('BACKGROUND', 'cat litter')
2020-07-31 21:14:52 - Stored labels into file models/debris/labels.txt.
2020-07-31 21:14:52 - Train dataset size: 9
2020-07-31 21:14:52 - Prepare Validation datasets.
2020-07-31 21:14:52 - VOC Labels read from file: ('BACKGROUND', 'cat litter')
2020-07-31 21:14:52 - Validation dataset size: 8
2020-07-31 21:14:52 - Build network.
2020-07-31 21:14:53 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2020-07-31 21:14:53 - Took 0.50 seconds to load the model.
2020-07-31 21:15:06 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2020-07-31 21:15:06 - Uses CosineAnnealingLR scheduler.
2020-07-31 21:15:06 - Start training from epoch 0.

Therefore I would be glad for a hint on how to proceed. Googling this error leads me to some potential work-arounds, but I would first like to know whether the error is due to bad input or a bug in transforms.py.

dusty_nv · July 31, 2020, 7:40pm

Hi @M_Fy, can you provide the command line that you used to run train_ssd.py, and also the contents of your dataset's labels.txt file?

You might also want to add a print(boxes) and print(len(boxes)) at this line of transforms.py to help debug:

https://github.com/dusty-nv/pytorch-ssd/blob/16ed474b556941ce2d40793d6fefea66b4acc89f/vision/transforms/transforms.py#L98

        boxes[:, 0] *= width
        boxes[:, 2] *= width
        boxes[:, 1] *= height
        boxes[:, 3] *= height

        return image, boxes, labels


class ToPercentCoords(object):
    def __call__(self, image, boxes=None, labels=None):
        height, width, channels = image.shape
        boxes[:, 0] /= width
        boxes[:, 2] /= width
        boxes[:, 1] /= height
        boxes[:, 3] /= height

        return image, boxes, labels


class Resize(object):
    def __init__(self, size=300):
It might be that the bounding box data got malformed, or that there are too few or zero bounding boxes for some classes.

M_Fy · July 31, 2020, 7:55pm

Hi @dusty_nv

I used the following command

aimobile@aimobile:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=/home/aimobile/datasets/data_debris --model-dir=models/debris

Since several trials all failed with the same error, my latest labels.txt contains just one class.
However, BACKGROUND was added as expected when starting python3 train_ssd.py:

cat litter

M_Fy · July 31, 2020, 8:05pm

Dear @dusty_nv

It seems that one sample comes back with zero bounding boxes during trial 1:

  mode = random.choice(self.sample_options)
[[ 994.  419. 1043.  474.]
 [1399.  357. 1476.  405.]
 [1230.  341. 1305.  411.]
 [ 908.  312.  962.  368.]]
4
[[136. 448. 182. 495.]
 [188. 456. 258. 528.]
 [150. 387. 216. 443.]
 [ 64. 296. 117. 354.]
 [434. 392. 566. 484.]
 [588. 284. 674. 378.]]
6
[]
0
[[ 87.   0. 128.  37.]]
1
[[ 97. 154. 218. 242.]]
1
[[19. 58. 58. 97.]]
1

Running python3 train_ssd.py another time, I receive the following output:

  mode = random.choice(self.sample_options)
[[1717.  750. 1791.  812.]
 [1644.  773. 1692.  829.]
 [1520.  630. 1582.  702.]]
3
[[ 449. 1557.  516. 1611.]
 [ 621. 1423.  695. 1496.]
 [ 198. 1601.  279. 1664.]
 [ 156. 1530.  243. 1591.]
 [ 153. 1589.  204. 1643.]]
5
[[2366.  571. 2415.  626.]
 [1933.  509. 2010.  557.]
 [2104.  493. 2179.  563.]
 [2447.  464. 2501.  520.]]
4
[[557.   4. 629.  69.]]
1
[[118. 480. 220. 548.]
 [509. 241. 554. 296.]
 [540. 311. 576. 345.]
 [627. 272. 696. 317.]]
4
/home/aimobile/.local/lib/python3.6/site-packages/torch/nn/_reduction.py:44: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
    return self._process_data(data)
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 64, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

dusty_nv · August 1, 2020, 1:24am

Hmm…would you be able to send me a tarball of your dataset so I could give it a try next week and debug?

regard001 · August 1, 2020, 8:40am

Dear Dusty,

Great tutorial and step-by-step guide. Honestly, it is great work.

I am trying to replicate your FPS on Jetson Nano with segnet (fcn_resnet18), but unfortunately I have not reached the same results. You wrote that fcn-resnet18-voc-512x320 (Pascal VOC, 512x320) can run on Jetson Nano at 34 FPS using FP16. If I am not wrong, I get only 12 FPS; see the Timing Report below. I am using JetPack 4.4, pytorch 1.6.0, torchvision 0.7.0 (with the fcn_resnet18 modifications).

I used the clock boost commands:

sudo nvpmodel -m 0
sudo jetson_clocks

Timing Report:

[image] loaded 'images/object_0.jpg' (500 x 333, 3 channels)

[TRT] ------------------------------------------------
[TRT] Timing Report networks/FCN-ResNet18-Pascal-VOC-512x320/fcn_resnet18.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process   CPU  0.08245ms  CUDA  0.92099ms
[TRT] Network       CPU 36.64665ms  CUDA 35.67771ms
[TRT] Post-Process  CPU  0.53996ms  CUDA  0.54005ms
[TRT] Visualize     CPU  0.09438ms  CUDA  7.76937ms
[TRT] Total         CPU 37.36343ms  CUDA 44.90812ms
[TRT] ------------------------------------------------

I managed to retrain the fcn_resnet18 segnet with my own dataset (only 3 classes) and got the Timing Report below. The Post-Process step takes much longer than with the original network.

segNet -- loaded 3 class colors
[image] loaded 'test_images/barcode_0.jpg' (500 x 281, 3 channels)
[TRT] ------------------------------------------------
[TRT] Timing Report networks/qbar_white_1/qbar.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process   CPU  0.07667ms  CUDA  0.45859ms
[TRT] Network       CPU 24.04648ms  CUDA 23.49146ms
[TRT] Post-Process  CPU 45.18449ms  CUDA 45.37104ms
[TRT] Visualize     CPU  0.05125ms  CUDA  5.77688ms
[TRT] Total         CPU 69.35889ms  CUDA 75.09797ms
[TRT] ------------------------------------------------

I have 2 questions:

How can I get the same FPS results with a pre-trained network as in this table ?
How can I boost the post-process step up to get a timing like the pre-trained network?

Yours sincerely,

regard001

M_Fy · August 1, 2020, 12:47pm

Hi @dusty_nv.

Please find the tarball of my dataset attached. Tried it today on a different Nano but results are still the same

file.tar.gz (1.9 MB)

Best regards

M_Fy

dusty_nv · August 2, 2020, 12:51am

Hi @regard001, for the pre-trained model, you got 27FPS. Also are you only processing one image? If so, the processor clock frequencies may not have had time to fully spin up.

I’m not entirely sure why it would be taking so much longer, except if (a) this timing data was only for one image or (b) the model you trained is of a higher resolution. What is the input resolution of your model?

Also, you might want to try “point” filtering mode instead of “linear” to see if the processing time of that is also increased.

dusty_nv · August 2, 2020, 2:20am

OK thank you - I found the issue, it was that image 20200731-211422 did not have any bounding box annotations.

I have checked in a fix where train_ssd.py will check to make sure each image has >0 annotations (or else the image will be ignored). So if you pull the latest, or delete 20200731-211422 from your ImageSets, it should work. I was able to run the training on it.
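
To find such images before training, the annotation files can be scanned for entries without any <object> element. An image without annotations yields an empty box list, which presumably ends up as a 1-dimensional array, hence the IndexError on boxes[:, 0] shown earlier. The sketch below is a minimal stand-alone check, not the fix that was checked into train_ssd.py. It assumes the usual Pascal VOC layout with one XML file per image in an Annotations directory; find_unlabeled is a hypothetical helper name, and the demo builds a tiny throwaway dataset in which the first image ID is made up.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def find_unlabeled(annotations_dir):
    """Return the IDs of annotation files that contain no <object> elements."""
    unlabeled = []
    for name in sorted(os.listdir(annotations_dir)):
        if not name.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(annotations_dir, name)).getroot()
        if len(root.findall("object")) == 0:
            unlabeled.append(os.path.splitext(name)[0])
    return unlabeled

# build a throwaway two-image dataset for the demo (assumed, simplified XML)
demo_dir = tempfile.mkdtemp()
with_box = ("<annotation><object><name>cat litter</name>"
            "<bndbox><xmin>87</xmin><ymin>0</ymin><xmax>128</xmax><ymax>37</ymax></bndbox>"
            "</object></annotation>")
without_box = "<annotation></annotation>"
for image_id, xml_text in [("20200731-211400", with_box), ("20200731-211422", without_box)]:
    with open(os.path.join(demo_dir, image_id + ".xml"), "w") as f:
        f.write(xml_text)

unlabeled_ids = find_unlabeled(demo_dir)
print(unlabeled_ids)
```

Pointing find_unlabeled at a real dataset's Annotations directory would list the image IDs to label or drop from the ImageSets files.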

regard001 · August 2, 2020, 8:16pm

Thank you for the answers. I tried the “point” filtering mode, and it makes it a little faster (2-3 FPS); for bigger images the difference is more significant.

I calculated the FPS like this: 1/(Total CPU in seconds + Total GPU in seconds).

The pre-trained network was tested only on one image.

With the re-trained network only one image was processed at a time, but in a sequence. Maybe in batch > 1 it is faster. The input resolution of my model is 320x320 (when trained I used the --resolution flag, but the default is also 320). I have one more concern, I want to use the model in FP16, but when the model is loaded it says:

[TRT] binding -- index 0
-- name 'input_0'
-- type FP32
-- in/out INPUT

[TRT] binding -- index 1
-- name 'output_0'
-- type FP32
-- in/out OUTPUT

Is it possible that the first and last layers make the processing slower because they are FP32?

M_Fy · August 3, 2020, 3:06pm

Dear @dusty_nv.
Pulling latest changes worked for me and solved my issues. Thx a lot for the fix.

Best regards
M_Fy

dusty_nv · August 3, 2020, 5:43pm

@regard001, the times should not be summed; the GPU runs in parallel with the CPU. So you can just use the CPU time: the CPU launches the GPU kernels, then waits for the GPU to finish.
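
As a worked example, the "Total" line from the pre-trained model's timing report earlier in the thread gives about 27 FPS when only the CPU total is used, versus about 12 FPS when the CPU and CUDA totals are added together. The parsing below is a hypothetical helper written for illustration, not part of jetson-inference.

```python
import re

# "Total" line copied from the pre-trained model's timing report in this thread
total_line = "[TRT] Total CPU 37.36343ms CUDA 44.90812ms"

def parse_total(line):
    """Extract (cpu_ms, cuda_ms) from a timing report line (hypothetical helper)."""
    match = re.search(r"CPU\s+([\d.]+)ms\s+CUDA\s+([\d.]+)ms", line)
    return float(match.group(1)), float(match.group(2))

cpu_ms, cuda_ms = parse_total(total_line)

# per the explanation above: use the CPU total alone as the time per frame
fps_cpu_only = 1000.0 / cpu_ms

# adding both totals counts the overlapping GPU work a second time
fps_summed = 1000.0 / (cpu_ms + cuda_ms)

print("CPU total only:    {:.1f} FPS".format(fps_cpu_only))
print("CPU + CUDA summed: {:.1f} FPS".format(fps_summed))
```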

The input and output layers will be FP32, even if the model is using FP16. TensorRT will convert internally so the user doesn’t need to deal with FP16 from the CPU. So it is still using FP16.

I forgot to mention earlier, the filtering is also dependent on the raw size of the image you are processing (for example if your test image is 1024x512, it will take longer to post-process than a 512x256 image). This is because the overlay/mask processing is applied to the original size of the image being processed.

M_Fy · August 18, 2020, 7:01pm

Hi @dusty_nv,

I am facing another issue with the following tutorial:

https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect-detection.md

# Collecting your own Detection Datasets

The previously used `camera-capture` tool can also label object detection datasets from live video:

<img src="https://github.com/dusty-nv/jetson-inference/raw/dev/docs/images/pytorch-collection-detect.jpg" >

When the `Dataset Type` drop-down is in Detection mode, the tool creates datasets in [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) format (which is supported during training).

> **note:** if you want to label a set of images that you already have (as opposed to capturing them from camera), try using a tool like [`CVAT`](https://github.com/openvinotoolkit/cvat) and export the dataset in Pascal VOC format.  Then create a labels.txt in the dataset with the names of each of your object classes.

## Creating the Label File

Under `jetson-inference/python/training/detection/ssd/data`, create an empty directory for storing your dataset and a text file that will define the class labels (usually called `labels.txt`).  The label file contains one class label per line, for example:

This file has been truncated.

While image collection, labeling and training work fine, my video output stream always stops with a segmentation fault when I introduce an item that looks similar to my trained object:

Segmentation fault (core dumped)

I do not think it is a memory issue, since I can run pre-trained networks quite well with images from e.g. the Open Images Dataset V6.

Any idea to support me ?

Best regards

M_Fy
