Hello AI World - new object detection training and video interfaces

Dear @dusty_nv

It seems I have a bounding box array with zero boxes in it during trial 1:

  mode = random.choice(self.sample_options)
[[ 994.  419. 1043.  474.]
 [1399.  357. 1476.  405.]
 [1230.  341. 1305.  411.]
 [ 908.  312.  962.  368.]]
4
[[136. 448. 182. 495.]
 [188. 456. 258. 528.]
 [150. 387. 216. 443.]
 [ 64. 296. 117. 354.]
 [434. 392. 566. 484.]
 [588. 284. 674. 378.]]
6
[]
0
[[ 87.   0. 128.  37.]]
1
[[ 97. 154. 218. 242.]]
1
[[19. 58. 58. 97.]]
1

Executing python3 train_ssd.py another time, I receive the following output:

  mode = random.choice(self.sample_options)
[[1717.  750. 1791.  812.]
 [1644.  773. 1692.  829.]
 [1520.  630. 1582.  702.]]
3
[[ 449. 1557.  516. 1611.]
 [ 621. 1423.  695. 1496.]
 [ 198. 1601.  279. 1664.]
 [ 156. 1530.  243. 1591.]
 [ 153. 1589.  204. 1643.]]
5
[[2366.  571. 2415.  626.]
 [1933.  509. 2010.  557.]
 [2104.  493. 2179.  563.]
 [2447.  464. 2501.  520.]]
4
[[557.   4. 629.  69.]]
1
[[118. 480. 220. 548.]
 [509. 241. 554. 296.]
 [540. 311. 576. 345.]
 [627. 272. 696. 317.]]
4
/home/aimobile/.local/lib/python3.6/site-packages/torch/nn/_reduction.py:44: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
for i, data in enumerate(loader):
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
return self._process_data(data)
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/aimobile/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 207, in __getitem__
return self.datasets[dataset_idx][sample_idx]
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 64, in __getitem__
image, boxes, labels = self.transform(image, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
return self.augment(img, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
img, boxes, labels = t(img, boxes, labels)
  File "/home/aimobile/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
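
For what it's worth, the empty boxes entry in the first run above ([] followed by 0) looks suspicious. Here is a minimal numpy snippet that seems to reproduce the same IndexError when an image has no bounding box annotations (just my assumption about what is happening):

import numpy as np

boxes = np.array([])        # an image with no bounding box annotations
print(boxes.ndim)           # 1 -- an empty array is 1-dimensional, not shape (N, 4)
boxes[:, :2] += (10, 20)    # raises: IndexError: too many indices for array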

Hmm…would you be able to send me a tarball of your dataset so I could give it a try next week and debug?

Dear Dusty,

Great tutorial and step-by-step guide. Honestly, it is great work.

I am trying to replicate your FPS numbers on the Jetson Nano with segnet (fcn_resnet18), but unfortunately I have not been able to match them. You wrote that fcn-resnet18-voc-512x320 (Pascal VOC, 512x320) can run on the Jetson Nano at 34 FPS using FP16. If I am not wrong, I am only getting about 12 FPS; see the Timing Report below. I am using JetPack 4.4, PyTorch 1.6.0, and torchvision 0.7.0 (with the fcn_resnet18 modifications).

I used the clock boost commands:

sudo nvpmodel -m 0
sudo jetson_clocks

Timing Report:

[image] loaded ‘images/object_0.jpg’ (500 x 333, 3 channels)

[TRT] ------------------------------------------------
[TRT] Timing Report networks/FCN-ResNet18-Pascal-VOC-512x320/fcn_resnet18.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.08245ms CUDA 0.92099ms
[TRT] Network CPU 36.64665ms CUDA 35.67771ms
[TRT] Post-Process CPU 0.53996ms CUDA 0.54005ms
[TRT] Visualize CPU 0.09438ms CUDA 7.76937ms
[TRT] Total CPU 37.36343ms CUDA 44.90812ms
[TRT] ------------------------------------------------

I managed to retrain the fcn_resnet18 segnet with my own dataset (only 3 classes) and got the Timing Report below. The Post-Process step takes much longer than with the original network.

segNet – loaded 3 class colors
[image] loaded ‘test_images/barcode_0.jpg’ (500 x 281, 3 channels)
[TRT] ------------------------------------------------
[TRT] Timing Report networks/qbar_white_1/qbar.onnx
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.07667ms CUDA 0.45859ms
[TRT] Network CPU 24.04648ms CUDA 23.49146ms
[TRT] Post-Process CPU 45.18449ms CUDA 45.37104ms
[TRT] Visualize CPU 0.05125ms CUDA 5.77688ms
[TRT] Total CPU 69.35889ms CUDA 75.09797ms
[TRT] ------------------------------------------------

I have 2 questions:

  • How can I get the same FPS results with a pre-trained network as in this table?
  • How can I speed up the post-processing step to get timing similar to the pre-trained network?

Yours sincerely,

regard001

Hi @dusty_nv.

Please find the tarball of my dataset attached. I tried it today on a different Nano, but the results are still the same.

file.tar.gz (1.9 MB)

Best regards

M_Fy

Hi @regard001, for the pre-trained model you actually got ~27 FPS (from the ~37 ms Total CPU time). Also, are you only processing one image? If so, the processor clock frequencies may not have had time to fully spin up.

I’m not entirely sure why it would be taking so much longer, except if (a) this timing data was only for one image or (b) the model you trained is of a higher resolution. What is the input resolution of your model?

Also, you might want to try the “point” filtering mode instead of “linear” to see how that affects the post-processing time.
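
If it helps, here is a rough sketch of comparing the two filter modes on a single image with the jetson.inference Python API (this is just illustrative, not the exact sample code, and the function/argument names may differ slightly between jetson-inference versions):

import jetson.inference
import jetson.utils

# load the pre-trained segmentation network and a test image
net = jetson.inference.segNet("fcn-resnet18-voc-512x320")
img = jetson.utils.loadImage("images/object_0.jpg")

# allocate an output image for the overlay visualization
overlay = jetson.utils.cudaAllocMapped(width=img.width, height=img.height, format=img.format)

# run the network, then post-process with "point" instead of the default "linear"
net.Process(img)
net.Overlay(overlay, filter_mode="point")
net.PrintProfilerTimes()   # prints the same kind of timing report as above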

OK, thank you - I found the issue: image 20200731-211422 did not have any bounding box annotations.

I have checked in a fix where train_ssd.py will check to make sure each image has >0 annotations (or else the image will be ignored). So if you pull the latest, or delete 20200731-211422 from your ImageSets, it should work. I was able to run the training on it.
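
For anyone hitting this with their own dataset, the idea behind the check is roughly the following (a simplified sketch, not the actual train_ssd.py / voc_dataset.py code; load_annotations here is a stand-in for however your dataset reads its label files):

def filter_empty_images(image_ids, load_annotations):
    # keep only the image IDs that have at least one bounding box annotation
    kept = []
    for image_id in image_ids:
        boxes, labels = load_annotations(image_id)   # assumed to return (boxes, labels)
        if len(boxes) > 0:
            kept.append(image_id)
        else:
            print("warning: skipping image {:s} - no annotations found".format(image_id))
    return kept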

Thank you for the answers. I tried the “point” filtering mode, and it makes it a little faster (2-3 FPS); for bigger images it makes more of a difference.

I calculated the FPS like this: 1/(Total CPU in seconds + Total GPU in seconds).

The pre-trained network was tested only on one image.

With the re-trained network, only one image was processed at a time, but in sequence. Maybe with batch size > 1 it would be faster. The input resolution of my model is 320x320 (when training I used the --resolution flag, but the default is also 320). I have one more concern: I want to use the model in FP16, but when the model is loaded it says:

[TRT] binding -- index 0
-- name 'input_0'
-- type FP32
-- in/out INPUT

[TRT] binding -- index 1
-- name 'output_0'
-- type FP32
-- in/out OUTPUT

Is it possible that the first and last layers make the processing slower because of FP32?

Dear @dusty_nv.
Pulling the latest changes worked for me and solved my issues. Thanks a lot for the fix.

Best regards
M_Fy

@regard001, the times are not summed; the GPU time runs in parallel with the CPU time, so you can just use the CPU time. The CPU launches the GPU kernels, then waits for the GPU to finish.
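
Using the Total CPU lines from your two timing reports above, the arithmetic works out to roughly this:

# FPS from the "Total CPU" line of the TensorRT timing report
def fps_from_total_cpu(total_cpu_ms):
    return 1000.0 / total_cpu_ms

print(fps_from_total_cpu(37.36343))   # pre-trained fcn-resnet18-voc-512x320 -> ~26.8 FPS
print(fps_from_total_cpu(69.35889))   # re-trained 3-class model -> ~14.4 FPS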

The input and output layers will be FP32, even if the model is using FP16. TensorRT will convert internally so the user doesn’t need to deal with FP16 from the CPU. So it is still using FP16.

I forgot to mention earlier, the filtering is also dependent on the raw size of the image you are processing (for example if your test image is 1024x512, it will take longer to post-process than a 512x256 image). This is because the overlay/mask processing is applied to the original size of the image being processed.

Hi @dusty_nv,

I am facing another issue with this tutorial:

While image collection, labeling, and training work fine, my video output stream always stops with a segmentation fault error when I introduce an item that looks similar to my trained object.

Segmentation fault (core dumped)

I do not think it is a memory issue, since I can run pre-trained networks quite well with images from, e.g., the Open Images Dataset V6.

Any ideas on how to help me?

Best regards

M_Fy

Hmm, does it work if you test on a static image from disk that has an object your model was trained on? Or is it only with the camera streaming?

Hi @dusty_nv, it seems to be an issue with the video stream. The analysis of a static image works, and the object is shown with a box.

The “Hello AI World” thread is no longer pinned and does not appear at the top of the forum.

And there are no hyperlinks in the threads that are pinned.

Please fix.

Hi @xplanescientist, I have added a link to it from the permanent Links to Jetson Nano Resources sticky.

Hello AI World is also included in the Jetson Zoo, which is linked to from each Jetson desktop.

Hi, is there already a forum thread for this issue? I followed the Hello AI World tutorial with my own custom model using Pascal VOC and converted it to ONNX, but when I run: detectnet --model=models/flex/ssd-mobilenet.onnx --lables=models/flex/labels.txt --input_blob=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0

The camera runs, then I get a segmentation fault (core dumped) error and the camera shuts down.

Hi @Rodimir_V, did you train your model using train_ssd.py?

Before running the camera, can you try running your model with detectnet on a test image first? If there is still an error, please provide the console log.

Hi @dusty_nv, yes, I trained my model with train_ssd.py. I'll try that and get back to you. Thanks for the quick response!

Hi @dusty_nv, I tried the fruit model from your YouTube video “Jetson AI Fundamentals - S3E5 - Training Object Models” and it worked fine and was able to detect some of the fruits.

However, I still get the segmentation fault when I use the model I trained myself.

Here’s the log:
[TRT] device GPU, loaded models/flex/ssd-mobilenet.onnx
[TRT] Deserialize required 232914 microseconds.
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] – layers 104
[TRT] – maxBatchSize 1
[TRT] – workspace 0
[TRT] – deviceMemory 19595776
[TRT] – bindings 3
[TRT] binding 0
– index 0
– name ‘input_0’
– type FP32
– in/out INPUT
– # dims 4
– dim #0 1 (SPATIAL)
– dim #1 3 (SPATIAL)
– dim #2 300 (SPATIAL)
– dim #3 300 (SPATIAL)
[TRT] binding 1
– index 1
– name ‘scores’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 1 (SPATIAL)
– dim #1 3000 (SPATIAL)
– dim #2 2 (SPATIAL)
[TRT] binding 2
– index 2
– name ‘boxes’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 1 (SPATIAL)
– dim #1 3000 (SPATIAL)
– dim #2 4 (SPATIAL)
[TRT]
[TRT] binding to input 0 input_0 binding index: 0
[TRT] binding to input 0 input_0 dims (b=1 c=3 h=300 w=300) size=1080000
[TRT] binding to output 0 scores binding index: 1
[TRT] binding to output 0 scores dims (b=1 c=3000 h=2 w=1) size=24000
[TRT] binding to output 1 boxes binding index: 2
[TRT] binding to output 1 boxes dims (b=1 c=3000 h=4 w=1) size=48000
[TRT]
[TRT] device GPU, models/flex/ssd-mobilenet.onnx initialized.
[TRT] detectNet – number object classes: 2
[TRT] detectNet – maximum bounding boxes: 3000
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> jpegdec0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> v4l2src0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> jpegdec0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> v4l2src0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> jpegdec0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> v4l2src0
[gstreamer] gstreamer message stream-start ==> pipeline0
detectnet: failed to capture video frame
[gstreamer] gstCamera – onPreroll
[gstreamer] gstCamera – map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstCamera recieve caps: video/x-raw, format=(string)I420, width=(int)1280, height=(int)720, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)mpeg2, colorimetry=(string)1:4:0:0, framerate=(fraction)30/1
[gstreamer] gstCamera – recieved first frame, codec=mjpeg format=i420 width=1280 height=720 size=1382407
RingBuffer – allocated 4 buffers (1382407 bytes each, 5529628 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
RingBuffer – allocated 4 buffers (2764800 bytes each, 11059200 bytes total)
Segmentation fault (core dumped)

Hi @dusty_nv, I found my issue. It was a typo on my side. It took me a while to see it: my "--labels" was "lables". I was copying and pasting the same typo from my notepad into the terminal.

Thanks though!

Ah gotcha, ok - glad you got it running!