ros_deep_learning takes a long time to load

Hi @dusty_nv

I’ve been using SSD-Mobilenet-V2 to run detections from the ros_deep_learning application.

Every time I run the ROS launch file to start detection, it takes a long time to train the model, with 2-3 minutes of [TRT] prints like:

$ roslaunch ros_deep_learning detectnet.ros1.launch

[TRT] Tactic: 10747903 time 0.013952
[TRT] Tactic: 10944511 time 0.018832
[TRT] Fastest Tactic: 6291455 Time: 0.009552
[TRT] --------------- Timing Runner: FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_4_1x1_128/Conv2D + FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_4_1x1_128/Relu6 (CaskConvolution)

Question 1: Is there a way to store all the trained data and go directly into detection mode?

I also tried RTSP streaming as follows.
On the NVIDIA Jetson Xavier NX:
$ roslaunch ros_deep_learning detectnet.ros1.launch output:=rtsp://yash.j:12345678@192.168.13.180:1234

On the host PC, I opened this stream in VLC:
VLC -> stream: test.sdp

$ cat test.sdp

  c=IN IP4 127.0.0.1
  m=video 1234 RTP/AVP 96
  a=rtpmap:96 H264/90000

Question 2: On the host system I am not getting the streamed results. Can you please help me fix this?

Thanks in advance…

Hi @yash.j, what TensorRT is doing is optimizing the model for inference, without changing the weights (so it is not actually doing any training). This process should only happen the first time the model is loaded, because the optimized TensorRT engine gets saved to disk. On subsequent runs, the saved engine should be loaded instead.

Can you check your jetson-inference/data/networks/ssd-mobilenet directory and see if there is a *.engine file in there?
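A quick way to look for cached engines is something like the sketch below. It only assumes that the engine cache is written next to the model with an .engine suffix, as the [TRT] log reports; the helper name list_engines and the default path are placeholders, so pass the location of your own checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list cached TensorRT engine files under a networks directory.
# The default path is an assumption; pass the path of your own jetson-inference checkout.
list_engines() {
    local networks_dir="${1:-$HOME/jetson-inference/data/networks}"
    if [ ! -d "$networks_dir" ]; then
        echo "not a directory: $networks_dir" >&2
        return 1
    fi
    # Print every regular file ending in .engine below the networks directory.
    find "$networks_dir" -type f -name '*.engine'
}

# Example: list_engines ~/jetson-inference/data/networks
```

If this prints nothing after a full run, the engine was probably never written to disk.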

Do you let the ROS node fully load so it has a chance to save this optimized engine file? Can you provide the full log of it running?

Another thing to try is the detectnet program from jetson-inference, which will also generate the engine. Try processing a couple of test images with the standalone detectnet program: https://github.com/dusty-nv/jetson-inference/blob/master/docs/detectnet-console-2.md#detecting-objects-from-images

You should notice that after the first run of the detectnet program, it loads much more quickly, because the optimized engine has already been saved. Then you can try the ROS node again to see if it loads the saved engine too.

I think you meant to specify the rtp protocol as output, not rtsp (rtsp output isn’t supported in the library). Try testing it first with the detectnet program or video-viewer utility:

detectnet /dev/video0 rtp://192.168.13.180:1234

If you can’t view the stream with VLC player, you may want to try GStreamer on the PC.
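For the GStreamer route, a receive pipeline along these lines may work on the PC. This is a sketch that assumes GStreamer 1.0 with the H.264 plugins is installed there; the helper name rtp_receive_cmd is hypothetical, and the payload type 96 and 90000 clock rate are taken from the test.sdp above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a gst-launch-1.0 command that receives RTP/H.264 on a UDP port.
# The default port 1234 matches the rtp:// URI used on the Jetson side.
rtp_receive_cmd() {
    local port="${1:-1234}"
    # Caps mirror the SDP: payload type 96, H264, 90 kHz clock.
    local caps="application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96"
    printf 'gst-launch-1.0 -v udpsrc port=%s caps="%s" ! rtph264depay ! decodebin ! videoconvert ! autovideosink\n' "$port" "$caps"
}

# Example (run on the PC): eval "$(rtp_receive_cmd 1234)"
```

Printing the command first makes it easy to check the port and caps before actually running it.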

Yes, I think the issue is in creating the ssd_mobilenet_v2_coco.uff.1.1.7103.GPU.FP16.engine file.

$ ./detectnet-camera --width=1280 --height=720 /dev/video0

[TRT] Layer(Reformat): GridAnchor copy, Tactic: 1002, GridAnchor[Float(2,4332,1)] -> concat_priorbox[Float(2,4332,1)]
[TRT] Layer(Reformat): GridAnchor_1 copy, Tactic: 0, GridAnchor_1[Float(2,2400,1)] -> concat_priorbox[Float(2,2400,1)]
[TRT] Layer(Reformat): GridAnchor_2 copy, Tactic: 0, GridAnchor_2[Float(2,600,1)] -> concat_priorbox[Float(2,600,1)]
[TRT] Layer(Reformat): GridAnchor_3 copy, Tactic: 0, GridAnchor_3[Float(2,216,1)] -> concat_priorbox[Float(2,216,1)]
[TRT] Layer(Reformat): GridAnchor_4 copy, Tactic: 0, GridAnchor_4[Float(2,96,1)] -> concat_priorbox[Float(2,96,1)]
[TRT] Layer(Reformat): GridAnchor_5 copy, Tactic: 0, GridAnchor_5[Float(2,24,1)] -> concat_priorbox[Float(2,24,1)]
[TRT] Layer(PluginV2): NMS, Tactic: 0, concat_box_conf[Float(174447,1,1)], Squeeze[Float(7668,1,1)], concat_priorbox[Float(2,7668,1)] -> NMS[Float(1,100,7)], NMS_1[Float(1,1,1)]
[TRT] device GPU, completed building CUDA engine
[TRT] network profiling complete, writing engine cache to networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff.1.1.7103.GPU.FP16.engine
[TRT] failed to open engine cache file for writing networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff.1.1.7103.GPU.FP16.engine
[TRT] device GPU, completed writing engine cache to networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff.1.1.7103.GPU.FP16.engine
[TRT] device GPU, loaded networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
[TRT] Deserialize required 221087 microseconds.
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] -- layers 108
[TRT] -- maxBatchSize 1
[TRT] -- workspace 0
[TRT] -- deviceMemory 28495360
[TRT] -- bindings 3
[TRT] binding 0
-- index 0
-- name 'Input'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (SPATIAL)
-- dim #1 300 (SPATIAL)
-- dim #2 300 (SPATIAL)
[TRT] binding 1
-- index 1
-- name 'NMS'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (SPATIAL)
-- dim #1 100 (SPATIAL)
-- dim #2 7 (SPATIAL)
[TRT] binding 2
-- index 2
-- name 'NMS_1'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (SPATIAL)
-- dim #1 1 (SPATIAL)
-- dim #2 1 (SPATIAL)
[TRT]
[TRT] binding to input 0 Input binding index: 0
[TRT] binding to input 0 Input dims (b=1 c=3 h=300 w=300) size=1080000
[TRT] binding to output 0 NMS binding index: 1
[TRT] binding to output 0 NMS dims (b=1 c=1 h=100 w=7) size=2800
[TRT] binding to output 1 NMS_1 binding index: 2
[TRT] binding to output 1 NMS_1 dims (b=1 c=1 h=1 w=1) size=4
[TRT]
[TRT] device GPU, networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff initialized.
[TRT] W = 7 H = 100 C = 1
[TRT] detectNet -- maximum bounding boxes: 100
[TRT] detectNet -- loaded 91 class info entries
[TRT] detectNet -- number of object classes: 91
Segmentation fault (core dumped)

I see the same engine cache error when I run:
$ roslaunch ros_deep_learning detectnet.ros1.launch
But there the code keeps running without a segmentation fault.

Thanks for the response, it did solve the issue.

Hmm, it seems like your user doesn’t have permission to write to your jetson-inference/data directory (see the "failed to open engine cache file for writing" line in your log). Can you try giving it read/write permissions?

$ sudo chmod -R a+rwX <your-jetson-inference>/data
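To confirm whether your user can actually write there, a check like this may help. It is a minimal sketch: can_write_cache is a hypothetical helper name, and the directory argument is whatever path holds your model, such as the networks/SSD-Mobilenet-v2 directory from the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can create files in a directory.
# A directory needs both write and execute (traverse) permission for that.
can_write_cache() {
    local dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Example: can_write_cache <your-jetson-inference>/data/networks/SSD-Mobilenet-v2
```

If it reports NOT writable, TensorRT will rebuild the engine on every launch because the cache file cannot be saved.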

Thanks it worked.

One question regarding the frame rate:
I need to support 10 frames/second. Is there a way to set the desired FPS in the source, or via a command-line ROS parameter?

Is it running faster than that, and you want to limit it to 10 FPS? If so, you may want to add a videorate limiter to the GStreamer camera pipeline in jetson-inference/utils/camera/gstCamera.cpp.

You can try adding this to the pipeline (the example uses 15/1; set framerate=10/1 for 10 FPS):

videorate ! video/x-raw,framerate=15/1

If you are using a V4L2 camera, gstCamera.cpp:183 would become ss << "videorate ! video/x-raw,framerate=15/1 ! appsink name=mysink";

If you are using a MIPI CSI camera, gstCamera.cpp:150 would become ss << "video/x-raw ! videorate ! video/x-raw,framerate=15/1 ! appsink name=mysink";

After making the changes, recompile jetson-inference with make and sudo make install.
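Before editing gstCamera.cpp, it may be worth trying the rate limiter on a synthetic source. The sketch below assumes gst-launch-1.0 is installed; videorate_fragment and try_videorate are hypothetical helper names, and videotestsrc and fakesink are standard GStreamer test elements standing in for the camera and appsink.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the rate-limiting pipeline fragment for a target frame rate.
videorate_fragment() {
    local fps="${1:-10}"
    printf 'videorate ! video/x-raw,framerate=%s/1' "$fps"
}

# Hypothetical helper: run the fragment against a 30 FPS test source.
# With -v, gst-launch prints the negotiated caps, so the output framerate is visible.
try_videorate() {
    local fps="${1:-10}"
    # Word splitting of the fragment is intentional here.
    # shellcheck disable=SC2046
    gst-launch-1.0 -v videotestsrc num-buffers=90 ! video/x-raw,framerate=30/1 ! $(videorate_fragment "$fps") ! fakesink
}

# Example: try_videorate 10
```

If the caps printed after videorate show framerate=10/1, the same fragment should behave the same way in front of appsink.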

Thanks for the exact solution…