MobileNetV2 on Jetson Nano slower than expected

aanish.p · October 13, 2021, 3:26pm

• Hardware Platform (Jetson / GPU) Jetson Nano Dev Kit 4GB ram
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only) 4.5.1 rev1
• TensorRT Version 7.1.3

I am trying to run inference with MobileNetV2 on my Jetson Nano, and I am not getting more than 55 fps.
According to this blog post, https://developer.nvidia.com/blog/jetson-nano-ai-computing/, MobileNetV2 reaches 64 fps at an input size of 300x300. My input is 224x224, so I expected to do better.

I have implemented MobileNetV2 in two different ways:
1). OpenCV and Python
2). DeepStream

The results are as follows.

Results with cv2
bs=1
MobileNetV2 = avg. 20 ms per frame, or 50 fps
Results with DeepStream
bs=1
MobileNetV2 = avg. 22 ms per frame, or 45 fps

This is the link to the Drive folder with the code, the Keras model of MobileNetV2, the conversion script, and the engine file: nvidiadevpost - Google Drive
The MobileNetV2 I used is slightly different: its output layer has no softmax and is instead a Dense layer with 7 output nodes.

Note: the .trt conversion was done with FP16.

mchi · October 14, 2021, 4:00pm

will check
thanks!

mchi · October 16, 2021, 7:36am

Hi,
It looks like these two use different pipelines.

Could you try the command below to check the fps?

$ gst-launch-1.0 -v v4l2src device=/dev/video1 num-buffers=300 ! video/x-raw,format=YUY2,width=640,height=360,framerate=30/1 ! \
    videoconvert ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! m.sink_0 nvstreammux name=m live-source=1 batch-size=1 ! \
    nvinfer config-file-path=dstest1_pgie0_config.txt ! nvvideoconvert ! nvdsosd ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false

You may need to change it according to your camera type, which can be detected with:

$ sudo apt-get install v4l-utils
$ v4l2-ctl -d /dev/video0 --list-formats-ext

aanish.p · October 16, 2021, 6:29pm

This is the output I got:

datakalp@datakalp-desktop:~$ gst-launch-1.0 -v v4l2src device=/dev/video1 num-buffers=300 ! video/x-raw, format=YUY2, width=640, height=360, framerate=30/1 ! videoconvert ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=NV12' ! m.sink_0 nvstreammux name=m live-source=1 batch-size=1 width=224 height=224 ! nvinfer config-file-path=/home/datakalp/nvidiadevpost/dstest1_pgie0_config.txt ! nvvideoconvert ! nvdsosd ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false
Setting pipeline to PAUSED ...
0:00:05.894427474 14948   0x55adf86ad0 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/home/datakalp/TRTBS1.trt
INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_1         3x224x224       
1   OUTPUT kFLOAT model           7               

0:00:05.894565969 14948   0x55adf86ad0 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /home/datakalp/TRTBS1.trt
0:00:05.920308178 14948   0x55adf86ad0 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<nvinfer0> [UID 1]: Load new model:/home/datakalp/nvidiadevpost/dstest1_pgie0_config.txt sucessfully
Pipeline is live and does not need PREROLL ...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = false
Setting pipeline to PLAYING ...
New clock: GstSystemClock
/GstPipeline:pipeline0/GstV4l2Src:v4l2src0.GstPad:src: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstVideoConvert:videoconvert0.GstPad:src: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/GstNvStreamMux:m.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/GstNvInfer:nvinfer0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert1.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstNvDsOsd:nvdsosd0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0.GstGhostPad:sink.GstProxyPad:proxypad0: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0.GstGhostPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstNvDsOsd:nvdsosd0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert1.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/GstNvInfer:nvinfer0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/GstNvStreamMux:m.GstNvStreamPad:sink_0: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert0.GstPad:sink: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstVideoConvert:videoconvert0.GstPad:sink: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = false
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 24, dropped: 0, current: 45.17, average: 45.17
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 39, dropped: 0, current: 29.75, average: 37.66
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 54, dropped: 0, current: 29.78, average: 35.08
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 69, dropped: 0, current: 29.67, average: 33.74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 84, dropped: 0, current: 29.85, average: 32.98
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 99, dropped: 0, current: 29.78, average: 32.45
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 114, dropped: 0, current: 29.98, average: 32.10
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 129, dropped: 0, current: 29.76, average: 31.81
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 144, dropped: 0, current: 29.75, average: 31.58
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 159, dropped: 0, current: 29.77, average: 31.40
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 173, dropped: 0, current: 27.79, average: 31.07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 188, dropped: 0, current: 29.76, average: 30.97
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 203, dropped: 0, current: 29.74, average: 30.87
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 218, dropped: 0, current: 29.78, average: 30.79
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 233, dropped: 0, current: 29.97, average: 30.74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 248, dropped: 0, current: 29.72, average: 30.68
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 263, dropped: 0, current: 29.82, average: 30.63
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 278, dropped: 0, current: 29.68, average: 30.57
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 293, dropped: 0, current: 29.84, average: 30.54
Got EOS from element "pipeline0".
Execution ended after 0:00:12.684944700
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

mchi · October 17, 2021, 1:07am

From the log above, the pipeline runs at 30 fps. I think the reason is that the camera input source only delivers 30 fps.

How did you get the 22 ms or 45 fps you mentioned above?

aanish.p · October 17, 2021, 12:24pm

Yes, the maximum the camera can give is 30 fps. I calculated the total inference time by placing probes on the nvinfer plugin's sink and src pads, one before and one after inference. The same is in the code on the Drive.

mchi · October 17, 2021, 1:40pm

I don't understand this.
So, how did you get the data below? What do you want us to look into?

aanish.p · October 17, 2021, 2:54pm

The results I got are from the OpenCV script on the Drive:

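import time

import cv2
import numpy as np

# `vid` (a cv2.VideoCapture) and `predict` are defined elsewhere in the script on the Drive.
while True:
    st = time.time()
    ret, frame = vid.read()
    if not ret:  # stop when the camera returns no frame
        break
    stretch_near = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_NEAREST)
    # Add the batch dimension and cast to float16 before calling the model.
    imp_arr = stretch_near.reshape((1, 224, 224, 3)).astype(np.float16)
    out = predict(imp_arr)
    # Note: this interval covers the camera read and the resize as well as inference.
    print("time taken :", time.time() - st)
    print(out)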

and the DeepStream script from the Drive:

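import time

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

st = 0  # timestamp of the latest buffer seen on the nvinfer sink pad

def sta(pad, info, u_data):
    # Probe on the nvinfer sink pad: record when the buffer enters nvinfer.
    global st
    st = time.time()
    return Gst.PadProbeReturn.OK

et = 0  # timestamp taken when eta() runs

def eta():
    # Called from inferance_results_probe(), which is attached to the nvinfer src pad.
    global et, st
    et = time.time()
    print("time taken for mobilenetv2 :", et - st)
    st = et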

My question is simple. NVIDIA's post https://developer.nvidia.com/blog/jetson-nano-ai-computing/ says that MobileNetV2 runs at 64 fps on a 300x300 frame on the Jetson Nano. I am using 224x224 and still cannot get anything similar. Is there a mistake in my implementation? How can I get a higher fps?
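
One way to narrow down where the per-frame time goes in a loop like the OpenCV one above is to time capture, preprocessing, and inference separately, since a single timestamp around the whole loop body also counts the camera read and the resize. The following is only a minimal sketch, not the script from the Drive: `StageTimer` is a hypothetical helper, and the three `time.sleep` calls are placeholders standing in for `vid.read()`, `cv2.resize(...)`, and `predict(...)`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of samples

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self, name):
        """Average time per call of the named stage, in milliseconds."""
        return 1000.0 * self.totals[name] / self.counts[name]


timer = StageTimer()
for _ in range(5):
    with timer.stage("capture"):
        time.sleep(0.002)  # placeholder for: ret, frame = vid.read()
    with timer.stage("preprocess"):
        time.sleep(0.001)  # placeholder for: cv2.resize(...) and reshape
    with timer.stage("inference"):
        time.sleep(0.003)  # placeholder for: out = predict(imp_arr)

for name in ("capture", "preprocess", "inference"):
    print(f"{name}: {timer.mean_ms(name):.2f} ms per frame")
```

With the real calls in place of the sleeps, only the "inference" figure is the one to compare against a pure engine benchmark.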

mchi · October 18, 2021, 12:38am

With trtexec you can get:
$ /usr/src/tensorrt/bin/trtexec --loadEngine=TRTBS1.trt
…
[10/18/2021-08:36:28] [I] median: 14.3082 ms (end to end 14.3177 ms)
--> this indicates the inference fps = 1000 / 14.3 ≈ 70 FPS
…
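
The conversion between per-frame latency and frame rate used here is simply fps = 1000 / latency in ms. A small sketch that applies it to the latencies quoted in this thread; `latency_ms_to_fps` is a hypothetical helper name, not part of trtexec, and it assumes frames are processed back to back.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Frames per second if every frame takes latency_ms and frames run back to back."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies quoted in this thread, in milliseconds.
reported = {
    "trtexec median": 14.3082,
    "OpenCV loop": 20.0,
    "DeepStream probe": 22.0,
}

for label, ms in reported.items():
    print(f"{label}: {ms} ms -> {latency_ms_to_fps(ms):.1f} fps")
```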

aanish.p · October 18, 2021, 4:29am

Yes, I am getting 14.x ms, but the primary question remains unsolved: how do I get this fps in my implementation? I need to run inference on a live webcam. Is that possible with trtexec? If so, how do we get the output tensors?

mchi · October 18, 2021, 4:33am

The source, a live camera, only provides 30 fps input, so how would it be possible to achieve a higher fps?

aanish.p · October 18, 2021, 4:45am

Yes, I would not get more than 30 fps with a webcam. But I am interested in the execution time in ms, because I need to do some other computation apart from MobileNetV2. Hence it is important for me to get this 14.x ms in the DeepStream or cv2 implementation of MobileNetV2 alone.
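
To make the trade-off concrete: with a 30 fps camera each frame has a budget of about 33.3 ms, and whatever inference does not use is what is left for the other computation. A rough sketch using numbers from this thread; the helper names are hypothetical, and it assumes the work for one frame runs sequentially.

```python
def frame_budget_ms(source_fps: float) -> float:
    """Time available per frame when the source delivers source_fps frames per second."""
    return 1000.0 / source_fps


def headroom_ms(source_fps: float, inference_ms: float) -> float:
    """Time left per frame after inference, assuming stages run one after another."""
    return frame_budget_ms(source_fps) - inference_ms


def pipeline_fps(source_fps: float, inference_ms: float) -> float:
    """A live pipeline cannot run faster than its slowest stage."""
    return min(source_fps, 1000.0 / inference_ms)


camera_fps = 30.0
for inference_ms in (14.3, 22.0):
    print(
        f"inference {inference_ms} ms: "
        f"headroom {headroom_ms(camera_fps, inference_ms):.1f} ms, "
        f"pipeline {pipeline_fps(camera_fps, inference_ms):.1f} fps"
    )
```

In both cases the pipeline rate stays at the camera's 30 fps; only the headroom changes.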

aanish.p · October 19, 2021, 3:45pm

@mchi any update on this ?

mchi · October 21, 2021, 1:19am

Hi @aanish.p,
"sta" hooks on the sink pad and "inferance_results_probe" hooks on the src pad. "eta" is called in inferance_results_probe, and before it there are several other functions, e.g. rgba2rgb(). Did you check how much time rgba2rgb() takes?
Could you put "eta" at the front of inferance_results_probe() and check the time again?
Also, in nvinfer, besides the TRT inference, it also needs to convert RGBA packed data to RGB planar.
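
For readers unfamiliar with the layout change mentioned here: packed RGBA stores the four channel bytes of each pixel together (RGBARGBA...), while planar RGB stores all R values, then all G, then all B, which matches the channel-first 3x224x224 input the engine reports. The sketch below only illustrates the reordering in plain Python on a tiny buffer. It is not how nvinfer implements it (that code is not shown in this thread), and `rgba_packed_to_rgb_planar` is a hypothetical name.

```python
def rgba_packed_to_rgb_planar(rgba: bytes, width: int, height: int) -> bytes:
    """Reorder RGBARGBA... bytes into RRR...GGG...BBB..., dropping alpha."""
    if len(rgba) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")
    # A stride-4 slice picks one channel out of every pixel.
    r_plane = rgba[0::4]
    g_plane = rgba[1::4]
    b_plane = rgba[2::4]
    return r_plane + g_plane + b_plane


# A 2x1 image: one red pixel and one blue pixel, both fully opaque.
packed = bytes([255, 0, 0, 255, 0, 0, 255, 255])
planar = rgba_packed_to_rgb_planar(packed, width=2, height=1)
print(list(planar))  # [255, 0, 0, 0, 0, 255]
```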

aanish.p · October 21, 2021, 5:12am

Hello,
1). I removed rgba2rgb() as it was redundant.
2). I put eta at the beginning of inferance_results_probe()

I observe that the MobileNetV2 inference time varies from 16 to 20 ms. Is this normal?

16-20 ms is 50-62 fps. Is this the best we can get?
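
Since single per-frame prints jump around, it can be easier to compare against the trtexec median by collecting the probe timings and summarising them. A small sketch of that idea; `summarize_ms` is a hypothetical helper, and the sample values are made up for illustration (roughly in the 16 to 20 ms range mentioned above after two slow warm-up frames). They are not measurements from this thread.

```python
import statistics


def summarize_ms(samples_ms, warmup=2):
    """Return (min, median, max) latency in ms, skipping the first warmup samples."""
    steady = list(samples_ms)[warmup:]
    if not steady:
        raise ValueError("not enough samples after warm-up")
    return min(steady), statistics.median(steady), max(steady)


# Illustrative values only (not measured data).
samples = [35.0, 24.0, 16.2, 19.8, 17.1, 16.5, 18.9, 17.4]
lo, med, hi = summarize_ms(samples)
print(f"min {lo:.1f} ms, median {med:.1f} ms, max {hi:.1f} ms")
```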

mchi · October 21, 2021, 5:49am

Did you boost the Jetson clocks?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

If you didn't, boost the clocks and test again.

aanish.p · October 21, 2021, 6:02am

Boosting the clocks helped. I am getting 15-18 ms for inference now.

mchi · October 21, 2021, 6:10am

Since there is pre-processing, as I mentioned previously, this time makes sense.

system · November 4, 2021, 6:29am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
