MobileNetV2 on Jetson Nano slower than expected

• Hardware Platform (Jetson / GPU) Jetson Nano Dev Kit, 4 GB RAM
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only) 4.5.1 rev1
• TensorRT Version 7.1.3

I am trying to run inference with MobileNetV2 on my Jetson Nano, and I am not getting more than 55 fps.
According to the blog post "Jetson Nano Brings AI Computing to Everyone | NVIDIA Developer Blog", MobileNetV2 reaches 64 fps with an input size of 300x300. My input is 224x224, so I would expect to do better.

I have implemented MobileNetV2 in two different ways:
1) OpenCV and Python
2) DeepStream

The results are as follows:

  • Results with cv2
    bs=1
    MobileNetV2 = avg. 20 ms per frame, or 50 fps

  • Results with DeepStream
    bs=1
    MobileNetV2 = avg. 22 ms per frame, or 45 fps
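
For reference, at batch size 1 the per-frame time and the frame rate quoted above are related by fps = 1000 / latency in ms. A minimal sketch of that conversion follows; the function names are illustrative and do not come from the scripts on the drive.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Frames per second when one frame is processed every latency_ms milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def fps_to_latency_ms(fps: float) -> float:
    """Per-frame time in milliseconds needed to sustain the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Averages reported in this post.
    for label, ms in (("cv2", 20.0), ("DeepStream", 22.0)):
        print(f"{label}: {ms:.0f} ms per frame = {latency_ms_to_fps(ms):.1f} fps")
    # Per-frame time implied by the 64 fps figure from the blog post.
    print(f"64 fps = {fps_to_latency_ms(64.0):.3f} ms per frame")
```

By the same relation, matching the blog figure of 64 fps would need roughly 15.6 ms per frame.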

This is the link to the Drive folder with the code, the Keras model of MobileNetV2, the conversion script and the engine file: nvidiadevpost - Google Drive
The MobileNetV2 I have used is slightly different: its output layer has no softmax, but is a Dense layer with 7 output nodes.

Note: the .trt conversion was done with FP16.

Will check.
Thanks!

Hi,
It looks like these two have different pipelines.

Could you try the command below to check the fps?

$ gst-launch-1.0 -v v4l2src device=/dev/video1 num-buffers=300 ! video/x-raw,format=YUY2,width=640,height=360,framerate=30/1 ! \
videoconvert ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! m.sink_0 nvstreammux name=m live-source=1 batch-size=1 ! \
nvinfer config-file-path=dstest1_pgie0_config.txt ! nvvideoconvert ! nvdsosd ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false

You may need to change it according to your camera type, which can be detected with:

$ sudo apt-get install v4l-utils
$ v4l2-ctl -d /dev/video0 --list-formats-ext

This is the output I got:

datakalp@datakalp-desktop:~$ gst-launch-1.0 -v v4l2src device=/dev/video1 num-buffers=300 ! video/x-raw, format=YUY2, width=640, height=360, framerate=30/1 ! videoconvert ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=NV12' ! m.sink_0 nvstreammux name=m live-source=1 batch-size=1 width=224 height=224 ! nvinfer config-file-path=/home/datakalp/nvidiadevpost/dstest1_pgie0_config.txt ! nvvideoconvert ! nvdsosd ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false
Setting pipeline to PAUSED ...
0:00:05.894427474 14948   0x55adf86ad0 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/home/datakalp/TRTBS1.trt
INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_1         3x224x224       
1   OUTPUT kFLOAT model           7               

0:00:05.894565969 14948   0x55adf86ad0 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /home/datakalp/TRTBS1.trt
0:00:05.920308178 14948   0x55adf86ad0 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<nvinfer0> [UID 1]: Load new model:/home/datakalp/nvidiadevpost/dstest1_pgie0_config.txt sucessfully
Pipeline is live and does not need PREROLL ...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = false
Setting pipeline to PLAYING ...
New clock: GstSystemClock
/GstPipeline:pipeline0/GstV4l2Src:v4l2src0.GstPad:src: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstVideoConvert:videoconvert0.GstPad:src: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/GstNvStreamMux:m.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/GstNvInfer:nvinfer0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert1.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstNvDsOsd:nvdsosd0.GstPad:src: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0.GstGhostPad:sink.GstProxyPad:proxypad0: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0.GstGhostPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/GstNvDsOsd:nvdsosd0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, batch-size=(int)1, num-surfaces-per-frame=(int)1, format=(string)RGBA
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert1.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/GstNvInfer:nvinfer0.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)224, height=(int)224, framerate=(fraction)30/1, format=(string)NV12, batch-size=(int)1, num-surfaces-per-frame=(int)1
/GstPipeline:pipeline0/GstNvStreamMux:m.GstNvStreamPad:sink_0: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:sink: caps = video/x-raw(memory:NVMM), width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, format=(string)NV12
/GstPipeline:pipeline0/Gstnvvideoconvert:nvvideoconvert0.GstPad:sink: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstVideoConvert:videoconvert0.GstPad:sink: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, colorimetry=(string)2:4:5:1, interlace-mode=(string)progressive
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0/GstFakeSink:fakesink0: sync = false
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 24, dropped: 0, current: 45.17, average: 45.17
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 39, dropped: 0, current: 29.75, average: 37.66
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 54, dropped: 0, current: 29.78, average: 35.08
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 69, dropped: 0, current: 29.67, average: 33.74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 84, dropped: 0, current: 29.85, average: 32.98
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 99, dropped: 0, current: 29.78, average: 32.45
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 114, dropped: 0, current: 29.98, average: 32.10
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 129, dropped: 0, current: 29.76, average: 31.81
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 144, dropped: 0, current: 29.75, average: 31.58
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 159, dropped: 0, current: 29.77, average: 31.40
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 173, dropped: 0, current: 27.79, average: 31.07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 188, dropped: 0, current: 29.76, average: 30.97
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 203, dropped: 0, current: 29.74, average: 30.87
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 218, dropped: 0, current: 29.78, average: 30.79
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 233, dropped: 0, current: 29.97, average: 30.74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 248, dropped: 0, current: 29.72, average: 30.68
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 263, dropped: 0, current: 29.82, average: 30.63
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 278, dropped: 0, current: 29.68, average: 30.57
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 293, dropped: 0, current: 29.84, average: 30.54
Got EOS from element "pipeline0".
Execution ended after 0:00:12.684944700
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...


From the log above, the pipeline runs at 30 fps. I think the reason is that the camera input source only delivers 30 fps.

How did you get the 22 ms or 45 fps that you mentioned above?
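
To read the steady-state rate out of a log like the one above, one option is to take the median of the "current" values and skip the first sample. The first reading (45.17) is much higher than the rest, presumably a start-up burst, and it pulls the running "average" column up for a long time. The sketch below is only an illustration: the regular expression matches the fpsdisplaysink lines as they appear in this log, and the function name is made up.

```python
import re
from statistics import median

# Matches the fpsdisplaysink "last-message" lines as they appear in the log above.
FPS_LINE = re.compile(
    r"rendered: (\d+), dropped: (\d+), current: ([\d.]+), average: ([\d.]+)"
)


def steady_state_fps(log_text: str, skip: int = 1) -> float:
    """Median of the 'current' fps readings, ignoring the first `skip` samples."""
    current = [float(m.group(3)) for m in FPS_LINE.finditer(log_text)]
    if len(current) <= skip:
        raise ValueError("not enough fps samples in the log")
    return median(current[skip:])


# First four readings copied from the log above.
sample = """\
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 24, dropped: 0, current: 45.17, average: 45.17
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 39, dropped: 0, current: 29.75, average: 37.66
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 54, dropped: 0, current: 29.78, average: 35.08
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 69, dropped: 0, current: 29.67, average: 33.74
"""

print(steady_state_fps(sample))  # 29.75
```

The result is close to the 30 fps camera rate, which matches the reading that the source, not nvinfer, limits this pipeline.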

Yes, the maximum the camera can give is 30 fps. I calculated the total inference time by placing probes on the sink and src pads of the nvinfer plugin. The same is in the code on the drive.

I don’t understand this.
So, how did you get the data you reported? What do you want us to look into?

The results I got are from the OpenCV script from the drive:

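import time
import cv2
import numpy as np

# vid (the video capture object) and predict() are defined earlier in the script.
while True:
    st = time.time()  # start of the timed region: capture + resize + inference
    ret, frame = vid.read()
    if not ret:  # stop when the camera returns no frame
        break
    # Resize to the 224x224 network input and add a batch dimension (NHWC).
    stretch_near = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_NEAREST)
    imp_arr = stretch_near.reshape((1, 224, 224, 3)).astype(np.float16)
    out = predict(imp_arr)
    print("time taken :", time.time() - st)
    print(out)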
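
Note that the loop above times capture, resize and inference together, so the printed figure is not the inference time alone. A minimal sketch of timing each stage separately is shown below. StageTimer, fake_capture and fake_infer are hypothetical stand-ins for illustration; in the real script the measured calls would be vid.read(), cv2.resize(...) and predict(...).

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock time per named stage across loop iterations."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def measure(self, name, func, *args, **kwargs):
        # Run func and add its duration to the running total for this stage.
        start = time.perf_counter()
        result = func(*args, **kwargs)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return result

    def mean_ms(self, name):
        return 1000.0 * self.totals[name] / self.counts[name]


# Hypothetical stand-ins for the capture and inference calls.
def fake_capture():
    time.sleep(0.002)
    return "frame"


def fake_infer(frame):
    time.sleep(0.005)
    return [0.0] * 7


timer = StageTimer()
for _ in range(5):
    frame = timer.measure("capture", fake_capture)
    out = timer.measure("inference", fake_infer, frame)

for stage in ("capture", "inference"):
    print(f"{stage}: {timer.mean_ms(stage):.2f} ms")
```

time.perf_counter() is used here because it is a monotonic, high-resolution clock, which suits short intervals better than time.time().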

The DeepStream results come from this script, also on the drive:

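import time
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

st = 0  # time at which a buffer reached the nvinfer sink pad
def sta(pad, info, u_data):
    # Sink-pad probe: record the start time.
    global st
    st = time.time()
    return Gst.PadProbeReturn.OK

et = 0  # time at which the buffer left nvinfer
def eta():
    # Called from the src-pad probe: print the time elapsed since sta().
    global et, st
    et = time.time()
    print("time taken for mobilenetv2 :", et - st)
    st = et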
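
A single global start time can be overwritten if a second buffer reaches the sink pad before the first one leaves the src pad. One possible alternative, sketched below, pairs the two timestamps per buffer. LatencyMeter is a hypothetical helper, not part of DeepStream. In a real probe the key could be info.get_buffer().pts, assuming nvinfer passes the buffer through with its PTS unchanged. The demonstration uses a fake clock so it runs without GStreamer.

```python
import time
from statistics import mean


class LatencyMeter:
    """Pairs a start and an end timestamp per buffer key and records the elapsed time."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._starts = {}
        self.samples_ms = []

    def on_sink(self, key):
        # Call from the sink-pad probe, e.g. with key = buffer PTS (assumption).
        self._starts[key] = self._clock()

    def on_src(self, key):
        # Call as the first statement of the src-pad probe.
        start = self._starts.pop(key, None)
        if start is not None:
            self.samples_ms.append(1000.0 * (self._clock() - start))

    def summary(self):
        # (min, mean, max) of the recorded latencies in milliseconds.
        return min(self.samples_ms), mean(self.samples_ms), max(self.samples_ms)


# Demonstration with a fake clock: two buffers taking 16 ms and 20 ms.
ticks = iter([0.000, 0.016, 0.033, 0.053])
meter = LatencyMeter(clock=lambda: next(ticks))
meter.on_sink(key=0)
meter.on_src(key=0)
meter.on_sink(key=1)
meter.on_src(key=1)
print(meter.summary())
```

Collecting the samples instead of printing each one also makes it easier to report a minimum, mean and maximum over a run.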

My question is simple. NVIDIA has posted in "Jetson Nano Brings AI Computing to Everyone | NVIDIA Developer Blog" that MobileNetV2 runs at 64 fps on a 300x300 frame on the Jetson Nano. I am using 224x224 and still cannot get anything similar. Is there a mistake in my implementation? How can I get a higher fps?

With trtexec you can measure the engine on its own:
$ /usr/src/tensorrt/bin/trtexec --loadEngine=TRTBS1.trt

[10/18/2021-08:36:28] [I] median: 14.3082 ms (end to end 14.3177 ms)

This indicates an inference rate of 1000 / 14.3 ≈ 70 FPS.
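
Comparing this engine-only figure with the per-frame averages from the first post gives a rough estimate of how much time is spent outside the TensorRT engine (capture, resize, format conversion and so on). The arithmetic below is only a sketch. It assumes all the numbers were taken on the same device under the same clock settings, which the thread does not confirm.

```python
ENGINE_ONLY_MS = 14.3082  # trtexec median reported above

# Per-frame averages reported in the first post.
measured_ms = {"cv2": 20.0, "DeepStream": 22.0}


def overhead_ms(measured: float, engine_only: float = ENGINE_ONLY_MS) -> float:
    """Time per frame spent outside the TensorRT engine."""
    return measured - engine_only


for name, ms in measured_ms.items():
    extra = overhead_ms(ms)
    print(f"{name}: {extra:.1f} ms overhead ({100 * extra / ms:.0f}% of frame time)")
```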

Yes, I am getting 14.x ms, but the primary question remains unsolved: how do I get this speed in my implementation? I need to run inference on a live webcam. Is that possible with trtexec? If so, how do we get the output tensors?

The source, a live camera, only delivers 30 fps. How would it be possible to achieve a higher fps?

Yes, I would not get more than 30 fps with a webcam. But I am interested in the execution time in milliseconds, because I need to do some other computation apart from this MobileNetV2. Hence it is important for me to get this 14.x ms for MobileNetV2 alone in the DeepStream or cv2 implementation.
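
To make the time budget concrete: a 30 fps source leaves about 33.3 ms per frame, and whatever inference does not use is available for the other computation. The sketch below assumes the other work runs serially with inference on each frame; the function names are illustrative.

```python
def frame_budget_ms(source_fps: float) -> float:
    """Time available per frame at the given source frame rate."""
    return 1000.0 / source_fps


def headroom_ms(source_fps: float, inference_ms: float) -> float:
    """Time left per frame for other work if it runs serially with inference."""
    return frame_budget_ms(source_fps) - inference_ms


# Inference times mentioned in this thread: trtexec, cv2 and DeepStream.
for inference_ms in (14.3, 20.0, 22.0):
    left = headroom_ms(30.0, inference_ms)
    print(f"inference {inference_ms:4.1f} ms -> {left:4.1f} ms left per frame")
```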

@mchi any update on this?

Hi @aanish.p,
“sta” hooks on the sink pad and “inferance_results_probe” hooks on the src pad. “eta” is called in inferance_results_probe, but before it there are several other functions, e.g. rgba2rgb(). Did you check how much time “rgba2rgb()” takes?
Could you put “eta” at the front of inferance_results_probe() and check the time again?
Also, besides TRT inference, nvinfer needs to convert RGBA packed data to RGB planar.
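
For readers unfamiliar with the terms, "packed" RGBA stores the four channel values of each pixel together, while "planar" RGB stores all red values, then all green, then all blue. The sketch below shows only that layout change on raw bytes. It does not reproduce how nvinfer performs the conversion internally, and any scaling or normalization from the nvinfer config is left out.

```python
def rgba_packed_to_rgb_planar(rgba: bytes) -> bytes:
    """Convert packed RGBA (R,G,B,A per pixel) to planar RGB (all R, all G, all B)."""
    if len(rgba) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")
    # Every 4th byte starting at offset 0, 1, 2 is R, G, B; alpha is dropped.
    return rgba[0::4] + rgba[1::4] + rgba[2::4]


# Two pixels: (1, 2, 3, 255) and (4, 5, 6, 255).
packed = bytes([1, 2, 3, 255, 4, 5, 6, 255])
planar = rgba_packed_to_rgb_planar(packed)
print(list(planar))  # [1, 4, 2, 5, 3, 6]
```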

Hello,
1) I removed rgba2rgb() as it was redundant.
2) I put eta at the beginning of inferance_results_probe().

I observe that the MobileNetV2 inference time varies from 16 to 20 ms. Is this normal?

16-20 ms is 50-62 fps. Is this the best we can get?

Did you boost the Jetson clocks?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

If you didn’t, boost the clocks and test again.

Boosting the clocks has helped. I am getting 15-18 ms for inference now.

Since there is pre-processing, as I mentioned previously, this time makes sense.