Increase FPS of Jetson-inference using complete utilisation of CPU-GPU

Hi,
We are running jetson-inference on an NVIDIA Jetson TX2 board using TensorRT.
We have modified the code to meet certain requirements, such as lane overlays, CAN transmission, etc.
When there is no pedestrian for detection the algorithm runs at 15fps.
When it detects an object it slows down to 3-4fps.
Is there a way to compile the code or make changes such that functions can be split to run between CPU and GPU?
Or
Is there any method to increase the FPS even when objects are detected?

Thanks,
Pratosha

Hi Pratosha, I haven’t really seen this behavior with detectNet. In my experience it runs at the same performance regardless of how many objects are detected on-screen. How big are the objects? You may want to try disabling the rendering by editing detectnet-camera.cpp.

The clustering is performed on the CPU; here is the code: https://github.com/dusty-nv/jetson-inference/blob/e12e6e64365fed83e255800382e593bf7e1b1b1a/detectNet.cpp#L383

You may want to profile it or try disabling it to see if that is causing your performance issue.
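One way to profile a CPU-side step like the clustering is to wrap it in a wall-clock timer. The sketch below is only an illustration: `meanMillis` and `dummyClustering` are hypothetical names that do not exist in jetson-inference, and the dummy function merely stands in for the real clustering call.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of jetson-inference): runs `fn` `runs` times
// and returns the mean wall-clock time per call in milliseconds.
template <typename Fn>
double meanMillis(Fn&& fn, int runs = 1)
{
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / runs;
}

// Stand-in for the CPU clustering step: visits every pair of candidate boxes
// and returns the number of pairs visited. Replace with the real call when profiling.
inline int dummyClustering(int boxes)
{
    volatile int pairs = 0;              // volatile so the loop is not optimized away
    for (int i = 0; i < boxes; ++i)
        for (int j = i + 1; j < boxes; ++j)
            pairs = pairs + 1;
    return pairs;
}
```

Calling something like `meanMillis([] { dummyClustering(200); }, 100)` once with an object in view and once without would show whether this step accounts for the drop. A timer like this only makes sense around CPU work; CUDA kernel launches are asynchronous, so timing GPU work this way would need a synchronization before the timer stops.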

Hi dusty_nv,

When we commented out lines 383 to 410 in detectNet.cpp, the frame rate stayed at a constant 15 fps, but no detections or overlays were shown.
When we changed the profile mode from TRUE to FALSE in tensorNet.cpp and rebuilt, the fps dropped to 11-12 with no objects in the frame.
The detected objects are person-sized; the exact size depends on where the person is standing (there can be one person or several).

We have added a lot of code to cudaOverlay.cu for drawing lane lines and boundaries. That is the reason for the reduction in fps.

Please help.

Thanks,
Pratosha

Hi Pratosha, if you plan to have a display attached, you may wish to investigate having OpenGL do the rendering of the lines, etc., since you require additional visualization and that is impacting performance. The reason for cudaOverlay.cu is so that it can still do some basic visualization even with no display attached (i.e. in a headless robot). The rasterization implemented in cudaOverlay is not optimized and is meant for simplicity.

Hi Dusty,

Thanks a lot for your response.
"you may wish to investigate having OpenGL do the rendering of the lines/etc." - If cudaOverlay.cu is meant for simplicity, can you suggest where else the code for overlays/lines should be written? Or which .cpp file do we have to change to get the lines rendered?

Another observation we made: running the unmodified jetson-inference detectnet-camera gives 10 fps when an object is present and 15 fps when no object is detected on screen.
How can we increase the fps of jetson-inference on the Jetson TX2?
I have attached the screenshot for your reference. https://imgur.com/a/bUmHi

Kindly help

Thanks,
Pratosha

You would want to add additional OpenGL rendering code after this line: https://github.com/dusty-nv/jetson-inference/blob/e12e6e64365fed83e255800382e593bf7e1b1b1a/detectnet-camera/detectnet-camera.cpp#L259

This is where the OpenGL texture is rendered after the CUDA<->OpenGL interoperability is complete. After this point in the code, you would want to render your more complex overlay.
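As a rough sketch of what such overlay code could look like, the snippet below draws lane lines with legacy immediate-mode OpenGL. Everything named here (`Point2f`, `LaneLine`, `pixelToNdc`, `drawLaneLines`, the `LANE_OVERLAY_USE_GL` macro) is hypothetical and not part of jetson-inference. It also assumes identity projection and modelview matrices; if the display code sets up a pixel-space projection instead, the conversion step would not be needed.

```cpp
#include <cassert>
#include <vector>

#ifdef LANE_OVERLAY_USE_GL
#include <GL/gl.h>
#endif

// Hypothetical types for illustration; none of them exist in jetson-inference.
struct Point2f  { float x, y; };
struct LaneLine { Point2f a, b; };   // endpoints in image pixels, origin at top-left

// Map a pixel coordinate to normalized device coordinates in [-1, 1].
// The y axis is flipped because image rows grow downward while NDC y grows upward.
inline Point2f pixelToNdc(float px, float py, int width, int height)
{
    assert(width > 0 && height > 0);
    return { 2.0f * px / width - 1.0f, 1.0f - 2.0f * py / height };
}

#ifdef LANE_OVERLAY_USE_GL
// Draw the lines on top of the already-rendered camera texture.
// Assumes a current GL context and identity projection/modelview matrices.
inline void drawLaneLines(const std::vector<LaneLine>& lines, int width, int height)
{
    glLineWidth(2.0f);
    glColor3f(0.0f, 1.0f, 0.0f);     // green lane markers
    glBegin(GL_LINES);
    for (const LaneLine& l : lines)
    {
        const Point2f a = pixelToNdc(l.a.x, l.a.y, width, height);
        const Point2f b = pixelToNdc(l.b.x, l.b.y, width, height);
        glVertex2f(a.x, a.y);
        glVertex2f(b.x, b.y);
    }
    glEnd();
}
#endif
```

The GL part is guarded by a macro so the coordinate conversion can be compiled and checked on its own. Because the lines are drawn by the GPU's rasterizer rather than by a custom CUDA kernel, this path should avoid the cost of the unoptimized rasterization in cudaOverlay, though the actual gain would need to be measured.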

Have you tried running “sudo ~/jetson_clocks.sh” or “sudo nvpmodel -m 0”?

Hi dusty,

Yes, the suggestion you gave did work.
We are adding our lines there. Thank you :)

"Have you tried running “sudo ~/jetson_clocks.sh” or “sudo nvpmodel -m 0”?" - Yes, we run both commands in the terminal before running detectnet-camera.

Thanks.