TensorFlow performance improvement is not linear before/after jetson_clocks.sh

fujunde · April 28, 2018, 2:48pm

Hello, I’m using TensorFlow 1.7 with JetPack 3.2. I have two models: a small one (6.3MB) and a larger one about twice its size (13.6MB).

After running jetson_clocks.sh, the small model sped up from 0.125 s/f to 0.043 s/f, but the larger one only went from 0.332 s/f to 0.286 s/f.

Why does one model speed up so much while the other does not?
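
To put the gap in numbers, here is a minimal sketch that turns the per-frame times quoted above into speedup factors. The helper name `speedup` and the dictionary labels are only illustrative.

```python
# Per-frame times in seconds (before, after jetson_clocks.sh), as quoted in the post.
timings = {
    "small (6.3MB)": (0.125, 0.043),
    "large (13.6MB)": (0.332, 0.286),
}


def speedup(before_s, after_s):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


for name, (before_s, after_s) in timings.items():
    print(f"{name}: {speedup(before_s, after_s):.2f}x faster")
```

This gives roughly 2.91x for the small model against about 1.16x for the larger one.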

Another interesting thing: when I halve the input image size, the larger model drops from 0.286 s/f to 0.143 s/f, while the small one stays around 0.040 s/f.

So why doesn’t the small model’s execution time drop along with the input image size this time?
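
One rough way to read the half-size experiment is to split each per-frame time into a fixed part and a part that scales with input size. The sketch below assumes a simple linear model, t = fixed + scaling * size, with size = 1.0 for the full image and 0.5 for the halved one. Both the linear model and the helper name `split_fixed_and_scaling` are assumptions for illustration, not something measured here.

```python
def split_fixed_and_scaling(t_full_s, t_half_s):
    """Fit t = fixed + scaling * size through the points size=1.0 and size=0.5.

    Returns (fixed, scaling) in seconds. Assumes time is linear in input size.
    """
    scaling = 2.0 * (t_full_s - t_half_s)  # slope from the two measurements
    fixed = t_full_s - scaling             # what remains at size 0
    return fixed, scaling


# (name, full-size time, half-size time), all with jetson_clocks.sh enabled.
measurements = [
    ("small", 0.043, 0.040),
    ("large", 0.286, 0.143),
]

for name, t_full_s, t_half_s in measurements:
    fixed, scaling = split_fixed_and_scaling(t_full_s, t_half_s)
    print(f"{name}: fixed ~{abs(fixed):.3f} s, size-dependent ~{scaling:.3f} s")
```

Under that assumption, the small model's time is almost entirely fixed cost (about 0.037 s of 0.043 s), while the larger model's time is almost entirely size-dependent, which would be consistent with the two models responding so differently.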

How can I accelerate my model correctly?

I hope you can help. Thanks.

AastaLLL · April 30, 2018, 2:51am

Hi,

We need more information to give a specific suggestion.
Could you profile your model with nvprof and share the results with us?

sudo ./nvprof -o output.nvprof [your program]

Here are some common causes for your reference:
1. Some layers in your model are not well suited to the GPU architecture.
2. The model is too small to gain performance from reducing the batch size.

Thanks.

fujunde · May 1, 2018, 12:43am

Hello AastaLLL, thank you for the reply.

Here is the nvprof result with jetson_clocks.sh: output.nvprof (Google Drive)

Without jetson_clocks.sh, inferring one frame (100x500) with TensorFlow 1.7 takes 0.35 s; with jetson_clocks.sh it takes 0.286 s.

My model is purely 10 3x3 conv2d layers, with 2 fully connected layers at the end. The input is 100x500, and the output is also 100x500.

I cannot understand why my smaller model benefits so much from jetson_clocks.sh while the larger one does not.

I hope to hear back.

AastaLLL · May 3, 2018, 7:36am

Hi,

You can check the nvprof data with NVVP on the host.

Based on the result, your application spends most of its time in CUDA preparation and memory allocation/free.
That is why inference improves only slightly in your use case.
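
To see how much of the measured time is one-off setup rather than steady-state inference, a minimal sketch like the following times each frame separately and sets the first call(s) aside as warm-up. `time_frames` and `fake_inference` are hypothetical names. In a TensorFlow 1.x script, `run_inference` would typically wrap the existing `session.run(...)` call, with the session created once outside the loop.

```python
import time


def time_frames(run_inference, frames, warmup=1, clock=time.perf_counter):
    """Time run_inference on each frame.

    Returns (warmup_times, steady_times) in seconds, where the first
    `warmup` calls are kept apart because they usually include setup cost.
    """
    times = []
    for frame in frames:
        start = clock()
        run_inference(frame)
        times.append(clock() - start)
    return times[:warmup], times[warmup:]


def fake_inference(frame):
    # Hypothetical stand-in for the real model call (e.g. a session.run wrapper).
    return sum(frame)


frames = [[0.0] * 1000 for _ in range(5)]
warmup_times, steady_times = time_frames(fake_inference, frames)
print("first call (s):", warmup_times[0])
print("steady-state mean (s):", sum(steady_times) / len(steady_times))
```

If the steady-state mean is much lower than the first call, the per-frame figure is probably being inflated by initialization or repeated allocation rather than by the convolutions themselves.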

Thanks.
