.numpy() is very slow on Jetson Nano

I’m running some experiments with semantic segmentation models on a Nano. One of the models I am using is DeepLabv3 with a MobileNetV2 (MNv2) backbone, trained on the ADE20K dataset.

I noticed that the performance I was getting was very poor (~720ms per inference), on par with what I get on my five-year-old laptop running TensorFlow on CPU.

I debugged the issue and traced it to the memory copy, that is, EagerTensor.numpy(). Inference itself is very fast, as expected, but converting the results to a format I can use in the rest of my workflow takes MUCH longer (up to 20x the inference time with heavier networks like PSPNet).
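Here is roughly how I separated the two costs (a sketch; note that eager GPU ops are dispatched asynchronously, so the .numpy() timing also absorbs whatever kernel time is still in flight, which makes the split approximate):

import time
import numpy as np
import tensorflow as tf

imported = tf.saved_model.load('deeplabv3_mnv2_ade20k_train_2018_12_03_saved')
infer = imported.signatures['serving_default']
inp = tf.cast(np.random.uniform(0, 255, [1, 513, 513, 3]), tf.uint8)

infer(inp)  # warm-up, so first-call tracing is excluded

t0 = time.time()
out = infer(inp)                     # returns EagerTensor handles
t1 = time.time()
arr = list(out.values())[0].numpy()  # blocks until the GPU finishes, then copies to host
t2 = time.time()
print('call: %.0f ms, numpy(): %.0f ms' % ((t1 - t0) * 1e3, (t2 - t1) * 1e3))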

I tried using TF1.x semantics with tf.Session(), but no dice: tf.Session.run() becomes the bottleneck instead, with similar timings. I also tried converting the model to TensorRT… which makes it even slower (~920ms per inference)!
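For reference, the TF1-style path looked roughly like this (a sketch; the file name and the tensor names are the ones used by the official DeepLab demo and may differ for other exports):

import numpy as np
import tensorflow.compat.v1 as tf1
tf1.disable_eager_execution()

# load the frozen graph shipped in the DeepLab model zoo tarball
graph_def = tf1.GraphDef()
with tf1.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
with tf1.Graph().as_default() as graph:
    tf1.import_graph_def(graph_def, name='')

inp = np.random.uniform(0, 255, [1, 513, 513, 3]).astype(np.uint8)
with tf1.Session(graph=graph) as sess:
    # run() both executes the graph and copies the output to host,
    # so the copy cost shows up here rather than in a separate .numpy()
    pred = sess.run('SemanticPredictions:0', feed_dict={'ImageTensor:0': inp})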

Now, normally I would just say “oh well, memory copy is slow, duh” and try to find another way around it. But isn’t the Nano supposed to share memory between its CPU and GPU? Is there any way to exploit this peculiarity?
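As far as I know, stock TensorFlow doesn’t expose the Nano’s unified memory from Python, so the only lever I can see is copying less data. A sketch of that idea, reusing infer and inp from the timing snippet above and assuming a model whose signature returns per-class scores under a hypothetical 'logits' key: reduce on the GPU first, then copy a 1-byte-per-pixel class map instead of float scores.

import tensorflow as tf

@tf.function
def predict_classes(x):
    logits = infer(x)['logits']  # hypothetical output key, adjust per model
    # argmax + cast run on the device, so far less data crosses to the host
    return tf.cast(tf.argmax(logits, axis=-1), tf.uint8)

pred = predict_classes(inp).numpy()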

I am running TensorFlow 2.1 on JetPack 4.3.

Hi,

Could you share a simple script so we can check it on our side?
Also, could you try enabling TensorRT to see if that helps?

Thanks.

I mentioned in the OP that I tried converting to TRT, and that made the model even slower.
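For completeness, the TF-TRT conversion I ran was roughly this (a sketch with default conversion parameters, TF 2.x API):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='deeplabv3_mnv2_ade20k_train_2018_12_03_saved')
converter.convert()
converter.save('deeplabv3_mnv2_ade20k_trt')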
To reproduce: download DeepLabv3, convert it to a SavedModel using this script, then run:

import tensorflow as tf
import numpy as np
import time

# load the converted SavedModel; input is a random uint8 513x513 image
imported = tf.saved_model.load('deeplabv3_mnv2_ade20k_train_2018_12_03_saved')
inp = tf.cast(np.random.uniform(0, 255, [1, 513, 513, 3]), tf.uint8)

t0 = time.time()
imported.signatures['serving_default'](inp)
t1 = time.time()
print((t1 - t0) * 1000)  # elapsed wall time in ms
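Note that this times a single cold call, which includes the first-call tracing overhead; a warm-up call followed by averaging, sketched below, gives steadier numbers.

infer = imported.signatures['serving_default']
infer(inp)  # warm-up: exclude tracing from the measurement

t0 = time.time()
for _ in range(10):
    infer(inp)
print((time.time() - t0) * 100)  # ms per call: *1000 for ms, /10 runs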

Hi,

Sorry for the late reply.

Would you mind checking the memory status while the script is running?
If memory is also hitting its limit, that could impact performance.
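For example, you can watch memory with tegrastats while the script runs. If TensorFlow is reserving most of the Nano’s 4GB up front, enabling memory growth may also help (a sketch, TF 2.x API):

import tensorflow as tf

# must run before the GPU is first used; allocates GPU memory on demand
# instead of reserving nearly all of it at startup
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)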

Thanks.