Inference time on Jetson Xavier compared with local host PC?

Hello, everyone. I compared the inference time on the Jetson Xavier and on my local host PC. I found that inferring a 28×28 pixel grayscale MNIST image takes only 0.08 ms on my local host PC, while on the Jetson Xavier it needs 0.6 ms. I am confused by this result.

1. In my opinion, inference on the Jetson Xavier should be faster than on the host PC, but the results show the opposite.

2. NVIDIA only gives a simple example of training a model (LeNet-5) in DIGITS with the TensorFlow framework, but it doesn't give a detailed procedure for running inference on the Jetson Xavier with the model trained in DIGITS.

3. I have tried training a LeNet-5 model, converting it to lenet5.pb, and then converting it to lenet5.uff, but inference on the Jetson Xavier failed; a sketch of the conversion step is below.
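
The .pb-to-.uff step looks roughly like this (the output node name is a placeholder and has to match the actual node name in the frozen graph):

# Rough sketch of the .pb -> .uff conversion with the TensorRT UFF converter.
# 'predictions/Softmax' is a placeholder output-node name; check the real name
# in the frozen graph (e.g. with TensorFlow's summarize_graph tool).
import uff

uff.from_tensorflow_frozen_model(
    frozen_file='lenet5.pb',
    output_nodes=['predictions/Softmax'],
    output_filename='lenet5.uff')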

Does anyone have the same questions?
If you have solutions to the questions above, I would really appreciate your kind response.

Hi,

Could you share how you measure the MNIST performance?

1. If you are using TensorFlow, it's recommended to run the model with TensorRT instead.
The TensorFlow implementation is not optimized for the ARM and Jetson architecture, which may give much lower performance.

2. We have lots of inference examples. You can start from these tutorials:
https://github.com/dusty-nv/jetson-inference
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html

3. TensorRT has a TensorFlow-based LeNet5 example. It can give you some information.
/usr/src/tensorrt/samples/sampleUffMNIST

By the way, please remember to maximize the device performance before benchmarking:

sudo nvpmodel -m 0
sudo jetson_clocks

Thanks.

Thanks, AastaLLL.

1. I followed the sampleUffMNIST sample, which infers 10 MNIST images. In the 15W power mode, the average run time is about 0.6 ms; after I maximized the device performance, the average run time decreased to about 0.4 ms.

2. Since I failed to modify the given sampleUffMNIST sample (for example, to make it infer 100 images), I just take the 10-image result as the benchmark (0.4 ms per MNIST image).

3. On my host PC, I trained a LeNet-5 model on the MNIST dataset (45,002 training images, 14,998 test images). Here is the network:

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Model

img_rows, img_cols = 28, 28  # MNIST image size


def LeNet5(w_path=None):
    # (1, img_rows, img_cols) assumes the channels-first image_data_format
    input_shape = (1, img_rows, img_cols)
    img_input = Input(shape=input_shape)

    x = Conv2D(32, (3, 3), activation="relu", padding="same", name="conv1")(img_input)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x)
    x = Conv2D(64, (3, 3), activation="relu", padding='same', name='conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='pool2')(x)
    x = Dropout(0.25)(x)

    x = Flatten(name='flatten')(x)

    x = Dense(128, activation='relu', name='fc1')(x)
    x = Dropout(0.5)(x)
    x = Dense(128, activation='relu', name='fc2')(x)
    x = Dropout(0.5)(x)
    x = Dense(10, activation='softmax', name='predictions')(x)

    model = Model(img_input, x, name='LeNet5')
    if w_path:
        model.load_weights(w_path)

    return model

Although my trained model may differ slightly from the one in sampleUffMNIST, they are both LeNet-5 networks. Using my trained LeNet-5, running inference on 60,000 MNIST images takes only 5 s (that is about 0.08 ms per image).
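
The timing is roughly like the sketch below, using the LeNet5() function above (the batch size, preprocessing, and weight file name are illustrative, not my exact script):

# Simplified timing sketch: measure model.predict() over the 60,000 training images.
import time
from keras.datasets import mnist

(x_train, _), _ = mnist.load_data()
x = x_train.astype('float32').reshape(-1, 1, 28, 28) / 255.0  # channels-first, matching the model

model = LeNet5('lenet5_weights.h5')  # weight file name is a placeholder

start = time.time()
model.predict(x, batch_size=128)
elapsed = time.time() - start
print('%.1f s total, %.3f ms per image' % (elapsed, elapsed * 1000.0 / len(x)))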

So I have doubts about the time comparison showing that host PC inference is faster than the Jetson Xavier.

Is it because sampleUffMNIST needs to create the TensorRT engine, and since the sample only infers 10 images, the engine creation time takes up most of the measured time?

That is just my guess; I hope you have a reasonable explanation.

Hi,

Could you share the detailed information about your host PC with us first?
Do you have a desktop GPU on the PC?

Thanks.

My host PC detailed information:

Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Total processors: 12

Yes, I have a desktop GPU on the PC, a GeForce GT 730. But I think that when I run inference with my trained model, it does not use the GPU.
I have done another experiment, running inference on my colleague's PC, which has no GPU. It still takes 5 s to infer 60,000 MNIST images, the same result as before.

Could you please compare the inference time on your side (take the simple LeNet-5 as an example, to verify my inference results)?
Could you also help find the reason why inference on the Jetson takes longer than on a host PC without a GPU?

Thanks.

Hi,

Please find the Xavier performance benchmarks on this page:
https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks

For Xavier + 15W + MNIST (with AlexNet):
It should run at 299 images/sec (batch size = 1) to 2,270 images/sec (batch size = 128).
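
As a rough conversion, 299 images/sec at batch size 1 is about 1000 / 299 ≈ 3.3 ms per image for AlexNet. LeNet-5 is a much smaller network than AlexNet, so a well-below-millisecond per-image time on Xavier is reasonable once the engine is already built.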

A possible reason is that TensorRT requires compile time when generating the TRT engine from the uff model.
This may take several minutes to choose fast kernels based on the model and GPU architecture.

However, this is a one-time job and it is only needed at the first launch.
After the first time, you can always create the TensorRT engine from the serialized file.
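
For example, a rough Python sketch of this build-once / deserialize-later flow (the sample itself is C++; the exact Python API calls here may differ between TensorRT versions):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_and_save(uff_path, engine_path):
    # One-time step: parse the uff model and build (compile) the engine.
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        parser.register_input('in', (1, 28, 28))
        parser.register_output('out')
        parser.parse(uff_path, network)
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 28
        engine = builder.build_cuda_engine(network)  # this is the slow part
        with open(engine_path, 'wb') as f:
            f.write(engine.serialize())              # save the compiled engine

def load_engine(engine_path):
    # Later runs: skip the build and deserialize the saved engine directly.
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())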

Is it possible that your performance score includes the model compiling time?

Thanks.

Hi, AastaLLL. I am not sure whether the performance score includes the model compiling time, because I directly used your sampleUffMNIST sample under the TensorRT directory.
I just ran sampleUffMNIST. I assumed that sampleUffMNIST had already created a TensorRT engine with a serialized file, so when I run sampleUffMNIST it won't create the TensorRT engine again. Am I right?

Thanks for your patient response.

Hi,

sampleUffMNIST demonstrates how to create a TensorRT engine from the uff file,
so it re-compiles the TensorRT engine from the MNIST uff each time it runs.

To profile TensorRT, it's recommended to use our trtexec tool, located at /usr/src/tensorrt/bin/, instead:

sudo ./trtexec --uff=../data/mnist/lenet5.uff --uffInput=in,1,28,28 --output=out

Thanks.