I run imagenet-console with TensorRT on a Jetson TX2. It takes about 30 s to classify 300 pictures when using only one thread (about 10 fps).
When using 5 threads, it takes about 180 s to classify 10000 pictures (2000 pictures per thread), which is about 55 fps.
When using 6 threads, I get errors (Cuda Error in execute: 4).
PS: GoogleNet, batch size 128, FP16 enabled, mode 0.
CUDA error 4 is cudaErrorLaunchFailure.
Common causes include dereferencing an invalid device pointer and out-of-bounds shared-memory access.
Reference: CUDA Runtime API :: CUDA Toolkit Documentation
I guess you may be hitting an out-of-resources issue.
How can I reproduce the test conditions from Table 2 to improve inference performance on the Jetson TX2? Could you share the test source code to help me speed up imagenet-console?