How to improve image classification speed on TX2?

Hi,

I run imagenet-console with TensorRT on a Jetson TX2. With a single thread, it takes about 30 s to classify 300 pictures.
With 5 threads, it takes about 180 s to classify 20005 pictures (2000 pictures in each thread, total is 20005), about 55 fps.
With 6 threads, I get errors (Cuda Error in execute: 4).
Settings: GoogLeNet, batch size 128, FP16 enabled, mode 0.
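As a quick sanity check on the figures above, fps is simply the total number of images divided by wall-clock time. The sketch below assumes the 5-thread run processed 2000 images per thread, which is the per-thread count stated above.

```python
def throughput_fps(num_images: int, seconds: float) -> float:
    """Images classified per second of wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_images / seconds


# Single thread: 300 images in about 30 s.
single = throughput_fps(300, 30.0)

# Five threads with 2000 images each, about 180 s of wall time in total.
multi = throughput_fps(5 * 2000, 180.0)

print(f"1 thread : {single:.1f} fps")
print(f"5 threads: {multi:.1f} fps")
print(f"speedup  : {multi / single:.2f}x")
```

With these inputs the single-thread rate is 10.0 fps and the 5-thread rate is about 55.6 fps, roughly 5.6x the single-thread rate.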

My source is based on the following:
https://github.com/dusty-nv/jetson-inference/tree/master/imagenet-console

Thanks.

Hi,

CUDA error 4 is cudaErrorLaunchFailure.
Common causes include dereferencing an invalid device pointer and accessing out-of-bounds shared memory.
Reference: CUDA Runtime API :: CUDA Toolkit Documentation

My guess is that you are running out of resources when the sixth thread starts.
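If each thread creates its own engine and buffers, one way to decouple the thread count from the GPU footprint is to create a fixed number of engines up front and let worker threads borrow them. The following is a minimal Python sketch of that pooling pattern. `FakeEngine` and its `classify` method are hypothetical stand-ins, not jetson-inference API, and the default of 5 engines simply mirrors the thread count that worked above.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Hypothetical stand-in for one TensorRT engine plus its buffers."""

    def __init__(self, engine_id):
        self.engine_id = engine_id

    def classify(self, batch):
        # Replace with the real inference call; here each input is tagged
        # with the id of the engine that processed it.
        return [(self.engine_id, item) for item in batch]


def make_engine_pool(num_engines):
    """Create a fixed number of engines up front and queue them for reuse."""
    pool = queue.Queue()
    for i in range(num_engines):
        pool.put(FakeEngine(i))
    return pool


def classify_all(images, num_engines=5, num_threads=8, batch_size=128):
    """Classify `images` with more threads than engines, borrowing from the pool."""
    engines = make_engine_pool(num_engines)
    batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]

    def run(batch):
        engine = engines.get()  # blocks until some engine is free
        try:
            return engine.classify(batch)
        finally:
            engines.put(engine)  # return it so another thread can use it

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        per_batch = list(executor.map(run, batches))  # keeps input order
    return [result for batch_result in per_batch for result in batch_result]
```

Threads beyond `num_engines` simply wait in `engines.get()`, so memory use scales with the number of engines rather than the number of threads. Each borrowed engine is used by only one thread at a time, which matters because a TensorRT execution context should not be used concurrently from multiple threads.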

Thanks.

@AastaLLL
Hi,

How can I reproduce the conditions used to improve inference performance on the Jetson TX2, according to Table 2? Could you share the test source code, to help me speed up imagenet-console?

Table 2 below shows how performance increases going from Max-Q to Max-P and then to the maximum GPU clock frequency, while efficiency gradually decreases.
https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/

Thanks.

Hi,

Apply the following commands to maximize TX2 performance:

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh
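To see how much these two commands help, measure throughput the same way before and after running them. The sketch below assumes `classify` is a hypothetical callable wrapping your per-batch inference call. It skips a few warm-up batches, since the first calls often include one-time setup costs.

```python
import time


def measure_fps(classify, batches, warmup=2):
    """Return images per second for `classify`, ignoring the first `warmup` batches."""
    if len(batches) <= warmup:
        raise ValueError("need more batches than warm-up iterations")
    for batch in batches[:warmup]:
        classify(batch)  # untimed: absorbs one-time initialization costs
    timed = batches[warmup:]
    start = time.perf_counter()
    for batch in timed:
        classify(batch)
    elapsed = time.perf_counter() - start
    return sum(len(b) for b in timed) / elapsed


if __name__ == "__main__":
    def fake_classify(batch):
        time.sleep(0.005)  # hypothetical stand-in for one batch of inference

    batches = [list(range(128))] * 10
    print(f"{measure_fps(fake_classify, batches):.1f} images/s")
```

Run it with the same image list once before and once after `sudo nvpmodel -m 0` and `sudo ./jetson_clocks.sh` so the two numbers are directly comparable.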

You can find more information about Max-Q and Max-P here:

Thanks.