I run imagenet-console with TensorRT on a Jetson TX2. It takes about 30 s to classify 300 pictures when using only one thread (about 10 fps).
When using 5 threads, it takes about 180 s to classify 10000 pictures (2000 pictures per thread), which is about 55 fps.
When using 6 threads, I get errors (Cuda Error in execute: 4).
PS: GoogleNet, batch size 128, FP16 enabled, mode 0.
CUDA error 4 is cudaErrorLaunchFailure.
Common causes include dereferencing an invalid device pointer and out-of-bounds shared-memory access.
Reference: CUDA Runtime API :: CUDA Toolkit Documentation
I guess you may be hitting an out-of-resources issue.
How can I reproduce the test conditions from Table 2 to improve inference performance on the Jetson TX2? Could you share the test source code to help me speed up imagenet-console?