Hi,

I am using a Jetson AGX Xavier with the latest JetPack 4.1.1 (TensorRT 5.0).

I was trying to reproduce the benchmark results posted on this page:

https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks

and found a gap between the published results and mine.

Can you guide me on how to get the same results?

I am only interested in the ResNet-50 network with batch size 8.

The published results show:

LATENCY (ms) = 11.2 for 15W Mode

LATENCY (ms) = 6.2 for MAX-N Mode

I assume they used this command:

./trtexec --avgRuns=100 --deploy=resnet50.prototxt --int8 --batch=8 --iterations=10000 --output=prob --useSpinWait

This runs on the GPU only, with INT8 precision.

(With FP16, running on the DLA is 3x slower than running on the GPU only.)

Below is my ./trtexec output using the same command, except with --iterations=10 (and --avgRuns=1000 in the 15W run):

(15W Mode)

avgRuns: 1000

deploy: /home/nvidia/Networks/ResNet-50/deploy.prototxt

int8

batch: 8

iterations: 10

output: prob

useSpinWait

Input "data": 3x224x224

Output "prob": 1000x1x1

name=data, bindingIndex=0, buffers.size()=2

name=prob, bindingIndex=1, buffers.size()=2

Average over 1000 runs is 14.3147 ms (host walltime is 14.3454 ms, 99% percentile time is 14.3826).

Average over 1000 runs is 14.2869 ms (host walltime is 14.3124 ms, 99% percentile time is 14.3984).

Average over 1000 runs is 14.2821 ms (host walltime is 14.308 ms, 99% percentile time is 14.3534).

…

(MAX-N Mode)

avgRuns: 100

deploy: /home/nvidia/Networks/ResNet-50/deploy.prototxt

int8

batch: 8

iterations: 10

output: prob

useSpinWait

Input "data": 3x224x224

Output "prob": 1000x1x1

name=data, bindingIndex=0, buffers.size()=2

name=prob, bindingIndex=1, buffers.size()=2

Average over 100 runs is 9.6837 ms (host walltime is 9.69914 ms, 99% percentile time is 33.8719).

Average over 100 runs is 7.48239 ms (host walltime is 7.49908 ms, 99% percentile time is 8.92989).

Average over 100 runs is 7.49587 ms (host walltime is 7.50919 ms, 99% percentile time is 8.79376).

Average over 100 runs is 7.47715 ms (host walltime is 7.49505 ms, 99% percentile time is 8.53834).

…
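
To put a number on the gap, here is a minimal Python sketch that parses the "Average over N runs" lines from the logs above and compares them with the published latencies. The names (PUBLISHED_MS, LOGS, median_latency_ms, gap_percent) are my own, and taking the median to damp the slow first MAX-N interval is an assumption on my part, not how the published figures were derived.

```python
import re
from statistics import median

# Published ResNet-50, batch 8 latencies (ms), as quoted from the benchmark page.
PUBLISHED_MS = {"15W": 11.2, "MAX-N": 6.2}

# "Average over ..." lines copied from my trtexec output above.
LOGS = {
    "15W": """
Average over 1000 runs is 14.3147 ms (host walltime is 14.3454 ms, 99% percentile time is 14.3826).
Average over 1000 runs is 14.2869 ms (host walltime is 14.3124 ms, 99% percentile time is 14.3984).
Average over 1000 runs is 14.2821 ms (host walltime is 14.308 ms, 99% percentile time is 14.3534).
""",
    "MAX-N": """
Average over 100 runs is 9.6837 ms (host walltime is 9.69914 ms, 99% percentile time is 33.8719).
Average over 100 runs is 7.48239 ms (host walltime is 7.49908 ms, 99% percentile time is 8.92989).
Average over 100 runs is 7.49587 ms (host walltime is 7.50919 ms, 99% percentile time is 8.79376).
Average over 100 runs is 7.47715 ms (host walltime is 7.49505 ms, 99% percentile time is 8.53834).
""",
}

# Matches the per-interval GPU average that trtexec prints in the logs above.
AVERAGE_LINE = re.compile(r"Average over \d+ runs is ([\d.]+) ms")


def median_latency_ms(log_text):
    """Median of the per-interval averages found in a trtexec log."""
    values = [float(m.group(1)) for m in AVERAGE_LINE.finditer(log_text)]
    if not values:
        raise ValueError("no 'Average over N runs' lines found")
    return median(values)


def gap_percent(measured_ms, published_ms):
    """How much slower the measured latency is, as a percentage of the published one."""
    return (measured_ms - published_ms) / published_ms * 100.0


if __name__ == "__main__":
    for mode, log_text in LOGS.items():
        measured = median_latency_ms(log_text)
        gap = gap_percent(measured, PUBLISHED_MS[mode])
        print(f"{mode}: measured {measured:.2f} ms, "
              f"published {PUBLISHED_MS[mode]} ms, gap {gap:.1f}%")
```

On the numbers above this gives a gap of roughly 27.6% in 15W mode and 20.8% in MAX-N mode.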

Any idea why there is a gap between the published benchmarks and my results?