Why does an engine created by TensorRT achieve a higher speedup ratio on Jetson TX2 than on Titan X?


Recently I used TensorRT 2 to speed up a network and compared its inference time with that of the original network. I found that on a Titan X the engine roughly doubles the inference speed, while on a Jetson TX2 it reaches nearly a 10x speedup over the original network. The Titan X platform does not support the 16-bit kHALF mode, which I did use on the TX2. Is this the only reason the engine achieves a much larger speedup ratio on the TX2, or are there other reasons? Waiting for your reply, thanks.


1. Please remember that a TensorRT PLAN must be built on the GPU architecture it will run on to get the best optimization.
That is, a PLAN built on a TX2 cannot be used on a Titan X.
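To make point 1 concrete, the PLAN is produced by running the TensorRT builder on the target device itself, and kHALF/FP16 mode should only be requested where the hardware supports it. A minimal sketch against the TensorRT 2/3 C++ API (network definition, logger, and serialization-to-disk omitted; `builder` and `network` are assumed to exist):

```cpp
// builder is an nvinfer1::IBuilder*; network is an INetworkDefinition*.
builder->setMaxBatchSize(1);
builder->setMaxWorkspaceSize(1 << 25);   // 32 MB of builder scratch space

// kHALF/FP16 is only honored where the hardware has fast FP16
// (true on TX2, not on a Maxwell/Pascal Titan X).
if (builder->platformHasFastFp16())
    builder->setHalf2Mode(true);

// The resulting engine is tuned for the GPU it was built on;
// serialize and deploy this PLAN on the same architecture only.
nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
nvinfer1::IHostMemory* plan = engine->serialize();
```

Because the builder auto-tunes kernels against the local GPU, running it on the deployment device is what gives the speedup numbers you measured.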

2. The Titan X supports INT8 mode; it's recommended to give it a try.
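For reference, INT8 mode is enabled on the builder in much the same way as FP16, but it additionally needs a calibrator fed with representative input data to choose the quantization scales. A hedged sketch, assuming a TensorRT 2/3-era C++ API and a `calibrator` object you implement yourself:

```cpp
// builder is an nvinfer1::IBuilder*; calibrator is your own
// IInt8Calibrator implementation backed by a calibration data set.
if (builder->platformHasFastInt8())   // query may require a newer TensorRT release
{
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(&calibrator);
}
```

INT8 roughly doubles throughput again over FP32 on Pascal-class GPUs such as the Titan X, at the cost of a calibration step and a small accuracy trade-off.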

3. If your use case is a tiny network with a lot of I/O transfer (e.g. images), the TX2 may gain some performance from its shared-memory design, since the CPU and GPU share the same physical DRAM.
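On point 3: because the TX2's CPU and GPU share physical memory, the host-to-device copy of each input image can be avoided entirely with mapped (zero-copy) memory. A plain CUDA runtime sketch of the idea (`inputBytes` is a placeholder for your buffer size):

```cpp
#include <cuda_runtime.h>

// Allocate page-locked host memory that the GPU can address directly.
float* hostInput = nullptr;
cudaHostAlloc((void**)&hostInput, inputBytes, cudaHostAllocMapped);

// Obtain the device-side alias of the same physical memory.
float* deviceInput = nullptr;
cudaHostGetDevicePointer((void**)&deviceInput, hostInput, 0);

// On TX2 this skips the cudaMemcpy a discrete GPU would need;
// deviceInput can be passed straight into the engine's bindings array.
```

For a tiny network, this saved copy time can be a large fraction of total latency, which inflates the measured speedup ratio on TX2 relative to a PCIe-attached Titan X.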

If possible, could you share your model with us? We would like to investigate this use case further.