Description
Hello,
I have deployed a YOLOv3 (ONNX) model on Xavier (JetPack 4.2.2) and XavierNX (JetPack 4.4), but unexpectedly the model runs faster on XavierNX than on Xavier.
After some testing, I found that on Xavier the function
void cuPointwise::launchPointwise<cuPointwise::SimpleAlgo<char, int>>(cuPointwise::LaunchParams, nvinfer1::VirtualMachineProgram)
takes up the most time.
On XavierNX, however, this function is never invoked.
I also tested another model, HigherHRNet (ONNX). With that model, Xavier does not call
void cuPointwise::launchPointwise<cuPointwise::SimpleAlgo<char, int>>(cuPointwise::LaunchParams, nvinfer1::VirtualMachineProgram)
at all.
Any ideas?
Environment
TensorRT Version: Xavier: TensorRT5.1, XavierNX: TensorRT7.1
CUDA Version: Xavier: 10.0, XavierNX: 10.2
CUDNN Version: Xavier: 7.5, XavierNX: 8.0.0
Relevant Files
Xavier
XavierNX
Steps To Reproduce
I tested with trtexec, using the following commands:
Xavier:
./trtexec --onnx=/home/ets/Documents/yolov3/yolov3_bn16_m.onnx --loadEngine=/home/ets/Documents/yolov3/yolov3_bn16_int8_m.engine --workspace=4096 --int8 --fp16 --batch=16
XavierNX:
./trtexec --onnx=/home/ets/Documents/yolov3/yolov3_bn16_m.onnx --loadEngine=/home/ets/Documents/yolov3/yolov3_bn16_in8t.engine --explicitBatch --workspace=4096 --fp16 --int8 --batch=16 --verbose
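To make the comparison between the two boards easier to reproduce, here is a rough sketch of how the per-kernel timings could be compared. It assumes the total time per kernel has already been copied out of the profiler output by hand into a dictionary for each board (kernel name mapped to total time). The function names `top_kernels` and `kernels_only_in` are hypothetical helpers, not part of TensorRT or any NVIDIA tool.

```python
from typing import Dict, List, Tuple


def top_kernels(profile: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n kernels with the largest total time, slowest first.

    `profile` maps a kernel name to its total time (assumed to be
    extracted manually from the profiler output of one board).
    """
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)[:n]


def kernels_only_in(first: Dict[str, float], second: Dict[str, float]) -> List[str]:
    """Return kernel names that appear in `first` but not in `second`.

    The result is ordered by the time each kernel takes in `first`,
    so the most expensive board-specific kernel comes first.
    """
    missing = [name for name in first if name not in second]
    return sorted(missing, key=lambda name: first[name], reverse=True)
```

Calling `kernels_only_in(xavier_profile, nx_profile)` would list the kernels that run only on Xavier, which is where the launchPointwise call described above should show up.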