Hello, I've run into some trouble on my platform.
I use a Jetson Xavier for algorithm development,
and I use YOLOv3 for detection.
To run this model I convert it PyTorch -> ONNX -> TensorRT.
The first time I ran the model, just parsing the ONNX and building the engine with the official interface, the inference time was normal.
So I serialized the engine to a binary file.
However, when I deserialize the engine and run inference again, there is always one operation that takes a long time.
I checked the per-layer inference time with the profiler: a single layer at a fixed position has the large cost, not all layers.
And when I repeatedly run the same inference, the large time cost disappears after about 10 runs.
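To make the observation concrete, here is a minimal, hypothetical Python sketch of how the "disappears after about 10 runs" behaviour can be measured. It does not depend on TensorRT: `infer` is an assumed zero-argument callable wrapping the real deserialized-engine execution, and the helper just counts how many leading runs are slower than a multiple of the median latency.

```python
import time

def time_inferences(infer, n_runs=20):
    """Call infer() n_runs times and return each call's latency in ms."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()  # assumed wrapper around the real engine execution
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def warmup_runs(latencies, factor=2.0):
    """Count how many leading runs are slower than factor * the median latency."""
    median = sorted(latencies)[len(latencies) // 2]
    count = 0
    for ms in latencies:
        if ms <= factor * median:
            break
        count += 1
    return count
```

Running this right after deserialization should report roughly the ~10 slow leading runs described above, which is what makes the warm-up interpretation plausible.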
So why does this happen, and how can I solve it?
Since I plan to use a two-stage network in future work, this will have a bad influence there.
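In case it helps others hitting the same thing: a common workaround is to run a few throwaway inferences right after deserialization, before serving real requests (on Jetson, locking clocks with `sudo jetson_clocks` may also reduce the first-run spike). Below is a hedged sketch for a multi-stage pipeline, where `stages` maps a name to an assumed zero-argument callable wrapping each deserialized engine's execution; names like `warm_up_all` are my own, not a TensorRT API.

```python
import time

def warm_up_all(stages, n_warmup=10):
    """Warm up every stage of a pipeline (e.g. detector then classifier)
    with throwaway inferences, returning per-stage latencies in ms so you
    can verify each stage's cost has settled before real traffic."""
    report = {}
    for name, infer in stages.items():
        latencies = []
        for _ in range(n_warmup):
            t0 = time.perf_counter()
            infer()  # assumed wrapper around one engine's execution
            latencies.append((time.perf_counter() - t0) * 1000.0)
        report[name] = latencies
    return report
```

For a two-stage network the point is to warm up both engines, since each deserialized engine seems to pay its own first-runs penalty.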