Currently I’m testing TensorRT 2.1 on the TX2, but I’m not that happy with the loading times.
I’ve taken https://github.com/dusty-nv/jetson-inference/blob/master/tensorNet.cpp as an example and based my code on it.
Here the developer creates an engine cache that is loaded if it exists; if it doesn’t, it is created by building/profiling the network.
My guess is this is done to save startup time after the first run.
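For anyone who doesn’t know the file, the relevant logic looks roughly like this (a condensed sketch of what tensorNet.cpp does, not my exact code; the serialize()/deserializeCudaEngine() signatures may differ a bit between TensorRT versions):

#include <fstream>
#include <sstream>
#include <string>

#include <NvInfer.h>

// Condensed version of the tensorNet.cpp caching logic:
// build + serialize on the first run, deserialize from the cache file afterwards.
nvinfer1::ICudaEngine* loadEngine(nvinfer1::IBuilder* builder,
                                  nvinfer1::INetworkDefinition* network,
                                  nvinfer1::IRuntime* infer,
                                  const std::string& cachePath)
{
    std::stringstream gieModelStream;

    std::ifstream cache(cachePath.c_str(), std::ios::binary);
    if (cache.good())
    {
        // cache file exists -> skip building, just read the serialized engine
        gieModelStream << cache.rdbuf();
    }
    else
    {
        // first run: build the engine (the slow profiling step) ...
        nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

        // ... then serialize it and write it out as the cache file for later runs
        engine->serialize(gieModelStream);
        std::ofstream out(cachePath.c_str(), std::ios::binary);
        out << gieModelStream.rdbuf();
        gieModelStream.seekg(0, gieModelStream.beg);
        engine->destroy();
    }

    // in both cases the engine that is actually used comes from deserialization
    return infer->deserializeCudaEngine(gieModelStream);
}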
I’ve measured some of the time-consuming functions (simple wall-clock timers around the calls, see the sketch after the numbers), and here are my results for my own net.
The call builder->buildCudaEngine(*network) takes 24.7 seconds.
After that the call infer->deserializeCudaEngine takes 0.003 seconds.
Now if I rerun the program the cache gets loaded, but infer->deserializeCudaEngine now takes 24.5 seconds.
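For clarity, the timings are just wall-clock measurements taken around the individual calls, roughly like this (simplified sketch, the helper name is made up):

#include <chrono>
#include <cstdio>

// Hypothetical helper: wall-clock time of a single call, in seconds.
template <typename F>
double measureSeconds(F&& call)
{
    const auto t0 = std::chrono::steady_clock::now();
    call();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// used around the individual TensorRT calls, e.g.:
//   double t = measureSeconds([&] { engine = builder->buildCudaEngine(*network); });
//   printf("buildCudaEngine: %.3f s\n", t);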
This is extremely confusing. Why does deserializeCudaEngine take so much less time right after buildCudaEngine?
For my net it makes no difference whether I rebuild the model on every startup or load the cache.
FP16 is not used…
For reference I compiled the example on my TX2 as well, added the same time measurements and disabled FP16.
The call builder->buildCudaEngine(*network) takes 37.3 seconds.
After that the call infer->deserializeCudaEngine takes 0.0851 seconds.
Now if I rerun the program the cache gets loaded, but infer->deserializeCudaEngine now takes 16.4 seconds.
The resulting bvlc_googlenet.caffemodel.2.tensorcache generated during the first run is 27 MB.
My own cache is only 41 KB since my net is much smaller, so I’m confused why it still takes that long to load.
Is there a way to debug or profile this?
And why is deserializeCudaEngine so incredibly fast right after buildCudaEngine? Is there some in-memory cache that I can’t see in the program code?
I can’t find much information about this on this board, so maybe others don’t have the same issue or aren’t as picky as I am. I only created this topic because I don’t understand why loading from the cache is just as slow as building.