(1) We have computers with a GPU, and a model that we have converted to a TensorRT engine and cached on disk. Imagine we want to update the version of CUDA, cuDNN, or TensorRT. For these changes, must we re-convert the model? Are there other changes that would require us to re-convert? Under what conditions would re-converting simply be a good idea, but not a requirement?
(2) I believe it is recommended to convert models to TensorRT on the GPU architecture you will deploy on, e.g. if we are deploying on a Tegra, we should do the conversion on that specific type of Tegra. Is that correct? If we deploy to multiple GPU architectures, it seems to behoove us to ship a generic model format (e.g. ONNX) and then convert it on each device. Some of our models are large and take tens of seconds to convert, so it would seem that our system is forced into downtime whenever we upgrade a model. What is the recommended way to avoid downtime?
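For reference, the on-device conversion we have in mind looks roughly like this. This is a minimal sketch against the TensorRT 8.x Python API; the paths and the 1 GB workspace limit are placeholders, not our real settings:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine_from_onnx(onnx_path: str, engine_path: str) -> None:
    """Parse an ONNX model and serialize a TensorRT engine to disk."""
    builder = trt.Builder(TRT_LOGGER)
    # The ONNX parser requires an explicit-batch network.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse %s" % onnx_path)

    config = builder.create_builder_config()
    # Placeholder workspace limit; tune for the target board.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

    # This is the slow step (tens of seconds for our larger models).
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)
```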
Thanks for your quick answer. I think you are saying:
It doesn’t matter where we generate the TensorRT engine files; it could be on a machine with the target GPU or on a machine with a different GPU.
There is backwards compatibility: newer TensorRT versions can read engine files produced by older versions. The only reason to re-convert would be if a newer version of TensorRT failed to read an older file, which should not happen.
So, those points look incorrect to me. It looks like when we upgrade to a newer version of TensorRT, we should re-convert, and it also looks like we should do a separate conversion for each GPU architecture.
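If that is right, we would re-convert whenever the recorded build environment changes. Here is a minimal sketch of the staleness check we are considering; it assumes pycuda is available for querying the device, and the metadata file next to the cached engine is our own convention:

```python
import json
import os

import pycuda.driver as cuda
import tensorrt as trt

def current_build_key() -> dict:
    """Describe the environment the cached engine must match."""
    cuda.init()
    dev = cuda.Device(0)
    return {
        "tensorrt": trt.__version__,
        "gpu": dev.name(),
        "compute_capability": "%d.%d" % dev.compute_capability(),
    }

def engine_is_stale(meta_path: str) -> bool:
    """True if the cached engine was built for a different environment."""
    if not os.path.exists(meta_path):
        return True
    with open(meta_path) as f:
        return json.load(f) != current_build_key()

def record_build_key(meta_path: str) -> None:
    """Write metadata next to a freshly built engine."""
    with open(meta_path, "w") as f:
        json.dump(current_build_key(), f)
```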
(1) How does one minimize downtime when uploading a new network, given that converting it to TensorRT takes a while? (Our rough idea is sketched after question (2) below.)
(2) I would guess one would download pre-converted TensorRT engine files to the board. If that is true, I need to know which Tegra I am running on. What are best practices for figuring out which board you are running on? (Our current guess is also sketched below.)
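On question (1), the approach we are leaning toward is to build the new engine in the background while the old one keeps serving, then swap pointers under a lock. This is a sketch only; engine.run() and build_fn are hypothetical stand-ins for our inference wrapper and for a builder like the one above:

```python
import threading

class HotSwapRunner:
    """Serve with the current engine; swap in a new one without downtime."""

    def __init__(self, engine):
        self._engine = engine
        self._lock = threading.Lock()

    def infer(self, inputs):
        with self._lock:
            engine = self._engine
        # run() is a hypothetical wrapper around a TensorRT execution context.
        return engine.run(inputs)

    def upgrade_async(self, onnx_path, build_fn):
        """Build the new engine off the serving path, then swap it in."""
        def _worker():
            new_engine = build_fn(onnx_path)  # the slow part: tens of seconds
            with self._lock:
                self._engine = new_engine     # the swap itself is instant
        threading.Thread(target=_worker, daemon=True).start()
```

One thing we are unsure about is whether the board has enough memory to hold both engines plus the builder workspace during the overlap.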
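On question (2), the simplest check we have found is the device-tree model string that JetPack exposes on Tegra boards, with the CUDA device name as a fallback. A sketch under those assumptions (the /proc path may differ across L4T releases):

```python
import pycuda.driver as cuda

def detect_board() -> str:
    """Return the device-tree model string on Jetson/Tegra, if present."""
    try:
        with open("/proc/device-tree/model") as f:
            # The device tree pads the model string with a trailing NUL.
            return f.read().rstrip("\x00\n")
    except FileNotFoundError:
        # Fall back to the CUDA device name and compute capability.
        cuda.init()
        dev = cuda.Device(0)
        return "%s (sm_%d%d)" % ((dev.name(),) + dev.compute_capability())
```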