TRT7 + Algorithm selection - questions

carlosgalvezp · September 24, 2020, 8:50am

Hi,

Suppose we implement an Algorithm Selection class that selects only one tactic per layer from the list of choices:

(1) Is tactic profiling still needed?
(2) If not, does the TRT builder still reserve system (GPU) resources while building?
(3) If not, can we now run multiple TRT builder instances (threads) in parallel? Otherwise this is not possible: NVIDIA says it is "undefined behavior".
(4) If profiling is not needed (because we know in advance which tactics to use for each layer), can we cross-build plan files for different GPU architectures? For example, given the tactic list for a Xavier architecture, can we build a plan file for that architecture on a regular x86 computer, instead of having to run the TRT builder on the Xavier itself?

Thanks

AakankshaS · September 25, 2020, 3:48am

Hi @carlosgalvezp
Please find the responses below:
(1) In principle timing is not needed, but TRT is not optimized for this case: by the time TRT lets the user select the algorithm, all candidate tactics have already been timed.
(2) Yes.
(3) This should be possible, but there are some open bugs against parallel builds on distinct GPUs (which is a supported configuration).
(4) No, for multiple reasons. We understand that cross-compilation is desirable, but enabling it requires significant work beyond just the choice of tactics.

Thanks!
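For context, TensorRT exposes this mechanism through the `IAlgorithmSelector` interface (`selectAlgorithms` and `reportAlgorithms` in C++). The Python sketch below models only the selection logic described in the question, pinning one tactic per layer; it does not use the real TensorRT API. `LayerContext`, `Candidate` and `FixedTacticSelector` are hypothetical stand-ins, and the layer names and tactic IDs are made up. Per answer (1), by the time a selector like this is called, TRT has already timed every candidate, so restricting the choice does not skip profiling.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


# Hypothetical stand-ins for TensorRT's algorithm context and algorithm objects.
@dataclass(frozen=True)
class LayerContext:
    name: str


@dataclass(frozen=True)
class Candidate:
    tactic: int


class FixedTacticSelector:
    """Allows exactly one tactic per layer when a pinned tactic is known (hypothetical)."""

    def __init__(self, pinned: Dict[str, int]) -> None:
        # Maps layer name -> tactic ID chosen ahead of time.
        self.pinned = pinned

    def select_algorithms(self, context: LayerContext,
                          choices: Sequence[Candidate]) -> List[int]:
        """Return the indices of the candidates the builder may use."""
        everything = list(range(len(choices)))
        wanted = self.pinned.get(context.name)
        if wanted is None:
            # No pinned tactic for this layer: leave every candidate available.
            return everything
        matches = [i for i, c in enumerate(choices) if c.tactic == wanted]
        # Keep only the first match; if the pinned tactic is not offered,
        # fall back to all candidates rather than failing the build.
        return matches[:1] or everything


if __name__ == "__main__":
    selector = FixedTacticSelector({"conv1": 7})
    candidates = [Candidate(3), Candidate(7), Candidate(11)]
    print(selector.select_algorithms(LayerContext("conv1"), candidates))
    print(selector.select_algorithms(LayerContext("fc1"), candidates))
```

If the pinned tactic is missing from the candidates (for example after a GPU or TRT version change), this sketch falls back to offering every candidate; failing the build instead would be an equally reasonable design choice.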

carlosgalvezp · September 25, 2020, 1:55pm

Thanks a lot for the replies!

So basically Algorithm Selection cannot be used as "a cache" (like the INT8 calibration cache) to speed up the build process, since "there will always be profiling and timing of all the tactics". Its main purpose is then reproducibility: choosing the same tactic every time.

That answers my question, thanks!
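The reproducibility use case amounts to a record/replay loop: record which tactic was chosen for each layer in one build, persist it, and pin those tactics in later builds. Below is a minimal sketch of the record-and-persist half. It assumes the report callback has been flattened into hypothetical `(layer_name, tactic_id)` pairs; the real `reportAlgorithms` callback passes algorithm contexts and algorithm objects instead. `TacticRecorder` and `load_cache` are illustrative names, not TensorRT API.

```python
import json
from typing import Dict, Iterable, Tuple


class TacticRecorder:
    """Collects the final tactic per layer as reported after a build (hypothetical)."""

    def __init__(self) -> None:
        self.chosen: Dict[str, int] = {}

    def report_algorithms(self, reports: Iterable[Tuple[str, int]]) -> None:
        # Each report is a hypothetical (layer_name, tactic_id) pair;
        # a later report for the same layer overwrites the earlier one.
        for layer_name, tactic in reports:
            self.chosen[layer_name] = tactic

    def dumps(self) -> str:
        # Sorted keys give a stable, diff-friendly file across builds.
        return json.dumps(self.chosen, sort_keys=True)


def load_cache(text: str) -> Dict[str, int]:
    """Parse a recorded cache back into a layer name -> tactic ID map."""
    return {name: int(tactic) for name, tactic in json.loads(text).items()}


if __name__ == "__main__":
    recorder = TacticRecorder()
    recorder.report_algorithms([("conv1", 7), ("fc1", 2)])
    saved = recorder.dumps()
    print(saved)
    print(load_cache(saved))
```

Because every tactic is still timed on each build, a file like this buys reproducibility rather than build speed, and given answer (4) it should not be expected to carry over to a different GPU architecture.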