Effect of setAverageFindIterations/setMinFindIterations

The default values for builder->setAverageFindIterations and builder->setMinFindIterations are 1 and 2, respectively. When I change these values, the builder seems to choose a different set of CUDA kernels for the engine.
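
For reference, a minimal sketch of where these calls sit in the build flow (assuming the pre-7.0 IBuilder API in which these setters live; the network population, batch size, and workspace size below are placeholders):

    #include <iostream>
    #include "NvInfer.h"

    // Minimal logger required by createInferBuilder().
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) override {
            if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
        }
    } gLogger;

    nvinfer1::ICudaEngine* buildEngine() {
        nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
        nvinfer1::INetworkDefinition* network = builder->createNetwork();
        // ... populate `network`, e.g. via the ONNX or UFF parser (omitted) ...

        builder->setMaxBatchSize(1);
        builder->setMaxWorkspaceSize(1ull << 30);

        // The parameters in question: A = runs averaged per timing group,
        // M = number of groups whose averages are min-reduced per tactic.
        builder->setAverageFindIterations(1);  // A, default per the post above
        builder->setMinFindIterations(2);      // M, default per the post above

        nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
        network->destroy();
        builder->destroy();
        return engine;
    }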

  1. Is this expected? I assume those parameters are used to time the layers during the build, and that the timings are in turn used to select the fastest CUDA kernels for the available GPU, so I would expect this behavior. Can you confirm?

  2. Are there any recommended values for those parameters? I would assume that setting them higher makes the selection of CUDA kernels more robust to small variations in timing. Yet, for example, builder->setAverageFindIterations(1) together with builder->setMinFindIterations(1) makes the resulting engine run slightly faster than builder->setMinFindIterations(2), for ResNet50 on a V100.

Hello,

Per engineering:
TRT measures the runtime of a tactic as min( avg( t_00, t_01, …, t_0A ), avg( t_10, t_11, …, t_1A ), …, avg( t_M0, t_M1, …, t_MA ) ), where M = MinFind, A = AvgFind, and each t is a measured runtime. We’d expect that larger M and A result in better tactic selection but a longer build time.
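
Purely as an illustration of that formula (a sketch of the arithmetic, not TRT’s actual implementation; the function name and sample layout are made up):

    #include <algorithm>
    #include <limits>
    #include <vector>

    // Score one tactic from M x A runtime samples: samples[m][a] is the
    // runtime of the a-th run inside the m-th averaging group. The score is
    // the minimum over the M groups of each group's average; the tactic with
    // the lowest score wins.
    double tacticScore(const std::vector<std::vector<double>>& samples) {
        double best = std::numeric_limits<double>::infinity();
        for (const auto& group : samples) {            // M = MinFind groups
            double sum = 0.0;
            for (double t : group) sum += t;           // A = AvgFind runs
            best = std::min(best, sum / group.size());
        }
        return best;
    }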

So,

  1. This is expected. When M and A are increased, TRT gets more reliable runtime measurements for each tactic and can therefore produce a faster engine.

  2. We recommend “avgFind=8, minFind=1”. We can get a more reliable runtime by enlarging avgFind, and we needn’t wait for 2 ms before launching each kernel; therefore it would not increase the engine build time (see the snippet below).
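
In code terms (same hypothetical pre-7.0 builder setup as in the first sketch; the helper name is made up), the recommendation amounts to:

    #include "NvInfer.h"

    // Apply the recommended tactic-timing settings: average over more runs,
    // but keep a single averaging group per tactic.
    void applyRecommendedFindIterations(nvinfer1::IBuilder& builder) {
        builder.setAverageFindIterations(8);  // avgFind = 8
        builder.setMinFindIterations(1);      // minFind = 1
    }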

Regards,
NVIDIA Enterprise Support

Thank you for your answer. Concerning 2), you mention:

a) “we needn’t wait for 2 ms before launching each kernel”.
Does that mean that when timing a layer/kernel with min( avg( t_00, t_01, …, t_0A ), avg( t_10, t_11, …, t_1A ), …, avg( t_M0, t_M1, …, t_MA ) ), the kernel/layer is launched M times and each launched kernel is executed A times? I’m wondering where these 2 ms come from.

b) “therefore it would not increase the engine build time”.
If I understand a) correctly, does this mean that the cost of launching a kernel is much larger than the cost of running it A times? I would expect that enlarging avgFind would also increase the build time.