This question has two parts, and I would appreciate it if you could answer both.
- Are there any examples demonstrating how to use the `setOptimizationProfileAsync` function together with `executeV2`?
  `executeV2` is synchronous (and does not require a CUDA stream), whereas `setOptimizationProfileAsync` is asynchronous and requires a CUDA stream. Is there an example of how to use the two together? Ideally I would use `setOptimizationProfile` and keep everything synchronous, but that API function is now deprecated.
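To make part 1 concrete, here is a minimal sketch of what I am attempting (the input shape `3x224x224`, the single input binding, and the function name are placeholders from my application; error checking is omitted):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: asynchronously select an optimization profile on a stream,
// wait for the switch to complete, then run the synchronous executeV2.
// Assumes `engine` was built with multiple optimization profiles and
// `bindings` is a valid array of device pointers for the chosen profile.
void runWithProfile(nvinfer1::ICudaEngine* engine,
                    nvinfer1::IExecutionContext* context,
                    void** bindings, int profileIndex, int batchSize)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Asynchronous profile switch, enqueued on the stream.
    context->setOptimizationProfileAsync(profileIndex, stream);

    // Bindings are grouped per profile, so offset the input index accordingly.
    const int bindingsPerProfile =
        engine->getNbBindings() / engine->getNbOptimizationProfiles();
    const int inputIndex = profileIndex * bindingsPerProfile;  // input assumed at slot 0
    context->setBindingDimensions(inputIndex,
                                  nvinfer1::Dims4{batchSize, 3, 224, 224});

    // executeV2 is synchronous and takes no stream, so make sure the
    // profile switch has finished before calling it.
    cudaStreamSynchronize(stream);
    context->executeV2(bindings);

    cudaStreamDestroy(stream);
}
```

Is synchronizing the stream before `executeV2` like this the intended usage, or is there a cheaper way to order the two calls?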
- In my application, I will be rapidly switching between multiple batch sizes. For example, I'll run inference with batch sizes of 10, 10, 10, 4, 1, 1, 10, 10, 4, 1, 10, 1, 1, 1, …
In a situation like the above, is it better to create a single `IExecutionContext` and change its optimization profile (using `setOptimizationProfileAsync`) each time the batch size changes? Or is it better to create three separate execution contexts, each with its own optimization profile, and dispatch each inference request to the appropriate context based on its batch size?
Which will be faster? What memory overhead is incurred by creating multiple contexts? Is there a speed penalty when switching the optimization profile on a context?
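To clarify what I mean by the second option, here is a rough sketch (the mapping of profile 0/1/2 to batch sizes 1/4/10 and the struct name are just illustrative; error checking is omitted):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <map>

// Sketch of option 2: one execution context per optimization profile,
// each bound to its profile once at setup time, so inference-time
// dispatch is just a lookup by batch size with no profile switching.
struct ProfileContexts
{
    std::map<int, nvinfer1::IExecutionContext*> byBatch;

    void build(nvinfer1::ICudaEngine* engine)
    {
        const int batchSizes[] = {1, 4, 10};  // one profile per batch size, in order
        for (int profile = 0; profile < 3; ++profile)
        {
            nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();

            // Bind the profile once, up front; each context gets a
            // distinct profile, as required.
            cudaStream_t stream;
            cudaStreamCreate(&stream);
            ctx->setOptimizationProfileAsync(profile, stream);
            cudaStreamSynchronize(stream);
            cudaStreamDestroy(stream);

            byBatch[batchSizes[profile]] = ctx;
        }
    }

    bool infer(int batchSize, void** bindings)
    {
        // Dispatch to the context whose profile matches this batch size.
        return byBatch.at(batchSize)->executeV2(bindings);
    }
};
```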
Once again, I'd appreciate it if you could answer both questions. Thank you!