This question has two parts, and I would appreciate it if you could answer both.
**Part 1:** Are there any examples demonstrating how to use `executeV2` together with `setOptimizationProfileAsync`? `executeV2` is synchronous (and does not require a CUDA stream), whereas `setOptimizationProfileAsync` is asynchronous and requires a CUDA stream. Is there an example of how to combine the two? I'd ideally use `setOptimizationProfile` and keep everything synchronous, but that API function is now deprecated.
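To make the question concrete, here is the kind of thing I imagine (an untested sketch; it assumes a TensorRT 8.x engine built with one optimization profile per batch size, and that binding index 0 is the input — adjust for your engine). The open question is whether a plain `cudaStreamSynchronize` between the async profile switch and the synchronous execute is the intended pattern:

```cpp
// Sketch only, not verified against a real engine.
#include <NvInfer.h>
#include <cuda_runtime_api.h>

bool runWithProfile(nvinfer1::IExecutionContext* context,
                    int profileIndex,            // profile matching this batch size
                    nvinfer1::Dims inputDims,    // actual input shape for this batch
                    void* const* bindings,
                    cudaStream_t stream)
{
    // The profile switch is asynchronous and enqueued on the stream...
    if (!context->setOptimizationProfileAsync(profileIndex, stream))
        return false;
    // ...so wait for it to finish before making any synchronous calls.
    cudaStreamSynchronize(stream);

    // Set the concrete input shape for this batch.
    // (Binding index 0 assumed to be the input here.)
    if (!context->setBindingDimensions(0, inputDims))
        return false;

    // executeV2 is synchronous, so no stream is passed.
    return context->executeV2(bindings);
}
```

Is this synchronize-then-execute pattern correct, or is there a cheaper way to order the two calls?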
**Part 2:** In my application, I will be rapidly switching between multiple batch sizes. For example, I'll run inference with batch sizes of 10, 10, 10, 4, 1, 1, 10, 10, 4, 1, 10, 1, 1, 1, … (three distinct sizes: 1, 4, and 10, each with its own optimization profile).
In a situation like the above, is it better to create a single `IExecutionContext` and change the optimization profile (using `setOptimizationProfileAsync`) each time the batch size changes?
Or is it better to create three separate execution contexts, each with its own optimization profile? Based on the batch size, I'd dispatch each inference request to the appropriate execution context.
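The multi-context alternative I have in mind looks roughly like this (again an untested sketch; `ProfileContexts` and its members are hypothetical names of mine, not TensorRT API, and binding index 0 is assumed to be the input). The idea is that the async profile switch happens once at setup, so the per-inference path stays fully synchronous:

```cpp
// Sketch only, not verified against a real engine.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <initializer_list>
#include <map>

struct ProfileContexts
{
    // batch size -> execution context bound to the matching profile
    std::map<int, nvinfer1::IExecutionContext*> byBatch;

    // Create one context per profile up front; each context must use a
    // distinct profile index.
    void build(nvinfer1::ICudaEngine* engine,
               std::initializer_list<int> batchSizes,
               cudaStream_t stream)
    {
        int profile = 0;
        for (int bs : batchSizes)
        {
            nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
            // Pay the asynchronous profile-switch cost once, at setup time.
            ctx->setOptimizationProfileAsync(profile++, stream);
            byBatch[bs] = ctx;
        }
        cudaStreamSynchronize(stream);
    }

    // Per-inference path: no profile switching, purely synchronous.
    bool infer(int batchSize, nvinfer1::Dims inputDims, void* const* bindings)
    {
        nvinfer1::IExecutionContext* ctx = byBatch.at(batchSize);
        ctx->setBindingDimensions(0, inputDims); // input binding 0 assumed
        return ctx->executeV2(bindings);
    }
};
```

Does this correctly avoid the profile-switch cost at inference time, or does it just trade it for memory overhead?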
Which approach will be faster? What memory overhead is incurred by creating multiple contexts? Is there a speed penalty when switching the optimization profile on a single context?
Once again, I'd appreciate it if you could answer both parts. Thank you!