Releasing an optimization profile

I am using multiple TensorRT engines in my inference pipeline. For each of those engines, I create an execution context once, at initialization. All of these engines have multiple optimization profiles linked to different batch sizes (each engine has 8 optimization profiles). These profiles have to be switched depending on the variable input batch size. While reusing a particular profile in subsequent runs, I get the following error when setting context.active_optimization_profile -

[TensorRT] ERROR: Profile 3 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.

According to the documentation-

When multiple execution contexts run concurrently, it is allowed to switch to a profile which was formerly used but already released by another execution context with different dynamic input dimensions.

How do I release the optimization profile from a context? I cannot afford to destroy and recreate the context after every batch, as engine.create_execution_context() is slow and incurs a lot of overhead.
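For reference, this is roughly how each engine and its context are created at startup (a simplified sketch; model.trt stands in for my actual engine file and buffer allocation is omitted) -

import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a prebuilt engine (built with 8 optimization profiles)
with open("model.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Created once, up front; recreating this per batch is the overhead I want to avoid
context = engine.create_execution_context()
stream = cuda.Stream()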

Following is my inference code-

def do_inference(context, bindings, inputs, outputs, stream):
    # Copy input data from host to GPU
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Copy outputs back from GPU to host
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    return [out.host for out in outputs]

Following is my profile switching code that calls the above do_inference method -

# Switch optimization profile depending upon the batch size
context.active_optimization_profile = dynamic_batch_size - 1
# Copy images into the host buffer for this profile
inputs[dynamic_batch_size - 1].host = images
# Do inference
output = do_inference(context, bindings, inputs, outputs, stream)
# Parse the output for this profile
output = output[dynamic_batch_size - 1]

Any help would be appreciated.

Hi @jasdeepchhabra94,
I believe the link below should help you:
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/ExecutionContext.html

An IExecutionContext holds shared resources, so if you want parallel execution, you have to create a separate IExecutionContext for each CUDA stream.
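For example, a rough sketch (untested, and reusing the engine, buffers, and do_inference function from your post; the contexts/streams lists and the infer helper are just illustrative names): create one execution context per optimization profile once at startup, each paired with its own CUDA stream, and then pick the matching context per batch instead of reassigning profiles on a single shared context -

import pycuda.driver as cuda

# One context (and one stream) per optimization profile, created once at startup
contexts, streams = [], []
for profile_idx in range(engine.num_optimization_profiles):
    ctx = engine.create_execution_context()
    # A profile can only be claimed by one context at a time,
    # so bind each profile to its own context up front
    ctx.active_optimization_profile = profile_idx
    contexts.append(ctx)
    streams.append(cuda.Stream())

def infer(dynamic_batch_size, bindings, inputs, outputs):
    # Select the pre-built context (and its stream) for this batch size
    idx = dynamic_batch_size - 1
    return do_inference(contexts[idx], bindings, inputs, outputs, streams[idx])

Also, if you are on a newer TensorRT release, IExecutionContext.set_optimization_profile_async() is the supported way to select a profile on a context.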

Thanks!