TensorRT V10 inference using context.execute_async_v3()

There are many examples of inference using context.execute_async_v2().
However, v2 has been deprecated and there are no examples anywhere using context.execute_async_v3(…).

The TensorRT developer page says to: Specify buffers for inputs and outputs with "context.set_tensor_address(name, ptr)"

The API has "context.set_input_shape(name, tuple(input_batch.shape))" and "set_output_allocator()", but after days of mucking around I have got nowhere.

Can someone please provide an example or suggestion?


First, you have to set the input shape:

tensor_name = engine.get_tensor_name(0) # input tensor
context.set_input_shape(tensor_name, input_shape) # use your input_shape
assert context.all_binding_shapes_specified

Then set up input and output buffers (I use numpy arrays as input and output):

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

d_input = cuda.mem_alloc(int(np.prod(input_shape)) * np.dtype(np.float32).itemsize)
d_output = cuda.mem_alloc(int(np.prod(output_shape)) * np.dtype(np.float32).itemsize)
context.set_tensor_address(engine.get_tensor_name(0), int(d_input))  # input buffer
context.set_tensor_address(engine.get_tensor_name(1), int(d_output))  # output buffer
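The byte counts passed to mem_alloc are just element count times element size; you can sanity-check them with plain NumPy, no GPU needed (the shape below is an example, substitute your own):

```python
import numpy as np

# Example input shape -- substitute your model's actual shape.
input_shape = (1, 3, 224, 224)

# Element count times element size: the byte count handed to mem_alloc.
nbytes = int(np.prod(input_shape)) * np.dtype(np.float32).itemsize

# Cross-check: a host array of the same shape/dtype reports the same size.
assert nbytes == np.empty(input_shape, dtype=np.float32).nbytes
print(nbytes)
```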

Then you can run inference:

cuda.memcpy_htod_async(d_input, input_data, stream)  # copy input data to the device
context.execute_async_v3(stream.handle)  # v3: the stream handle is the only argument
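Putting the pieces together, here is a minimal end-to-end sketch of the whole flow. The engine filename, input shape, and float32 dtype are assumptions for illustration; adjust them for your model:

```python
# Hedged end-to-end sketch for TensorRT 10 inference with execute_async_v3().
# Assumes a serialized engine "model.engine" with one float32 input (index 0)
# and one float32 output (index 1) -- adapt names, shapes, and dtypes as needed.
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:  # hypothetical engine path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
stream = cuda.Stream()

input_name = engine.get_tensor_name(0)
output_name = engine.get_tensor_name(1)

# Example input; substitute your real data and shape.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
context.set_input_shape(input_name, input_data.shape)

# Once the input shape is set, the output shape is known.
output_shape = tuple(context.get_tensor_shape(output_name))
output_data = np.empty(output_shape, dtype=np.float32)

# Allocate device buffers and register their addresses with the context.
d_input = cuda.mem_alloc(input_data.nbytes)
d_output = cuda.mem_alloc(output_data.nbytes)
context.set_tensor_address(input_name, int(d_input))
context.set_tensor_address(output_name, int(d_output))

# Copy in, run, copy out, wait.
cuda.memcpy_htod_async(d_input, input_data, stream)
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(output_data, d_output, stream)
stream.synchronize()
```

After stream.synchronize() returns, output_data holds the inference result on the host.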