No method in TensorRT Python API for setting DLA core for inference


Is there really no method in the TensorRT Python API for setting a particular DLA core for inference?


TensorRT Version:
GPU Type: Jetson Xavier NX
Nvidia Driver Version: JetPack-4.4
CUDA Version: 10.2
CUDNN Version: 8.0
Python Version (if applicable): 3.6
Baremetal or Container (if container which image + tag): baremetal

Steps To Reproduce

According to the official documentation, the TensorRT C++ API has functions for checking whether DLA cores are available, as well as for setting a particular DLA core for inference. However, there seem to be no such functions in the Python API.

I tried the following with python3 on Jetson Xavier NX (TensorRT

>>> import tensorrt as trt
>>> logger = trt.Logger(trt.Logger.VERBOSE)
>>> runtime = trt.Runtime(logger)
>>> dir(runtime)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'deserialize_cuda_engine', 'gpu_allocator']
>>> with open('yolov3-dla0-608.trt', 'rb') as f:
...     engine = runtime.deserialize_cuda_engine(f.read())
[TensorRT] VERBOSE: Deserialize required 900134 microseconds.
>>> dir(engine)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', 'binding_is_input', 'create_execution_context', 'create_execution_context_without_device_memory', 'device_memory_size', 'get_binding_bytes_per_component', 'get_binding_components_per_element', 'get_binding_dtype', 'get_binding_format', 'get_binding_format_desc', 'get_binding_index', 'get_binding_name', 'get_binding_shape', 'get_binding_vectorized_dim', 'get_location', 'get_profile_shape', 'get_profile_shape_input', 'has_implicit_batch_dimension', 'is_execution_binding', 'is_shape_binding', 'max_batch_size', 'max_workspace_size', 'name', 'num_bindings', 'num_layers', 'num_optimization_profiles', 'refittable', 'serialize']
>>> context = engine.create_execution_context()
>>> dir(context)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'active_optimization_profile', 'all_binding_shapes_specified', 'all_shape_inputs_specified', 'debug_sync', 'device_memory', 'engine', 'execute', 'execute_async', 'execute_async_v2', 'execute_v2', 'get_binding_shape', 'get_shape', 'get_strides', 'name', 'profiler', 'set_binding_shape', 'set_shape_input']

So none of the “tensorrt.Runtime”, “tensorrt.ICudaEngine” or “tensorrt.IExecutionContext” classes provides any API for setting the DLA core for inference on a deserialized TensorRT engine. How do I make sure the deserialized TensorRT engine is running on a DLA core, and on DLA core #1 rather than #0?

Hi @jkjung13,

You can set the DLA core in IBuilderConfig.

Check if the layer can run on DLA:
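The snippets from the original reply are not preserved here. As a rough illustration of the build-time approach it describes, the following is a minimal sketch, assuming the `IBuilderConfig` members `default_device_type`, `DLA_core`, `set_flag` and `can_run_on_DLA` from the TensorRT Python bindings. The helper name `configure_dla` is hypothetical, and the `tensorrt` module is passed in as `trt` so the function can be read and exercised without a Jetson at hand.

```python
def configure_dla(trt, builder, network, dla_core=0):
    """Return (config, unsupported_layer_names) for a DLA-targeted build.

    `trt` is the imported tensorrt module; `builder` and `network` are the
    usual trt.Builder and INetworkDefinition objects. Hypothetical helper.
    """
    config = builder.create_builder_config()

    # Ask TensorRT to place layers on the DLA by default, on the chosen core.
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core

    # DLA runs FP16 or INT8 only; FP16 is assumed here.
    config.set_flag(trt.BuilderFlag.FP16)
    # Let layers the DLA cannot handle fall back to the GPU.
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

    # Collect the names of layers that cannot run on the DLA.
    unsupported = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if not config.can_run_on_DLA(layer):
            unsupported.append(layer.name)
    return config, unsupported
```

On a device this would be called as `configure_dla(tensorrt, builder, network, dla_core=1)` before building the engine. Note that this only affects engines built with that config; it does not change where an already serialized engine runs.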


@SunilJB When deserializing a TensorRT engine from a file, we don’t create a Builder or BuilderConfig, do we??


I think what you are looking for is a Python equivalent of the command below.
Set the DLA engine to execute on:


But this is not available yet. You can use IBuilderConfig to set the DLA core when using the Python API.


This does not help at inference time… (We don’t build TensorRT engines on the deployed products. We only use serialized engine.)

Could you consider adding the Python API in the next TensorRT release? Thanks.


Same opinion here, please add it