Description
Is there no method in the TensorRT Python API for setting a particular DLA core for inference?
Environment
TensorRT Version: 7.1.3.4
GPU Type: Jetson Xavier NX
Nvidia Driver Version: JetPack 4.4
CUDA Version: 10.2
CUDNN Version: 8.0
Python Version (if applicable): 3.6
Baremetal or Container (if container which image + tag): baremetal
Steps To Reproduce
According to the official documentation, there are TensorRT C++ API functions for checking whether DLA cores are available, as well as for setting a particular DLA core for inference. However, there seem to be no such functions in the Python API.
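For reference, I do see DLA options at build time in the Python API, on the builder and builder config. A rough sketch of what I mean (the network/parser setup is omitted and the variable names are just placeholders):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Build-time DLA settings exposed by the Python API
print(builder.num_DLA_cores)                    # number of DLA cores on the device
config.default_device_type = trt.DeviceType.DLA # place supported layers on DLA
config.DLA_core = 0                             # which DLA core to target at build time
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)   # fall back to GPU for unsupported layers
config.set_flag(trt.BuilderFlag.FP16)           # DLA requires FP16 or INT8
config.max_workspace_size = 1 << 28

# network definition / ONNX parsing omitted
# engine = builder.build_engine(network, config)

My question is about the equivalent at deserialization/inference time.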
I tried the following with python3 on Jetson Xavier NX (TensorRT 7.1.3.4):
>>> import tensorrt as trt
>>> logger = trt.Logger(trt.Logger.VERBOSE)
>>> runtime = trt.Runtime(logger)
>>> dir(runtime)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'deserialize_cuda_engine', 'gpu_allocator']
>>> with open('yolov3-dla0-608.trt', 'rb') as f:
... engine = runtime.deserialize_cuda_engine(f.read())
...
[TensorRT] VERBOSE: Deserialize required 900134 microseconds.
>>> dir(engine)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', 'binding_is_input', 'create_execution_context', 'create_execution_context_without_device_memory', 'device_memory_size', 'get_binding_bytes_per_component', 'get_binding_components_per_element', 'get_binding_dtype', 'get_binding_format', 'get_binding_format_desc', 'get_binding_index', 'get_binding_name', 'get_binding_shape', 'get_binding_vectorized_dim', 'get_location', 'get_profile_shape', 'get_profile_shape_input', 'has_implicit_batch_dimension', 'is_execution_binding', 'is_shape_binding', 'max_batch_size', 'max_workspace_size', 'name', 'num_bindings', 'num_layers', 'num_optimization_profiles', 'refittable', 'serialize']
>>> context = engine.create_execution_context()
>>> dir(context)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'active_optimization_profile', 'all_binding_shapes_specified', 'all_shape_inputs_specified', 'debug_sync', 'device_memory', 'engine', 'execute', 'execute_async', 'execute_async_v2', 'execute_v2', 'get_binding_shape', 'get_shape', 'get_strides', 'name', 'profiler', 'set_binding_shape', 'set_shape_input']
>>>
So none of the “tensorrt.Runtime”, “tensorrt.ICudaEngine”, or “tensorrt.IExecutionContext” classes seems to provide any API for selecting a DLA core for inference with a deserialized TensorRT engine. How do I make sure the deserialized engine is running on a DLA core, and how do I choose DLA core #1 vs. #0?
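In other words, I was expecting something analogous to the C++ IRuntime::setDLACore(), roughly like the following (hypothetical; as the dir(runtime) output above shows, no such attribute exists in this version):

runtime = trt.Runtime(logger)
runtime.DLA_core = 1   # hypothetical: select DLA core #1 before deserializing the engine
with open('yolov3-dla0-608.trt', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())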