TensorRT 5 - Python profiler

Attaching the default Python profiler to the execution context works fine:

context.profiler = tensorrt.Profiler()

But this only prints the profiling information (the time spent in each layer) to the console. Is there a way to retrieve the per-layer computation times programmatically? I tried to write my own profiler by subclassing tensorrt.Profiler, but report_layer_time() does not seem to be called at all:

class CustomProfiler(trt.Profiler):
    def __init__(self, name):
        self.name = name
        self.layers = {}

    def report_layer_time(self, layer_name: str, ms: float):
        print('Report layer {} = {}'.format(layer_name, ms))
        self.layers[layer_name] = ms

# In the execution context
context.profiler = CustomProfiler('custom')

The attribute self.layers remains empty after execution, and the print statement in report_layer_time() is never reached. Yet the per-layer computation times are still printed to the console after execution, as if the default tensorrt.Profiler were in use.

Any idea how to get the computation time for each layer and then aggregate the times to get the total execution time? The TensorRT 5 samples only show an example in C++, and the documentation is rather limited: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Core/Profiler.html
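For what it's worth, the aggregation itself is only a few lines once report_layer_time() actually fires. A pure-Python sketch (no TensorRT needed; the layer names and millisecond values below are made up for illustration):

```python
from collections import defaultdict

# Made-up (layer_name, ms) pairs standing in for what TensorRT would
# pass to report_layer_time() during context execution.
reported = [("conv1", 0.5), ("relu1", 0.25), ("fc1", 0.125)]

layers = defaultdict(float)
for layer_name, ms in reported:
    layers[layer_name] += ms  # accumulate, in case a layer reports more than once

# Total execution time is just the sum over all layers.
total_ms = sum(layers.values())
print(total_ms)  # 0.875
```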


Provide your own class that derives from IProfiler; inside it you can populate a std::map or std::vector with the layer names and timings.


NVIDIA Enterprise Support

Thank you for your answer.

I’m looking for a Python implementation. I saw a C++ example in the samples (sampleNMT.cpp) and tried to reproduce it in Python. Here is the implementation in sampleNMT:

struct SimpleProfiler : public nvinfer1::IProfiler
{
    struct Record
    {
        float time{0};
        int count{0};
    };

    virtual void reportLayerTime(const char* layerName, float ms)
    {
        mProfile[layerName].count++;
        mProfile[layerName].time += ms;
    }

    SimpleProfiler(
        const char* name,
        const std::vector<SimpleProfiler>& srcProfilers = std::vector<SimpleProfiler>())
        : mName(name)
    {
        // (merging of srcProfilers elided in this excerpt)
    }

    std::string mName;
    std::map<std::string, Record> mProfile;
};

// In main()
std::vector<SimpleProfiler> profilers;
if (gEnableProfiling)
    profilers.push_back(SimpleProfiler("Beam shuffle"));
encoderContext->execute(...); // Does this automatically call profilers[1].reportLayerTime() and populate profilers[1].mProfile?

Does the call to encoderContext->execute(...) automatically invoke profilers[1].reportLayerTime() and populate profilers[1].mProfile? If that is the case, I don't see why my Python implementation is not working: after executing the context, the dictionary self.layers remains empty.

I solved this by using pybind11 to build my own binding library:

#include <pybind11/pybind11.h>
#include <pybind11/functional.h> // lets std::function accept Python callables
#include <NvInfer.h>

class PyProfiler : public nvinfer1::IProfiler
{
public:
    using reportLayerTime_t = std::function<void(std::string, float)>;
    PyProfiler(reportLayerTime_t cbf) : mCBReportLayerTime(cbf) {}
    virtual void reportLayerTime(const char* layerName, float ms) override
    {
        mCBReportLayerTime(std::string(layerName), ms);
    }
private:
    reportLayerTime_t mCBReportLayerTime;
};

PYBIND11_MODULE(sample, m)
{
    namespace py = pybind11;
    py::class_<PyProfiler, std::shared_ptr<PyProfiler>, nvinfer1::IProfiler>(m, "PyProfiler")
        .def(py::init<PyProfiler::reportLayerTime_t>());
}


from sample import PyProfiler

class Profiler(PyProfiler):
    def __init__(self):
        super(Profiler, self).__init__(self.report_layer_time)

    def report_layer_time(self, name, time):
        raise NotImplementedError

class SimpleProfiler(Profiler):
    def report_layer_time(self, name, time):
        print(name, time)
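Assuming the binding above forwards each reportLayerTime() call to the Python callback, a subclass can aggregate the timings and sum them afterwards. Since the compiled sample module needs TensorRT to run, this sketch substitutes a plain-Python stand-in for PyProfiler and drives it with fabricated timings; only the aggregation logic is meant to carry over:

```python
from collections import defaultdict

class PyProfilerStub:
    """Plain-Python stand-in for the pybind11 PyProfiler: it just stores
    the callback that the C++ side would invoke once per layer."""
    def __init__(self, callback):
        self._callback = callback

    def _simulate_layer(self, name, ms):
        # In the real binding, TensorRT calls reportLayerTime() from C++,
        # which forwards to the stored callback; here we call it by hand.
        self._callback(name, ms)

class AggregatingProfiler(PyProfilerStub):
    """Collects per-layer times so they can be summed after execution."""
    def __init__(self):
        super().__init__(self.report_layer_time)
        self.layers = defaultdict(float)

    def report_layer_time(self, name, ms):
        self.layers[name] += ms

    def total_ms(self):
        return sum(self.layers.values())

profiler = AggregatingProfiler()
# Fabricated timings standing in for an actual engine execution:
for name, ms in [("conv1", 0.5), ("relu1", 0.25), ("conv1", 0.75)]:
    profiler._simulate_layer(name, ms)
print(sorted(profiler.layers.items()))  # [('conv1', 1.25), ('relu1', 0.25)]
print(profiler.total_ms())              # 1.5
```

With the real binding, the same AggregatingProfiler (deriving from the bound PyProfiler instead of the stub) would be assigned to context.profiler before execution.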