TensorRT 5 - Python profiler

Attaching the default Python profiler to the execution context works fine:

context.profiler = tensorrt.Profiler()

But this only prints the profiling information (the time spent in each layer) to the console. Is there a way to retrieve the per-layer computation times programmatically? I tried to write my own profiler by subclassing tensorrt.Profiler, but report_layer_time() does not seem to be called at all:

class CustomProfiler(trt.Profiler):
    def __init__(self, name):
        self.name = name
        self.layers = {}

    def report_layer_time(self, layer_name: str, ms: float):
        print('Report layer {} = {}'.format(layer_name, ms))
        self.layers[layer_name] = ms

# In the execution context
context.profiler = CustomProfiler('custom')

The attribute self.layers remains empty after execution, and the print statement in report_layer_time() is never reached. Yet the per-layer computation times are still printed to the console after execution, as if the default tensorrt.Profiler were in use.

Any idea how to get the computation time for each layer and then aggregate the times to get the total execution time? The TensorRT 5 samples only show an example in C++, and the documentation is rather limited: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Core/Profiler.html
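For what it's worth, the aggregation itself is only a few lines once report_layer_time() actually fires. A pure-Python sketch (no TensorRT needed; the layer names and millisecond values below are made up for illustration):

```python
from collections import defaultdict

# Made-up (layer_name, ms) pairs standing in for what TensorRT would
# pass to report_layer_time() during context execution.
reported = [("conv1", 0.5), ("relu1", 0.25), ("fc1", 0.125)]

layers = defaultdict(float)
for layer_name, ms in reported:
    layers[layer_name] += ms  # accumulate, in case a layer reports more than once

# Total execution time is just the sum over all layers.
total_ms = sum(layers.values())
print(total_ms)  # 0.875
```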


Provide your own class that derives from IProfiler; inside it you can populate a std::map or std::vector with the layer names and timings.


NVIDIA Enterprise Support

Thank you for your answer.

I’m looking for a Python implementation. I saw a C++ example in the samples (sampleNMT.cpp) and tried to reproduce it in Python. Here is the implementation in sampleNMT:

struct SimpleProfiler : public nvinfer1::IProfiler
{
    struct Record
    {
        float time{0};
        int count{0};
    };

    virtual void reportLayerTime(const char* layerName, float ms)
    {
        mProfile[layerName].count++;
        mProfile[layerName].time += ms;
    }

    SimpleProfiler(
        const char* name,
        const std::vector<SimpleProfiler>& srcProfilers = std::vector<SimpleProfiler>())
        : mName(name)
    {
        // (merging of srcProfilers elided in this excerpt)
    }

    std::string mName;
    std::map<std::string, Record> mProfile;
};

// In main()
std::vector<SimpleProfiler> profilers;
if (gEnableProfiling)
    profilers.push_back(SimpleProfiler("Beam shuffle"));
encoderContext->execute(...); // Does this automatically call profilers[1].reportLayerTime() and populate profilers[1].mProfile?

Does the call to encoderContext->execute(...) automatically invoke profilers[1].reportLayerTime() and populate profilers[1].mProfile? If that is the case, I don't see why my Python implementation is not working: after executing the context, the dictionary self.layers remains empty.

I solved this by using pybind11 to build my own binding library:

#include <pybind11/pybind11.h>
#include <pybind11/functional.h> // lets std::function accept Python callables
#include <NvInfer.h>

class PyProfiler : public nvinfer1::IProfiler
{
public:
    using reportLayerTime_t = std::function<void(std::string, float)>;
    PyProfiler(reportLayerTime_t cbf) : mCBReportLayerTime(cbf) {}
    virtual void reportLayerTime(const char* layerName, float ms) override
    {
        mCBReportLayerTime(std::string(layerName), ms);
    }
private:
    reportLayerTime_t mCBReportLayerTime;
};

PYBIND11_MODULE(sample, m)
{
    namespace py = pybind11;
    py::class_<PyProfiler, std::shared_ptr<PyProfiler>, nvinfer1::IProfiler>(m, "PyProfiler")
        .def(py::init<PyProfiler::reportLayerTime_t>());
}


from sample import PyProfiler

class Profiler(PyProfiler):
    def __init__(self):
        super(Profiler, self).__init__(self.report_layer_time)

    def report_layer_time(self, name, time):
        raise NotImplementedError

class SimpleProfiler(Profiler):
    def report_layer_time(self, name, time):
        print(name, time)
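Assuming the binding above forwards each reportLayerTime() call to the Python callback, a subclass can aggregate the timings and sum them afterwards. Since the compiled sample module needs TensorRT to run, this sketch substitutes a plain-Python stand-in for PyProfiler and drives it with fabricated timings; only the aggregation logic is meant to carry over:

```python
from collections import defaultdict

class PyProfilerStub:
    """Plain-Python stand-in for the pybind11 PyProfiler: it just stores
    the callback that the C++ side would invoke once per layer."""
    def __init__(self, callback):
        self._callback = callback

    def _simulate_layer(self, name, ms):
        # In the real binding, TensorRT calls reportLayerTime() from C++,
        # which forwards to the stored callback; here we call it by hand.
        self._callback(name, ms)

class AggregatingProfiler(PyProfilerStub):
    """Collects per-layer times so they can be summed after execution."""
    def __init__(self):
        super().__init__(self.report_layer_time)
        self.layers = defaultdict(float)

    def report_layer_time(self, name, ms):
        self.layers[name] += ms

    def total_ms(self):
        return sum(self.layers.values())

profiler = AggregatingProfiler()
# Fabricated timings standing in for an actual engine execution:
for name, ms in [("conv1", 0.5), ("relu1", 0.25), ("conv1", 0.75)]:
    profiler._simulate_layer(name, ms)
print(sorted(profiler.layers.items()))  # [('conv1', 1.25), ('relu1', 0.25)]
print(profiler.total_ms())              # 1.5
```

With the real binding, the same AggregatingProfiler (deriving from the bound PyProfiler instead of the stub) would be assigned to context.profiler before execution.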