Performance metrics of Native TensorRT 5.0 using Python API

Hi all, I am running the sample to test the performance of TRT5 inference using the Python API. The script only predicts the label but doesn't report performance metrics such as throughput (img/sec) and latency.

To calculate the latency I have modified the do_inference method as below:

import time

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    tstart = time.time()
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    timing = time.time() - tstart

To calculate the throughput (img/sec) as below, I need the batch_size, but I don't see such a parameter in the script:

throughput (img/sec) = batch_size / timing

Could you please recommend how to extract throughput (img/sec)?

Hello, because execute_async is asynchronous, I'm not sure you can assume the execution time is simply the interval before and after the execute_async() call; the call returns before the GPU work completes. You'd need to instrument timing around the callback/handle.

Other than that, yes, I agree you can calculate img/sec = batch_size / timing.
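To make "instrument timing around the callback/handle" concrete, here is a generic sketch of the pattern in plain Python. The stream object is a stand-in (so it runs without a GPU); its synchronize() method mirrors the PyCUDA Stream API, and the sleep-based "kernel" and batch size are made up for illustration. The key point is that the clock only stops after synchronize() returns:

```python
import threading
import time

class FakeStream:
    """Stand-in for a pycuda Stream: launched work runs in the background."""
    def __init__(self):
        self._threads = []

    def launch(self, fn):
        # Returns immediately, like execute_async(); work runs asynchronously.
        t = threading.Thread(target=fn)
        t.start()
        self._threads.append(t)

    def synchronize(self):
        # Block until all queued work has completed.
        for t in self._threads:
            t.join()

def timed_inference(stream, kernel, batch_size):
    tstart = time.time()
    stream.launch(kernel)       # returns before the work finishes
    stream.synchronize()        # wait for completion BEFORE stopping the clock
    elapsed = time.time() - tstart
    return elapsed, batch_size / elapsed

# Simulate a 50 ms "inference" over a batch of 8 images.
stream = FakeStream()
elapsed, throughput = timed_inference(stream, lambda: time.sleep(0.05), batch_size=8)
```

If you stopped the timer right after launch(), you would measure only the (near-zero) call overhead, which is the trap with timing execute_async() directly.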

Hi NVES, the script doesn't expose a batch-size parameter, so I can't calculate the throughput as stated above. What do you recommend?

Please reference

The batch size is passed to execute_async().
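For reference, in the TensorRT 5 Python API the batch size is an argument to IExecutionContext.execute_async(). A fragment (variable names taken from the sample above, not runnable on its own):

```python
# batch_size must be <= the builder.max_batch_size the engine was built with
context.execute_async(batch_size=batch_size,
                      bindings=[int(d_input), int(d_output)],
                      stream_handle=stream.handle)
```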

Hi NVES, I have handled the parameter batch size with

builder.max_batch_size = batch_size

On the other hand, I didn't find anything on how to instrument timing around the callback/handle with execute_async(). Could you please provide more specific instructions on how to implement it?

There is no simple or standard solution for measuring an asynchronous call's performance. I don't think this is specific to execute_async().

You may want to research a generic solution, such as:
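One generic pattern: time an asynchronous call by blocking on its completion handle before stopping the clock. A runnable illustration using Python's concurrent.futures (not TensorRT; the task and timings are made up for demonstration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task():
    # Stand-in for asynchronous work, e.g. a GPU inference.
    time.sleep(0.05)
    return "done"

with ThreadPoolExecutor(max_workers=1) as pool:
    tstart = time.time()
    future = pool.submit(slow_task)            # returns immediately
    submit_elapsed = time.time() - tstart      # near zero: work hasn't finished
    result = future.result()                   # block until the task completes
    total_elapsed = time.time() - tstart       # this is the real latency
```

The analogue with a CUDA stream is to call the stream's synchronize (or record/synchronize events) before reading the stop time, rather than timing the launch call itself.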