Hi all, I am running the sample uff_resnet50.py to test the performance of TensorRT 5 inference using the Python API. The script only prints the predicted label; it doesn't report performance metrics such as throughput (img/sec) or latency.
To measure latency, I have modified the do_inference method as below:
```python
import time

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    tstart = time.time()                                                 # added
    context.execute_async(bindings=[int(d_input), int(d_output)],
                          stream_handle=stream.handle)
    # execute_async returns as soon as the work is enqueued, so without a
    # sync here the timer would only measure the kernel launch overhead.
    stream.synchronize()                                                 # added
    timing = time.time() - tstart                                        # added
```
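For what it's worth, here is the generic timing pattern I am converging on: a few warm-up iterations followed by an averaged loop. This is only a sketch; `time_fn` is my own helper name (not from the sample), and in the real script the timed callable would have to end with stream.synchronize() so the asynchronous execution is actually included in the measurement.

```python
import time

def time_fn(fn, iters=100, warmup=10):
    """Return the mean wall-clock latency of fn() in seconds."""
    for _ in range(warmup):      # warm-up runs are excluded from the timing
        fn()
    tstart = time.perf_counter() # perf_counter gives a monotonic, high-resolution clock
    for _ in range(iters):
        fn()
    return (time.perf_counter() - tstart) / iters
```

In the sample this would wrap the execute_async + stream.synchronize() pair.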
To calculate throughput (img/sec) as below, I need the batch size, but I don't see such a parameter in the script:
throughput (img/sec) = batch_size / timing
Could you please recommend how to extract throughput (img/sec)?
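In case it helps clarify what I am after, the computation itself is trivial once the batch size is known; the missing piece is where the sample defines it (perhaps engine.max_batch_size, or the batch_size argument of execute_async, which I believe defaults to 1 in the TRT5 Python API — please correct me if that assumption is wrong). compute_throughput below is just my own illustrative helper:

```python
def compute_throughput(batch_size, timing_s):
    """Images per second for one timed inference call."""
    if timing_s <= 0:
        raise ValueError("timing must be positive")
    return batch_size / timing_s

# e.g. a batch of 8 images that took 20 ms -> roughly 400 img/sec
print(compute_throughput(8, 0.020))
```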