How can I divide/segment the function so that I can measure ONLY the inference time on the GPU?

Hi

I want to measure the time of ONLY the inference on the TX2. How can I improve my function to do that? Right now I am measuring:

  • the transfer of the image from CPU to GPU

  • transfer of results from GPU to CPU

  • the inference

Or is that not possible because of the way GPUs work?

THIS IS WHERE I HAVE SET THE TIMER

THIS IS THE “do_inference” FUNCTION


Thank you

Hi @Aizzaac, you would put the timing around the context.execute() call. That is what is actually running the inferencing with TensorRT.

@dusty_nv
Can you give me an example, please?

I have the function in Inference.py and the timer in Timer.py.

I haven’t used that exact timer before, but I think you would want to add import time to the top of your Inference.py.

And then pseudocode for in your do_inference() function:

```python
# copy the input image from host (CPU) to device (GPU) -- not timed
cuda.memcpy_htod_async(...)

# time only the TensorRT execution itself
start = time.perf_counter()
context.execute(...)
inference_time = time.perf_counter() - start
print("INFERENCE TIME: %.3f ms" % (inference_time * 1000))

# copy the results from device (GPU) back to host (CPU) -- not timed
cuda.memcpy_dtoh_async(...)
```
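One caveat worth knowing: because cuda.memcpy_htod_async() only *enqueues* the copy and returns immediately, a host-side timer started right after it can still absorb the tail of that pending transfer, since the execution waits on the same stream. If you want the measurement taken on the GPU's own timeline, PyCUDA's CUDA event API is an alternative. This is only a sketch with the same elided arguments as above, not a drop-in replacement:

```python
# sketch: GPU-side timing with CUDA events (pycuda.driver.Event)
start_ev = cuda.Event()
end_ev = cuda.Event()

cuda.memcpy_htod_async(...)

start_ev.record()          # enqueued on the GPU timeline
context.execute(...)
end_ev.record()
end_ev.synchronize()       # wait until the GPU reaches end_ev

print("INFERENCE TIME: %.3f ms" % start_ev.time_till(end_ev))

cuda.memcpy_dtoh_async(...)
```

time_till() returns elapsed milliseconds between the two recorded events, measured by the GPU rather than the host clock.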
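If you end up timing several functions this way, the pattern above can be wrapped in a small helper. This is a hypothetical utility (not part of TensorRT or the thread's code), shown here as a minimal sketch:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_ms).

    Uses time.perf_counter(), a monotonic high-resolution clock,
    so the measurement is unaffected by system clock adjustments.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

You would then call it with the same arguments you pass to context.execute() today, e.g. `output, ms = timed(context.execute, ...)`. Note this still measures host-side wall-clock time, so the async-copy caveat applies just as it does to the inline version.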