How can I divide/segment the function so that I can measure ONLY the inference time on the GPU?

Hi

I want to measure the time of ONLY the inference on the TX2. How can I improve my function to do that? Right now I am measuring:

  • the transfer of the image from CPU to GPU

  • transfer of results from GPU to CPU

  • the inference

Or is that not possible because of the way GPUs work?

THIS IS WHERE I HAVE SET THE TIMER

THIS IS THE “do_inference” FUNCTION


Thank you

Hi @Aizzaac, you would put the timing around the context.execute() call. That is what is actually running the inferencing with TensorRT.

@dusty_nv
Can you give me an example, please?

I have the function in Inference.py and the timer in Timer.py.

I haven’t used that exact timer before, but I think you would want to add import time to the top of your Inference.py.

And then pseudocode for in your do_inference() function:

```python
# copy the input image from host (CPU) to device (GPU) -- not timed
cuda.memcpy_htod_async(...)

# time only the TensorRT execution itself
start = time.perf_counter()
context.execute(...)
inference_time = time.perf_counter() - start
print("INFERENCE TIME: %.3f ms" % (inference_time * 1000))

# copy the results from device (GPU) back to host (CPU) -- not timed
cuda.memcpy_dtoh_async(...)
```
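One caveat worth knowing: because cuda.memcpy_htod_async() only *enqueues* the copy and returns immediately, a host-side timer started right after it can still absorb the tail of that pending transfer, since the execution waits on the same stream. If you want the measurement taken on the GPU's own timeline, PyCUDA's CUDA event API is an alternative. This is only a sketch with the same elided arguments as above, not a drop-in replacement:

```python
# sketch: GPU-side timing with CUDA events (pycuda.driver.Event)
start_ev = cuda.Event()
end_ev = cuda.Event()

cuda.memcpy_htod_async(...)

start_ev.record()          # enqueued on the GPU timeline
context.execute(...)
end_ev.record()
end_ev.synchronize()       # wait until the GPU reaches end_ev

print("INFERENCE TIME: %.3f ms" % start_ev.time_till(end_ev))

cuda.memcpy_dtoh_async(...)
```

time_till() returns elapsed milliseconds between the two recorded events, measured by the GPU rather than the host clock.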
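If you end up timing several functions this way, the pattern above can be wrapped in a small helper. This is a hypothetical utility (not part of TensorRT or the thread's code), shown here as a minimal sketch:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_ms).

    Uses time.perf_counter(), a monotonic high-resolution clock,
    so the measurement is unaffected by system clock adjustments.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

You would then call it with the same arguments you pass to context.execute() today, e.g. `output, ms = timed(context.execute, ...)`. Note this still measures host-side wall-clock time, so the async-copy caveat applies just as it does to the inline version.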