How to find out GPU time for executing a particular block of code?

Using Python, how do I find out the time taken for neural network inference?

Thank you

Assuming your question has to do with CUDA programming, as the forum title suggests, you would use the functions described in Appendix B.11 of the Programming Guide.
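As a generic illustration (not necessarily the specific appendix functions): CUDA events are a standard device-side timing mechanism, and Numba exposes them from Python. The kernel below is a placeholder:

```python
# A sketch of timing a kernel with CUDA events via Numba's Python API.
# The kernel and sizes are placeholders, not code from this thread.
import numpy as np
from numba import cuda

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= factor

data = cuda.to_device(np.ones(1_000_000, dtype=np.float32))
threads = 256
blocks = (data.size + threads - 1) // threads

start = cuda.event(timing=True)
end = cuda.event(timing=True)

start.record()
scale[blocks, threads](data, 2.0)
end.record()
end.synchronize()  # block the host until the end event has completed
print(f"kernel time: {cuda.event_elapsed_time(start, end):.3f} ms")
```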

In ordinary Python, I would just do host-based timing.
The only caveat is to make sure that whatever CUDA processing you have launched from Python is complete before you stop the clock; the specific approach depends on exactly what you are doing. For example, Numba CUDA Python would be different from TensorFlow.

For example, in Numba CUDA Python, I would do something like this:

performance - Why is @cuda.jit python program faster than its cuda-C equivalent? - Stack Overflow
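In that spirit, here is a minimal host-based timing sketch for a Numba CUDA kernel (the kernel is a placeholder, not the code from the linked answer); the key point is the cuda.synchronize() call before stopping the clock:

```python
# Host-based timing of a Numba CUDA kernel; a placeholder example.
import time
import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] += 1.0

data = cuda.to_device(np.zeros(1_000_000, dtype=np.float32))
threads = 256
blocks = (data.size + threads - 1) // threads

add_one[blocks, threads](data)  # warm-up launch pays the JIT compile cost
cuda.synchronize()

start = time.perf_counter()
add_one[blocks, threads](data)
cuda.synchronize()  # kernel launches are async; wait before stopping the clock
print(f"kernel time: {(time.perf_counter() - start) * 1000:.3f} ms")
```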

Thank you for replying.
I am using Keras with a TF backend; in that case, how do I find out the time taken by the GPU for model inference?

I think a Google search can give you a pretty good starting point. For example, this should generally work:

How to measure execution time for prediction per image (keras) - Stack Overflow
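For concreteness, a minimal sketch along the lines of that answer, assuming `model` is your compiled Keras model and `images` is a prepared input batch:

```python
# Wall-clock timing around Keras inference; `model` and `images` are
# placeholders for your own model and preprocessed input batch.
import time

model.predict(images)  # warm-up: the first call pays one-time startup costs

start = time.perf_counter()
predictions = model.predict(images)
elapsed = time.perf_counter() - start
print(f"inference: {elapsed * 1000:.3f} ms for {len(images)} images")
```

Because `predict` returns host-side NumPy arrays, it does not return until the GPU work is finished, so no explicit synchronization is needed here.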

Would this give me the GPU time (and not the CPU time)?

It will give you the time it takes to execute a particular block of code, including any GPU activity that code launches, which seemed to be what you were asking.

You might wish to explore GPU profilers. They are documented here:

https://docs.nvidia.com/cuda/profiler-users-guide/index.html

I am using a transformer-based LLM for inference, and I want to record the GPU time and memory usage. NVIDIA Nsight Systems is not working for me; it has just complicated things…

I think that is probably the best choice for what you have described. If you want to try to get it working, you can ask questions about Nsight Systems on the Nsight Systems forum.
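If a full profiler remains impractical, a lighter-weight option for the memory side is querying NVML from Python. A sketch, assuming the nvidia-ml-py (pynvml) package is installed; the device index 0 is an assumption:

```python
# Query current GPU memory usage through NVML; requires nvidia-ml-py.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust as needed
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU memory used: {info.used / 2**20:.0f} MiB of {info.total / 2**20:.0f} MiB")
pynvml.nvmlShutdown()
```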