Advanced API Performance: Async Compute and Overlap

Originally published at: Advanced API Performance: Async Compute and Overlap | NVIDIA Developer Blog

This post covers best practices for async compute and overlap on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. The general principle behind async compute is to increase the overall unit throughput by reducing the number of unused warp slots and to use nonconflicting…

Writing this blog has been very insightful. Finding overlap opportunities across different datapaths is my personal favorite, perhaps because they are the most challenging to find.

I would like to underscore the importance of GPU Trace, which displays a large amount of valuable performance data in a meaningful, organized way. It's an absolute enabler for visualizing the performance gaps that can lead to improvement opportunities such as async compute.

If you have any questions or you want to share your experience on this topic, please feel free to reply!