Advanced Strategies for High-Performance GPU Programming with NVIDIA CUDA

Originally published at: https://developer.nvidia.com/blog/advanced-strategies-for-high-performance-gpu-programming-with-nvidia-cuda/

Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping applications onto massively parallel machines. Going beyond the basics to explore the intricacies of GPU programming, he focuses on practical techniques such as parallel program design and specific details of GPU…

Thank you for the video, but I got lost when you talked about running the kernel in reversed order to improve cache efficiency, because I thought we couldn't control the execution order.

For example, if we launch kernel<<<3, 3>>>(), do we know which thread block gets executed first? I thought we don't; isn't the order effectively random?