Simplify GPU Programming with NVIDIA CUDA Tile in Python

Originally published at: Simplify GPU Programming with NVIDIA CUDA Tile in Python | NVIDIA Technical Blog

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was invented. Writing GPU tile kernels enables you to write your algorithm at a higher level than a single-instruction multiple-thread (SIMT) model, while the compiler and runtime handle the partitioning of…

Any roadmap for cuTile for C++?

Hi @Osayamen, we plan to release cuTile for C++ sometime in 2026.

Isn’t this similar to Triton?

Tile-based programming seems easier to understand.
Good step for people working with AI and GPUs.
Nice that it works across future hardware.