I’m trying to find a working example of a cuTile kernel in C++. So far I have only run into hypothetical versions of what it would look like (including in NVIDIA presentations), where the tile kernel is defined with _tile_global_, _tile_, etc. It was promised for CUDA 13.2, but I recently watched Bryce Lelbach’s presentation about cuTile in C++ (GTC 2026), and it says delivery is now planned for CUDA 13.3. Are there any examples I can look at?
CUDA Tile C++ is now available in the CUDA 13.3 toolkit. Here are some resources for getting started:
Examples: cuda-samples/cpp/9_CUDA_Tile (NVIDIA/cuda-samples on GitHub, master branch)
Programming Guide: section 2.4, "Writing Tile Kernels", in the CUDA Programming Guide
Blog: "Develop High-Performance GPU Kernels in C++ with NVIDIA CUDA Tile" (NVIDIA Technical Blog)
API Reference: CUDA Tile C++ API Reference, 13.3 documentation
Tile Gym: TileGym/src/tilegym/ops/tilecpp (NVIDIA/TileGym on GitHub, main branch)