CUDA 10 Features Revealed: Turing, CUDA Graphs, and More

Originally published at: https://developer.nvidia.com/blog/cuda-10-features-revealed/

For the last eleven years, NVIDIA’s CUDA development platform has unleashed the power of GPUs for general purpose processing in a wide variety of applications. These include: high performance computing (HPC), data center applications, and content creation workflows. Most recently, artificial intelligence systems and applications ranging from embedded systems to the cloud have benefited from…

The RT Core is the most exciting part of the Turing architecture to me, since BVH traversal is a big performance bottleneck for my applications. Is OptiX the only way to access this new hardware, or will there be an API for accessing the RT core directly from CUDA in the future?

Unfortunately, yes. According to the presentation at GTC 2018 in Munich, OptiX (or Vulkan/DX12) is currently the only way to make use of the RT Cores.

MPS (Multi-Process Service) has a few restrictions. One of the most mysterious is the lack of support for dynamic parallelism. Is it still prohibited on the Turing generation?

What is the L1/shared memory bandwidth per SM in Turing? Around 1 TB/s? I get about 110 MB/s per SM on a low-end Kepler GPU.