Exploring the New Features of CUDA 11.3

Originally published at: Exploring the New Features of CUDA 11.3 | NVIDIA Technical Blog

CUDA is the software development platform for building GPU-accelerated applications, providing all the components you need to develop applications that use NVIDIA GPUs. CUDA is ideal for diverse workloads, from high performance computing and data science analytics to AI applications. The latest release, CUDA 11.3, and its features are focused on enhancing the programming model and…

CUDA 11.3 significantly improves the performance of Ampere/Turing/Volta Tensor Core kernels.

298 TFLOPS was recorded on A100 when benchmarking FP16 GEMM from CUTLASS, an open source CUDA DL/HPC library (GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines). That is 14% higher than with CUDA 11.2. FP32 (via TF32) GEMM improves by 39% and can reach 143 TFLOPS. The same speedup applies to the CONV kernels.
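For anyone who wants to sanity-check numbers like these, here is a minimal sketch of the arithmetic, assuming the usual convention of counting 2*M*N*K floating-point operations per GEMM. The function names and the 1000x1000x1000 problem size are hypothetical, chosen only for illustration; the post does not state the problem sizes that were benchmarked. The implied CUDA 11.2 baselines are back-calculated from the quoted percentages, not separately measured.

```python
def gemm_tflops(m, n, k, seconds):
    """Throughput in TFLOPS for one M x N x K GEMM that took `seconds`."""
    # Each of the M*N outputs needs K multiplies and K adds: 2*M*N*K ops.
    flops = 2.0 * m * n * k
    return flops / seconds / 1e12


def implied_baseline(new_tflops, gain_percent):
    """Back out the earlier throughput from a quoted percentage gain."""
    return new_tflops / (1.0 + gain_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical timing: a 1000 x 1000 x 1000 GEMM finishing in 2 ms.
    print(gemm_tflops(1000, 1000, 1000, 0.002))

    # Baselines implied by the figures quoted above (derived, not measured).
    print(round(implied_baseline(298.0, 14.0), 1))  # FP16 GEMM
    print(round(implied_baseline(143.0, 39.0), 1))  # FP32 via TF32 GEMM
```

If the quoted gains are taken at face value, the second and third lines suggest roughly 261 TFLOPS (FP16) and 103 TFLOPS (TF32) for the same kernels under CUDA 11.2.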

Also, see the discussion here: CUDA 11.3 significantly improved the performance of CUTLASS · Discussion #241 · NVIDIA/cutlass · GitHub

The metrics and performance of CUDA 11.3 made it easy for me to buy a GeForce RTX 3070 for my new desktop… But your website is too confusing for a seamless installation of the drivers, toolkit, and cuDNN… I have done a cleanup twenty times and have yet to see a single instance of my model training using the GPU…