Programming Tensor Cores in CUDA 9

Originally published at: https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/

Tensor cores provide a huge boost to convolutions and matrix operations. Tensor cores are programmable using NVIDIA libraries and directly in CUDA C++ code. A defining feature of the new Volta GPU Architecture is its Tensor Cores, which give the Tesla V100 accelerator a peak throughput 12 times the 32-bit floating point throughput of the previous-generation…

Tensor Cores: will they also be available in the next gaming GPUs (GeForce)?

That's a good question!

Interesting possibilities for cryptocurrencies using Tensor Cores...

Which NN framework did you use for the inference results in Figure 4?

This will really help the 3D porn industry, oh, and cancer research, possibly...
When is the "Titan W" coming out? You know, two Titan V's in one card?
Are you waiting for "Windows 4K" to be made?

I concur, and I think I even know the theoretical solution: for each instance of the CUDA algorithm's line calculation, it gets stored in matrix A until 16 instances are filled, then stored into B, where their computation is multiplied, which ought to lower the net computation time by 4096 per Tensor Core utilized... I've already seen this applied with the CryptoNight algorithm in terms of hash calculation; however, it was insufficient to generate a nonce. Create the nonce with that version of CryptoNight and you revolutionize cryptomining!

10 months later, the answer is yes.

Extremely helpful. Thanks as always :)

Question: Do the Tensor cores run concurrently with the CUDA cores? If I were to have my deep learning model cranking away on TCs, could I simultaneously be rendering high quality graphics?

Nice blog, but it misses the most important statement: "Tensor Cores require that the tensors be in NHWC data layout." So if NCHW is given, it transposes the data to NHWC.
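If you control the data pipeline, you can give cuDNN NHWC tensors directly and avoid that implicit transpose. A minimal descriptor-setup sketch (cuDNN 7-era API; the helper name and dimensions are mine, not from the post):

```cpp
#include <cudnn.h>

// Describe an FP16 activation tensor directly in NHWC so the Tensor Core
// convolution path does not need an internal NCHW -> NHWC transpose.
cudnnTensorDescriptor_t makeNhwcHalfDesc(int n, int c, int h, int w)
{
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc,
                               CUDNN_TENSOR_NHWC, // layout
                               CUDNN_DATA_HALF,   // FP16 data for Tensor Cores
                               n, c, h, w);
    return desc;
}
```

The convolution descriptor still needs cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH) to opt in; whether skipping the transpose is actually faster depends on the cuDNN version, so profile both layouts.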

Can you provide information about the relative performance of the V100 vs. the RTX 2080 Ti, or the V100 vs. the RTX 2080?
Thank you

"GEMMs that do not satisfy the above rules will fall back to a non-Tensor Core implementation."

This sounds like a silent failure, and a really bad thing.
I assume (hope) there's some function to assert or check that the Tensor Cores are actually being used?
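There is no hard error on fallback, but you can at least opt in explicitly and then confirm with the profiler. A minimal cuBLAS sketch (the wrapper function name is mine; CUBLAS_GEMM_DFALT_TENSOR_OP is the CUDA 9-era spelling, newer toolkits also accept CUBLAS_GEMM_DEFAULT_TENSOR_OP):

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Request Tensor Core math for a mixed-precision GEMM (FP16 inputs, FP32
// accumulation). Per the rules quoted above, m, n, k and the leading
// dimensions should be multiples of 8, or cuBLAS falls back to ordinary kernels.
cublasStatus_t tensorCoreGemm(cublasHandle_t handle, int m, int n, int k,
                              const __half* A, const __half* B, float* C)
{
    // Opt this handle in to Tensor Core (tensor op) math.
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        m, n, k,
                        &alpha,
                        A, CUDA_R_16F, m,   // lda = m (column-major)
                        B, CUDA_R_16F, k,   // ldb = k
                        &beta,
                        C, CUDA_R_32F, m,   // ldc = m
                        CUDA_R_32F,
                        CUBLAS_GEMM_DFALT_TENSOR_OP);
}
```

To verify the Tensor Cores actually ran rather than a fallback kernel, profiling is the practical check: on Volta, nvprof/Nsight report a tensor-core utilization metric, and the Tensor Core GEMM kernels have "884" in their names (e.g. volta_h884gemm_*). Treat those names as from memory, not gospel.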

I think there is a technical error in this image: https://developer-blogs.nvidia.com/wp-content/uploads/2017/12/tensor_cube_white-624x934.png. There should be 16 green layers instead of 12. Since a Tensor Core only performs multiplication on 4x4 data, 16 4x4 arrays will be generated as a result.

I have a little question about the Tensor Core code example in the blog.
Since the thread block dim3 is (128, 4), namely 16 warps, I think the loop over k should use a stride of 4×WMMA_K, which would optimize performance.
Would changing the loop stride like this affect the correctness of the result?
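Not the post's exact code, but here is a minimal sketch of the kind of per-warp k-loop the WMMA example uses (WMMA_K, warpM, warpN and the matrix layouts are my assumptions). Each warp accumulates the full dot product for its own 16x16 output tile, and the 16 warps in the (128, 4) block cover different output tiles rather than different k-slices, so the stride has to stay WMMA_K; bumping it to 4×WMMA_K would skip three quarters of the K dimension and give a wrong result.

```cpp
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Tile shape supported by the Volta WMMA API for FP16 inputs / FP32 accumulation.
constexpr int WMMA_M = 16, WMMA_N = 16, WMMA_K = 16;

// One warp computes one 16x16 tile of C = A * B.
// A is row-major M x K, B is col-major K x N, C is row-major M x N.
// (warpM, warpN) are the tile coordinates assigned to this warp.
__device__ void warp_tile_gemm(const half* A, const half* B, float* C,
                               int M, int N, int K, int warpM, int warpN)
{
    wmma::fragment<wmma::matrix_a, WMMA_M, WMMA_N, WMMA_K, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, WMMA_M, WMMA_N, WMMA_K, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, WMMA_M, WMMA_N, WMMA_K, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // The stride must stay WMMA_K: every 16-wide slice of K contributes to this
    // warp's accumulator. A stride of 4*WMMA_K would silently drop terms.
    for (int k = 0; k < K; k += WMMA_K) {
        wmma::load_matrix_sync(a_frag, A + warpM * WMMA_M * K + k, K); // row-major, ld = K
        wmma::load_matrix_sync(b_frag, B + warpN * WMMA_N * K + k, K); // col-major, ld = K
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }

    wmma::store_matrix_sync(C + warpM * WMMA_M * N + warpN * WMMA_N,
                            c_frag, N, wmma::mem_row_major);
}
```

If you want each warp to do more work, the usual direction is to give it several accumulator fragments (multiple output tiles per warp) rather than to change the k stride.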