Some inconsistencies in the CUDA documentation?

spraesi · October 5, 2022, 4:37pm

Hi all!

On the page Programming Guide :: CUDA Toolkit Documentation,
the example uses src_shared, which is not defined. I assume they meant src_global?

And on the page: Kernel Profiling Guide :: Nsight Compute Documentation
it is stated:

fp16 pipeline: […] It also contains a fast FP32-to-FP16 and FP16-to-FP32 converter. Starting with GA10x chips, this functionality is part of the FMA pipeline.
alu pipeline: […] On NVIDIA Ampere architecture chips, the ALU pipeline performs fast FP32-to-FP16 conversion.

My understanding is that GA10x chips are NVIDIA Ampere architecture chips. So which pipeline performs the FP32-to-FP16 conversion: the FMA pipeline, the ALU pipeline, or the FP16 pipeline? (Edit: I submitted this question to the profiler forum, here.)

Robert_Crovella · October 5, 2022, 5:38pm

Yes, I agree. I recommend filing a bug.

For questions specific to the profiler or its documentation, I recommend asking on the relevant profiler forum; here is the Nsight Compute forum.

