Some inconsistencies in the CUDA documentation?

Hi all!

On the page Programming Guide :: CUDA Toolkit Documentation,
the example uses src_shared, which is never defined. I assume they meant src_global?

And on the page Kernel Profiling Guide :: Nsight Compute Documentation,
it states:

fp16 pipeline: […] It also contains a fast FP32-to-FP16 and FP16-to-FP32 converter. Starting with GA10x chips, this functionality is part of the FMA pipeline.
alu pipeline: […] On NVIDIA Ampere architecture chips, the ALU pipeline performs fast FP32-to-FP16 conversion.

My understanding is that GA10x chips are NVIDIA Ampere architecture chips. So which pipeline does the FP32-to-FP16 conversion: the FMA pipeline, the ALU pipeline, or the FP16 pipeline? (Edit: I have submitted this question to the profiler forum, here.)
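One way to check this empirically on a specific GPU is a minimal kernel that does little besides FP32-to-FP16-to-FP32 round trips, profiled with Nsight Compute to see which pipe is reported as busy. This is only a sketch under stated assumptions: the kernel and helper names are hypothetical, it assumes a CUDA-capable GPU and nvcc, and the measurement shows what one chip and compiler version do; it does not settle what the documentation intends.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread converts one float to FP16 and back.
__global__ void fp32_fp16_roundtrip(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half h = __float2half(in[i]);  // FP32 -> FP16 conversion
        out[i] = __half2float(h);        // FP16 -> FP32 conversion
    }
}

// Hypothetical helper: runs the kernel on n values and returns how many values
// did not survive the round trip, or -1 on a CUDA error.
int count_roundtrip_mismatches(int n)
{
    float *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) { cudaFree(in); return -1; }

    // Multiples of 0.5 in [0, 511.5] are exactly representable in FP16.
    for (int i = 0; i < n; ++i) in[i] = 0.5f * (i % 1024);

    const int threads = 256;
    fp32_fp16_roundtrip<<<(n + threads - 1) / threads, threads>>>(in, out, n);

    // Check launch errors first, then errors raised during execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int mismatches = 0;
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        mismatches = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (out[i] != in[i]) ++mismatches;
    }
    cudaFree(in);
    cudaFree(out);
    return mismatches;
}

int main()
{
    int mismatches = count_roundtrip_mismatches(1 << 20);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Compile for the chip in question (for example `nvcc -arch=sm_86 -o roundtrip roundtrip.cu` for a GA102-class part), run `ncu --set full ./roundtrip` and look at the pipe utilization in the Compute Workload Analysis section, and use `cuobjdump -sass roundtrip` to see which conversion instructions the compiler actually emitted.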

Yes, I agree, I recommend filing a bug.

For questions specific to the profiler or the profiler documentation, I recommend asking on the relevant profiler forum; here is the Nsight Compute forum.
