Separate CUDA Core pipeline for FP16 and FP32?

On GA100 (SM8.0)

  • Shared pipe handles Tensor, FP16, and FP64 operations.
  • FMA pipe handles IMAD, IDP, and FP32 operations.
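To make the GA100 split concrete, here is a minimal sketch that groups an instruction mix by the pipe that would execute it. It encodes only the assignments listed above. The operation-class labels ("fp16", "imad", ...) and the helper name contending_groups are hypothetical and are not SASS opcodes or profiler metric names.

```python
# Sketch of the GA100 (SM8.0) pipe assignment listed above.
# Assumption: hypothetical operation-class labels, not real SASS opcodes.
GA100_PIPES = {
    "tensor": "shared",
    "fp16": "shared",
    "fp64": "shared",
    "imad": "fma",
    "idp": "fma",
    "fp32": "fma",
}

def contending_groups(op_classes):
    """Group operation classes by the pipe that executes them on GA100."""
    groups = {}
    for op in op_classes:
        groups.setdefault(GA100_PIPES[op], []).append(op)
    return groups

print(contending_groups(["fp32", "fp16", "imad", "tensor"]))
# {'fma': ['fp32', 'imad'], 'shared': ['fp16', 'tensor']}
```

On GA100, FP16 competes with Tensor and FP64 work for the shared pipe, while FP32 competes with IMAD/IDP for the FMA pipe.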

On GA10x (SM8.6)

  • First chip family with 2x FP32 throughput per SM (128 FP32 lanes per SM versus 64 on GA100).
  • Shared pipe handles Tensor operations.
  • FMAheavy pipe handles IMAD, IDP, and FP32 operations.
  • FMAlite pipe handles FP32 operations.
  • FP16x2 operations are dual-issued to both the FMAheavy and FMAlite pipes.
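The "2x FP32" claim follows from having two FP32-capable pipes instead of one. A back-of-the-envelope sketch, assuming 4 sub-partitions per SM and 16 lanes per FP32-capable pipe (figures taken from NVIDIA's Ampere whitepapers, not from this post); the helper name is hypothetical:

```python
# Peak FP32 lanes per SM per clock, counting one FMA as one result.
# Assumptions: 4 SM sub-partitions, 16 lanes per FP32-capable pipe.
SUB_PARTITIONS_PER_SM = 4
LANES_PER_PIPE = 16

def fp32_lanes_per_sm(fp32_pipes):
    """FP32 results per SM per clock if every FP32-capable pipe is busy."""
    return SUB_PARTITIONS_PER_SM * LANES_PER_PIPE * len(fp32_pipes)

ga100 = fp32_lanes_per_sm(["fma"])                  # 64
ga10x = fp32_lanes_per_sm(["fmaheavy", "fmalite"])  # 128
print(ga100, ga10x, ga10x // ga100)                 # 64 128 2
```

This is a peak figure only. Because FMAheavy also executes IMAD and IDP, integer multiplies and dot products take issue slots away from FP32 on GA10x, so sustained FP32 throughput can fall short of 2x.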

On GH100 (SM9.0)

  • Shared pipe handles Tensor and FP64 operations.
  • FMA pipes and FP16x2 handling are the same as on GA10x.
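To answer the title question per architecture, the sketch below encodes only the FP16 and FP32 pipe assignments listed above and checks whether the two ever share a pipe. The dictionary keys and pipe labels are hypothetical names chosen for illustration.

```python
# Which pipes execute FP32 and FP16 work, per the lists above.
# On GA10x and GH100 the FP16 entry is FP16x2, dual-issued to both FMA pipes.
PIPES = {
    "GA100 (SM8.0)": {"fp32": {"fma"}, "fp16": {"shared"}},
    "GA10x (SM8.6)": {"fp32": {"fmaheavy", "fmalite"}, "fp16": {"fmaheavy", "fmalite"}},
    "GH100 (SM9.0)": {"fp32": {"fmaheavy", "fmalite"}, "fp16": {"fmaheavy", "fmalite"}},
}

def fp16_separate_from_fp32(arch):
    """True if FP16 and FP32 run on disjoint pipes for this architecture."""
    pipes = PIPES[arch]
    return pipes["fp32"].isdisjoint(pipes["fp16"])

for arch in PIPES:
    print(arch, fp16_separate_from_fp32(arch))
# GA100 (SM8.0) True
# GA10x (SM8.6) False
# GH100 (SM9.0) False
```

So only GA100 has a separate (shared) pipe for FP16; on GA10x and GH100, FP16x2 and FP32 compete for the same FMAheavy/FMAlite pipes.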