Can INT32 and FP64 instructions be executed concurrently on the Volta architecture?

GV100 has several math pipelines:
FMA pipe - executes FP32 instructions and IMAD (integer multiply and add)
ALU pipe - executes INT32 (not IMAD), logical operations, binary operations, and data movement operations
FP64 pipe - executes FP64 instructions
FP16 pipe - executes FP16x2 instructions
Tensor pipe - executes matrix multiply and accumulate instructions

The FP64, FP16, and Tensor pipes share a single dispatch port, so you cannot dispatch to more than one of these pipes in the same cycle.

The FMA and ALU pipelines each have their own dispatch port. Dispatching a warp to either of these pipes takes 2 cycles, since each pipe is 16 lanes wide and a warp is 32 threads.

Concurrent execution is done by alternating instruction dispatch to different pipes.
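As a minimal sketch of this (the kernel and names below are hypothetical, not from the post), a kernel that mixes double-precision FMAs with 32-bit logical/shift work gives the warp scheduler independent instructions it can alternate between the FP64 pipe and the ALU pipe:

```cuda
#include <cstdio>

// Hypothetical kernel: each thread does FP64 work (FP64 pipe) and
// INT32 logical/shift work (ALU pipe, not IMAD), so the scheduler
// can alternate dispatch between the two pipes.
__global__ void mixed_pipes(double *d, int *i, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        d[idx] = d[idx] * 1.5 + 2.0;          // FP64 pipe: DFMA
        i[idx] = (i[idx] ^ 0x5A5A5A5A) >> 1;  // ALU pipe: XOR + shift
    }
}

int main()
{
    const int n = 256;
    double *d; int *i;
    cudaMallocManaged(&d, n * sizeof(double));
    cudaMallocManaged(&i, n * sizeof(int));
    for (int k = 0; k < n; ++k) { d[k] = (double)k; i[k] = k; }

    mixed_pipes<<<1, n>>>(d, i, n);
    cudaDeviceSynchronize();

    printf("d[2]=%f i[2]=%d\n", d[2], i[2]);
    cudaFree(d); cudaFree(i);
    return 0;
}
```

Whether the two streams actually overlap depends on the compiled SASS and scheduling; the pipeline utilization metrics mentioned below are the way to confirm it.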

On GV100 INT64 math is implemented by various units including:

  • FMA pipe - IMAD instruction
  • ALU pipe - LEA instruction
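To illustrate why 64-bit integer math spans multiple units, a 64-bit add can be decomposed into two 32-bit adds with carry. The sketch below shows the idea in CUDA C++ (this is an illustration written by hand, not the exact SASS sequence the compiler emits):

```cuda
#include <cstdio>

// Sketch: a 64-bit add expressed as two 32-bit operations with a
// carry between them, similar in spirit to the two-instruction
// sequence a 64-bit IADD lowers to.
__device__ unsigned long long add64_via_32(unsigned long long a,
                                           unsigned long long b)
{
    unsigned int alo = (unsigned int)a, ahi = (unsigned int)(a >> 32);
    unsigned int blo = (unsigned int)b, bhi = (unsigned int)(b >> 32);
    unsigned int lo    = alo + blo;          // low 32-bit add
    unsigned int carry = (lo < alo);         // carry-out of the low add
    unsigned int hi    = ahi + bhi + carry;  // high add with carry-in
    return ((unsigned long long)hi << 32) | lo;
}

__global__ void check(unsigned long long *out)
{
    // Crosses the 32-bit boundary, exercising the carry path.
    *out = add64_via_32(0xFFFFFFFFULL, 1ULL);
}

int main()
{
    unsigned long long *out;
    cudaMallocManaged(&out, sizeof(*out));
    check<<<1, 1>>>(out);
    cudaDeviceSynchronize();
    printf("0x%llx\n", *out);  // 0x100000000
    cudaFree(out);
    return 0;
}
```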

The answer to your question is therefore not straightforward: it depends on which pipe the particular integer instructions map to.

Pipeline utilization metrics for GV100 are available in CUDA >= 10.1 tools. Nsight Compute 2019.1 adds pipeline utilization to the Compute Workload Analysis section.
