How to figure out the total number of double-precision operations in cuFFT::cufftExecZ2Z on a device?

Hello, I’m working on a benchmark that compares the performance of cuFFT against a third-party FFT library. I’d like to compare the algorithms used by the internal implementations in terms of floating-point operation counts. Is there a way to do this with the ncu output metrics?

I’m not 100% sure this will get you what you need, but I’d run ncu with "--set=full --details-all" and then look at the roofline section. There you can see the number of instructions per cycle for each floating-point operation type. The cycles per usecond/nsecond are also given if you need to convert to per-second rates. Also, I believe an FMA should be weighted as 2 operations when converting to FLOPS.

Here’s an example of the output. This kernel is memory bound, so ignore the values; I’m just posting it to show what information is available:


    Section: GPU Speed Of Light Roofline Chart
    ---------------------------------------------------------------------- --------------- ------------------------------
    derived__sm__sass_thread_inst_executed_op_dfma_pred_on_x2                         inst                           5120
    derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2                         inst                          10240
    derived__smsp__sass_thread_inst_executed_op_dfma_pred_on_x2                       inst                          27.02
    derived__smsp__sass_thread_inst_executed_op_ffma_pred_on_x2                       inst                           0.89
    dram__bytes.sum.peak_sustained                                             Kbyte/cycle                           1.02
    dram__bytes.sum.per_second                                                Gbyte/second                         415.33
    dram__cycles_elapsed.avg.per_second                                      cycle/usecond                         886.93
    sm__cycles_elapsed.avg.per_second                                        cycle/nsecond                           1.26
    sm__sass_thread_inst_executed_op_dfma_pred_on.sum.peak_sustained            inst/cycle                           2560
    sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained            inst/cycle                           5120
    smsp__cycles_elapsed.avg.per_second                                      cycle/nsecond                           1.26
    smsp__sass_thread_inst_executed_op_dadd_pred_on.sum.per_cycle_elapsed       inst/cycle                           8.46
    smsp__sass_thread_inst_executed_op_dfma_pred_on.sum.per_cycle_elapsed       inst/cycle                          13.51
    smsp__sass_thread_inst_executed_op_dmul_pred_on.sum.per_cycle_elapsed       inst/cycle                           3.86
    smsp__sass_thread_inst_executed_op_fadd_pred_on.sum.per_cycle_elapsed       inst/cycle                              0
    smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed       inst/cycle                           0.45
    smsp__sass_thread_inst_executed_op_fmul_pred_on.sum.per_cycle_elapsed       inst/cycle                              0
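
To make the conversion concrete, here is a minimal Python sketch of the arithmetic described above. It uses the per-cycle values from this example output purely as an illustration. The function names are my own (they are not part of ncu), and it assumes each DADD and DMUL counts as 1 operation and each DFMA as 2.

```python
def dp_flop_per_cycle(dadd, dmul, dfma):
    """Double-precision FLOP per cycle from per-cycle instruction rates.

    Assumption: DADD and DMUL count as 1 operation each, DFMA as 2.
    """
    return dadd + dmul + 2.0 * dfma


def dp_flop_per_second(dadd, dmul, dfma, cycles_per_ns):
    """Convert the per-cycle rate to FLOP/s using a clock given in cycles/ns."""
    return dp_flop_per_cycle(dadd, dmul, dfma) * cycles_per_ns * 1e9


if __name__ == "__main__":
    # Values copied from the example output above:
    #   smsp__sass_thread_inst_executed_op_{dadd,dmul,dfma}_pred_on.sum.per_cycle_elapsed
    #   smsp__cycles_elapsed.avg.per_second (cycle/nsecond)
    rate = dp_flop_per_second(dadd=8.46, dmul=3.86, dfma=13.51, cycles_per_ns=1.26)
    print(f"{rate / 1e9:.2f} GFLOP/s")  # prints "49.57 GFLOP/s"
```

This gives a rate, not a total. For a total operation count you would multiply by the kernel duration, or (I believe) collect the plain `.sum` of the same `smsp__sass_thread_inst_executed_op_*_pred_on` metrics and apply the same weighting.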

-Mat
