Hi community members,
I am using FP16 with the cusparseSpMM() function, and I used the nvprof tool to analyze its performance:
root@xxxxx:/home/ubuntu/csr_mm_cuda# nvprof --print-gpu-trace --metrics half_precision_fu_utilization ./nvidia-spmm.out 2
==1091== NVPROF is profiling process 1091, command: ./nvidia-spmm.out 2
0.456082
warm up
0.362930
==1091== Profiling application: ./nvidia-spmm.out 2
==1091== Profiling result:
Device Context Stream Kernel half_precision_fu_utilization
Tesla V100S-PCI 1 7 void cusparse::matri Idle (0)
Tesla V100S-PCI 1 7 void cusparse::parti Idle (0)
Tesla V100S-PCI 1 7 void cusparse::csrmm Low (1)
Tesla V100S-PCI 1 7 void cusparse::matri Idle (0)
Tesla V100S-PCI 1 7 void cusparse::parti Idle (0)
Tesla V100S-PCI 1 7 void cusparse::csrmm Low (1)
However, I can’t get any Tensor Core information from this output.
Could you please tell me how to confirm whether the Tensor Cores are actually being used in cuSPARSE?