I want to benchmark only the CUDA cores on Orin, but my calculations end up using the Tensor Cores every time. How can I avoid the Tensor Cores and run only on the CUDA cores?
I used ncu to measure Tensor Core usage:
Could I ask you another question? I want to confirm my method. My hardware is a Jetson Orin, and I just want to know whether my program uses the Tensor Cores.
My Nsight Compute version is:
According to this article, I should use the command “ --csv --nvtx -f -o ./.nsight-cuprof-report --metrics sm__inst_executed_pipe_hmmafp32_sum ”, but I can’t get any report with it:
However, I do get a report with the command “ --csv --nvtx -f -o ./.nsight-cuprof-report --metrics sm__inst_executed_pipe_tensor_op_hmma.sum ”. Opened in Nsight Compute on Windows, the report shows:
Based on the above results, the value of “sm__inst_executed_pipe_tensor_op_hmma.sum” is non-zero. Can I conclude that the program uses the Tensor Cores? If so, which SASS instructions use the Tensor Cores for computation? Do the two instructions “HFMA2.MMA” and “IMAD.WIDE” correspond to Tensor Core computation?
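One rough way to cross-check the metric against the SASS is to count opcodes in a disassembly dump. The sketch below assumes the SASS was saved as text in the usual `cuobjdump -sass` layout (an address comment such as `/*0090*/` followed by the instruction); the function names `count_opcodes` and `mma_instruction_count` are hypothetical. It only looks for the base opcodes HMMA, IMMA, DMMA and BMMA, which are the matrix multiply-accumulate instructions in the Ampere-class instruction tables. It reports “HFMA2.MMA” under its base opcode HFMA2 and “IMAD.WIDE” under IMAD (integer multiply-add); it does not settle which hardware pipe executes HFMA2.MMA.

```python
import re
from collections import Counter

# Base opcodes of the matrix multiply-accumulate instructions (assumption:
# an Ampere-class target such as sm_80 / sm_87).
MMA_OPCODES = {"HMMA", "IMMA", "DMMA", "BMMA"}

# An instruction line starts with an address comment, then an optional
# predicate such as @P0 or @!P0, then the opcode and its dotted modifiers.
_INSTR = re.compile(
    r"^\s*/\*[0-9a-fA-F]+\*/\s+(?:@!?\w+\s+)?([A-Z][A-Z0-9_]*)((?:\.\w+)*)"
)


def count_opcodes(sass_text):
    """Count base opcodes (modifiers stripped) in a SASS text dump."""
    counts = Counter()
    for line in sass_text.splitlines():
        match = _INSTR.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def mma_instruction_count(sass_text):
    """Return how many instructions have a matrix multiply-accumulate opcode."""
    counts = count_opcodes(sass_text)
    return sum(counts[op] for op in MMA_OPCODES)
```

For example, after saving the disassembly to a file, `mma_instruction_count(open("hmma.sass").read())` (file name hypothetical) returns zero when none of these opcodes appear in the dump. This is a static count, so it says nothing about whether those instructions actually executed.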
I used the commands “nvcc hmma.cu -arch=sm_80 -o hmma” and “nvprof --kernels wmma_gemm_a_col_major_b_col_major --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./hmma”, but got this result:
I ran it with root privileges, but “nvprof --kernels wmma_gemm_a_col_major_b_col_major --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./hmma” still doesn’t work.
I tried your method, but encountered a problem. It seems that the GPU cannot be detected?
When I used the command “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile --gpu-metrics-device=0 ./hmma”, I got this result:
With the command “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile --gpu-metrics-device=help ./hmma”, I got this result:
With the command “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile ./hmma”, I got this result:
Sorry for not paying attention to the version information. Due to network environment limitations, I used the 2022.1.1 version of nsys. But it still doesn’t seem to work.
When I used the command “sudo /root/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=help ./hmma”, I got this result:
With the command “sudo /root/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./hmma”, I got this result:
By the way, how can I tell whether the machine architecture is Tegra (Jetson) or SBSA? The output of “uname -m” is aarch64.
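Since “uname -m” reports aarch64 on both kinds of system, one possible check is to look for files that a Jetson Linux (L4T) install provides. The sketch below is a heuristic under stated assumptions, not an official detection method: it treats the presence of `/etc/nv_tegra_release`, or the word "Jetson" in `/proc/device-tree/model`, as a sign of a Tegra system. The function name `detect_platform` and its `root` and `machine` parameters are hypothetical and exist only to make the check easy to try out.

```python
import os
import platform


def detect_platform(root="/", machine=None):
    """Guess whether this is a Tegra (Jetson) system or another arm64 machine.

    Heuristic only: relies on files that Jetson Linux (L4T) normally provides.
    """
    machine = machine or platform.machine()
    if machine != "aarch64":
        return "not-arm64"

    # L4T installs ship a release file at this path.
    if os.path.exists(os.path.join(root, "etc", "nv_tegra_release")):
        return "tegra"

    # The device-tree model string usually names the board; it is
    # NUL-terminated, so read bytes and decode leniently.
    model_path = os.path.join(root, "proc", "device-tree", "model")
    try:
        with open(model_path, "rb") as handle:
            model = handle.read().decode("utf-8", errors="ignore")
    except OSError:
        model = ""
    if "Jetson" in model:
        return "tegra"

    return "sbsa-or-other"
```

Calling `detect_platform()` with no arguments inspects the running system. A "tegra" result suggests the `target-linux-tegra-armv8` build of nsys is the one to use rather than `target-linux-sbsa-armv8`.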
It seems I cannot find the Tensor Core activity in the SM instructions/Tensor Active row. My command and result are as follows:
Command “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile ./hmma”
I opened the generated report in Nsight Systems on Windows, but I cannot find the SM row. Is there something wrong with my settings, or did I download the wrong software version?