How can I prevent my customized CUDA kernel from using Tensor Cores on a Jetson Orin device?

I just want to test the performance of the Orin CUDA cores, but I find that the Tensor Cores are used every time I run my calculations. How can I avoid the Tensor Cores and use only the CUDA cores?
I used ncu to check the Tensor Core usage:


The kernel, named MatrixMulCUDA, is from https://github.com/NVIDIA/cuda-samples/blob/master/Samples/0_Introduction/matrixMul/matrixMul.cu.
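For reference, my understanding is that the Tensor Cores are only engaged by MMA-class instructions (HMMA/IMMA), which the compiler emits for wmma/mma intrinsics or for library calls such as cuBLAS with TF32 enabled; a kernel written with plain FP32 arithmetic compiles to FFMA and runs on the CUDA cores. A minimal sketch of such a kernel (naming is my own):

```cuda
// Naive FP32 matrix multiply, C = A * B, square N x N, row-major.
// Only ordinary float arithmetic is used, so the compiler emits FFMA
// (CUDA-core) instructions rather than HMMA (Tensor Core) ones.
__global__ void sgemmNaive(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}
```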

Hi,

We currently don’t have an API to disable the Tensor Cores.
Thanks.

All right, thank you!

Hi, AastaLLL

Could I ask you another question? I want to confirm my method. My hardware is a Jetson Orin, and I just want to know whether my program uses the Tensor Cores.
My Nsight Compute version is:

According to this article, I should use the command “ --csv --nvtx -f -o ./.nsight-cuprof-report --metrics sm__inst_executed_pipe_hmmafp32_sum ”, but I can’t get any report with it:

But I do get a report with the command “ --csv --nvtx -f -o ./.nsight-cuprof-report --metrics sm__inst_executed_pipe_tensor_op_hmma.sum ”; opened in Nsight Compute on Windows, it shows:

Based on the above results, the value of “sm__inst_executed_pipe_tensor_op_hmma.sum” is non-zero. Can I conclude that the program uses the Tensor Cores? If so, which instructions in the SASS use the Tensor Cores for computation? Do the two instructions “HFMA2.MMA” and “IMAD.WIDE” correspond to Tensor Core computation?
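For what it’s worth, my current understanding (please correct me if wrong) is that HMMA is the Tensor Core pipe in SASS, while HFMA2 and IMAD execute on the regular FP16 and integer pipes. A cross-check I plan to try, assuming the ncu CLI is available on the target and accepts a kernel-name filter:

```shell
# Profile only the matrixMul kernel and read the tensor-pipe instruction
# count; a nonzero sum should mean Tensor Core instructions were executed.
sudo ncu --kernel-name MatrixMulCUDA \
    --metrics sm__inst_executed_pipe_tensor_op_hmma.sum \
    ./matrixMul
```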

Thanks.

Hi,

Please check below comment:

Thanks.

Hi,

I use the command “nvcc hmma.cu -arch=sm_80 -o hmma” and then “nvprof --kernels wmma_gemm_a_col_major_b_col_major --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./hmma”, but get this result:



It seems that this method cannot be used on GPUs with a compute capability greater than 7.5.
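In case it helps others: from my reading of the nvprof-to-Nsight Compute transition guide, the old tensor_precision_fu_utilization metric roughly maps to a sm__pipe_tensor_* metric in ncu. A sketch of what I will try instead (metric name per my reading of the guide, so please double-check):

```shell
# Nsight Compute replacement for the nvprof tensor utilization metrics
# on newer GPUs, which nvprof no longer supports.
sudo ncu --kernel-name wmma_gemm_a_col_major_b_col_major \
    --metrics sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active \
    ./hmma
```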

Are there any other methods? Or will the method I talked about earlier work?

Thanks.

Hi,

Have you launched the profiler with sudo?
Please note that root privileges are required to gather the GPU trace.

Thanks.

Hi,

I used root privileges. But “nvprof --kernels wmma_gemm_a_col_major_b_col_major --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./hmma” still doesn’t work.

Thanks.


Hi,

Please try it with nsys.
For example:

$ sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./hmma

Then you can find the tensor core activity in the SM instructions/Tensor Active row.
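The whole flow can be sketched as follows (the output file name is just an example):

```shell
# 1. Profile on the Orin with GPU metrics sampling enabled.
sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile \
    --gpu-metrics-device=0 -o hmma_report ./hmma

# 2. Copy the generated hmma_report.nsys-rep to a host machine and open it
#    in the Nsight Systems GUI; Tensor Core activity appears in the
#    GPU Metrics > SM instructions > Tensor Active row of the timeline.
```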

Thanks.

Hi,
AastaLLL

I tried your method, but encountered a problem. It seems that the GPU cannot be detected?
When I use the command “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile --gpu-metrics-device=0 ./hmma”, I get this result:

With “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile --gpu-metrics-device=help ./hmma”, I get this result:

And with “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile ./hmma”, I get this result:

Thanks.

Hi,

Your tool is for SBSA servers rather than for Jetson.

Please install it via the below command:

$ sudo apt install nsight-systems-2023.2.4
$ /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys --version
NVIDIA Nsight Systems version 2023.2.4.44-33011852v0

Thanks.

Hi,
AastaLLL

Sorry for not paying attention to the version information. Due to network environment limitations, I used the 2022.1.2 version of nsys, but it still doesn’t seem to work.
When I use the command “sudo /root/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=help ./hmma”, I get this result:

And with “sudo /root/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./hmma”, I get this result:
By the way, how can I tell whether the machine architecture is Tegra (Jetson) or SBSA? The result of “uname -m” is aarch64.

Thanks.

Hi,

--gpu-metrics-device=0 might require a newer profiler.
Could you check if you can get the profiling file without using the flag?

Tegra or L4T (Linux for Tegra) packages are for the integrated GPU (onboard chip), while the SBSA package is for discrete GPUs (PCIe).
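One way to check from a shell, assuming a standard JetPack/L4T install (both platforms report aarch64 from uname -m):

```shell
# These files are present on Jetson (L4T) but not on an SBSA server:
cat /etc/nv_tegra_release     # L4T release string, e.g. "# R35 (release) ..."
cat /proc/device-tree/model   # device model, e.g. "NVIDIA Orin ..."
```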

Thanks.

Hi,
AastaLLL

It seems that I cannot find the Tensor Core activity in the SM instructions/Tensor Active row. My commands and results are as follows:
Command “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile ./hmma”
Command “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys nvprof ./hmma”

Thanks.

Hi,

Please open the file with the Nsight Systems UI tool on the desktop.
You can find it in the Timeline View.

Thanks.

Hi,
AastaLLL

I opened the generated report in Nsight Systems on Windows, but it seems that the SM row cannot be found. Is there something wrong with my settings? Or is the software version I downloaded wrong?

Thanks.

Hi,

Based on the screenshot, it looks like the .rep file was captured on a desktop rather than on the Orin.
Here are the details for the steps:

1. Generate a profile file on Orin.

$ sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./cudaTensorCoreGemm

2. Copy the file to the desktop.
We use 2023.4.1.84-234133515197v0 on Windows.

There should be an iGPU (Orin) option.
Then click iGPU (Orin) → GPU Metrics → SM Instructions → Tensor Active

Thanks.

Hi,
AastaLLL

I followed your steps and updated the nsys version to 2023.4.1, but could not find “iGPU (Orin)”.

The command I used is “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile ./hmma”.

Thanks.

Hi,

Would you mind testing the profiler with our CUDA sample:

/usr/local/cuda/samples/0_Simple/cudaTensorCoreGemm/

This can help us figure out whether the issue is from the profiler or the sample.

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.