How can I prevent my customized CUDA kernel from using Tensor Cores on a Jetson Orin device?

I just want to test the performance of the Orin CUDA cores, but I find that the Tensor Cores are used every time I run my calculations. How can I avoid the Tensor Cores and use only the CUDA cores?
I used ncu to check the Tensor Core usage:


The kernel, named MatrixMulCUDA, is from https://github.com/NVIDIA/cuda-samples/blob/master/Samples/0_Introduction/matrixMul/matrixMul.cu.
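For reference, my understanding is that the Tensor Cores are only engaged by MMA-class instructions (HMMA/IMMA), which the compiler emits for wmma/mma intrinsics or for library calls such as cuBLAS with TF32 enabled; a kernel written with plain FP32 arithmetic compiles to FFMA and runs on the CUDA cores. A minimal sketch of such a kernel (naming is my own):

```cuda
// Naive FP32 matrix multiply, C = A * B, square N x N, row-major.
// Only ordinary float arithmetic is used, so the compiler emits FFMA
// (CUDA-core) instructions rather than HMMA (Tensor Core) ones.
__global__ void sgemmNaive(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}
```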

Hi,

We currently don’t have an API to disable the Tensor Cores.
Thanks.

All right, thank you!

Hi, AastaLLL

Could I ask you another question? I want to confirm my method. My hardware is a Jetson Orin, and I just want to know whether my program uses the Tensor Cores.
My Nsight Compute version is:

According to this article, I should use the command “ --csv --nvtx -f -o ./.nsight-cuprof-report --metrics sm__inst_executed_pipe_hmmafp32_sum ”, but I can’t get any report with it:

But I do get a report with the command “ --csv --nvtx -f -o ./.nsight-cuprof-report --metrics sm__inst_executed_pipe_tensor_op_hmma.sum ”; opened in Nsight Compute on Windows, it shows:

Based on the above results, the value of “sm__inst_executed_pipe_tensor_op_hmma.sum” is non-zero. Can I conclude that the program uses the Tensor Cores? If so, which instructions in the SASS use the Tensor Cores for computation? Do the two instructions “HFMA2.MMA” and “IMAD.WIDE” correspond to Tensor Core computation?
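For what it’s worth, my current understanding (please correct me if wrong) is that HMMA is the Tensor Core pipe in SASS, while HFMA2 and IMAD execute on the regular FP16 and integer pipes. A cross-check I plan to try, assuming the ncu CLI is available on the target and accepts a kernel-name filter:

```shell
# Profile only the matrixMul kernel and read the tensor-pipe instruction
# count; a nonzero sum should mean Tensor Core instructions were executed.
sudo ncu --kernel-name MatrixMulCUDA \
    --metrics sm__inst_executed_pipe_tensor_op_hmma.sum \
    ./matrixMul
```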

Thanks.

Hi,

Please check below comment:

Thanks.

Hi,

I use the command “nvcc hmma.cu -arch=sm_80 -o hmma” and then “nvprof --kernels wmma_gemm_a_col_major_b_col_major --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./hmma”, but get this result:



It seems that this method cannot be used on GPUs with a compute capability greater than 7.5.
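In case it helps others: from my reading of the nvprof-to-Nsight Compute transition guide, the old tensor_precision_fu_utilization metric roughly maps to a sm__pipe_tensor_* metric in ncu. A sketch of what I will try instead (metric name per my reading of the guide, so please double-check):

```shell
# Nsight Compute replacement for the nvprof tensor utilization metrics
# on newer GPUs, which nvprof no longer supports.
sudo ncu --kernel-name wmma_gemm_a_col_major_b_col_major \
    --metrics sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active \
    ./hmma
```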

Are there any other methods? Or will the method I talked about earlier work?

Thanks.

Hi,

Have you launched the profiler with sudo?
Please note that root privileges are required to gather the GPU trace.

Thanks.

Hi,

I used root privileges. But “nvprof --kernels wmma_gemm_a_col_major_b_col_major --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization ./hmma” still doesn’t work.

Thanks.


Hi,

Please try it with nsys.
For example:

$ sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./hmma

Then you can find the tensor core activity in the SM instructions/Tensor Active row.
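The whole flow can be sketched as follows (the output file name is just an example):

```shell
# 1. Profile on the Orin with GPU metrics sampling enabled.
sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile \
    --gpu-metrics-device=0 -o hmma_report ./hmma

# 2. Copy the generated hmma_report.nsys-rep to a host machine and open it
#    in the Nsight Systems GUI; Tensor Core activity appears in the
#    GPU Metrics > SM instructions > Tensor Active row of the timeline.
```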

Thanks.

Hi,
AastaLLL

I tried your method, but encountered a problem. It seems that the GPU cannot be detected?
When I use the command “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile --gpu-metrics-device=0 ./hmma”, I get this result:

With “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile --gpu-metrics-device=help ./hmma”, I get this result:

And with “sudo /opt/nvidia/nsight-systems/2023.4.1/target-linux-sbsa-armv8/nsys profile ./hmma”, I get this result:

Thanks.

Hi,

Your tool is for SBSA servers rather than for Jetson.

Please install it via the below command:

$ sudo apt install nsight-systems-2023.2.4
$ /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys --version
NVIDIA Nsight Systems version 2023.2.4.44-33011852v0

Thanks.

Hi,
AastaLLL

Sorry for not paying attention to the version information. Due to network environment limitations, I used the 2022.1.2 version of nsys, but it still doesn’t seem to work.
When I use the command “sudo /root/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=help ./hmma”, I get this result:

And with “sudo /root/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./hmma”, I get this result:
By the way, how can I tell whether the machine architecture is Tegra (Jetson) or SBSA? The result of “uname -m” is aarch64.

Thanks.

Hi,

--gpu-metrics-device=0 might require a newer profiler.
Could you check if you can get the profiling file without using the flag?

Tegra or L4T (Linux for Tegra) packages are for the integrated GPU (onboard chip), while the SBSA package is for discrete GPUs (PCIe).
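One way to check from a shell, assuming a standard JetPack/L4T install (both platforms report aarch64 from uname -m):

```shell
# These files are present on Jetson (L4T) but not on an SBSA server:
cat /etc/nv_tegra_release     # L4T release string, e.g. "# R35 (release) ..."
cat /proc/device-tree/model   # device model, e.g. "NVIDIA Orin ..."
```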

Thanks.

Hi,
AastaLLL

It seems that I cannot find the Tensor Core activity in the SM instructions/Tensor Active row. My commands and results are as follows:
Command “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile ./hmma”
Command “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys nvprof ./hmma”

Thanks.

Hi,

Please open the file with the Nsight Systems UI tool on the desktop.
You can find it in the Timeline View.

Thanks.

Hi,
AastaLLL

I opened the generated report in Nsight Systems on Windows, but it seems that the SM row cannot be found. Is there something wrong with my settings? Or is the software version I downloaded wrong?

Thanks.

Hi,

Based on the screenshot, it looks like the .rep file was captured on a desktop rather than on the Orin.
Here are the details for the steps:

1. Generate a profile file on Orin.

$ sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile --gpu-metrics-device=0 ./cudaTensorCoreGemm

2. Copy the file to the desktop.
We use 2023.4.1.84-234133515197v0 on Windows.

There should be an iGPU (Orin) option.
Then click iGPU (Orin) → GPU Metrics → SM Instructions → Tensor Active

Thanks.

Hi,
AastaLLL

I followed your steps and updated the nsys version to 2023.4.1, but could not find “iGPU (Orin)”.

The command I used is “sudo /opt/nvidia/nsight-systems/2022.1.2/target-linux-tegra-armv8/nsys profile ./hmma”.

Thanks.

Hi,

Would you mind testing the profiler with our CUDA sample:

/usr/local/cuda/samples/0_Simple/cudaTensorCoreGemm/

This can help us figure out whether the issue is from the profiler or the sample.

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.