We are trying to profile an LSTM-based neural network during inference, and we observe the following:
- When we set the sequence length to 2, the NCU readings match our estimates.
- When we set the sequence length to 8 or 16, the NCU readings scale up by 4x and 2x relative to our estimates, respectively.
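For context, the profiled run looks roughly like the sketch below (PyTorch is assumed here, and every dimension is a placeholder rather than our real configuration):

```python
# Minimal sketch of the profiled run (assumption: PyTorch; all
# dimensions below are placeholders, not our real configuration).
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size=64, hidden_size=128, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)  # the last FC layer

    def forward(self, x):
        out, _ = self.lstm(x)   # out: (batch, seq_len, hidden_size)
        return self.fc(out)     # FC applied at every time step

model = Model().cuda().eval()
for seq_len in (2, 8, 16):      # the sequence lengths we compare
    x = torch.randn(1, seq_len, 64, device="cuda")  # batch size 1
    with torch.no_grad():
        model(x)                # forward pass captured by NCU
    torch.cuda.synchronize()
```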
Our assumption was that the problem lies in the last fully-connected layer, and upon profiling that fully-connected layer (same dimensions) in isolation during inference, we observe the same trend. This implies that the scaling mismatch comes mainly from the NCU readings of the kernels that implement this layer.
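Concretely, the isolation test looks something like this (again a sketch; the layer sizes are placeholders):

```python
# Sketch of profiling the fully-connected layer in isolation
# (hidden/output sizes are placeholders, not our real dimensions).
import torch
import torch.nn as nn

fc = nn.Linear(128, 10).cuda().eval()
for seq_len in (2, 8, 16):
    # One row per time step, so the GEMM workload tracks the sequence length.
    x = torch.randn(seq_len, 128, device="cuda")
    with torch.no_grad():
        fc(x)
    torch.cuda.synchronize()
```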
Can anyone help me navigate this problem? Some potential causes I suspect are:
- It might be due to some tensor operations (we see the kernel `ampere_sgemm_128x32_tn`); see the profiler sketch after this list.
- Some settings in the NCU profiling that we have not accounted for
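To check the first point, one way is to list which CUDA kernels actually launch at each sequence length, e.g. with `torch.profiler` (a sketch; dimensions are placeholders):

```python
# Sketch: list which CUDA kernels launch at each sequence length,
# to check whether the sgemm kernel (or extra kernels) appears more often.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

fc = nn.Linear(128, 10).cuda().eval()  # placeholder dimensions
for seq_len in (2, 8, 16):
    x = torch.randn(seq_len, 128, device="cuda")
    with torch.no_grad(), profile(activities=[ProfilerActivity.CUDA]) as prof:
        fc(x)
    print(f"--- seq_len={seq_len} ---")
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```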
The NCU command we use is:

sudo ncu --target-processes all --set roofline -f -o results/ncu-reps/lstm_infer_SEQ_2_bs_1_20epoch bash exp_script.sh
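We are also considering restricting the capture to the suspect kernel so its readings can be compared in isolation, roughly like this (standard NCU flags; the regex and output name are placeholders):

sudo ncu --target-processes all --set roofline -k regex:sgemm --launch-count 8 -f -o results/ncu-reps/lstm_fc_only bash exp_script.sh

Here -k (--kernel-name) filters the profiled kernels by a regex and --launch-count caps how many matching launches are captured, which should make it easier to see whether the per-launch readings or the launch count is what scales with the sequence length.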