We are trying to profile an LSTM-based neural network during inference, and we observe the following:
- When we set the sequence length to 2, the NCU readings match our estimates.
- When we set the sequence length to 8 or 16, the NCU readings scale up by 4x and 2x relative to our estimates, respectively.
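For context, the profiled run looks roughly like the sketch below (PyTorch is assumed here, and every dimension is a placeholder rather than our real configuration):

```python
# Minimal sketch of the profiled run (assumption: PyTorch; all
# dimensions below are placeholders, not our real configuration).
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size=64, hidden_size=128, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)  # the last FC layer

    def forward(self, x):
        out, _ = self.lstm(x)   # out: (batch, seq_len, hidden_size)
        return self.fc(out)     # FC applied at every time step

model = Model().cuda().eval()
for seq_len in (2, 8, 16):      # the sequence lengths we compare
    x = torch.randn(1, seq_len, 64, device="cuda")  # batch size 1
    with torch.no_grad():
        model(x)                # forward pass captured by NCU
    torch.cuda.synchronize()
```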
Our assumption was that the problem lies in the last fully-connected layer, and upon profiling that fully-connected layer (same dimensions) in isolation during inference, we observe the same trend. This implies that the scaling mismatch comes mainly from the NCU readings of the kernels that implement this layer.
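Concretely, the isolation test looks something like this (again a sketch; the layer sizes are placeholders):

```python
# Sketch of profiling the fully-connected layer in isolation
# (hidden/output sizes are placeholders, not our real dimensions).
import torch
import torch.nn as nn

fc = nn.Linear(128, 10).cuda().eval()
for seq_len in (2, 8, 16):
    # One row per time step, so the GEMM workload tracks the sequence length.
    x = torch.randn(seq_len, 128, device="cuda")
    with torch.no_grad():
        fc(x)
    torch.cuda.synchronize()
```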
Can anyone help me navigate this problem? Some potential causes I suspect are:
- It might be due to some tensor operations (we see the kernel `ampere_sgemm_128x32_tn`); see the profiler sketch after this list.
- Some settings in the NCU profiling that we have not accounted for
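To check the first point, one way is to list which CUDA kernels actually launch at each sequence length, e.g. with `torch.profiler` (a sketch; dimensions are placeholders):

```python
# Sketch: list which CUDA kernels launch at each sequence length,
# to check whether the sgemm kernel (or extra kernels) appears more often.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

fc = nn.Linear(128, 10).cuda().eval()  # placeholder dimensions
for seq_len in (2, 8, 16):
    x = torch.randn(seq_len, 128, device="cuda")
    with torch.no_grad(), profile(activities=[ProfilerActivity.CUDA]) as prof:
        fc(x)
    print(f"--- seq_len={seq_len} ---")
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```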
The NCU command we use is:

sudo ncu --target-processes all --set roofline -f -o results/ncu-reps/lstm_infer_SEQ_2_bs_1_20epoch bash exp_script.sh
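We are also considering restricting the capture to the suspect kernel so its readings can be compared in isolation, roughly like this (standard NCU flags; the regex and output name are placeholders):

sudo ncu --target-processes all --set roofline -k regex:sgemm --launch-count 8 -f -o results/ncu-reps/lstm_fc_only bash exp_script.sh

Here -k (--kernel-name) filters the profiled kernels by a regex and --launch-count caps how many matching launches are captured, which should make it easier to see whether the per-launch readings or the launch count is what scales with the sequence length.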