I am running an ML workload and I would like to know the matrix sizes (m, n, k) and the data types of these matrices. With nsys I can see the actual GEMM kernels, but not the matrix sizes, the transpose flags, or the data types of the input/output matrices. How can I get this information, and which profiler reports it?
With Nsight Compute you can look into the SASS instructions.
This should show you everything except “transpose flags”, as those are handled by filling the registers in a transposed way.
With Nsight Compute I can get the grid and block sizes, but not the sizes of the matrices passed to the kernel. Could you please show this with an example, or share a link that shows this information?
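If the profiler output alone does not expose the sizes, one possible workaround is to record them on the framework side, before the call reaches the GPU library. The following is only a minimal sketch under assumptions that are not from this thread: the GEMMs are issued from Python through a matmul-like callable, and the operands expose `.shape` and `.dtype` attributes (as NumPy arrays and PyTorch tensors do). The names `log_gemm` and `gemm_log` are hypothetical helpers, not part of any profiler or library.

```python
import functools

# Hypothetical in-memory log: one record per intercepted matmul call.
gemm_log = []


def log_gemm(matmul_fn):
    """Wrap a matmul-like callable and record m, n, k and the operand dtypes.

    Assumes operands expose `.shape` and `.dtype` and that the call computes
    C[m, n] = A[m, k] @ B[k, n] over the last two dimensions.
    """

    @functools.wraps(matmul_fn)
    def wrapper(a, b, *args, **kwargs):
        # Read the logical GEMM sizes from the last two dims of each operand.
        m, k = a.shape[-2], a.shape[-1]
        n = b.shape[-1]
        out = matmul_fn(a, b, *args, **kwargs)
        gemm_log.append(
            {
                "m": m,
                "n": n,
                "k": k,
                "a_dtype": str(a.dtype),
                "b_dtype": str(b.dtype),
                # The result may not carry a dtype; record None in that case.
                "out_dtype": str(getattr(out, "dtype", None)),
            }
        )
        return out

    return wrapper
```

Note that this only sees the logical shapes at the Python call site. It would not reveal the transpose flags or the layout the library chooses internally, and wrapping a single name (for example a framework's matmul function) only catches calls that actually go through that name.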