On Xavier AGX, is there a way to check or confirm that work on the Tensor Cores and work on the CUDA cores can execute in parallel on the same SM?
I have done quite a bit of research online, including reading this post: Run Parallel Tensor Cores GEMM and Cuda GEMM - #8 by mnicely
I also wrote a dummy program that creates multiple streams: some of them run GEMMs on the Tensor Cores and the rest run non-MMA operations on the CUDA cores. I see some overlap between the two kinds of streams, but I cannot confirm whether the overlapping work runs on the same SM, since nvprof and nvvp do not report that information.
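nvprof/nvvp only show overlap at the stream/kernel level. One way to get per-SM information is to have each thread block log the SM it ran on (the PTX special register `%smid`) together with start/end timestamps (`%globaltimer`), then search on the host for a Tensor Core block and a CUDA-core block that shared an SM during overlapping time windows. The following is a minimal sketch under several assumptions: the Tensor Core path uses the WMMA API (`nvcuda::wmma`) instead of whatever GEMM library the original program used, all names (`BlockRecord`, `tc_kernel`, `fma_kernel`, `count_same_sm_overlaps`, `run_experiment`) are hypothetical, and the block and iteration counts are illustrative. The PTX documentation describes `%smid` as intended for profiling and diagnostics and notes that its value can change during execution (for example after preemption). Also note that overlapping windows on one SM only show that the two kinds of blocks were co-resident; by themselves they do not prove that the Tensor Core and FP32 pipelines issued instructions in the same cycle.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
using namespace nvcuda;

struct BlockRecord {            // hypothetical per-block log entry
  unsigned smid;                // SM that thread 0 was on when the block finished
  unsigned long long start_ns;  // %globaltimer when thread 0 started
  unsigned long long end_ns;    // %globaltimer after every warp in the block finished
};

__device__ __forceinline__ unsigned read_smid() {
  unsigned id; asm volatile("mov.u32 %0, %%smid;" : "=r"(id)); return id;
}
__device__ __forceinline__ unsigned long long read_globaltimer() {
  unsigned long long t; asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t)); return t;
}

// Called by every thread: wait for all warps, then thread 0 writes the record.
__device__ void log_block(BlockRecord* rec, unsigned long long t0) {
  __syncthreads();
  if (threadIdx.x == 0) {
    rec[blockIdx.x].smid = read_smid();
    rec[blockIdx.x].start_ns = t0;
    rec[blockIdx.x].end_ns = read_globaltimer();
  }
}

// Tensor Core work via the WMMA API: every warp repeats one 16x16x16 FP16 MMA.
__global__ void tc_kernel(const half* a, const half* b, float* out, int iters, BlockRecord* rec) {
  unsigned long long t0 = read_globaltimer();
  wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
  wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
  wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
  wmma::load_matrix_sync(fa, a, 16);
  wmma::load_matrix_sync(fb, b, 16);
  wmma::fill_fragment(acc, 0.0f);
  for (int i = 0; i < iters; ++i) wmma::mma_sync(acc, fa, fb, acc);
  int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;  // global warp index
  wmma::store_matrix_sync(out + warp * 256, acc, 16, wmma::mem_row_major);  // keeps the loop live
  log_block(rec, t0);
}

// CUDA-core work: a dependent FP32 FMA chain per thread, no MMA instructions.
__global__ void fma_kernel(float* out, int iters, BlockRecord* rec) {
  unsigned long long t0 = read_globaltimer();
  float x = threadIdx.x * 1e-3f;
  for (int i = 0; i < iters; ++i) x = fmaf(x, 0.9999f, 1e-6f);
  out[blockIdx.x * blockDim.x + threadIdx.x] = x;
  log_block(rec, t0);
}

// Count (TC block, FMA block) pairs that ran on the same SM with overlapping time windows.
int count_same_sm_overlaps(const std::vector<BlockRecord>& tc, const std::vector<BlockRecord>& fma) {
  int n = 0;
  for (const BlockRecord& p : tc)
    for (const BlockRecord& q : fma)
      if (p.smid == q.smid && p.start_ns < q.end_ns && q.start_ns < p.end_ns) ++n;
  return n;
}

// Launch both kernels on two non-blocking streams; returns the overlap count, or -1 on error.
// Illustrative sizes: use more blocks per kernel than cudaDeviceProp::multiProcessorCount.
int run_experiment(int blocks = 16, int iters = 1 << 20) {
  assert(blocks > 0 && iters > 0);
  const int threads = 128;  // 4 warps per block
  half *a, *b; float *tc_out, *fma_out; BlockRecord *tc_rec, *fma_rec;
  cudaMalloc(&a, 256 * sizeof(half)); cudaMalloc(&b, 256 * sizeof(half));
  cudaMemset(a, 0, 256 * sizeof(half)); cudaMemset(b, 0, 256 * sizeof(half));  // zero bits == 0.0
  cudaMalloc(&tc_out, blocks * (threads / 32) * 256 * sizeof(float));
  cudaMalloc(&fma_out, blocks * threads * sizeof(float));
  cudaMalloc(&tc_rec, blocks * sizeof(BlockRecord)); cudaMalloc(&fma_rec, blocks * sizeof(BlockRecord));
  cudaStream_t s1, s2;
  cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
  cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking);
  tc_kernel<<<blocks, threads, 0, s1>>>(a, b, tc_out, iters, tc_rec);
  fma_kernel<<<blocks, threads, 0, s2>>>(fma_out, iters, fma_rec);
  int n = -1;
  cudaError_t err = cudaDeviceSynchronize();
  if (err == cudaSuccess) {
    std::vector<BlockRecord> tc(blocks), fma(blocks);
    cudaMemcpy(tc.data(), tc_rec, blocks * sizeof(BlockRecord), cudaMemcpyDeviceToHost);
    cudaMemcpy(fma.data(), fma_rec, blocks * sizeof(BlockRecord), cudaMemcpyDeviceToHost);
    n = count_same_sm_overlaps(tc, fma);
    printf("same-SM overlapping (TC, FMA) block pairs: %d\n", n);
  } else {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
  }
  cudaStreamDestroy(s1); cudaStreamDestroy(s2);
  cudaFree(a); cudaFree(b); cudaFree(tc_out); cudaFree(fma_out);
  cudaFree(tc_rec); cudaFree(fma_rec);
  return n;
}
```

To try it, build with something like `nvcc -arch=sm_72 overlap.cu` (Xavier's GPU is compute capability 7.2) and call `run_experiment()` from `main`. A count of zero does not rule out co-scheduling; it may only mean the block scheduler placed the two kernels on different SMs in that run, so varying `blocks` and `iters` is worthwhile.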
Can someone from NVIDIA help confirm whether this is possible? If so, how do I explicitly program the GPU to get more of this parallelism?