I am using the ‘tcgen05.mma.cta_group::2.kind::mxf4nvf4.block_scale.scale_vec::4X’ instruction to write an NVFP4 GEMM kernel. With a cluster dim of (2,1,1), the MMA instruction reaches 95% of peak compute throughput. But with (4,1,1) or (2,2,2), it only reaches 50% of peak, no matter what I try. Is there any way to solve or explain this problem?
Hi,
Could you share the sample code you are running so we can check it internally?
Thanks.
It seems that the number of active clusters affects the compute throughput.
When I use __cluster_dims__(2,1,1), ncu shows 10 active clusters, which fill all 20 SMs.
But when I use __cluster_dims__(4,1,1), there are only 4 or 3 active clusters, so not all SMs are used.
I also ran the CUTLASS example 72a_blackwell_nvfp4_bf16_gemm https://github.com/NVIDIA/cutlass/blob/main/examples/72_blackwell_narrow_precision_gemm/72a_blackwell_nvfp4_bf16_gemm.cu and the example examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100 https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu, and ncu showed the same behavior for both. I am wondering whether Thor's GPC layout means the cluster dim can effectively only be set to (2,1,1). Can you confirm internally? Thank you.
Hi,
Sure, we will check this with our internal team.
Just to double-confirm: are both experiments running with the same nvpmodel mode?
Some nvpmodel modes (e.g., 90W) turn off some TPCs, which would affect the results.
Thanks.
Yes, and the command nvpmodel -q shows "NV Power Mode: MAXN".
Hi,
Could you try to set it to 90W?
Thanks.
I found only modes 0 and 1 in /etc/nvpmodel.conf. How can I use nvpmodel -m 2?
May I ask what the purpose of the 90W test is? And if it is necessary, what should I add to nvpmodel.conf? I noticed that 90W can only use 6 TPCs.
Hello, may I ask if you have any results yet?
Hi,
Thanks for your patience.
We are still checking this issue internally.
The 90W test is to check whether changing the number of TPCs makes any difference.
We do see the 90W configuration in our nvpmodel.conf.
...
< POWER_MODEL ID=2 NAME=90W >
...
Are you using JetPack 7.1?
Thanks.
I’m using JetPack 7.0, and I can only find the MAXN and 120W configurations when I run cat /etc/nvpmodel.conf.
Hello, I’m wondering whether there are any results yet.
