Hopper thread block reconfiguration

https://developer.nvidia.com/blog/nvidia-grace-hopper-superchip-architecture-in-depth/ says thread block reconfiguration can improve FFT performance. Another report says it is a feature newly introduced with the H100. However, I can't find any information about it in the CUDA Programming Guide :( Are there any references on how to use this feature (an API or something)?

double-post?


Thank you. Can I change the number of threads during kernel execution? I'm working on implementing an FFT, and fusing 2 or 3 kernels with that feature requires changing the execution environment, such as gridDim or blockDim. For example, when I compute a 64K-point FFT with radix 128 and radix 512, I need 256 threads per thread block for the radix-128 stage and 512 threads per thread block for the radix-512 stage.
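To make the question concrete, here is a rough sketch of the fusion I have in mind. As far as I know, blockDim and gridDim are fixed at launch for the lifetime of a kernel, so the sketch launches with the larger thread count and lets half of the threads idle during the first stage. This is not the Hopper feature from the blog post; the kernel name, the helper `fused_block_threads`, and the stage bodies (which only count participating threads instead of doing butterflies) are all made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Thread counts from the example above (assumed, not tuned).
constexpr int kThreadsStageA = 256;  // radix-128 stage
constexpr int kThreadsStageB = 512;  // radix-512 stage

// Hypothetical helper: the fused kernel must be launched with the larger count.
constexpr int fused_block_threads(int a, int b) { return a > b ? a : b; }

// Placeholder fused kernel: each stage only counts the threads that take part.
__global__ void fused_stages(int* active_a, int* active_b) {
    __shared__ int count_a;
    __shared__ int count_b;
    if (threadIdx.x == 0) { count_a = 0; count_b = 0; }
    __syncthreads();

    // Stage A: only the first 256 threads work, the rest stay idle.
    if (threadIdx.x < kThreadsStageA) atomicAdd(&count_a, 1);
    __syncthreads();  // every thread reaches the barrier, idle ones included

    // Stage B: all 512 threads work.
    if (threadIdx.x < kThreadsStageB) atomicAdd(&count_b, 1);
    __syncthreads();

    if (threadIdx.x == 0) {
        active_a[blockIdx.x] = count_a;
        active_b[blockIdx.x] = count_b;
    }
}

int main() {
    const int blocks = 4;
    int *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, blocks * sizeof(int));
    cudaMalloc(&d_b, blocks * sizeof(int));

    const int threads = fused_block_threads(kThreadsStageA, kThreadsStageB);
    fused_stages<<<blocks, threads>>>(d_a, d_b);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_a[blocks], h_b[blocks];
    cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost);
    for (int i = 0; i < blocks; ++i)
        printf("block %d: stage A threads = %d, stage B threads = %d\n", i, h_a[i], h_b[i]);

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

The drawback is that the block reserves resources for 512 threads even while only 256 are doing work, which is why I am asking whether H100 offers a way to actually change the block size between stages.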