Best combination of dim3 Block and dim3 Grid

I have an RTX 4060 and want to know which combination of dim3 Block and dim3 Grid will make my program run fastest.

That depends on the program. I would suggest testing different block sizes and selecting the fastest one.
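A rough sketch of what such a test could look like, in case it helps. The `saxpy` kernel, the `blocksFor` helper, and the problem size are placeholders I made up for illustration; swap in your own kernel and data. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; replace it with the kernel you want to tune.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Number of blocks needed to cover n elements (rounds up).
static int blocksFor(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

int main() {
    const int n = 1 << 24;   // assumed problem size
    const int reps = 20;     // launches averaged per configuration
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Try block sizes that are multiples of the warp size (32), up to 1024.
    for (int blockSize = 32; blockSize <= 1024; blockSize *= 2) {
        dim3 block(blockSize);
        dim3 grid(blocksFor(n, blockSize));

        saxpy<<<grid, block>>>(n, 2.0f, x, y);   // warm-up launch, not timed

        cudaEventRecord(start);
        for (int r = 0; r < reps; ++r)
            saxpy<<<grid, block>>>(n, 2.0f, x, y);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block=%4d grid=%7u avg=%.4f ms\n", blockSize, grid.x, ms / reps);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile with `nvcc` and compare the printed averages. The ranking can change with the kernel and the problem size, so it is worth rerunning the sweep for each kernel you care about.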

This is one of the most commonly asked questions about CUDA. With a bit of searching you will find many suggested guidelines. Here is one example; there are many others.
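As one illustration of such a guideline (not necessarily the one linked above), the CUDA runtime can suggest a block size for a given kernel through `cudaOccupancyMaxPotentialBlockSize`. The sketch below assumes a made-up `scale` kernel and a 1D launch; the suggestion maximizes theoretical occupancy, which does not always mean the shortest run time, so treat it as a starting point for measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to query a suggested launch configuration.
__global__ void scale(int n, float a, float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= a;
}

// Number of blocks needed to cover n elements (rounds up).
static int blocksFor(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

int main() {
    const int n = 1 << 20;   // assumed problem size
    int minGridSize = 0;     // smallest grid that can reach full occupancy
    int blockSize = 0;       // block size suggested by the runtime

    // No dynamic shared memory (0) and no upper limit on block size (0).
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, scale, 0, 0);

    dim3 block(blockSize);
    dim3 grid(blocksFor(n, blockSize));
    printf("suggested block=%u, grid for n=%d is %u (min grid for occupancy: %d)\n",
           block.x, n, grid.x, minGridSize);

    float* data;
    cudaMalloc(&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));
    scale<<<grid, block>>>(n, 2.0f, data);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```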

Thank you all