In theory, yes. The correct interpretation of the grid size limits (unlike e.g. the block size limits) is that all three per-dimension limits can be reached simultaneously.

I say “in theory” because it’s difficult to test the limits simultaneously. The number of blocks for such a grid size (i.e. the product of the three dimensions) is quite large. As a thought experiment, suppose a given GPU has an average cost of 10ns per block. Then just launching 2^31-1 blocks (the first dimension maximum) will require ~20 seconds to run. If we then max out the 2nd dimension (65535), we’re at ~390 hours. If we then max out the 3rd dimension (also 65535), we’re at over 1 million days of run time.
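The arithmetic behind that thought experiment can be sketched in a few lines of host-side C++. This is only a back-of-the-envelope estimate: the 10ns-per-block cost is the assumption from the paragraph above, not a measured figure, and `estimate_seconds` and the `kMaxGrid*` constants are hypothetical names introduced here for illustration.

```cpp
#include <cstdint>

// Per-dimension grid maxima used in the thought experiment (x, y, z).
constexpr std::uint64_t kMaxGridX = 2147483647ULL;  // 2^31 - 1
constexpr std::uint64_t kMaxGridY = 65535ULL;
constexpr std::uint64_t kMaxGridZ = 65535ULL;

// Hypothetical helper: estimated run time in seconds for a grid of
// x * y * z blocks, assuming a fixed average cost per block in nanoseconds.
// The product is formed in double precision to avoid 64-bit overflow
// (the maximal grid has roughly 9.2e18 blocks).
constexpr double estimate_seconds(std::uint64_t x, std::uint64_t y,
                                  std::uint64_t z, double ns_per_block) {
    return static_cast<double>(x) * static_cast<double>(y) *
           static_cast<double>(z) * ns_per_block * 1e-9;
}
```

Calling `estimate_seconds(kMaxGridX, 1, 1, 10.0)` gives roughly 21 seconds; adding `kMaxGridY` gives about 1.4 million seconds (around 390 hours); adding `kMaxGridZ` as well gives about 9.2e10 seconds, which is a little over 1 million days.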

FWIW I tried 2^31-1 blocks on a GT640 GPU, and it took about 30 seconds for a trivial kernel. I then upped it by a factor of 10 (changing second dimension from 1 to 10) and sure enough it took ~300 seconds to run.

I tried launching a maximal configuration, and the kernel began to run (i.e. it did not immediately throw an error on launch, as it would if you had exceeded one of the individual limits). I did not wait 1 million days however.

Even if you assume you have a GPU that is 100 times faster than mine, you’re still looking at over 10,000 days (~30 years) to test a maximal launch with a trivial kernel, according to this thought experiment.