Launch dimensions. How big is too big?

One of the purposes of my pathtracer is to calculate the amount of light that falls on each face of a mesh.

Currently it computes direct illumination in one launch, then calculates the indirect lighting by sampling the hemisphere across multiple launches to avoid a possible timeout error.

The mesh is pretty much a plane, so one can get a width and height from the number of rows of polygons and the number of polygons per row. The width and height are then used as the dimensions of a 2D launch.

Now I am considering the possibility of making a 3D launch with the depth value corresponding to the number of hemisphere samples; that way the sampling can be done in parallel.

But now I’m concerned that having so many threads could cause a timeout error, because my 2D launch can be as large as 1024 by 1024 and I’m considering a depth of at least 256.
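For scale, here is a minimal sketch of the arithmetic behind those dimensions. The helper name `launchThreadCount` is made up for illustration and is not part of any API; it only multiplies the three launch dimensions in 64-bit so the product cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: total number of launch indices (threads) for a
// width x height x depth launch. The multiplication is done in 64-bit
// so large launches do not overflow a 32-bit counter.
inline std::uint64_t launchThreadCount(std::uint32_t width,
                                       std::uint32_t height,
                                       std::uint32_t depth)
{
    assert(width > 0 && height > 0 && depth > 0);
    return static_cast<std::uint64_t>(width) * height * depth;
}
```

With the numbers from the question, `launchThreadCount(1024, 1024, 1)` is 1,048,576 and `launchThreadCount(1024, 1024, 256)` is 268,435,456, so the 3D launch does 256 times the work of the 2D one in a single call.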

If you’re concerned about a timeout, keep doing less work more often and accumulate the result.
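A minimal CPU-side sketch of that accumulation pattern, assuming one new sample per face per launch. `launchOneSample` is a hypothetical stand-in for the real GPU launch and just writes deterministic dummy values so the sketch runs on its own; `accumulateOverLaunches` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one small 2D launch: writes one new sample
// per face. A real implementation would launch the GPU kernel here.
void launchOneSample(std::size_t iteration, std::vector<float>& samples)
{
    for (std::size_t face = 0; face < samples.size(); ++face)
        samples[face] = static_cast<float>((face + iteration) % 4);
}

// Runs `iterations` small launches and keeps a running mean per face,
// so no single launch has to do all the work.
std::vector<float> accumulateOverLaunches(std::size_t faceCount,
                                          std::size_t iterations)
{
    assert(faceCount > 0 && iterations > 0);
    std::vector<float> mean(faceCount, 0.0f);
    std::vector<float> samples(faceCount);
    for (std::size_t i = 0; i < iterations; ++i) {
        launchOneSample(i, samples);
        for (std::size_t face = 0; face < faceCount; ++face) {
            // Incremental mean: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
            mean[face] += (samples[face] - mean[face]) / static_cast<float>(i + 1);
        }
    }
    return mean;
}
```

In a real renderer the running mean would more likely live in a device buffer and be updated inside the launch itself, with the iteration index passed in as a launch parameter.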

A million rays per launch should be plenty of work to make the additional launch overhead irrelevant.
If things are still too slow and you’re only interested in the final result after every 256 iterations, you could also try to sample more convergent rays, by using the same seeds or other techniques that let your rays follow similar directions.
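One way to read the "same seeds" idea, sketched under assumptions: derive the random seed from the iteration index only, so every launch index draws the same pair of random numbers for its hemisphere sample in a given iteration. The LCG, the seed formula, and the names `nextFloat` and `coherentSample` are illustrative choices, not something taken from a particular SDK.

```cpp
#include <cstdint>

// Simple linear congruential generator step returning a float in [0, 1).
// Any other random number generator would work just as well.
inline float nextFloat(std::uint32_t& state)
{
    state = state * 1664525u + 1013904223u;
    return static_cast<float>(state >> 8) / 16777216.0f; // 24 bits / 2^24
}

struct Sample2D { float u, v; };

// The seed depends only on the iteration, not on the launch index, so all
// faces get the same (u, v) and their hemisphere rays point in similar
// directions relative to each face's local frame.
inline Sample2D coherentSample(std::uint32_t iteration)
{
    std::uint32_t state = iteration * 9781u + 1u; // arbitrary seed formula
    Sample2D s;
    s.u = nextFloat(state);
    s.v = nextFloat(state);
    return s;
}
```

The trade-off is that neighbouring faces share the same sampling pattern, so the noise is correlated across the mesh; the per-face result after all iterations still averages the same number of samples.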