This depends on which GPUs you’re targeting and which version of OptiX you’re using. It also depends on what you mean by hitting a first group and then continuing; I don’t quite understand the details yet.
For the OptiX version and GPU, what matters is whether you’re using OptiX 6, whether you’re targeting any RTX hardware, and whether you’re using the GeometryTriangles API.
For your scene setup, is your first-hit group a static set of objects? In other words, do all rays need to test against the same set of objects before continuing?
For anything before OptiX 6, our advice for the best shadow-ray performance has been to use any_hit and terminate the ray immediately in the any_hit program. For your case, you might use an any-hit program that lets first-hit group hits pass through while recording them, and then terminates the ray on any other hit if the first-hit group was already hit. That would let you do this with a single ray cast instead of finding a hit and then casting a second ray.
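To make the pass-through-and-record idea concrete, here is a plain C++ model of the any-hit decision, not OptiX device code. It assumes the Ignore and Terminate outcomes would map to rtIgnoreIntersection() and rtTerminateRay() in a real pre-OptiX 7 any-hit program; the types Candidate and ShadowPayload and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Model only: in a pre-OptiX 7 any-hit program, Ignore would correspond to
// rtIgnoreIntersection() and Terminate to rtTerminateRay().
enum class AnyHitResult { Ignore, Terminate };

// Hypothetical per-candidate info: is the hit object in the first-hit group?
struct Candidate { bool inFirstGroup; };

// Hypothetical per-ray payload written by the any-hit program.
struct ShadowPayload {
    bool hitFirstGroup = false;
    bool hitOther = false;
};

AnyHitResult anyHit(const Candidate& c, ShadowPayload& prd) {
    if (c.inFirstGroup) {
        prd.hitFirstGroup = true;  // record the first-group hit and pass through
    } else {
        prd.hitOther = true;       // remember it in case the first group is reached later
    }
    // Once both kinds of hit have been seen, the answer cannot change: stop.
    return (prd.hitFirstGroup && prd.hitOther) ? AnyHitResult::Terminate
                                               : AnyHitResult::Ignore;
}

// Feeds candidates to anyHit in whatever order traversal happens to visit them.
bool traceShadowModel(const std::vector<Candidate>& candidates) {
    ShadowPayload prd;
    for (const Candidate& c : candidates) {
        if (anyHit(c, prd) == AnyHitResult::Terminate) break;
    }
    return prd.hitFirstGroup && prd.hitOther;
}
```

One design choice worth noting: any-hit invocations are not ordered by distance, so the sketch also remembers a non-first-group hit seen before the first group, which keeps the result independent of traversal order. If the second hit must lie beyond the first-group hit, you would also need to record and compare hit distances.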
For OptiX 6, the best-practices advice has changed, mostly for RTX hardware and GeometryTriangles. The fastest shadow rays now use closest_hit together with the rtTrace ray flag RT_RAY_FLAG_TERMINATE_ON_FIRST_HIT. In that case you should also disable any-hit completely, using one of the *_DISABLE_ANYHIT flags either on your instances or as a flag to rtTrace(). Any-hit programs are enabled by default, so you have to opt out if you don’t need them and want to avoid their cost entirely. The more of the traversal you can keep on the RT cores, the better, and the main way to do that is to set things up so that no programs run until after the hit has been decided.
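The effect of those two flags on program invocations can be illustrated with a simplified traversal model. This is a sketch, not OptiX: the flag constants below only borrow the names of RT_RAY_FLAG_TERMINATE_ON_FIRST_HIT and the *_DISABLE_ANYHIT flags (their values are made up), and the model assumes an any-hit program is attached that always accepts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Model-only flag bits; they mirror OptiX 6 flag names, not their actual values.
constexpr std::uint32_t kTerminateOnFirstHit = 1u << 0;
constexpr std::uint32_t kDisableAnyHit = 1u << 1;

struct TraceStats {
    int anyHitCalls = 0;      // programs run during traversal
    int closestHitCalls = 0;  // programs run after the hit is decided
    float hitT = std::numeric_limits<float>::infinity();  // accepted hit distance
};

// candidates: intersection distances in traversal order (NOT sorted by distance).
TraceStats traceModel(const std::vector<float>& candidates, std::uint32_t flags) {
    TraceStats s;
    for (float t : candidates) {
        if (t >= s.hitT) continue;                       // beyond current tmax: culled
        if (!(flags & kDisableAnyHit)) ++s.anyHitCalls;  // any-hit runs unless disabled
        s.hitT = t;                                      // accept: tmax shrinks to this hit
        if (flags & kTerminateOnFirstHit) break;         // stop at the first accepted hit
    }
    if (s.hitT != std::numeric_limits<float>::infinity()) {
        ++s.closestHitCalls;  // runs once, after traversal has finished
    }
    return s;
}
```

With both flags set, no program runs during traversal and closest-hit runs once on the first accepted hit, which need not be the nearest one. That is fine for a shadow ray, since it only needs to know that something was hit.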
That OptiX 6 advice applies less to custom geometry with an intersection program, because the intersection program always has to run during traversal. If that’s what you have, another option might be to put your hit/miss logic in the intersection program instead of in any-hit.
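As a rough sketch of moving the decision into the intersection code, the model below runs a ray/sphere test but rejects primitives that fail a filter before reporting anything, so no any-hit program would be needed to discard them. Sphere, its inFirstGroup tag, and the per-ray wantFirstGroup parameter are hypothetical. In OptiX 6 an accepted hit would be reported through rtPotentialIntersection() and rtReportIntersection(), which this plain C++ does not call.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical custom primitive carrying a group tag.
struct Sphere { Vec3 center; float radius; bool inFirstGroup; };

// Geometric test: nearest t in (tmin, tmax) where o + t*d meets the sphere.
std::optional<float> raySphere(Vec3 o, Vec3 d, const Sphere& s, float tmin, float tmax) {
    Vec3 oc = sub(o, s.center);
    float a = dot(d, d);
    float b = dot(oc, d);  // half of the usual linear coefficient
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - a * c;
    if (disc < 0.0f) return std::nullopt;
    float root = std::sqrt(disc);
    float t0 = (-b - root) / a;
    float t1 = (-b + root) / a;
    if (t0 > tmin && t0 < tmax) return t0;
    if (t1 > tmin && t1 < tmax) return t1;
    return std::nullopt;
}

// Model of an intersection program that also makes the hit/miss decision:
// a rejected primitive is simply never reported.
std::optional<float> intersectWithFilter(Vec3 o, Vec3 d, const Sphere& s,
                                         float tmin, float tmax, bool wantFirstGroup) {
    if (s.inFirstGroup != wantFirstGroup) return std::nullopt;  // cheap check before the math
    return raySphere(o, d, s, tmin, tmax);
}
```

Putting the cheap filter before the quadratic keeps rejected primitives inexpensive, although the intersection program itself still runs for every candidate during traversal.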
For your first-hit group test, if it’s a single group, you might be able to use the OptiX 6 visibility masks to split your scene into a first-hit group and a second-hit group. That will be faster than using CUDA code to test which set each hit belongs to.
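Here is a plain C++ model of that split. It assumes the usual instance-mask rule, where each group has an 8-bit mask, each trace passes an 8-bit ray mask, and a group is visible only if the two share a set bit. Check the OptiX 6 documentation for the exact semantics. The bit assignments, the Object type, and its precomputed hitT field are hypothetical. The query first traces against the first-hit group, then continues past that hit against the second group only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

constexpr std::uint8_t kFirstGroupMask = 0x01;   // hypothetical bit assignment
constexpr std::uint8_t kSecondGroupMask = 0x02;  // hypothetical bit assignment
constexpr float kMiss = std::numeric_limits<float>::infinity();

// Model only: hitT is the distance at which this particular ray would hit the object.
struct Object { float hitT; std::uint8_t mask; };

// Assumed visibility rule: masks must share at least one set bit.
bool visible(std::uint8_t objectMask, std::uint8_t rayMask) {
    return (objectMask & rayMask) != 0;
}

// Closest hit beyond tmin among objects the ray mask lets through; kMiss on a miss.
float traceMasked(const std::vector<Object>& scene, std::uint8_t rayMask, float tmin) {
    float best = kMiss;
    for (const Object& o : scene) {
        if (visible(o.mask, rayMask) && o.hitT > tmin && o.hitT < best) best = o.hitT;
    }
    return best;
}

// Two traces: find the first-group hit, then continue past it against the second group.
bool firstThenSecond(const std::vector<Object>& scene) {
    float t1 = traceMasked(scene, kFirstGroupMask, 0.0f);
    if (t1 == kMiss) return false;
    return traceMasked(scene, kSecondGroupMask, t1) != kMiss;
}
```

Each trace sees only its own group, so the group test happens during traversal rather than in a program that inspects every hit.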
Earlier this year at GTC I talked about these topics and how to think about the relationship between RT cores and CUDA. There might be things in the talk that could help you, if you haven’t already watched it: https://developer.nvidia.com/gtc/2019/video/S9768