Decomposing BVH to accelerate traversal

What does “large” mean in terms of the number of primitives?

BVH traversal already happens in parallel. The GAS is itself a BVH, so its internal hierarchy already helps to prune unnecessary AABB intersection tests. Normally it’s faster to combine more primitives into one GAS.

If you mean you want to store multiple GAS traversable handles explicitly and decide yourself which one to intersect before calling optixTrace, don’t do that. Do not even trace against a single GAS. Always put an IAS on top, even if its only instance uses an identity matrix, because that is the fully hardware-accelerated path on RTX boards and faster than tracing against a GAS directly.

If you mean you have a GAS which you want to split into multiple smaller GAS placed under an instance AS, whether that results in any improvement depends on your scene data and on how the rays are shot through that world.

What you could try is to sort the geometric primitives spatially into a grid of smaller GAS, each placed into an OptixInstance with an identity matrix, so that the resulting instance AABBs overlap as little as possible.
What could help BVH traversal there is pruning more rays from AABB intersection tests than the traversal inside a single GAS already does.
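The grid idea above can be sketched on the host side, before any acceleration structure is built. This is only a minimal illustration under assumptions: the types `Vec3`, `Triangle`, `Aabb`, `Bucket` and the function `bucketByGrid` are hypothetical names and not part of the OptiX API, the primitives are assumed to be triangles, and a uniform grid keyed by the triangle centroid is just one possible way to split the data. Each resulting bucket would become the build input of one smaller GAS referenced by one OptixInstance with an identity transform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side types for this sketch, not OptiX types.
struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };
struct Aabb { Vec3 lo, hi; };

// One bucket: the triangles for one smaller GAS plus their bounding box.
struct Bucket {
    std::vector<Triangle> triangles;
    Aabb bounds;
};

// Bounding box of a single triangle.
inline Aabb boundsOf(const Triangle& t) {
    Aabb b;
    b.lo = { std::min(t.v0.x, std::min(t.v1.x, t.v2.x)),
             std::min(t.v0.y, std::min(t.v1.y, t.v2.y)),
             std::min(t.v0.z, std::min(t.v1.z, t.v2.z)) };
    b.hi = { std::max(t.v0.x, std::max(t.v1.x, t.v2.x)),
             std::max(t.v0.y, std::max(t.v1.y, t.v2.y)),
             std::max(t.v0.z, std::max(t.v1.z, t.v2.z)) };
    return b;
}

// Smallest box enclosing both inputs.
inline Aabb merge(const Aabb& a, const Aabb& b) {
    Aabb m;
    m.lo = { std::min(a.lo.x, b.lo.x), std::min(a.lo.y, b.lo.y), std::min(a.lo.z, b.lo.z) };
    m.hi = { std::max(a.hi.x, b.hi.x), std::max(a.hi.y, b.hi.y), std::max(a.hi.z, b.hi.z) };
    return m;
}

// Map a coordinate to a cell index in [0, n - 1] along one axis.
inline int cellIndex(float p, float lo, float hi, int n) {
    if (hi <= lo) return 0;  // degenerate (flat) scene extent on this axis
    const int i = static_cast<int>((p - lo) / (hi - lo) * n);
    return std::max(0, std::min(i, n - 1));
}

// Assign every triangle to the grid cell containing its centroid and
// return only the non-empty cells, in cell-index order.
std::vector<Bucket> bucketByGrid(const std::vector<Triangle>& tris, int n) {
    assert(n > 0);
    std::vector<Bucket> result;
    if (tris.empty()) return result;

    Aabb scene = boundsOf(tris[0]);
    for (const Triangle& t : tris) scene = merge(scene, boundsOf(t));

    std::vector<Bucket> cells(static_cast<std::size_t>(n) * n * n);
    for (const Triangle& t : tris) {
        const Vec3 c = { (t.v0.x + t.v1.x + t.v2.x) / 3.0f,
                         (t.v0.y + t.v1.y + t.v2.y) / 3.0f,
                         (t.v0.z + t.v1.z + t.v2.z) / 3.0f };
        const int ix = cellIndex(c.x, scene.lo.x, scene.hi.x, n);
        const int iy = cellIndex(c.y, scene.lo.y, scene.hi.y, n);
        const int iz = cellIndex(c.z, scene.lo.z, scene.hi.z, n);
        Bucket& cell = cells[(static_cast<std::size_t>(iz) * n + iy) * n + ix];
        const Aabb tb = boundsOf(t);
        cell.bounds = cell.triangles.empty() ? tb : merge(cell.bounds, tb);
        cell.triangles.push_back(t);
    }

    for (Bucket& cell : cells)
        if (!cell.triangles.empty()) result.push_back(std::move(cell));
    return result;
}
```

Because triangles are assigned by centroid, large triangles can stick out of their cell, so the bucket AABBs may still overlap. Whether the grid resolution helps or hurts has to be measured on the actual scene and ray distribution.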

If there are islands/clumps/buckets of geometry in your scene, you might be able to use that knowledge about the scene to build a better, spatially disjoint set of instance AABBs than the built-in GAS builder produces over the full set of geometry.
The goal is to have as little overlap between these AABBs as possible, so that rays avoid testing the contents of boxes they don’t need to enter.
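To compare candidate splits against that goal, one rough option is to sum the pairwise overlap volume of the instance AABBs. The sketch below assumes a hypothetical `Box` struct and helper names that are not part of OptiX, and the overlap volume is only a crude proxy chosen for this illustration: OptiX does not report such a number, boxes around flat geometry have zero volume, and actual trace time on your rays remains the real measure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box for this sketch, not an OptiX type.
struct Box {
    float lo[3];
    float hi[3];
};

// Volume of the intersection of two boxes.
// Returns 0 when the boxes are disjoint or only touch.
inline double overlapVolume(const Box& a, const Box& b) {
    double volume = 1.0;
    for (int k = 0; k < 3; ++k) {
        const double extent = static_cast<double>(std::min(a.hi[k], b.hi[k])) -
                              static_cast<double>(std::max(a.lo[k], b.lo[k]));
        if (extent <= 0.0) return 0.0;  // separated along axis k
        volume *= extent;
    }
    return volume;
}

// Sum of the overlap volumes over all unordered pairs of boxes.
// O(n^2), which is fine for the modest instance counts of a coarse grid.
inline double totalPairwiseOverlap(const std::vector<Box>& boxes) {
    double total = 0.0;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = i + 1; j < boxes.size(); ++j)
            total += overlapVolume(boxes[i], boxes[j]);
    return total;
}
```

A lower total for one split than for another only suggests less redundant traversal; it does not account for where the rays actually go.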

A further benefit is that the amount of temporary and final memory needed for each GAS would be smaller than for one big GAS, in case memory is a concern.

You should compact the GAS for better memory locality (build with OPTIX_BUILD_FLAG_ALLOW_COMPACTION, then call optixAccelCompact) and use OPTIX_BUILD_FLAG_PREFER_FAST_TRACE if you want maximum traversal performance.