OptiX 6: moved a for loop from ray gen into the launch index, now getting launch error 9

As the title says, I found a way to move a for loop from inside the ray generation program into the launch index.
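Here is a host-side C++ sketch of what that transformation means. The real device code would be CUDA, and `work`, `loopInRaygen`, and `loopInLaunchIndex` are hypothetical names used only for illustration. The loop counter `i` that used to run inside ray gen becomes the y component of the launch index, so each (x, y) pair performs exactly one iteration and writes its own output slot.

```cpp
#include <cassert>  // assert, for checking the two versions agree
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one iteration of the ray gen loop body.
float work(std::size_t x, std::size_t i) {
    return static_cast<float>(x) * 0.5f + static_cast<float>(i);
}

// Before: one launch index per x, looping over i inside ray gen.
std::vector<float> loopInRaygen(std::size_t width, std::size_t iterations) {
    std::vector<float> out(width * iterations);
    for (std::size_t x = 0; x < width; ++x)           // launch_index.x
        for (std::size_t i = 0; i < iterations; ++i)  // loop inside ray gen
            out[i * width + x] = work(x, i);
    return out;
}

// After: the loop counter becomes launch_index.y, so the launch is
// width x iterations and each index performs a single iteration.
std::vector<float> loopInLaunchIndex(std::size_t width, std::size_t height) {
    std::vector<float> out(width * height);
    for (std::size_t y = 0; y < height; ++y)      // simulated 2D launch grid
        for (std::size_t x = 0; x < width; ++x)
            out[y * width + x] = work(x, y);      // one iteration per index
    return out;
}
```

Both versions fill the same row-major layout, so `loopInRaygen(8, 4) == loopInLaunchIndex(8, 4)` holds. The tradeoff is that the launch grid now grows with the loop count.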

This seems to work well for smaller launches (~50000 × ~100), but when I increase the height to ~50000 × ~50000 I get a launch error:

Optix context error : Unknown error (Details: Function “_rtContextLaunch3D” caught exception: Encountered a rtcore error: m_api.launch3D( cmdlist, pipeline, launchBufferVA, scratchBufferVA, raygenSbtRecordVA, exceptionSbtRecordVA, firstMissSbtRecordVA, missSbtRecordSize, missSbtRecordCount, firstInstanceSbtRecordVA, instanceSbtRecordSize, instanceSbtRecordCount, firstCallableSbtRecordVA, callableSbtRecordSize, callableSbtRecordCount, toolsOutputVA, toolsOutputSize, scratchBufferSizeInBytes, width, height, depth ) returned (9): Launch failure)
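The following C++ sketch is a quick sanity check on the sizes involved. All names are hypothetical. It computes the total number of launch indices and compares that count against two limits. The first is the signed 32-bit range, which matters if the ray gen program flattens the index with something like `int idx = y * width + x`. The second is 2^30. That 2^30 cap is an assumption borrowed from the OptiX 7 documentation for `optixLaunch`, and it is not confirmed here that OptiX 6 in RTX mode enforces the same limit. The sketch also estimates the size of an output buffer that holds one float per launch index, which may not match the actual buffers in use.

```cpp
#include <cstdint>
#include <limits>

// Total number of launch indices, computed in 64 bits so it cannot overflow.
constexpr std::uint64_t launchElements(std::uint64_t w, std::uint64_t h,
                                       std::uint64_t d = 1) {
    return w * h * d;
}

// True if the count no longer fits in a signed 32-bit int, i.e. an
// `int idx = y * width + x` style index in device code would overflow.
// (An unsigned 32-bit index still fits up to 4294967295.)
constexpr bool overflowsInt32(std::uint64_t n) {
    return n > static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// ASSUMPTION: 2^30 is the per-launch cap documented for OptiX 7's optixLaunch;
// whether OptiX 6 (RTX mode) enforces the same cap is unverified.
constexpr std::uint64_t kAssumedLaunchCap = std::uint64_t{1} << 30;

// Bytes for a hypothetical output buffer holding one float per launch index.
constexpr std::uint64_t floatBufferBytes(std::uint64_t n) {
    return n * sizeof(float);
}

static_assert(launchElements(50000, 100) == 5000000ULL, "small launch");
static_assert(launchElements(50000, 50000) == 2500000000ULL, "large launch");
```

With these numbers the small launch is 5,000,000 indices (about 20 MB for one float each). The large one is 2,500,000,000 indices, which is past `INT32_MAX` and past the assumed 2^30 cap, and it would need about 10 GB for a single float buffer.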

I tried both 2D and 3D launches (with depth set to 1). I double-checked the stack size and increased the maxDepth (both trace and program).
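Since the smaller launches work, one workaround to try is splitting the height across several launches, each with a row offset. Below is a minimal C++ sketch of the host-side loop. It assumes a hypothetical `launch_offset_y` variable that the ray gen program would add to `launch_index.y`. The `launch` callback stands in for the real host code (for example, setting that variable and then calling `rtContextLaunch2D` with the chunk's row count), so this is not actual OptiX API usage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Splits a launch of `height` rows into launches of at most maxRowsPerLaunch
// rows. launch(yOffset, rows) is called once per chunk; in real code it would
// set the hypothetical `launch_offset_y` variable and launch width x rows.
// Inside ray gen, the global row is then launch_index.y + launch_offset_y.
void launchInChunks(std::size_t height, std::size_t maxRowsPerLaunch,
                    const std::function<void(std::size_t, std::size_t)>& launch) {
    assert(maxRowsPerLaunch > 0);
    for (std::size_t y = 0; y < height; y += maxRowsPerLaunch) {
        const std::size_t rows = std::min(maxRowsPerLaunch, height - y);
        launch(y, rows);  // last chunk may be shorter than maxRowsPerLaunch
    }
}
```

For the failing case, `launchInChunks(50000, 100, ...)` would issue 500 launches of 50000 × 100 each, which is the size that already works. The cost is per-launch overhead, so a larger chunk height may be worth testing.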

This is OptiX 6 running RTX, on Windows 10 (up to date) with CUDA 10, iirc.

Any ideas on where it would be best to start poking?