RG program won't compile if I allocate a large local array

I wanted to create a per-ray local float array of 10K elements. When running the code, I got:

Caught exception: OPTIX_ERROR_INTERNAL_COMPILER_ERROR: Optix call 'optixModuleCreateFromPTX( state.context, &module_compile_options, &state.pipeline_compile_options, ptx.c_str(), ptx.size(), log, &sizeof_log, &state.camera_module )' failed:

If I decrease the local array size, the program runs, so I guess there is a limit on the local memory size? I suspect it has to do with the stack size that can be set in the host code. I am not using continuation callables (CCs) or direct callables (DCs).

I set maxTraceDepth in optixUtilComputeStackSizes to 2, because I never call optixTrace recursively; the only calls are in the RG program.

Hmm, I think I am just being stupid here. I am launching ~120K rays, so the total amount of memory required could be more than the VRAM size.
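
A rough back-of-the-envelope check of that estimate, as a minimal sketch. It assumes every launched ray holds its own array at the same time, which is the worst case I have in mind above and not a confirmed description of how the driver actually sizes local memory. The helper names total_local_bytes and bytes_to_gib are made up for illustration; they are not part of OptiX or CUDA.

```cpp
#include <cstdint>

// Total bytes if every launched ray held its own local array simultaneously.
// Worst-case assumption from the post, not a statement about what the driver
// really allocates.
constexpr std::uint64_t total_local_bytes(std::uint64_t num_rays,
                                          std::uint64_t elems_per_ray,
                                          std::uint64_t bytes_per_elem)
{
    return num_rays * elems_per_ray * bytes_per_elem;
}

// Convert a byte count to GiB (2^30 bytes) for comparison against VRAM size.
constexpr double bytes_to_gib(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

With the numbers from this post, total_local_bytes(120000, 10000, sizeof(float)) comes to 4,800,000,000 bytes, which bytes_to_gib puts at roughly 4.47 GiB. Each ray alone would need 40,000 bytes for its array.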