I want to use malloc() to allocate memory space in the ray generation program, but it failed because the space requested was too large. What is the size limit for allocating memory space in the program? The size of memory space I applied for is 160MB.
That is the wrong approach. You should not use dynamic memory allocation inside OptiX device code. That wasn't even possible in older OptiX versions, and even where it works it is potentially very slow, so it's absolutely not recommended for performance reasons. There is a reason none of the OptiX SDK or other example code uses malloc() inside OptiX device programs.
Also keep in mind that you have to think in terms of parallel programming inside OptiX device code!
The optixLaunch dimension is effectively the number of threads running in parallel. If you allocate 160 MB per thread while launching millions of threads, that is definitely going to run out of VRAM on any board.
OptiX uses a single-ray programming model. Each thread has a launch index, which you get with the OptiX device function optixGetLaunchIndex; its values are bounded by the optixLaunch dimension, which you can query with optixGetLaunchDimensions.
See this chapter inside the OptiX Programming Guide:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#device-side-functions
What you need to do instead is determine how much memory you need per launch index, allocate that once upfront in the CUDA host code before the optixLaunch call, and provide the pointer to that device memory inside your OptiX launch parameter block.
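As a hedged sketch of that host-side setup (the struct layout, variable names, and buffer element type are placeholders, not taken from your code), allocating one output element per launch index upfront could look like this:

```cuda
// Hypothetical launch parameter block shared between host and device code.
struct LaunchParams
{
    float4*      outputBuffer;  // device pointer, one element per launch index
    unsigned int width;
    unsigned int height;
};

// Host-side setup before optixLaunch:
LaunchParams params = {};
params.width  = 1920;
params.height = 1080;

// Allocate the full output buffer once, upfront, instead of per thread.
size_t bufferSize = sizeof(float4) * params.width * params.height;
CUdeviceptr d_outputBuffer = 0;
cuMemAlloc(&d_outputBuffer, bufferSize);
params.outputBuffer = reinterpret_cast<float4*>(d_outputBuffer);

// Copy the launch parameter block to the device and launch.
CUdeviceptr d_params = 0;
cuMemAlloc(&d_params, sizeof(LaunchParams));
cuMemcpyHtoD(d_params, &params, sizeof(LaunchParams));

optixLaunch(pipeline, stream, d_params, sizeof(LaunchParams),
            &sbt, params.width, params.height, /*depth=*/1);
```

Note that this allocates sizeof(float4) per launch index in total, which for a 1920x1080 launch is about 32 MB for the whole launch, not per thread.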
See this thread which explains how to do that:
https://forums.developer.nvidia.com/t/going-through-optix7course-and-am-confused-about-launchparams-and-how-to-get-depth-buffer/201439/2
That explains how to access such a buffer per thread, i.e. how to write data for each launch index (in a gather algorithm) by addressing the elements inside your buffer with a linear index calculated from the optixGetLaunchIndex values. Look at how the OptiX examples write into their output buffers, usually at the end of the ray generation program.
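A minimal sketch of that per-thread write inside the ray generation program, assuming a hypothetical launch parameter block like the one described above (the struct and program names are placeholders; the __constant__ variable name must match the pipelineLaunchParamsVariableName set in your pipeline compile options):

```cuda
#include <optix.h>

// Hypothetical launch parameter block; must match the host-side layout.
struct LaunchParams
{
    float4*      outputBuffer;
    unsigned int width;
    unsigned int height;
};

extern "C" __constant__ LaunchParams params;

extern "C" __global__ void __raygen__renderFrame()
{
    const uint3 launchIndex = optixGetLaunchIndex();
    const uint3 launchDim   = optixGetLaunchDimensions();

    // ... trace rays and compute the result for this launch index ...
    float4 result = make_float4(0.0f, 0.0f, 0.0f, 1.0f);

    // Linear index into the output buffer. Each thread writes exactly
    // one element, so no atomics are needed for this gather-style write.
    const unsigned int linearIndex = launchIndex.y * launchDim.x + launchIndex.x;
    params.outputBuffer[linearIndex] = result;
}
```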
If you need to implement a scatter algorithm, where different threads (OptiX launch indices) can change data in the same output buffer cells, you would need to use CUDA atomics to write to these elements, because OptiX uses a single-ray programming model and nothing is known about neighbouring threads inside the current optixLaunch.
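As a hedged sketch of such a scatter write (the buffer and function names are placeholders; assume a float accumulation buffer reachable through your launch parameters):

```cuda
// Scatter-style write: several launch indices may target the same output
// cell, e.g. when splatting contributions into a histogram bin or an
// accumulation image, so the read-modify-write must be atomic.
__device__ void accumulate(float* accumBuffer, unsigned int cell, float contribution)
{
    // atomicAdd on float is supported on all GPU architectures OptiX 7 runs on.
    atomicAdd(&accumBuffer[cell], contribution);
}
```

Be aware that heavily contended atomics serialize those writes, so prefer a gather formulation whenever your algorithm allows it.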