Welcome to the OptiX forum.
First, you cannot call any OptiX API device function inside a native CUDA kernel. These functions do not exist outside OptiX but get inserted by the compiler chain when building the OptiX kernels inside a pipeline.
There is also no way to access the data inside acceleration structures inside a native CUDA kernel. The acceleration structures and access to them are abstracted on purpose in all three ray tracing APIs (OptiX, DXR, Vulkan RT) because they differ among GPU generations and can change any time the internal implementation is improved or adds new features.
Now, if you need to shoot rays starting on geometric primitives, that can easily be done inside the ray generation program.
As said inside the other post, to be able to access your geometric primitives, you’d need to store their data in buffers on the device (which you already had to do for at least the vertex positions for the optixAccelBuild call) in a way that lets you access their vertex attributes (at least the positions) and topology (e.g. triangle indices), so that you can sample positions on any of the geometric primitives (e.g. triangles).
If you have more than one GAS and also use instance transforms to place them into the world, you would also need to have access to those transform matrices to be able to calculate the world position of your ray origin on a geometric primitive.
There are some ways to do that, but the straightforward approach would be to generate an array of structures (“MeshData”) which contains all the necessary device pointers to the vertex positions, indices, matrices, and maybe a cumulative distribution function (CDF) over the area of the primitives’ surfaces to be able to sample the whole mesh uniformly.
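A minimal sketch of what such a structure could look like (the name “MeshData” and all of its members are just placeholders for illustration, not taken from the linked examples):

```cpp
// Hypothetical per-mesh data needed to sample surface points in device code.
// All pointers are device pointers which the host fills before optixLaunch.
struct MeshData
{
  const float3* positions;        // vertex positions (object space)
  const uint3*  indices;          // triangle vertex indices
  const float*  cdfArea;          // CDF over triangle areas, normalized so the last entry is 1.0f
  unsigned int  numTriangles;     // number of entries in indices and cdfArea
  float4        objectToWorld[3]; // 3x4 row-major instance transform; identity when there is no instancing
};
```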
This is basically the same thing you would do for arbitrary triangle mesh lights, and since I implemented that inside my OptiX examples, you could take a look at the LightDefinition structure,
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/shaders/light_definition.h
the code which fills it,
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/src/Application.cpp#L2211
and the function __direct_callable__light_mesh, which samples the triangle mesh surface uniformly.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/shaders/light_sample.cu#L209
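For reference, a rough sketch of how such uniform surface sampling with an area CDF could look (this is not the code from the repository; it assumes the hypothetical MeshData layout sketched above and the usual float3 operator helpers, e.g. from the OptiX SDK’s vec_math.h):

```cpp
// Pick a triangle proportional to its area via the CDF, then sample a uniformly
// distributed barycentric point on it. xi holds three uniform random numbers in [0, 1).
__forceinline__ __device__ void sampleMeshUniform(const MeshData& mesh, const float3 xi,
                                                  float3& position, float3& normal)
{
  // Binary search for the first CDF entry >= xi.x to select the triangle.
  unsigned int lo = 0;
  unsigned int hi = mesh.numTriangles - 1;
  while (lo < hi)
  {
    const unsigned int mid = (lo + hi) / 2;
    if (mesh.cdfArea[mid] < xi.x) lo = mid + 1; else hi = mid;
  }

  const uint3  tri = mesh.indices[lo];
  const float3 v0  = mesh.positions[tri.x];
  const float3 v1  = mesh.positions[tri.y];
  const float3 v2  = mesh.positions[tri.z];

  // Uniform barycentric sampling.
  const float su = sqrtf(xi.y);
  const float b1 = (1.0f - xi.z) * su;
  const float b2 = xi.z * su;

  position = (1.0f - su) * v0 + b1 * v1 + b2 * v2; // object space; transform with objectToWorld when instanced
  normal   = normalize(cross(v1 - v0, v2 - v0));   // geometric normal; transform appropriately when instanced
}
```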
You would need the sample position and the geometric normal to define the front-face hemisphere into which you probably want to shoot your rays.
Depending on what distribution your azimuth and elevation define, it would be your responsibility to calculate the ray directions from that.
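One possible way to do that, assuming azimuth is measured around the geometric normal and elevation is measured from the surface plane towards the normal (adjust to whatever convention your inputs actually use; same float3 helper assumption as above):

```cpp
// Build an orthonormal basis around the geometric normal and turn (azimuth, elevation)
// angles over the front-face hemisphere into a direction in the same space as the normal.
__forceinline__ __device__ float3 directionFromAngles(const float3 normal, const float azimuth, const float elevation)
{
  // Pick a helper axis which is not (nearly) parallel to the normal.
  const float3 helper = (fabsf(normal.x) < 0.99f) ? make_float3(1.0f, 0.0f, 0.0f)
                                                  : make_float3(0.0f, 1.0f, 0.0f);
  const float3 tangent   = normalize(cross(helper, normal));
  const float3 bitangent = cross(normal, tangent);

  const float cosE = cosf(elevation);
  return normalize(tangent   * (cosE * cosf(azimuth)) +
                   bitangent * (cosE * sinf(azimuth)) +
                   normal    * sinf(elevation));
}
```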
Note that these triangle mesh lights are also part of the scene inside their respective GAS and get evaluated when radiance rays hit them implicitly.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/shaders/edf_diffuse.cu
(You don’t need that emissive part.)
This array of my LightDefinition structures can be accessed in any OptiX program domain, because that array and its size are stored inside the OptiX launch data:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/shaders/system_data.h#L84
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/shaders/system_data.h#L105
If you set up your “MeshData” structures similarly and put an array of them into your launch data, you would be able to calculate points on the surface of the mesh inside your ray generation program. Make sure to offset the ray origin a little from the surface to avoid self-intersections.
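Put together, a ray generation program could look roughly like this. Again, this is only a sketch under the assumptions above; the LaunchParams layout, the placeholder random numbers and angles, and the payload handling are all things you would need to adapt to your application:

```cpp
#include <optix.h>

// Hypothetical launch parameter struct, analogous to the SystemData of the examples.
struct LaunchParams
{
  OptixTraversableHandle handle;    // top-level traversable
  MeshData*              meshes;    // device array of per-mesh sampling data
  unsigned int           numMeshes;
  unsigned int           iteration; // current sub-frame when running progressively
};

extern "C" __constant__ LaunchParams params;

extern "C" __global__ void __raygen__sample_mesh()
{
  const uint3 idx = optixGetLaunchIndex(); // would normally seed the RNG and select the sample point

  // Placeholders: use a proper RNG and your actual azimuth/elevation inputs here.
  const float3 xi        = make_float3(0.5f, 0.5f, 0.5f);
  const float  azimuth   = 0.0f;
  const float  elevation = 1.5f;

  float3 position, normal;
  sampleMeshUniform(params.meshes[idx.x % params.numMeshes], xi, position, normal);

  const float3 direction = directionFromAngles(normal, azimuth, elevation);

  // Offset the ray origin a little along the geometric normal to avoid self-intersections.
  const float3 origin = position + normal * 1.0e-4f;

  unsigned int p0 = 0; // payload registers; whatever your closest-hit and miss programs expect
  unsigned int p1 = 0;
  optixTrace(params.handle, origin, direction,
             0.0f, 1.0e16f, 0.0f,                     // tmin, tmax, ray time
             OptixVisibilityMask(0xFF), OPTIX_RAY_FLAG_NONE,
             0, 1, 0,                                 // SBT offset, SBT stride, miss SBT index
             p0, p1);
  // ... write results derived from the payload into an output buffer indexed by idx ...
}
```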
That would be the exact same data you would need inside a native CUDA kernel when you wanted to pre-calculate your primary rays into a device buffer which an OptiX ray generation program would then read, so doing that with a native CUDA kernel isn’t really necessary. It just adds more work and memory bandwidth.
If your scene consists of only a single mesh and no instances, this would be even simpler, because then all you’d need would be the device pointers to the vertex attributes and the number of primitives inside your OptiX launch parameter structure (and a CDF in case you need to sample the whole mesh surface uniformly).
Depending on what the whole algorithm does, meaning what results it produces and how many rays you need to shoot from each surface sample point, it could be beneficial to implement it as a progressive algorithm if possible, where each optixLaunch calculates only a part of the solution, for example by shooting one ray per surface sample point (ambient occlusion algorithms often do that).
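On the host side, a progressive version could be as simple as a loop which updates an iteration counter inside the launch parameters and launches one sub-frame at a time. Sketch only: pipeline, stream, sbt, d_params, numIterations, numSamplePoints and the CUDA_CHECK/OPTIX_CHECK error-checking macros are assumed to be set up as in the SDK and example applications:

```cpp
// One optixLaunch per iteration; the device code uses params.iteration to seed its RNG
// and to accumulate the per-iteration results into an output buffer.
for (unsigned int iteration = 0; iteration < numIterations; ++iteration)
{
  params.iteration = iteration;
  CUDA_CHECK(cudaMemcpyAsync(reinterpret_cast<void*>(d_params), &params, sizeof(LaunchParams),
                             cudaMemcpyHostToDevice, stream));
  OPTIX_CHECK(optixLaunch(pipeline, stream, d_params, sizeof(LaunchParams), &sbt,
                          numSamplePoints, 1, 1)); // one launch index per surface sample point
}
CUDA_CHECK(cudaStreamSynchronize(stream));
```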