Return the coordinate of the hit point

I want to modify the optixTriangle example to output the coordinates of each point where a ray hits the triangle.
In the hit program I will add the following to get the hit point:
const float3 ray_orig = optixGetWorldRayOrigin();
const float3 ray_dir = optixGetWorldRayDirection(); // incident direction
const float ray_t = optixGetRayTmax();
float3 hit_point = ray_orig + ray_t * ray_dir;

My question is how to pass it back to the host. Should I save it in the payload? And where can I then read and print the coordinates? Since the optixTriangle example is set up for rendering, I am not sure how to pass raw data (the hit point) back.

All OptiX communication back to the host works via device memory buffers that you allocate up front. The OptiX device programs write the desired results into such a buffer, which normally happens at the end of the ray generation program, and you then copy the buffer from device to host.
It’s completely under your control what you write to which device buffer and how you interpret that.

So yes, in the given case the usual mechanism is to write the float3 hit point coordinates to the per-ray payload. For that you can use three more of the 8 available payload registers.

Inside the optixTriangle example, that works the same way as the setPayload function for the returned colors.
You would just need to enhance that to write two float3.
(BTW, the correct reinterpret casts inside the optixTriangle.cu source should be float_as_uint() and uint_as_float() because the payload registers are unsigned int.)
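As a rough illustration of the payload packing described above, here is a host-side C++ sketch that compiles without the OptiX SDK. The names `Float3`, `Payload`, `floatAsUint`, `uintAsFloat`, `packPayload` and `unpackPayload` are hypothetical stand-ins and are not taken from optixTriangle.cu. In real device code the bit casts would be the CUDA intrinsics `__float_as_uint()` and `__uint_as_float()`, and the registers would be written and read through the OptiX payload functions rather than through an array. The register layout (color in 0..2, hit point in 3..5) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host stand-in for CUDA's float3.
struct Float3 { float x, y, z; };

// Host analogue of __float_as_uint(): copies the bit pattern, does not convert the value.
inline std::uint32_t floatAsUint(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

// Host analogue of __uint_as_float(): the inverse bit copy.
inline float uintAsFloat(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Emulates six unsigned int payload registers.
// Assumed layout: registers 0..2 carry the color, 3..5 carry the hit point.
struct Payload { std::uint32_t reg[6]; };

// What the hit program would do: store two float3 values as raw bits.
inline Payload packPayload(const Float3& color, const Float3& hit)
{
    Payload p;
    p.reg[0] = floatAsUint(color.x);
    p.reg[1] = floatAsUint(color.y);
    p.reg[2] = floatAsUint(color.z);
    p.reg[3] = floatAsUint(hit.x);
    p.reg[4] = floatAsUint(hit.y);
    p.reg[5] = floatAsUint(hit.z);
    return p;
}

// What the ray generation program would do after the trace call returns.
inline void unpackPayload(const Payload& p, Float3& color, Float3& hit)
{
    color = Float3{ uintAsFloat(p.reg[0]), uintAsFloat(p.reg[1]), uintAsFloat(p.reg[2]) };
    hit   = Float3{ uintAsFloat(p.reg[3]), uintAsFloat(p.reg[4]), uintAsFloat(p.reg[5]) };
}
```

Because only bits are copied, the values round-trip exactly, including negative coordinates, which a value conversion to unsigned int would not preserve.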

Note that you need to change the host code as well, to indicate inside the OptixPipeline that you’re using 6 instead of 3 payload registers.
That is, change the code line that sets this inside the OptixPipelineCompileOptions, pipeline_compile_options.numPayloadValues = 3; so that it assigns 6 instead.

The closest hit is the end of the current ray. After that the program will return to the caller, which is normally the ray generation program when there are no other recursions.
At the end of the ray generation program you write the results from your per-ray payload into your output buffer at the proper location, which is the linear index computed from your launch indices. The output buffer has one element per launch index, so it is sized by the launch dimensions.

Inside the ray generation program the color is converted to uchar4 and written to the image buffer whose device pointer is stored inside the launch parameters.
To return the hit point you calculated as well, you would need to allocate another device buffer besides that image, add a pointer for it to the launch parameters, and set that pointer to your new buffer.
I’d recommend using a float4 format because that writes faster than float3. Put the hit point coordinates into the .xyz components and set the .w component to 1.0f.
That means you index the same way, with the linear index from the 2D launch indices, but just write the float data directly.
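The indexing described above can be sketched on the host as follows. This is only an illustration under stated assumptions: `Float4`, `linearIndex` and `writeHitPoint` are hypothetical names, a `std::vector` stands in for the device buffer, and a row-major layout (index = y * width + x) is assumed. In the real program the buffer would be device memory whose pointer sits in the launch parameters, and the host would copy it back after the launch, for example with `cudaMemcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host stand-in for CUDA's float4.
struct Float4 { float x, y, z, w; };

// Row-major linear index of a 2D launch index (assumed layout).
inline std::size_t linearIndex(unsigned ix, unsigned iy, unsigned width)
{
    return static_cast<std::size_t>(iy) * width + ix;
}

// Stores one hit point in the element that belongs to launch index (ix, iy).
// The coordinates go into .xyz and .w is set to 1.0f, as suggested above.
inline void writeHitPoint(std::vector<Float4>& buffer,
                          unsigned width, unsigned height,
                          unsigned ix, unsigned iy,
                          float hx, float hy, float hz)
{
    assert(ix < width && iy < height);
    assert(buffer.size() == static_cast<std::size_t>(width) * height);
    buffer[linearIndex(ix, iy, width)] = Float4{ hx, hy, hz, 1.0f };
}
```

On the host, the same `linearIndex` computation then locates the hit point for a given pixel when printing the copied-back data.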

The whole optixTriangle program is explained step-by-step in this blog post: https://developer.nvidia.com/blog/how-to-get-started-with-optix-7/

If you need more than the eight available payload registers in the future, search the OptiX SDK code for the packPointer and unpackPointer functions, or my OptiX 7 examples for the splitPointer and mergePointer functions:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/shaders/per_ray_data.h#L108
Bigger per-ray payloads are allocated as a structure inside the ray generation program, and the local device pointer to that structure is passed around in two of the payload registers that way.
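The idea of carrying a 64-bit pointer in two 32-bit payload registers can be sketched on the host like this. The functions `splitPointerSketch` and `mergePointerSketch` and the `PerRayData` struct are hypothetical illustrations of the technique, not copies of the SDK or OptiX_Apps code. Which register holds the upper half is an arbitrary choice here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-ray data, larger than what fits comfortably into a few registers.
struct PerRayData
{
    float color[3];
    float hitPoint[3];
    float distance;
};

// Splits a pointer into two 32-bit halves, one per payload register.
inline void splitPointerSketch(void* ptr, std::uint32_t& hi, std::uint32_t& lo)
{
    static_assert(sizeof(void*) <= 8, "pointer must fit into 64 bits");
    const std::uint64_t bits = reinterpret_cast<std::uintptr_t>(ptr);
    hi = static_cast<std::uint32_t>(bits >> 32);          // upper 32 bits
    lo = static_cast<std::uint32_t>(bits & 0xFFFFFFFFull); // lower 32 bits
}

// Rebuilds the pointer from the two halves, as a hit or miss program would.
inline void* mergePointerSketch(std::uint32_t hi, std::uint32_t lo)
{
    const std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}
```

With this pattern the hit program writes its results through the merged pointer into the structure owned by the ray generation program, so the number of result values is no longer limited by the register count.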