Please read these posts:
https://forums.developer.nvidia.com/t/going-through-optix7course-and-am-confused-about-launchparams-and-how-to-get-depth-buffer/201439/2
https://forums.developer.nvidia.com/t/optix-launch-parameters-best-practices/231443/2
If you search the OptiX SDK *.cpp source files for cudaMemcpyDeviceToHost, you’ll find such cases.
The optixConsole application is one of the simplest examples. It does not use any interop or graphics display, and it makes this call to copy the output buffer from device to host:
CUDA_CHECK( cudaMemcpy( output_buffer.data(), state.params.frame_buffer, width * height * sizeof( uchar4 ), cudaMemcpyDeviceToHost ) );
The optixHello example uses the CUDAOutputBuffer helper class. There, the copy from device to host happens inside its getHostPointer() function, which handles the necessary steps depending on how that buffer was allocated.
Mind the actual data type: in these examples it’s uchar4, not floating point.
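To illustrate what that means on the host side, here is a minimal sketch of consuming such a buffer after the copy. It is only an illustration, not SDK code: Pixel is a hypothetical stand-in with the same four-byte layout as uchar4 so the snippet compiles without the CUDA headers, and toPPM is a made-up helper name. It assumes the four components are R, G, B, A and writes rows in buffer order, so you may need to flip vertically depending on how your ray generation program indexes the image.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 (four 8-bit channels per pixel).
struct Pixel { unsigned char x, y, z, w; };

// Serializes an RGBA8 host buffer into a binary PPM (P6) image, dropping alpha.
// Assumption: x, y, z hold red, green, blue and w holds alpha.
std::string toPPM(const std::vector<Pixel>& pixels, int width, int height)
{
    assert(pixels.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::string out = "P6\n" + std::to_string(width) + " " + std::to_string(height) + "\n255\n";
    out.reserve(out.size() + pixels.size() * 3);

    for (const Pixel& p : pixels) {
        out.push_back(static_cast<char>(p.x)); // red
        out.push_back(static_cast<char>(p.y)); // green
        out.push_back(static_cast<char>(p.z)); // blue
    }
    return out;
}
```

The returned string can be written to a file opened in binary mode. If your launch parameters use a float4 frame buffer instead, the element type and the cudaMemcpy size change accordingly, and you would need to convert to 8-bit values before writing.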
For performance reasons, it’s not recommended to use 3-component vector data types like float3 for output buffers.
float4 is faster to read and write because there are vectorized instructions (such as .v4) for 4-component and 2-component vectors. The same applies to other basic types.
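The layout difference behind that can be sketched in plain C++. Float3 and Float4 below are hypothetical stand-ins that mirror the size and alignment of CUDA's float3 (12 bytes, 4-byte aligned) and float4 (16 bytes, 16-byte aligned), and pad is a made-up helper name. Because a 12-byte element is only 4-byte aligned, it can't be moved as one 16-byte vector access, whereas a 16-byte aligned element can.

```cpp
#include <cstddef>

// Hypothetical stand-ins mirroring the layout of CUDA's float3 and float4.
struct Float3 { float x, y, z; };                // 12 bytes, 4-byte aligned
struct alignas(16) Float4 { float x, y, z, w; }; // 16 bytes, 16-byte aligned

static_assert(sizeof(Float3) == 12 && alignof(Float3) == 4,
              "3-component element is not 16-byte aligned");
static_assert(sizeof(Float4) == 16 && alignof(Float4) == 16,
              "4-component element fills one 16-byte aligned slot");

// Pads a 3-component value to 4 components so it can be stored in a
// 4-component output buffer. The fourth component defaults to 1.0f
// (for example, an opaque alpha).
inline Float4 pad(const Float3& c, float w = 1.0f)
{
    return Float4{ c.x, c.y, c.z, w };
}
```

In device code the equivalent would be to declare the output buffer as float4* and write make_float4(c.x, c.y, c.z, 1.0f) instead of storing a float3.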