You need to set up your launch parameters before the launch. In particular, the output buffer device pointer must be set in them before the launch, or you will get illegal access errors.
I suppose that uchar4 is the fastest way to do this cudaMemcpy,
memcpy doesn’t care what type your buffer is. It takes the size in bytes.
how do I declare the local float buffer before I cudaMemcpy to it?
I repeat: The optixHello example is using the CUDAOutputBuffer helper class, and that copy from device to host happens inside its getHostPointer() function, which handles the necessary steps depending on how that buffer had been allocated.
If you look at the CUDAOutputBuffer getHostPointer() function code, you’ll see how the class is maintaining a host side allocation in its m_host_pixels member variable already (see std::vector<PIXEL_FORMAT> m_host_pixels;). No need to allocate your own host side data.
And again, the optixConsole application shows how to do that without using the CUDAOutputBuffer helper class. It’s simply using a std::vector (named output_buffer) for that.
Search for this code inside the optixConsole application:
unsigned int width = state.params.width;
unsigned int height = state.params.height;
std::vector<uchar4> output_buffer( width * height );
CUDA_CHECK( cudaMemcpy( output_buffer.data(), state.params.frame_buffer, width * height * sizeof( uchar4 ), cudaMemcpyDeviceToHost ) );
The host buffer has the same size as the device buffer, which is allocated like this:
state.params.width = 48u * 2u;
state.params.height = 32u * 2u;
...
CUDA_CHECK( cudaMalloc( reinterpret_cast<void**>( &state.params.frame_buffer ),
state.params.width * state.params.height * sizeof( uchar4 ) ) );
You would just need to change the type to float4 in your case.
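To illustrate how the element type only affects the byte count, here is a minimal host-only sketch. It is not code from the OptiX SDK: Float4 is a hypothetical stand-in for CUDA's float4 so the snippet compiles without the CUDA toolkit, std::memcpy stands in for cudaMemcpy, and the helper names bufferSizeInBytes and copyToHost are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CUDA's float4 (four packed floats).
struct Float4
{
    float x, y, z, w;
};
static_assert( sizeof( Float4 ) == 4 * sizeof( float ), "Float4 must be tightly packed" );

// Size in bytes of a width x height buffer with element type T.
// This is the value the copy call needs, regardless of T.
template <typename T>
std::size_t bufferSizeInBytes( unsigned int width, unsigned int height )
{
    return static_cast<std::size_t>( width ) * height * sizeof( T );
}

// Allocates a host vector with one element per pixel and copies the source
// buffer into it. std::memcpy is used here in place of
// cudaMemcpy( dst, src, bytes, cudaMemcpyDeviceToHost ).
template <typename T>
std::vector<T> copyToHost( const T* src, unsigned int width, unsigned int height )
{
    assert( src != nullptr );
    std::vector<T> host( static_cast<std::size_t>( width ) * height );
    std::memcpy( host.data(), src, bufferSizeInBytes<T>( width, height ) );
    return host;
}
```

In a real application you would use float4 instead of Float4, pass the device pointer from your launch parameters as the source, and replace the std::memcpy call with CUDA_CHECK( cudaMemcpy( host.data(), src, bufferSizeInBytes<T>( width, height ), cudaMemcpyDeviceToHost ) ). The std::vector sizing stays the same: one element per pixel.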
how can I get the actual data from the device in order to apply some math calculations.
Once you get everything working, depending on what calculations you need to do, it might be a lot faster to do these calculations with native CUDA kernels on the device data.
An example which uses CUDA kernels to generate rays and calculate shading on the intersection data can be found inside the optixRaycasting example which implements a wavefront rendering approach.