But I want to bind the buffer to device global memory instead, so that during iterative kernel launches I can avoid the cost of cudaMemcpyDeviceToHost followed by binding a host pointer to the buffer as above.
I tried
void* position_buffer_data;
...
RT_CHECK_ERROR2( rtBufferMap( input_energy_buffer_obj, &position_buffer_data));
(float*)position_buffer_data = testm;
/* testm is a pointer to global memory that has been allocated with cudaMalloc(). */
RT_CHECK_ERROR2( rtBufferUnmap( input_energy_buffer_obj));
but got the error “lvalue required as left operand of assignment”.
Yes, you can do this. Look in the OptiX Programming Guide, Chapter 7, Interoperability with CUDA. Search for the functions rtBufferGetDevicePointer and rtBufferCreateForCUDA.
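Pieced together from the calls appearing in this thread, the zero-copy path might look roughly like the sketch below. This assumes the OptiX 3.x-era C API, where rtBufferSetDevicePointer takes a CUdeviceptr (an integer type, hence the cast); `context` and `N` are assumed to exist, and this is not runnable outside an OptiX program:

```c
// Sketch only: assumes an existing OptiX context and the OptiX 3.x C API.
float* testm;
cudaMalloc( (void**)&testm, N * sizeof(float) );   // CUDA runtime API

RTbuffer input_test_buffer_obj;
RT_CHECK_ERROR2( rtBufferCreate( context, RT_BUFFER_INPUT, &input_test_buffer_obj ) );
RT_CHECK_ERROR2( rtBufferSetFormat( input_test_buffer_obj, RT_FORMAT_FLOAT ) );
RT_CHECK_ERROR2( rtBufferSetSize1D( input_test_buffer_obj, N ) );

// Hand the device allocation to OptiX directly -- no rtBufferMap and
// no cudaMemcpyDeviceToHost between kernel launches.
RT_CHECK_ERROR2( rtBufferSetDevicePointer( input_test_buffer_obj, 0,
                                           (CUdeviceptr)(uintptr_t)testm ) );
```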
What is “DevicePtr”? Isn’t it a pointer to data on the device?
But I get compiler errors for
RT_CHECK_ERROR2( rtBufferSetDevicePointer( input_test_buffer_obj, 0, testm));
// testm is a pointer to a float array allocated with cudaMalloc()
invalid conversion from ‘float*’ to ‘CUdeviceptr {aka long long unsigned int}’ [-fpermissive]
error: initializing argument 3 of ‘RTresult rtBufferSetDevicePointer(RTbuffer, unsigned int, CUdeviceptr)’ [-fpermissive]
Edit: I just found that CUdeviceptr is a CUDA driver API type. How can I use it with the runtime API, which is what I normally use with CUDA?