Hi,
I am wondering what the proper way to initialize device buffers is.
I have an output buffer in which I accumulate values from the hits.
I create the result buffer as:
optix::Buffer b = sceneManager->sceneContext->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_USER, receivers, transmitters);
b->setElementSize(sizeof(ReceptionInfo));
sceneManager->sceneContext["receptionInfoBuffer"]->set(b);
In ReceptionInfo I have a float2 and a couple of ints. I need to initialize all of them to 0.
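For the result buffer, the simplest thing I can think of is zeroing it from the host before each launch. A minimal sketch follows, under some assumptions: the field names in ReceptionInfo are made up here (the real struct just has a float2 and a couple of ints), Float2 is a stand-in for CUDA's float2 so that this compiles as plain host C++, and zeroReceptionInfo is a hypothetical helper. In the real code the pointer would come from b->map() and be released with b->unmap(), and I assume the buffer would have to be RT_BUFFER_INPUT_OUTPUT rather than RT_BUFFER_OUTPUT for host writes to reach the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Stand-in for CUDA's float2 so the sketch compiles as plain host C++.
struct Float2 { float x, y; };

// Hypothetical layout: only "a float2 and a couple of ints" is known.
struct ReceptionInfo {
    Float2 sum;   // accumulated complex value (assumed name)
    int hits;     // assumed name
    int flags;    // assumed name
};

// memset is only safe for trivially copyable types. All-zero bytes
// represent 0.0f and 0 on IEEE-754 platforms.
static_assert(std::is_trivially_copyable<ReceptionInfo>::value,
              "ReceptionInfo must be trivially copyable to be cleared with memset");

// Zeroes `count` elements starting at `mapped`.
// In the OptiX host code, `mapped` would be the pointer returned by b->map().
inline void zeroReceptionInfo(void* mapped, std::size_t count) {
    assert(mapped != nullptr);
    std::memset(mapped, 0, count * sizeof(ReceptionInfo));
}
```

It would be called roughly as zeroReceptionInfo(b->map(), receivers * transmitters) followed by b->unmap(), which I suppose implies a host-to-device copy before every launch.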
I also have a couple of auxiliary buffers that hold temporary state and are updated in the closest hit programs. I create them as:
optix::Buffer duplicatesBuffer = sceneManager->sceneContext->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_BUFFER_ID, receivers);
int* buffers = static_cast<int*>(duplicatesBuffer->map());
for (unsigned int i = 0; i < receivers; i++)
{
    optix::Buffer aux = sceneManager->sceneContext->createBuffer(
        RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, RT_FORMAT_UNSIGNED_INT,
        sceneManager->elevationSteps / duplicateBlockSize,
        sceneManager->azimutSteps / duplicateBlockSize,
        transmitters);
    buffers[i] = aux->getId();
}
duplicatesBuffer->unmap();
I launch rays multiple times: I launch, collect the results, and then launch again. The contents of these buffers have to be reset before each new launch; otherwise they still hold the values from the previous one.
So, how should they be initialized?
I have thought of several possibilities, but I am not sure about any of them:
- Initializing them in parallel in the ray generation program, using the launchIndex. I do not think this is a good idea: to be sure everything is initialized before any ray is generated, I would have to synchronize all threads first (with a memory fence or something similar). Would I?
- Using a CUDA kernel on the pointer returned by rtBufferGetDevicePointer before launching.
- I am also wondering whether auxiliary buffers are the right approach at all, since their contents are never used by the host.
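To make the first option more concrete: instead of synchronizing inside the ray generation program, the clearing could go in its own entry point that is launched before the ray generation one. The following is only a CPU stand-in for that idea, not OptiX code. Both launch and clearThenTrace are hypothetical helpers: launch runs a program once per index, mimicking context->launch(entryPoint, width), and I am assuming that one launch finishes completely before the next one starts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one launch: runs `program` once per launch index
// and returns only after every index has been processed.
template <typename Program>
void launch(std::size_t width, Program program) {
    for (std::size_t i = 0; i < width; ++i) {
        program(i);
    }
}

// Pass 1 plays the role of a dedicated "clear" entry point: each index
// zeroes its own element. Pass 2 plays the role of the ray generation
// entry point and accumulates into the already-cleared buffer.
// No synchronization is needed inside either pass because pass 2 only
// starts after pass 1 has finished.
inline void clearThenTrace(std::vector<unsigned int>& aux, unsigned int contribution) {
    launch(aux.size(), [&](std::size_t i) { aux[i] = 0u; });
    launch(aux.size(), [&](std::size_t i) { aux[i] += contribution; });
}
```

With this structure the values left over from a previous launch never reach the accumulation pass, at the cost of one extra launch per iteration.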
By the way, I have another question. I am using a bindless callable program to do some complex arithmetic in the closest hit program, something like:
RT_CALLABLE_PROGRAM float2 complex_prod(float2 lhs, float2 rhs) {
    return make_float2(lhs.x*rhs.x - lhs.y*rhs.y,
                       lhs.x*rhs.y + lhs.y*rhs.x);
}
However, I have seen in the optixVox sample that a plain device function can be used instead, something like:
static __device__ float3 boxanchor(float3 boxmin, float3 boxmax, float t)
{
    float3 t0 = (boxmin - ray.origin) / ray.direction;
    if ( t == t0.x || t == t0.y || t == t0.z ) return boxmin;
    return boxmax;
}
Is there any advantage to using one method over the other?
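To make the comparison concrete, this is roughly how complex_prod would look as a plain inline function instead of a callable program. It is only a sketch: float2 and make_float2 are redefined here as stand-ins so that it compiles as host C++ (in device code they come from the CUDA headers), and in device code the qualifiers would be static __device__ __forceinline__ instead of static inline.

```cpp
// Stand-ins for the CUDA vector type and constructor, so the sketch
// compiles without the CUDA headers.
struct float2 { float x, y; };
inline float2 make_float2(float x, float y) { return float2{x, y}; }

// Complex product (a + bi)(c + di) = (ac - bd) + (ad + bc)i,
// with x as the real part and y as the imaginary part.
// In device code: static __device__ __forceinline__ float2 complex_prod(...)
static inline float2 complex_prod(float2 lhs, float2 rhs) {
    return make_float2(lhs.x * rhs.x - lhs.y * rhs.y,
                       lhs.x * rhs.y + lhs.y * rhs.x);
}
```

If I understand correctly, this version is called like any ordinary function and can be inlined by the compiler, whereas the callable program is invoked indirectly through its program ID.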
Thanks a lot