Buffer initialization

Hi,
I wonder what the proper way of initializing device buffers is.
I have an output buffer where I add (accumulate) some values from the hits.
I create the result buffer as

// Output buffer: one ReceptionInfo per (receiver, transmitter) pair
optix::Buffer b = sceneManager->sceneContext->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_USER, receivers, transmitters);
b->setElementSize(sizeof(ReceptionInfo));
sceneManager->sceneContext["receptionInfoBuffer"]->set(b);

In ReceptionInfo I have a float2 and a couple of ints. I need to initialize all of them to 0.

Then I have a couple of auxiliary buffers that I use to hold intermediate values, which are updated in the closest hit programs. I create them as

// Buffer of buffer IDs: one auxiliary buffer per receiver
optix::Buffer duplicatesBuffer = sceneManager->sceneContext->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_BUFFER_ID, receivers);
int* buffers = static_cast<int*>(duplicatesBuffer->map());
for (unsigned int i = 0; i < receivers; i++)
{
	// GPU-local 3D scratch buffer; the host never reads it back
	optix::Buffer aux = sceneManager->sceneContext->createBuffer(RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, RT_FORMAT_UNSIGNED_INT, sceneManager->elevationSteps/duplicateBlockSize, sceneManager->azimutSteps/duplicateBlockSize, transmitters);
	buffers[i] = aux->getId();
}
duplicatesBuffer->unmap();
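
A side note on the sizes above: elevationSteps/duplicateBlockSize is integer division, so if elevationSteps or azimutSteps is not an exact multiple of duplicateBlockSize, the last partial block gets no cell in the aux buffer, and indexing it from the closest hit could go out of bounds. If that can happen in your setup, rounding up is safer. A minimal sketch, where blockCount is a hypothetical helper and not an OptiX function:

```cpp
#include <cassert>

// Hypothetical helper: number of blocks of size blockSize needed to cover
// `steps` samples, rounding up so a trailing partial block still gets a cell.
inline unsigned int blockCount(unsigned int steps, unsigned int blockSize)
{
    assert(blockSize > 0);
    return (steps + blockSize - 1) / blockSize;
}
```

For example, 100 steps with blocks of 8 give 100/8 = 12 cells with plain division (covering only 96 steps), while blockCount(100, 8) gives 13. It would replace the elevationSteps/duplicateBlockSize and azimutSteps/duplicateBlockSize arguments to createBuffer.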

I launch rays multiple times: I launch rays, collect the results and then launch again. The contents of these buffers have to be reset before launching again; otherwise they still hold the previous values.

So, how should they be initialized?
I was thinking of several possibilities, but I am not sure:

  • initializing them in parallel in the ray generation program, using the launchIndex. I do not think this is a good idea, though: to ensure everything is initialized, I would have to synchronize the threads (with a memory fence or similar) before starting to generate rays, wouldn't I?
  • using a CUDA kernel that clears the memory returned by rtBufferGetDevicePointer before launching.
  • I am also wondering whether auxiliary buffers are the right approach at all, since the host never uses their contents.

By the way, I have another question. I am using bindless callable programs to do some complex-number arithmetic in the closest hit programs, something like

RT_CALLABLE_PROGRAM float2 complex_prod(float2 lhs, float2 rhs) {
	return make_float2(lhs.x*rhs.x - lhs.y*rhs.y,
		lhs.x*rhs.y + lhs.y*rhs.x);
}

But I have seen in the optixVox sample that I could use something like:

static __device__ float3 boxanchor(float3 boxmin, float3 boxmax, float t)
{
    float3 t0 = (boxmin - ray.origin) / ray.direction;
    if ( t == t0.x || t == t0.y || t == t0.z ) return boxmin;
    return boxmax;
}

Is there any advantage in using one or another method?

Thanks a lot

“The content of these buffers has to be initialized before launching again, otherwise they have the previous values.”

By default, API calls in OptiX are synchronous (see Chapter 3.6.1 of the OptiX Programming Guide 5.1.0, page 33).
So unless you use the "Progressive Launch API", launches run one after the other, and each launch finishes before the next one starts.
There is no "GroupMemoryBarrier" as in HLSL, because in OptiX you cannot access shared memory on the GPU (see Chapter 14, "Caveats", of the Programming Guide).
To clear a buffer, I simply use a launch that only writes zeros to buffer[launch_index]. If you always run the same kernel after clearing, and that kernel never needs values other than the one belonging to its own launch index, you can instead write zero as the first instruction of that kernel. (This saves one full "clearing" launch.)
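
To make the two options concrete, here is a host-side C++ emulation (not OptiX code) of a buffer indexed by (receiver, transmitter) and of two launches: one that only clears, and one that accumulates. Each emulated "thread" touches only its own element, which is why no barrier is needed, and because the launches run one after the other, the accumulating launch sees the cleared buffer. float2 is a host stand-in for the CUDA type, and the ReceptionInfo field names are hypothetical. On the GPU, the clearing entry point would be a ray generation program that writes a zeroed ReceptionInfo to receptionInfoBuffer[launchIndex].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host stand-in for the CUDA float2 type.
struct float2 { float x, y; };

// Mirrors the struct from the question; field names are hypothetical.
struct ReceptionInfo {
    float2 sum;  // accumulated value
    int hits;    // hypothetical counter
    int flags;   // hypothetical second int
};

// 2D buffer indexed by (receiver, transmitter), stored row-major.
struct Buffer2D {
    std::size_t width, height;
    std::vector<ReceptionInfo> data;
    Buffer2D(std::size_t w, std::size_t h) : width(w), height(h), data(w * h) {}
    ReceptionInfo& at(std::size_t x, std::size_t y)
    {
        assert(x < width && y < height);
        return data[y * width + x];
    }
};

// Emulates the "clearing" ray generation program: each launch index writes
// zeros to its own element only, so no synchronization is required.
void clearLaunch(Buffer2D& buf)
{
    for (std::size_t y = 0; y < buf.height; ++y)
        for (std::size_t x = 0; x < buf.width; ++x)
            buf.at(x, y) = ReceptionInfo{};  // value-initialization: all fields zero
}

// Emulates the tracing launch that accumulates into its own element.
void accumulateLaunch(Buffer2D& buf, float value)
{
    for (std::size_t y = 0; y < buf.height; ++y)
        for (std::size_t x = 0; x < buf.width; ++x) {
            ReceptionInfo& r = buf.at(x, y);
            r.sum.x += value;
            r.hits += 1;
        }
}
```

The in-kernel variant from the answer corresponds to assigning ReceptionInfo{} at the start of the accumulating loop body. That is only valid when each launch index accumulates into its own element; if several rays add into the same receiver entry, a separate clearing launch like clearLaunch is the safe choice.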

For an example of where rtBufferGetDevicePointer is useful, see the function "updateHeightfield" in optixOcean.cpp of the Ocean sample in the "OptiX Advanced Samples". Unless I misunderstand your case, you don't need it just for clearing.

For callable programs, see section "4.10 Callable programs" in the Programming Guide (page 55):
[…]Callable programs […] allow the changing of the target of a function call at runtime […] [and] can reduce code replication and compile time, and potentially improve runtime through increased warp utilization.[…]
Bindless callable programs can be called from arbitrary locations, whereas a plain device function can only be called within the OptiX program it belongs to.
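
As far as I understand the trade-off: a plain device function such as boxanchor in optixVox is resolved at compile time, so the compiler can inline it, while a bindless callable is invoked through a program ID that can be swapped at runtime, likely at the cost of an actual function call. For a fixed helper like complex multiplication, the plain function is probably the simpler and cheaper choice; a callable pays off when the target has to change at runtime. Below is complex_prod rewritten as a plain device function. The #ifndef __CUDACC__ shims are an assumption added only so the same code can be checked on the host; they are not part of OptiX.

```cpp
#include <cassert>

// Host-only shims so this compiles without CUDA; under nvcc the real
// __device__ qualifier, float2 and make_float2 from the CUDA headers are used.
#ifndef __CUDACC__
#define __device__
struct float2 { float x, y; };
inline float2 make_float2(float x, float y) { return float2{x, y}; }
#endif

// Plain device-function version of complex_prod:
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
static __device__ inline float2 complex_prod(float2 lhs, float2 rhs)
{
    return make_float2(lhs.x * rhs.x - lhs.y * rhs.y,
                       lhs.x * rhs.y + lhs.y * rhs.x);
}
```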
A very good example of how to apply callable programs is the optixIntroduction samples in the new "OptiX Advanced Samples": https://github.com/nvpro-samples/optix_advanced_samples/tree/master/src/optixIntroduction
See also: https://devtalk.nvidia.com/default/topic/998546/optix/optix-advanced-samples-on-github/post/5253913/#5253913
and: https://devtalk.nvidia.com/default/topic/1030935/optix/what-is-the-best-practice-of-implementing-shader-graph-in-optix-/post/5244626/#5244626

Hi,
thanks.
In the end, I used a launch with a dedicated ray generation program that only initializes the buffers.