Update TextureSampler data using cudaMemcpyDeviceToDevice.

Hi, I’m new to the OptiX world and it would be great if someone could help me with the following:

The problem I’m facing is the same one OmSp asked about in his second question here: https://devtalk.nvidia.com/default/topic/1013672/optix/using-texture-memory/

In my implementation I have two buffers: the first one is used to do some calculations (using cuFFT) and the second one is a copy of the first to be used in a TextureSampler (I get an error if I try to use the same buffer for both the FFT and the TextureSampler). My problem comes when I need to update the TextureSampler’s data with the result of the FFT. I get it to work by copying the data to the host and then back to the GPU, doing something like this:

fftBuffer = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, fftSize, fftSize);
samplerBuffer = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT, fftSize, fftSize);

(...)

// Copying data from one buffer to the other
float* dataOrigin = reinterpret_cast<float*>(fftBuffer->map()); 
float* dataDestination = reinterpret_cast<float*>(samplerBuffer->map()); 
memcpy(dataDestination, dataOrigin, sizeof(float)*fftSize*fftSize);
fftBuffer->unmap(); 
samplerBuffer->unmap();

But this is really inefficient due to the device-to-host and host-to-device data transfers. I suppose it would be much better if I could do a device-to-device copy, so I tried this:

cudaError_t lastError = cudaMemcpy(dataDestination, dataOrigin, sizeof(float)*fftSize*fftSize, cudaMemcpyDeviceToDevice);

This code throws “cudaErrorInvalidValue(11)”. Can you give me any hint about what I’m doing wrong, or about the right way to update the TextureSampler’s buffer efficiently?

Thanks a lot for your time!

Environment:

  • Visual Studio Community 2017
  • CUDA 9.1
  • OptiX 5.1
  • Nvidia GTX 750Ti

buffer->map() returns a virtual mapping into host memory. CUDA can’t access that pointer on the device.

Please read the OptiX Programming Guide chapter about OptiX-CUDA interop:
http://raytracing-docs.nvidia.com/optix_6.0/guide/index.html#cuda#interoperability-with-cuda
You would need to use rtBufferGetDevicePointer or rtBufferSetDevicePointer to share device-addressable memory between the two APIs.
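
For example, something along these lines (an untested sketch using the optixpp wrappers and the buffer variables from your code above; optixDeviceOrdinal is an assumption, typically 0 on a single-GPU system):

// Untested sketch: fetch linear device pointers from the OptiX buffers and copy with CUDA.
int optixDeviceOrdinal = 0; // assumption: single-GPU system
void* src = fftBuffer->getDevicePointer(optixDeviceOrdinal);
void* dst = samplerBuffer->getDevicePointer(optixDeviceOrdinal);

// Plain linear device-to-device copy, no host round trip.
cudaError_t err = cudaMemcpy(dst, src, sizeof(float) * fftSize * fftSize, cudaMemcpyDeviceToDevice);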

That said, I don’t actually know if that will work for buffers on OptiX TextureSamplers because textures are not stored linearly.

Hi Detlef, thanks a lot for your answer, I really appreciate it ;)

I will definitely check that reference in detail this weekend (I have almost no time during the rest of the week), but following your advice I quickly tried to use rtBufferGetDevicePointer, so I coded this small test:

// Declare buffers
optix::Buffer bufferOrigin = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, 2, 2);
optix::Buffer bufferDestination = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, 2, 2);

// Populate data of first buffer
float* pointerBufferOrigin = static_cast<float*>(bufferOrigin->map());
for (int i = 0; i < 4; i++)
	pointerBufferOrigin[i] = i;
bufferOrigin->unmap();

// getting device pointers
float* bufferOriginDevicePtr = static_cast<float*>(bufferOrigin->getDevicePointer(GpuID));
float* bufferDestinationDevicePtr = static_cast<float*>(bufferDestination->getDevicePointer(GpuID));

// dev2dev copy 
cudaError_t lastError = cudaMemcpy(bufferDestinationDevicePtr, bufferOriginDevicePtr, sizeof(float)*4, cudaMemcpyDeviceToDevice);

// Copy data of the second buffer to the host in order to check values
float* auxPtr = (float*)malloc(4*sizeof(float)); // auxiliary host pointer
lastError = cudaMemcpy(auxPtr, bufferDestinationDevicePtr, sizeof(float)*4, cudaMemcpyDeviceToHost);

This time I get cudaSuccess from both cudaMemcpy calls, but the data retrieved is not correct. It doesn’t even retrieve correct values from bufferOriginDevicePtr. This weekend I will experiment a bit more, but I would like to know if you see a better strategy to update the TextureSampler data avoiding device-host transfers.

Thanks!

Hi again,

After reading the OptiX Programming Guide chapter about OptiX-CUDA interop, I haven’t found anything against updating a TextureSampler’s buffer from the device (but I haven’t found any confirmation that it can be done either).

I carried out some more tests, but what I saw confused me even more:

TEST 1: I did exactly the same as in my last post; the only difference is that I replaced the final host-side check (the last two lines of the previous snippet) with a map() call, but I wasn’t able to get the expected values:

float* data = (float*)(bufferDestination->map());

TEST 2: I created an OptiX program to print buffer values from the GPU using rtPrintf. The program is this:

rtBuffer<float, 2> bufferDestination;

RT_PROGRAM void print_program() {
	rtPrintf("Value at position [1,1] : %f \n", bufferDestination[make_uint2(1, 1)]);
}
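
For context, a minimal host-side hookup for such a test could look roughly like this (untested sketch; the PTX file name is a placeholder):

// Hook up and launch the print program ("print_test.ptx" is a placeholder name).
optix::Program printProgram = context->createProgramFromPTXFile("print_test.ptx", "print_program");
context->setEntryPointCount(1);
context->setRayGenerationProgram(0, printProgram);
context->setPrintEnabled(true);                              // rtPrintf output must be enabled
context["bufferDestination"]->setBuffer(bufferDestination);  // bind the buffer to the program variable
context->launch(0, 1, 1);                                    // a 1x1 launch is enough to print one value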

After running it, the console output was “Value at position [1,1] : 3.000000”, the expected value!

TEST 3: The result of test 2 indicates that the values are correctly stored, although they are not correctly retrieved using buffer->map(), as demonstrated in test 1. So, in this test, I tried to create a TextureSampler from bufferDestination, immediately after declaring the buffers, like this:

bufferOrigin = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, 2, 2);
bufferDestination = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT, 2, 2); // Note: this is RT_BUFFER_INPUT now, as required by the TextureSampler, but it also worked correctly in test 2
	
optix::TextureSampler sampler;
sampler = context->createTextureSampler();
sampler->setWrapMode(0, RT_WRAP_REPEAT);
sampler->setWrapMode(1, RT_WRAP_REPEAT);
sampler->setWrapMode(2, RT_WRAP_REPEAT);
sampler->setIndexingMode(RT_TEXTURE_INDEX_NORMALIZED_COORDINATES);
sampler->setReadMode(RT_TEXTURE_READ_NORMALIZED_FLOAT);
sampler->setMaxAnisotropy(1.0f);
sampler->setMipLevelCount(1u);
sampler->setArraySize(1u);
sampler->setBuffer(0u, 0u, bufferDestination);
sampler->setFilteringModes(RT_FILTER_LINEAR, RT_FILTER_LINEAR, RT_FILTER_NONE);

But I found that, if the buffer has been bound to a TextureSampler, rtBufferGetDevicePointer throws the following error:
OptiX Error: 'Unknown error (Details: Function “_rtBufferGetDevicePointer” caught exception: Assertion failed: “m_kind == LINEAR : Not a linear pointer”
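
For completeness, the exact call that triggers it is the same getDevicePointer() from my test above:

// Fails with the "Not a linear pointer" assertion once bufferDestination is bound to the sampler:
float* devPtr = static_cast<float*>(bufferDestination->getDevicePointer(GpuID));
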
This might be related to Detlef’s comment that textures are not stored linearly. My conclusion is that cudaMemcpy works correctly with 2D buffers, but not if the buffer is bound to a TextureSampler.

I really need to update my texture before rendering every frame, and I found that doing it from the CPU lowers the performance dramatically (at least one order of magnitude). I hope someone can help…