Update TextureSampler data using cudaMemcpyDeviceToDevice.

Hi, I’m new to the OptiX world and it would be great if someone could help me with the following:

The problem I’m facing is the same one OmSp asked about in his second question here: https://devtalk.nvidia.com/default/topic/1013672/optix/using-texture-memory/

In my implementation I have two buffers: the first one is used to do some calculations (using cuFFT) and the second one is a copy of the first to be used in a TextureSampler (I get an error if I try to use the same buffer for both the FFT and the TextureSampler). My problem comes when I need to update the TextureSampler’s data with the result of the FFT. I get it to work by copying the data to the host and then back to the GPU, doing something like this:

fftBuffer = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, fftSize, fftSize);
samplerBuffer = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT, fftSize, fftSize);

(...)

// Copying data from one buffer to the other
float* dataOrigin = reinterpret_cast<float*>(fftBuffer->map()); 
float* dataDestination = reinterpret_cast<float*>(samplerBuffer->map()); 
memcpy(dataDestination, dataOrigin, sizeof(float)*fftSize*fftSize);
fftBuffer->unmap(); 
samplerBuffer->unmap();

But this is really inefficient due to the device-to-host and host-to-device data transfers. I suppose it would be much better if I could do a device-to-device copy, so I tried this:

cudaError_t lastError = cudaMemcpy(dataDestination, dataOrigin, sizeof(float)*fftSize*fftSize, cudaMemcpyDeviceToDevice);

This code throws “cudaErrorInvalidValue(11)”. Can you give me any hint about what I’m doing wrong, or about the right way to update the TextureSampler’s buffer efficiently?

Thanks a lot for your time!

Environment:

  • Visual Studio Community 2017
  • CUDA 9.1
  • OptiX 5.1
  • Nvidia GTX 750Ti

buffer->map() returns a virtual mapping into host memory. CUDA can’t access that pointer on the device.

Please read the OptiX Programming Guide chapter about OptiX-CUDA interop:
http://raytracing-docs.nvidia.com/optix_6.0/guide/index.html#cuda#interoperability-with-cuda
You would need to use rtBufferGetDevicePointer or rtBufferSetDevicePointer to share device-addressable memory between the two APIs.
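
For example, something along these lines (an untested sketch using the optixpp wrappers and the buffer variables from your code above; optixDeviceOrdinal is an assumption, typically 0 on a single-GPU system):

// Untested sketch: fetch linear device pointers from the OptiX buffers and copy with CUDA.
int optixDeviceOrdinal = 0; // assumption: single-GPU system
void* src = fftBuffer->getDevicePointer(optixDeviceOrdinal);
void* dst = samplerBuffer->getDevicePointer(optixDeviceOrdinal);

// Plain linear device-to-device copy, no host round trip.
cudaError_t err = cudaMemcpy(dst, src, sizeof(float) * fftSize * fftSize, cudaMemcpyDeviceToDevice);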

That said, I don’t actually know if that will work for buffers on OptiX TextureSamplers because textures are not stored linearly.

Hi Detlef, thanks a lot for your answer, I really appreciate it ;)

I will definitely check that reference in detail this weekend (I have almost no time during the rest of the week), but following your advice I quickly tried to use rtBufferGetDevicePointer, so I coded this small test:

// Declare buffers
optix::Buffer bufferOrigin = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, 2, 2);
optix::Buffer bufferDestination = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, 2, 2);

// Populate data of first buffer
float* pointerBufferOrigin = static_cast<float*>(bufferOrigin->map());
for (int i = 0; i < 4; i++)
	pointerBufferOrigin[i] = i;
bufferOrigin->unmap();

// getting device pointers
float* bufferOriginDevicePtr = static_cast<float*>(bufferOrigin->getDevicePointer(GpuID));
float* bufferDestinationDevicePtr = static_cast<float*>(bufferDestination->getDevicePointer(GpuID));

// dev2dev copy 
cudaError_t lastError = cudaMemcpy(bufferDestinationDevicePtr, bufferOriginDevicePtr, sizeof(float)*4, cudaMemcpyDeviceToDevice);

// Copy data of the second buffer to the host in order to check values
float* auxPtr = (float*)malloc(4*sizeof(float)); // auxiliary host pointer
lastError = cudaMemcpy(auxPtr, bufferDestinationDevicePtr, sizeof(float)*4, cudaMemcpyDeviceToHost);

This time I get cudaSuccess from both cudaMemcpy calls, but the data retrieved is not correct. It doesn’t even retrieve correct values from bufferOriginDevicePtr. This weekend I will experiment a bit more, but I would like to know if you see a better strategy to update the TextureSampler data avoiding device-host transfers.

Thanks!

Hi again,

After reading the OptiX Programming Guide chapter about OptiX-CUDA interop, I haven’t found anything against updating a TextureSampler’s buffer from the device (but I haven’t found any confirmation that it can be done either).

I carried out some more tests, but what I saw confused me even more:

TEST 1: I did exactly the same as in my last post; the only difference is that I replaced the final host-side check (the last two lines of the previous snippet) with a map() call, but I wasn’t able to get the expected values:

float* data = (float*)(bufferDestination->map());

TEST 2: I created an OptiX program to print buffer values from the GPU using rtPrintf. The program is this:

rtBuffer<float, 2> bufferDestination;

RT_PROGRAM void print_program() {
	rtPrintf("Value at position [1,1] : %f \n", bufferDestination[make_uint2(1, 1)]);
}
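
For context, a minimal host-side hookup for such a test could look roughly like this (untested sketch; the PTX file name is a placeholder):

// Hook up and launch the print program ("print_test.ptx" is a placeholder name).
optix::Program printProgram = context->createProgramFromPTXFile("print_test.ptx", "print_program");
context->setEntryPointCount(1);
context->setRayGenerationProgram(0, printProgram);
context->setPrintEnabled(true);                              // rtPrintf output must be enabled
context["bufferDestination"]->setBuffer(bufferDestination);  // bind the buffer to the program variable
context->launch(0, 1, 1);                                    // a 1x1 launch is enough to print one value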

After running it, the console output was “Value at position [1,1] : 3.000000”, the expected value!

TEST 3: The result of test 2 indicates that the values are correctly stored, although they are not correctly retrieved using buffer->map(), as demonstrated in test 1. So, in this test, I tried to create a TextureSampler from bufferDestination, immediately after declaring the buffers, like this:

bufferOrigin = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT, 2, 2);
bufferDestination = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT, 2, 2); // Note: this is RT_BUFFER_INPUT now, as required by the TextureSampler, but it also worked correctly in test 2
	
optix::TextureSampler sampler;
sampler = context->createTextureSampler();
sampler->setWrapMode(0, RT_WRAP_REPEAT);
sampler->setWrapMode(1, RT_WRAP_REPEAT);
sampler->setWrapMode(2, RT_WRAP_REPEAT);
sampler->setIndexingMode(RT_TEXTURE_INDEX_NORMALIZED_COORDINATES);
sampler->setReadMode(RT_TEXTURE_READ_NORMALIZED_FLOAT);
sampler->setMaxAnisotropy(1.0f);
sampler->setMipLevelCount(1u);
sampler->setArraySize(1u);
sampler->setBuffer(0u, 0u, bufferDestination);
sampler->setFilteringModes(RT_FILTER_LINEAR, RT_FILTER_LINEAR, RT_FILTER_NONE);

But I found that, if the buffer has been bound to a TextureSampler, rtBufferGetDevicePointer throws the following error:
OptiX Error: 'Unknown error (Details: Function “_rtBufferGetDevicePointer” caught exception: Assertion failed: “m_kind == LINEAR : Not a linear pointer”
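
For completeness, the exact call that triggers it is the same getDevicePointer() from my test above:

// Fails with the "Not a linear pointer" assertion once bufferDestination is bound to the sampler:
float* devPtr = static_cast<float*>(bufferDestination->getDevicePointer(GpuID));
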
This might be related to Detlef’s comment that textures are not stored linearly. My conclusion is that cudaMemcpy works correctly with 2D buffers, but not if the buffer is bound to a TextureSampler.

I really need to update my texture before rendering every frame, and I found that doing it from the CPU lowers the performance dramatically (at least one order of magnitude). I hope someone can help…