[SOLVED] OptiX 5 interop DirectX 11 example?


UPDATE using CUDA 10.0 and OptiX 6.0.0 Win10PRO 64bit v1607:
I finally got it done using the technique shown in the CUDA sample
CUDA Samples\v10.0\2_Graphics\simpleD3D11Texture

I access the OptiX output buffer this way:

float* buffer_DevPtr = static_cast<float*>(outputbuffer->getDevicePointer(0));  // 0 = OptiX device ordinal

Writing the output buffer data into a D3D Texture2D is then done in a CUDA kernel, and DirectX 11 can access the result.

It now also works for writing to a D3D StructuredBuffer: there cudaGraphicsResourceGetMappedPointer works and gives a device pointer that can be used directly in a CUDA kernel.

Writing to an RT_BUFFER_INPUT is obviously possible, but attaching it to a TextureSampler can cause problems; see:


I’m new to OptiX and I looked at all the samples of OptiX 5. It’s great! But I did not find any example that uses the “D3D11 interop”.
In the folder “C:\ProgramData\NVIDIA Corporation\OptiX SDK 5.0.0\include” there is a file called “optix_d3d11_interop.h”. But how should I use it?

I would be interested in rendering OptiX output into a D3D11 render target.

If no direct interop is possible, I want to read out all the color + depth data from OptiX and render them with D3D11

UPDATE: In the docs of OptiX 5.0.0 (under “3.6.7 Limitations”) I found “Direct3D and CUDA interop are not supported.”

So for now I need to use the CPU / host-based path. SOLVED.

RTresult result = rtContextSetD3D11Device(rtcon, D3D11device);
reports RT_ERROR_INVALID_VALUE = 0x501

I use the mapped base pointer of the OptiX output buffer directly to create a new ID3D11Texture2D + SRV (by assigning it to the pSysMem field).
A vertex shader and a pixel shader then render the final image directly to a D3D11 render target; I simply inverted the V texcoords so that the image is also top-down.
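
As an alternative to inverting V in the shader, the rows can be flipped on the CPU before the texture is created from the mapped pointer. This is only a minimal sketch, assuming tightly packed float4 pixels without row padding; `flipRowsFloat4` is a hypothetical helper name, not part of OptiX or D3D11:

```cpp
#include <algorithm>
#include <cstddef>

// Flip a tightly packed float4 (RGBA) image vertically, in place.
// Assumption: no padding between rows, 4 floats per pixel.
void flipRowsFloat4(float* pixels, std::size_t width, std::size_t height)
{
    const std::size_t rowFloats = width * 4;
    for (std::size_t y = 0; y < height / 2; ++y) {
        float* top = pixels + y * rowFloats;
        float* bottom = pixels + (height - 1 - y) * rowFloats;
        std::swap_ranges(top, top + rowFloats, bottom);  // swap row y with its mirror row
    }
}
```

The flip would have to run on a CPU-side copy of the pixels (or on the mapped pointer itself, if overwriting the OptiX buffer is acceptable), so it costs an extra pass over the image that the texcoord inversion avoids.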

You can also share your D3D11 texture with CUDA, get the CUDA array pointer, and then use the OptiX mapped base pointer with cudaMemcpyToArray to upload the data from the OptiX buffer to the D3D11 texture.

AFAIK, as long as we use the OptiX mapped base pointer, we copy over PCIe…

To avoid copying over PCIe, and for a single GPU only, maybe you can also create the OptiX output buffer as RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL and then call setDevicePointer with a D3D11 compute buffer that is shared with CUDA. In your shader, you render your D3D11 compute buffer to your render target.


Ok, for copying data to the D3D11 texture through CUDA I also could try “cuda array pointer”. Thank you!

Yes, I tried “RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL”, but then the texture sampler cannot be used; that one requires RT_BUFFER_INPUT. But I will now try pure buffer access.

However, the PCIE-based version now finally runs great.

If I use this code in the .cu PROGRAM:

rtBuffer<float4, 2> tex_buffer;

uint2 xy;
xy.x = 0;  // fixed for this test, so that it is guaranteed to be in range
xy.y = 0;  // fixed for this test, so that it is guaranteed to be in range
const float3 Kd = make_float3(tex_buffer[xy]);

still this exception occurs:
OptiX Error: ‘Unknown error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)’
On my system the app then cannot be closed, and when I tried to reboot, the system froze (this was before I adjusted the TDR settings).

cudaGraphicsD3D11RegisterResource, cudaGraphicsMapResources, cudaGraphicsResourceGetMappedPointer all succeed.
devPTR=204a00000h seems to be a valid GPU device pointer
cudaGraphicsUnmapResources and cudaGraphicsUnregisterResource also succeed.

Host Code:

ID3D11Resource* pTexRes = NULL;
void* PPMtex = loadFloat4BufferTexFromPPMfile("data/grid.ppm");  // loads a PPM into a float4 CPU RAM buffer (4 floats per pixel)
int texwid = 64;  // grid.ppm from the OptiX 5.0.0 SDK is 64x64 (so it is constant here for the test)
int texhei = 64;

ID3D11Buffer* pBuf = NULL;
D3D11_BUFFER_DESC bufDesc = {};
D3D11_SUBRESOURCE_DATA bufInitData = {};
bufDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
bufDesc.ByteWidth = texwid * texhei * 16;  // 16 bytes per float4 pixel
bufDesc.CPUAccessFlags = 0;
bufDesc.MiscFlags = 0;
bufDesc.Usage = D3D11_USAGE_IMMUTABLE;
bufInitData.pSysMem = PPMtex;
hr = dev->CreateBuffer(&bufDesc, &bufInitData, &pBuf);

pTexRes = (ID3D11Resource*)pBuf;
if (pTexRes)
{
    cudaGraphicsResource* CUDAres = NULL;
    cudaStream_t hStream = 0;

    cudaError_t r = cudaGraphicsD3D11RegisterResource(&CUDAres, pTexRes, cudaGraphicsRegisterFlagsNone);
    if (!r)
    {
        r = cudaGraphicsMapResources(1, &CUDAres, hStream);
        if (!r)
        {
            // OptiX related:
            optix::Buffer buffer;
            void* devPTR = NULL;
            size_t size = 0;

            buffer = context->createBuffer(RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL, RT_FORMAT_FLOAT4, texwid, texhei);

            r = cudaGraphicsResourceGetMappedPointer(&devPTR, &size, CUDAres);
            if (!r)
            {
                buffer->setDevicePointer(OptiXdevice, devPTR);
            } // if (!r) after cudaGraphicsResourceGetMappedPointer

            // EDIT: seems to be wrong here: r = cudaGraphicsUnmapResources(1, &CUDAres, hStream);
        } // if (!r) after cudaGraphicsMapResources

        // EDIT: seems to be wrong here: r = cudaGraphicsUnregisterResource(CUDAres);
    } // if (!r) after cudaGraphicsD3D11RegisterResource

    // EDIT: seems to be wrong here: SAFE_RELEASE(pTexRes);
} // if (pTexRes)

So for now I still have to use the PCIe-based solution.
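
One cheap sanity check before handing the mapped pointer to OptiX is to compare the size reported by cudaGraphicsResourceGetMappedPointer with what the OptiX buffer expects. This is only a sketch of that comparison; `requiredBytes` and `mappedRegionCovers` are hypothetical helper names, and passing the check does not by itself explain or rule out the "Illegal address" error above:

```cpp
#include <cstddef>

// Bytes needed for a tightly packed width x height buffer of elemBytes-sized elements.
std::size_t requiredBytes(std::size_t width, std::size_t height, std::size_t elemBytes)
{
    return width * height * elemBytes;
}

// True when the mapped region reported by CUDA is at least as large as
// the buffer that OptiX will address through the device pointer.
bool mappedRegionCovers(std::size_t mappedBytes,
                        std::size_t width, std::size_t height, std::size_t elemBytes)
{
    return mappedBytes >= requiredBytes(width, height, elemBytes);
}
```

In the host code above this would be called as mappedRegionCovers(size, texwid, texhei, 16) right before setDevicePointer, with 16 being the byte size of one RT_FORMAT_FLOAT4 element.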

my current system info:
Device: GTX 1050 Driver: 390.77 (Jan 29 2018)
OptiX 5.0 with CUDA Toolkit 9.1.85 on Visual Studio 2017 Community 15.5.6 (toolset v140 of VS2015)
on Windows10 PRO 64bit (still Anniversary Update version 1607 build 14393.1593)
Win SDK Target Platform 10.0.14393.0

OptiX 5.0.0
CUDA 9.1.85 + Patch 1 (Released Jan 25, 2018)

Hi M1,

Maybe I missed something, why do you need a texture sampler for your output buffer?

Let me spell out my second suggestion (no PCIe copy) in more detail:
1, create your OptiX output buffer with RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL (no texture sampler)
2, register your D3D11 compute buffer with CUDA (register resource), then map the resource and call setDevicePointer on your output buffer with the mapped pointer (no texture sampler)
3, do the ray tracing with an OptiX launch as usual to fill your OptiX output buffer with data (no texture sampler)
4, render your D3D11 compute buffer to your D3D11 render texture in a D3D11 shader (no texture sampler)



Hi M1,

Your variable name “pTexRes” has confused me, I thought it was a texture…

1, This exception may indicate an out-of-bounds access to your buffer; first check your buffer size and access range. The mix of width, texwid and 64 is not consistent, which makes the code hard for me to follow.

2, I don’t think you can call cudaGraphicsUnregisterResource before OptiX has finished using the resource. Please keep it registered.

3, Try cudaGraphicsMapResources, then the OptiX launch, then cudaGraphicsUnmapResources. This may not be needed.

4, Please check your D3D11 buffer creation. I am not quite sure about this.

bufDesc.Usage = D3D11_USAGE_DEFAULT; // This worked for me
bufDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE; // my case was D3D11_BIND_VERTEX_BUFFER, it worked
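
The map, launch, unmap ordering from point 3 can be made hard to get wrong with a small scope guard. This is a generic sketch, not OptiX or CUDA API: `ScopedMapping` and `launchWhileMapped` are hypothetical names, and the three callables stand in for cudaGraphicsMapResources, the OptiX launch and cudaGraphicsUnmapResources. Whether OptiX 5 actually needs the resource to be unmapped between launches is not confirmed in this thread:

```cpp
#include <functional>
#include <utility>

// Calls `acquire` on construction and `release` on destruction, so the
// release step also runs when the code in between exits early or throws.
class ScopedMapping
{
public:
    ScopedMapping(const std::function<void()>& acquire, std::function<void()> release)
        : release_(std::move(release))
    {
        acquire();
    }
    ~ScopedMapping() { release_(); }

    ScopedMapping(const ScopedMapping&) = delete;
    ScopedMapping& operator=(const ScopedMapping&) = delete;

private:
    std::function<void()> release_;
};

// Map the resource, run the launch while it is mapped, then unmap it.
void launchWhileMapped(const std::function<void()>& mapResource,
                       const std::function<void()>& launch,
                       const std::function<void()>& unmapResource)
{
    ScopedMapping mapping(mapResource, unmapResource);
    launch();
}
```

With the names from the host code above, mapResource would wrap cudaGraphicsMapResources(1, &CUDAres, hStream) and unmapResource would wrap cudaGraphicsUnmapResources(1, &CUDAres, hStream); error checking of the returned cudaError_t values is left out here.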

BTW. You can check an example here:

Sorry for the misleading identifier name “pTexRes”. At first I really used an ID3D11Texture2D, but then I ran into the problem with the cudaArray (which has no device pointer). So I tried an ID3D11Buffer, but did not change the pointer variable name. “width” + “height” are of course also “texwid” and “texhei”. The buffer creation part was done in a sub-function, which I copied and pasted into this code block. (I have now edited the post above so that the code in this thread is consistent; thank you.)

Thank you for that example! I think I also missed “cudaGraphicsResourceSetMapFlags”. These seem to be access flags that CUDA needs. So maybe the buffer is present (even if not registered), but it is WRITE-ONLY and so I cannot read from it.

I generally wanted to do both types of access:
1. This is solved now: writing the optix::Buffer to a D3D11 buffer / texture (to export the final frame). Both the CPU-based way and the CUDA kernel work.

2. Still unsolved: reading from a D3D11 buffer / texture to write to an optix::Buffer used by a TextureSampler (so that a .cu shader program can read from an image texture/buffer). The code sample above showed this goal.