Importing Vulkan image into OptixImage2D for Optix Denoiser

We would like to use Vulkan for RTX rendering, writing to Vulkan images that we can import into CUDA with the new interop functionality. However, it is not clear how to correctly import a Vulkan image into OptiX for denoising.

The images from Vulkan are VK_IMAGE_TILING_OPTIMAL, and when they are imported and fed through the denoiser it looks like the tiling is not resolved. I found a single post of someone successfully importing a Vulkan image (gpu - Use Vulkan VkImage as a CUDA cuArray - Stack Overflow); however, the result is a CUDA mipmapped array, which I don’t know how to feed to OptiX. Can I get a device pointer from that, and can OptiX use it?

Do I need to copy into a linear tiling image first before being able to send it through optix?

Hi heeen,

I noticed that the newest version of the CUDA toolkit has a sample for Vulkan-CUDA interop of textures, will you check if that sample answers your question thoroughly?

1.1. CUDA 10.1 Update 2
Added 3_Imaging/vulkanImageCUDA. Demonstrates how to perform Vulkan Image-CUDA Interop.


edit: sorry I misread your post. let me read that new example.

So if I’m reading it right, the example uses CUDA surfaces to access the Vulkan image. OptiX, however, expects a device pointer in OptixImage2D. So the question is how to get a pointer to the data, or how to make OptiX accept a surface or a pointer to tiled image memory.

In the meantime I got it to work by adding intermediate images with linear tiling and blitting to and from those in the command buffers preceding and following the denoiser invocation. Could those steps be skipped though?

You first need to convert the Vulkan image to a Vulkan buffer. The internal image layout is not what the denoiser is looking for.

// Make the image layout eTransferSrcOptimal to copy to buffer
  vk::ImageSubresourceRange subresourceRange(vk::ImageAspectFlagBits::eColor, 0, 1, 0, 1);
  nvvkpp::image::setImageLayout(cmdBuff, imgIn.image, vk::ImageLayout::eGeneral, 
                                vk::ImageLayout::eTransferSrcOptimal, subresourceRange);

  // Copy the image to the buffer
  vk::BufferImageCopy copyRegion;
  copyRegion.setImageSubresource({vk::ImageAspectFlagBits::eColor, 0, 0, 1});
  copyRegion.setImageExtent(vk::Extent3D(m_imageSize, 1));
  cmdBuff.copyImageToBuffer(imgIn.image, vk::ImageLayout::eTransferSrcOptimal, pixelBufferOut, {copyRegion});

  // Put back the image as it was
  nvvkpp::image::setImageLayout(cmdBuff, imgIn.image, vk::ImageLayout::eTransferSrcOptimal,
                                vk::ImageLayout::eGeneral, subresourceRange);
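
The copy above leaves bufferRowLength and bufferImageHeight at their default of 0, which in Vulkan means the pixels land tightly packed in the buffer. As a rough sketch of what that implies for the buffer size and for addressing a pixel through a plain pointer (the helper names are made up for illustration, not part of any API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for a tightly packed 2D image (no padding between rows).
// Hypothetical helper, only for illustration.
inline std::size_t packedImageBytes(std::uint32_t width, std::uint32_t height,
                                    std::size_t bytesPerPixel)
{
    return static_cast<std::size_t>(width) * height * bytesPerPixel;
}

// Byte offset of pixel (x, y) inside such a tightly packed buffer.
// This row-major addressing is what a consumer of a raw device pointer
// assumes; an optimally tiled image does not follow it.
inline std::size_t packedPixelOffset(std::uint32_t x, std::uint32_t y,
                                     std::uint32_t width, std::size_t bytesPerPixel)
{
    assert(x < width);
    return (static_cast<std::size_t>(y) * width + x) * bytesPerPixel;
}
```

For a 1280x720 image with 16 bytes per pixel, packedImageBytes gives the minimum size the Vulkan buffer has to be allocated with.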

In my case, the buffer was allocated with the export flag vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32, which makes it possible to get a CUDA pointer to the Vulkan buffer.

This is how I’m doing it:

// Get the Vulkan buffer and create the Cuda equivalent using the memory allocated in Vulkan
void DenoiserOptix::createBufferCuda(BufferCuda& buf)
{
  // Export the Vulkan allocation as an opaque Win32 handle
  buf.handle = m_device.getMemoryWin32HandleKHR(
      {buf.bufVk.allocation, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32});
  auto req = m_device.getBufferMemoryRequirements(buf.bufVk.buffer);

  // Import that handle into Cuda as external memory
  cudaExternalMemoryHandleDesc cudaExtMemHandleDesc{};
  cudaExtMemHandleDesc.type                = cudaExternalMemoryHandleTypeOpaqueWin32;
  cudaExtMemHandleDesc.handle.win32.handle = buf.handle;
  cudaExtMemHandleDesc.size                = req.size;

  cudaExternalMemory_t cudaExtMemVertexBuffer{};
  cudaError_t          result;
  result = cudaImportExternalMemory(&cudaExtMemVertexBuffer, &cudaExtMemHandleDesc);

  // Map the whole allocation to a Cuda device pointer
  cudaExternalMemoryBufferDesc cudaExtBufferDesc{};
  cudaExtBufferDesc.offset = 0;
  cudaExtBufferDesc.size   = req.size;
  cudaExtBufferDesc.flags  = 0;

  cudaExternalMemoryGetMappedBuffer(&buf.cudaPtr, cudaExtMemVertexBuffer, &cudaExtBufferDesc);
}

As you have pointed out, the Optix denoiser takes OptixImage2D, and they can be constructed like this:

OptixImage2D inputLayer = {
    (CUdeviceptr)m_pixelBufferIn.cudaPtr, imgSize.width, imgSize.height, 0, 0, pixelFormat};
OptixImage2D outputLayer = {
    (CUdeviceptr)m_pixelBufferOut.cudaPtr, imgSize.width, imgSize.height, 0, 0, pixelFormat};

The last step is to copy the buffer back to the Vulkan image: just apply the reverse operation (a buffer-to-image copy instead of an image-to-buffer copy).

Note: an example of Vulkan raytracing + Optix7 denoiser, and many others will soon be published under

Which of these would you say is better/more efficient:

  • copy vk image to vk buffer, map buffer, copy buffer back to image (your solution)
  • copy vk image (tiling optimal) to vk image (tiling linear), map linear image memory, copy linear image back to optimal image (my solution)

They sound about the same to me: two copy operations each to resolve the tiling. I wonder if OptiX could support reading from tiled images directly in some future revision.
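
To put a rough number on the two copies, here is a small back-of-the-envelope sketch. The 1280x720 size and the 16 bytes per pixel are only illustrative assumptions, and the helper names are made up; it counts bytes written by the extra copies and says nothing about how fast either path actually is on the GPU.

```cpp
#include <cassert>
#include <cstdint>

// Bytes written by one full-frame copy (image -> buffer or image -> image).
// Hypothetical helper, only for illustration.
inline std::uint64_t copyBytes(std::uint64_t width, std::uint64_t height,
                               std::uint64_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Bytes written per frame by the extra copies: every input layer is copied
// once before denoising, and the single result is copied back afterwards.
inline std::uint64_t extraCopyBytesPerFrame(std::uint64_t width, std::uint64_t height,
                                            std::uint64_t bytesPerPixel,
                                            std::uint64_t inputLayers)
{
    assert(inputLayers >= 1);
    return copyBytes(width, height, bytesPerPixel) * (inputLayers + 1);
}
```

With one input layer that is 2 x 14,745,600 = 29,491,200 bytes written per frame for either approach, since both do one copy in and one copy out.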

In terms of speed for the image conversion, that is probably very similar, but I haven’t tried if Optix denoiser directly works with those.

As for the denoiser working directly on tiled images, I haven’t heard about this.

Is post #5 answering your question?

yes, thank you

Hi, I’m currently trying the same, but I’m stuck at this part. From what I’ve understood so far, cudaExternalMemoryGetMappedBuffer gives me a device pointer to an external buffer (the one imported from Vulkan).

Now I put this pointer into an OptixImage2D, which gets used as the denoiser input:

void* devPtr = nullptr;
  cudaExternalMemoryGetMappedBuffer(&devPtr, cudaExtMemBuffer, &cudaExtBufferDesc);

  OptixImage2D inputLayer;
  inputLayer.data = (CUdeviceptr) devPtr;
  inputLayer.width = 1280;
  inputLayer.height = 720;
  inputLayer.rowStrideInBytes = 1280 * sizeof(float4);
  inputLayer.pixelStrideInBytes = sizeof(float4);
  inputLayer.format = OPTIX_PIXEL_FORMAT_FLOAT4;

I then invoke the denoiser with optixDenoiserInvoke with the input layer from above, and an output layer with the same dimensions.
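
For reference, the stride values above are just the tightly packed case. A small sketch of how they relate to each other and to the size of the imported buffer could look like this. LayerLayout and Float4 are hypothetical stand-ins written for this example; they are not the OptiX or CUDA types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the four floats of a CUDA float4 (16 bytes). Hypothetical.
struct Float4 { float x, y, z, w; };

// Hypothetical mirror of the size and stride fields set above.
struct LayerLayout {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t rowStrideInBytes;
    std::uint32_t pixelStrideInBytes;
};

// Tightly packed layout: pixels follow each other without gaps and
// rows follow each other without padding.
inline LayerLayout makePackedLayout(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t bytesPerPixel)
{
    return LayerLayout{width, height, width * bytesPerPixel, bytesPerPixel};
}

// True if a buffer of bufferBytes is large enough to hold the layer.
inline bool fitsInBuffer(const LayerLayout& l, std::size_t bufferBytes)
{
    assert(l.rowStrideInBytes >= l.width * l.pixelStrideInBytes);
    return static_cast<std::size_t>(l.rowStrideInBytes) * l.height <= bufferBytes;
}
```

A check like fitsInBuffer against the size passed to cudaExternalMemoryGetMappedBuffer is a cheap way to catch a mismatch between the Vulkan allocation and the layer description before invoking the denoiser.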

But now I don’t know how I can get the denoised result back to Vulkan. Any code example?


The way I solved it was with three Vulkan images with tiling=linear for the denoiser, plus three with optimal tiling for normal Vulkan usage. You can probably get away with fewer images by ping-ponging between some of them.


m_resultRGB = makeStorage();
    m_resultRGBLinear = makeLinear();
    m_resultAlbedo = makeStorage();
    m_resultAlbedoLinear = makeLinear();
    m_denoisedResult = makeLinear();

// create optix images by exporting linear images to fd and importing into cuda

    m_resultRGBOptix = DenoiserVulkanImage(m_resultRGBLinear);
    m_resultAlbedoOptix = DenoiserVulkanImage(m_resultAlbedoLinear);
    m_resultNormalOptix = DenoiserVulkanImage(m_resultNormalLinear);
    m_denoisedResultOptix = DenoiserVulkanImage(m_denoisedResult);

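// draw loop: blit from storage to linear after raytracing, before optix
// (the source layout and destination image arguments are missing from the
//  snippet as posted; the ones below are assumed from the image names above)

        vkCmdCopyImage(m_drawCmdBuffer, m_resultRGB,
                       VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, m_resultRGBLinear,
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copyRegion);

        vkCmdCopyImage(m_drawCmdBuffer, m_resultAlbedo,
                       VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, m_resultAlbedoLinear,
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copyRegion);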

// after optix, blit back



                   VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copyRegion);




// submit draw/raytracing/blit-to-linear m_drawCmdBuffer
    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &m_drawCmdBuffer;
    submitInfo.signalSemaphoreCount = 1;
    submitInfo.pSignalSemaphores = signalSemaphores;
    VkFence fence;
    vkCreateFence(device(), &fenceInfo, VK_NULL_HANDLE, &fence);
    vkQueueSubmit(m_instance.graphicsQueue, 1, &submitInfo, fence);
    // block until the linear images are fully written before handing them to optix
    vkWaitForFences(device(), 1, &fence, VK_TRUE, DEFAULT_FENCE_TIMEOUT);

        std::vector optixLayers {

        OptixDenoiserParams p {};
        m_optix->invoke(&p, optixLayers, 0,0, &m_denoisedResultOptix->optixImage());

        VkSemaphore waitSemaphores[] = {

        // entry 0 corresponds to semaphore 0 above etc
        VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_ALL_COMMANDS_BIT};

        // submit blit-to-rendertarget m_blitCmdBuffer
        submitInfo = {};
        submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
        submitInfo.commandBufferCount = 1;
        submitInfo.pCommandBuffers = &m_blitCmdBuffer;
        submitInfo.waitSemaphoreCount = 1;
        submitInfo.pWaitSemaphores = waitSemaphores;
        submitInfo.pWaitDstStageMask = waitStages;

        // the fence was signaled by the first submit, so reset it before reuse
        vkResetFences(device(), 1, &fence);
        vkQueueSubmit(m_instance.graphicsQueue, 1, &submitInfo, fence);

Thanks heeen, I found a way to share a VkBuffer (containing my image data) with the denoiser without any copying. Why do you use a VkImage?