Importing Vulkan image into OptixImage2D for Optix Denoiser

heeen · October 15, 2019, 8:48am

We would like to use Vulkan for RTX rendering, writing to Vulkan images that we can import into CUDA with the new interop functionality. However, it is not clear how to correctly import a Vulkan image into OptiX for denoising.

The images from Vulkan are VK_IMAGE_TILING_OPTIMAL, and when they are imported and fed through the denoiser it looks like the tiling is not resolved. I found a single post from someone who successfully imported a Vulkan image (gpu - Use Vulkan VkImage as a CUDA cuArray - Stack Overflow); however, the result is a CUDA mipmapped array, which I don’t know how to feed to OptiX. Can I get a device pointer from that, and can OptiX use it?

Do I need to copy into a linear-tiling image first before I can send it through OptiX?

dhart · October 15, 2019, 3:54pm

Hi heeen,

I noticed that the newest version of the CUDA toolkit has a sample for Vulkan-CUDA interop of textures. Could you check whether that sample answers your question?

1.1. CUDA 10.1 Update 2
Added 3_Imaging/vulkanImageCUDA. Demonstrates how to perform Vulkan Image-CUDA Interop.

–
David.

heeen · October 15, 2019, 4:53pm

edit: sorry I misread your post. let me read that new example.

heeen · October 15, 2019, 5:24pm

So if I’m reading it right, the example uses CUDA surfaces to access the Vulkan image. OptiX, however, expects a device pointer in OptixImage2D. So the question remains how to get a pointer to the data, or how to make OptiX accept a surface or a pointer to tiled image memory.

In the meantime I got it to work by adding intermediate images with linear tiling and blitting to and from those in the command buffers preceding and following the denoiser invocation. Could those steps be skipped, though?

mlefrancois · October 17, 2019, 7:02am

You first need to convert the Vulkan image to a Vulkan buffer. The internal image layout is not what the denoiser is looking for.

// Make the image layout eTransferSrcOptimal to copy to buffer
  vk::ImageSubresourceRange subresourceRange(vk::ImageAspectFlagBits::eColor, 0, 1, 0, 1);
  nvvkpp::image::setImageLayout(cmdBuff, imgIn.image, vk::ImageLayout::eGeneral, 
                                vk::ImageLayout::eTransferSrcOptimal, subresourceRange);

  // Copy the image to the buffer
  vk::BufferImageCopy copyRegion;
  copyRegion.setImageSubresource({vk::ImageAspectFlagBits::eColor, 0, 0, 1});
  copyRegion.setImageExtent(vk::Extent3D(m_imageSize, 1));
  cmdBuff.copyImageToBuffer(imgIn.image, vk::ImageLayout::eTransferSrcOptimal, pixelBufferOut, {copyRegion});

  // Put back the image as it was
  nvvkpp::image::setImageLayout(cmdBuff, imgIn.image, vk::ImageLayout::eTransferSrcOptimal,
                                vk::ImageLayout::eGeneral, subresourceRange);
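
As a rough illustration of why this copy is needed: a consumer of a raw device pointer assumes row-major addressing, while an optimally tiled image stores texels in an implementation-defined order. The real VK_IMAGE_TILING_OPTIMAL layout is opaque, so the 4x4 block scheme below (`toyTiledIndex`, `kTile`) is purely hypothetical and only shows that the same (x, y) can land at different offsets.

```cpp
#include <cassert>
#include <cstddef>

// Row-major (linear) texel index: what a reader of a plain pointer assumes.
inline std::size_t linearIndex(std::size_t x, std::size_t y, std::size_t width)
{
  return y * width + x;
}

// Hypothetical tile edge length, chosen only for this illustration.
constexpr std::size_t kTile = 4;

// Toy block tiling: texels are stored tile by tile, row-major inside each tile.
// This is NOT the real driver layout, which is opaque to the application.
inline std::size_t toyTiledIndex(std::size_t x, std::size_t y, std::size_t width)
{
  assert(width % kTile == 0);  // keep the toy simple: whole tiles only
  const std::size_t tilesPerRow = width / kTile;
  const std::size_t tile        = (y / kTile) * tilesPerRow + (x / kTile);
  const std::size_t inTile      = (y % kTile) * kTile + (x % kTile);
  return tile * kTile * kTile + inTile;
}
```

For an 8-texel-wide image, texel (1, 1) sits at index 9 in the linear layout but at index 5 in the toy tiled layout, which is the kind of mismatch that shows up as scrambled blocks when tiled memory is read as if it were linear.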

In my case, the buffer was allocated with the export flag vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32, which makes it possible to get a CUDA pointer to the Vulkan buffer.

This is how I’m doing it:

//--------------------------------------------------------------------------------------------------
// Get the Vulkan buffer and create the Cuda equivalent using the memory allocated in Vulkan
//
void DenoiserOptix::createBufferCuda(BufferCuda& buf)
{
  buf.handle = m_device.getMemoryWin32HandleKHR(
      {buf.bufVk.allocation, vk::ExternalMemoryHandleTypeFlagBits::eOpaqueWin32});
  auto req = m_device.getBufferMemoryRequirements(buf.bufVk.buffer);

  cudaExternalMemoryHandleDesc cudaExtMemHandleDesc{};
  cudaExtMemHandleDesc.type                = cudaExternalMemoryHandleTypeOpaqueWin32;
  cudaExtMemHandleDesc.handle.win32.handle = buf.handle;
  cudaExtMemHandleDesc.size                = req.size;

  // Import the Vulkan allocation into CUDA as an external memory object
  cudaExternalMemory_t cudaExtMem{};
  cudaError_t          result = cudaImportExternalMemory(&cudaExtMem, &cudaExtMemHandleDesc);
  assert(result == cudaSuccess);  // requires <cassert>

  cudaExternalMemoryBufferDesc cudaExtBufferDesc{};
  cudaExtBufferDesc.offset = 0;
  cudaExtBufferDesc.size   = req.size;
  cudaExtBufferDesc.flags  = 0;

  // Map the whole allocation to a CUDA device pointer
  result = cudaExternalMemoryGetMappedBuffer(&buf.cudaPtr, cudaExtMem, &cudaExtBufferDesc);
  assert(result == cudaSuccess);
}

As you have pointed out, the OptiX denoiser takes OptixImage2D, which can be constructed like this:

OptixImage2D inputLayer{
    (CUdeviceptr)m_pixelBufferIn.cudaPtr, imgSize.width, imgSize.height, 0, 0, pixelFormat};
OptixImage2D outputLayer = {
    (CUdeviceptr)m_pixelBufferOut.cudaPtr, imgSize.width, imgSize.height, 0, 0, pixelFormat};
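
Because the image-to-buffer copy above leaves bufferRowLength at 0, the buffer contents are tightly packed, so the strides can also be written out explicitly. Here is a minimal sketch of that arithmetic. `Image2DDesc`, `makeDenseImage` and `denseImageBytes` are hypothetical stand-ins so the snippet compiles without the OptiX headers; in real code the values would go into OptixImage2D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OptixImage2D (format field omitted).
// Field names follow the ones used in this thread.
struct Image2DDesc
{
  std::uint64_t data;                // device address (CUdeviceptr in real code)
  unsigned int  width;               // in pixels
  unsigned int  height;              // in pixels
  unsigned int  rowStrideInBytes;    // distance between the starts of two rows
  unsigned int  pixelStrideInBytes;  // distance between two adjacent pixels
};

// Describe a tightly packed image: no padding between pixels or rows.
inline Image2DDesc makeDenseImage(std::uint64_t devicePtr,
                                  unsigned int  width,
                                  unsigned int  height,
                                  unsigned int  bytesPerPixel)
{
  assert(width > 0 && height > 0 && bytesPerPixel > 0);
  return Image2DDesc{devicePtr, width, height, width * bytesPerPixel, bytesPerPixel};
}

// Number of bytes the shared buffer must hold for the image described above.
inline std::uint64_t denseImageBytes(const Image2DDesc& img)
{
  return static_cast<std::uint64_t>(img.rowStrideInBytes) * img.height;
}
```

For a 1280x720 image with 16-byte pixels (four floats) this gives a row stride of 20480 bytes and a payload of 14745600 bytes, which is a handy lower bound to compare against the size of the Vulkan allocation.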

The last step is to copy the buffer back to the Vulkan image; just apply the reverse operation.

Note: an example of Vulkan ray tracing + OptiX 7 denoiser, and many others, will soon be published under https://github.com/nvpro-samples

heeen · October 17, 2019, 7:27am

Which of these would you say is better/more efficient:

Copy the VkImage to a VkBuffer, map the buffer, copy the buffer back to the image (your solution).
Copy the VkImage (optimal tiling) to a VkImage (linear tiling), map the linear image memory, copy the linear image back to the optimal image (my solution).

They sound about the same to me: two copy operations each to resolve the tiling. I wonder if OptiX could support reading from tiled images directly in some future revision.
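
One practical difference between the two routes is row padding. A buffer filled by an image-to-buffer copy with bufferRowLength 0 is tightly packed, whereas a linear-tiling image reports its own row pitch through vkGetImageSubresourceLayout (VkSubresourceLayout::rowPitch), and that pitch may be larger than width times pixel size. A small sketch of the addressing, assuming the pitch has already been queried and that the consumer honors a padded row stride (`pixelOffset` and `rowPadding` are illustrative helper names):

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pixel (x, y) when consecutive rows are rowPitch bytes apart.
inline std::size_t pixelOffset(std::size_t x, std::size_t y,
                               std::size_t rowPitch, std::size_t bytesPerPixel)
{
  assert(rowPitch >= bytesPerPixel);
  return y * rowPitch + x * bytesPerPixel;
}

// Unused bytes at the end of each row compared with dense packing.
inline std::size_t rowPadding(std::size_t width, std::size_t rowPitch,
                              std::size_t bytesPerPixel)
{
  assert(rowPitch >= width * bytesPerPixel);
  return rowPitch - width * bytesPerPixel;
}
```

With the linear-image route, the queried row pitch rather than width * sizeof(float4) is what would go into rowStrideInBytes; with the buffer route the two values coincide.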

mlefrancois · October 17, 2019, 8:28am

In terms of speed for the image conversion, the two are probably very similar, but I haven’t tried whether the OptiX denoiser works directly with those.

As for the denoiser working directly on tiled images, I haven’t heard about this.

Does post #5 answer your question?

heeen · October 17, 2019, 1:29pm

yes, thank you

xilefmai · November 18, 2019, 5:09pm

Hi, I’m currently trying the same, but I’m stuck at this part. From what I’ve understood so far, cudaExternalMemoryGetMappedBuffer gives me a handle to an external buffer (the one imported from Vulkan).

Now I put this handle into an OptixImage2D, which gets used as the denoiser input:

void* devPtr = nullptr;
  cudaExternalMemoryGetMappedBuffer(&devPtr, cudaExtMemBuffer, &cudaExtBufferDesc);

  OptixImage2D inputLayer;
  inputLayer.data = (CUdeviceptr) devPtr;
  inputLayer.width = 1280;
  inputLayer.height = 720;
  inputLayer.rowStrideInBytes = 1280 * sizeof(float4);
  inputLayer.pixelStrideInBytes = sizeof(float4);
  inputLayer.format = OPTIX_PIXEL_FORMAT_FLOAT4;

I then invoke the denoiser with optixDenoiserInvoke with the input layer from above, and an output layer with the same dimensions.

But now I don’t know how I can get the denoised result back to Vulkan. Any code example?

heeen · November 19, 2019, 9:28am

xilefmai,

The way I solved it was with three Vulkan images with linear tiling and three with optimal tiling for regular Vulkan usage. You can probably get away with fewer images by ping-ponging between some of them.

pseudocode:

    m_resultRGB = makeStorage();
    m_resultRGBLinear = makeLinear();
    m_resultAlbedo = makeStorage();
    m_resultAlbedoLinear = makeLinear();
    m_denoisedResult = makeLinear();

    // create optix images by exporting linear images to fd and importing into cuda

    m_resultRGBOptix = DenoiserVulkanImage(m_resultRGBLinear);
    m_resultAlbedoOptix = DenoiserVulkanImage(m_resultAlbedoLinear);
    m_resultNormalOptix = DenoiserVulkanImage(m_resultNormalLinear);
    m_denoisedResultOptix = DenoiserVulkanImage(m_denoisedResult);

// draw loop: blit from storage to linear after raytracing, before optix

        setImageLayout(m_drawCmdBuffer,
                       m_resultRGB,
                       VK_IMAGE_LAYOUT_GENERAL,
                       VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL);

        setImageLayout(m_drawCmdBuffer,
                       m_resultRGBLinear,
                       VK_IMAGE_LAYOUT_UNDEFINED,
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);

        vkCmdCopyImage(m_drawCmdBuffer, m_resultRGB,
                       VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                       m_resultRGBLinear,
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copyRegion);

        setImageLayout(m_drawCmdBuffer,
                       m_resultAlbedo,
                       VK_IMAGE_LAYOUT_GENERAL,
                       VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL);

        setImageLayout(m_drawCmdBuffer,
                       m_resultAlbedoLinear,
                       VK_IMAGE_LAYOUT_UNDEFINED,
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);

        vkCmdCopyImage(m_drawCmdBuffer, m_resultAlbedo,
                       VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                       m_resultAlbedoLinear, 
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copyRegion);

// after optix, blit back

    setImageLayout(m_blitCmdBuffer,
                   m_renderTarget,
                   VK_IMAGE_LAYOUT_UNDEFINED,
                   VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);

    setImageLayout(m_blitCmdBuffer,
                   m_denoisedResult,
                   VK_IMAGE_LAYOUT_GENERAL,
                   VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL);

    vkCmdCopyImage(m_blitCmdBuffer,
                   m_denoisedResult,
                   VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                   m_renderTarget,
                   VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copyRegion);

    setImageLayout(m_blitCmdBuffer,
                   m_denoisedResult,
                   VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                   VK_IMAGE_LAYOUT_GENERAL);

    setImageLayout(m_blitCmdBuffer,
                   m_renderTarget,
                   VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
                   VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);

then:

// submit draw/raytracing/blit-to-linear m_drawCmdBuffer
    VkSubmitInfo submitInfo{};
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &m_drawCmdBuffer;
    submitInfo.signalSemaphoreCount = 1;
    submitInfo.pSignalSemaphores = signalSemaphores;
    /*...*/
    VkFence fence;
    vkCreateFence(device(), &fenceInfo, VK_NULL_HANDLE, &fence);
    vkQueueSubmit(m_instance.graphicsQueue, 1, &submitInfo, fence);
    vkWaitForFences(device(), 1, &fence, VK_TRUE, DEFAULT_FENCE_TIMEOUT);

        std::vector optixLayers {
            m_resultRGBOptix->optixImage(),
            m_resultAlbedoOptix->optixImage()
        };

        OptixDenoiserParams p {};
        m_optix->invoke(&p, optixLayers, 0,0, &m_denoisedResultOptix->optixImage());

        VkSemaphore waitSemaphores[] = {
            m_denoiseFinished
        };

        // entry 0 corresponds to semaphore 0 above etc
        VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_ALL_COMMANDS_BIT};

        // submit blit-to-rendertarget m_blitCmdBuffer
        submitInfo = {};
        submitInfo.commandBufferCount = 1;
        submitInfo.pCommandBuffers = &m_blitCmdBuffer;
        submitInfo.waitSemaphoreCount = 1;
        submitInfo.pWaitSemaphores = waitSemaphores;
        submitInfo.pWaitDstStageMask = waitStages;

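        // the fence was signaled by the first submit; reset it before reusing it
        vkResetFences(device(), 1, &fence);
        vkQueueSubmit(m_instance.graphicsQueue, 1, &submitInfo, fence);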

xilefmai · November 20, 2019, 6:55pm

Thanks heeen, I found a way to share a VkBuffer (containing my image data) with the denoiser without any copying. Why do you use a VkImage?
