CUDA_ERROR_INVALID_VALUE importing Vulkan VkImage


I’m trying to use a Vulkan VkImage as a CUarray. The ultimate objective is to feed the CUarray to the Video Encoder.

I understand that CUarray input is available in Video Codec SDK 9.0 and that CUDA-Vulkan interoperability is available in CUDA 10.

The base code is the 01 - Vulkan Gears demo, enriched with the saveScreenshot method from 09 - Capturing screenshots.

Instead of saving the snapshot image to a file, I’ll be sending the snapshot image into CUDA as a CUarray.

I’ve enabled the following instance and device extensions:

std::vector<const char*> instanceExtensions = {

std::vector<const char*> deviceExtensions = { 

I have a VkImage, created as follows:

// Create the linear tiled destination image to copy to and to read the memory from
        VkImageCreateInfo imageCreateCI(vks::initializers::imageCreateInfo());
        imageCreateCI.imageType = VK_IMAGE_TYPE_2D;
        // Note that vkCmdBlitImage (if supported) will also do format conversions if the swapchain color format would differ
        imageCreateCI.format = VK_FORMAT_R8G8B8A8_UNORM;
        imageCreateCI.extent.width = width;
        imageCreateCI.extent.height = height;
        imageCreateCI.extent.depth = 1;
        imageCreateCI.arrayLayers = 1;
        imageCreateCI.mipLevels = 1;
        imageCreateCI.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        imageCreateCI.samples = VK_SAMPLE_COUNT_1_BIT;
        imageCreateCI.tiling = VK_IMAGE_TILING_LINEAR;
        imageCreateCI.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

        VkExternalMemoryImageCreateInfoKHR extImageCreateInfo = {};
        extImageCreateInfo.sType = VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHR;

        /*
         * Indicate that the memory backing this image will be exported in an
         * fd. In some implementations, this may affect the call to
         * GetImageMemoryRequirements() with this image.
         */
        extImageCreateInfo.handleTypes |= VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

        imageCreateCI.pNext = &extImageCreateInfo;

        // Create the image
        VkImage dstImage;
        VK_CHECK_RESULT(vkCreateImage(device, &imageCreateCI, nullptr, &dstImage));
        // Create memory to back up the image
        VkMemoryRequirements memRequirements;
        VkMemoryAllocateInfo memAllocInfo(vks::initializers::memoryAllocateInfo());
        VkDeviceMemory dstImageMemory;
        vkGetImageMemoryRequirements(device, dstImage, &memRequirements);
        memAllocInfo.allocationSize = memRequirements.size;
        // Memory must be host visible to copy from
        memAllocInfo.memoryTypeIndex = vulkanDevice->getMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);

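        VkExportMemoryAllocateInfoKHR exportInfo = {};
        exportInfo.sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHR;
        exportInfo.handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

        memAllocInfo.pNext = &exportInfo;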

        VK_CHECK_RESULT(vkAllocateMemory(device, &memAllocInfo, nullptr, &dstImageMemory));
        VK_CHECK_RESULT(vkBindImageMemory(device, dstImage, dstImageMemory, 0));
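For readers who have not used the example framework: `vulkanDevice->getMemoryType` is a helper from the demo code, not a Vulkan call. The sketch below shows what such a lookup usually does, under the assumption that it follows the standard pattern from the Vulkan specification. The function name `findMemoryType` and the plain vector of flags are stand-ins invented for illustration so the block compiles without the Vulkan headers; only the three bit values are taken from Vulkan.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for VkMemoryPropertyFlagBits; the numeric values match the Vulkan headers.
constexpr uint32_t kDeviceLocal  = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Hypothetical helper: returns the index of the first memory type that is
// permitted by typeBits (VkMemoryRequirements::memoryTypeBits) and that has
// every property bit in `wanted`, or -1 if no memory type qualifies.
// typeFlags[i] plays the role of memoryTypes[i].propertyFlags.
int findMemoryType(const std::vector<uint32_t>& typeFlags,
                   uint32_t typeBits, uint32_t wanted) {
    assert(typeFlags.size() <= 32); // Vulkan exposes at most 32 memory types
    for (uint32_t i = 0; i < typeFlags.size(); ++i) {
        const bool allowed  = (typeBits & (1u << i)) != 0;
        const bool hasProps = (typeFlags[i] & wanted) == wanted;
        if (allowed && hasProps) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The requested property flags therefore decide which heap the image ends up in, which is why the flags passed on the allocation line matter for anything that later shares this memory.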

From there I’ll:

Get the Vulkan memory handle:

int CuEncoderImpl::getVulkanMemoryHandle(VkDevice device,
        VkDeviceMemory memory) {
    // Get handle to memory of the VkImage

    int fd = -1;
    VkMemoryGetFdInfoKHR fdInfo = { };
    fdInfo.sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR;
    fdInfo.memory = memory;
    fdInfo.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

    // The extension entry point has to be looked up at run time
    auto func = (PFN_vkGetMemoryFdKHR) vkGetDeviceProcAddr(device,
            "vkGetMemoryFdKHR");

    if (!func) {
        printf("Failed to locate function vkGetMemoryFdKHR\n");
        return -1;
    }

    VkResult r = func(device, &fdInfo, &fd);
    if (r != VK_SUCCESS) {
        printf("Failed executing vkGetMemoryFdKHR [%d]\n", r);
        return -1;
    }

    return fd;
}


Import the memory:

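    CUDA_EXTERNAL_MEMORY_HANDLE_DESC memDesc = { };
    memDesc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
    memDesc.handle.fd = getVulkanMemoryHandle(device, memory);
    memDesc.size = extent.width*extent.height*4;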

    CUDA_DRVAPI_CALL(cuImportExternalMemory(&externalMem, &memDesc));

And map the memory. This is the step that fails:

CUarray CuEncoderImpl::getCUDAArrayFromExternalMemory(const VkExtent3D &extent,const CUexternalMemory &m_extMem) {
    CUmipmappedArray m_mipmapArray;
    CUresult result = CUDA_SUCCESS;
    CUarray array;

    CUDA_ARRAY3D_DESCRIPTOR arrayDesc = { };
    arrayDesc.Width = extent.width;
    arrayDesc.Height = extent.height;
    arrayDesc.Depth = 0;
    arrayDesc.Format = CU_AD_FORMAT_UNSIGNED_INT32;
    arrayDesc.NumChannels = 4;
    arrayDesc.Flags = CUDA_ARRAY3D_SURFACE_LDST;

    CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC mipmapArrayDesc = { };
    mipmapArrayDesc.arrayDesc = arrayDesc;
    mipmapArrayDesc.numLevels = 1;
    mipmapArrayDesc.offset = 0;

    CUDA_DRVAPI_CALL(cuExternalMemoryGetMappedMipmappedArray(&m_mipmapArray, m_extMem, &mipmapArrayDesc));

    CUDA_DRVAPI_CALL(cuMipmappedArrayGetLevel(&array, m_mipmapArray, 0));
    return array;
}

I’ve been trying multiple combinations of the parameters, but have failed so far. The error points to an invalid parameter, but I’m not sure how to find out which one is wrong.

The only thing that has worked is mapping the Vulkan image memory to a host buffer and then copying it into the CUDA array… but I guess that’s expensive and I’d like to avoid it if possible.

I’ve seen the CUDA-Vulkan interoperability example, but that only deals with VkBuffers.

I’ve also seen the Video Codec sample AppMotionEstimationVkCuda, but this only seems to deal with a single channel image.

I guess there’s some limitation on the structure/layout/tiling of the VkImage to make it mappable into the CUarray, but I’ve been unable to find out what those limitations are.

Thanks very much in advance and best regards

For the record, I finally got this to work.

Some notes and the modifications I had to make to the code listed in the question:

  1. The tiling of the image that is going to be mapped had to be VK_IMAGE_TILING_OPTIMAL:

    imageCreateCI.tiling = VK_IMAGE_TILING_OPTIMAL;

  2. The memory for that image must be allocated with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT:

    memAllocInfo.memoryTypeIndex = vulkanDevice->getMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);

  3. The memory descriptor used when importing the memory should use the memory size that was returned in the memory requirements (size below is memRequirements.size from the code creating the image):

    CUDA_EXTERNAL_MEMORY_HANDLE_DESC memDesc = { };
    memDesc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
    memDesc.handle.fd = getVulkanMemoryHandle(device, memory);
    memDesc.size = size;

    CUDA_DRVAPI_CALL(cuImportExternalMemory(&externalMem, &memDesc));

  4. The mapped array is described as CU_AD_FORMAT_UNSIGNED_INT8 with four channels and with the CUDA_ARRAY3D_COLOR_ATTACHMENT flag:

    CUDA_ARRAY3D_DESCRIPTOR arrayDesc = { };
    arrayDesc.Width = extent.width;
    arrayDesc.Height = extent.height;
    arrayDesc.Depth = 0;
    arrayDesc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
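    arrayDesc.NumChannels = 4;
    arrayDesc.Flags = CUDA_ARRAY3D_COLOR_ATTACHMENT;

    CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC mipmapArrayDesc = { };
    mipmapArrayDesc.arrayDesc = arrayDesc;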
    mipmapArrayDesc.numLevels = 1;
    mipmapArrayDesc.offset = 0;

    CUDA_DRVAPI_CALL(cuExternalMemoryGetMappedMipmappedArray(&m_mipmapArray, m_extMem, &mipmapArrayDesc));

  5. On the NVENC side, the input buffers are allocated based on this descriptor:

    CUDA_ARRAY3D_DESCRIPTOR desc = { };
    desc.Format = CU_AD_FORMAT_UNSIGNED_INT32;
    desc.NumChannels = 4;
    desc.Height = getMaxEncodeHeight();
    desc.Width = getMaxEncodeWidth();
    desc.Depth = 0;

    CUDA_DRVAPI_CALL(cuArray3DCreate((CUarray *)&array, &desc));

  6. Finally, on each frame, the CUarray mapped from the VkImage is copied into the next available input buffer:

    CUDA_MEMCPY3D copy = { };

    copy.srcXInBytes = 0; /**< Source X in bytes */
    copy.srcY = 0; /**< Source Y */
    copy.srcZ = 0; /**< Source Z */
    copy.srcLOD = 0; /**< Source LOD */
    copy.srcMemoryType = CU_MEMORYTYPE_ARRAY;
    copy.srcArray = array;

    copy.dstXInBytes = 0; /**< Destination X in bytes */
    copy.dstY = 0; /**< Destination Y */
    copy.dstZ = 0; /**< Destination Z */
    copy.dstLOD = 0; /**< Destination LOD */
    copy.dstMemoryType = CU_MEMORYTYPE_ARRAY; /**< Destination memory type (host, device, array) */
    copy.dstArray = target;

    copy.WidthInBytes = extent.width * 4; /**< Width of 3D memory copy in bytes */
    copy.Height = extent.height; /**< Height of 3D memory copy */
    copy.Depth = 1; /**< Depth of 3D memory copy */
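On note 3, a small arithmetic sketch of why the two sizes can differ. The size reported in the memory requirements is chosen by the driver and may include padding, so it can be larger than the tightly packed width*height*4 used in the question. The rounding below is only an illustration of that effect, not how any particular driver computes the size; the helper names and the 64 KiB alignment in the usage note are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Tightly packed size of an RGBA8 image: 4 bytes per pixel, no padding.
constexpr uint64_t tightRgba8Size(uint32_t width, uint32_t height) {
    return static_cast<uint64_t>(width) * height * 4u;
}

// Rounds `size` up to the next multiple of `alignment`.
// Illustrative only: the real value comes from vkGetImageMemoryRequirements.
uint64_t roundUp(uint64_t size, uint64_t alignment) {
    assert(alignment != 0);
    return (size + alignment - 1) / alignment * alignment;
}
```

For a hypothetical 1280x720 image, tightRgba8Size gives 3686400 bytes, while roundUp(3686400, 65536) gives 3735552. A descriptor filled with the first number would describe less memory than was actually allocated, so passing memRequirements.size is the safer choice.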


After those changes, I was able to get it to work. A few of the changes were glaring mistakes on my side (like the size), a few things I found by carefully re-reading the documentation for the 100th time, others were guesses at hints in the documentation and, finally, there was a lot of trial and error.

There are still a few things that puzzle me, e.g. why the CUarray in the input buffer has four channels with format CU_AD_FORMAT_UNSIGNED_INT32, while the mapped array has four channels with format CU_AD_FORMAT_UNSIGNED_INT8. Perhaps the difference is with the flags: one is CUDA_ARRAY3D_COLOR_ATTACHMENT, while the other is CUDA_ARRAY3D_SURFACE_LDST.
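One piece of arithmetic that may be relevant to the format question: the size of one element of a CUDA array is the channel size times the number of channels, and VK_FORMAT_R8G8B8A8_UNORM is 4 bytes per texel. The sketch below only does that multiplication. The enum is a stand-in invented so the block compiles without cuda.h, and it covers just the two formats mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the two CUarray_format values discussed here.
enum class ChannelFormat {
    UnsignedInt8,  // corresponds to CU_AD_FORMAT_UNSIGNED_INT8
    UnsignedInt32  // corresponds to CU_AD_FORMAT_UNSIGNED_INT32
};

// Size in bytes of a single channel of the given format.
std::size_t bytesPerChannel(ChannelFormat format) {
    switch (format) {
        case ChannelFormat::UnsignedInt8:  return 1;
        case ChannelFormat::UnsignedInt32: return 4;
    }
    assert(false && "unknown channel format");
    return 0;
}

// Size in bytes of one array element (format size times channel count).
std::size_t bytesPerElement(ChannelFormat format, unsigned numChannels) {
    return bytesPerChannel(format) * numChannels;
}
```

With four channels, UnsignedInt8 gives 4 bytes per element, which matches an RGBA8 texel, while UnsignedInt32 gives 16. That is consistent with the INT8 descriptor being accepted for the mapped VkImage, but it does not by itself explain how NVENC interprets its own input buffers.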

Hi. That info is super precious! One thing I wonder:

In step 6, why do you copy from the CUDA array to the NVENC input buffer? I read that NVENC SDK 9 allows direct mapping of a CUarray, so if you map the Vulkan image to a CUarray then that should work, shouldn’t it?

Thanks for the code examples!

I might be wrong… but I was seeing the input buffers as a pool of reusable memory (in this case of CUarrays), while the other CUarray (the source of the copy) is a view into the memory of the Vulkan image, which in my case comes from elsewhere.

Perhaps if you have control over how your frame is generated, you could lock one CUarray (from the input buffers) and map it into a Vulkan image that you can use to render.
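To make the "pool of reusable memory" idea concrete, here is a minimal sketch of a round-robin pool. The class name `InputBufferPool` is hypothetical and is not part of the NVENC API. It only cycles through a fixed set of slots and does not track whether the encoder has finished with a slot, which a real implementation would need to do.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pool of reusable input slots handed out in round-robin order.
// Buffer would be a CUarray in the setup described above; any type works here.
template <typename Buffer>
class InputBufferPool {
public:
    explicit InputBufferPool(std::vector<Buffer> buffers)
        : buffers_(std::move(buffers)) {
        assert(!buffers_.empty());
    }

    // Returns the next slot and advances the cursor, wrapping at the end.
    Buffer& next() {
        Buffer& slot = buffers_[cursor_];
        cursor_ = (cursor_ + 1) % buffers_.size();
        return slot;
    }

    std::size_t size() const { return buffers_.size(); }

private:
    std::vector<Buffer> buffers_;
    std::size_t cursor_ = 0;
};
```

Each frame would then be copied from the mapped view of the Vulkan image into `pool.next()`, so the rendering side keeps its own image while the encoder works on the pool slots.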

Hi again. Are you encoding from RGBA? What NV_ENC_BUFFER_FORMAT type do you use on the encoder side?