CUDA_ERROR_INVALID_VALUE importing Vulkan VkImage

jsantander · March 30, 2019, 11:11am

Hello,

I’m trying to use a Vulkan VkImage as a CUarray. The ultimate objective is to feed the CUarray to the Video Encoder.

I understand that CUarray input is available in Video Codec SDK 9.0 and that CUDA-Vulkan interoperability is available in CUDA 10.

The base code comes from https://github.com/SaschaWillems/Vulkan; in particular, I’m using the 01 - Vulkan Gears demo, enriched with the saveScreenshot method from 09 - Capturing screenshots.

Instead of saving the snapshot image to a file, I’ll be sending the snapshot image into CUDA as a CUarray.

I’ve enabled the following instance and device extensions:

std::vector<const char*> instanceExtensions = {
    VK_EXT_DEBUG_REPORT_EXTENSION_NAME,
    VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME,
    VK_KHR_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
    VK_KHR_EXTERNAL_SEMAPHORE_CAPABILITIES_EXTENSION_NAME };

std::vector<const char*> deviceExtensions = { 
    VK_KHR_EXTERNAL_MEMORY_EXTENSION_NAME,
    VK_KHR_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
    VK_KHR_EXTERNAL_SEMAPHORE_EXTENSION_NAME,
    VK_KHR_EXTERNAL_SEMAPHORE_FD_EXTENSION_NAME };

I have a VkImage, created as follows:

// Create the linear tiled destination image to copy to and to read the memory from
        VkImageCreateInfo imageCreateCI(vks::initializers::imageCreateInfo());
        imageCreateCI.imageType = VK_IMAGE_TYPE_2D;
        // Note that vkCmdBlitImage (if supported) will also do format conversions if the swapchain color format would differ
        imageCreateCI.format = VK_FORMAT_R8G8B8A8_UNORM;
        imageCreateCI.extent.width = width;
        imageCreateCI.extent.height = height;
        imageCreateCI.extent.depth = 1;
        imageCreateCI.arrayLayers = 1;
        imageCreateCI.mipLevels = 1;
        imageCreateCI.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        imageCreateCI.samples = VK_SAMPLE_COUNT_1_BIT;
        imageCreateCI.tiling = VK_IMAGE_TILING_LINEAR;
        imageCreateCI.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
        imageCreateCI.usage = VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;

        VkExternalMemoryImageCreateInfoKHR extImageCreateInfo = {};

        /*
         * Indicate that the memory backing this image will be exported in an
         * fd. In some implementations, this may affect the call to
         * GetImageMemoryRequirements() with this image.
         */
        extImageCreateInfo.sType = VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHR;
        extImageCreateInfo.handleTypes |= VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

        imageCreateCI.pNext = &extImageCreateInfo;

        // Create the image
        VkImage dstImage;
        VK_CHECK_RESULT(vkCreateImage(device, &imageCreateCI, nullptr, &dstImage));
        // Create memory to back up the image
        VkMemoryRequirements memRequirements;
        VkMemoryAllocateInfo memAllocInfo(vks::initializers::memoryAllocateInfo());
        VkDeviceMemory dstImageMemory;
        vkGetImageMemoryRequirements(device, dstImage, &memRequirements);
        memAllocInfo.allocationSize = memRequirements.size;
        // Memory must be host visible to copy from
        memAllocInfo.memoryTypeIndex = vulkanDevice->getMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);

        VkExportMemoryAllocateInfoKHR exportInfo = {};
        exportInfo.sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHR;
        exportInfo.handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

        memAllocInfo.pNext = &exportInfo;


        VK_CHECK_RESULT(vkAllocateMemory(device, &memAllocInfo, nullptr, &dstImageMemory));
        VK_CHECK_RESULT(vkBindImageMemory(device, dstImage, dstImageMemory, 0));

From there I’ll:

Get the Vulkan memory handle:

int CuEncoderImpl::getVulkanMemoryHandle(VkDevice device,
        VkDeviceMemory memory) {
    // Get handle to memory of the VkImage

    int fd = -1;
    VkMemoryGetFdInfoKHR fdInfo = { };
    fdInfo.sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR;
    fdInfo.memory = memory;
    fdInfo.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;

    auto func = (PFN_vkGetMemoryFdKHR) vkGetDeviceProcAddr(device,
            "vkGetMemoryFdKHR");

    if (!func) {
        printf("Failed to locate function vkGetMemoryFdKHR\n");
        return -1;
    }

    VkResult r = func(device, &fdInfo, &fd);
    if (r != VK_SUCCESS) {
        printf("Failed executing vkGetMemoryFdKHR [%d]\n", r);
        return -1;
    }

    return fd;

}

Import the memory:

CUDA_EXTERNAL_MEMORY_HANDLE_DESC memDesc = { };
    memDesc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
    memDesc.handle.fd = getVulkanMemoryHandle(device, memory);
    memDesc.size = extent.width*extent.height*4;

    CUDA_DRVAPI_CALL(cuImportExternalMemory(&externalMem, &memDesc));

And map the memory. This is the step that fails:

CUarray CuEncoderImpl::getCUDAArrayFromExternalMemory(const VkExtent3D &extent,const CUexternalMemory &m_extMem) {
    CUmipmappedArray m_mipmapArray;
    CUresult result = CUDA_SUCCESS;
    CUarray array;

    CUDA_ARRAY3D_DESCRIPTOR arrayDesc = { };
    arrayDesc.Width = extent.width;
    arrayDesc.Height = extent.height;
    arrayDesc.Depth = 0;
    arrayDesc.Format = CU_AD_FORMAT_UNSIGNED_INT32;
    arrayDesc.NumChannels = 4;
    arrayDesc.Flags = CUDA_ARRAY3D_SURFACE_LDST;

    CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC mipmapArrayDesc = { };
    mipmapArrayDesc.arrayDesc = arrayDesc;
    mipmapArrayDesc.numLevels = 1;
    mipmapArrayDesc.offset = 0;

    CUDA_DRVAPI_CALL(cuExternalMemoryGetMappedMipmappedArray(&m_mipmapArray, m_extMem, &mipmapArrayDesc));

    CUDA_DRVAPI_CALL(cuMipmappedArrayGetLevel(&array, m_mipmapArray, 0));
    return array;
}

I’ve been trying multiple combinations of the parameters, but have failed so far. The error points to an invalid parameter, but I’m not sure how to find out what’s wrong.

The only thing that has worked is mapping the Vulkan image memory to a host buffer and then copying it into the CUDA array, but I guess that’s expensive and I’d like to avoid it if possible.

I’ve seen the CUDA-Vulkan interoperability example, but that only deals with VkBuffers.

I’ve also seen the Video Codec sample AppMotionEstimationVkCuda, but this only seems to deal with a single channel image.

I guess there are some limitations on the structure/layout/tiling of the VkImage that make it mappable into a CUarray, but I’ve been unable to find what those limitations are.

Thanks very much in advance and best regards

jsantander · April 6, 2019, 10:59am

For the record, I finally got this to work.

Some notes and the modifications I had to do to the code listed in the question:

1. The tiling of the image that is going to be mapped had to be VK_IMAGE_TILING_OPTIMAL:
```
imageCreateCI.tiling = VK_IMAGE_TILING_OPTIMAL;
```
2. The memory for that image must be allocated with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT:
```
memAllocInfo.memoryTypeIndex = vulkanDevice->getMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
```
3. The memory descriptor used when importing the memory should use the size that was returned in the memory requirements (size below is memRequirements.size from the code creating the image):
```
CUDA_EXTERNAL_MEMORY_HANDLE_DESC memDesc = { };
memDesc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
memDesc.handle.fd = getVulkanMemoryHandle(device, memory);
memDesc.size = size;

CUDA_DRVAPI_CALL(cuImportExternalMemory(&externalMem, &memDesc));
```
4. The mapped array is described as CU_AD_FORMAT_UNSIGNED_INT8 with four channels and the CUDA_ARRAY3D_COLOR_ATTACHMENT flag:
```
CUDA_ARRAY3D_DESCRIPTOR arrayDesc = { };
arrayDesc.Width = extent.width;
arrayDesc.Height = extent.height;
arrayDesc.Depth = 0;
arrayDesc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
arrayDesc.NumChannels = 4;
arrayDesc.Flags = CUDA_ARRAY3D_COLOR_ATTACHMENT;

CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC mipmapArrayDesc = { };
mipmapArrayDesc.arrayDesc = arrayDesc;
mipmapArrayDesc.numLevels = 1;
mipmapArrayDesc.offset = 0;

CUDA_DRVAPI_CALL(cuExternalMemoryGetMappedMipmappedArray(&m_mipmapArray, m_extMem, &mipmapArrayDesc));
```
5. On the NVENC side, the input buffers are allocated based on this descriptor:
```
CUDA_ARRAY3D_DESCRIPTOR desc = { }; // zero-initialize before filling in the fields

desc.Format = CU_AD_FORMAT_UNSIGNED_INT32;
desc.NumChannels = 4;
desc.Height = getMaxEncodeHeight();
desc.Width = getMaxEncodeWidth();
desc.Depth = 0;
desc.Flags = CUDA_ARRAY3D_SURFACE_LDST;

CUDA_DRVAPI_CALL(cuArray3DCreate((CUarray *)&array, &desc));
```
6. Finally, on each frame, the CUarray mapped from the VkImage is copied into the next available input buffer:
```
CUDA_MEMCPY3D copy = { }; // zero-initialize so fields not set below are not left as garbage

copy.srcXInBytes = 0; // Source X in bytes
copy.srcY = 0; // Source Y
copy.srcZ = 0; // Source Z
copy.srcLOD = 0; // Source LOD
copy.srcMemoryType = CU_MEMORYTYPE_ARRAY;
copy.srcArray = array; // CUarray mapped from the VkImage

copy.dstXInBytes = 0; // Destination X in bytes
copy.dstY = 0; // Destination Y
copy.dstZ = 0; // Destination Z
copy.dstLOD = 0; // Destination LOD
copy.dstMemoryType = CU_MEMORYTYPE_ARRAY; // Destination memory type (host, device, array)
copy.dstArray = target; // next available NVENC input buffer

copy.WidthInBytes = extent.width * 4; // Width of 3D memory copy in bytes
copy.Height = extent.height; // Height of 3D memory copy
copy.Depth = 1; // Depth of 3D memory copy

CUDA_DRVAPI_CALL(cuMemcpy3D(&copy));
```
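The changes listed above can be condensed into a small self-check. This is only a sketch: `InteropSettings`, `Tiling`, `MemoryKind` and `findInteropProblems` are hypothetical names invented for illustration, and they mirror the Vulkan/CUDA settings as plain values, so nothing here calls the real APIs. It encodes the configuration that worked in this thread, not documented requirements.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical local mirrors of the settings discussed in this thread.
// These are NOT the Vulkan or CUDA types, just plain values.
enum class Tiling { Linear, Optimal };
enum class MemoryKind { HostVisible, DeviceLocal };

struct InteropSettings {
    Tiling tiling;                   // imageCreateCI.tiling
    MemoryKind memory;               // property used to pick memoryTypeIndex
    std::uint64_t importSize;        // value passed as memDesc.size
    std::uint64_t requirementsSize;  // memRequirements.size reported by Vulkan
    unsigned bitsPerChannel;         // 8 for CU_AD_FORMAT_UNSIGNED_INT8
    unsigned numChannels;            // arrayDesc.NumChannels
};

// Returns one message per setting that differs from the configuration
// that worked for an R8G8B8A8 image in this thread.
std::vector<std::string> findInteropProblems(const InteropSettings& s) {
    std::vector<std::string> problems;
    if (s.tiling != Tiling::Optimal)
        problems.push_back("tiling should be VK_IMAGE_TILING_OPTIMAL");
    if (s.memory != MemoryKind::DeviceLocal)
        problems.push_back("memory should be VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT");
    if (s.importSize != s.requirementsSize)
        problems.push_back("memDesc.size should equal memRequirements.size");
    if (s.bitsPerChannel != 8 || s.numChannels != 4)
        problems.push_back("array should be 4 channels of CU_AD_FORMAT_UNSIGNED_INT8");
    return problems;
}
```

Filling the struct with the values from the original question (linear tiling, host-visible memory, a hand-computed size, 32-bit channels) reports all four problems; the corrected values report none.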



After those changes, I was able to get it to work. A few of the changes fixed glaring mistakes on my side (like the size), a few came from carefully re-reading the documentation for the 100th time, others were guesses based on hints in the documentation and, finally, a lot came from trial and error.

There are still a few things that puzzle me, e.g. why the CUarray in the input buffer has four channels with format CU_AD_FORMAT_UNSIGNED_INT32, while the mapped array has four channels with format CU_AD_FORMAT_UNSIGNED_INT8. Perhaps the difference is in the flags: one is CUDA_ARRAY3D_COLOR_ATTACHMENT, while the other is CUDA_ARRAY3D_SURFACE_LDST.
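To make the size difference between the two descriptors concrete, the arithmetic can be sketched as below. `ChannelFormat`, `bytesPerElement` and `rowBytes` are hypothetical helpers written for this illustration, not CUDA API; they only rely on the fact that an 8-bit channel is 1 byte and a 32-bit channel is 4 bytes. The sketch does not explain why the encoder accepts the copy, it only shows the numbers involved.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two CUarray_format values mentioned above.
enum class ChannelFormat { UnsignedInt8, UnsignedInt32 };

// Size of one channel in bytes: 1 for 8-bit, 4 for 32-bit.
constexpr std::size_t bytesPerChannel(ChannelFormat f) {
    return f == ChannelFormat::UnsignedInt8 ? 1 : 4;
}

// Size of one array element (texel) in bytes.
constexpr std::size_t bytesPerElement(ChannelFormat f, std::size_t numChannels) {
    return bytesPerChannel(f) * numChannels;
}

// Size of one row of `width` elements in bytes.
constexpr std::size_t rowBytes(ChannelFormat f, std::size_t numChannels,
                               std::size_t width) {
    return bytesPerElement(f, numChannels) * width;
}

// Mapped array: 4 x 8-bit = 4 bytes per element, which matches
// VK_FORMAT_R8G8B8A8_UNORM and the `extent.width * 4` used for WidthInBytes.
static_assert(bytesPerElement(ChannelFormat::UnsignedInt8, 4) == 4,
              "RGBA8 element is 4 bytes");

// Input-buffer descriptor: 4 x 32-bit = 16 bytes per element.
static_assert(bytesPerElement(ChannelFormat::UnsignedInt32, 4) == 16,
              "4 x 32-bit element is 16 bytes");
```

So the copy's WidthInBytes of `extent.width * 4` corresponds to a full row of the mapped 8-bit array, and to only a quarter of a row of the same width as described by the 32-bit descriptor.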

SasMaster · November 7, 2019, 12:24pm

Hi. That info is super precious! One thing I wonder:

Step 6: why do you copy from the CUDA array to the NVENC input buffer? I read that NVENC SDK 9 allows direct mapping of a CUarray, so if you map the Vulkan image to a CUarray, that should work, shouldn’t it?

Thanks for the code examples!

jsantander · November 8, 2019, 7:26pm

I might be wrong, but I was seeing the input buffers as a pool of reusable memory (in this case, of CUarrays), while the other CUarray (the source of the copy) is a view into the memory of the Vulkan image, which in my case comes from elsewhere.

Perhaps if you have control over how your frame is generated, you could lock one CUarray (from the input buffers) and map it into a Vulkan image that you can render to.
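The "pool of reusable memory" idea can be sketched as a simple round-robin rotation. `InputBufferPool` is a hypothetical class written for this illustration and is not part of the NVENC SDK; `Buffer` stands in for whatever handle the encoder uses (a CUarray in this discussion). The sketch assumes the caller makes sure the encoder is done with a buffer before it comes around again, which is not handled here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical round-robin pool of reusable input buffers.
template <typename Buffer>
class InputBufferPool {
public:
    explicit InputBufferPool(std::vector<Buffer> buffers)
        : buffers_(std::move(buffers)) {
        if (buffers_.empty())
            throw std::invalid_argument("pool needs at least one buffer");
    }

    // Returns the next buffer in rotation; the caller copies the current
    // frame into it and then hands it to the encoder.
    Buffer& next() {
        Buffer& b = buffers_[cursor_];
        cursor_ = (cursor_ + 1) % buffers_.size();
        return b;
    }

    std::size_t size() const { return buffers_.size(); }

private:
    std::vector<Buffer> buffers_;
    std::size_t cursor_ = 0;  // index of the buffer handed out next
};
```

With three buffers, four consecutive calls to `next()` return the first, second and third buffer and then the first again, while the mapped Vulkan image stays a separate, fixed source for each copy.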

SasMaster · December 1, 2019, 2:23pm

Hi again. Are you encoding from RGBA? What NV_ENC_BUFFER_FORMAT type do you use on the encoder side?
