Importing NvBufSurfaces into Vulkan

I am importing NvBufSurfaces (created for a V4L2 decode pipeline) into Vulkan by passing each surface's bufferDesc as the fd of VkImportMemoryFdInfoKHR.

This works great for a single stream.

Once I spin up a second stream in parallel, things can go wrong. Sometimes NvBufSurface returns fds for the second stream that have the same fd number as ones already handed to the first stream. It seems that Vulkan takes ownership of these DMA buffers and NvBufSurface thinks they are up for grabs? The two streams then interfere, and data from one gets smeared onto the other. This doesn't happen every time; sometimes both streams run in parallel as intended.

I used to use nvbuf_utils.h on Jetpack 4.6 and things worked perfectly fine. Since moving to nvbufsurface.h and Jetpack 5.1 this issue started to appear.

Note: Since this issue appears inconsistently, I do two things to make it show up more often:

  1. I add a 750 ms sleep to the first stream so that it waits until the second stream starts, which makes the two run more in parallel. This causes the issue to appear more often, so I currently believe there is a race condition somewhere.
  2. I increase the buffer count. I normally run with 4 buffers.

Am I using the wrong memory type? Or is this functionality just not supported? Or should I forgo NvBufSurface entirely, create the memory with Vulkan, and export it as DMA buffers to V4L2 directly?

The NvBufSurface is created with these params:

bool create_nvbufsurface(int* fd)
{
	NvBufSurface* nvbuf_surf = NULL;
	NvBufSurfaceAllocateParams dma_buf_create_params = {0};

	dma_buf_create_params.params.width = vd->width;
	dma_buf_create_params.params.height = vd->height;
	dma_buf_create_params.params.layout = NVBUF_LAYOUT_BLOCK_LINEAR;
	dma_buf_create_params.params.memType = NVBUF_MEM_SURFACE_ARRAY;
	dma_buf_create_params.memtag = NvBufSurfaceTag_VIDEO_DEC;
	dma_buf_create_params.params.colorFormat = NVBUF_COLOR_FORMAT_NV24;
	int ret = NvBufSurfaceAllocate(&nvbuf_surf, 1, &dma_buf_create_params);
	if (ret < 0)
	{
		printf("Failed to create NvBufSurface (ret=%d)\n", ret);
		return false;
	}

	*fd = nvbuf_surf->surfaceList[0].bufferDesc;
	nvbuf_surf->numFilled = 1;
	return true;
}

Vulkan import

bool create_vkbufsurface(nvDmaBuffer* buffer, VkDeviceMemory* mem)
{
	NvBufSurface* nvbuf_surf = NULL;
	NvBufSurfaceFromFd(buffer->fd, (void**) &nvbuf_surf);
	if (!nvbuf_surf)
	{
		printf("Failed to get NvBufSurface from Fd");
		return false;
	}

	VkImportMemoryFdInfoKHR import_fd_info = {};
	import_fd_info.sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR;
	import_fd_info.fd = buffer->fd;
	import_fd_info.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT;

	VkMemoryAllocateInfo allocate_info = {};
	allocate_info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
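	allocate_info.allocationSize = nvbuf_surf->surfaceList[0].planeParams.psize[0] + nvbuf_surf->surfaceList[0].planeParams.psize[1];
	/* NOTE: memoryTypeIndex is an index into the device's memory types, not a
	 * property flag; VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT evaluates to 1 here. */
	allocate_info.memoryTypeIndex = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;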
	allocate_info.pNext = &import_fd_info;

	if (VK_SUCCESS != vkAllocateMemory(device, &allocate_info, NULL, mem))
	{
		printf("Failed to allocate memory");
		return false;
	}

	return true;
}
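
Editor's note on memoryTypeIndex: the field expects an index, so it is usually chosen by scanning the device's memory types rather than by assigning a property flag. The following is a minimal sketch of that selection logic only. It deliberately avoids the Vulkan headers: pick_memory_type and DEVICE_LOCAL_BIT are hypothetical stand-ins, and in real code the type_bits mask would presumably come from vkGetMemoryFdPropertiesKHR (memoryTypeBits) and the per-type flags from vkGetPhysicalDeviceMemoryProperties. Whether this affects the fd reuse described above is not established in this thread.

```c
#include <stdint.h>

/* Stand-in for VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT so that this sketch
 * compiles without the Vulkan headers (assumption for illustration). */
#define DEVICE_LOCAL_BIT 0x1u

/*
 * Return the first memory type index that is allowed by type_bits and has
 * all of the required property flags, or -1 if none matches.
 *
 * type_bits:  bitmask of usable types (bit i set => type i is allowed)
 * type_flags: property flags of each memory type the device exposes
 * type_count: number of entries in type_flags (at most 32 are examined)
 * required:   property flags the chosen type must have
 */
int pick_memory_type(uint32_t type_bits, const uint32_t *type_flags,
                     uint32_t type_count, uint32_t required)
{
    for (uint32_t i = 0; i < type_count && i < 32; ++i) {
        int allowed = (int)((type_bits >> i) & 1u);
        int has_flags = (type_flags[i] & required) == required;
        if (allowed && has_flags)
            return (int)i;
    }
    return -1;
}
```

The returned index, when it is not -1, is what would be assigned to allocate_info.memoryTypeIndex.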

Timeline of good creation of DMA buffers

Stream 0; NvBufSurfaceAllocate created buffer with fd: 146
Stream 0; NvBufSurfaceAllocate created buffer with fd: 147
Stream 0; NvBufSurfaceAllocate created buffer with fd: 148
Stream 0; NvBufSurfaceAllocate created buffer with fd: 149
Stream 0; NvBufSurfaceAllocate created buffer with fd: 150
Stream 0; NvBufSurfaceAllocate created buffer with fd: 152
Stream 0; NvBufSurfaceAllocate created buffer with fd: 151
Stream 0; NvBufSurfaceAllocate created buffer with fd: 153
Stream 0; NvBufSurfaceAllocate created buffer with fd: 154
Stream 0; NvBufSurfaceAllocate created buffer with fd: 155
Stream 0; NvBufSurfaceAllocate created buffer with fd: 156
Stream 0; NvBufSurfaceAllocate created buffer with fd: 157
Stream 0; NvBufSurfaceAllocate created buffer with fd: 158
Stream 0; NvBufSurfaceAllocate created buffer with fd: 159
Stream 0; Importing DMA Bufs: 146
Stream 0; Importing DMA Bufs: 147
Stream 0; Importing DMA Bufs: 148
Stream 0; Importing DMA Bufs: 149
Stream 0; Importing DMA Bufs: 150
Stream 0; Importing DMA Bufs: 152
Stream 0; Importing DMA Bufs: 151
Stream 0; Importing DMA Bufs: 153
Stream 0; Importing DMA Bufs: 154
Stream 0; Importing DMA Bufs: 155
Stream 0; Importing DMA Bufs: 156
Stream 0; Importing DMA Bufs: 157
Stream 0; Importing DMA Bufs: 158
Stream 0; Importing DMA Bufs: 159
Stream 1; NvBufSurfaceAllocate created buffer with fd: 227
Stream 1; NvBufSurfaceAllocate created buffer with fd: 228
Stream 1; NvBufSurfaceAllocate created buffer with fd: 230
Stream 1; NvBufSurfaceAllocate created buffer with fd: 229
Stream 1; NvBufSurfaceAllocate created buffer with fd: 231
Stream 1; NvBufSurfaceAllocate created buffer with fd: 232
Stream 1; NvBufSurfaceAllocate created buffer with fd: 233
Stream 1; NvBufSurfaceAllocate created buffer with fd: 234
Stream 1; NvBufSurfaceAllocate created buffer with fd: 235
Stream 1; NvBufSurfaceAllocate created buffer with fd: 236
Stream 1; NvBufSurfaceAllocate created buffer with fd: 237
Stream 1; NvBufSurfaceAllocate created buffer with fd: 238
Stream 1; NvBufSurfaceAllocate created buffer with fd: 240
Stream 1; NvBufSurfaceAllocate created buffer with fd: 239
Stream 1; Importing DMA Bufs: 227
Stream 1; Importing DMA Bufs: 228
Stream 1; Importing DMA Bufs: 230
Stream 1; Importing DMA Bufs: 229
Stream 1; Importing DMA Bufs: 231
Stream 1; Importing DMA Bufs: 232
Stream 1; Importing DMA Bufs: 233
Stream 1; Importing DMA Bufs: 234
Stream 1; Importing DMA Bufs: 235
Stream 1; Importing DMA Bufs: 236
Stream 1; Importing DMA Bufs: 237
Stream 1; Importing DMA Bufs: 238
Stream 1; Importing DMA Bufs: 240
Stream 1; Importing DMA Bufs: 239

Timeline of bad creation of DMA buffers (notice how fd numbers are reused for Stream 1)

Stream 0; NvBufSurfaceAllocate created buffer with fd: 148
Stream 0; NvBufSurfaceAllocate created buffer with fd: 147
Stream 0; NvBufSurfaceAllocate created buffer with fd: 149
Stream 0; NvBufSurfaceAllocate created buffer with fd: 150
Stream 0; NvBufSurfaceAllocate created buffer with fd: 151
Stream 0; NvBufSurfaceAllocate created buffer with fd: 152
Stream 0; NvBufSurfaceAllocate created buffer with fd: 153
Stream 0; NvBufSurfaceAllocate created buffer with fd: 154
Stream 0; NvBufSurfaceAllocate created buffer with fd: 155
Stream 0; NvBufSurfaceAllocate created buffer with fd: 156
Stream 0; NvBufSurfaceAllocate created buffer with fd: 157
Stream 0; NvBufSurfaceAllocate created buffer with fd: 158
Stream 0; NvBufSurfaceAllocate created buffer with fd: 159
Stream 0; NvBufSurfaceAllocate created buffer with fd: 160
Stream 0; Importing DMA Bufs: 148
Stream 0; Importing DMA Bufs: 147
Stream 0; Importing DMA Bufs: 149
Stream 0; Importing DMA Bufs: 150
Stream 0; Importing DMA Bufs: 151
Stream 0; Importing DMA Bufs: 152
Stream 0; Importing DMA Bufs: 153
Stream 0; Importing DMA Bufs: 154
Stream 0; Importing DMA Bufs: 155
Stream 0; Importing DMA Bufs: 156
Stream 0; Importing DMA Bufs: 157
Stream 0; Importing DMA Bufs: 158
Stream 0; Importing DMA Bufs: 159
Stream 0; Importing DMA Bufs: 160
Stream 1; NvBufSurfaceAllocate created buffer with fd: 151
Stream 1; NvBufSurfaceAllocate created buffer with fd: 152
Stream 1; NvBufSurfaceAllocate created buffer with fd: 153
Stream 1; NvBufSurfaceAllocate created buffer with fd: 154
Stream 1; NvBufSurfaceAllocate created buffer with fd: 155
Stream 1; NvBufSurfaceAllocate created buffer with fd: 156
Stream 1; NvBufSurfaceAllocate created buffer with fd: 157
Stream 1; NvBufSurfaceAllocate created buffer with fd: 158
Stream 1; NvBufSurfaceAllocate created buffer with fd: 159
Stream 1; NvBufSurfaceAllocate created buffer with fd: 245
Stream 1; NvBufSurfaceAllocate created buffer with fd: 246
Stream 1; NvBufSurfaceAllocate created buffer with fd: 160
Stream 1; NvBufSurfaceAllocate created buffer with fd: 247
Stream 1; Importing DMA Bufs: 151
Stream 1; Importing DMA Bufs: 152
Stream 1; Importing DMA Bufs: 153
Stream 1; Importing DMA Bufs: 154
Stream 1; Importing DMA Bufs: 155
Stream 1; Importing DMA Bufs: 156
Stream 1; Importing DMA Bufs: 157
Stream 1; Importing DMA Bufs: 158
Stream 1; Importing DMA Bufs: 159
Stream 1; Importing DMA Bufs: 245
Stream 1; Importing DMA Bufs: 246
Stream 1; Importing DMA Bufs: 160
Stream 1; Importing DMA Bufs: 247
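
Editor's note: the reuse in the timeline above can be checked mechanically instead of by eye. This is a small sketch, not part of the original test program; count_shared_fds is a hypothetical helper that counts how many fd numbers allocated for one stream were already allocated for the other.

```c
#include <stddef.h>

/* Count how many fd numbers in b also appear in a.
 * Each entry of b is counted at most once. */
size_t count_shared_fds(const int *a, size_t na, const int *b, size_t nb)
{
    size_t shared = 0;
    for (size_t i = 0; i < nb; ++i) {
        for (size_t j = 0; j < na; ++j) {
            if (b[i] == a[j]) {
                ++shared;
                break; /* stop at the first match for this entry */
            }
        }
    }
    return shared;
}
```

Fed with the Stream 0 and Stream 1 allocations from the bad timeline, it reports 10 shared fd numbers (151 through 160); for the good timeline it reports 0. A check like this could run right after allocation to flag a bad run before any frames are decoded.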

Hi,
Please share a sample and steps for replicating the issue so that we can check with our teams.

Here is a test I spun up for you. I'm running a Jetson NX with Jetpack 5.1. It compiles and runs directly on a Jetson with make. If you have any more questions, I'll be around.

Additional thoughts: it would make sense if, when Vulkan takes ownership of the buffer behind the fd, it closes the fd, so that once that happens the fd number is up for grabs by NvBufSurface again. If this is the case, then how do you use V4L2 with V4L2_MEMORY_DMABUF and pass the results to Vulkan? I believe the DMABUF implementation of V4L2 uses that fd, which causes my data to get scrambled.
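
Editor's note: the Vulkan specification does state that a successful fd import transfers ownership of the file descriptor to the implementation, and that the application must not use it afterwards. Separately, POSIX hands out the lowest free fd number, so a number that has been closed can come back from the next allocation. The sketch below shows only that second mechanism and one common way around it, duplicating the fd and giving the copy away. It does not call Vulkan or NvBufSurface: consume_fd is a hypothetical stand-in for an API that takes ownership of an fd and closes it, and /dev/null stands in for a DMA buffer. Whether this is what actually happens on Jetpack 5.1 is not confirmed in this thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for an API that takes ownership of the fd it is
 * given and closes it at some later point. */
static void consume_fd(int fd)
{
    close(fd);
}

/* Give the original fd away, then open a new one.
 * Returns 1 if the new fd reuses the original number, 0 if not, -1 on error. */
int number_reused_without_dup(void)
{
    int original = open("/dev/null", O_RDONLY);
    if (original < 0)
        return -1;
    consume_fd(original);              /* the number is now free again */
    int next = open("/dev/null", O_RDONLY);
    int reused = (next == original);
    if (next >= 0)
        close(next);
    return reused;
}

/* Give a dup() of the fd away and keep the original open.
 * Returns 1 if a later open reuses the original number, 0 if not, -1 on error. */
int number_reused_with_dup(void)
{
    int original = open("/dev/null", O_RDONLY);
    if (original < 0)
        return -1;
    int copy = dup(original);          /* second fd for the same open file */
    if (copy < 0) {
        close(original);
        return -1;
    }
    consume_fd(copy);                  /* only the copy is closed */
    int next = open("/dev/null", O_RDONLY);
    int reused = (next == original);
    if (next >= 0)
        close(next);
    close(original);
    return reused;
}
```

Applied to the import code in the first post, the equivalent change would be to pass dup(buffer->fd) as import_fd_info.fd and keep using the original fd for V4L2 and NvBufSurface. That is an untested suggestion, and the duplicate would still need to be closed by the caller if the import fails.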

Note: libvulkan-dev and vulkan-validationlayers-dev (apt packages) need to be installed in order to compile.
My last run of this code resulted in this:

WRN nvbuf.c:29;         1 | Allocated buffer with fd:20
WRN nvbuf.c:29;         1 | Allocated buffer with fd:21
WRN nvbuf.c:29;         1 | Allocated buffer with fd:22
WRN nvbuf.c:29;         1 | Allocated buffer with fd:23
WRN nvbuf.c:29;         1 | Allocated buffer with fd:24
WRN nvbuf.c:29;         1 | Allocated buffer with fd:25
WRN nvbuf.c:29;         1 | Allocated buffer with fd:26
WRN nvbuf.c:29;         1 | Allocated buffer with fd:27
WRN nvbuf.c:29;         1 | Allocated buffer with fd:28
WRN nvbuf.c:29;         1 | Allocated buffer with fd:29
WRN nvbuf.c:29;         1 | Allocated buffer with fd:30
WRN nvbuf.c:29;         1 | Allocated buffer with fd:31
WRN nvbuf.c:29;         1 | Allocated buffer with fd:32
WRN nvbuf.c:29;         1 | Allocated buffer with fd:33
WRN nvbuf.c:29;         1 | Allocated buffer with fd:34
WRN nvbuf.c:29;         1 | Allocated buffer with fd:35
WRN nvbuf.c:29;         1 | Allocated buffer with fd:36
WRN nvbuf.c:29;         1 | Allocated buffer with fd:37
WRN nvbuf.c:29;         1 | Allocated buffer with fd:38
WRN nvbuf.c:29;         1 | Allocated buffer with fd:39
WRN nvbuf.c:29;         1 | Allocated buffer with fd:40
WRN nvbuf.c:29;         1 | Allocated buffer with fd:41
WRN nvbuf.c:29;         1 | Allocated buffer with fd:42
WRN nvbuf.c:29;         1 | Allocated buffer with fd:43
WRN nvbuf.c:29;         1 | Allocated buffer with fd:44
WRN nvbuf.c:29;         1 | Allocated buffer with fd:45
WRN nvbuf.c:29;         1 | Allocated buffer with fd:46
WRN nvbuf.c:29;         1 | Allocated buffer with fd:47
WRN nvbuf.c:29;         1 | Allocated buffer with fd:48
WRN nvbuf.c:29;         1 | Allocated buffer with fd:49
INF render/buffer.c:45; 1 | Imported buffer with fd: 20
INF render/buffer.c:45; 1 | Imported buffer with fd: 21
WRN nvbuf.c:29;         0 | Allocated buffer with fd:21
WRN nvbuf.c:29;         0 | Allocated buffer with fd:22
INF render/buffer.c:45; 1 | Imported buffer with fd: 22
WRN nvbuf.c:29;         0 | Allocated buffer with fd:23
INF render/buffer.c:45; 1 | Imported buffer with fd: 23
INF render/buffer.c:45; 1 | Imported buffer with fd: 24
INF render/buffer.c:45; 1 | Imported buffer with fd: 25
WRN nvbuf.c:29;         0 | Allocated buffer with fd:24
INF render/buffer.c:45; 1 | Imported buffer with fd: 26
INF render/buffer.c:45; 1 | Imported buffer with fd: 27
INF render/buffer.c:45; 1 | Imported buffer with fd: 28
WRN nvbuf.c:29;         0 | Allocated buffer with fd:25
INF render/buffer.c:45; 1 | Imported buffer with fd: 29
INF render/buffer.c:45; 1 | Imported buffer with fd: 30
INF render/buffer.c:45; 1 | Imported buffer with fd: 31
INF render/buffer.c:45; 1 | Imported buffer with fd: 32
WRN nvbuf.c:29;         0 | Allocated buffer with fd:26
INF render/buffer.c:45; 1 | Imported buffer with fd: 33
INF render/buffer.c:45; 1 | Imported buffer with fd: 34
INF render/buffer.c:45; 1 | Imported buffer with fd: 35
WRN nvbuf.c:29;         0 | Allocated buffer with fd:27
INF render/buffer.c:45; 1 | Imported buffer with fd: 36
INF render/buffer.c:45; 1 | Imported buffer with fd: 37
INF render/buffer.c:45; 1 | Imported buffer with fd: 38
WRN nvbuf.c:29;         0 | Allocated buffer with fd:28
INF render/buffer.c:45; 1 | Imported buffer with fd: 39
INF render/buffer.c:45; 1 | Imported buffer with fd: 40
INF render/buffer.c:45; 1 | Imported buffer with fd: 41
INF render/buffer.c:45; 1 | Imported buffer with fd: 42
WRN nvbuf.c:29;         0 | Allocated buffer with fd:29
INF render/buffer.c:45; 1 | Imported buffer with fd: 43
INF render/buffer.c:45; 1 | Imported buffer with fd: 44
INF render/buffer.c:45; 1 | Imported buffer with fd: 45
INF render/buffer.c:45; 1 | Imported buffer with fd: 46
INF render/buffer.c:45; 1 | Imported buffer with fd: 47
WRN nvbuf.c:29;         0 | Allocated buffer with fd:30
INF render/buffer.c:45; 1 | Imported buffer with fd: 48
INF render/buffer.c:45; 1 | Imported buffer with fd: 49
WRN nvbuf.c:29;         0 | Allocated buffer with fd:20
WRN nvbuf.c:29;         0 | Allocated buffer with fd:31
WRN nvbuf.c:29;         0 | Allocated buffer with fd:32
WRN nvbuf.c:29;         0 | Allocated buffer with fd:33
WRN nvbuf.c:29;         0 | Allocated buffer with fd:34
WRN nvbuf.c:29;         0 | Allocated buffer with fd:35
WRN nvbuf.c:29;         0 | Allocated buffer with fd:36
WRN nvbuf.c:29;         0 | Allocated buffer with fd:37
WRN nvbuf.c:29;         0 | Allocated buffer with fd:38
WRN nvbuf.c:29;         0 | Allocated buffer with fd:39
WRN nvbuf.c:29;         0 | Allocated buffer with fd:40
WRN nvbuf.c:29;         0 | Allocated buffer with fd:41
WRN nvbuf.c:29;         0 | Allocated buffer with fd:42
WRN nvbuf.c:29;         0 | Allocated buffer with fd:43
WRN nvbuf.c:29;         0 | Allocated buffer with fd:44
WRN nvbuf.c:29;         0 | Allocated buffer with fd:45
WRN nvbuf.c:29;         0 | Allocated buffer with fd:46
WRN nvbuf.c:29;         0 | Allocated buffer with fd:47
WRN nvbuf.c:29;         0 | Allocated buffer with fd:48
WRN nvbuf.c:29;         0 | Allocated buffer with fd:49
INF render/buffer.c:45; 0 | Imported buffer with fd: 21
INF render/buffer.c:45; 0 | Imported buffer with fd: 22
INF render/buffer.c:45; 0 | Imported buffer with fd: 23
INF render/buffer.c:45; 0 | Imported buffer with fd: 24
INF render/buffer.c:45; 0 | Imported buffer with fd: 25
INF render/buffer.c:45; 0 | Imported buffer with fd: 26
INF render/buffer.c:45; 0 | Imported buffer with fd: 27
INF render/buffer.c:45; 0 | Imported buffer with fd: 28
INF render/buffer.c:45; 0 | Imported buffer with fd: 29
INF render/buffer.c:45; 0 | Imported buffer with fd: 30
INF render/buffer.c:45; 0 | Imported buffer with fd: 20
INF render/buffer.c:45; 0 | Imported buffer with fd: 31
INF render/buffer.c:45; 0 | Imported buffer with fd: 32
INF render/buffer.c:45; 0 | Imported buffer with fd: 33
INF render/buffer.c:45; 0 | Imported buffer with fd: 34
INF render/buffer.c:45; 0 | Imported buffer with fd: 35
INF render/buffer.c:45; 0 | Imported buffer with fd: 36
INF render/buffer.c:45; 0 | Imported buffer with fd: 37
INF render/buffer.c:45; 0 | Imported buffer with fd: 38
INF render/buffer.c:45; 0 | Imported buffer with fd: 39
INF render/buffer.c:45; 0 | Imported buffer with fd: 40
INF render/buffer.c:45; 0 | Imported buffer with fd: 41
INF render/buffer.c:45; 0 | Imported buffer with fd: 42
INF render/buffer.c:45; 0 | Imported buffer with fd: 43
INF render/buffer.c:45; 0 | Imported buffer with fd: 44
INF render/buffer.c:45; 0 | Imported buffer with fd: 45
INF render/buffer.c:45; 0 | Imported buffer with fd: 46
INF render/buffer.c:45; 0 | Imported buffer with fd: 47
INF render/buffer.c:45; 0 | Imported buffer with fd: 48
INF render/buffer.c:45; 0 | Imported buffer with fd: 49

dmatest.7z (6.7 KB)

In addition to my previous comment . . .

I've made some additional findings and put some small updates in this new zip file.

The changes also track the NvBufSurface returned from every NvBufSurfaceAllocate call. Then, instead of trying to free the memory by grabbing the surfaces from the fds, I call NvBufSurfaceDestroy directly on the surface created with NvBufSurfaceAllocate. This shows me that when an fd is reused, NvBufSurface fails to destroy the surface associated with it the second time, even though I called NvBufSurfaceAllocate twice.

I thought maybe the Vulkan import was corrupting them, so I ran the sequence on only one thread (NvBufSurfaceAllocate -> vkAllocateMemory -> vkFreeMemory -> NvBufSurfaceDestroy), and the result was no errors. It's only when this workflow is done in parallel that the issue arises. I have tried wrapping the calls to Vulkan and NvBufSurface in a single mutex, but that doesn't seem to fix the issue, since these calls are asynchronous (I assume).

dmatest.7z (6.6 KB)

I’m closing this topic because there has been no update from you for a while, assuming the issue was resolved.
If you still need support, please open a new topic. Thanks.

Hi,
Please share which print indicates the failure. In render_import_dma_fd(), it prints out:

	LOG_I("%d | Imported buffer with fd: %u", index, fd);

But we are not sure whether this print identifies the error. Please advise.
