NVMAP_IOC_READ error when using hardware decoding

Hi,

I’m using the following official ffmpeg package to decode HEVC on the Jetson platform:

the-nvidia-ffmpeg-package-supports-hardware-accelerated-decode-on-jetson-platforms

I modified some code:

  1. changed output pixel format from YUV420P to NV12
  2. used cudaMallocManaged instead of malloc
  3. other code related to ffmpeg

I get correct image output, but sometimes there are error messages:

NVMAP_IOC_READ failed: Interrupted system call
NVMAP_IOC_READ: Offset 0 SrcStride 3840 pDst 0x205625000 DstStride 3840 Count 2160

When this happens, the decoded image is not correct.

I don’t know what’s wrong. Could you give me some information about these error messages?


Hello,

I’m not sure about this, since I haven’t run into it myself.
If you post the topic on the forum, NVIDIA will respond.
I’ll follow this thread with interest.

Thank you.

Hi,
It looks like concurrent access to certain buffers (CUDA or NvBuffer). The implementation is open source, so we would need your help: please check against the default code and share a patch plus steps to replicate the issue, so that we can check further.

And please provide your release version ($ head -1 /etc/nv_tegra_release).

Thank you. I will clean up my code and try to make a patch.

Hi,
The issue is very likely to be in step 2. The buffer is allocated through cudaMallocManaged; if it is accessed by the CPU and GPU concurrently, it triggers the issue. Maybe you can use malloc and call NvBufferMemMap/NvBufferMemUnMap to get the data pointer of the decoded frames and copy the data out.
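
A minimal sketch of that copy-out path, assuming the decoded frame is available as a dma-buf fd and using the nvbuf_utils API from jetson_multimedia_api (the fd, the destination buffer, and its sizing are placeholders):

#include <string.h>
#include <nvbuf_utils.h>

/* Copy every plane of a decoded NvBuffer (dma-buf fd) into a CPU buffer.
 * dst must be large enough to hold pitch * height bytes for all planes. */
static int copy_nvbuffer_to_cpu(int dmabuf_fd, unsigned char *dst)
{
    NvBufferParams params;
    if (NvBufferGetParams(dmabuf_fd, &params) != 0)
        return -1;

    for (unsigned int plane = 0; plane < params.num_planes; plane++) {
        void *vaddr = NULL;

        if (NvBufferMemMap(dmabuf_fd, plane, NvBufferMem_Read, &vaddr) != 0)
            return -1;
        /* Make the CPU view coherent with what the decoder wrote. */
        NvBufferMemSyncForCpu(dmabuf_fd, plane, &vaddr);

        for (unsigned int row = 0; row < params.height[plane]; row++) {
            /* Copying the full pitch keeps the loop format-agnostic, at the
             * cost of also copying row padding; trim to the real row width
             * if the padding is unwanted. */
            memcpy(dst, (unsigned char *)vaddr + row * params.pitch[plane],
                   params.pitch[plane]);
            dst += params.pitch[plane];
        }

        NvBufferMemUnMap(dmabuf_fd, plane, &vaddr);
    }
    return 0;
}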

Maybe, but my program doesn’t access the buffer concurrently.

The reason I use cudaMallocManaged is to avoid copying data, because copying increases CPU usage. I just want to decode the frame into a GPU buffer and pass it to ffmpeg, so the upper-layer application can use the GPU buffer to render directly.

Is there a way to do this? Thank you.

Hi,
Buffers allocated through cudaMallocManaged are synchronized in the background, so they still incur some CPU usage.

We have a package to enable hardware decoding in ffmpeg:
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/multimedia.html#wwpID0E0JB0HA
Jetson TX2 and FFmpeg - Can't initialize nvrm channel - #5 by DaneLLL
The decoded frames are in NvBuffer, and you can access them from CUDA through the NvBuffer APIs. Please take a look and see if it can be applied to your use case.
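
For reference, a rough sketch of the zero-copy access that path allows, following the EGLImage/CUDA interop pattern used in the jetson_multimedia_api samples; the dmabuf_fd, the EGLDisplay, and an already-current CUDA context are assumed to come from the surrounding application:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <cuda.h>
#include <cudaEGL.h>
#include <nvbuf_utils.h>

/* Map a decoded NvBuffer (dma-buf fd) into CUDA via EGLImage interop.
 * A CUDA context must already be current on the calling thread. */
static int access_nvbuffer_in_cuda(EGLDisplay egl_display, int dmabuf_fd)
{
    CUgraphicsResource resource = NULL;
    CUeglFrame egl_frame;
    EGLImageKHR egl_image;
    int ret = -1;

    egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
    if (egl_image == EGL_NO_IMAGE_KHR)
        return -1;

    if (cuGraphicsEGLRegisterImage(&resource, egl_image,
            CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) == CUDA_SUCCESS &&
        cuGraphicsResourceGetMappedEglFrame(&egl_frame, resource, 0, 0)
            == CUDA_SUCCESS) {
        /* egl_frame.frame.pPitch[0] / pPitch[1] now point at the Y and UV
         * planes in device memory; a CUDA kernel can consume them directly,
         * with no copy to system memory. Launch the kernel here. */
        cuCtxSynchronize();
        ret = 0;
    }

    if (resource)
        cuGraphicsUnregisterResource(resource);
    NvDestroyEGLImage(egl_display, egl_image);
    return ret;
}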

Thanks a lot.

I need to provide more details.

My program is just based on the ffmpeg package you mentioned.

In this package, the decoded frames are copied from NvBuffer into the ffmpeg buffer through ff_get_buffer and av_image_copy.

I want to avoid this copying, so I modified some code. In my implementation, I use cudaMallocManaged instead of malloc for ctx->bufptr_n. The application gets the decoded frames through the ffmpeg API and accesses the GPU buffer directly through the CUDA API; a rough sketch of the allocation change is below.
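
The sketch below only illustrates the allocation change; the context struct, the field names beyond the bufptr_* pattern mentioned above, and the plane sizes are hypothetical stand-ins for whatever the package actually uses:

#include <cuda_runtime.h>

/* Hypothetical decoder context; the real package keeps its own struct with
 * fields such as bufptr_0..bufptr_n for the frame planes. */
typedef struct {
    unsigned char *bufptr_0;   /* Y plane */
    unsigned char *bufptr_1;   /* UV plane (NV12) */
} dec_ctx_t;

static int alloc_frame_buffers(dec_ctx_t *ctx, size_t y_size, size_t uv_size)
{
    /* Original code (roughly): ctx->bufptr_0 = malloc(y_size); etc.
     * Replaced with managed allocations so the same pointer is valid on
     * both CPU and GPU -- but it must not be touched by both at once. */
    if (cudaMallocManaged((void **)&ctx->bufptr_0, y_size,
                          cudaMemAttachGlobal) != cudaSuccess)
        return -1;
    if (cudaMallocManaged((void **)&ctx->bufptr_1, uv_size,
                          cudaMemAttachGlobal) != cudaSuccess) {
        cudaFree(ctx->bufptr_0);
        return -1;
    }
    return 0;
}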

The process seems to be OK, but I often get the following message:

NVMAP_IOC_READ failed: Interrupted system call

I want to know how to fix this error or if there is any other way to solve my problem.

We are eager for a fully hardware-accelerated official ffmpeg package for Jetson, just like the CUDA-accelerated ffmpeg package (NVDEC and NVENC) on PC.

It may contain the following functions:

  1. hardware-accelerated decoding
  2. hardware-accelerated encoding
  3. passing decoded video frames through hardware memory

This is useful for developing high-performance applications on Jetson. :)