Complex filter through CUDA hevc_cuvid with ffmpeg, 4K HEVC input, help needed

I am trying to apply a complex filter through CUDA hevc_cuvid on an NVIDIA GeForce GTX 1080 with ffmpeg. The input is a 10-bit 4K HEVC video in an MKV container.
I installed the latest NVIDIA drivers and CUDA, and compiled the latest ffmpeg. I tried a lot of combinations and commands, but no luck.

ffmpeg -hwaccel_device 0 -hwaccel cuvid -c:v hevc_cuvid -i /home/select/2160p.UHD.mkv -i /home//select/4k_UHD_logo.png -filter_complex "[0:v]scale_npp=1920:1080,hwdownload,format=nv12 [base]; [base][1:v] overlay=main_w-overlay_w-15:main_h-overlay_h-15 [marked]" -map "[marked]" -t 00:01:00.000  -gpu 0 -c:v hevc_nvenc -preset slow -rc cbr_hq -b:v 5000k -maxrate 7000k -bufsize 1000k -acodec aac -ac 2 -dts_delta_threshold 1000 -ab 128k  /home/select/filtered_2_1min.mp4

ffmpeg returns this error:

ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
    Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x1608 [SAR 1:1 DAR 160:67], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
Stream mapping:
  Stream #0:0 (hevc_cuvid) -> scale_npp
  Stream #1:0 (png) -> overlay:overlay
  overlay -> Stream #0:0 (hevc_nvenc)
Press [q] to stop, [?] for help
[Parsed_scale_npp_0 @ 0x561d514dce40] Unsupported input format: p010le
[Parsed_scale_npp_0 @ 0x561d514dce40] Failed to configure output pad on Parsed_scale_npp_0
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #1:0

I also tried another ffmpeg command, but again no luck:

ffmpeg -hwaccel_device 0 -hwaccel cuvid -c:v hevc_cuvid -i /home/select/2160p.UHD.mkv -i /home//select/4k_UHD_logo.png -filter_complex "[0:v]scale_npp=hwdownload,format=nv12 [base]; [base][1:v] overlay=main_w-overlay_w-15:main_h-overlay_h-15[v]; [v]hwupload_cuda[v]" -map "[v]" -t 00:01:00.000  -gpu 0 -c:v hevc_nvenc -preset slow -rc cbr_hq -b:v 5000k -maxrate 7000k -bufsize 1000k -acodec aac -ac 2 -dts_delta_threshold 1000 -ab 128k  /home/select/filtered_1min.mp4

ffmpeg returns this error:

Stream mapping:
  Stream #0:0 (hevc_cuvid) -> scale_npp
  Stream #1:0 (png) -> overlay:overlay
  hwupload_cuda -> Stream #0:0 (hevc_nvenc)
Press [q] to stop, [?] for help
Impossible to convert between the formats supported by the filter 'Parsed_scale_npp_0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #1:0
Conversion failed!

Without CUDA hevc_cuvid, ffmpeg does the job, but the complex filter uses a lot of system CPU (around 40% of a Z800 with 2x 6-core CPUs!), and that is not the goal here. I want to use the full GPU capabilities, as with CUDA h264_cuvid.

/root/ffmpeg-build-static-binaries/bin/ffmpeg -hwaccel_device 0 -hwaccel cuvid -i /home/select/2160p.UHD.mkv -i /home//select/4k_UHD_logo.png  -t 00:01:00.000 -filter_complex "[0:v][1:v] overlay=main_w-overlay_w-15:main_h-overlay_h-15"  -gpu 0 -pix_fmt yuv420p -c:v hevc_nvenc -preset slow -rc cbr_hq -b:v 3000k -map 0:a:0 -acodec aac -ac 2 -dts_delta_threshold 1000 -ab 128k  /home/select/filtered_1min.mp4
ffmpeg version N-93064-ged20fbc Copyright (c) 2000-2019 the FFmpeg developers

The same ffmpeg command works fine when the input is H.264 video decoded with h264_cuvid.

I’d be incredibly grateful for any help with this problem.

I’m not sure, but it looks like you’re inputting 10-bit video and then trying to download it as 8-bit video (NV12 instead of P010), so it would have to be implicitly converted, which doesn’t seem to work.
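
As a minimal sketch of that point (not part of the original posts): the format given after hwdownload has to match the bit depth of the decoded frames. The helper name download_format_for is hypothetical, and the mapping assumes 4:2:0 input only, which is what the logs in this thread show.

```shell
#!/bin/sh
# Hypothetical helper: map the pix_fmt reported by ffprobe to the
# format to request after hwdownload.
# Assumption: 4:2:0 input only (8-bit -> nv12, 10-bit -> p010le).
download_format_for() {
    case "$1" in
        yuv420p10le) echo p010le ;;   # 10-bit 4:2:0, as in the HDR input here
        yuv420p)     echo nv12 ;;     # 8-bit 4:2:0
        *)           echo "unsupported pix_fmt: $1" >&2; return 1 ;;
    esac
}

# Possible usage (needs ffprobe and a real file, so it is left commented out):
# pix_fmt=$(ffprobe -v error -select_streams v:0 \
#     -show_entries stream=pix_fmt \
#     -of default=noprint_wrappers=1:nokey=1 /home/select/2160p.UHD.mkv)
# download_format_for "$pix_fmt"
```

For the input in this thread, ffprobe reports yuv420p10le, so the sketch would suggest hwdownload,format=p010le rather than format=nv12.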

Thank you, generix, for the reply.
When I try this command, where I changed the download format to format=p010le (10-bit), it looks like the complex filter is indeed processed through the GPU: the system CPU is used less. But the final output video is messed up. I suspect this is because my input MKV video is HDR.

/root/ffmpeg-build-static-binaries/bin/ffmpeg -hwaccel_device 0 -hwaccel cuvid -c:v hevc_cuvid -i /home/select/HDR_4k_10bit_5min.mkv -i /home//select/4k_UHD_logo.png -filter_complex "[0:v]hwdownload,format=p010le [base]; [base][1:v] overlay=main_w-overlay_w-15:main_h-overlay_h-15[v]; [v]hwupload_cuda[v]" -map "[v]" -t 00:01:00.000  -gpu 0 -c:v hevc_nvenc -preset slow -rc cbr_hq -b:v 5000k -maxrate 7000k -bufsize 1000k -acodec aac -ac 2 -dts_delta_threshold 1000 -ab 128k  /home/select/HDR_overlay_1min.mp4

I don't know why my output video is messed up. I noticed that my output video is 8-bit (I want my output video to be 10-bit, like the input).
Please, if someone knows how to transcode a 10-bit HDR input video to either HDR or 10-bit SDR, let me know. Anything works as long as the output is a normal video without the interruptions I have now.

My input is an HDR 10-bit HEVC video in MKV:
[STREAM]
index=0
codec_name=hevc
codec_long_name=H.265 / HEVC (High Efficiency Video Coding)
profile=Main 10
codec_type=video
codec_time_base=1001/24000
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
width=3840
height=1600
coded_width=3840
coded_height=1600
has_b_frames=2
sample_aspect_ratio=1:1
display_aspect_ratio=12:5
pix_fmt=yuv420p10le
level=153
color_range=tv
color_space=bt2020nc
color_transfer=smpte2084
color_primaries=bt2020
chroma_location=unspecified
field_order=unknown

Details of my output overlay video:
root@ronald:~# /root/ffmpeg-build-static-binaries/bin/ffprobe -v error -show_format -show_streams -i /home/select/HDR_overlay_1min.mp4
[STREAM]
index=0
codec_name=hevc
codec_long_name=H.265 / HEVC (High Efficiency Video Coding)
profile=Main
codec_type=video
codec_time_base=1001/24000
codec_tag_string=hev1
codec_tag=0x31766568
width=3840
height=1600
coded_width=3840
coded_height=1600
has_b_frames=0
sample_aspect_ratio=1:1
display_aspect_ratio=12:5
pix_fmt=yuv420p
level=150
color_range=tv
color_space=unknown
color_transfer=unknown
color_primaries=unknown
chroma_location=unspecified
field_order=progressive

You might also have to specify -pix_fmt p010le after -c:v hevc_nvenc so that the encoder knows it has to produce 10-bit output.
Edit: also, -profile:v main10 could help.
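
One way to check whether an encode really came out as 10-bit is to look at the profile and pix_fmt fields that ffprobe prints. The sketch below is only an illustration: the function name is_10bit_stream is hypothetical, and it assumes the key=value output of ffprobe -show_streams for a single HEVC video stream.

```shell
#!/bin/sh
# Hypothetical helper: reads ffprobe "key=value" stream output on stdin and
# succeeds only if the stream is HEVC Main 10 with a 10-bit 4:2:0 pix_fmt.
is_10bit_stream() {
    awk -F= '
        $1 == "profile" { profile = $2 }
        $1 == "pix_fmt" { pix_fmt = $2 }
        END { exit !(profile == "Main 10" && pix_fmt == "yuv420p10le") }
    '
}

# Possible usage (needs ffprobe and a real file, so it is left commented out):
# ffprobe -v error -select_streams v:0 -show_streams /home/select/HDR_overlay_1min.mp4 \
#     | is_10bit_stream && echo "10-bit output" || echo "not 10-bit"
```

With profile=Main and pix_fmt=yuv420p, the check fails, which is what an 8-bit result looks like.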

Thanks for the reply, but when I specify the 10-bit format, ffmpeg returns an error:

/root/ffmpeg-build-static-binaries/bin/ffmpeg -hwaccel_device 0 -hwaccel cuvid -c:v hevc_cuvid -i /home/select/HDR_4k_10bit_5min.mkv -i /home//select/4k_UHD_logo.png -filter_complex "[0:v]hwdownload,format=p010le [base]; [base][1:v] overlay=main_w-overlay_w-15:main_h-overlay_h-15[v]; [v]hwupload_cuda[v]" -map "[v]" -t 00:01:00.000  -gpu 0 -c:v hevc_nvenc -pix_fmt p010le -preset slow -rc cbr_hq -b:v 5000k -maxrate 7000k -bufsize 1000k -acodec aac -ac 2 -dts_delta_threshold 1000 -ab 128k  /home/select/HDR_overlay_10bit_1min.mp4

Input #0, matroska,webm, from ‘/home/select/HDR_4k_10bit_5min.mkv’:

Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x1600 [SAR 1:1 DAR 12:5], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)

Stream #1:0: Video: png, rgba(pc), 414x279 [SAR 4685:4685 DAR 46:31], 25 tbr, 25 tbn, 25 tbc
File ‘/home/select/HDR_overlay_10bit_1min.mp4’ already exists. Overwrite ? [y/N] y
Stream mapping:
Stream #0:0 (hevc_cuvid) -> hwdownload
Stream #1:0 (png) -> overlay:overlay
hwupload_cuda -> Stream #0:0 (hevc_nvenc)
Press [q] to stop, [?] for help
Impossible to convert between the formats supported by the filter ‘Parsed_hwupload_cuda_3’ and the filter ‘auto_scaler_2’
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #1:0

I found a command that runs without errors; I simply removed hwupload_cuda.

/root/ffmpeg-build-static-binaries/bin/ffmpeg -hwaccel_device 0 -hwaccel cuvid -c:v hevc_cuvid -i /home/select/HDR_4k_10bit_5min.mkv -i /home//select/4k_UHD_logo.png -filter_complex "[0:v]hwdownload,format=p010le [base]; [base][1:v] overlay=main_w-overlay_w-15:main_h-overlay_h-15[v]" -map "[v]" -t 00:01:00.000  -gpu 0 -c:v hevc_nvenc -pix_fmt p010le -preset slow -rc cbr_hq -b:v 5000k -maxrate 7000k -bufsize 1000k -acodec aac -ac 2 -dts_delta_threshold 1000 -ab 128k  /home/select/HDR_overlay_10bit_1min.mp4

It runs without errors, BUT the video is still messed up, with a lot of interruptions. I think this is because the HDR input video is not converted correctly through CUDA.
Please, if someone knows how to transcode a 10-bit HDR input video to either HDR or 10-bit SDR, let me know. Anything works as long as the output is a normal video without the interruptions I have now.

Thanks

Why do people keep using cuvid? Use nvdec. cuvid relies on the nvidia parser which isn’t featureful enough to correctly parse 10bit hdr streams.

ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i <file> <etc>
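
A rough, untested sketch of how the earlier overlay command might look with those two options swapped in. Everything except -hwaccel nvdec and -hwaccel_output_format cuda is carried over from commands and suggestions earlier in this thread; the helper name build_nvdec_overlay_cmd is hypothetical, and it only prints the command instead of running it. It assumes the overlay still runs in software after hwdownload, and that the paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an ffmpeg command line that
# decodes with nvdec, downloads 10-bit frames, overlays a logo in software,
# and encodes with hevc_nvenc as Main 10.
# Assumption: paths contain no spaces (they are printed unquoted).
build_nvdec_overlay_cmd() {
    input=$1
    logo=$2
    output=$3
    filter="[0:v]hwdownload,format=p010le[base];[base][1:v]overlay=main_w-overlay_w-15:main_h-overlay_h-15[v]"
    printf '%s ' ffmpeg -hwaccel nvdec -hwaccel_output_format cuda \
        -i "$input" -i "$logo" \
        -filter_complex "\"$filter\"" -map '"[v]"' -map 0:a:0 \
        -c:v hevc_nvenc -profile:v main10 -pix_fmt p010le \
        -c:a aac -ac 2 -b:a 128k "$output"
    printf '\n'
}

# Example: print the command for the files used in this thread.
build_nvdec_overlay_cmd /home/select/HDR_4k_10bit_5min.mkv \
    /home/select/4k_UHD_logo.png /home/select/HDR_overlay_10bit_1min.mp4
```

Printing the command first makes it easy to review or tweak before running it by hand.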