How do I convert the pixel format in hardware when transcoding 10-bit HEVC to H.264?

Using the latest ffmpeg (3.4.2) compiled with the latest CUDA (9.1), I am unable to transcode 10-bit HEVC to H.264 with h264_nvenc (see the output below). How do I convert the pixel format in hardware? Apparently scale_cuda does NOT support pixel format changes, even though it accepts a format as an argument. If I add "-pix_fmt yuv420p" it works, but my CPU utilization skyrockets (ffmpeg uses 100% out of 800%), leading me to believe the reformatting is done in software:

ffmpeg -y -v verbose -c:v hevc_cuvid -i ~/bbb_hevc_3840x1920.mp4 -c:v h264_nvenc output.mp4

[hevc_cuvid @ 0x3a49fc0] Formats: Original: nv12 | HW: p010le | SW: p010le
[graph 0 input from stream 0:0 @ 0x3a66460] w:3840 h:2160 pixfmt:p010le tb:1/15360 fr:60/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 0x3a9c380] Loaded Nvenc version 8.1
[h264_nvenc @ 0x3a9c380] Nvenc initialized successfully
[h264_nvenc @ 0x3a9c380] 1 CUDA capable devices found
[h264_nvenc @ 0x3a9c380] [ GPU #0 - < GeForce GTX 1050 Ti > has Compute SM 6.1 ]
[h264_nvenc @ 0x3a9c380] 10 bit encode not supported
[h264_nvenc @ 0x3a9c380] No NVENC capable devices found
[h264_nvenc @ 0x3a9c380] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

When I modify the ffmpeg command line to add the pixel format conversion (along with a downscale to 1280x720), it takes about 3 minutes to convert a 60-second HEVC video:

ffmpeg -y -v verbose -c:v hevc_cuvid -i ~/bbb_hevc_3840x1920.mp4 -c:v h264_nvenc -pix_fmt yuv420p -vf scale=1280:720 output.mp4

frame= 3604 fps= 21 q=30.0 Lsize= 15326kB time=00:01:00.05 bitrate=2090.8kbits/s dup=4 drop=0 speed=0.346x
video:15311kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.101102%
Input file #0 (/home/triveni/bbb_hevc_3840x1920.mp4):
Input stream #0:0 (video): 3600 packets read (59481298 bytes); 3600 frames decoded;
Total: 3600 packets (59481298 bytes) demuxed
Output file #0 (output.mp4):
Output stream #0:0 (video): 3604 frames encoded; 3604 packets muxed (15678243 bytes);
Total: 3604 packets (15678243 bytes) muxed
[h264_nvenc @ 0x2881280] Nvenc unloaded
started at Fri Mar 16 17:20:10 EDT 2018
ended at Fri Mar 16 17:23:04 EDT 2018
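
To put a number on "about 3 minutes", here is a rough sketch that recomputes the wall-clock time and the effective throughput from the figures in the log above. The variable names and the to_seconds helper are mine, purely for illustration; the start and end times, the frame count, and the output duration are copied from the output, and I am assuming both timestamps fall on the same day (they do here).

```shell
#!/bin/sh
# Wall-clock times copied from the "started at" / "ended at" lines of the log.
# Assumption: both timestamps are on the same day, so time of day is enough.
start="17:20:10"
end="17:23:04"

# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

elapsed=$(( $(to_seconds "$end") - $(to_seconds "$start") ))

frames=3604      # "frame= 3604" in the ffmpeg summary line
duration=60.05   # "time=00:01:00.05" in the ffmpeg summary line

# Effective frames per second and speed relative to real time.
fps=$(awk -v f="$frames" -v e="$elapsed" 'BEGIN { printf "%.1f", f / e }')
speed=$(awk -v d="$duration" -v e="$elapsed" 'BEGIN { printf "%.3f", d / e }')

echo "elapsed=${elapsed}s fps=${fps} speed=${speed}x"
```

This prints elapsed=174s fps=20.7 speed=0.345x, which lines up with the fps= 21 and speed=0.346x that ffmpeg itself reports, so the run really is about three times slower than real time.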