Is it possible to encode -> decode on the fly in GPU memory?

Hello,

I’d like to know whether it is possible to encode a video buffer using NVENC and then decode it with NVDEC while keeping the buffer in GPU memory.

A similar question was asked a year ago here: https://devtalk.nvidia.com/default/topic/1020675/video-codec-sdk/nvenc-nvdec-keep-compressed-data-on-gpu-to-direct-decode-/

Is this possible yet with the SDK?

Thanks!

Can you describe your use case more precisely?

  • pipe: frame -> encoder -> bitstream -> decoder -> frame. Are you asking for optimized bitstream passing? The bitstream is usually very small (from ~100 bytes to ~100 KB per step), so passing a few bytes through GPU memory (and adding a new API for it) seems pointless without a valid use case. I can see only one minor use case: SNR/heat-map testing (testing encoder/decoder quality with different parameters). See http://on-demand-gtc.gputechconf.com/gtc-quicklink/bscTMOl (S8761/GTC2018).
  • pipe: bitstream -> decoder -> frame -> encoder -> bitstream. Accelerating the "frame" stage of this pipe (~10 MB per step) is more useful, with use cases like transcoding (resizing, clipping, filters, embedding subtitles, protocol change, ...). The frame can be a CUDA buffer and the transformations can be CUDA programs; this is supported. See "doc/Using_FFmpeg_with_NVIDIA_GPU_Hardware_Acceleration.pdf" and http://on-demand-gtc.gputechconf.com/gtc-quicklink/g1Zlw0A (S8601/GTC2018).
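For the second pipe, the FFmpeg integration covered in that PDF keeps the decoded frames in GPU memory across the decode, filter, and encode stages. A sketch of such a fully GPU-resident transcode (file names are placeholders; `scale_cuda` requires a reasonably recent FFmpeg build, while older builds use `scale_npp` instead):

```shell
# Decode with NVDEC, keep frames in CUDA memory, scale on the GPU,
# and re-encode with NVENC -- frames never round-trip through system RAM.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
       -vf scale_cuda=1280:720 -c:v h264_nvenc -preset fast output.mp4
```

Without `-hwaccel_output_format cuda`, FFmpeg copies each decoded frame back to system memory before encoding, which defeats the purpose of the GPU pipeline.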

Thanks for your response @mcerveny.

I’ll give you more details about the idea: I want to build a datamoshing application for Windows that can glitch video streams on the fly. I need a video buffer A that is encoded on the GPU; then, for each I-frame, I’d like the ability to replace the frame with an image taken from another video buffer B, and then continue decoding the stream into an output video buffer. This produces a glitched video effect.
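To make the I-frame replacement step concrete: wherever the bitstream lives, the glitching logic itself just needs to locate the IDR (key-frame) NAL units in the H.264 stream. A minimal hypothetical sketch in Python, assuming Annex-B framing with 4-byte start codes only (a real parser would also need to handle 3-byte start codes and emulation-prevention bytes):

```python
# Sketch: locate IDR (key-frame) NAL units in an H.264 Annex-B
# bitstream -- these are the units a datamoshing tool would replace.

IDR_SLICE = 5  # nal_unit_type 5 = coded slice of an IDR picture

def split_nal_units(bitstream: bytes):
    """Yield (nal_unit_type, nal_bytes) for each NAL unit."""
    start = bitstream.find(b"\x00\x00\x00\x01")
    while start != -1:
        nxt = bitstream.find(b"\x00\x00\x00\x01", start + 4)
        nal = bitstream[start + 4: nxt if nxt != -1 else len(bitstream)]
        # nal_unit_type is the low 5 bits of the first NAL-header byte
        yield nal[0] & 0x1F, nal
        start = nxt

def is_idr(nal_type: int) -> bool:
    return nal_type == IDR_SLICE
```

For example, a stream containing one IDR slice (header byte 0x65) followed by one non-IDR slice (0x41) yields types [5, 1]; the glitch step would swap out the payload of the first.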

Macs have had hardware H.264 encoding support for years, so there is a Mac solution (http://kriss.cx/tom/2012/10/08/datamosh.html); it’s very performant, taking only about 30% load on a modern CPU to do this for a 30 fps full-HD buffer. I imagine the whole encoding/frame-replacing/decoding process takes place in GPU video buffers.

I haven’t started building my prototype yet; I just wanted to confirm that this project is feasible. I’ll look into the APIs you suggested; they might be what I need.

Hi biegun.m,

The shipping SDKs don’t support NVENCODEAPI writing the encoded bitstream to video memory, or NVDECODEAPI reading the bitstream from video memory.

Thanks,
Ryan Park