I have a big image, and in order to speed up a compression/decompression test, I am cutting the image into two halves, compressing each half (since the h/w has two compressors on it!) and then trying to figure out how to decompress the halves… The 1080 board I’m using only has one decompressor. Do I try to feed both halves through the same decompressor, or do I create two decompressors that both use the underlying hardware, or is this test simply not possible?
I might be wrong, but wasn’t CUVID, at least initially, implemented in CUDA rather than in hardware? If so, you could run any number of decompressing threads you wanted (actually up to the number of possible streams, I guess). The CUVIDDECODECREATEINFO structure has a ulCreationFlags field where you can pass the type of decoder. Try passing cudaVideoCreate_PreferCUDA instead of cudaVideoCreate_PreferCUVID and see what happens. Just a blind guess.
Maxest’s answer would have been correct until a few years ago, when video decoders were still using CUDA.
All NVIDIA GPUs after 2009 (Fermi onwards) have a hardware decoder (aka NVDEC) on the chip (silicon), and the CUVID APIs are now routed to the on-chip NVDEC hardware for decoding.
For the use case nvidia9dk7k is interested in, here is what should be done:
- As a general recommendation, do not try to manage the application based on how many encoder and decoder engines the GPU contains. Let the driver manage this, as it is much better suited to make load-balancing decisions.
- Splitting the image in the middle and sending the halves to encode on two different NVENCs is possible (take a look at the enableConstrainedEncoding flag in the API).
- Inter-slice MVs and deblocking filter will be disabled across slices in such a mode, thereby reducing the quality of encoded frames (is that acceptable?).
- I do not fully understand the question about the decoder (aka "decompressor") above. I thought the question was about encoding. Are you also trying to decode on the same hardware? In any case, you can create as many decoding sessions on the GPU as you wish (subject to memory and hardware limits), and the driver will manage the context switching as well as load-balancing etc. So it should not matter whether you feed a single stream for the two halves or two separate streams, one for each half of your image.
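To make the last point concrete, here is a rough sketch of the test flow in Python. It is only a stand-in: zlib plays the role of the codec and a thread pool plays the role of the driver's scheduling, so none of this touches NVENC or NVDEC. The image size and all function names are made up for illustration. It splits an image into two halves, compresses them independently, and checks that the reassembled result is the same whether one or two decode workers are used.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # hypothetical image size, one byte per pixel


def make_image(width, height):
    """Build a synthetic grayscale image as a list of rows (bytes)."""
    return [bytes((x * y) % 256 for x in range(width)) for y in range(height)]


def split_halves(rows):
    """Cut the image horizontally into a top half and a bottom half."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def compress_half(rows):
    """Stand-in for one encode session: compress a half on its own."""
    return zlib.compress(b"".join(rows))


def decompress_half(blob, width):
    """Stand-in for one decode session: recover the rows of a half."""
    raw = zlib.decompress(blob)
    return [raw[i:i + width] for i in range(0, len(raw), width)]


def roundtrip(rows, width, decode_workers):
    """Encode both halves in parallel, then decode with N workers."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        blobs = list(pool.map(compress_half, split_halves(rows)))
    with ThreadPoolExecutor(max_workers=decode_workers) as pool:
        halves = list(pool.map(lambda b: decompress_half(b, width), blobs))
    # pool.map keeps input order, so halves[0] is the top half.
    return halves[0] + halves[1]


image = make_image(WIDTH, HEIGHT)
one_decoder = roundtrip(image, WIDTH, decode_workers=1)
two_decoders = roundtrip(image, WIDTH, decode_workers=2)
print(one_decoder == image and two_decoders == image)
```

The application code is identical in both cases apart from the worker count, which is the analogue of creating one or two decode sessions and leaving the scheduling to the driver. Unlike this lossless stand-in, a real constrained encode would lose some quality at the boundary between the halves, as noted above.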
Hope this helps.