I'm developing a 360 VR stitching program.
It needs to decode 6 to 8 MP4 files simultaneously, stitch them into one, and encode the result to a file.
Environment: Windows 10, Visual Studio 2015
I wrote a program that uses multiple GPUs and NVLink to speed up decoding, stitching, and encoding.
It works fine on a system with two RTX 8000 GPUs, NVLink, Video Codec SDK 9, and CUDA 10.1.
It also works fine on a system with two GV100 GPUs and NVLink.
But it crashes on a system with two RTX A6000 GPUs and NVLink.
With two RTX A6000 GPUs, I can create multiple NVDEC sessions on GPU 0.
If I create a single NVDEC session on GPU 1, it works fine.
But when I create multiple NVDEC sessions on GPU 1, it returns CUDA_ERROR_OUT_OF_MEMORY.
When I create an NVDEC session on GPU 0, it works fine.
When I create an NVDEC session on GPU 1, it crashes with no return code; the crash happens inside NVENCODEAPI64.DLL.
To test this, I modified the AppDec example in Video Codec SDK 9.0.18.
AppDec.cpp (15.3 KB)
I tested various configurations with the AppDec example.
You can see the settings for each one in the code comments.
1 process with 1 NVDEC session on GPU 0: works fine.
2 processes with 1 NVDEC session each on GPU 0: works fine.
1 process with 1 NVDEC session on GPU 1: works fine.
2 processes with 1 NVDEC session each on GPU 1: works fine.
1 process, 2 threads, each thread with 1 NVDEC session on GPU 0: works fine.
1 process, 2 threads, each thread with 1 NVDEC session on GPU 1: fails.
1 process, 1 thread, 2 NVDEC sessions on GPU 0: works fine.
1 process, 1 thread, 2 NVDEC sessions on GPU 1: fails.
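For anyone who wants to reproduce the single-process cases above, here is a minimal sketch of a harness in standard C++. It is not taken from AppDec.cpp. `OpenSession` is a hypothetical hook that you would fill in with the real NVDEC setup; it is not part of the Video Codec SDK. The names `SessionResult`, `run_case`, and `all_ok` are also just illustrative. The harness only covers threads and sessions inside one process, so the multi-process cases still need separate launches.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Outcome of one attempt to open a decoder session.
struct SessionResult {
    int gpu = -1;
    int thread_index = -1;
    int session_index = -1;
    bool ok = false;
    std::string error;  // left empty on success
};

// Hypothetical hook (assumption): opens one decoder session on `gpu`,
// returns true on success and fills `error` on failure. It must be safe to
// call from several threads, and it is responsible for keeping the session
// open for as long as the test needs the sessions to coexist.
using OpenSession = std::function<bool(int gpu, std::string& error)>;

// Runs `threads` worker threads; each one opens `sessions_per_thread`
// sessions on `gpu`. Every attempt writes to its own result slot, so the
// workers never touch the same element and no lock is needed.
std::vector<SessionResult> run_case(int gpu, int threads,
                                    int sessions_per_thread,
                                    const OpenSession& open_session) {
    assert(threads > 0 && sessions_per_thread > 0);
    std::vector<SessionResult> results(
        static_cast<std::size_t>(threads) *
        static_cast<std::size_t>(sessions_per_thread));
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            for (int s = 0; s < sessions_per_thread; ++s) {
                SessionResult& r = results[static_cast<std::size_t>(
                    t * sessions_per_thread + s)];
                r.gpu = gpu;
                r.thread_index = t;
                r.session_index = s;
                r.ok = open_session(gpu, r.error);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return results;
}

// True only if every attempt in the case succeeded.
bool all_ok(const std::vector<SessionResult>& results) {
    for (const SessionResult& r : results) {
        if (!r.ok) {
            return false;
        }
    }
    return true;
}
```

With this, `run_case(1, 2, 1, hook)` corresponds to the "1 process, 2 threads, 1 session each on GPU 1" case and `run_case(1, 1, 2, hook)` to the "1 thread, 2 sessions on GPU 1" case. Recording the error string per attempt makes it easier to tell which session on which thread failed.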
I also tested the AppDec sample with Video Codec SDK 11.0.10, CUDA 11.2, and Visual Studio 2019.
I got the same result.
To get my job done, I need to run multiple NVDEC/NVENC sessions in a single process.
Any suggestions?
How can I do that?
Is it possible or not?
Is it bad code or a bad driver?
Is there any sample for multiple NVDEC/NVENC sessions on multiple GPUs?