RTX A6000: how to use multiple NVDEC/NVENC sessions on multiple GPUs concurrently?

I'm developing a 360 VR stitching program.
It needs to decode 6~8 MP4 files simultaneously, stitch them into one video, and encode the result to a file.
Environment: Windows 10, Visual Studio 2015.
I wrote a program that uses multiple GPUs and NVLink to speed up decoding, stitching, and encoding.
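
To make clear why several sessions per GPU are needed: with 6~8 inputs and 2 GPUs, each GPU ends up with 3~4 decode sessions. The snippet below is only a minimal sketch of such a split, assuming a plain round-robin policy. `assign_round_robin` is a hypothetical helper written for illustration. It is not taken from the actual program or from the Video Codec SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: deals the input files out to the GPUs round-robin.
// Returns one list of files per GPU; per_gpu[g] holds the inputs for GPU g.
std::vector<std::vector<std::string>> assign_round_robin(
        const std::vector<std::string>& files, int gpu_count) {
    assert(gpu_count > 0);
    std::vector<std::vector<std::string>> per_gpu(gpu_count);
    for (std::size_t i = 0; i < files.size(); ++i) {
        // File i goes to GPU (i mod gpu_count).
        per_gpu[i % gpu_count].push_back(files[i]);
    }
    return per_gpu;
}
```

With 6 files and 2 GPUs, this gives 3 files (and so 3 decode sessions) per GPU, which is the situation where the problem shows up on GPU 1.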

It works fine on a system with 2x RTX 8000, NVLink, Video Codec SDK 9, and CUDA 10.1.
It also works fine on a system with 2x GV100 and NVLink.
But it crashes on a system with 2x RTX A6000 and NVLink.

With two RTX A6000s, I can create multiple NVDEC sessions on GPU 0.
If I create a single NVDEC session on GPU 1, that also works fine.
But when I create multiple NVDEC sessions on GPU 1, it returns CUDA_ERROR_OUT_OF_MEMORY.

When I create an NVDEC session on GPU 0, it works fine.
When I create an NVDEC session on GPU 1, it crashes with no return code. The crash happens inside NVENCODEAPI64.DLL.

To test this, I modified the AppDec sample from Video Codec SDK 9.0.18.

AppDec.cpp (15.3 KB)

I tested various configurations with the AppDec sample.
The different settings are described in the code comments.

1 process with 1 NVDEC session on GPU 0: works fine.
2 processes with 1 NVDEC session each on GPU 0: works fine.
1 process with 1 NVDEC session on GPU 1: works fine.
2 processes with 1 NVDEC session each on GPU 1: works fine.

1 process, 2 threads, 1 NVDEC session per thread on GPU 0: works fine.
1 process, 2 threads, 1 NVDEC session per thread on GPU 1: fails.
1 process, 1 thread, 2 NVDEC sessions on GPU 0: works fine.
1 process, 1 thread, 2 NVDEC sessions on GPU 1: fails.
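
The single-process cases above have roughly the following shape. This is a minimal sketch, not the attached AppDec.cpp. `SessionFn` and `run_case` are hypothetical names, and the callback stands in for the code that creates one NVDEC session on the given GPU and reports whether that succeeded. In a real test the callback would also have to keep each decoder alive until the whole case finishes, so that the sessions actually coexist.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical callback type: tries to open one decode session on `gpu`
// and returns true on success. It may be called from several threads at
// once, so it must be safe to call concurrently.
using SessionFn = std::function<bool(int gpu)>;

// Starts `threads` threads; each one calls `open_session(gpu)`
// `sessions_per_thread` times. Returns how many calls reported success.
int run_case(int gpu, int threads, int sessions_per_thread,
             const SessionFn& open_session) {
    assert(gpu >= 0 && threads > 0 && sessions_per_thread > 0);
    std::vector<int> ok(threads, 0);  // one counter per thread, so no locking
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (int s = 0; s < sessions_per_thread; ++s) {
                if (open_session(gpu)) {
                    ++ok[t];
                }
            }
        });
    }
    for (auto& th : pool) {
        th.join();  // all threads finish before the counters are read
    }
    int total = 0;
    for (int n : ok) {
        total += n;
    }
    return total;
}
```

For example, `run_case(1, 2, 1, open_session)` corresponds to "1 process, 2 threads, 1 NVDEC session per thread on GPU 1", and `run_case(1, 1, 2, open_session)` to "1 process, 1 thread, 2 NVDEC sessions on GPU 1".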

I also tested the AppDec sample with Video Codec SDK 11.0.10, CUDA 11.2, and Visual Studio 2019.
I got the same result.

To get my job done,
I need to run multiple NVDEC/NVENC sessions in a single process.

Any suggestions?
How can I do that?
Is it possible or not?
Is it bad code or a bad driver?
Is there any sample for multiple NVDEC/NVENC sessions on multiple GPUs?