Using NvidiaOpticalFlow in opencv-python with multiprocessing

I am currently working with the Optical Flow SDK in Python. With a single process, I get ~40 fps. However, with 16 processes, the total throughput (all processes combined) is even slightly below 40 fps, so multiprocessing gives no speedup at all. I suspect this is a stream-related problem, because all my processes use the same stream, but I don't know how to fix it.
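For reference, the comparison can be reproduced with a harness along these lines. This is a minimal sketch, not the actual code: `process_frame` is a placeholder that only sleeps instead of calling the optical flow API, and every function name here is hypothetical. The `spawn` start method is an assumption, chosen because CUDA contexts generally do not survive a fork.

```python
import multiprocessing as mp
import time


def process_frame(frame_index):
    """Placeholder for one optical flow computation (assumption, not the SDK call)."""
    time.sleep(0.001)
    return frame_index


def worker(frame_count, result_queue):
    """Process frame_count frames, then report how many were completed."""
    done = 0
    for i in range(frame_count):
        process_frame(i)
        done += 1
    result_queue.put(done)


def aggregate_fps(frames_per_worker, elapsed_seconds):
    """Total frames per second across all workers, based on wall-clock time."""
    return sum(frames_per_worker) / elapsed_seconds


def run_benchmark(num_procs, frame_count=200):
    """Run num_procs workers in parallel and return the combined frame rate."""
    ctx = mp.get_context("spawn")  # assumption: avoid fork when CUDA is involved
    result_queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(frame_count, result_queue))
        for _ in range(num_procs)
    ]
    start = time.perf_counter()  # note: the timing includes process startup
    for p in procs:
        p.start()
    counts = [result_queue.get() for _ in procs]  # one result per worker
    for p in procs:
        p.join()
    elapsed = time.perf_counter() - start
    return aggregate_fps(counts, elapsed)
```

Calling `run_benchmark(1)` and `run_benchmark(16)` under an `if __name__ == "__main__":` guard (which the `spawn` start method requires) gives the two numbers to compare. With the sleep placeholder the rate scales with the number of processes; the behaviour described above would only show up once `process_frame` does the real GPU work.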
Please help, thanks!