Multi-stream decoding performance?

  • Some test results for H.264 Full HD multi-stream decoding on an Nvidia GTX 960M (with Nvidia Codec SDK 7.1.9)

Results:

 1 channel(s) total fps : 1030.91
 2 channel(s) total fps :  334.55
 4 channel(s) total fps :  147.33
 8 channel(s) total fps :   68.90
16 channel(s) total fps :   33.08
32 channel(s) total fps :   16.39

There is something strange in the results between 1 channel and 2 channels:
the fps does not scale linearly.
I suspect CUDA context switching performs poorly …

I expected about 500 fps for 2-channel decoding,
but the multi-threaded performance is much lower than that.
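
To make the scaling easier to read, here is a small sketch that multiplies each reported figure by its channel count. It assumes the numbers above are per-channel rates, which is only a guess based on the "500 fps for 2 channels" expectation; the post itself labels them "total fps". The names `reported_fps` and `aggregate_fps` are made up for this illustration.

```python
# Measured values copied from the post: channel count -> reported fps.
reported_fps = {1: 1030.91, 2: 334.55, 4: 147.33, 8: 68.90, 16: 33.08, 32: 16.39}


def aggregate_fps(channels: int, fps: float) -> float:
    """Aggregate throughput, ASSUMING `fps` is a per-channel figure."""
    return channels * fps


# The 1-channel run serves as the reference point for scaling.
baseline = aggregate_fps(1, reported_fps[1])

for channels, fps in reported_fps.items():
    total = aggregate_fps(channels, fps)
    print(f"{channels:2d} channel(s): {total:8.2f} fps aggregate, "
          f"{total / baseline:6.1%} of the 1-channel rate")
```

Under that assumption the 2-channel run would reach roughly 669 fps in aggregate and the 32-channel run roughly 524 fps, so the largest drop sits between 1 and 2 channels and the curve flattens after that.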

I would like to decode 32 channels of Full HD @30fps H.264 streams …
Is that possible?

How can I reach that goal?
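
For reference, the goal can be turned into a throughput number with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant names are made up here, and comparing against the 1-channel result assumes that figure is close to the decoder's peak rate, which the post does not confirm.

```python
TARGET_CHANNELS = 32
TARGET_FPS_PER_CHANNEL = 30
MEASURED_SINGLE_CHANNEL_FPS = 1030.91  # 1-channel result from the post

# Frames per second the decoder must sustain across all streams combined.
required_total_fps = TARGET_CHANNELS * TARGET_FPS_PER_CHANNEL

# Average time available per decoded frame, in milliseconds.
frame_budget_ms = 1000.0 / required_total_fps

# Fraction of the measured 1-channel rate that the goal would consume
# (ASSUMPTION: the 1-channel figure approximates the decoder's peak).
share_of_peak = required_total_fps / MEASURED_SINGLE_CHANNEL_FPS

print(f"required aggregate rate : {required_total_fps} fps")
print(f"time budget per frame   : {frame_budget_ms:.3f} ms")
print(f"share of 1-channel rate : {share_of_peak:.1%}")
```

The target works out to 960 fps in aggregate, which is about 93% of the measured 1-channel rate, so it leaves very little room for any multi-stream overhead.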

Can you show me the main code? And also the utilization that nvidia-smi reports while it runs.