CUDA 8.0 cudaDecodeGL sample hangs and runs slowly on GeForce GTX 1080 under Linux

I’m working on a team that is developing a multi-camera video solution. We’re planning to use NVIDIA GPUs to decode multiple H.264 streams for rendering, and we’ve been testing against our streams using the cudaDecodeGL example in the CUDA samples. This works fine under Windows, and also under Linux on GeForce GTX 970 cards, but on the GTX 1080 it hangs on the -nointerop tests and runs slowly and in bursts with the default parameters on videos longer than a few seconds.
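For reference, these are roughly the invocations we’ve been testing (input-file arguments are omitted here; we point the sample at our own clips):

    ./cudaDecodeGL                # default parameters: slow/bursty playback on longer clips
    ./cudaDecodeGL -nointerop     # hangs on the GTX 1080 under Linux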

This is on Ubuntu 16.04, with the latest drivers installed through the “Additional Drivers” tab and CUDA 8.0 installed via ‘apt-get install cuda’ after setting up the CUDA Debian packages.
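For completeness, this is roughly how we confirm what actually ended up installed after the apt-get step:

    nvidia-smi            # reports the driver version and detected GPUs
    nvcc --version        # reports the CUDA toolkit release (should say 8.0)
    dpkg -l | grep cuda   # lists the CUDA packages pulled in by apt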

We’re hoping this is either a driver issue or a race condition that can easily be fixed. We’re currently implementing a parallel threaded decoder for multiple streams and will be testing it as well, to make sure we can sustain high throughput when decoding many (152) streams in parallel; a rough sketch of the threading structure is below.
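For context, here is a minimal sketch of that threading structure. decodeStream() is a hypothetical placeholder for the real per-stream decode loop (decoder creation, packet pump, frame hand-off), which I’m leaving out here, and the stream URL is made up:

    // One decode worker per stream; decodeStream() is a placeholder for
    // the actual per-stream decoder (it is NOT part of the CUDA samples).
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    // Hypothetical worker: would open the source, create its own decoder
    // instance, and pump packets until the stream ends.
    static void decodeStream(const std::string& url) {
        std::printf("decoding %s\n", url.c_str());  // stub body
    }

    int main() {
        // Placeholder stream list; in our tests this would hold 152 camera URLs.
        std::vector<std::string> streams(152, "rtsp://camera/stream");
        std::vector<std::thread> workers;
        workers.reserve(streams.size());
        for (const auto& url : streams)
            workers.emplace_back(decodeStream, url);  // one decode thread per stream
        for (auto& w : workers)
            w.join();  // wait for all decode threads to finish
        return 0;
    }

The idea is that each worker owns its own decoder instance, so a stall in one stream doesn’t block the others.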

I am seeing similar results with the cudaDecodeGL example: playback is bursty when I run ./cudaDecodeGL -displayvideo.

Were you able to identify the cause of the burstiness? Did your multi-camera solution exhibit any similar issues?

We have not yet identified the cause. I’ve been spending some time paring down the demo program to its essentials and inserting it into our code. It still works well on our GTX 980M machine, but I haven’t re-tested it on the 1080 card under Linux since then.