I’m working on a GStreamer plugin that runs three algorithms on each received frame. The algorithms run on different cores using POSIX threads, and only one of them launches CUDA kernels.
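For reference, the threading layout could look roughly like this. It is a minimal sketch and not the plugin's actual code. It assumes Linux with glibc, because pthread_attr_setaffinity_np and the CPU_SET macros are GNU extensions. The names job_t, algo_cpu_a, algo_cpu_b, algo_gpu and process_frame are hypothetical, and the CUDA kernel launches would live inside algo_gpu. For brevity the threads are created per frame; a real plugin would more likely keep persistent worker threads.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Per-thread work item (hypothetical layout). */
typedef struct {
    const unsigned char *frame; /* frame data shared by all three algorithms */
    size_t size;                /* frame size in bytes */
    int result;                 /* placeholder for the algorithm's output */
} job_t;

/* Hypothetical stand-ins; only algo_gpu would launch CUDA kernels. */
static void *algo_cpu_a(void *arg) { ((job_t *)arg)->result = 1; return NULL; }
static void *algo_cpu_b(void *arg) { ((job_t *)arg)->result = 2; return NULL; }
static void *algo_gpu(void *arg)   { ((job_t *)arg)->result = 3; return NULL; }

/* Return the n-th CPU (cyclically) this process may run on, or -1. */
static int nth_allowed_cpu(int n)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    int count = CPU_COUNT(&allowed);
    if (count == 0)
        return -1;
    n %= count;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed) && n-- == 0)
            return cpu;
    return -1;
}

/* Run the three algorithms on one frame, each thread pinned to one core.
 * On a four-core CPU this uses cores 1 to 3 and leaves core 0 free
 * (an arbitrary choice); with fewer cores the assignment wraps around.
 * Returns 0 on success or the first pthread_create error code. */
static int process_frame(const unsigned char *frame, size_t size, job_t jobs[3])
{
    void *(*algos[3])(void *) = { algo_cpu_a, algo_cpu_b, algo_gpu };
    pthread_t threads[3];
    int started = 0, err = 0;

    for (int i = 0; i < 3; i++) {
        pthread_attr_t attr;
        cpu_set_t cpus;
        int cpu = nth_allowed_cpu(i + 1);

        jobs[i].frame = frame;
        jobs[i].size = size;
        jobs[i].result = 0;

        pthread_attr_init(&attr);
        if (cpu >= 0) {
            CPU_ZERO(&cpus);
            CPU_SET(cpu, &cpus);
            pthread_attr_setaffinity_np(&attr, sizeof cpus, &cpus);
        }
        err = pthread_create(&threads[i], &attr, algos[i], &jobs[i]);
        pthread_attr_destroy(&attr);
        if (err != 0)
            break;
        started++;
    }
    /* Join every thread that was started, even if a later create failed. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return err;
}
```

Pinning each algorithm to its own core matches the setup described above; whether the encoder's threads then compete for the same cores is one of the things worth checking.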
When I run the plugin with a fakesink, it works fine. But when I add an H.264 encoder to the GStreamer pipeline, the performance of the CUDA algorithm is greatly impacted: instead of taking around 15 ms, its execution time varies randomly between 15 and 30 ms.
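To put numbers on the 15 to 30 ms variation independently of the profiler, one option is to time the CUDA stage of every frame with a monotonic clock and count how many frames exceed a threshold, then compare the fakesink and encoder pipelines. This is a minimal sketch. It assumes that the timed callback covers both the kernel launches and the final synchronization (for example cudaDeviceSynchronize), so that GPU execution is included. The names stats_t, stats_add and time_stage, as well as the 20 ms threshold, are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Running statistics for per-frame stage durations, in milliseconds. */
typedef struct {
    size_t frames;       /* number of frames measured */
    size_t slow_frames;  /* frames slower than threshold_ms */
    double min_ms, max_ms, total_ms;
    double threshold_ms; /* e.g. 20 ms when ~15 ms is expected (assumption) */
} stats_t;

static void stats_init(stats_t *s, double threshold_ms)
{
    s->frames = s->slow_frames = 0;
    s->min_ms = s->max_ms = s->total_ms = 0.0;
    s->threshold_ms = threshold_ms;
}

/* Record one frame's duration. */
static void stats_add(stats_t *s, double ms)
{
    if (s->frames == 0 || ms < s->min_ms) s->min_ms = ms;
    if (s->frames == 0 || ms > s->max_ms) s->max_ms = ms;
    s->total_ms += ms;
    s->frames++;
    if (ms > s->threshold_ms)
        s->slow_frames++;
}

/* Difference b - a in milliseconds. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3
         + (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Time one call of `stage` (hypothetically the GPU algorithm, including its
 * final synchronization), record it, and return the duration in ms. */
static double time_stage(stats_t *s, void (*stage)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    stage(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = elapsed_ms(&t0, &t1);
    stats_add(s, ms);
    return ms;
}
```

CLOCK_MONOTONIC is used rather than the wall clock so that time adjustments during a run do not distort the measurements.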
I profiled the code and saw an unexpected latency between the kernel launches and the actual execution of those kernels. These latencies can reach 6 ms and seem to occur randomly on certain kernel launches. When they appear, the profiler shows that the GPU is idle.
Moreover, when I deactivate the algorithms that don’t require CUDA computation, I don’t get these latencies.
Does anyone have an idea what the problem could be, or know of tools I can use to get more information?
I’m running the code on a TK1 with CUDA 6.5 and GStreamer 1.0.