I am currently profiling an application which I run on a TK1 (L4T R21.5, Ubuntu). The application runs multiple kernels. The CPU is synchronized at the end of all kernels.
I have set the TK1 to maximum CPU performance (all CPU cores online, forced to max frequency). I have also forced the GPU to its max frequency, but the problem remains.
I can see a 2-3 ms gap between some of the executed kernels (see attached picture). The CPU is well ahead and waiting to synchronize.
Do you have any idea what the GPU is doing during that time? Any suggestions for debugging this? I couldn't find any cause in the kernels themselves.