GPU latency


I have an application where I need to perform DNN inference with the lowest possible latency. However, image samples arrive with delays (~2 s) that cause the GPU to lower its clocks. When a new sample arrives, it is hit with some extra milliseconds of latency. I can solve the issue by setting fixed clocks with nvidia-smi, but that requires admin rights. Is there a better solution?


Fixing the clocks with nvidia-smi is the canonical solution; everything else is going to be hacky. There is no programmatic way for application code to configure the GPU's dynamic clock and power-state management.
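For reference, the nvidia-smi approach looks like the following (root/admin required; the clock value below is purely illustrative, query `nvidia-smi -q -d SUPPORTED_CLOCKS` to see what your particular GPU accepts):

```shell
# Enable persistence mode so the setting survives when no client is attached
sudo nvidia-smi -pm 1

# Lock the graphics clocks to a fixed range (MHz); 1500 is an example value
sudo nvidia-smi --lock-gpu-clocks=1500,1500

# Later, return the GPU to default dynamic clock management
sudo nvidia-smi --reset-gpu-clocks
```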

For example, you could keep sending the same image through your GPU kernel repeatedly until a new sample arrives, then send that one repeatedly, and so on. The problem with that: while the redundant activity keeps the GPU clocks high, it also imposes a worst-case added latency equal to one kernel execution time, since a new sample may arrive just after a redundant launch has started.

Using the same principle, you could instead keep issuing null (empty) kernels to the GPU while waiting for a new image sample, incurring at most the launch-and-execution latency of a null kernel, which is on the order of 3 to 5 microseconds on modern hardware.

Pondering it some more, I think the idea of constantly issuing null kernels to the GPU needs modification. You do not want your image-processing kernel queued up behind hundreds of null kernels already in the pipeline. To avoid this, issue a cudaDeviceSynchronize() after each null-kernel launch to "flush the pipeline". That adds roughly another 20 microseconds of delay per iteration, for a total of about 25 microseconds of worst-case added latency. I am quoting these numbers from memory, so it is best to run some experiments to determine the actual delay on your system.
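A minimal sketch of that keep-warm loop, meant to run on a dedicated host thread. The kernel name, the atomic flag, and the polling structure are my own illustration of the idea, not a fixed recipe:

```cuda
#include <cuda_runtime.h>
#include <atomic>

// Empty kernel: its only purpose is to generate enough GPU activity
// that power management does not drop the clocks while we wait.
__global__ void nullKernel() {}

// Hypothetical flag, set by the capture thread when a new image arrives.
std::atomic<bool> newSampleReady{false};

void keepWarmLoop()
{
    while (!newSampleReady.load(std::memory_order_acquire)) {
        nullKernel<<<1, 1>>>();
        // Flush the pipeline after every launch so the real inference
        // kernel never queues up behind a backlog of null kernels.
        // This adds ~20 us per iteration but bounds the worst-case
        // added latency at roughly one null-kernel round trip.
        cudaDeviceSynchronize();
    }
    newSampleReady.store(false, std::memory_order_release);
    // ...launch the real inference kernel here...
}
```

One design note: keeping this loop on its own host thread means the thread that receives images only has to set the flag, and the inference launch itself sees a GPU that is already at high clocks with an empty work queue.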