latency benchmarks

I’m trying to figure out whether GPUDirect for Video would be suitable for using video as a feedback sensor in control loops. Signal latency is the killer requirement. I’m lacking benchmarks to justify the investment in a project involving high-speed CoaXPress cameras, grabbers, etc.

As a reference, I have a 256x256x8bit image coming in at a 5 kHz frame rate via CoaXPress (Kaya/Euresys/Matrox, undecided as yet).
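For scale, here is a minimal sketch of what that reference stream amounts to per frame. It only restates the numbers above; the variable names are my own.

```python
# Reference stream from the post: 256x256 pixels, 8 bits per pixel, 5 kHz.
WIDTH, HEIGHT, BITS_PER_PIXEL = 256, 256, 8
FRAME_RATE_HZ = 5_000

frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8      # bytes in one frame
frame_period_us = 1e6 / FRAME_RATE_HZ                   # time between frames
sustained_mb_per_s = frame_bytes * FRAME_RATE_HZ / 1e6  # average data rate

print(f"frame size:     {frame_bytes} bytes")
print(f"frame period:   {frame_period_us:.1f} us")
print(f"sustained rate: {sustained_mb_per_s:.2f} MB/s")
```

So each frame is 64 KiB and a new one arrives every 200 us, which is the window any per-frame latency has to fit into.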

  1. The frame grabber sends the data to CPU system memory (22 us @ 3 GB/s, CXP6 x 4).
  2. The frame grabber notifies the CPU of the completed frame transfer with an interrupt. The time for this was reported to me as 12 us on a typical i7, and includes all the system delays (FIFO, PCIe, DDR, interrupt).
  3. The CPU instructs the GPU to fetch the frame from system memory. The time for this depends on the GPU model.
  4. Transport to GPU memory (4 us @ 15 GB/s, PCIe 3.0 x16).
  5. Is there a need to reprogram this cycle per frame?

Thus the total latency will be 22 us + 12 us + GPU latency + 4 us + reprogramming?
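The budget above can be written as a small calculation. This is only a sketch under the assumptions already stated (3 GB/s and 15 GB/s link rates, 12 us interrupt time); the GPU latency and the reprogramming cost are the unknowns, so they are left as parameters, and the function names are my own.

```python
FRAME_BYTES = 256 * 256  # one 256x256 frame at 8 bits per pixel

def transfer_us(num_bytes, gbytes_per_s):
    """Time in microseconds to move num_bytes at the given rate in GB/s."""
    return num_bytes / (gbytes_per_s * 1e9) * 1e6

def total_latency_us(gpu_latency_us, reprogram_us=0.0):
    """Sum of the pipeline steps; the two arguments are the unknown terms."""
    grabber_to_host_us = transfer_us(FRAME_BYTES, 3.0)   # step 1, about 22 us
    interrupt_us = 12.0                                  # step 2, as reported
    host_to_gpu_us = transfer_us(FRAME_BYTES, 15.0)      # step 4, about 4 us
    return (grabber_to_host_us + interrupt_us + gpu_latency_us
            + host_to_gpu_us + reprogram_us)

# Known part of the budget, with both unknown terms set to zero.
known_us = total_latency_us(gpu_latency_us=0.0)
print(f"known part of the budget: {known_us:.1f} us")
```

With both unknowns at zero this gives roughly 38 us, so whatever the GPU adds comes on top of that figure.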

Does anybody have experience with this, and with these numbers?


I can help out with this.

No, there should be no need to reprogram this per frame. The CPU should simply receive the interrupt each frame and trigger the transfer of the video frame to the GPU. The overall time should also be constant from frame to frame.

GPU Direct for Video will permit the GPU to DMA the frame data directly into GPU memory. I assume you would like the data in a CUDA buffer or array. Euresys and Matrox have implemented GPU Direct for Video today.

Please let me know if I can answer any other questions for you.


Hi Tom,
Thank you, this really helps.

Is GPU interrupt latency (how much?) the only missing number in my pipeline overview, or did I miss something relevant (i.e. microsecond-scale latencies) in the concept? Some “specialists” warn me about excessive overhead because the frame size (256x256x8bit) is small.

If it is significant (microseconds), then what is the interrupt latency of GPU cores in the Pascal/Volta line? Are all micro engines created equal with respect to latency?