Implementing NVIDIA Reflex latency markers into a custom engine


I am currently in the process of implementing NVIDIA Reflex into a custom game engine. So far, the results of the initial implementation are pretty good; users have reported a noticeable latency reduction.

I am now trying to implement the latency markers as well in an attempt to further improve performance.
In the documentation, there is a notice regarding the frame ID that the programmer needs to set in the ‘NV_LATENCY_MARKER_PARAMS’ structure, prior to making the call to ‘NvAPI_D3D_SetLatencyMarker(…)’.

I am facing the same challenge mentioned in the notice: there is no shared tick count or sequence number between the simulation thread and the render thread. The simulation thread typically runs at 20 ticks per second, while the renderer runs at anywhere between 1 and 350 FPS.

The documentation suggests enqueuing the frame number from the simulation thread to the render thread and using this sequence number for frame numbering.

My question is this: since the simulation typically runs much slower than the renderer, the renderer would most likely end up using a frame ID that has already been submitted through the latency markers. What should I do when the renderer runs one or more frames ahead of the simulation thread and therefore ‘reuses’ an already submitted frame ID?
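To make the problem concrete, here is a minimal sketch of the situation described above. It makes no NVAPI calls. The function `renderFrameIds` and its parameters are hypothetical names invented for this illustration, and time is simulated with plain integers rather than real threads. It assumes the renderer simply picks up the most recent simulation tick number as its frame ID.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration only: no NVAPI calls are made here.
// Returns the frame ID each render frame would pick up if it simply
// reused the most recent simulation tick number.
// Assumes both intervals are positive.
std::vector<std::uint64_t> renderFrameIds(int simIntervalMs,
                                          int renderIntervalMs,
                                          int durationMs) {
    std::vector<std::uint64_t> ids;
    std::uint64_t latestSimTick = 0;  // last tick number published by the simulation
    int nextSimMs = 0;                // simulated time of the next simulation tick

    for (int nowMs = 0; nowMs < durationMs; nowMs += renderIntervalMs) {
        // Advance the simulation to the current render time.
        while (nextSimMs <= nowMs) {
            ++latestSimTick;
            nextSimMs += simIntervalMs;
        }
        // The render frame takes whatever tick number is current.
        ids.push_back(latestSimTick);
    }
    return ids;
}
```

With a 50 ms simulation interval (20 ticks per second) and a 10 ms render interval, `renderFrameIds(50, 10, 100)` yields ten render frames, of which the first five share ID 1 and the next five share ID 2. That is exactly the reuse I am asking about.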

Best regards.

Hi there @k.mazidjatari and welcome to the NVIDIA developer forums!

Thank you for your interest in Reflex and great to hear that you got it working so well!

I forwarded your question to the Reflex team, let’s see if they can give you some tips.


Hi again,

I received some feedback internally which I wanted to share:

Currently the markers only support a simple model. There needs to be exactly 1 simulation start/end marker, 1 render submit start/end marker, and 1 present start/end marker per frame ID.
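One way to check an engine against this model during development is to keep a small debug-side record of which markers have been sent for each frame ID. The sketch below is only an illustration under stated assumptions: `Marker` and `MarkerLog` are hypothetical names, not NVAPI types, and the enum values are stand-ins for the six markers listed above.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for the six per-frame markers; not the NVAPI enum.
enum class Marker : std::size_t {
    SimulationStart,
    SimulationEnd,
    RenderSubmitStart,
    RenderSubmitEnd,
    PresentStart,
    PresentEnd,
    Count
};

// Debug helper: tracks which markers were recorded for each frame ID.
class MarkerLog {
public:
    // Returns false if this marker was already recorded for this frame ID,
    // which would violate the "exactly one per frame ID" rule.
    bool record(std::uint64_t frameId, Marker marker) {
        auto& seen = seen_[frameId];
        const auto bit = static_cast<std::size_t>(marker);
        if (seen.test(bit)) {
            return false;
        }
        seen.set(bit);
        return true;
    }

    // True once all six markers have been recorded for this frame ID.
    bool isComplete(std::uint64_t frameId) const {
        const auto it = seen_.find(frameId);
        return it != seen_.end() && it->second.all();
    }

private:
    std::unordered_map<std::uint64_t,
                       std::bitset<static_cast<std::size_t>(Marker::Count)>> seen_;
};
```

Calling `record` right next to each real marker call would flag a reused frame ID the moment the second simulation or render submit marker arrives for it. Entries are never erased in this sketch, so a real engine would want to drop completed frame IDs.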

It sounds like for this engine, simulation happens at a fixed interval, while render happens as fast as possible. That’s fine.

We need to understand whether the simulation thread is decoupled from the render thread, or whether they are synchronized and simulation simply runs on some frames and not on others.

Note that “simulation” doesn’t necessarily mean (physics) simulation. Some engines call it “update” instead. With Reflex’s focus on latency, we mostly care about the input sampling (which usually happens at the start of “simulation”/“update”).

In other words, does the engine:

  1. Do simulation at 20 Hz, while the main loop runs on a separate thread: take the most recently completed simulation, sample input, apply input to camera, submit render to GPU, etc. … repeat for next frame.
  2. Main loop: sample input, do simulation (if 50 ms have passed since the last simulation), submit render to GPU, etc. … repeat for next frame.

(1) and (2) would require different ways of setting up the markers.
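For reference, option (2) can be sketched roughly as follows. This is only an illustration of the control flow, not marker placement guidance. `FrameSteps` and `runSingleLoop` are hypothetical names, time is simulated with a fixed frame time, and the sketch assumes the very first frame runs a simulation step.

```cpp
#include <vector>

// Records which steps ran on a given frame of the main loop.
struct FrameSteps {
    bool sampledInput;
    bool simulated;
    bool submittedRender;
};

// Option (2): a single main loop where simulation only runs once the
// simulation interval (50 ms by default) has elapsed.
std::vector<FrameSteps> runSingleLoop(int frameCount,
                                      int frameTimeMs,
                                      int simIntervalMs = 50) {
    std::vector<FrameSteps> trace;
    int sinceLastSimMs = simIntervalMs;  // assumption: first frame simulates

    for (int frame = 0; frame < frameCount; ++frame) {
        FrameSteps steps{};
        steps.sampledInput = true;           // input is sampled every frame
        if (sinceLastSimMs >= simIntervalMs) {
            steps.simulated = true;          // simulation runs on some frames only
            sinceLastSimMs = 0;
        }
        steps.submittedRender = true;        // render is submitted every frame
        trace.push_back(steps);
        sinceLastSimMs += frameTimeMs;       // simulated passage of time
    }
    return trace;
}
```

At a 10 ms frame time, `runSingleLoop(6, 10)` simulates on the first and sixth frames and skips simulation on the four frames in between, while input sampling and render submission happen on all six. In option (1) the simulation step would instead live on its own thread, and the main loop would only read its latest result.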

I hope that makes sense.

If you can clarify a bit how your engine works and if it falls into either of the above categories, I will pass it on.