At 30 Hz, each frame takes about 33 ms (1000/30 ≈ 33.3 ms) to capture before it can be processed. It then needs to be displayed, which costs between 0 and 33 ms of queuing (depending on where scan-out is on the monitor when you’re done) plus 33 ms to actually scan it out. Add the 42 ms latency of your display, and the best possible case is 33.3 + 0 + 33.3 + 42 ≈ 109 ms, while the worst case is 33.3 + 33.3 + 33.3 + 42 ≈ 142 ms. So the best achievable latency, assuming processing takes no time, is between roughly 109 and 142 ms. Your 130–170 ms suggests about one frame of additional latency, which could come from processing, or simply from a triple-buffered output pipeline instead of a double-buffered one.
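The arithmetic above can be sketched as a small latency-budget calculation. The 42 ms panel latency is the figure from the question; the per-frame time at 30 Hz is taken as 1000/30 ≈ 33.3 ms, and the rest of the model is the simple capture → queue → scan-out pipeline just described:

```python
# Latency budget for a 30 Hz capture -> display pipeline.
# Assumptions: display refreshes at the same 33.3 ms frame time as the
# camera, and the panel adds the 42 ms lag quoted in the question.

CAMERA_HZ = 30
FRAME_MS = 1000 / CAMERA_HZ                    # ~33.3 ms per frame
PANEL_LAG_MS = 42                              # fixed display latency

capture_ms = FRAME_MS                          # full frame must arrive first
queue_best_ms, queue_worst_ms = 0, FRAME_MS    # depends on scan-out position
scanout_ms = FRAME_MS                          # one refresh to scan it out

best = capture_ms + queue_best_ms + scanout_ms + PANEL_LAG_MS
worst = capture_ms + queue_worst_ms + scanout_ms + PANEL_LAG_MS

print(f"best ~{best:.0f} ms, worst ~{worst:.0f} ms")
# best ~109 ms, worst ~142 ms
```

Adding one more frame for processing or a triple-buffered output queue shifts this window right by ~33 ms, which lines up with the observed 130–170 ms.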
The numbers you report don’t seem concerning at all; they are spot on for what’s expected at 30 Hz with the various subsystems involved.
To get lower latency, you need to up your hardware game significantly:

- Genlock your display to your camera.
- Use the fastest buffering possible (direct-mapped or double-buffered presentation).
- Use a display with close to zero display latency.
- Use a very high frame rate. Even if you can only get 60 Hz for the camera capture, a display driven at 120 Hz can cut some of the latency down; ideally you’d want a 90 Hz or 120 Hz camera as well.