I believe you will want to use the OpenGL or Vulkan APIs to render the overlay, with your video frames being one of the input resources for the graphics.
I.e., the most flexible pipeline (allowing you to use arbitrary OpenGL rendering) would be:
capture → raw data → texture buffer object → OpenGL rendering → framebuffer object → encoding → network
OpenGL and Vulkan can of course do arbitrarily fancy graphics, depending on the skill and artist resources available to design and implement the OSD.
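To make the "OpenGL rendering" stage of that pipeline a bit more concrete, here is a rough CPU-side sketch of what the GPU would be doing when it draws an OSD over a video frame with standard alpha blending (the equivalent of `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)`). It is only a stand-in for illustration: the function names `blend_pixel` and `composite` and the list-of-rows frame layout are my own assumptions, and a real implementation would do this in a fragment shader or the fixed blend stage, not in Python.

```python
# Hypothetical CPU stand-in for the GPU blend stage; names and the
# frame layout (list of rows of tuples) are assumptions for illustration.

def blend_pixel(video_rgb, osd_rgba):
    """Blend one RGBA overlay pixel over one RGB video pixel ("source over")."""
    alpha = osd_rgba[3] / 255.0
    # result = overlay * alpha + video * (1 - alpha), rounded to nearest int
    return tuple(
        int(o * alpha + v * (1.0 - alpha) + 0.5)
        for o, v in zip(osd_rgba[:3], video_rgb)
    )

def composite(frame, overlay):
    """Blend an RGBA overlay over an RGB frame of the same dimensions."""
    return [
        [blend_pixel(v, o) for v, o in zip(frame_row, overlay_row)]
        for frame_row, overlay_row in zip(frame, overlay)
    ]

if __name__ == "__main__":
    # 1x2 frame: the first overlay pixel is transparent, the second is opaque white
    frame = [[(10, 20, 30), (10, 20, 30)]]
    overlay = [[(0, 0, 0, 0), (255, 255, 255, 255)]]
    print(composite(frame, overlay))
```

The output of `composite` is what would end up in the framebuffer object and get handed to the encoder.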
You can also use this for video effects, anything from simple wipes and fades to advanced 3D effects. If you’re building a UI that’s supposed to feel polished for end users, then this may be useful, too.
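As a rough idea of how simple the basic effects are, here is a sketch of a fade and a wipe between two frames, again written as a plain CPU stand-in. The crossfade is the same linear interpolation that GLSL's `mix(a, b, t)` performs per component; the names `crossfade` and `wipe` and the list-of-rows frame layout are assumptions for illustration, and on the GPU each would be a couple of lines in a fragment shader.

```python
# Hypothetical CPU stand-ins for two simple transitions; names and the
# frame layout (list of rows of RGB tuples) are assumptions for illustration.

def _clamp01(t):
    """Clamp the transition progress to the range 0.0..1.0."""
    return min(1.0, max(0.0, t))

def crossfade(frame_a, frame_b, t):
    """Fade from frame_a (t=0) to frame_b (t=1) by linear interpolation."""
    t = _clamp01(t)
    return [
        [
            tuple(int(a * (1.0 - t) + b * t + 0.5) for a, b in zip(pa, pb))
            for pa, pb in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

def wipe(frame_a, frame_b, t):
    """Left-to-right wipe: columns left of the moving edge show frame_b."""
    width = len(frame_a[0])
    edge = int(width * _clamp01(t))
    return [row_b[:edge] + row_a[edge:] for row_a, row_b in zip(frame_a, frame_b)]

if __name__ == "__main__":
    black = [[(0, 0, 0)] * 4]
    grey = [[(100, 100, 100)] * 4]
    print(crossfade(black, grey, 0.5))
    print(wipe(black, grey, 0.5))
```

Driving `t` from 0 to 1 over the duration of the transition gives the animation.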