Grab framebuffers from Tesla M6


I would like to grab the GPU framebuffers from the Tesla M6, so that another driver can forward it to a different PCIe device. Is this possible?

Yes, this is possible, and there are several ways to achieve it.

Are you interested in capturing the entire desktop or only the contents of the graphics window?

The complete desktop can be captured into system memory for transfer to another PCIe device by using the NVIDIA Capture SDK.

The desktop can also be captured using the Windows IDXGIOutput::GetDisplaySurfaceData() method.

If you are interested in only capturing the contents of the graphics window and the PCIe device is a video I/O card that supports GPU Direct for Video, you may be able to transfer the contents of the framebuffer to the video I/O card directly prior to scanout to the display.

If you can provide some additional details, we can provide more precise guidance.

To add to my post above, the IDXGIOutput::GetDisplaySurfaceData() method is limited to full-screen windows.

The Windows API you really want to use to capture the whole desktop into system memory is the IDXGIOutputDuplication interface.

Again, if you can provide additional information we can provide more specific recommendations.

I am working with the Tesla M6 with vGPUs, where each VM is assigned its own framebuffer in GPU memory. According to the Tesla documentation, the large BAR that is exposed “Can be used by PCIe devices to directly access the framebuffers”. Ideally I am looking for some way to program the GPU to push these frames, using its internal DMA engine, to an alternate custom PCIe device that also exposes a BAR. This would bypass the video stream and allow for processing and output by the external PCIe card.

The issue is that there seems to be no documentation on how to get the framebuffer addresses from the GPU. (They appear to be set up by the hypervisor and then reported, so that may be a non-issue.) However, where is the synchronization done? If the PCIe device simply reads from the GPU BAR directly, there is no frame synchronization, i.e. a read DMA operation can occur in the middle of a frame.
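To make the synchronization concern concrete, here is a minimal sketch of one generic way a reader can detect a torn frame: a sequence counter that the producer bumps before and after each frame write. This is purely illustrative. `SharedFrame`, `write_frame`, `try_read`, and the `interfere` hook are hypothetical names, and nothing here is an NVIDIA or Tesla M6 interface; it assumes the producer could be made to maintain such a counter, which the documentation does not confirm.

```python
# Hypothetical sketch: none of this is an NVIDIA API. SharedFrame stands in
# for a framebuffer region plus a sequence counter that the producer would
# have to maintain so a reader can detect torn frames.

class SharedFrame:
    def __init__(self, size):
        self.seq = 0                  # even = stable, odd = write in progress
        self.pixels = bytearray(size)

    def begin_write(self):
        self.seq += 1                 # counter becomes odd

    def end_write(self):
        self.seq += 1                 # counter becomes even again


def write_frame(frame, value):
    """Producer side: fill the whole frame with one byte value."""
    frame.begin_write()
    for i in range(len(frame.pixels)):
        frame.pixels[i] = value
    frame.end_write()


def try_read(frame, interfere=None):
    """Reader side: copy the frame once.

    Returns the copy, or None if the copy may be torn. `interfere` is a
    test hook that simulates the producer running during the copy.
    """
    start = frame.seq
    if start % 2:                     # producer is in the middle of a frame
        return None
    snapshot = bytes(frame.pixels)
    if interfere is not None:
        interfere(frame)
    if frame.seq != start:            # a write happened while we were copying
        return None
    return snapshot
```

A reader would call `try_read` in a loop and retry whenever it returns `None`. The same idea would only carry over to a real device if both sides agreed on where the counter lives and on the ordering of the counter and pixel accesses across PCIe, which is exactly the part that is undocumented here.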

I’m currently constructing the PCIe device that is connected to the root complex of the GPU, so ideally GPU Direct for Video would bypass the pinned host memory and DMA directly into the PCIe device when the PCIe device driver requests it.

Thanks for any assistance.

@VideoGuru, I have been looking more into the Capture SDK and it looks like it might fit my needs better. I am wondering how I can capture the vGPU profiles, though. Will a capture program need to be loaded into each individual virtual machine, or can it all be controlled by the host domain that manages all the virtual machines? The latter is definitely more ideal.

The system is also running under Linux.

Yes, the Capture SDK runs in user space, so you will need to run the capture program on each individual VM. It cannot be controlled by the host domain that manages all the virtual machines.

Thanks for the reply! I honestly don’t know how NVIDIA expects us to access the framebuffers over PCIe then, as stated in the Tesla M6 documentation. If it is just offset from the exposed BAR, that’s fine, but surely there is some way for the GPU’s DMA engine to push frames to a remote PCIe device. It would seem like a major oversight if this were not the case.

Thanks again.