Well, I wonder if it is feasible to access the buffer directly from the GPU and eliminate this extra copy. I mean, the pixels displayed on the screen have to be somewhere in GPU memory, right? Isn't it possible to access them somehow, or are they memory-protected?
This is just a guess, but I suspect it is a combination of the two. Several experiments have revealed that GPUs have TLBs, which suggests there is hardware support for detecting out-of-bounds memory accesses. If you have a TLB, then something has to program it; on CPUs this is the OS, on GPUs it is probably the driver.
I don't think that's the whole story, since screen-capture solutions do exist (search for BitBlt). They are just not very efficient and do have some limitations.
The problem is that the GPU has virtual memory of some sort: allocating one buffer in each of two threads returns the same pointer value (0x11000 or something like that) to both threads, even though the pointers refer to different locations in memory.
The problem is that the memory belongs to different contexts, and you want to protect one program from another (against both malicious programs and program bugs).