This requires allocating non-pageable (pinned) system memory. The GPU can DMA directly from this memory. Thus, if you can create your data in this memory, you only need to DMA to the GPU. If you don’t, the driver has to memcpy from your array to its own pinned memory (possibly in chunks), and then DMA. Therefore most transfers to the GPU are limited by CPU and chipset performance in addition to PCIe performance.
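As a rough illustration of "create your data in this memory", here is a minimal sketch using the CUDA runtime calls cudaMallocHost, cudaMemcpy, and cudaFreeHost. The helper name upload_via_pinned and the float payload are made up for the example, and error handling is kept to the bare minimum.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not from the original post): builds the data directly
// in pinned host memory and uploads it. Returns true on success.
bool upload_via_pinned(size_t count) {
    const size_t bytes = count * sizeof(float);
    float* host = nullptr;  // pinned (page-locked) host buffer
    float* dev = nullptr;   // device buffer

    // Allocate non-pageable host memory that the GPU can DMA from.
    if (cudaMallocHost(reinterpret_cast<void**>(&host), bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(reinterpret_cast<void**>(&dev), bytes) != cudaSuccess) {
        cudaFreeHost(host);
        return false;
    }

    // Create the data in the pinned buffer itself, so the driver does not
    // need an extra memcpy into its own staging memory.
    for (size_t i = 0; i < count; ++i) {
        host[i] = static_cast<float>(i);
    }

    // With a pinned source, this copy only needs the DMA step.
    const cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    cudaFree(dev);
    cudaFreeHost(host);  // pinned memory must be released with cudaFreeHost
    return err == cudaSuccess;
}
```

Calling upload_via_pinned(1 << 20) would upload about 4 MB this way. If the source were an ordinary malloc'd array instead, the same cudaMemcpy call would still work, but the driver would have to stage the data through its own pinned memory first.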
HOWEVER, if you allocate too much pinned memory, you can bring your system to its knees, because the OS can no longer page that memory out. Therefore the graphics APIs don’t expose this sort of allocation for graphics data structures.
When you use pinned memory, you do so at your own (and your users’ own) risk. On fixed platforms (embedded systems, clusters, etc.), I expect pinned memory to be very useful because you can experiment to figure out how much is safe to use.
For desktop applications, you should do extensive testing to figure out what works on a variety of PC configurations.
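One way to keep pinned usage conservative is to cap the size of any single pinned allocation and fall back to ordinary pageable memory otherwise. The sketch below assumes a per-allocation limit that you would choose from your own testing; HostBuffer, alloc_host, free_host, and pinned_limit_bytes are hypothetical names, not part of CUDA. Note that a successful cudaMallocHost does not by itself mean the amount is safe for the rest of the system.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper that remembers how the buffer was allocated.
struct HostBuffer {
    void* ptr;
    size_t bytes;
    bool pinned;
};

// Try pinned memory only when the request fits under the chosen limit;
// otherwise (or if pinning fails) fall back to pageable memory.
HostBuffer alloc_host(size_t bytes, size_t pinned_limit_bytes) {
    HostBuffer buf{nullptr, bytes, false};
    if (bytes <= pinned_limit_bytes) {
        if (cudaMallocHost(&buf.ptr, bytes) == cudaSuccess) {
            buf.pinned = true;
            return buf;
        }
        cudaGetLastError();  // reset the error state left by the failed call
        buf.ptr = nullptr;
    }
    buf.ptr = std::malloc(bytes);  // pageable fallback; may be nullptr on failure
    return buf;
}

// Release the buffer with the function that matches its allocation.
void free_host(HostBuffer& buf) {
    if (buf.pinned) {
        cudaFreeHost(buf.ptr);
    } else {
        std::free(buf.ptr);
    }
    buf.ptr = nullptr;
    buf.pinned = false;
}
```

Transfers from the pageable fallback still work with cudaMemcpy; they just go through the driver's staging copy, so the application degrades in speed rather than starving the system of memory.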