I’ve heard that GPUs can run threads off of system RAM assuming the dedicated memory is already being occupied. Can I do this in CUDA?
Look into UVA (unified virtual addressing) and zero-copy mapped pinned memory, but it's slow, since the memory is accessed over the PCIe bus.
See pages 27-29 of the CUDA C Programming Guide (the sections on page-locked host memory and mapped memory).
It works really well. With careful coding, a kernel can saturate the PCIe bus when reading from write-combined memory.
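For illustration, here is a minimal zero-copy sketch: the host buffer is allocated as mapped pinned memory, and the kernel reads and writes it directly over PCIe. Kernel name, sizes, and launch configuration are made up for the example; on older pre-UVA systems you would also need `cudaSetDeviceFlags(cudaDeviceMapHost)` before creating the context.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: every access to `data` goes across the PCIe bus,
// since the buffer lives in system RAM, not device memory.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // Allocate pinned host memory mapped into the device address space.
    float *h_data;
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // With UVA the host pointer is usable on the device directly;
    // cudaHostGetDevicePointer works on both UVA and non-UVA systems.
    float *d_data;
    cudaHostGetDevicePointer(&d_data, h_data, 0);

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    // The kernel wrote straight into host RAM; no cudaMemcpy needed.
    printf("h_data[0] = %f\n", h_data[0]);
    cudaFreeHost(h_data);
    return 0;
}
```

Note that every access pays PCIe latency and bandwidth, so this only pays off for data touched once (streaming reads/writes), not for buffers the kernel reuses.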