What kind of transfer rates can I realistically expect between the CPU and GPU?
Is there any benefit to having multiple threads share a GPU?
In practice, a PCIe Gen3 x16 link delivers about 11-12 GB/s of throughput in each direction. From a data-transfer perspective, using multiple threads brings no throughput advantage: all threads share the same link, and its bandwidth is the ceiling.
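For context on where the 11-12 GB/s figure comes from, here is the arithmetic for the link's signalling rate. PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, which gives about 15.75 GB/s per direction for x16. The gap down to the observed 11-12 GB/s is largely attributable to protocol overhead (packet headers, flow control, and similar). The sketch below computes only the post-encoding rate and the observed fraction of it. It does not model the protocol overhead itself.

```python
# PCIe 3.0 signalling parameters
GT_PER_S_PER_LANE = 8e9   # 8 GT/s per lane
ENCODING = 128 / 130      # 128b/130b line encoding efficiency
LANES = 16                # x16 link


def pcie3_raw_bandwidth_gbs(lanes: int = LANES) -> float:
    """Per-direction bandwidth after line encoding, in GB/s (1 GB = 1e9 bytes)."""
    bits_per_s = GT_PER_S_PER_LANE * ENCODING * lanes
    return bits_per_s / 8 / 1e9


if __name__ == "__main__":
    raw = pcie3_raw_bandwidth_gbs()
    print(f"x16 after encoding: {raw:.2f} GB/s")
    # Fraction of the encoded rate that the typical observed throughput represents
    for observed in (11.0, 12.0):
        print(f"{observed} GB/s is {observed / raw:.0%} of that rate")
```

So the typical 11-12 GB/s is roughly 70-76% of what the lanes carry after encoding.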
Note that because each transfer carries some fixed overhead, PCIe Gen3 throughput typically does not reach 12 GB/s until the transferred block is about 15 MB; smaller blocks achieve lower throughput.
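One simple way to reason about the block-size effect is a latency-plus-bandwidth model: each transfer costs a fixed overhead `t0` plus `size / peak`. Effective throughput is then `size / (t0 + size / peak)`, which approaches `peak` only as `size` grows. This is a minimal sketch under assumptions. The 12 GB/s peak comes from the text above, but `OVERHEAD_S` is a purely hypothetical value chosen for illustration, not a measured number, and real transfers also depend on factors such as pinned vs. pageable host memory. The helper `half_peak_size` (also hypothetical) gives the block size at which half of the peak is reached, often called Hockney's n½.

```python
PEAK_GBS = 12.0      # asymptotic throughput, taken from the text
OVERHEAD_S = 10e-6   # HYPOTHETICAL fixed per-transfer overhead; measure your own system


def effective_throughput_gbs(size_bytes: float,
                             peak_gbs: float = PEAK_GBS,
                             overhead_s: float = OVERHEAD_S) -> float:
    """Throughput in GB/s for one transfer of size_bytes under the fixed-overhead model."""
    transfer_s = size_bytes / (peak_gbs * 1e9)
    return size_bytes / (overhead_s + transfer_s) / 1e9


def half_peak_size(peak_gbs: float = PEAK_GBS,
                   overhead_s: float = OVERHEAD_S) -> float:
    """Block size (bytes) at which the model reaches half of the peak (n-one-half)."""
    return overhead_s * peak_gbs * 1e9


if __name__ == "__main__":
    # Sweep block sizes to show throughput rising toward the peak
    for size in (64 * 1024, 1 << 20, 15 * 10**6, 100 * 10**6):
        print(f"{size:>10} bytes: {effective_throughput_gbs(size):.2f} GB/s")
    print(f"half of peak at ~{half_peak_size():.0f} bytes")
```

To apply the model to a real system, fit `t0` and `peak` from timed transfers at a few block sizes rather than relying on the placeholder values.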