Quadro K4000: low Host to Device and Device to Host bandwidth

Quoting my last post on the Off-topic Forum:

I am using an NVIDIA Quadro K4000 board with CUDA 6.0 on Xubuntu 14.04 LTS.

After running bandwidthTest from the CUDA samples to measure the GPU's transfer bandwidth, the results are:
Host to Device Bandwidth (MB/s): 750.5
Device to Host Bandwidth (MB/s): 818.1
Device to Device Bandwidth (MB/s): 91741.1

The problem is that the Host to Device and Device to Host bandwidths seem far too low, and my CUDA program takes too long when transferring data to the GPU.
I’ve compared these figures with those of a non-Quadro board (from a Jetson TK1), where the Host to Device and Device to Host bandwidths are around 6000 MB/s.

Is this a known problem with this board? (I couldn’t find any information on it.)
Is there a way to increase the bandwidth?

What kind of system is this? The most likely cause of such low PCIe transfer rates is that the GPU is plugged into the “wrong” PCIe slot. You want a PCIe slot that is configured for x16 operation and runs at PCIe gen2 or higher. Consult your system’s documentation and/or the system BIOS setup.
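If you would rather check the slot from software before opening the case, something like the following might help. It is only a sketch, and it assumes a Linux system whose kernel exposes the `current_link_speed` and `current_link_width` files under `/sys/bus/pci/devices/`. The function names are made up for illustration. The per-lane figures are the usual theoretical rates after line encoding, so real transfers will come in lower.

```python
from pathlib import Path

# Approximate usable rate per lane in MB/s (10^6 bytes/s), one direction,
# after line encoding: gen1/gen2 use 8b/10b, gen3/gen4 use 128b/130b.
PER_LANE_MB_S = {
    1: 250.0,
    2: 500.0,
    3: 1000.0 * 128 / 130,
    4: 2000.0 * 128 / 130,
}

# Raw signalling rate in GT/s (as reported by sysfs) mapped to PCIe generation.
GEN_BY_GT = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def theoretical_mb_s(gen, lanes):
    """Upper bound on one-direction throughput, ignoring protocol overhead."""
    return PER_LANE_MB_S[gen] * lanes


def read_negotiated_link(device_dir):
    """Return (generation, lane count) from a PCI device's sysfs directory.

    The generation is None if the reported speed is not in GEN_BY_GT.
    """
    d = Path(device_dir)
    # The speed file holds text such as "5 GT/s" or "5.0 GT/s PCIe".
    gt = float((d / "current_link_speed").read_text().split()[0])
    width = int((d / "current_link_width").read_text().strip())
    return GEN_BY_GT.get(gt), width


def report_nvidia_links(root="/sys/bus/pci/devices"):
    """Print the negotiated link of every NVIDIA PCI function (vendor 0x10de)."""
    for dev in sorted(Path(root).glob("*")):
        try:
            if (dev / "vendor").read_text().strip() != "0x10de":
                continue
            gen, width = read_negotiated_link(dev)
        except (OSError, ValueError, IndexError):
            continue  # device without PCIe link information
        if gen is None:
            continue
        limit = theoretical_mb_s(gen, width)
        print(f"{dev.name}: gen{gen} x{width}, at most ~{limit:.0f} MB/s")


if __name__ == "__main__":
    report_nvidia_links()
```

For comparison, a gen2 x16 link works out to 8000 MB/s in theory. Measured values of 750.5 and 818.1 MB/s sit below the 1000 MB/s ceiling of a gen1 x4 or gen2 x2 link, which would be consistent with a slot that is wired or configured for only a few lanes. One caveat: some GPUs lower the link speed while idle to save power, so it may be worth reading these files while a transfer is running.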

That was exactly the problem. The motherboard has two PCIe slots and the GPU was in the wrong one.
Thanks a lot for your answer!