Four GTX 1060s behind a PEX8749 PCI Express switch: bandwidth question

I have an Intel Broadwell platform with one x16 PCIe interface, which is connected to the upstream port of a PEX8749 switch. The PEX8749's downstream side is configured as four x8 PCIe ports, and each port is connected to its own NVIDIA GTX 1060.
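To put numbers on "x16" versus "x8" in this topology, here is a small sketch of the theoretical link rates. It assumes every link (root port to the PEX8749 upstream port, and each downstream x8 port to its GPU) trains at PCIe 3.0, i.e. 8 GT/s per lane with 128b/130b encoding; the constant and function names are illustrative only. Packet overhead is ignored, so measured copy rates will be lower than these figures.

```cpp
// Assumption: every link in the tree trains at PCIe 3.0 speed.
constexpr double kGen3GTPerLane = 8.0;           // giga-transfers/s per lane (1 bit each)
constexpr double kGen3Encoding = 128.0 / 130.0;  // 128b/130b line-code efficiency

// Theoretical per-direction bandwidth in GB/s (10^9 bytes/s) for a Gen3 link
// of `lanes` width. Ignores TLP/DLLP overhead, so real copies come in lower.
constexpr double gen3_gbytes_per_s(int lanes) {
    return lanes * kGen3GTPerLane * kGen3Encoding / 8.0;  // 8 bits per byte
}

constexpr double kUpstreamX16 = gen3_gbytes_per_s(16);  // ~15.75 GB/s, shared by all GPUs
constexpr double kDownstreamX8 = gen3_gbytes_per_s(8);  // ~7.88 GB/s, per GPU

static_assert(kUpstreamX16 > 15.7 && kUpstreamX16 < 15.8, "x16 is about 15.75 GB/s");
static_assert(kDownstreamX8 > 7.8 && kDownstreamX8 < 7.9, "x8 is about 7.88 GB/s");
```

In other words, "full x16 bandwidth" means roughly twice what a single x8 link can carry in one direction.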

I have a question about the memory copy bandwidth:
I use OpenMP to create four CPU threads, one per GTX 1060, and each thread copies data between host and device memory. When all four threads copy at the same time, can they together use the full x16 PCIe bandwidth? Or are they limited to x8 speed, meaning that only one GTX 1060 at a time can use its DMA engine to copy data to host memory (the DIMMs)?