Decode/encode with dual M10 in a Dell R740: low TX/RX throughput and up to 3x lower speed than with 4 GPUs on only one M10

Hi,

We have a problem with two Tesla M10 cards when decoding/encoding video on 4 or more GPUs in a Dell R740 server with two Xeon CPUs.
The TX/RX throughput is low, and multiple ffmpeg processes decode/encode at 200 frames per second if I use 2 GPUs per card. If I use 4 GPUs per card, ffmpeg decodes/encodes at 100 frames per second.
If I use 4 GPUs on a single card, ffmpeg decodes/encodes at more than 300 frames per second.

Tested with drivers 390 and 418 on Debian Linux. I/O and CPU usage are extremely low.
Is the PCIe bus limiting the throughput?

Using 4 GPUs on only one card:

ffmpeg output:
frame=51113 fps=301 q=21.0 Lsize= 645986kB time=00:34:04.48 bitrate=2588.4kbits/s speed= 12x
frame=50773 fps=299 q=22.0 Lsize= 641418kB time=00:33:50.88 bitrate=2587.3kbits/s speed= 12x
frame=51005 fps=300 q=21.0 Lsize= 644596kB time=00:34:00.16 bitrate=2588.3kbits/s speed= 12x
frame=50661 fps=299 q=21.0 Lsize= 639951kB time=00:33:46.40 bitrate=2587.1kbits/s speed=11.9x

nvidia-smi -q |grep -A 11 PCI
PCI
    Bus : 0x3D
    Device : 0x00
    Domain : 0x0000
    Device Id : 0x13BD10DE
    Bus Id : 00000000:3D:00.0
    Sub System Id : 0x116010DE
    GPU Link Info
        PCIe Generation
            Max : 3
            Current : 3
        Link Width
            Max : 16x
            Current : 8x
    Bridge Chip
        Type : N/A
        Firmware : N/A
    Replays since reset : 0
    Tx Throughput : 18000 KB/s
    Rx Throughput : 13000 KB/s

Using 4 GPUs on 2 cards (2 GPUs x 2):

ffmpeg output:
frame=11398 fps=226 q=21.0 Lsize= 131775kB time=00:07:35.88 bitrate=2368.0kbits/s speed=9.04x
frame=11396 fps=226 q=21.0 Lsize= 131755kB time=00:07:35.80 bitrate=2368.0kbits/s speed=9.04x
frame=11376 fps=226 q=20.0 Lsize= 131456kB time=00:07:35.00 bitrate=2366.8kbits/s speed=9.02x
frame=11380 fps=226 q=20.0 Lsize= 131574kB time=00:07:35.16 bitrate=2368.1kbits/s speed=9.03x
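
To compare the two runs as a whole rather than per process, here is a minimal Python sketch that sums the fps= field of the final ffmpeg status lines pasted in this post. It assumes each line comes from one ffmpeg process running on one GPU; the names ONE_CARD, TWO_CARDS and total_fps are only illustrative.

```python
import re

# Final ffmpeg status lines copied from this post (4 GPUs on one card).
ONE_CARD = """\
frame=51113 fps=301 q=21.0 Lsize= 645986kB time=00:34:04.48 bitrate=2588.4kbits/s speed= 12x
frame=50773 fps=299 q=22.0 Lsize= 641418kB time=00:33:50.88 bitrate=2587.3kbits/s speed= 12x
frame=51005 fps=300 q=21.0 Lsize= 644596kB time=00:34:00.16 bitrate=2588.3kbits/s speed= 12x
frame=50661 fps=299 q=21.0 Lsize= 639951kB time=00:33:46.40 bitrate=2587.1kbits/s speed=11.9x
"""

# Final ffmpeg status lines copied from this post (2 GPUs on each of 2 cards).
TWO_CARDS = """\
frame=11398 fps=226 q=21.0 Lsize= 131775kB time=00:07:35.88 bitrate=2368.0kbits/s speed=9.04x
frame=11396 fps=226 q=21.0 Lsize= 131755kB time=00:07:35.80 bitrate=2368.0kbits/s speed=9.04x
frame=11376 fps=226 q=20.0 Lsize= 131456kB time=00:07:35.00 bitrate=2366.8kbits/s speed=9.02x
frame=11380 fps=226 q=20.0 Lsize= 131574kB time=00:07:35.16 bitrate=2368.1kbits/s speed=9.03x
"""

# Matches the "fps=<number>" field; ffmpeg may pad the value with spaces.
FPS_RE = re.compile(r"\bfps=\s*([\d.]+)")


def total_fps(log: str) -> float:
    """Sum the fps= value of every ffmpeg status line found in `log`."""
    return sum(float(m.group(1)) for m in FPS_RE.finditer(log))


if __name__ == "__main__":
    one = total_fps(ONE_CARD)
    two = total_fps(TWO_CARDS)
    print(f"4 GPUs on one card   : {one:.0f} fps in total")
    print(f"2 GPUs on each card  : {two:.0f} fps in total")
    print(f"two cards / one card : {two / one:.2f}")
```

Under that assumption, the single-card run totals roughly 1200 fps and the two-card run roughly 900 fps, so using the second card makes the same number of GPUs slower overall.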

nvidia-smi -q |grep -A 11 PCI
PCI
    Bus : 0x3D
    Device : 0x00
    Domain : 0x0000
    Device Id : 0x13BD10DE
    Bus Id : 00000000:3D:00.0
    Sub System Id : 0x116010DE
    GPU Link Info
        PCIe Generation
            Max : 3
            Current : 3
        Link Width
            Max : 16x
            Current : 8x
    Bridge Chip
        Type : N/A
        Firmware : N/A
    Replays since reset : 0
    Tx Throughput : 12000 KB/s
    Rx Throughput : 9000 KB/s
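
As a rough sanity check on the PCIe question, the following Python sketch compares the Tx/Rx figures reported by nvidia-smi with the theoretical bandwidth of the negotiated link (Gen 3, width 8x). It uses the standard PCIe 3.0 figures of 8 GT/s per lane with 128b/130b encoding, ignores packet and protocol overhead, and assumes that nvidia-smi's "KB" means 1000 bytes. The function names are only illustrative.

```python
# PCIe 3.0 signalling rate per lane and its 128b/130b line encoding.
GEN3_TRANSFERS_PER_S = 8e9
GEN3_ENCODING = 128 / 130


def link_bandwidth_kb_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in KB/s.

    Assumes 1 KB = 1000 bytes and ignores protocol overhead,
    so the real usable bandwidth is somewhat lower.
    """
    bits_per_s = GEN3_TRANSFERS_PER_S * GEN3_ENCODING * lanes
    return bits_per_s / 8 / 1000


def utilisation(observed_kb_s: float, lanes: int) -> float:
    """Fraction of the theoretical link bandwidth actually observed."""
    return observed_kb_s / link_bandwidth_kb_s(lanes)


if __name__ == "__main__":
    lanes = 8  # "Link Width / Current : 8x" in the nvidia-smi output

    # Tx/Rx values in KB/s, copied from the two nvidia-smi outputs in this post.
    samples = {
        "one card,  Tx": 18000,
        "one card,  Rx": 13000,
        "two cards, Tx": 12000,
        "two cards, Rx": 9000,
    }

    print(f"Gen3 x{lanes} theoretical: {link_bandwidth_kb_s(lanes) / 1e6:.2f} GB/s")
    for label, kb_s in samples.items():
        print(f"{label}: {kb_s} KB/s = {utilisation(kb_s, lanes):.3%} of the link")
```

Under these assumptions, every reported value is well below 1% of what a Gen 3 x8 link can carry, which suggests that raw link bandwidth is not saturated at the GPU on bus 0x3D. This covers only one GPU and one sample per run, so it does not rule out other bottlenecks on the path between the two cards and the CPUs.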