CUDA_DEVICE_MAX_CONNECTIONS is an environment variable that (roughly speaking) defines the number of hardware work queues that CUDA streams can map onto. When you have more streams than queues, multiple streams will alias onto the same queue.
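If it helps, here is a minimal sketch (my own, not from any documentation) of how that aliasing can be observed: launch more concurrent kernels than there are work queues and compare the profiler timelines for different settings of the variable. The stream count and spin duration here are arbitrary choices.

```cpp
// Sketch: observe stream-to-queue aliasing. When CUDA_DEVICE_MAX_CONNECTIONS
// is lower than the number of streams, some streams share a hardware queue,
// which can show up as false serialization in an Nsight Systems timeline.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { /* busy-wait so the kernel is visible */ }
}

int main() {
    const int nStreams = 16;            // more streams than queues forces aliasing
    cudaStream_t streams[nStreams];
    for (int i = 0; i < nStreams; ++i) cudaStreamCreate(&streams[i]);

    // One long-running kernel per stream; inspect overlap in the profiler.
    for (int i = 0; i < nStreams; ++i) spin<<<1, 1, 0, streams[i]>>>(100000000LL);

    cudaDeviceSynchronize();
    for (int i = 0; i < nStreams; ++i) cudaStreamDestroy(streams[i]);
    printf("done\n");
    return 0;
}
```

Running this as `CUDA_DEVICE_MAX_CONNECTIONS=1 ./app` versus the default (8) should show a difference in how many of those kernels can actually execute concurrently.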
I’m not really sure how to interpret that statement. To a first-order approximation, no PCIE traffic would mean that the GPUs cannot be utilized. All CUDA activity in the current era begins with a transfer of data from host to device via PCIE. So I don’t suppose you mean no PCIE traffic globally, across the entire application execution.
Modifying the variable does not seem to me like it would affect PCIE communications; however, setting it to 1 may have some unusual side effects in a system with both PCIE and NVLink connections. I have not experimented with it to that level. For example, if a hardware copy queue were associated with an NVLink connection, and you reduced the number of HW queues down to just that one, I’m not sure what would happen.
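If someone wanted to probe that, a hypothetical experiment (again, a sketch I have not run; the device indices, buffer size, and the assumption of a P2P-capable pair of GPUs are all placeholders) might issue a peer-to-peer copy and a host-to-device copy in separate streams and check whether they still overlap when the connection count is 1:

```cpp
// Sketch: with CUDA_DEVICE_MAX_CONNECTIONS=1, issue a peer copy (NVLink, if
// present and enabled) and an H2D copy concurrently, then compare profiler
// timelines across different connection counts to see if the copies serialize.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    if (devCount < 2) { printf("need 2 GPUs\n"); return 0; }

    const size_t bytes = 256u << 20;    // 256 MiB per buffer (arbitrary)
    float *d0, *d1, *h;
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // no-op/error if P2P is unsupported
    cudaMalloc(&d0, bytes);
    cudaMallocHost(&h, bytes);          // pinned host buffer for the H2D copy
    cudaSetDevice(1);
    cudaMalloc(&d1, bytes);

    cudaSetDevice(0);
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Peer copy (device 0 -> device 1) and H2D copy issued in separate streams.
    cudaMemcpyPeerAsync(d1, 1, d0, 0, bytes, s0);
    cudaMemcpyAsync(d0, h, bytes, cudaMemcpyHostToDevice, s1);

    cudaDeviceSynchronize();
    printf("copies complete; compare timelines for different connection counts\n");
    return 0;
}
```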