CUDA_DEVICE_MAX_CONNECTIONS env variable not in the CUDA programming guide anymore

Hello,

To get more concurrency across streams, I'm using the environment variable CUDA_DEVICE_MAX_CONNECTIONS, which seems to work as of CUDA 12.1. However, I could only find it documented in the CUDA Toolkit 5.5 documentation, not in the latest one. Is there a reason for that?
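For reference, a minimal sketch of setting the variable from inside the program instead of the shell. It assumes a POSIX system (`setenv`) and assumes the driver only reads the variable when the CUDA context is created, so the call has to happen before the first CUDA API call. The helper name `request_max_connections` is hypothetical; the 1 to 32 range check follows the range listed for this variable in the CUDA programming guide's environment variable table (default 8).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: ask the CUDA driver for `n` host-to-device connections
// (work queues) by setting CUDA_DEVICE_MAX_CONNECTIONS in this process.
// Assumption: the variable is only read when the CUDA context is created,
// so this must be called before the first CUDA runtime/driver API call.
// Returns false if `n` is outside 1..32 or if setenv() fails.
bool request_max_connections(int n, bool overwrite = false) {
    if (n < 1 || n > 32) {
        return false;  // outside the range documented for this variable
    }
    const std::string value = std::to_string(n);
    // With overwrite == false, a value already exported by the shell wins,
    // which keeps command-line experiments easy.
    return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(),
                  overwrite ? 1 : 0) == 0;
}
```

Usage: call `request_max_connections(32);` at the top of `main()`, before creating streams with `cudaStreamCreate`.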

Also, while we are at it: it seems to me that changing this number is not without consequences for overall performance. The 5.5 toolkit documentation says:

Sets the number of compute and copy engine concurrent connections (work queues) from the host to each device of compute capability 3.5 and above.
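To make the "work queues" wording concrete: when an application uses more streams than there are connections, some streams have to share a queue, and work in one stream can then wait behind unrelated work in another (false serialization). The actual stream-to-queue assignment is not described in this thread; the sketch below assumes a simple round-robin assignment purely for illustration, and the function names are hypothetical.

```cpp
#include <map>
#include <vector>

// Hypothetical model of how streams could be spread over a limited number of
// connections. ASSUMPTION: stream i goes to connection i % connections; the
// real driver assignment is not specified here and may differ.
std::map<int, std::vector<int>> assign_streams(int num_streams, int connections) {
    std::map<int, std::vector<int>> queues;
    if (connections < 1) {
        return queues;  // no valid connection count, nothing to assign
    }
    for (int s = 0; s < num_streams; ++s) {
        queues[s % connections].push_back(s);
    }
    return queues;
}

// Counts the streams that share their work queue with at least one other
// stream, i.e. the streams exposed to false serialization in this model.
int streams_sharing_a_queue(int num_streams, int connections) {
    int shared = 0;
    for (const auto& entry : assign_streams(num_streams, connections)) {
        const std::vector<int>& streams = entry.second;
        if (streams.size() > 1) {
            shared += static_cast<int>(streams.size());
        }
    }
    return shared;
}
```

Under this model, 16 streams with the default of 8 connections means every stream shares a queue, while 32 connections gives each stream its own. It says nothing about the cost of extra connections, which is exactly the trade-off asked about here.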

Could someone explain in a bit more detail how this variable affects the GPU's behaviour, so that we can tune it as well as possible for a given GPU?
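One way to tune the value empirically, sketched under stated assumptions: since the driver is assumed to read the variable only at context creation, each candidate value is tried in a separate process. `./my_app` is a hypothetical benchmark that times its own multi-stream workload, and the power-of-two candidate list is an arbitrary choice within the documented 1 to 32 range.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Because the variable is assumed to be read when the CUDA context is created,
// each value has to be tried in a fresh process. This builds one POSIX shell
// command per candidate value, e.g. "CUDA_DEVICE_MAX_CONNECTIONS=4 ./my_app".
std::vector<std::string> build_sweep_commands(const std::string& app) {
    std::vector<std::string> commands;
    for (int n = 1; n <= 32; n *= 2) {  // 1, 2, 4, 8, 16, 32
        commands.push_back("CUDA_DEVICE_MAX_CONNECTIONS=" + std::to_string(n) +
                           " " + app);
    }
    return commands;
}

// Runs each command through the shell and returns the raw std::system status.
// The application itself is expected to time and report its stream workload.
std::vector<int> run_sweep(const std::string& app) {
    std::vector<int> statuses;
    for (const std::string& cmd : build_sweep_commands(app)) {
        statuses.push_back(std::system(cmd.c_str()));
    }
    return statuses;
}
```

`run_sweep("./my_app")` launches the six runs in turn; comparing the timings each run reports shows where adding connections stops helping on a particular GPU.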

Thanks

A small update to my question: the variable is actually documented in the CUDA MPS documentation (Multi-Process Service :: GPU Deployment and Management Documentation).

It is still pretty hard to understand why the driver would limit this to 32 compute queues and not 128, as with some other computations.

See here.

In addition to the description there, you can find mentions of it by Greg in these forums. Here is one example; there are others.

My bad, it was indeed in the latest toolkit documentation (and in the most obvious place). Also, the topic you pointed to is very interesting. It should be enough for me to move on with my experiments.

Thanks!
