In order to achieve more concurrent stream parallelism I’m using the environment variable CUDA_DEVICE_MAX_CONNECTIONS, which seems to work as of CUDA 12.1. However, I could find traces of this variable being documented in the CUDA Toolkit 5.5 but not in the latest one. Is there a reason for that?
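For reference, here is a minimal sketch of how I’m setting it (the application name `./my_app` is just a placeholder for whatever binary creates the CUDA context):

```shell
# Set the number of host-to-device work queues before the CUDA
# context is created; the variable is read at context creation time.
export CUDA_DEVICE_MAX_CONNECTIONS=32
echo "CUDA_DEVICE_MAX_CONNECTIONS=$CUDA_DEVICE_MAX_CONNECTIONS"
# ./my_app   # placeholder: the application using many concurrent streams
```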
Also, while we are at it, it seems to me that modifying this number is not without consequences for overall performance. The 5.5 toolkit documentation says:
Sets the number of compute and copy engine concurrent connections
(work queues) from the host to each device of compute capability 3.5 and above.
Would it be possible to know a little more about how this variable affects the GPU’s behaviour, so that we can tune it appropriately for a given GPU?
It is still pretty tough to understand why the driver would decide to limit it to 32 compute queues rather than 128 in some other configurations.
My bad, it was indeed in the latest toolkit (and in the most obvious place). Also, the topic you pointed to is very interesting; it should be enough for me to move on with my experiments.