CUDA in a Lib, global symbols? How do symbols behave in a Lib that uses CUDA?

Hi,

If I write a library that contains CUDA code and I use a global symbol in it:

__device__ __constant__ struct ABC ABC;

And in the host part I use:

cudaMemcpyToSymbol("ABC", &host_ABC, sizeof(struct ABC), 0, cudaMemcpyHostToDevice);

If that library has several users (e.g. on a multi-user system, or several processes using the library), could it happen that they all work on the same memory location?

Or is it guaranteed that each use / instantiation of the library has its own location on the device? I guess it must be like this; I just want to make sure, since I didn't find any documentation about it.

Thanks for any hints,
Torsten.

Global variables (which are distinct from symbols) are allocated on a per-context basis. A context belongs to one process and one device, so each user process gets its own context on each CUDA device it uses.

So if you have two separate applications that load the same library, there will be two copies of all global variables.

If you use two devices in the same application, there will be two copies of all global variables, one for each device.

If you have two user threads in the same process accessing the same device, there will be one copy of all global variables, shared by both threads.

You can also explicitly control how this works using the driver API.
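
To make the per-device case above concrete, here is a minimal sketch using the runtime API. It writes a different value to the constant variable on every visible device, then reads each one back through a kernel to check that no write clobbered another device's copy. The struct layout, the variable name abc, and the helper functions are assumptions made up for illustration. It also passes the symbol itself rather than a string name, since as far as I know the string form of cudaMemcpyToSymbol is no longer accepted by current toolkits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the poster's struct; the single field is an assumption.
struct ABC { int value; };

// One instance of this variable exists per context, i.e. per process and per device.
__constant__ ABC abc;

// Copies the constant's field into global memory so the host can inspect it.
__global__ void readAbc(int *out) { *out = abc.value; }

// Writes v into the copy of abc that lives on the given device.
static cudaError_t writeAbc(int device, int v) {
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) return err;
    ABC host = {v};
    return cudaMemcpyToSymbol(abc, &host, sizeof(ABC));
}

// Returns abc.value as seen by a kernel on the given device, or -1 on error.
static int readAbcOn(int device) {
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaSetDevice(device) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    readAbc<<<1, 1>>>(d_out);
    // cudaMemcpy waits for the kernel and reports any launch or execution error.
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = -1;
    cudaFree(d_out);
    return h_out;
}

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        printf("no CUDA device available\n");
        return 0;
    }
    // Write every device first, then read every device.
    for (int d = 0; d < n; ++d) {
        if (writeAbc(d, 100 + d) != cudaSuccess) {
            printf("write failed on device %d\n", d);
            return 1;
        }
    }
    for (int d = 0; d < n; ++d)
        printf("device %d sees abc.value = %d\n", d, readAbcOn(d));
    return 0;
}
```

With a single GPU this only shows the round trip. To see the per-process separation, run two instances of the program at the same time with different values; each process has its own context, so neither should observe the other's write.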
