We are running potentially overlapping computations using the CUDA streams API, with the degree of overlap depending on how well the hardware supports concurrent kernel execution.
My question is:
Does each stream see its own version of the __constant__ variables that are declared in a .cu module?
If constant memory is shared by all streams, it will cause a major headache, because in our case each stream requires its own individual version of the constants.
We could work around it by adding a stream dimension to each of the constant arrays, but that would effectively shrink the available 64 KB to a much smaller space per stream, depending on the number of streams we want to run concurrently.
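
To make the workaround concrete, here is a minimal sketch of the stream-dimension idea. It assumes that a __constant__ variable is a single per-device (per-context) object that every stream sees, so each stream gets its own slice and is told which slice to read through a kernel argument. All names (c_params, kNumStreams, kParamsPerStream, scale, run_demo, per_stream_budget_bytes) and the sizes are made up for illustration and are not taken from our actual code.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

constexpr int kNumStreams = 4;       // hypothetical number of concurrent streams
constexpr int kParamsPerStream = 8;  // hypothetical number of constants per stream

// One constant array shared by the whole device; the leading
// dimension gives every stream its own private slice.
__constant__ float c_params[kNumStreams][kParamsPerStream];

// Upper bound on each stream's share of the 64 KB constant space.
constexpr size_t per_stream_budget_bytes(int num_streams) {
    return (64 * 1024) / static_cast<size_t>(num_streams);
}

// The kernel receives its slice index explicitly as an argument.
__global__ void scale(float* data, int n, int slot) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= c_params[slot][0];
}

// Returns the number of streams whose result was wrong (0 on success),
// or -1 if stream creation or allocation failed.
int run_demo() {
    const int n = 256;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float h_params[kNumStreams][kParamsPerStream] = {};
    cudaStream_t streams[kNumStreams];
    float* d_data[kNumStreams];

    for (int s = 0; s < kNumStreams; ++s) {
        h_params[s][0] = static_cast<float>(s + 1);  // distinct factor per stream
        if (cudaStreamCreate(&streams[s]) != cudaSuccess) return -1;
        if (cudaMalloc(&d_data[s], n * sizeof(float)) != cudaSuccess) return -1;

        cudaMemcpyAsync(d_data[s], h_in, n * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        // Update only this stream's slice, ordered within the stream,
        // so the slices used by the other streams are left untouched.
        cudaMemcpyToSymbolAsync(c_params, h_params[s], sizeof(h_params[s]),
                                s * sizeof(h_params[s]),
                                cudaMemcpyHostToDevice, streams[s]);
        scale<<<(n + 127) / 128, 128, 0, streams[s]>>>(d_data[s], n, s);
    }

    int failures = 0;
    for (int s = 0; s < kNumStreams; ++s) {
        float first = 0.0f;
        cudaMemcpyAsync(&first, d_data[s], sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
        cudaStreamSynchronize(streams[s]);
        if (first != static_cast<float>(s + 1)) ++failures;
        cudaFree(d_data[s]);
        cudaStreamDestroy(streams[s]);
    }
    return failures;
}
```

Calling run_demo() from main() and checking for a zero return value would exercise the sketch. The cost is the one described above: with N streams, each slice can be at most 64 KB / N, which is what per_stream_budget_bytes(N) computes.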