As I understand it, in earlier CUDA releases, if you declare a GPU array like the one below, each CPU thread working on the SAME device implicitly gets its own context and hence its own separate physical copy of the array.
__device__ int temp[CONFIGURED_MAX];
With CUDA 4.0, this "temp" array exists once per context, and since there is now a single context per device per process, all threads working on the SAME device share the same physical copy of the array.
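To make the behaviour I am describing concrete, here is a hedged sketch (CONFIGURED_MAX, the writer function, and the thread layout are all illustrative names of mine, not from any CUDA sample): two host threads use the runtime API against device 0 and write to the same __device__ symbol. Under the CUDA 4.0 model they share one context, so the second write overwrites the first; pre-4.0, each thread would have written to its own private copy.

```cuda
// Sketch only: assumes a CUDA 4.0+ runtime and a single GPU (device 0).
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>

#define CONFIGURED_MAX 256          // illustrative size

__device__ int temp[CONFIGURED_MAX];

// Each host thread fills the device-side array with its own value.
void *writer(void *arg)
{
    int value = *(int *)arg;
    int host_buf[CONFIGURED_MAX];
    for (int i = 0; i < CONFIGURED_MAX; ++i)
        host_buf[i] = value;

    cudaSetDevice(0);  // both threads target the SAME device
    cudaMemcpyToSymbol(temp, host_buf, sizeof(host_buf));
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int a = 1, b = 2;

    pthread_create(&t1, NULL, writer, &a);
    pthread_join(t1, NULL);
    pthread_create(&t2, NULL, writer, &b);
    pthread_join(t2, NULL);

    // With one shared per-process context, both threads touched the same
    // physical copy of `temp`, so the second write wins.
    int check[CONFIGURED_MAX];
    cudaMemcpyFromSymbol(check, temp, sizeof(check));
    printf("temp[0] = %d\n", check[0]);  // 2 under the shared-context model
    return 0;
}
```

Under the pre-4.0 behaviour, each thread's implicit context would hold an independent `temp`, and neither thread could observe the other's write through the runtime API.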
What happens to existing multi-threaded applications that rely on each CPU thread having its own separate context for the same GPU device?
Won't they break under this new CUDA behaviour?
Am I missing something? Is there a separate API call to enable this new behaviour?