Getting address of __device__ variable in 4.0 Toolkit

Hello,
I recently moved from toolkit version 3.2 to version 4.0.

I was wondering how device variables are handled in the newer toolkit, especially since each host thread can now access multiple GPUs. I want to create a device variable before calling the kernel, and I want to get the address of that variable on the host thread. Do I need to do something special for this in the newer toolkit?

Previously I was doing something like this:

File1.cu

#include <cuda_runtime.h>

__device__ int array_1[4] = {1, 2, 3, 4};

// Takes int** so the looked-up address is returned to the caller.
extern "C" void get_array_address(int** array_h_1) {
    cudaGetSymbolAddress((void**)array_h_1, "array_1");
}

The function above is called from main() in another file, e.g. "File2.cu", which also contains the kernel call. Do I need to select the device with cudaSetDevice(0) before trying to get the symbol address? How is the context for the device variable set up?

Thanks for the help.
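
A minimal sketch of the pattern being asked about, not a definitive answer. It assumes the runtime API behaviour of toolkit 4.0 and later: each device's context is created implicitly on first use, a host thread that never calls cudaSetDevice works on device 0, and every device holds its own copy of a __device__ variable, so cudaGetSymbolAddress returns the address for whichever device is current. The names get_array_address_on, add_one, and bump_array_on are hypothetical. The symbol is passed directly rather than as a string because toolkits from 5.0 onward no longer accept the string form.

```cuda
#include <cuda_runtime.h>

// Each device gets its own copy of this variable.
__device__ int array_1[4] = {1, 2, 3, 4};

// Hypothetical helper: make `device` current, then look up the address
// of array_1 on that device. *addr_out is only meaningful on success.
extern "C" cudaError_t get_array_address_on(int device, int** addr_out) {
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) return err;
    // Pass the symbol itself, not its name as a string.
    return cudaGetSymbolAddress((void**)addr_out, array_1);
}

// Hypothetical kernel that works on the address obtained on the host.
__global__ void add_one(int* data, int n) {
    int i = threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical driver: fetch the address, run the kernel on it, and copy
// the four ints back into host_out so the caller can inspect them.
extern "C" cudaError_t bump_array_on(int device, int* host_out) {
    int* d_ptr = NULL;
    cudaError_t err = get_array_address_on(device, &d_ptr);
    if (err != cudaSuccess) return err;

    add_one<<<1, 4>>>(d_ptr, 4);
    err = cudaGetLastError();  // catches launch failures
    if (err != cudaSuccess) return err;

    // cudaMemcpy waits for the kernel on the default stream to finish.
    return cudaMemcpy(host_out, d_ptr, 4 * sizeof(int),
                      cudaMemcpyDeviceToHost);
}
```

From main() in File2.cu this could be called as bump_array_on(0, out) with an int out[4]. On a machine with a second GPU, passing 1 instead would select that device's own copy of array_1.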