Device memory declared globally, not passed in

Suppose I malloc some device memory and, rather than pass the address to the kernel as usual,
I try to place dev_S in a global variable as shown.
This works fine until dev_S is read inside the kernel, where it is taken as 0 (line 10).
It is OK (not 0) at line 20. (I am using %p to print the address.)

xxx.cu
1 __device__ struct Sdef *dev_S=0;
2
3
4 __global__ void nullKernel(…
5 {
6 int bid=blockIdx.x;
7 int tid=threadIdx.x;
8
9 if (bid==tid)
10 printf("DEBUG %s:%ld bid=%ld tid=%ld (dev_S=%p)\n",__FILE__,__LINE__,bid,tid,dev_S);
11 …
12 }
13
14 int main()
15 {
16 …
17 HANDLE_ERROR(cudaMalloc((void **)&dev_S,nScn*sizeof(struct Sdef)));
18 HANDLE_ERROR(cudaMemcpy(dev_S,S,nScn*sizeof(struct Sdef),cudaMemcpyHostToDevice));
19
20 printf("DEBUG %s:%ld dev_S= %p\n", __FILE__,__LINE__,dev_S);
21 nullKernel<<<4,4>>>(4, 4,devP,devO);
22 cudaThreadSynchronize();
23 …
24 }

Is there some way to make the kernel accept/read this global pointer? I have some constant
memory that works fine.
Of course I could just pass in the address dev_S as a kernel arg as usual, but can I make this
alternate method work? If I don't mark dev_S with __device__, then I get the compile-time error "identifier dev_S undefined in device code".

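Because dev_S is a __device__ variable, host code can't use it directly with cudaMalloc or any other host function. Taking &dev_S on the host does not give the address of the device copy, so the cudaMalloc at line 17 never updates the pointer the kernel sees, which is why the kernel still prints 0. You can, however, read and write it from the host using cudaMemcpyFromSymbol and cudaMemcpyToSymbol: allocate into an ordinary host pointer with cudaMalloc, then copy that pointer value into dev_S with cudaMemcpyToSymbol.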
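
Here is a minimal sketch of that approach, not a drop-in fix for the original file. The layout of struct Sdef is not shown in the question, so the single int field below is an assumption, and the names d_buf, d_out, readKernel and runDemo are hypothetical. The idea is to let cudaMalloc fill a normal host variable, then copy the pointer value itself into the __device__ symbol with cudaMemcpyToSymbol.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Sdef { int value; };       // assumed layout; the real struct is not shown

__device__ Sdef *dev_S = 0;       // device-side global pointer, as in the question

// The kernel reads the global pointer instead of receiving it as an argument.
__global__ void readKernel(int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = (dev_S != 0) ? dev_S[0].value : -1;
}

// Returns the value the kernel read through dev_S, or -1 on failure.
// (Error paths return early without freeing, to keep the sketch short.)
int runDemo()
{
    Sdef hostS = { 42 };
    Sdef *d_buf = 0;              // ordinary host variable holding a device address
    int  *d_out = 0;
    int   result = -1;

    if (cudaMalloc((void **)&d_buf, sizeof(Sdef)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemcpy(d_buf, &hostS, sizeof(Sdef),
                   cudaMemcpyHostToDevice) != cudaSuccess) return -1;

    // Copy the pointer value (not the data it points to) into the symbol.
    if (cudaMemcpyToSymbol(dev_S, &d_buf, sizeof(d_buf)) != cudaSuccess) return -1;

    readKernel<<<1, 1>>>(d_out);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_buf);
    return result;
}

int main()
{
    printf("value read through dev_S: %d\n", runDemo());
    return 0;
}
```

Note that the third argument to cudaMemcpyToSymbol is sizeof(d_buf), the size of a pointer, because only the address is being stored in dev_S. The buffer contents are still transferred with a regular cudaMemcpy to d_buf.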