I have a few questions.
When I compile for 64-bit Linux, my host pointers are 64 bits wide. I declare a structure containing pointers and create one instance on the host and one in global memory.
sizeof() gives me the same size for both, so I assume I can call cudaMalloc() on the host, store the returned pointer in the host structure, and cudaMemcpy() the structure to the
instance in global memory on the device.
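To make this concrete, here is a minimal sketch of what I have in mind. The names Buffer, fill and run_struct_pointer_demo are made up for illustration, and I am assuming the device can dereference a pointer that was stored by the host in this way, which is exactly what I am asking about.

```cuda
#include <cuda_runtime.h>

// Hypothetical structure that embeds a pointer to device memory.
struct Buffer {
    int *data;   // set on the host to an address returned by cudaMalloc()
    int  count;  // number of elements behind data
};

// The kernel reads the structure from global memory and follows the pointer.
__global__ void fill(const Buffer *buf)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < buf->count)
        buf->data[i] = i;
}

// Returns true if every element came back with the expected value.
bool run_struct_pointer_demo()
{
    const int n = 256;

    // Host instance: the pointer member holds a device address.
    Buffer host_buf;
    host_buf.count = n;
    host_buf.data = NULL;
    if (cudaMalloc((void **)&host_buf.data, n * sizeof(int)) != cudaSuccess)
        return false;

    // Device instance: a byte-for-byte copy of the host structure.
    Buffer *dev_buf = NULL;
    if (cudaMalloc((void **)&dev_buf, sizeof(Buffer)) != cudaSuccess) {
        cudaFree(host_buf.data);
        return false;
    }
    bool ok = cudaMemcpy(dev_buf, &host_buf, sizeof(Buffer),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    int result[n];
    if (ok) {
        fill<<<(n + 127) / 128, 128>>>(dev_buf);
        // This cudaMemcpy() waits for the kernel before copying back.
        ok = cudaMemcpy(result, host_buf.data, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i)
        ok = (result[i] == i);

    cudaFree(dev_buf);
    cudaFree(host_buf.data);
    return ok;
}
```

Calling run_struct_pointer_demo() from main() and checking the return value would tell me whether the pointer survives the copy.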
My question: I read that all GPU memory pointers are 32-bit. Does the GPU simply discard the upper 32 bits?
Even trickier: in the Fermi announcement I read that Fermi unifies the address spaces for shared and global memory, so I assume they are separate
on my G80 chip. How would the system know whether a pointer refers to shared or to global memory? Do I have to write something like
__shared__ char __constant__ * address to declare a pointer that is stored in the shared segment and points to an object in the constant segment?
I have not seen such notation in any of the examples.
Somewhat simpler: is there a device implementation of memcpy(), or must I write a for() loop myself? If I have to write the loop, I assume I should cast char * to int * and transfer
4 bytes at a time (assuming I can align the starting addresses in memory).
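Here is a minimal sketch of the loop I would write by hand. The names copy_words, copy_kernel and run_copy_demo are made up, and the helper assumes both addresses are 4-byte aligned (which should hold for pointers returned by cudaMalloc()); any leftover bytes are copied one at a time.

```cuda
#include <cuda_runtime.h>

// Hypothetical hand-written copy for device code.
// Assumes dst and src are both 4-byte aligned.
__device__ void copy_words(void *dst, const void *src, size_t bytes)
{
    int *d = (int *)dst;
    const int *s = (const int *)src;
    size_t words = bytes / sizeof(int);

    // Bulk of the data: 4 bytes per iteration.
    for (size_t i = 0; i < words; ++i)
        d[i] = s[i];

    // Tail: the remaining 0 to 3 bytes, one at a time.
    char *dc = (char *)dst;
    const char *sc = (const char *)src;
    for (size_t i = words * sizeof(int); i < bytes; ++i)
        dc[i] = sc[i];
}

// A single thread does the whole copy, to keep the example simple.
__global__ void copy_kernel(char *dst, const char *src, size_t bytes)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        copy_words(dst, src, bytes);
}

// Returns true if the device-side copy reproduced the source bytes.
bool run_copy_demo()
{
    const size_t n = 19;  // deliberately not a multiple of 4
    char host_src[n], host_dst[n];
    for (size_t i = 0; i < n; ++i) {
        host_src[i] = (char)('a' + i);
        host_dst[i] = 0;
    }

    char *dev_src = NULL, *dev_dst = NULL;
    if (cudaMalloc((void **)&dev_src, n) != cudaSuccess)
        return false;
    if (cudaMalloc((void **)&dev_dst, n) != cudaSuccess) {
        cudaFree(dev_src);
        return false;
    }

    bool ok = cudaMemcpy(dev_src, host_src, n,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        copy_kernel<<<1, 1>>>(dev_dst, dev_src, n);
        ok = cudaMemcpy(host_dst, dev_dst, n,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (size_t i = 0; ok && i < n; ++i)
        ok = (host_dst[i] == host_src[i]);

    cudaFree(dev_src);
    cudaFree(dev_dst);
    return ok;
}
```

One thread copying everything is presumably slow; in a real kernel I would expect to spread the words across the threads of a block, but the casting question is the same.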
Sorry if these questions have already been addressed in some tutorial; I could not find the answers in the manuals. If they have, please just tell me which
document or URL to look in.