Questions about pointer size in 64-bit environments

Hello,
I have a few questions:

  1. When I compile for 64-bit Linux, my host pointers are 64 bits wide. I declare a structure containing pointers and create one instance on the host and one in global memory.
    sizeof() gives me the same size for both, so I assume I can call cudaMalloc() on the host, store the returned pointer in the structure, and cudaMemcpy() the structure
    to the global memory instance on the device.
    My question: I read that all GPU memory pointers are 32 bits wide. Does the GPU simply discard the upper 32 bits?
    Even trickier: in the Fermi announcement I read that Fermi unifies the address spaces for shared and global memory, so I assume they are separate
    on my G80 chip. How does the system know whether a pointer refers to shared or to global memory? Do I have to write something like
    __shared__ char __constant__ * address to declare a pointer that is stored in the shared segment and points to an object in the constant segment?
    I have not seen such notation in any of the examples.

  2. Somewhat simpler: is there a device implementation of memcpy(), or must I write a for() loop myself? If I have to write the loop, I assume I should cast char * to int *
    and transfer 4 bytes at a time (assuming I can align the starting addresses in memory).
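
To make question 2 concrete, this is roughly the loop I have in mind. It is only a sketch: the names copy_words and DEVICE are made up for this post, and it assumes a 4-byte word (uint32_t). The word-wise path is taken only when both addresses are 4-byte aligned; otherwise it falls back to copying byte by byte. The DEVICE macro is there so that the same source compiles as a __device__ function under nvcc and as plain C on the host.

```c
#include <stddef.h>
#include <stdint.h>

/* DEVICE expands to __device__ under nvcc and to nothing for a plain C compiler. */
#ifdef __CUDACC__
#define DEVICE __device__
#else
#define DEVICE
#endif

/* Hypothetical helper: copy n bytes from src to dst.
 * Whole 4-byte words are transferred only if both pointers are 4-byte aligned;
 * any remaining bytes (or everything, if unaligned) are copied one at a time. */
DEVICE void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0;

    /* Both addresses must have their two low bits clear for the word path. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 3u) == 0) {
        uint32_t *dw = (uint32_t *)dst;
        const uint32_t *sw = (const uint32_t *)src;
        size_t words = n / 4;
        size_t w;
        for (w = 0; w < words; ++w)
            dw[w] = sw[w];
        i = words * 4;   /* bytes already copied by the word loop */
    }

    /* Tail bytes, or the whole buffer when the pointers are not aligned. */
    for (; i < n; ++i)
        d[i] = s[i];
}
```

This is a single-thread loop; I have not tried to split the copy across the threads of a block.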

Sorry if these questions have already been addressed in some tutorial; I could not find the answers in the manuals. If they have, please just point me to the
document / URL where I can find the answer.

Best regards,
Michael