How many times does cudaSetDevice need to be called?

In a multi-file, multi-function CUDA project that runs on a multi-GPU system but uses only one device (for now), how many times must I call cudaSetDevice to select the device the project is to use, and where do I call it?

Is there a macro or system variable I can use in the preprocessor to set the device just once?

cudaSetDevice() is specific to the CPU thread that executes it.

Also note that for a given thread, only one cudaSetDevice() call is required. Subsequent calls do NOT have any effect.

By the way, if you call cudaMalloc or any other runtime API function (except cudaGetDeviceCount or cudaGetDeviceProperties) before calling cudaSetDevice, the runtime automatically selects the default device, and subsequent calls to cudaSetDevice from your application won't be effective (i.e. essentially a screwed-up state :-) ). So the best thing is to call cudaSetDevice first and only then make those runtime calls, so that any implicit device selection they would otherwise perform has no effect.
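A minimal sketch of that ordering, in case it helps. The helper name select_device, the device index 0 and the buffer size are only assumptions for illustration; the CUDA calls themselves are standard runtime API functions. Note that in CUDA 4.0 and later cudaSetDevice can also be called again to switch devices, but calling it first in each host thread is a safe pattern either way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): selects the device for the calling
// host thread. Call it before any other runtime API call in that thread.
cudaError_t select_device(int device)
{
    int count = 0;
    // Querying the device count is safe to do before cudaSetDevice.
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) return err;
    if (device < 0 || device >= count) return cudaErrorInvalidDevice;
    return cudaSetDevice(device);
}

int main()
{
    const int kDevice = 0;  // assumed device index, change as needed

    // 1. Select the device first.
    cudaError_t err = select_device(kDevice);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "select_device failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // 2. Only now make other runtime calls such as cudaMalloc.
    float *d_buf = nullptr;
    err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_buf);
    return 0;
}
```

In a multi-file project, something like select_device would be called once at the start of each host thread that uses CUDA, for example at the top of main. If the goal is to avoid touching the code at all, the CUDA_VISIBLE_DEVICES environment variable can restrict which devices the process sees, so that device 0 inside the program maps to the GPU you want.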

All rightey,

Bye, Gooooooooood Luck!

When I allocate memory on the device, I first call cudaSetDevice and then allocate the memory. Does the compiler then recognise which device the memory was allocated on, so that in any subsequent use of that pointer the device is identified by the address itself?

I don't know what the compiler has to do with this. The compiler has no knowledge of any of that. The compiler is innocent and is not aware of device memory or host memory pointers.

And can you kindly rephrase your last sentence? I don't understand a bit of it.

One CUDA context is created automatically per host thread. As Sarnath said, you only need to call cudaSetDevice once in each thread. When you allocate a device pointer, it is valid only within the context in which it was allocated. This is all handled by the driver and the CUDA runtime (cudart).
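To illustrate that the bookkeeping happens at run time and not in the compiler, here is a rough sketch that asks the runtime which device a pointer belongs to. It assumes a CUDA release that provides cudaPointerGetAttributes (4.0 or later) on a platform with unified virtual addressing; the helper name owning_device and the device index 0 are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): returns the ordinal of the device
// that the runtime associates with ptr, or -1 if the query fails.
int owning_device(const void *ptr)
{
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        cudaGetLastError();  // reset the runtime's last-error state
        return -1;
    }
    return attr.device;
}

int main()
{
    const int kDevice = 0;  // assumed device index

    if (cudaSetDevice(kDevice) != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice failed\n");
        return 1;
    }

    float *d_buf = nullptr;
    if (cudaMalloc((void **)&d_buf, 256 * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // The runtime, not the compiler, knows where d_buf lives.
    std::printf("d_buf belongs to device %d\n", owning_device(d_buf));

    cudaFree(d_buf);
    return 0;
}
```

The pointer value itself is just an address as far as the compiler is concerned; the association with a device is something the driver and runtime keep track of.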