There seems to be a problem when running multi-GPU code in device emulation mode (nvcc --device-emulation).
Explanation:
Here is a sketch of code that runs properly on 2 GPUs but gives wrong results when run in emulation mode:
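__constant__ __device__ int cudevice;   // on real hardware, each GPU holds its own copy
…
void init(int device)
{
    // Bind the calling host thread to the given GPU.
    cudaSetDevice(device);
    // Store the device id in that GPU's copy of the constant.
    cudaMemcpyToSymbol("cudevice", &device, sizeof(int));
}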
Then run two host threads on two (hardware) GPUs, with:
thread 1 calling: init(0);
thread 2 calling: init(1);
The constant cudevice will then have the following values:
on GPU #0: cudevice == 0
on GPU #1: cudevice == 1
When running the same code in device emulation mode, there is ONLY ONE constant cudevice (which should not be the case), and that constant ends up holding either 0 or 1, depending on which thread wrote to it last.
The fact that the constant cudevice is not duplicated per device in emulation mode is, from my point of view, a REAL BUG?!
Comments from any NVIDIA developer are welcome…
scb
System:
Linux x86_64 openSuse 11.3
Kernel 2.6.27.37-0.1-default #1 SMP 2009-10-15 14:56:58 +0200 x86_64 x86_64 x86_64 GNU/Linux
NVidia driver: 190.42
CUDA 2.3
CPU: core 2 quad Q9550 @ 2.83GHz
GPU: 2x ASUS 285GTX
Motherboard: ASUS Striker II Extreme (NVIDIA nForce 790i Ultra SLI)
RAM: 8GB DDR3