I am getting an unspecified launch failure when I try to run my code with multiple GPUs. After debugging for a while, I discovered that the pointers to device memory that were allocated by two different threads with different GPU contexts have the same address. Has anyone else had this problem? Also, is it possible that cudaMalloc could allocate the same memory location on two different devices, thus returning the same pointer, or should all device memory pointers be unique from device to device?
Pointers are not globally unique; every GPU has its own address space, so what you describe is perfectly legal.
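As a minimal sketch (assuming a machine with at least two GPUs and no unified addressing), the snippet below allocates a buffer on each device from a single host thread and prints the returned pointers; the numeric values can legitimately coincide, since each pointer is only meaningful within its own device's address space.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    void *ptr0 = nullptr, *ptr1 = nullptr;

    cudaSetDevice(0);            // select the first GPU
    cudaMalloc(&ptr0, 1 << 20);  // allocate 1 MiB on device 0

    cudaSetDevice(1);            // select the second GPU
    cudaMalloc(&ptr1, 1 << 20);  // allocate 1 MiB on device 1

    // The two addresses may be identical; each one is only valid on the
    // device (and in the context) that allocated it.
    printf("device 0 pointer: %p\n", ptr0);
    printf("device 1 pointer: %p\n", ptr1);

    cudaSetDevice(0); cudaFree(ptr0);
    cudaSetDevice(1); cudaFree(ptr1);
    return 0;
}
```

So identical pointer values by themselves don't explain the unspecified launch failure; the thing to check is whether a kernel or memory copy is ever given a pointer that was allocated on a different device than the one currently active.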