Multi-GPU programming

Are there special considerations that should be made, or other best practices to observe, when using several GPUs?

My problem:
I have a computer with a GTX260 and a GTX295.
If I run kernels on all three GPUs (the GTX295 is a dual-GPU card), sooner rather than later I get a segfault and the graphics freeze. One needs to ssh in and reboot to get back up and running.
I can run on each card independently.
The general pattern is that the segfault appears to occur on the second card that finishes a kernel. Sometimes it works for several kernel executions, but mostly the problem shows up really quickly.

I’m trying to determine the cause of this.

Are you running in a graphical environment?

luca

I have a similar problem: when I increase the size of the data stored in GPU memory, the graphical user interface crashes.

Maybe my GPU memory gets fully consumed.

Your code review (I see you are already doing that) should fix this.

@Calu
No, the computer in question does not run X, or other GUIs.

@Biebo
Even if your symptoms are similar, it doesn’t sound like the cause is the same, since I don’t run a GUI.

@Sarnath
Yep, been at it for two days. Have no idea what’s up. I was hoping someone could help me in narrowing it down, since it only happens when I use several cards.

What is probably worth noting is that I’m not really doing “multi-gpu” programming. I launch three separate processes, and each one uses only one card.
Which has me very confused. I would have thought it absolutely impossible for the processes to affect each other.
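
For anyone trying to reproduce the one-process-per-card setup, here is a minimal sketch of how a process might bind itself to a single GPU. Passing the device index as the first command-line argument is my own assumption, not something taken from the code discussed in this thread. Note that a GTX295 shows up as two separate devices.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    // Assumption: the device index is given as the first command-line argument.
    int requested = (argc > 1) ? std::atoi(argv[1]) : 0;

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (requested < 0 || requested >= count) {
        std::fprintf(stderr, "device %d not available (%d found)\n", requested, count);
        return 1;
    }

    // Bind this process to the requested GPU before any other CUDA work.
    err = cudaSetDevice(requested);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the device name so each process logs which card it is using.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, requested);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("process bound to device %d: %s\n", requested, prop.name);
    return 0;
}
```

Logging the device name per process makes it easier to tell which card a failing process was actually using.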

Just shooting wildly here:
When an unspecified launch error occurs, it’s always on the first memcpy device → host.
Does that ring a bell with anyone? Any suggestions on how to better locate the problem?
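
One way to narrow this down, as a rough sketch: kernel launches are asynchronous, so an error raised while the kernel runs is only reported by the next call that synchronizes, which is often the first device → host memcpy. Checking right after the launch and again after an explicit synchronize separates kernel failures from copy failures. The `CUDA_CHECK` macro and the `scale` kernel below are hypothetical stand-ins, not the code from this thread. Older toolkits use `cudaThreadSynchronize` where this sketch uses `cudaDeviceSynchronize`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file and line if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,  \
                         __LINE__, #call, cudaGetErrorString(err_));  \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the real kernel.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_data = static_cast<float *>(std::malloc(bytes));
    if (!h_data) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // catches invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    // If execution reaches this point, a failure here belongs to the copy itself.
    CUDA_CHECK(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_data));
    std::free(h_data);
    return 0;
}
```

The explicit synchronize costs some performance, so it is probably best kept to debug builds.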

Maybe you are hitting the 5-second limit on kernel execution… Try running outside the graphical interface.

luca

Sounds to me like the GPU equivalent of a segfault. An unspecified launch error almost always means you are accessing memory outside of the allocated area, especially if it is reported on the first call after the kernel. You might want to try running your code through valgrind in emulation mode.
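
To illustrate the kind of out-of-bounds access I mean, here is a small sketch. The `fill` kernel and the sizes are made up for the example. With 1000 elements and 256 threads per block, the grid is rounded up to 1024 threads, so without the index guard the last 24 threads would write past the end of the allocation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes value into the first n elements of out.
__global__ void fill(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Without this guard, threads with i >= n would write outside the buffer.
    if (i < n) out[i] = value;
}

int main() {
    const int n = 1000;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // 4 blocks, 1024 threads

    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill<<<blocks, threads>>>(d_out, n, 7);
    err = cudaDeviceSynchronize();  // surfaces errors raised during the kernel
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // Copy back only the last valid element as a quick sanity check.
    int last = 0;
    err = cudaMemcpy(&last, d_out + (n - 1), sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }
    std::printf("out[%d] = %d\n", n - 1, last);

    cudaFree(d_out);
    return 0;
}
```

On newer toolkits, running the binary under `cuda-memcheck` or `compute-sanitizer` should report accesses like this directly.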