I noticed what I think is some weird behaviour during a run today. I have two GTX 295 cards in one computer, so there are 4 separate devices I can use. I started two runs on two different devices. Both runs use the same settings and code, so they should take about the same time to finish.
What caught my attention was that one kernel seemed to be “waiting” on the other in some way. I noticed this from the output the programs write to the console: they seemed to progress in sync, so to speak. In the program I’m using, execution passes through several environments. One part is in Java, and from Java (over JNI) the CUDA part is called. When the GPU is done, control returns to the CUDA host code and then to the Java code again, and this repeats many times. Each time the flow returns from the GPU, some information is written to the terminal. This information was always written at the same moment by the two runs, even though each run was on a separate GPU device.
To check whether it was a coincidence, I changed one of the runs to use only 20 000 rows as input while the other run still used 100 000 rows. In this particular problem the input data is read from a file and used for the calculation on the GPU. The kernel execution time is (or should be) proportional to the number of input rows, so the smaller run should ideally be around 5 times faster. But even with 20 000 rows the two runs stayed in sync and progressed at the same speed. It “feels” like the kernels have to wait for each other before either can continue. This happens even when the two runs are on physically different GTX 295 cards. Because I have two GTX 295 cards I can use device 0 and device 3, for example, but the behaviour is still there.
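To rule out the Java/JNI layer and measure only the GPU part, a small standalone test along these lines could be run once per device with different row counts. This is only a minimal sketch under assumptions: `processRows` is a hypothetical placeholder kernel, not my real one, and the device index and row count are taken from the command line purely for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: one thread per input row.
__global__ void processRows(const float *in, float *out, int rows)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows) {
        out[i] = in[i] * 2.0f;
    }
}

int main(int argc, char **argv)
{
    // Assumed usage: ./timing <device index> <rows>
    int device = (argc > 1) ? atoi(argv[1]) : 0;
    int rows   = (argc > 2) ? atoi(argv[2]) : 100000;
    if (rows <= 0) {
        fprintf(stderr, "rows must be positive\n");
        return EXIT_FAILURE;
    }

    // Select the device and print which one is really active.
    CHECK(cudaSetDevice(device));
    int active = -1;
    CHECK(cudaGetDevice(&active));
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, active));
    printf("device %d (%s), rows %d\n", active, prop.name, rows);

    size_t bytes = (size_t)rows * sizeof(float);
    float *in = NULL, *out = NULL;
    CHECK(cudaMalloc(&in, bytes));
    CHECK(cudaMalloc(&out, bytes));
    CHECK(cudaMemset(in, 0, bytes));

    // Events time the kernel on the GPU, independent of host-side output.
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    int threads = 256;
    int blocks  = (rows + threads - 1) / threads;

    CHECK(cudaEventRecord(start, 0));
    processRows<<<blocks, threads>>>(in, out, rows);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("kernel time: %.3f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return 0;
}
```

If the measured kernel times differ between the 20 000 and 100 000 row runs while the console output still appears in lockstep, that would suggest the synchronisation happens somewhere outside the kernel itself. Printing the active device also confirms that each process really runs on the device I think it does.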
To me this behaviour does not seem logical at all, but perhaps there is a logical explanation for it? Is there some mechanism in the cards that makes this behaviour necessary?
The only other cause I can think of is the graphics driver. Because the same driver serves all the cards, the problem may come from there. I’m using Linux (Gentoo) for development.