I'm working on porting an existing application to run on multiple GPUs. It already works on a single GPU, but I have a 9800 GTX and a GTX 280 in this PC at work, so I might as well see if multi-GPU can help speed things up.
I've pretty much copied the multi-GPU example from the SDK, but I get (way) worse performance.
The code used to run on the GTX 280 alone, and the profiler reported 430 ms. As a first try, I've split the workload in half between the GTX 280 and the 9800 GTX.
The profiler now reports 1.5 s for half the work being done on the GTX 280. I've also tried limiting the number of GPUs used to 1, basically doing all the work on the GTX 280 but still spawning the host threads, and the computation time logically rose to around 3 s.
I don't get why this is so much slower than the non-threaded version where all the work is done on the GTX 280. Any ideas?
I'm using constant memory and textures defined in another file, in case that can cause problems… each host thread populates both.
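For reference, this is roughly the structure I mean, sketched from the SDK multi-GPU example rather than my actual code (the names `d_params`, `run_on_device`, and the kernel body are placeholders). Each host thread binds to its own device and uploads its own copy of the constant data, since `__constant__` symbols live per device:

```cuda
#include <cuda_runtime.h>

__constant__ float d_params[16];   // hypothetical constant buffer

__global__ void kernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = d_params[0] * i;
}

// Called once per host thread, one thread per GPU.
void run_on_device(int dev, const float *h_params, float *h_out, int n) {
    cudaSetDevice(dev);  // must happen in *this* host thread
    // __constant__ symbols are per-device, so every thread uploads its own copy
    cudaMemcpyToSymbol(d_params, h_params, 16 * sizeof(float));

    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    kernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```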
As was to be expected, my mistake had nothing to do with the multi-GPU code! I had simply forgotten to initialize one of the constants in the single-GPU configuration, which caused every kernel execution to bypass the costly branch in the code and thus finish much quicker.
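A hypothetical reconstruction of the bug, for anyone hitting the same thing (the names `threshold` and `expensive_path` are mine, not from the original code). A `__constant__` variable that is never written typically reads back as zero, so a comparison against it can silently disable a whole branch:

```cuda
__constant__ float threshold;   // never set in the single-GPU build

__device__ float expensive_path(float x) {
    float acc = x;
    for (int k = 0; k < 1000; ++k)      // stand-in for the costly work
        acc = sinf(acc) + x;
    return acc;
}

__global__ void kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // With threshold left at 0.0f, positive inputs may still take the branch,
    // but in my case the uninitialized constant meant the costly path was
    // skipped, making the single-GPU run look artificially fast.
    out[i] = (in[i] > threshold) ? expensive_path(in[i]) : in[i];
}
```

So the multi-GPU timings were never slow; the old single-GPU baseline was just measuring a kernel that did almost no work.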
My face is indeed quite red.
Now I just need another GTX 280. The 9800 GTX takes longer to do half the work than the GTX 280 takes to do all of it, so this setup is quite useless :)