With the active support of many members on the Nvidia forum, I've finally managed to graduate to the 'programming forum'. Thanks to all for that.
A few questions have been swirling in my head, and with Google proving futile, this is where I resort.
A little background first:
As my first task I am trying to benchmark a one-dimensional vector addition of 20000 elements (just signifying a big number) on my system (an i7 930 and 4 x GTX 295, i.e. 8 GPUs):
a) using only the GPUs (apart from the host calls), which I plan to distribute equally over all 8 GPUs
b) using only the processor (not relevant here)
1) Quoted from the CUDA Programming Guide, version 1.1, section 3.4.
Would running them in quad SLI relieve me of the duty of assigning a section of code to a particular GPU, with that being taken care of by the master GPU instead?
2) Quoted from the same book.
Would someone mind shedding some light on this?
3) Coming to assigning a specific GPU to a part of the code, the only thing available (to my knowledge) is cudaSetDevice. Now, to split the task, do I need to call cudaSetDevice 8 times? What I mean is, is this how the code should look?

cudaSetDevice(0); myfunc<<<Nblocks, threadsPerBlock>>>(param1, param2);
cudaSetDevice(1); myfunc<<<Nblocks, threadsPerBlock>>>(param3, param4);
cudaSetDevice(2); myfunc<<<Nblocks, threadsPerBlock>>>(param5, param6);
.
.
.
cudaSetDevice(7); myfunc<<<Nblocks, threadsPerBlock>>>(param x, param z);
And if this is the case, does it imply that the cudaSetDevice call for the second GPU onwards executes only after the kernel associated with the first GPU has completed, and so on?
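For concreteness, here is a minimal sketch of the pattern I have in mind. The kernel `myfunc`, the element count, and the launch parameters are placeholders for illustration, not my real benchmark code; error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: element-wise addition of two vector slices.
__global__ void myfunc(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);    // should report 8 on 4 x GTX 295

    const int n = 20000;                 // total elements
    const int perDev = n / deviceCount;  // slice handled by each GPU

    // Launch one slice per device. Kernel launches are asynchronous with
    // respect to the host, so the loop does not wait for one GPU's kernel
    // to finish before moving on to the next cudaSetDevice call.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        float *a, *b, *c;
        cudaMalloc(&a, perDev * sizeof(float));
        cudaMalloc(&b, perDev * sizeof(float));
        cudaMalloc(&c, perDev * sizeof(float));
        int threadsPerBlock = 256;
        int Nblocks = (perDev + threadsPerBlock - 1) / threadsPerBlock;
        myfunc<<<Nblocks, threadsPerBlock>>>(a, b, c, perDev);
    }

    // Wait for every device to finish before using the results.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }
    return 0;
}
```

One caveat: switching devices from a single host thread like this works on later CUDA toolkits (4.0 and newer); with the toolkit era of the quoted guide, a host thread was bound to one device, so splitting the work would instead need one host thread per GPU.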
I have a couple more questions, but they had better be addressed later if this post is not to be marked as spam owing to its length :)…
Thanks in anticipation for your time to read and respond.