Hi Everyone,
Does anyone know if there is any way to disable some of the multiprocessors through CUDA?
Thanks
There is no way to disable multiprocessors.
Still, you can make sure they are not in use by launching a grid with a specific shape. However, what would be the actual purpose of that? Measuring some "speedup", or saving some power?
Can't you also create some "blank" kernels that some multiprocessors would be running? If you manage to figure out how warps are mapped, perhaps you can act as if some multiprocessors were disabled. But since you usually launch many more blocks than there are multiprocessors, this might become a little awkward to implement (without a dummy grid).
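A minimal sketch of the dummy-grid idea (the kernel name work_kernel and its body are hypothetical, not from this thread): query the MP count with cudaGetDeviceProperties and launch fewer blocks than there are MPs, so some of them are likely to sit idle; the driver still decides which ones actually get the blocks.
#include <cuda_runtime.h>

__global__ void work_kernel(float *data)
{
// placeholder work: one element per thread
data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

int main()
{
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0); // MP count of device 0
int num_blocks = prop.multiProcessorCount / 2; // e.g. leave roughly half the MPs without a block
if (num_blocks < 1) num_blocks = 1;
float *d_data;
cudaMalloc(&d_data, num_blocks * 128 * sizeof(float));
work_kernel<<<num_blocks, 128>>>(d_data); // small grid: some MPs get no block at all
cudaDeviceSynchronize();
cudaFree(d_data);
return 0;
}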
How can I manage it so that only some multiprocessors run blank kernels? Any ideas?
You can’t. Blocks are executed in an undefined order that is different every time.
Why are you trying to do this in the first place?
So, even if I don't know which multiprocessors these blocks are assigned to, is there a way that I can have them run empty kernels? (Basically do nothing, like a wait?)
I suppose. Just run the computation with more blocks than it really needs, and use something like this:
__global__ void padded_kernel(int real_num_blocks /*, real arguments */)
{
__shared__ int a; // dummy shared variable so nvcc doesn't optimize the loop away
if (blockIdx.x >= real_num_blocks)
{
// extra "dummy" blocks: just burn some time
for (int i = 0; i < 10000; i++)
a++;
}
else
{
// real kernel.....
}
}
It’s horrible code, I know. The shared memory dummy variable is just to keep nvcc from optimizing away the loop. Adjust the 10000 to get longer/shorter delays from your dummy blocks. This really has no purpose except to make the kernel execute more slowly as you requested.
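A possible launch for the kernel sketched above (the block counts and the thread count are purely illustrative): the grid is made larger than what the real work needs, and the extra blocks only run the dummy loop.
int real_num_blocks = 64; // blocks that do real work
int padded_blocks = real_num_blocks + 32; // extra blocks that only spin
padded_kernel<<<padded_blocks, 256>>>(real_num_blocks /*, real arguments */);
cudaDeviceSynchronize();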
That way you only ensure that some blocks are doing nothing, not that a whole MP is idle. But you have the solution right there.
To really not compute on some MPs you have to have a kernel which eats all the registers or shared memory of one MP. So expand the shared memory usage to over 8000 bytes per block (more than half of the 16 KB an MP has, so only one block fits per MP) and change the 10000 to e.g. 10.
That way two blocks will execute the real part. As each block then needs a whole MP, they will execute on two MPs.
However, you cannot determine on which MPs.
Adjust the 10 the way you want.
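A minimal sketch of what that suggestion might look like (the kernel name and the array size are assumptions, chosen for hardware with 16 KB of shared memory per MP): the static shared array of over 8 KB means at most one block can be resident per MP, so every block, real or dummy, occupies a whole MP.
__global__ void one_block_per_mp_kernel(int real_num_blocks)
{
__shared__ int pad[2100]; // ~8.4 KB of shared memory: only one block fits per MP
if (blockIdx.x >= real_num_blocks)
{
// dummy blocks: the much shorter spin loop
for (int i = 0; i < 10; i++)
pad[threadIdx.x % 2100]++;
}
else
{
// real kernel.....
}
}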
That does not work: you cannot expand the registers of just one MP. A kernel is a kernel and is the same for all MPs; the fact that blocks take a different code path does not change the register usage of the kernel.
There is currently no way to do this. One of the design goals of CUDA is transparent scalability to chips with varying numbers of SMs, which is why you can't select which SM a thread block runs on.
But if you explain what you really want to do, we might be able to offer suggestions…
Mark
Well, IMO it would be nice to have the ability to dedicate some of the multiprocessors to rendering the UI while others are doing computations. Right now that is not possible, and when the card is doing some math the UI is really slow and annoying.