Run different kernels in parallel on different SMs

I want to run two different kernels on the same GPU in parallel, and I also need them to run on completely different SMs (i.e. I need compute isolation). Is this possible today? (I think it might be possible on the Volta architecture, but is it possible on earlier architectures? I have a Pascal GTX 1070.)

Not possible if those kernels are issued from the same host process. You can run different kernels at the same time (see the CUDA concurrentKernels sample code), assuming those kernels meet various requirements, but you have no control over SM-level scheduling.
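To illustrate the "different kernels at the same time" part: a minimal sketch of launching two kernels into separate non-default streams, in the style of the concurrentKernels sample. The kernel names and sizes here are illustrative, not from the sample itself. Note that streams only *permit* concurrency; whether the kernels actually overlap, and which SMs their blocks land on, is up to the hardware scheduler.

```cuda
#include <cstdio>

__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f;
}

__global__ void kernelB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = y[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Launching into different non-default streams allows (but does not
    // guarantee) concurrent execution; the scheduler decides which SMs
    // each kernel's blocks run on, and they may share SMs.
    kernelA<<<(n + 255) / 256, 256, 0, s1>>>(x, n);
    kernelB<<<(n + 255) / 256, 256, 0, s2>>>(y, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```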

This is what I suspected. Thanks.

BTW, do you know the best known alternatives for attaining SM-level scheduling? For example, a quick Google search indicated that the "persistent threads" model might be able to achieve fine control over scheduling in software.

E.g. I found this paper: Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks.

I wanted to know if there is some standard open-source code for achieving SM-level control.

You could make sure a block is resident alone on an SM by using enough registers that only one block can run per SM. Alternatively, allocate more than 48 KB of shared memory dynamically per block (only possible on Volta, after an opt-in) to achieve the same effect.

It's in no way "standard" or guaranteed to work on future hardware, but it might achieve the goal of two blocks not running on the same SM.
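A sketch of the shared-memory variant of this trick. The opt-in mentioned above is `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`; the 96 KB figure below is an assumption for a Volta-class part, so query the device for the real per-SM limit before relying on it.

```cuda
#include <cstdio>

__global__ void exclusiveKernel(float *data) {
    extern __shared__ float smem[];  // large dynamic shared allocation
    smem[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    data[blockIdx.x * blockDim.x + threadIdx.x] = smem[threadIdx.x];
}

int main() {
    // > 48 KB of dynamic shared memory requires an explicit opt-in
    // (Volta and later). 96 KB is an assumed value for illustration.
    size_t smemBytes = 96 * 1024;
    cudaFuncSetAttribute(exclusiveKernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)smemBytes);

    float *d;
    cudaMalloc(&d, 80 * 256 * sizeof(float));

    // With ~96 KB of shared memory per block, no SM can hold two blocks
    // at once, so each resident block effectively has an SM to itself.
    exclusiveKernel<<<80, 256, smemBytes>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

The register-based variant works the same way: compile with a high enough per-thread register count (e.g. via `__launch_bounds__` or `-maxrregcount`) that a second block cannot fit on the SM.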

Different kernels running on different SMs at the same time still share the same memory space and L2 cache contents. So no clean separation is possible that satisfies the requirements of compute isolation. Running the code in separate CUDA contexts (and therefore not at the same time) might be the only way.

Devices like the Tesla M10, which offer multiple GPUs on one PCB, could be an alternative. This device also allows virtualization of the GPUs to support many users and applications (both graphics and compute) at the same time.

You can achieve SM-level control by reading the SM ID (the %smid special register) and controlling thread block scheduling yourself. Just google "CUDA smid" for plenty of examples. I doubt this will give you "compute isolation" unless you have a very narrow definition of what that means.
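A minimal sketch of the SMID approach: each block reads %smid via inline PTX and simply returns if it is not on an SM it is "allowed" to use. The even/odd SM split below is an arbitrary illustrative policy, and block-to-SM placement is not guaranteed by CUDA, so this is a best-effort filter rather than true isolation.

```cuda
#include <cstdio>

// Read the SM this thread is running on (PTX special register %smid).
__device__ unsigned int smid() {
    unsigned int id;
    asm("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Only do work if this block landed on an even-numbered SM;
// blocks on odd SMs exit immediately.
__global__ void evenSmOnly(int *out) {
    unsigned int sm = smid();
    if (sm % 2 != 0) return;      // bail out on "forbidden" SMs
    out[blockIdx.x] = (int)sm;    // record which SM did the work
}

int main() {
    int *out;
    cudaMalloc(&out, 128 * sizeof(int));
    cudaMemset(out, 0xFF, 128 * sizeof(int));
    // Launch more blocks than SMs so every SM sees at least one block.
    evenSmOnly<<<128, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```

A second kernel using the opposite predicate (`sm % 2 == 0`) would then, at best, run on the other half of the SMs; the Pagoda-style persistent-threads schemes mentioned earlier build on this same idea with long-lived blocks pulling tasks from queues.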