Thrust with MPS

We have been having issues running processes that use Thrust on pre-Volta GPUs while MPS is running: the error reports that MPS does not support dynamic parallelism. Are there nvcc compiler flags to disable dynamic parallelism, and/or Thrust execution policies that would let us run basic Thrust algorithms such as for_each and sort when MPS is enabled? (We want to run the algorithms on the GPU, not on the host, but are willing to sacrifice some performance if necessary to avoid dynamic parallelism, and we do not launch kernels from within our own functors.) Or should this not be an issue (even on pre-Volta GPUs), and we are misinterpreting the cause of the failures we are encountering?

The only case I know of in which thrust uses dynamic parallelism is when you call thrust algorithms from device code with thrust::device or a similar policy. Passing thrust::seq as the execution policy for an algorithm called from device code is the way to prevent thrust from using dynamic parallelism.

I don’t know whether this applies to your case, because calling thrust from device code is somewhat “unusual”, but I don’t know your code base, so maybe.
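A minimal sketch of the thrust::seq pattern, assuming a CUDA toolkit with its bundled Thrust headers. The kernel `sort_segments`, the helper `run_segment_sort_demo`, and the segment sizes are hypothetical choices for illustration. Each thread sorts its own small segment sequentially, so the algorithm runs inside the calling thread and needs no child-kernel launch:

```cuda
// Hypothetical demo: each GPU thread sorts its own segment with thrust::seq.
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>
#include <cstdio>

constexpr int kSegLen = 8;       // elements per segment (arbitrary choice)
constexpr int kNumSegments = 4;  // one segment per thread

__global__ void sort_segments(int *data, int num_segments)
{
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    if (seg < num_segments) {
        int *first = data + seg * kSegLen;
        // thrust::seq runs the sort sequentially inside this thread,
        // so no child kernel is launched.
        thrust::sort(thrust::seq, first, first + kSegLen);
        // Using thrust::device here instead is the pattern that may rely
        // on dynamic parallelism when the code is built with -rdc=true.
    }
}

// Returns true if every segment comes back in ascending order.
bool run_segment_sort_demo()
{
    const int n = kNumSegments * kSegLen;
    int h[n];
    for (int i = 0; i < n; ++i) h[i] = n - i;  // descending input

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    sort_segments<<<1, 32>>>(d, kNumSegments);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) {
        printf("kernel error: %s\n", cudaGetErrorString(err));
        return false;
    }

    for (int s = 0; s < kNumSegments; ++s)
        for (int i = 1; i < kSegLen; ++i)
            if (h[s * kSegLen + i - 1] > h[s * kSegLen + i]) return false;
    return true;
}

int main()
{
    bool ok = run_segment_sort_demo();
    printf("%s\n", ok ? "OK" : "FAIL");
    return ok ? 0 : 1;
}
```

Since every kernel here is launched from the host, this should not need -rdc=true at all.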

Even with thrust::device, you can generally prevent the use of dynamic parallelism by providing a compilation environment that does not support it; thrust will then “automagically” not use it. In a nutshell, that means omitting things like

-rdc=true

or perhaps other related switches like -dc or -lcudadevrt, so that CDP (CUDA Dynamic Parallelism) is not possible from a compilation perspective.
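For the case described in the question (calling for_each and sort from host code with thrust::device), here is a minimal sketch of a build that leaves out relocatable device code. The file name, the functor `square_in_place`, and the helper `square_then_sort_desc` are hypothetical. The `#ifdef __CUDACC_RDC__` check relies on the macro nvcc defines when compiling in relocatable device code mode, and only reports which mode the file was built in:

```cuda
// Build WITHOUT relocatable device code, so CDP is not available:
//   nvcc -o thrust_host_demo thrust_host_demo.cu
// (Adding -rdc=true, or -dc plus -lcudadevrt at link time, is what
// would make dynamic parallelism possible again.)
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/for_each.h>
#include <thrust/sort.h>
#include <thrust/functional.h>
#include <thrust/execution_policy.h>
#include <cstdio>
#include <vector>

// Plain element-wise functor; it launches no kernels of its own.
struct square_in_place {
    __host__ __device__ void operator()(int &x) const { x = x * x; }
};

// Fills -5..4 on the GPU, squares each element, sorts in descending order.
std::vector<int> square_then_sort_desc()
{
    thrust::device_vector<int> v(10);
    thrust::sequence(thrust::device, v.begin(), v.end(), -5);
    thrust::for_each(thrust::device, v.begin(), v.end(), square_in_place());
    thrust::sort(thrust::device, v.begin(), v.end(), thrust::greater<int>());

    thrust::host_vector<int> h = v;  // copy the result back to the host
    return std::vector<int>(h.begin(), h.end());
}

int main()
{
#ifdef __CUDACC_RDC__
    printf("built with relocatable device code (-rdc=true or -dc)\n");
#else
    printf("built without relocatable device code\n");
#endif
    std::vector<int> r = square_then_sort_desc();
    for (int x : r) printf("%d ", x);
    printf("\n");  // expected: 25 16 16 9 9 4 4 1 1 0
    return 0;
}
```

The algorithms still run on the GPU; the only difference from an -rdc build is that every kernel is launched from the host.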

I haven’t tried this out with MPS nor have I seen whatever error you are referring to.

Thank you for your reply! Since it sounds like Thrust failing under MPS on pre-Volta GPUs is not necessarily a known limitation (at least as long as user-supplied functors don’t use dynamic parallelism), we plan to create a minimal example and will report back with details of the code and error messages if we still experience this issue.