NPP behaviour on CUDA streams created with `cudaStreamNonBlocking`

See related/original question here: https://stackoverflow.com/questions/57927742/nvidia-npp-on-cuda-streams-that-use-cudastreamnonblocking

Is there an official stance on how NPP functions interact with streams that are created so as not to synchronize with stream 0 (i.e., with the `cudaStreamNonBlocking` flag)?

We saw surprising behaviour when trying the new `_Ctx` functions, and we also got problematic behaviour with the older `nppSetStream` approach when the stream was created with `cudaStreamNonBlocking`.

My current suspicion is that `cudaStreamNonBlocking` streams are effectively incompatible with NPP, but some kind of official documentation would be appreciated.

Thank you!

My suggestion would be to file a bug.

The instructions to do so are contained in a sticky post at the top of the CUDA programming sub-forum.