NPP has many functions that perform a binary operation in which one of the operands is a constant. However, these functions assume the constant is in host memory. I regularly run into cases where I want to perform an operation against a constant that lives in device memory. I use existing NPP functions to calculate that constant, but I then have to transfer it to host memory before I can make the next NPP call I need. That works poorly with async calls, because the host has to wait for the copy to finish before it can pass the value to the next function. Is there any chance that variants taking the constant from device memory could be added to NPP? I'd be surprised if they weren't almost exact copies of the existing functions that perform binary ops with one constant operand.
The suggested way to request changes/additions/improvements to CUDA is to file a bug.
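In the meantime, one possible workaround is a small custom kernel that reads the constant straight from device memory, so nothing has to come back to the host. The following is only a minimal sketch for a flat array of `float` values being multiplied by the constant. The names `mulByDeviceConstant` and `mulByDeviceConstantAsync` are hypothetical and not part of NPP, and a real image version would also have to handle the row pitch (step) and ROI that NPP image functions use.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (not an NPP function): multiplies every element of
// src by a scalar that is read from device memory instead of being passed
// by value from the host.
__global__ void mulByDeviceConstant(const float* src,
                                    const float* d_constant,
                                    float* dst,
                                    int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = src[i] * (*d_constant);
    }
}

// Hypothetical host wrapper: enqueues the kernel on the given stream and
// returns immediately, without any host-side synchronization.
cudaError_t mulByDeviceConstantAsync(const float* d_src,
                                     const float* d_constant,
                                     float* d_dst,
                                     int n,
                                     cudaStream_t stream)
{
    if (n <= 0) {
        return cudaSuccess;  // nothing to do
    }
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    mulByDeviceConstant<<<blocks, threads, 0, stream>>>(d_src, d_constant, d_dst, n);
    return cudaGetLastError();  // reports launch configuration errors only
}
```

For this to stay asynchronous, the NPP call that produces the constant has to be issued on the same stream as the kernel (for example via `nppSetStream` or the `_Ctx` function variants), so that stream ordering guarantees the value is written before the kernel reads it.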