Sorry about this very basic question, but I haven’t found an obvious answer.
I realize VPI provides optimized algorithms while NPP provides optimized primitives. However,
VPI and NPP overlap on plenty of common compute kernels, and both claim support for multiple backends.

I am trying to understand: if I am using something like a bilateral filter, a box filter, or another common operation, how do I decide between VPI and NPP? Does VPI use NPP underneath for some operations, in which case I could just use VPI?

Not sure what that means. VPI claims to support multiple backends (to wit: CPU, GPU, PVA, VIC). I don’t know of any similar claim for NPP: NPP runs only on its CUDA GPU backend.

You make a decision between VPI and NPP based on the function call you use, what you install, and what you link against. For example, all NPP functions start with the npp prefix (e.g. nppi for image primitives), while VPI functions start with vpi.
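As a rough illustration of how that choice shows up in code, here is a hedged sketch (not a complete program; function names are taken from the public NPP and VPI references, but the surrounding setup — image allocation, stream creation, error checking — is elided, and the specific parameter values are placeholders):

```c
/* Illustrative fragment only -- setup and error handling elided. */

/* NPP: "nppi" prefix = image primitive, CUDA GPU backend only.
   Applies a box filter to an 8-bit single-channel ROI. */
nppiFilterBox_8u_C1R(pSrc, srcStep, pDst, dstStep,
                     roiSize, maskSize, anchor);

/* VPI: "vpi" prefix, and the backend is chosen per submission.
   Same box-filter operation, here submitted to the CUDA backend;
   VPI_BACKEND_CPU or (on Jetson) VPI_BACKEND_PVA could be used instead. */
vpiSubmitBoxFilter(stream, VPI_BACKEND_CUDA,
                   input, output,
                   5, 5,              /* kernel width, height */
                   VPI_BORDER_ZERO);  /* border extension policy */
```

The practical difference is visible in the signatures: the NPP call operates directly on raw device pointers and strides, while the VPI call takes opaque image handles and a backend flag, which is what lets VPI dispatch the same operation to different hardware.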

NPP is formally part of the CUDA toolkit and is shipped/installed via the CUDA toolkit installer; the CUDA toolkit sample codes include NPP samples that demonstrate usage. VPI is installed separately from the CUDA toolkit, usually via the SDK Manager.