Pixel differences between CUDA hardware decoding and OpenCV VideoCapture

I have tried the following ways to convert NV12 to BGR format:

  1. Nv12ToBgr32, from the Video Codec SDK sample code
  2. nppiNV12ToBGR_8u_P2C3R, from the NPP library
  3. nv12tobgr, from the jetson-utils repo

All of the options above produce pixel values that differ from the OpenCV Mat obtained from the same video source. The difference is sometimes large (more than 50 in pixel value) and sometimes small (0 or 1), and it also varies from one video source to another.
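
To make the comparison concrete, here is a minimal sketch of how the difference between two frames could be quantified. It assumes both frames are 8-bit BGR buffers of the same size in the same interleaved layout; `diff_stats`, the `threshold` parameter, and the sample byte values are hypothetical and only for illustration, not taken from any of the libraries above.

```python
from typing import Dict


def diff_stats(frame_a: bytes, frame_b: bytes, threshold: int = 1) -> Dict[str, float]:
    """Compare two 8-bit BGR frames given as flat, interleaved byte buffers.

    Hypothetical helper: assumes both buffers share width, height and layout.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same size and layout")
    if not frame_a:
        raise ValueError("frames must not be empty")

    # Absolute difference for every channel value of every pixel.
    diffs = [abs(a - b) for a, b in zip(frame_a, frame_b)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        # Share of channel values whose difference exceeds the threshold.
        "fraction_above_threshold": sum(d > threshold for d in diffs) / len(diffs),
    }


# Two made-up 2-pixel frames (B, G, R, B, G, R), purely illustrative.
reference = bytes([10, 20, 30, 200, 100, 50])
candidate = bytes([10, 21, 30, 140, 100, 52])
stats = diff_stats(reference, candidate)
print(stats)
```

Reporting the maximum, the mean, and the share of values above a threshold helps separate rounding noise (differences of 0 or 1) from a systematic mismatch (differences above 50).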

How can I minimize the difference? I would like the result to match the OpenCV Mat as closely as possible; otherwise my DL model produces a different output, which I do not want.
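
One possible contributor is the YUV-to-RGB matrix each converter uses. This is an assumption and has not been verified against any of the converters listed above. The sketch below converts a single limited-range YUV sample with the commonly quoted rounded BT.601 and BT.709 coefficients, to show how far the two can drift apart. `yuv_to_bgr`, `COEFFS`, and the sample values are hypothetical and do not reproduce the code of OpenCV, NPP, or the Video Codec SDK.

```python
from typing import Tuple

# Commonly quoted rounded coefficients for limited-range (16-235) YUV.
# Order: (V -> R, U -> G, V -> G, U -> B). Illustrative, not from any SDK.
COEFFS = {
    "bt601": (1.596, 0.392, 0.813, 2.017),
    "bt709": (1.793, 0.213, 0.533, 2.112),
}


def clamp8(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(x))))


def yuv_to_bgr(y: int, u: int, v: int, standard: str) -> Tuple[int, int, int]:
    """Convert one limited-range YUV sample to BGR with the given matrix."""
    rv, gu, gv, bu = COEFFS[standard]
    c = 1.164 * (y - 16)   # scaled luma
    d = u - 128            # centered Cb
    e = v - 128            # centered Cr
    r = c + rv * e
    g = c - gu * d - gv * e
    b = c + bu * d
    return clamp8(b), clamp8(g), clamp8(r)


# A made-up saturated sample: the two matrices disagree noticeably.
sample = (120, 90, 200)
bgr_601 = yuv_to_bgr(*sample, "bt601")
bgr_709 = yuv_to_bgr(*sample, "bt709")
print(bgr_601, bgr_709)

# A neutral gray sample: chroma terms vanish, so both matrices agree.
print(yuv_to_bgr(128, 128, 128, "bt601"), yuv_to_bgr(128, 128, 128, "bt709"))
```

If this were the cause, the mismatch would be largest on saturated colors and near zero on gray areas. That would be consistent with the difference varying between video sources, but it would need to be checked against the actual decoders.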