Because the TensorRT version is different. See more in UNET - NVIDIA Docs.
If the TAO inference environment has the same TensorRT and CUDA versions as the environment where you run DeepStream inference, the UNet engine file (generated via DeepStream) can be loaded directly, and vice versa.
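As a rough illustration of why version parity matters: deserializing an engine built under a different TensorRT version typically fails at load time rather than producing wrong results. A minimal C++ sketch of loading a serialized .engine file (the file name is a placeholder and error handling is trimmed; the exact `ILogger::log` signature varies slightly across TensorRT releases):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <memory>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING)
            std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Read the serialized engine from disk ("model.engine" is a placeholder path).
    std::ifstream file("model.engine", std::ios::binary | std::ios::ate);
    if (!file) { std::cerr << "cannot open engine file\n"; return 1; }
    const auto size = file.tellg();
    file.seekg(0);
    std::vector<char> blob(size);
    file.read(blob.data(), size);

    // Deserialization returns nullptr if the engine was built with a different
    // TensorRT version or for an incompatible GPU/CUDA setup.
    std::unique_ptr<nvinfer1::IRuntime> runtime{nvinfer1::createInferRuntime(logger)};
    std::unique_ptr<nvinfer1::ICudaEngine> engine{
        runtime->deserializeCudaEngine(blob.data(), blob.size())};
    if (!engine) { std::cerr << "failed to deserialize: version mismatch?\n"; return 1; }

    std::cout << "engine loaded\n";
    return 0;
}
```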
Since you already get similar results between your own C++ inference and DeepStream inference, I suggest you run the experiment below. This experiment only involves "tao unet inference xxx" and your C++ inference.
- Use "tao unet export xxx" to generate the .engine file. See UNET - NVIDIA Docs.
- Then use "tao unet inference xxx" to check whether the inference result is as expected.
- Then run your own C++ inference against the same .engine file and check whether it gives a result similar to step 2.
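The first two steps could look roughly like this. The model path, key, spec file, and output directory below are all placeholders, and the exact flags may differ between TAO versions, so please check `tao unet export --help` and the UNET - NVIDIA Docs page against your installation:

```shell
# 1. Export the trained model and generate a TensorRT .engine file
#    (paths, $KEY, and the spec file are placeholders).
tao unet export \
    -m /workspace/results/model.tlt \
    -k $KEY \
    -e /workspace/specs/unet_spec.txt \
    --engine_file /workspace/export/unet.engine

# 2. Run TAO inference with the exported engine and inspect the output masks.
tao unet inference \
    -e /workspace/specs/unet_spec.txt \
    -m /workspace/export/unet.engine \
    -k $KEY \
    -o /workspace/infer_out

# 3. Load /workspace/export/unet.engine in your own C++ inference code
#    and compare its output against the masks from step 2.
```

If steps 2 and 3 agree, the .engine file itself is fine and any remaining difference lies in the pre/post-processing of the other pipeline.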