Huge compilation time with thrust::reduce and thrust::stable_sort_by_key in CUDA 11.4

Dear NVIDIA engineers,

Our product has logic that uses thrust::reduce and thrust::stable_sort_by_key.
Build time with CUDA 11.4 (GeForce GTX 1650 machine) is significantly slower than with CUDA 10.1 (GeForce GTX 1050 Ti machine).
After investigating, we found that the operator parameters passed to thrust::reduce and thrust::stable_sort_by_key both use the same template method.
For testing purposes, I commented out various statements inside this template method and found that commenting out the line "return (a1 < a2) == b1;" (where a1 and a2 are of type size_t and b1 is bool) dramatically reduced build time.
The nvcc command used in the build is:
time nvcc -isystem <prerequisites_path>/include -isystem <prerequisites_path>/lib/ghc-8.6.5 -isystem <prerequisites_path>/lib/ghc-8.6.5/include -isystem <prerequisites_path>/extras/CUPTI/include -isystem <prerequisites_path>/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/plugin/include -isystem <prerequisites_path>/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/plugin/include/c-family -isystem <prerequisites_path>/aws-cpp-sdk-s3/include -isystem <prerequisites_path>/aws-cpp-sdk-core/include -isystem <prerequisites_path>/build/.deps/install/include -Icpp -Icuda -g --compiler-options -g2 --compiler-options -DDEBUG -DTHRUST_DEBUG -DCUDA --compiler-options -Werror=return-type --compiler-options -Werror=init-self --compiler-options -Werror=format --compiler-options -Werror=uninitialized --compiler-options -Werror=unused-result --compiler-options -Wno-error=maybe-uninitialized --compiler-options -Werror -w --Werror cross-execution-space-call --compiler-options -fPIC -maxrregcount=0 --machine 64 --compiler-options -pipe -DBOOST_NOINLINE= --std=c++14 --expt-extended-lambda --expt-relaxed-constexpr -arch=sm_70 -gencode=arch=compute_70,code=sm_70 -o <object_file>.o <cu_source_file>.cu

On the CUDA 10.1 machine, -arch=sm_50 -gencode=arch=compute_50,code=sm_50 is used instead of the sm_70 flags.

Please advise.

Thanks.