Are higher-order functions still slow in CUDA C++?

Hello, I found some relatively old posts describing performance problems with passing function pointers in CUDA code.
Is this still true for function objects or lambdas?
I am thinking through how to make my code more testable and modular. Higher-order functions and lambdas would make it easier to, for example, separate the looping logic from the logic that runs inside the loop, so that each part can be tested easily on its own.
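
To make the question concrete, here is a minimal sketch of the pattern I have in mind. All names (`HOST_DEVICE`, `Scale`, `transform_strided`, `transform_kernel`, `transform_on_host`) are made up for illustration. The callable is passed as a template parameter rather than as a function pointer; my assumption, which is the thing I would like confirmed, is that the compiler can then see the concrete type and inline the call.

```cpp
#include <cstddef>
#include <vector>

// Expand to CUDA qualifiers only when compiling with nvcc, so the same
// code can also be built and unit-tested with a plain host compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// The "logic inside the loop", written as a small function object.
struct Scale {
    float factor;
    HOST_DEVICE float operator()(float x) const { return factor * x; }
};

// The "looping logic": a strided loop that applies any callable F.
// F is a template parameter, not a function pointer.
template <typename F>
HOST_DEVICE void transform_strided(const float* in, float* out, std::size_t n,
                                   std::size_t start, std::size_t stride, F f) {
    for (std::size_t i = start; i < n; i += stride) {
        out[i] = f(in[i]);
    }
}

#ifdef __CUDACC__
// Thin kernel wrapper: each thread walks the array in a grid-stride loop.
template <typename F>
__global__ void transform_kernel(const float* in, float* out, std::size_t n, F f) {
    std::size_t start = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    std::size_t stride = static_cast<std::size_t>(gridDim.x) * blockDim.x;
    transform_strided(in, out, n, start, stride, f);
}
#endif

// Host-side entry point that exercises the same loop and body without a GPU.
inline std::vector<float> transform_on_host(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());
    transform_strided(in.data(), out.data(), in.size(), 0, 1, Scale{factor});
    return out;
}
```

With this layout, `Scale` and `transform_strided` can be tested on the host with ordinary unit tests, and the kernel itself stays a thin wrapper.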