Reasons why a device function is not getting called from a global function

I am working on code in which a global function has to call a device function, but that does not seem to be happening. Are there any special requirements for this? I am new to CUDA and do not have much insight into what is going on.

can you provide more specifics?

how do you know “it is not happening”? and what is happening instead?

run your code with cuda-memcheck (compute-sanitizer in newer CUDA toolkits); it may be instructive
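
A kernel launch that fails returns control to the host right away, so the device function never runs and nothing is reported unless the host asks for the error. The sketch below shows one way to check for that. The names `cuda_ok`, `heavy_work` and `my_kernel` are hypothetical placeholders, not taken from the poster's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a CUDA error and returns false on failure.
static bool cuda_ok(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder device function standing in for the expensive routine.
__device__ float heavy_work(float x) {
    printf("heavy_work called by thread %u\n", threadIdx.x);
    return x * 2.0f;
}

// Placeholder kernel that calls the device function once per thread.
__global__ void my_kernel(float *out) {
    out[threadIdx.x] = heavy_work(static_cast<float>(threadIdx.x));
}

int main() {
    const int n = 4;
    float *d_out = nullptr;
    if (!cuda_ok(cudaMalloc(&d_out, n * sizeof(float)), "cudaMalloc")) return 1;

    my_kernel<<<1, n>>>(d_out);
    // Launch problems (for example an invalid configuration) are reported here.
    if (!cuda_ok(cudaGetLastError(), "kernel launch")) return 1;
    // Problems during execution are reported here; synchronizing also
    // flushes the device-side printf buffer to the host.
    if (!cuda_ok(cudaDeviceSynchronize(), "kernel execution")) return 1;

    cudaFree(d_out);
    return 0;
}
```

If either check prints an error, the kernel (and therefore the device function) did not run to completion, which would also explain a run that finishes much faster than expected.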

There are some print statements inside the device function that do not seem to execute.
The code also runs much faster than expected, even though the device function is the main time-consuming part of the code.

perhaps then step through the program in the debugger, and step into the device function, to determine whether it is actually executing or not
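
If a debugger is not at hand, a rough alternative is to have the device function increment a counter and read it back on the host. This is only a sketch under assumptions: `call_count`, `heavy_work`, `my_kernel` and `count_device_calls` are made-up names, and the real device function would take the place of `heavy_work`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter, incremented every time the device function runs.
__device__ unsigned int call_count = 0;

// Placeholder device function standing in for the expensive routine.
__device__ float heavy_work(float x) {
    atomicAdd(&call_count, 1u);
    return x * 2.0f;
}

__global__ void my_kernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = heavy_work(static_cast<float>(i));
}

// Launches the kernel over n elements and returns how many times
// heavy_work ran, or -1 if any CUDA call reported an error.
long count_device_calls(int n) {
    float *d_out = nullptr;
    unsigned int zero = 0, calls = 0;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpyToSymbol(call_count, &zero, sizeof(zero));  // reset the counter

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    my_kernel<<<blocks, threads>>>(d_out, n);

    cudaError_t err = cudaGetLastError();                  // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpyFromSymbol(&calls, call_count, sizeof(calls));

    cudaFree(d_out);
    return err == cudaSuccess ? static_cast<long>(calls) : -1;
}

int main() {
    long calls = count_device_calls(1000);
    std::printf("heavy_work ran %ld times\n", calls);
    return calls < 0 ? 1 : 0;
}
```

A count of 0 or a return value of -1 would suggest the device function is never reached, either because the launch failed or because a guard in the kernel skips the call.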