I have a code written in F90 that calls a CUDA/C function. It compiles and runs fine, but now I want to use the profiler to tune performance. Unfortunately, when I run the profiler it displays only the memory copies and reports nothing about the kernels. I’ve tried this with CUDA 2.1 and 2.2b. I’m running RHEL 5 with a GTX 295 and a Tesla C1060, and I get the same result no matter which card I select. So I’m wondering whether the problem is that the CUDA code is wrapped in F90. I haven’t found any documentation on this.