I hope this is the right forum for my question. If not, I apologize; please let me know where it should go instead.
I am using Compute Visual Profiler to understand my CUDA program’s performance. It seems that an application launched within the profiler always runs 14 or 15 times before the profiler results are generated. Is there a way to control or customize how many times my application runs? Is running the application 14 or 15 times intended to produce an averaged profile and reduce noise?
Secondly, the application was developed on Linux and is built with debug information embedded. Can Visual Profiler trace where the CUDA kernels/methods are called from? For instance, how can I find out which line in the source code is calling “memcpyDtoH” or “memcpyHtoD”, which are actually a further abstraction of cudaMemcpy() depending on the source and destination pointers? Also, some CUDA API calls, e.g. “cudaThreadSynchronize()”, “cudaBindTexture()”, and “cudaGetLastError()”, are not reported even though I have turned on “CUDA API trace” and all the available “Profiler Counters”.
Thanks a lot.