A piece of code that takes over a minute when run from the command line finishes in a matter of seconds under the NVIDIA Visual Profiler (running the same .exe). So the natural question is: why? Is there something wrong with the command-line run, or does the Visual Profiler do something different and not actually execute everything that runs on the command line?
My first question would be: does your program generate any output that you can compare between the two runs? If so, was it correct in both? My guess is that the profiler run does “not really execute everything as on the command line”.
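If it helps, here is a minimal host-side sketch of one way to do that comparison: time the work and print a checksum of the result buffer, then run the same .exe from the command line and under the profiler and compare the two lines of output. Everything in it is an assumption for illustration: `run_workload` is a hypothetical stand-in for your real computation, and FNV-1a is just a convenient hash I picked, not anything the profiler or CUDA provides.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a 64-bit hash over the raw bytes of the result buffer.
// Any stable hash would do; this one is an arbitrary choice.
std::uint64_t checksum(const std::vector<float>& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(data.data());
    for (std::size_t i = 0; i < data.size() * sizeof(float); ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical placeholder for the real workload. In a CUDA program this
// would launch the kernels and copy the results back to the host.
std::vector<float> run_workload(std::size_t n) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = 0.5f * static_cast<float>(i);
    }
    return out;
}

// Runs the workload once, stores the checksum of its output in sum_out,
// and returns the elapsed wall-clock time in seconds.
double timed_run(std::size_t n, std::uint64_t& sum_out) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<float> result = run_workload(n);
    auto t1 = std::chrono::steady_clock::now();
    sum_out = checksum(result);
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Call `timed_run` from `main` and print both the time and the checksum. If the checksums differ between the two runs, or the fast run produces an obviously wrong result, then the fast run is probably not doing all of the work.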
One of the NVIDIA engineers has responded on another thread, which sounds like it's the same issue.
Please check this response: “Performance is much better when profling with NSight than when running production code” (CUDA Programming and Performance, NVIDIA Developer Forums), and let us know whether the solution works for you.
Thanks,