I am using the TensorFlow library in my app to run model inference. If I run a single model, inference takes around X milliseconds. If I run another binary with the same model concurrently, each of them takes more time. Now I want to know whether the GPU is actually busy, i.e., whether all of its multiprocessors are already in use and that is why it is running slowly. When I use the Visual Profiler, I see that the total compute time has increased and each individual kernel takes longer to execute. What could be the reason that the kernels take longer to execute, and how do I analyze this issue further?
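For reference, here is a minimal sketch of how one might measure per-inference latency while polling GPU utilization, to check whether a single process already keeps the GPU busy. It assumes the TensorFlow 1.x graph-mode API; the file name "model.pb", the tensor names "input:0" / "output:0", and the input shape are placeholders for illustration only, so substitute the real names from the model.

```python
# Minimal sketch (TensorFlow 1.x assumed): time individual inference calls
# while asking nvidia-smi for the current GPU compute utilization, to see
# whether the GPU is already saturated by a single process.
import subprocess
import time

import numpy as np
import tensorflow as tf


def gpu_utilization():
    # Query the driver for the current GPU compute utilization (percent).
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"])
    return int(out.decode().strip().splitlines()[0])


# Load a frozen graph; "model.pb" is a hypothetical path.
graph_def = tf.GraphDef()
with tf.gfile.GFile("model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

x = np.random.rand(1, 224, 224, 3).astype(np.float32)  # assumed input shape

with tf.Session(graph=graph) as sess:
    for i in range(20):
        start = time.time()
        sess.run("output:0", feed_dict={"input:0": x})
        latency_ms = (time.time() - start) * 1000.0
        print("run %2d: %6.1f ms   GPU util %3d%%"
              % (i, latency_ms, gpu_utilization()))
```

Running something like this first with one binary and then with both binaries active should show whether utilization is already pinned near 100% with a single process, which would help narrow down where the extra per-kernel time is coming from.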