I am using the TensorFlow library in my app to run model inference. If I run a single instance of the model, inference takes around X milliseconds. If I run another binary with the same model concurrently, each of them takes longer.

I want to know whether the GPU is actually busy, i.e. whether all of its multiprocessors are already in use by a single process, and that is why everything slows down. When I use the Visual Profiler, I see that the total compute time increases and each individual kernel takes longer to execute.

What could be the reason that the kernels take longer to execute when two processes run concurrently? How do I analyze this further?
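To make the measurement concrete, a minimal sketch of the kind of timing harness I mean is below. It is not my actual app; it assumes the TensorFlow 1.x session API, a hypothetical frozen graph `model.pb` with tensors named `input:0` and `output:0`, and a 1x224x224x3 float input.

```python
# Minimal inference-timing sketch (assumptions: TF 1.x session API,
# hypothetical frozen graph "model.pb" with "input:0" / "output:0" tensors).
import time
import numpy as np
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("model.pb", "rb") as f:   # hypothetical model file
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as g:
    tf.import_graph_def(graph_def, name="")
    x = np.random.rand(1, 224, 224, 3).astype(np.float32)  # assumed input shape
    with tf.Session(graph=g) as sess:
        # Warm-up run so CUDA context creation is not counted in the timings.
        sess.run("output:0", feed_dict={"input:0": x})
        times = []
        for _ in range(100):
            t0 = time.perf_counter()
            sess.run("output:0", feed_dict={"input:0": x})
            times.append((time.perf_counter() - t0) * 1e3)
        print("median inference latency: %.2f ms" % np.median(times))
```

Running two copies of this at once while watching `nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv -l 1` is how I have been checking whether the device is already near full utilization with a single process.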