Showing GPU utilization per process

I am looking for a GPU monitoring tool that can report GPU utilization and memory usage per process. nvidia-smi does not provide per-process GPU utilization. Is there another tool I can use? I am running TensorFlow on NVIDIA cards, and when I run two jobs in parallel on two GPUs, they run slower than a single job on a single GPU.
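(As a partial workaround, depending on your driver version, `nvidia-smi` can at least report per-process memory use, and newer drivers add a per-process monitoring mode. A sketch, assuming a reasonably recent driver and a supported GPU:)

```shell
# Per-process GPU memory use, in CSV form
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# One snapshot of per-process SM and memory utilization samples
# (pmon requires a newer driver and a supported GPU)
nvidia-smi pmon -c 1
```

Neither gives the detailed kernel-level view a profiler does, but they are useful for a quick look at which process is occupying which GPU.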

Nvidia Visual Profiler:
https://developer.nvidia.com/nvidia-visual-profiler

Unfortunately it doesn’t monitor everything running on the system the way Task Manager does on Windows.

It is not a system monitor, but it provides charts and numbers indicating occupancy, utilization, and the time taken by API and kernel calls, along with detailed reports on specific functions.
From this information you can track down the root of the bottleneck.
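A typical workflow is to collect a timeline on the command line with `nvprof` and then open it in the Visual Profiler; a sketch (the script name `train.py` is a placeholder for your application):

```shell
# Record a timeline to a file that the Visual Profiler can import
nvprof -o profile.nvvp python train.py

# Then open profile.nvvp in NVVP via File > Import
```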

When I run my application in the Visual Profiler, it shows an analysis telling me the application is running with low compute utilization. So if my app runs for 3 ms, of which 1 ms is actual GPU execution time, it will show 33.33%. What I want to know is: while it is running on the GPU, how many cores or multiprocessors are in use at peak during that 1 ms, i.e. the actual resources used? How do I get this number?

After you run the initial profiling in NVVP, other options become available. Look for a button like “Examine individual kernels”; NVVP will then run the program again. After it completes, select the kernel you want information on (all kernels are shown in a list) and look for an option like “Check GPU Work” or “GPU Computation”. It will open a small graph showing how busy each SM was for that particular kernel.
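As a command-line alternative to the NVVP charts, `nvprof` can report related per-kernel metrics directly; a sketch (available metric names vary by GPU architecture, and `./app` is a placeholder for your binary):

```shell
# achieved_occupancy: ratio of active warps to the maximum supported per SM
# sm_efficiency: percentage of time at least one warp was active on an SM
nvprof --metrics achieved_occupancy,sm_efficiency ./app
```

Together these give a rough picture of how fully the multiprocessors were used during the kernel’s execution time.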