I could not analyze GPU operation from TensorFlow’s profile in the NVIDIA Visual Profiler.

I have started learning deep learning using an NVIDIA GPU.
As a first step, I planned to analyze the internal GPU operations of various deep learning frameworks.

However, I could not analyze the internal GPU operation of TensorFlow from its Visual Profiler profile.
Could you give me any advice?


First, I profiled Chainer (PF’s deep learning framework). I used a Maxwell Titan X and Ubuntu 14.04.

The GPU profile of Chainer’s MNIST example was as below.


This profile looks familiar to me. The NVIDIA Visual Profiler shows the operations of GPU thread 3078833984.

Many threads run on an NVIDIA GPU, but the NVIDIA Visual Profiler shows only one of them in the profile.

Next, here is the TensorFlow MNIST profile from the NVIDIA Visual Profiler.


TensorFlow’s profile is not familiar to me.

There are many threads in the profiler (thread numbers 4057159424, 3951027968, 3934242560, etc.).

I used only one GPU, not multiple GPUs.

I could not understand why so many threads were shown in the GPU profile.

Could you give me any advice or explanation about TensorFlow’s GPU profile?



Please refer to the description of the timeline below. The result depends on the sample you profiled.

A timeline will contain a Process row for each application profiled. The process identifier represents the pid of the process. The timeline row for a process does not contain any intervals of activity. Threads within the process are shown as children of the process.

A timeline will contain a Thread row for each CPU thread in the profiled application that performed either a CUDA driver or CUDA runtime API call. The thread identifier is a unique id for that CPU thread. The timeline row for a thread does not contain any intervals of activity.

Read more at: http://docs.nvidia.com/cuda/profiler-users-guide/index.html#ixzz4PPcoxhkd
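
To make the quoted description concrete: the numbers on the Thread rows identify CPU (host) threads, not GPU threads. The following is a minimal Python sketch, not taken from TensorFlow or the CUDA samples, that starts several CPU threads and collects the identifier of each. The names `collect_thread_ids` and `num_workers` are hypothetical and chosen only for illustration. The sketch makes no CUDA calls; the assumption is that in a real application each worker would issue CUDA driver or runtime API calls, and the profiler would then show one Thread row per such CPU thread.

```python
import threading


def collect_thread_ids(num_workers=4):
    """Start num_workers CPU threads and return the identifier of each one.

    Hypothetical helper for illustration only. No CUDA call is made here.
    """
    ids = []
    lock = threading.Lock()
    # The barrier keeps all workers alive at the same moment, so the
    # identifiers cannot be recycled from a thread that already exited.
    barrier = threading.Barrier(num_workers)

    def worker():
        # Assumption: a real application would issue its CUDA driver or
        # runtime API calls from here, which is what gives this CPU thread
        # its own row in the profiler timeline.
        barrier.wait()
        with lock:
            ids.append(threading.get_ident())

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("main thread id:", threading.get_ident())
    for thread_id in collect_thread_ids():
        print("worker thread id:", thread_id)
```

A program that issues all of its CUDA calls from a single CPU thread would show one Thread row, while a program that issues them from several CPU threads would show several rows, even with a single GPU.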

Thank you for your reply, veraj.

But I still do not fully understand why so many GPU threads appeared in TensorFlow’s profile.

Do you know of a very simple sample code that shows many threads in the profile?

The sample does not need to produce the same profile as TensorFlow’s; it only needs to show several GPU threads in the profile.

I tried to find such a sample among the Windows CUDA sample codes, but I never got the profile I wanted.

I would appreciate any advice.



You can refer to the description at http://docs.nvidia.com/cuda/cuda-samples/index.html#axzz4Px1WYtpI
and search for “Multithreading”. It lists some samples that show many threads in the profile,
such as simpleCallback.

Best Regards

Hello, verai

Thank you very much for your advice.

I got the profile I wanted by using “simpleCallback”, as you suggested.

And now I understand why so many threads appeared in the GPU profile.

Thank you for your help.