Debug on Ubuntu - concurrent kernel execution


I’m trying to debug concurrent kernels in nvvp. My installation is the following:

Ubuntu 14.04 + CUDA 7.0 + Nvidia Driver [375.20] + GTX780

I tried to debug 5 tasks using the GPU concurrently because I wanted to verify the behavior when multiple kernels are run at the same time.
These tasks run the same code, but each has a different process ID.

As a result, some kernels appear to be running at the same time, interleaved like round-robin scheduling.

I do not use streams.
Does this really work like round-robin?

Looking forward to your response.


Hi, gakky1667

Nsight EE does not support debugging several applications concurrently.

You may get the info below when you launch another debug session:

cuda-gdb failed to grab the lock file /tmp/cuda-dbg/cuda-gdb.lock.
Another CUDA debug session (pid 16319) could be in progress.
Are you sure you want to continue? (y or [n]) [answered N; input not from terminal]

If you really want to do this, you can debug from the cuda-gdb command line: just launch 5 cuda-gdb sessions.
But we can’t promise the debug result will be OK, as anything may happen when you use 1 GPU to do so many things.

Thank you veraj for your answer,

Do you mean the result of nvvp is incorrect?
I ran the following command, which generates 5 xxx.nvprof files at the same time.
After that, I can see the kernels launched from different processes in nvvp.

$ mpirun -np 5 nvprof -o simpleMPI.%q{OMPI_COMM_WORLD_RANK}.nvprof ./sumArraysOnGPU-timer
// file > import > *.nvprof

What happens when multiple kernels are launched on a single GPU?
I thought they would be processed one by one…
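Within a single process, whether kernels serialize or overlap depends on the stream they are launched into: kernels issued to the default stream run one after another, while kernels issued to different non-default streams may overlap if the device supports concurrent kernel execution. A minimal sketch (the `spin` kernel and the sizes are only illustrative, chosen so the overlap is visible on the nvvp timeline):

```cuda
#include <cstdio>

// Trivial kernel that busy-works so overlap is visible in the profiler.
__global__ void spin(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            data[i] = data[i] * 1.0001f + 0.5f;
}

int main() {
    const int n = 1 << 16;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // Default stream: these two launches are serialized by the driver.
    spin<<<(n + 255) / 256, 256>>>(d, n);
    spin<<<(n + 255) / 256, 256>>>(d, n);

    // Non-default streams: these two launches are allowed to overlap,
    // provided the device reports concurrentKernels support.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    spin<<<(n + 255) / 256, 256, 0, s1>>>(d, n);
    spin<<<(n + 255) / 256, 256, 0, s2>>>(d, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d);
    return 0;
}
```

Note that this covers one process only; kernels launched from separate processes (as with mpirun) belong to separate CUDA contexts, which is a different situation.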

Hi, gakky

I’m confused here.
You said you want to debug 5 applications.
But here the command is for profiling. The profiling command is OK, and the profile result should be correct.

What’s your purpose?


Please excuse my lack of explanation.

I want to know how the GPU handles multiple kernels.
Does the GPU enforce exclusive control over kernels?
If the GPU can run multiple kernels simultaneously without using streams, does it run them like round-robin?
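For kernels launched from different processes, each process owns its own CUDA context, and without the Multi-Process Service (MPS, which may not be available on all GeForce cards) the GPU time-slices between contexts rather than truly running the kernels at the same instant; on a merged profiler timeline that context switching can look like round-robin. Whether a device can overlap kernels at all (within one context, via streams) can be checked with the `concurrentKernels` device property; a small sketch:

```cuda
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query properties of device 0

    printf("Device: %s\n", prop.name);
    // concurrentKernels == 1 means kernels from *the same context* may
    // run concurrently (e.g. via non-default streams). It says nothing
    // about kernels from different processes, which still time-share
    // the GPU unless MPS is used.
    printf("concurrentKernels: %d\n", prop.concurrentKernels);
    return 0;
}
```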

Because I confirmed that the variation in kernel execution time became large when multiple kernels were launched on a single GPU.

And I visualized the kernels with nvvp.
As a result, the launched kernels were not processed one by one but ran at the same time.


I think you should raise this question under the CUDA programming topic, not Nsight EE.
I’m pretty sure they will give you a good answer.


Thank you for your suggestion.
I would like to post a question under the CUDA programming topic.