How do I know when a kernel actually starts running in CUDA C++?

Hi,

I am wondering if there is a way to know whether a kernel instance has started running.
When we launch a kernel, the CPU sends the launch to the GPU. But because of resource limitations, the kernel may be blocked by other kernels and not be able to start running immediately.
So through the CUDA C++ API, can I find out the actual time when the kernel starts to run?
Also, can the CPU query GPU resource usage through CUDA C++?

Thanks,
Yidi

Record an event before and after the kernel.
Query the first event to determine whether the kernel has started. Query the second event to determine whether the kernel has finished. If you need a precise time, use the elapsed-time function in the event API.
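A minimal sketch of that approach (kernel name, stream, and launch configuration are placeholders, not from your code):

```
// Sketch: detect kernel start/finish by polling events recorded around the launch.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, stream);
myKernel<<<grid, block, 0, stream>>>(/* args */);
cudaEventRecord(stop, stream);

// cudaEventQuery returns cudaSuccess once the device has reached the event.
if (cudaEventQuery(start) == cudaSuccess) {
    // all work before the kernel has completed, i.e. the kernel is up next (or running)
}
if (cudaEventQuery(stop) == cudaSuccess) {
    // the kernel has finished
}

// For a precise duration, synchronize on the second event and take the difference:
float ms = 0.0f;
cudaEventSynchronize(stop);
cudaEventElapsedTime(&ms, start, stop);  // milliseconds between the two events
```

Note that `cudaEventQuery(start)` succeeding tells you the stream has progressed past the point just before the kernel, which is the closest the event API gets to "the kernel has started."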

You can query memory usage with cudaMemGetInfo. For other types of GPU resources, possibly not, but you would have to be specific.
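For the memory part, `cudaMemGetInfo` can be called from the host at any time; a short sketch:

```
// Query free and total device memory, in bytes, from the host.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MB, total: %zu MB\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}
```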

You mean something like this?

cudaEvent_t start, stop;
float t;

kernel1 <<< ..., stream1 >>> (...);

cudaEventCreate(&start);
cudaEventRecord(start, stream2);
kernel2 <<< ..., stream2 >>> (...);
cudaEventCreate(&stop);
cudaEventRecord(stop, stream2);
cudaEventSynchronize(stop);

cudaEventElapsedTime(&t, start, stop);

In the above case, suppose I launch both kernels at t=0. I deliberately make the second kernel resource-intensive so that it is blocked by the first kernel and cannot start at t=0. However, if I use a CUDA event like that, the state it captures is the time right after kernel1 is launched, not the time when kernel2 actually starts running.

Agreed. In your original description, you did not mention streams and an expectation of concurrency. When you add streams to it, the method I proposed won’t do what you want. If both kernels are launched into the same stream, my method should be instructive.

There might not be a CUDA-runtime-API method to do what you want, although it might be possible with CUPTI:

[url]https://docs.nvidia.com/cuda/cupti/index.html[/url]

If this is very important to you, you could use a flag reported in host-pinned memory, and have your host code poll the flag for execution state. The kernel code would set the flag state. See here for an example:

[url]cuda - How can I check the progress of matrix multiplication? - Stack Overflow[/url]
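The flag-polling idea could be sketched like this (kernel and variable names are illustrative, not from the linked answer):

```
// Sketch: a kernel signals its start via a flag in mapped host-pinned memory,
// which the host polls. Assumes a device that supports mapped pinned memory.
#include <cuda_runtime.h>

__global__ void heavyKernel(volatile int *startedFlag)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *startedFlag = 1;        // host sees this through the mapped allocation
    __threadfence_system();      // make the write visible to the host
    // ... actual work of the kernel ...
}

int main()
{
    volatile int *flag;
    cudaHostAlloc((void **)&flag, sizeof(int), cudaHostAllocMapped);
    *flag = 0;

    heavyKernel<<<80, 256>>>(flag);

    while (*flag == 0) { }       // spin: kernel has not started yet
    // record a host-side timestamp here, e.g. with std::chrono

    cudaDeviceSynchronize();
    cudaFreeHost((void *)flag);
    return 0;
}
```

The host-side timestamp taken when the loop exits approximates the kernel's actual start time, subject to polling latency and PCIe visibility delay.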

This is probably easier to get working correctly with Windows TCC or on Linux. Windows WDDM mode may present additional challenges due to command batching.