Some blocks in a kernel can't launch

I have a piece of code like this (some function parameters are omitted):

cudaStream_t stream[MAXSTREAM];
for (int i = 0; i < n; i++) {
    int *h_toDevice;
    cudaMallocHost((void **)&h_toDevice);  // arguments omitted

    int *d_toDevice;
    for (int s = 0; s < MAXSTREAM; s++)
        ;  // ... (per-stream calls omitted)

    for (int s = 0; s < MAXSTREAM; s++)
        ;  // ... (kernel launch per stream, arguments omitted)

    for (int s = 0; s < MAXSTREAM; s++)
        ;  // ... (per-stream calls omitted)
}
(MAXSTREAM is about 50, and n is about 100.)
For each of the n outer iterations, I launch MAXSTREAM kernels at a time to process some tasks. But something I can't understand comes up: for example, I launch 20 blocks in a kernel, but according to the output information, 10 of them are never launched! I can't get any errors using cuda-memcheck or cudaGetLastError() after the kernel launches.
I don't understand why some blocks in a kernel can launch while others can't.
My guess is that the GPU may run out of compute resources. But I put the compute tasks in streams, and the number of kernels launched simultaneously is limited, so can that still happen? If so, how can I get the error information so that I can schedule the tasks reasonably?
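A common pattern for surfacing such errors (a hedged sketch, not taken from the actual code; the kernel name `myKernel` and its signature are made up) is to check cudaGetLastError() immediately after each launch, which reports launch-configuration failures, and then synchronize each stream and check the returned status, which reports errors that occur while the kernel runs:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; stands in for the real one, which is not shown.
__global__ void myKernel(int *d_data)
{
    d_data[blockIdx.x * blockDim.x + threadIdx.x] += 1;
}

// Print a CUDA status code and where it came from.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
}

void launchAll(int *d_data, cudaStream_t *stream, int nStreams,
               int blocks, int threads)
{
    for (int s = 0; s < nStreams; s++) {
        myKernel<<<blocks, threads, 0, stream[s]>>>(d_data);
        // Catches invalid launch configurations (too many threads, etc.).
        check(cudaGetLastError(), "kernel launch");
    }
    for (int s = 0; s < nStreams; s++) {
        // Catches errors that happen while the kernel executes.
        check(cudaStreamSynchronize(stream[s]), "kernel execution");
    }
}
```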

None of the calls to cudaMallocHost, memset, or cudaMalloc make sense.

The calls are missing required arguments.

The code I showed is just pseudocode. As I said, some parameters of the functions are omitted.
The actual code can be compiled and run.

If you posted a complete and self-contained code example that compiles unmodified, plus a short statement of what the expected output is versus what you actually received, it would be more likely that someone is willing to look into the details of what is happening here.

I don't see a cudaDeviceSynchronize() after you launch all your kernels. Could it be that you're killing the GPU context before the kernels are done executing? On the other hand, the cudaFree() should perform implicit synchronization.
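As a sketch of that suggestion (the function name and loop bounds here are illustrative, not from the poster's real code), a cudaDeviceSynchronize() after the last launch makes sure the host does not tear down the context, or reuse buffers, before all streams have drained:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

void drainStreams(cudaStream_t *stream, int nStreams)
{
    // Wait for every pending kernel and copy in every stream to finish.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("asynchronous error: %s\n", cudaGetErrorString(err));

    // Only destroy the streams once all work has completed.
    for (int s = 0; s < nStreams; s++)
        cudaStreamDestroy(stream[s]);
}
```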