Unspecified launch failure on kernel launch

I’m trying to make a CUDA test sample to debug a module I’m developing for my solution. In order to narrow down the scope of the problem, I’ve made it as simple as it gets.

#include <iostream>
#include <cuda_runtime.h>

using namespace std;

__global__ void fillGPU(unsigned char* input)
{
    for(int i = 0; i < 256; i++)
    {
        input[i] = [i];
    }
}

int main()
{
    unsigned char* d_input;
    unsigned char input[256];

    cudaMalloc(&d_input, 256 * sizeof(unsigned char));
    cudaMemcpy(d_input, &input[0], 256 * sizeof(unsigned char), cudaMemcpyHostToDevice);

    fillGPU<<<1, 256>>>(input);
    cout << cudaGetErrorString(cudaGetLastError()) << endl; // returns "no error"
    cudaDeviceSynchronize();
    cout << cudaGetErrorString(cudaGetLastError()) << endl; // returns "unspecified launch failure"

    cudaMemcpy(&input[0], d_input, 256 * sizeof(unsigned char), cudaMemcpyDeviceToHost);
    cout << int(input[255]) << endl; // expected 255, get 0
    cudaFree(d_input);
    return 0;
}

I’m running an NVIDIA Jetson Nano on JetPack 4.4, CUDA version 10.2.
nvcc sample_cuda.cu -o sample_cuda compiles the code with no problem.
Is there a problem with the code, am I missing compiler flags, or is there something that must be done to an out-of-the-box JetPack 4.4 to get CUDA to properly launch kernels?

You are a bit off on two things. Firstly, I’m surprised that this expression compiles:

        input[i] = [i];

I think what you meant to put here was just:

        input[i] = i;

Secondly, you may want to note that you are currently having each of your 256 threads fill all 256 slots of your “input” array. I assume you wanted a one-to-one correspondence between threads and slots in “input”. If that’s what you intended (it should also be much faster), your kernel should look like this instead:


__global__ void fillGPU(unsigned char* input)
{
  input[threadIdx.x] = static_cast<unsigned char>(threadIdx.x);
}
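
As a side note, this works because your launch uses exactly 256 threads. If the array size ever differs from the thread count, the usual pattern is to pass the size in and guard the index; a rough sketch (the n parameter is just an illustrative name):

__global__ void fillGPU(unsigned char* input, int n)
{
    // Global thread index across all blocks.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Only write if the index falls inside the array.
    if (i < n)
    {
        input[i] = static_cast<unsigned char>(i);
    }
}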

Firstly, I’m surprised that this expression compiles

Indeed, I made a typo here, as I didn’t copy-paste the code into the post; I can’t do that from the Nano to the machine I typed the post on.

If that’s what you intended (it should be much faster), your kernel should look like this instead

Thank you, that was indeed what I intended, but even after I changed the kernel to what you suggested, the output is still 0 and the error checks still return “unspecified launch failure”.

Ah. This is because you’re launching your kernel on input. That variable is a pointer to host memory. You’re passing it to a kernel, where dereferencing it lands in some nonsensical place in GPU memory. You should instead pass d_input to your kernel.
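
For reference, putting both fixes together, a minimal corrected version of your test would look roughly like this (the one-thread-per-slot kernel from above, launched on d_input; I’ve dropped the host-to-device copy since the kernel overwrites the buffer anyway):

#include <iostream>
#include <cuda_runtime.h>

using namespace std;

__global__ void fillGPU(unsigned char* input)
{
    input[threadIdx.x] = static_cast<unsigned char>(threadIdx.x);
}

int main()
{
    unsigned char* d_input;
    unsigned char input[256];

    cudaMalloc(&d_input, 256 * sizeof(unsigned char));

    // Launch on the device pointer, not the host array.
    fillGPU<<<1, 256>>>(d_input);
    cout << cudaGetErrorString(cudaGetLastError()) << endl; // expect "no error"
    cudaDeviceSynchronize();
    cout << cudaGetErrorString(cudaGetLastError()) << endl; // expect "no error"

    // Copy the result back to the host and check the last element.
    cudaMemcpy(&input[0], d_input, 256 * sizeof(unsigned char), cudaMemcpyDeviceToHost);
    cout << int(input[255]) << endl; // expect 255

    cudaFree(d_input);
    return 0;
}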

Silly me, that was indeed the problem, thank you. In retrospect, this wasn’t worth creating a thread here, but at least if someone else runs into this issue, they won’t have to.
