Unspecified launch failure

Hi,
I am having some trouble developing a CUDA app. When I launch the application no error is reported (I check the return values of all CUDA runtime functions and also handle asynchronous errors correctly). Nevertheless, when I run it in a text terminal (Ctrl+Alt+F1), the screen flickers and a lot of blue, red and green pixels appear.

Also, if I run the program within cuda-gdb, I get an error (I insist, this error is not reported without cuda-gdb).
The error is:
Warning: CUDA API error detected: cudaMalloc returned (0x4)
CUDA error call (cudaMalloc(…)): unspecified launch failure.

I understand that unspecified launch failure errors are usually caused by invalid pointers, but I think this is not the case here.
The code that was causing the error was something like:

double* ptr;
    unsigned long int size_in_bytes = 64 * sizeof( double );
    cudaMalloc( (void**) &ptr, size_in_bytes );

I have also tested it with a dummy pointer, allocating only 1 byte:

double* dummy;
    cudaMalloc( (void**) &dummy, 1 );

Running the application under valgrind reports some errors, but I don't know whether they have anything to do with the problems above:
==20107== Conditional jump or move depends on uninitialised value(s)
==20107== at 0x73C075B: __strspn_sse42 (strspn-c.c:126)
==20107== by 0x7BF7829: ??? (in /usr/lib/libcuda.so.304.54)

==20107== Use of uninitialised value of size 8
==20107== at 0x7BF782E: ??? (in /usr/lib/libcuda.so.304.54)

==20107== Use of uninitialised value of size 8
==20107== at 0x4C2A00D: strcmp (mc_replace_strmem.c:538)
==20107== by 0x7C03D98: ??? (in /usr/lib/libcuda.so.304.54)

I don't know why or where these messages come from, because if I run valgrind with --track-origins=yes no error message appears.

I would be really grateful to anyone who can help with this issue.
F41thful.

Run your code under cuda-memcheck to see if dereferencing invalid pointers is an issue.

I had forgotten that. No errors with cuda-memcheck:
========= CUDA-MEMCHECK
Running 1 iterations of poisson solver (gpu_fft)…
BEGIN 0
size_in_elems: 64
Free memory: 880 MB
END 0
Elapsed time statistics from 1 values (ms.): 132/132/132 +/- 0 (min/avg/max +/- std_dev).
========= ERROR SUMMARY: 0 errors

Ok. Since the unspecified launch failure obviously does not originate from the cudaMalloc() but from a previous kernel launch, insert a cudaDeviceSynchronize() call after each kernel launch and check the return code to see which of the kernels is causing the problem.
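
A minimal sketch of that check, assuming a hypothetical kernel named dummy_kernel and a hypothetical CUDA_CHECK macro (neither appears in this thread; substitute your own kernels):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): abort with file/line on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s failed: %s\n",                 \
                    __FILE__, __LINE__, #call,                        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the application's real kernels.
__global__ void dummy_kernel(double* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0 * i;
}

int main(void)
{
    const int n = 64;
    double* d_data = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_data, n * sizeof(double)));

    dummy_kernel<<<1, n>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // catches invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    printf("all CUDA calls succeeded\n");
    return 0;
}
```

Because kernel launches are asynchronous, a failure inside a kernel is otherwise reported by whichever runtime call happens to come next, which is how a cudaMalloc() can appear to be the culprit.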

Hi Tera,
There are no kernel launches. Below is the output of nvprof with the --print-api-trace option. The error occurs in the first cudaMalloc and only when the program is run under cuda-gdb. In a normal run, checking return values, no error is reported.

======== Profiling result:
Start Duration Name
2.33ms 2.00us cuDeviceGetCount
2.36ms 1.00us cuDeviceGet
2.36ms 27.00us cuDeviceGetName
2.39ms 36.00us cuDeviceTotalMem
2.43ms 1.00us cuDeviceGetAttribute
2.43ms 1.00us cuDeviceGetAttribute
2.43ms 0ns cuDeviceGetAttribute
2.43ms 0ns cuDeviceGetAttribute
2.43ms 0ns cuDeviceGetAttribute
2.43ms 24.00us cuDeviceGetAttribute
2.46ms 0ns cuDeviceGetAttribute
2.46ms 0ns cuDeviceGetAttribute
2.46ms 1.00us cuDeviceGetAttribute
2.46ms 1.00us cuDeviceGetAttribute
2.46ms 1.00us cuDeviceGetAttribute
2.46ms 0ns cuDeviceGetAttribute
2.46ms 0ns cuDeviceGetAttribute
2.46ms 0ns cuDeviceGetAttribute
2.46ms 1.00us cuDeviceGetAttribute
2.46ms 1.00us cuDeviceGetAttribute
2.46ms 0ns cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 1.00us cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 1.00us cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 0ns cuDeviceGetAttribute
2.47ms 1.00us cuDeviceGetAttribute
2.47ms 1.00us cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 1.00us cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 1.00us cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 1.00us cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 1.00us cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 0ns cuDeviceGetAttribute
2.48ms 1.00us cuDeviceGetAttribute
2.49ms 1.00us cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.49ms 1.00us cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.49ms 1.00us cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.49ms 109.00us cuDeviceGetAttribute
2.49ms 0ns cuDeviceGetAttribute
2.60ms 1.00us cuDeviceGetAttribute
2.60ms 0ns cuDeviceGetAttribute
2.60ms 1.00us cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.61ms 1.00us cuDeviceGetAttribute
2.61ms 1.00us cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.61ms 1.00us cuDeviceGetAttribute
2.61ms 1.00us cuDeviceGetAttribute
2.61ms 0ns cuDeviceGetAttribute
2.62ms 0ns cuDeviceGetAttribute
2.62ms 0ns cuDeviceGetAttribute
2.62ms 1.00us cuDeviceGetAttribute
2.62ms 108.00us cuDeviceGetAttribute
2.73ms 0ns cuDeviceGetAttribute
2.74ms 3.00us cudaGetDeviceCount
2.75ms 27.00us cudaGetDeviceProperties
2.79ms 1.00us cudaGetDeviceCount
2.82ms 1.00us cudaGetDeviceCount
2.83ms 30.00us cudaGetDeviceProperties
2.86ms 26.00us cudaGetDeviceProperties
3.17ms 10.00us cudaSetDevice
3.21ms 96.60ms cudaMalloc
99.84ms 0ns cuDeviceGet
99.84ms 56.00us cuMemGetInfo
99.91ms 111.00us cudaMalloc


I have put this piece of code at the very start of my program, as the first lines in main:

double* ptr;
cudaMalloc( (void**) &ptr, 1000 );
cudaFree( ptr );

Both calls trigger a warning about a CUDA API error, returning 0x4, just as above. But now execution continues. This only happens under cuda-gdb.
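
To compare a normal run with a cuda-gdb run directly, the same snippet can print the numeric return code and message of each call rather than relying on the debugger's warning. This is only a sketch; the report helper is hypothetical and not part of my program:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): print the numeric code and text of a result.
// Returns 1 when the call succeeded, 0 otherwise.
static int report(const char* what, cudaError_t err)
{
    printf("%-12s -> %d (%s)\n", what, (int)err, cudaGetErrorString(err));
    return err == cudaSuccess;
}

int main(void)
{
    double* ptr = NULL;
    int ok = 1;
    ok &= report("cudaMalloc", cudaMalloc((void**)&ptr, 1000));
    ok &= report("cudaFree", cudaFree(ptr));
    // cudaGetLastError() returns the last error recorded by the runtime and resets it.
    ok &= report("last error", cudaGetLastError());
    return ok ? 0 : 1;
}
```

If the codes printed by the program are 0 in both runs, the warning shown by cuda-gdb is not something the application itself receives from these calls.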

I have tried to build a minimal example so that the error can be isolated. I am using a GeForce GTX 285 on Ubuntu 11.10 with CUDA Toolkit 5 and the latest drivers.

I use the following compilation command:
nvcc -G -g -DDEBUG -DVERBOSE -gencode arch="compute_13,code=sm_13" prof.c fft.cu -o prof -lm -lcufft

The files prof.c and fft.cu are attached to the post, and they are also appended at the end of this message. prof.c doesn't use anything from fft.cu but, just by linking them together, when prof.c calls cudaMalloc and cudaFree a warning appears in cuda-gdb saying that a CUDA API error has been detected. In fact, if I comment out the two branches in fft.cu, no error is reported. I repeat, those are the only files that are linked, apart from header files (no .c file is included via an #include directive).

Files
***************************+
prof.c
===================
#include <stdio.h>
#include <unistd.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>

#include </usr/local/cuda/include/cuda_runtime.h>
int main( int argc, char* argv[] )
{
    printf( "Starting prof.c\n" );
    printf( "===============\n" );
    double* ptr;
    cudaMalloc( (void**) &ptr, 1000 );
    cudaFree( ptr );

    printf( "Ending prof.c\n" );
    printf( "=============\n" );
    return 0;
}

====================================================

fft.cu
===========================
#include <stdio.h>

#include <math.h>
#include <cufft.h>
#include "globals.h"
#include "fft.h"
#include "fft_solver.h"

/********************************************
 * Data definitions                         *
 ********************************************/
struct gpu_data_t
{
    int idevice;
    cufftHandle plan;
    int rank;
    int * points_each_dim;
    int num_elements_fft;
    int istride;
    int ostride;
    int idist;
    int odist;
    int batch;
    cufftType type;
    void * input_data;
    cufftDoubleComplex* output_data;

    int total_size_in_elements;
};

cufftResult exec_fft( struct gpu_data_t* gpu_config )
{
    cufftResult result;
    // It seems that the next two branches have something to do with the CUDA API error.
    if( gpu_config->type == CUFFT_D2Z )
    {
        // result = cufftExecD2Z( gpu_config->plan,
        //                        (cufftDoubleReal*) gpu_config->input_data,
        //                        (cufftDoubleComplex*) gpu_config->output_data );
    }
    else if( gpu_config->type == CUFFT_Z2Z )
    {
        result = cufftExecZ2Z( gpu_config->plan,
                               (cufftDoubleComplex*) gpu_config->input_data,
                               (cufftDoubleComplex*) gpu_config->output_data,
                               CUFFT_INVERSE );
    }
    else
    {
        result = CUFFT_INVALID_PLAN;
    }

    return result;
}

fft.cu (31.6 KB)
prof.cu (835 Bytes)