Heavy use of macros causes warning: Cuda API error detected: cudaLaunch returned (0x2)

I have encountered a new problem.
When I add and use the following macro in the code:

_I_DEBUG(2,"%s",stringParam.c_str());

it causes the error "cudaLaunch returned (0x2)" when running the kernel.
The definition of the macro:

#define _I_DEBUG(y, ...) if (y <= _I_DLEVEL) {printf("I: "); printf(__VA_ARGS__);}

When I comment out that line, the kernel builds and launches fine, and uses 39 registers.

Any idea what could cause this?

There is way too little information here to help diagnose the problem. When seeking help with debugging run-time failures, it is highly advisable to post buildable and runnable, self-contained code that reproduces the problem, so others can experiment with the code. The smaller the code the better. Also you would want to mention how the code is compiled (exact nvcc command line) and on what GPU and OS platform you are running the code.

In any case, make sure that your code checks the status return of every API call, and every kernel launch, otherwise it can easily happen that the source of the problem is far away from the point of failure, and much harder to find.
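As a rough illustration of what checking every API call and kernel launch can look like, here is a minimal sketch. The names CUDA_CHECK, dummyKernel, and runCheckedLaunch are hypothetical and not taken from the poster's code; only the CUDA runtime calls themselves (cudaMalloc, cudaGetLastError, cudaDeviceSynchronize, cudaFree, cudaGetErrorString) are real API functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro (an assumption, not from the thread):
// prints the error code, message, and location, then returns 1 from
// the enclosing function. Wrapped in do { } while (0) so it behaves
// like a single statement inside if/else.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error %d (%s) at %s:%d\n",         \
                    (int)err_, cudaGetErrorString(err_),             \
                    __FILE__, __LINE__);                             \
            return 1;                                                \
        }                                                            \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void dummyKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Returns 0 if every call succeeded, 1 at the first failure.
// Note: this sketch does not free d_out on the early-return path.
int runCheckedLaunch()
{
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, 32 * sizeof(int)));

    dummyKernel<<<1, 32>>>(d_out);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Calling runCheckedLaunch() from main and inspecting its return value shows the first call that fails, rather than a later call that merely inherits the error.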

I know it would be best to post code, but that is not possible.
I tried to reproduce it in a small test case, but the error did not occur.

  • the nvcc command line is the default; I made no changes except for a few include paths
  • 3.16.0-31-generic #43~14.04.1-Ubuntu SMP Tue Mar 10 20:13:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
  • i3
  • 2x GTX 980 Ti

I have done that; there are no previous errors.

The CUDA output file is a bit large: 16.6 MB.