Launch failure -- due to call by reference?

Hello. I am implementing some spline algorithms using CUDA and have run into a rather annoying problem. The offending function is appended at the bottom. The symptom is the dreaded “unspecified launch failure”, even with block sizes as small as 1x1. Also, this application is running at 60 FPS, so it is not the 5-second timeout :-)

Now on to discussion of the problem. As can be seen from the function signature, I am passing an int k by reference. However, I reset k on every loop iteration, as I am only interested in its final value outside this function. In other words, it is the equivalent of a GLSL ‘out’ variable. The offending line, by the way, is the x = t[k+1] statement inside an if-test.
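In C, which has no references, an ‘out’ parameter like k is written as a pointer instead. A minimal sketch of just the argmin part of the loop, written that way, is below. The name `minAbsWithIndex` is hypothetical (not from the post), and the `#ifndef __CUDACC__` guard is only there so the same function also builds as plain C++ for a host-side check; under nvcc it is an ordinary `__host__ __device__` function.

```cpp
#include <math.h>

// Let the file build as plain C++ too (host-side testing); nvcc defines
// __CUDACC__ and supplies the real qualifiers.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper (not from the post): returns the smallest |c[i]| for
// i in [0, n) and writes its index to *k, like a GLSL 'out' variable.
// Assumes n >= 1.
__host__ __device__ float minAbsWithIndex(const float* c, int n, int* k)
{
    *k = 0;
    float minval = fabsf(c[0]);
    for (int i = 1; i < n; ++i) {
        float a = fabsf(c[i]);
        if (a < minval) {
            minval = a;
            *k = i;  // remember where the minimum is
        }
    }
    return minval;
}
```

Starting the loop at i = 1 is equivalent to the original's i = 0, since minval is already initialized from c[0].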

If, however, I stop passing k by reference (by removing the ‘&’), the function happily runs along and produces the correct x value. Of course, I then do not get the ‘k’ value back.

Can anyone shed some light on what is going on here?



float findMinAbsVal( float* t, float* c, int& n,
                     const float eps, const int maxn, float cutoff, const unsigned int d, int& k)
{
    float x = NO_ROOT;

    for (; n<maxn; ++n ) {
        k = 0;
        float minval = fabs(c[0]);
        for ( int i=0; i<n; ++i) {
            if (fabs(c[i])<minval){
                minval = fabs(c[i]);
                k = i;    // k = index of the smallest |c[i]|
            }
        }

        if (minval>cutoff) {
            x = NO_ROOT;
            break;
        }

        float diff = t[k+d]-t[k+1];
        if (diff<eps) {
            x = t[k+1];   // offending line: launch failure when k is passed by reference
            break;
        }

        float dck  = delta(t,c,k,d);
        float dckp = delta(t,c,k+1,d);
        float beta = dckp/(dckp-dck);
        x = (knotSum(d-1,t,k+1) - beta*diff)/(d-1.0);

        float e = max(x,t[k+d-1])-min(x,t[k+2]);
        if (e<eps) {
            break;
        }

        // otherwise, insert x
    }

    return x;
}


CUDA is a C API. There are no references in C.


Good point - but replacing the references with pointers yields exactly the same problem.

Also, the compiler accepts it, so it seems like CUDA goes a little beyond “standard” C.

What platform are you on? Run nvcc -v to see which actual compilers get called. If you are using VS, the compiler will not complain about references in C code unless you restrict it. The reference, however, will generate a function signature that holds rubbish data for the GPU ALU. You won’t notice that in the emulator, because the CPU can handle it.

Pass your k and n parameters by value. Do you need their values returned at all? They seem to be just local counters/maxidx values.
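One way to take this advice and still get results back is to return them by value, so the device function never writes through a pointer or reference into the caller's memory. A sketch, assuming the caller only needs the minimum and its index; `MinAbsResult` and `findMinAbs` are hypothetical names, not from the thread, and the `__CUDACC__` guard just lets the code also build as plain C++:

```cpp
#include <math.h>

#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical result type: returned by value, so no pointer or reference
// to the caller's (possibly __shared__) variables is ever passed in.
struct MinAbsResult {
    float minval;  // smallest |c[i]|
    int   k;       // index of that element
};

// Assumes n >= 1.
__host__ __device__ MinAbsResult findMinAbs(const float* c, int n)
{
    MinAbsResult r;
    r.minval = fabsf(c[0]);
    r.k = 0;
    for (int i = 1; i < n; ++i) {
        float a = fabsf(c[i]);
        if (a < r.minval) {
            r.minval = a;
            r.k = i;
        }
    }
    return r;
}
```

The calling kernel then copies r.k to wherever it is needed (a register or a `__shared__` slot) itself, with an ordinary assignment.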


I am using Linux. And yes, I do need the n and k values. n holds the size of my spline-knot vector and k holds the position in the knot vector of the smallest value. These are very important at a later stage of the algorithm for detecting singularities.

I have also rewritten the function to take int*, and exactly the same problem appears. I can pass the value of n in and out just fine; doing the same for k triggers the launch failure.

What is the source for n and k in the calling routine? There is a known bug (fixed in the next release) when you pass pointers to shared variables. Is k a pointer to shared memory?
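The calling kernel is not shown in the thread, so the following is only a sketch under assumptions: if k ultimately belongs in shared memory, one way to avoid handing a device function a pointer into shared memory is to pass the address of a per-thread local instead and copy the result into the `__shared__` variable by name afterwards. `argminAbs`, `exampleKernel`, `s_k` and `out` are hypothetical names; the kernel part is compiled only under nvcc, while the helper also builds as plain C++.

```cpp
#include <math.h>

#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical stand-in for findMinAbsVal's out-parameter: writes the index
// of the smallest |c[i]| to *k. Assumes n >= 1.
__host__ __device__ void argminAbs(const float* c, int n, int* k)
{
    int best = 0;
    float minval = fabsf(c[0]);
    for (int i = 1; i < n; ++i) {
        float a = fabsf(c[i]);
        if (a < minval) {
            minval = a;
            best = i;
        }
    }
    *k = best;
}

#ifdef __CUDACC__
// Kernel side (nvcc only): the function only ever sees a pointer to a
// thread-private local; shared memory is written directly, by name.
__global__ void exampleKernel(const float* c, int n, int* out)
{
    __shared__ int s_k;
    int k_local;                          // private to this thread
    argminAbs(c, n, &k_local);            // pointer to local storage only
    if (threadIdx.x == 0) s_k = k_local;  // plain store into shared memory
    __syncthreads();                      // make s_k visible to the block
    if (threadIdx.x == 0) out[blockIdx.x] = s_k;
}
#endif
```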


k is indeed used as an index into a shared variable. I have managed to “refactor” the code enough to not trigger the bug, at the expense of a little more calculation at a later stage. Let us hope it is a variant of the above bug that has been biting me.

Thanks for all your input!