GTX 650 shared memory problem (compute shader)

Hello!

I’m seeing strange behavior with shared variables. I’m writing a compute shader that tessellates NURBS surfaces, but some of the shared variables don’t get written:

shared vec4 CoeffsU[52];
shared vec4 CoeffsV[52];
shared float Blossom[64];

// ...

// gl_LocalInvocationID is a uvec3, so the flattened index is compared instead
if( gl_LocalInvocationIndex == 0u )
    FillCoeffTable(CoeffsU, degreeU, numControlPointsU, numknotsU, knotsU);

if( gl_LocalInvocationIndex == 1u )
    FillCoeffTable(CoeffsV, degreeV, numControlPointsV, numknotsV, knotsV);

barrier();

// ...

// NURBS calculation, CoeffsV is garbage
for( ... )
{
    for( ... )
    {
        index1 = spanU * MAX_ORDER + k;
        index2 = spanV * MAX_ORDER + l;

        nom = dot(CoeffsU[index1], polyU) * dot(CoeffsV[index2], polyV);
        nom = nom * weightsU[cpU] * weightsV[cpV] * denom;

        pos += ...;
    }
}

If I use CoeffsU in place of CoeffsV, then the result is “ok”.
If I swap the declaration order of CoeffsU and CoeffsV, then CoeffsU becomes garbage and CoeffsV becomes valid (which means the input parameters are correct).

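What might cause this? As I understand it, at least 32 KB of shared memory should be available, and I’m using just under 2 KB (2 × 52 × 16 bytes for the two vec4 arrays plus 64 × 4 bytes for the float array, 1920 bytes in total).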

P.S. I can post compilable code later if needed.

Tried the following:

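// gl_LocalInvocationID is a uvec3, so the flattened index is compared instead
if( gl_LocalInvocationIndex == 0u )
    FillCoeffTable(CoeffsU, degreeU, numControlPointsU, numknotsU, knotsU);

barrier();

// Copy the table that is known to be valid into the other one
if( gl_LocalInvocationIndex == 0u )
{
    for( int i = 0; i < 52; ++i )
        CoeffsV[i] = CoeffsU[i];
}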

barrier();

No effect.
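For reference, here is a minimal sketch of the synchronization pattern that is often recommended for shared variables: a memoryBarrierShared() call paired with barrier(). I'm not certain the extra memory barrier is strictly required, and it may well not change anything on this driver, but it is cheap to rule out. The Result buffer, the local size of 64 and the placeholder fill values are assumptions made only for this example and are not part of my shader.

```glsl
#version 430
layout(local_size_x = 64) in;

shared vec4 CoeffsU[52];
shared vec4 CoeffsV[52];

// Hypothetical output buffer, only here so the shader has an observable result.
layout(std430, binding = 0) buffer Result { vec4 result[]; };

void main()
{
    // Placeholder fills standing in for the real coefficient calculation.
    if( gl_LocalInvocationIndex == 0u )
    {
        for( int i = 0; i < 52; ++i )
            CoeffsU[i] = vec4(float(i));
    }

    if( gl_LocalInvocationIndex == 1u )
    {
        for( int i = 0; i < 52; ++i )
            CoeffsV[i] = vec4(float(i) * 2.0);
    }

    // Make the shared writes visible, then wait for the whole work group.
    // Both calls sit in uniform control flow, outside the if blocks.
    memoryBarrierShared();
    barrier();

    // Every invocation reads from both tables after the barrier.
    uint k = gl_LocalInvocationIndex % 52u;
    result[gl_GlobalInvocationID.x] = CoeffsU[k] + CoeffsV[k];
}
```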

Now another interesting bug (assuming I have removed CoeffsV and am using CoeffsU only):

shared vec4 CoeffsU[52];
//shared vec4 CoeffsV[52];    // removed, using CoeffsU instead
shared float Blossom[64];

As I mentioned, this case is “ok”. Now swap the declaration order of CoeffsU and Blossom → CoeffsU becomes garbage.

Finally I was able to make it work… I found out the following things (1):

void FillCoeffTable(vec4 coeffs[52], int degree, int numcvs, int numknots, float knots[MAX_KNOTS])
{
    // ...
}

FUNCTION CALLS DON’T WORK! I expanded the function body inline at the call site, and the swap bug mentioned above disappeared.
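One thing worth checking here is the parameter qualifier. In GLSL a parameter without a qualifier is "in", so the array is copied into the function and writes to it only change the local copy. The sketch below contrasts that with an "inout" parameter, whose contents are copied back to the caller on return. Whether this fully explains what I saw is an assumption, since CoeffsU did appear to get filled in some cases. FillByValue, FillInOut, TABLE_SIZE, the Result buffer and the fill values are hypothetical names used only for illustration.

```glsl
#version 430
layout(local_size_x = 64) in;

#define TABLE_SIZE 52

shared vec4 CoeffsU[TABLE_SIZE];

// Hypothetical output buffer, only here so the shader has an observable result.
layout(std430, binding = 0) buffer Result { vec4 result[]; };

// No qualifier means 'in': the array is copied into the function,
// so these writes modify the local copy and never reach the caller's array.
void FillByValue(vec4 coeffs[TABLE_SIZE])
{
    for( int i = 0; i < TABLE_SIZE; ++i )
        coeffs[i] = vec4(1.0);
}

// 'inout' copies the array in at the call and back out on return,
// so the caller's array receives the written values.
void FillInOut(inout vec4 coeffs[TABLE_SIZE])
{
    for( int i = 0; i < TABLE_SIZE; ++i )
        coeffs[i] = vec4(1.0);
}

void main()
{
    if( gl_LocalInvocationIndex == 0u )
        FillInOut(CoeffsU);    // FillByValue(CoeffsU) would leave CoeffsU unwritten

    memoryBarrierShared();
    barrier();

    result[gl_GlobalInvocationID.x] = CoeffsU[gl_LocalInvocationIndex % uint(TABLE_SIZE)];
}
```

Note that inout still copies the whole array in both directions, so writing to the shared array directly (as in the inlined version) avoids that copy.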

Now I hoped that this would fix the original bug too (with CoeffsV being garbage), but no. This is also an error (2):

// gl_LocalInvocationID is a uvec3, so the flattened index is compared instead
if( gl_LocalInvocationIndex == 0u )
{
    // calculate CoeffsU inline
}

if( gl_LocalInvocationIndex == 1u )
{
    // calculate CoeffsV inline
}

This doesn’t work either! The only version that works for me is:

// WORKING: a single invocation fills both tables
if( gl_LocalInvocationIndex == 0u )
{
    // calculate CoeffsU inline
    // calculate CoeffsV inline
}

This is very, very sad…