Copying to Constant Memory with Driver API Not Working

I had working code using the Runtime API, but had to rebuild it using the Driver API.

In the Runtime API version, I used:
len = nc * sizeof (DTYPE);
rc2 = cudaMemcpyToSymbol ( "cuGMean", GMeans, len, 0, cudaMemcpyHostToDevice);
rc5 = cudaMemcpyFromSymbol ( myGMeans, "cuGMean", len, 0, cudaMemcpyDeviceToHost ); // read back to check the copy

This worked fine: the kernel received the values.

Now, in the Driver API version, I am doing this:
len = nc * sizeof (DTYPE);
rc4 = cuModuleGetGlobal ( &pDev, &len2, cuModule, "cuMean");
checkCUDAError ("rc4");
rc2 = cuMemcpyHtoD ( pDev, GMeans, len);
checkCUDAError ("rc2 cuMean");
Both rc4 and rc2 are zero, and checkCUDAError reports no error. BUT the device kernel shows me that the entries for cuMean are all 0.
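
For comparison, here is a minimal sketch of the readback check that would mirror the Runtime API version on the Driver API side. It assumes the name passed to cuModuleGetGlobal is spelled exactly as the __constant__ array is declared in the module, and that the module handle is the one the kernel is launched from. The helper name copyAndVerifyGlobal and its parameters are hypothetical and not part of my code.

```cuda
#include <cuda.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): copy `bytes` bytes from `src` into
   the module-level variable `name`, read them back into `scratch`, and compare.
   Returns 0 when the readback matches, nonzero otherwise. */
static int copyAndVerifyGlobal(CUmodule module, const char *name,
                               const void *src, void *scratch, size_t bytes)
{
    CUdeviceptr devPtr = 0;
    size_t symbolBytes = 0;
    CUresult rc;

    /* Look up the device address and size of the named global. */
    rc = cuModuleGetGlobal(&devPtr, &symbolBytes, module, name);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuModuleGetGlobal(%s) failed: %d\n", name, (int)rc);
        return -1;
    }

    /* Refuse to write past the end of the symbol. */
    if (bytes > symbolBytes) {
        fprintf(stderr, "%s holds %zu bytes, asked to copy %zu\n",
                name, symbolBytes, bytes);
        return -1;
    }

    /* Host -> device copy into the symbol. */
    rc = cuMemcpyHtoD(devPtr, src, bytes);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuMemcpyHtoD(%s) failed: %d\n", name, (int)rc);
        return -1;
    }

    /* Device -> host readback, the counterpart of cudaMemcpyFromSymbol. */
    rc = cuMemcpyDtoH(scratch, devPtr, bytes);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuMemcpyDtoH(%s) failed: %d\n", name, (int)rc);
        return -1;
    }

    return memcmp(src, scratch, bytes) != 0;
}
```

Called as copyAndVerifyGlobal (cuModule, "cuGMean", GMeans, myGMeans, len), this would stand in for the cudaMemcpyToSymbol / cudaMemcpyFromSymbol pair. A matching readback combined with zeros in the kernel would suggest the kernel is reading a different symbol or a different module instance than the one written here.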

Kernel-side code snippet:
__constant__ float cuGMean[HSI_LIVE]; // 60 + 1800 + 3600 = 5460 floats * 4 bytes = ~22 KB
__constant__ float cuSigMu[MAXSIGNATURES
__constant__ float cuCInv [HSI_LIVE*HSI_LIVE];
extern "C" __global__
void kernel_Spectral (V4IMGHANDLE *pIn, V4IMGHANDLE pOut, unsigned int Ntargets,
unsigned int nc, unsigned int nc_out, unsigned int nss, int nsa, int na, float FNormA)
// include special memory declares here
__shared__ DTYPE acc [HSI_LIVE];
__shared__ float saveFmax, saveKey, an_acc [HSI_LIVE

// debug info into the output array:
pOutDeb = (int *)(pOut);
*pOutDeb++ = Ntargets;
*pOutDeb++ = nc;
*pOutDeb++ = __float_as_int (cuGMean[0]);
*pOutDeb++ = __float_as_int (cuGMean[1]);
*pOutDeb++ = __float_as_int (cuGMean[2]);
*pOutDeb++ = __float_as_int (cuGMean[3]);

When I look at the output back on the host, I see that the entries for cuGMean are all 0. In fact, the results of the kernel run are all zero, which is explained by cuGMean being all zeroes.

Any ideas?