Problem copying memory to a symbol on Mac OS X

I tried to copy a structure to constant memory using cudaMemcpyToSymbolAsync on my Mac laptop, which has a GTX 650M (CC 3.0). I compiled with -gencode arch=compute_20,code=sm_20, and every time I run the program it prints this message: CUDA Runtime API error 13: invalid device symbol. (I'm using checkCudaErrors from the SDK.) The copy is done with the following code:

    // ndevs is the number of devices available;
    // cmem_host is an array of structures residing in host memory
    for (int dev = 0; dev < ndevs; ++dev) {
        checkCudaErrors( cudaSetDevice(dev) );

        checkCudaErrors(
            cudaMemcpyToSymbolAsync(cmem, &cmem_host[dev], sizeof(ConstMem), 0,
                                    cudaMemcpyHostToDevice)
        );
    }

But if I run it on my workstation, which has a GTX690 (CC 2.0) and runs Ubuntu 10.04 LTS, it runs well without any error message.
Is there any way to fix this? Is it because of CUDA on Mac OS X?

Thank you.

Your GPU has compute capability 3.0 (sm_30) but you built the code for compute capability 2.0 (sm_20). Note that -gencode arch=compute_20,code=sm_20 embeds only SASS (machine code) for sm_20. PTX for JIT compilation on parts with higher compute capability is embedded only when the code= list also names a virtual architecture such as compute_20. Without embedded PTX the executable contains nothing that can run on sm_30, which would be consistent with the error you are seeing. In any case, it would be useful to eliminate JIT as a factor for the time being.

Do things work any better on the Mac laptop if you build the code there for sm_30, i.e. when you build with

-gencode arch=compute_30,code=sm_30
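
To narrow this down, a small standalone test along the following lines may help. It is only a sketch under stated assumptions: the ConstMem fields, the CHECK macro (a stand-in for the SDK's checkCudaErrors), and the file name cmem_check.cu are hypothetical and not taken from your code. It prints the compute capability of each device and then performs the same kind of copy to a __constant__ symbol. The build line in the comment assumes a toolkit old enough to still support sm_30.

```cuda
/* Build sketch: embed SASS for sm_30 plus PTX as a JIT fallback.
 *   nvcc -gencode arch=compute_30,code=sm_30
 *        -gencode arch=compute_30,code=compute_30
 *        -o cmem_check cmem_check.cu
 * (all on one command line; cmem_check.cu is a hypothetical file name)
 */
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the ConstMem structure in the question.
struct ConstMem {
    int   n;
    float scale;
};

// Constant-memory symbol; each device gets its own copy.
__constant__ ConstMem cmem;

// Minimal stand-in for the SDK's checkCudaErrors helper.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main() {
    int ndevs = 0;
    CHECK(cudaGetDeviceCount(&ndevs));

    for (int dev = 0; dev < ndevs; ++dev) {
        // Report what the runtime thinks this device is.
        cudaDeviceProp prop;
        CHECK(cudaGetDeviceProperties(&prop, dev));
        printf("device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);

        // Same pattern as in the question: select the device, then copy.
        CHECK(cudaSetDevice(dev));
        ConstMem host = { dev, 1.0f };
        CHECK(cudaMemcpyToSymbolAsync(cmem, &host, sizeof(ConstMem), 0,
                                      cudaMemcpyHostToDevice));
        // Wait for the copy so any error surfaces here.
        CHECK(cudaDeviceSynchronize());
    }
    return 0;
}
```

If this passes when built for sm_30 but fails with "invalid device symbol" when built with only code=sm_20, that would point to the build flags rather than to CUDA on Mac OS X.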