I developed a CUDA library that can be called from C, built with VS2008 on 64-bit Windows 7.

I found that I cannot allocate a large block of device memory, such as a 3000*3000 array of int. For instance:

cudaMalloc((void**) &g, sizeof(int)*size);

cudaMemcpy(g, a, sizeof(int)*size, cudaMemcpyHostToDevice);

<<<>>> // kernel launch: sets every element of g to 4

If I use more memory than that, every element of g reads as zero. Only with smaller arrays, such as 2850*2850, do all elements of g end up as 4.

Has anyone run into this kind of problem?

Many thanks!