CUDA program working only in emulation mode

I have the CUDA driver and toolkit v3.0 installed on my laptop with an NVIDIA G210M. I tried running the following very simple array reversal program:

#include<stdio.h>
#include<cuda.h>
__global__ void rev(float *a,float *b)
{
int idx=blockDim.x*blockIdx.x+threadIdx.x;
b[idx]=a[100-idx-1];

}
void main()
{
int n,i;
float *a,*b,*c;

printf("hi");
a=(float *)malloc(sizeof(float)*100);
c=(float *)malloc(sizeof(float)*100);
for(i=0;i<100;i++)
{
	a[i]=(float)(i+1);
}

int nthreads=4;
int nblocks=100/nthreads;
cudaMalloc((void**)&b,sizeof(float)*100);

rev<<<nblocks,nthreads>>>(a,b);
cudaMemcpy(a,b,sizeof(float)*100,cudaMemcpyDeviceToHost);
for(i=0;i<100;i++)
{
	printf("%f",a[i]);
	printf("\n");
}


free(a);
cudaFree(b);

}

The problem is that when I compile this program normally (nvcc file.cu) and run it, it gives incorrect output (1, 2, 3, … 100), but with the -deviceemu option it runs properly (100, 99, 98, … 1). What is the reason for this?

You need to copy a to the device before calling the kernel…
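
To spell that out: in emulation mode the kernel runs on the CPU, so the host pointer a happens to be readable there. On the real GPU it is not a valid device address, so the kernel most likely fails and the final cudaMemcpy leaves a untouched, which would explain the 1, 2, 3, … 100 output. Below is a minimal sketch of the fix. The names d_a, d_b and reverse_on_gpu, the extra n parameter, the bounds check and the error checks are my own additions for illustration, not part of the original program.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 100

// Each thread writes one element of the reversed array.
__global__ void rev(const float *a, float *b, int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n)                      // guard in case n is not a multiple of the block size
        b[idx] = a[n - idx - 1];
}

// Hypothetical helper: reverses h_in into h_out using the GPU.
// Returns 0 on success, 1 if any CUDA call fails.
int reverse_on_gpu(const float *h_in, float *h_out, int n)
{
    float *d_a = NULL, *d_b = NULL;
    size_t bytes = sizeof(float) * n;
    int status = 1;

    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_b, bytes) != cudaSuccess) { cudaFree(d_a); return 1; }

    // The step missing from the original program: copy the input to the device.
    if (cudaMemcpy(d_a, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        int nthreads = 4;
        int nblocks = (n + nthreads - 1) / nthreads;
        rev<<<nblocks, nthreads>>>(d_a, d_b, n);   // both arguments are device pointers

        // Check the launch, then copy the result back to the host.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_out, d_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            status = 0;
    }

    cudaFree(d_a);
    cudaFree(d_b);
    return status;
}

int main(void)
{
    float h_a[N], h_out[N];
    for (int i = 0; i < N; i++)
        h_a[i] = (float)(i + 1);

    if (reverse_on_gpu(h_a, h_out, N) != 0) {
        fprintf(stderr, "a CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < N; i++)
        printf("%f\n", h_out[i]);
    return 0;
}
```

Checking the return value of each CUDA call, as the sketch does, would have flagged the problem straight away instead of silently printing the old contents of a.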

OMG!!! Can't believe I forgot it! :"> I need to go and dunk my head in a bucket of cold water. It worked. Thanks a lot, tera.