cudaStream problem: cudaMemcpyAsync fails with "invalid argument"

My code is structured like this:

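int *arg1, *var1, *arg2;   // arg1, arg2: device buffers; var1: host buffer
cudaStream_t cstream1;
cudaStreamCreate(&cstream1);
cudaKernel<<<gridsize, blocksize, 0, cstream1>>>(arg1, arg2);
// copy the kernel's result back to the host on the same stream
cudaMemcpyAsync(var1, arg2, size, cudaMemcpyDeviceToHost, cstream1);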

But this fails at run time with the CUDA error "invalid argument".
Adding cudaStreamSynchronize(cstream1) between the kernel launch and the cudaMemcpyAsync() call gives the same error.

However, if I call cudaStreamSynchronize(cstream1) and then copy the memory with the blocking cudaMemcpy() instead, it runs fine.
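To make it easy to see exactly which call reports the error, here is a minimal, self-contained reproducer sketch of the structure above that checks the return code of every CUDA call. The kernel body, the element count N, and the helpers check and run_repro are hypothetical; var1 is assumed to be an ordinary std::malloc'd (pageable) host buffer, because the post does not say how it is allocated.

```cuda
// Reproducer sketch. Assumptions not in the original post: the kernel body,
// N, and that var1 is a pageable host buffer obtained from std::malloc.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for cudaKernel: copies arg1 into arg2 element-wise.
__global__ void cudaKernel(const int *arg1, int *arg2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    arg2[i] = arg1[i];  // N is an exact multiple of blocksize, so no bounds check
}

// Prints the CUDA error string for a failed call; returns 1 on failure, 0 on success.
static int check(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return 0;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return 1;
}

// Runs the launch-then-copy sequence from the post; returns the number of failed calls.
static int run_repro()
{
    const int blocksize = 256;
    const int gridsize = 4;
    const int N = gridsize * blocksize;                  // hypothetical element count
    const size_t size = N * sizeof(int);

    int *arg1 = nullptr, *arg2 = nullptr;                // device buffers
    int *var1 = static_cast<int *>(std::malloc(size));   // pageable host buffer (assumed)
    int failures = 0;

    failures += check(cudaMalloc(&arg1, size), "cudaMalloc(arg1)");
    failures += check(cudaMalloc(&arg2, size), "cudaMalloc(arg2)");
    failures += check(cudaMemset(arg1, 0, size), "cudaMemset(arg1)");

    cudaStream_t cstream1 = nullptr;
    failures += check(cudaStreamCreate(&cstream1), "cudaStreamCreate");

    cudaKernel<<<gridsize, blocksize, 0, cstream1>>>(arg1, arg2);
    failures += check(cudaGetLastError(), "kernel launch");

    // The case that fails in the post: async device-to-host copy on the same stream.
    failures += check(cudaMemcpyAsync(var1, arg2, size, cudaMemcpyDeviceToHost, cstream1),
                      "cudaMemcpyAsync");
    failures += check(cudaStreamSynchronize(cstream1), "cudaStreamSynchronize");

    // The case that works in the post: blocking copy after the stream is synchronized.
    failures += check(cudaMemcpy(var1, arg2, size, cudaMemcpyDeviceToHost), "cudaMemcpy");

    cudaStreamDestroy(cstream1);
    cudaFree(arg1);
    cudaFree(arg2);
    std::free(var1);
    return failures;
}

int main()
{
    int failures = run_repro();
    std::printf("%d call(s) failed\n", failures);
    return failures == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Each failing call prints its own error string, so the output shows whether the "invalid argument" comes from cudaMemcpyAsync() itself or from a later call. Swapping std::malloc/std::free for cudaMallocHost/cudaFreeHost on var1 is one way to test whether the type of host allocation matters.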

Why does cudaMemcpyAsync() fail here?

I’m using a GeForce 9400M.

Please help.