My code is structured as follows:
int *arg1, *var1, *arg2;
cudaStream_t cstream1;
cudaStreamCreate(&cstream1);
cudaKernel <<< gridsize, blocksize, 0, cstream1 >>>(arg1, arg2);
cudaMemcpyAsync( var1, arg2, size, cstream1);
But this fails at run time with the CUDA error "invalid argument".
If I call cudaStreamSynchronize(cstream1); after the kernel launch, I still get the same error.
But if I call cudaStreamSynchronize(cstream1); and then copy the memory with cudaMemcpy() instead, it runs.
Why?
I’m using a 9400M.
Please help.
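
In case it helps anyone reproduce this, here is a minimal self-contained sketch of the same pattern. The kernel body, the extra n parameter, the buffer size, and the CHECK macro are all hypothetical placeholders, not my real code. The sketch assumes two things about cudaMemcpyAsync that the runtime API documentation appears to state: that it takes a cudaMemcpyKind argument before the stream, and that (at least on older toolkits) the host buffer should be page-locked, e.g. allocated with cudaMallocHost. Whether either of these is the actual cause of the error above is not confirmed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for cudaKernel: writes in[i] + 1 into out[i].
__global__ void cudaKernel(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1;
}

// Hypothetical helper: report the first failing runtime call and exit main.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            printf("%s failed: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main()
{
    const int n = 256;                    // assumed element count
    const size_t size = n * sizeof(int);

    int *arg1, *arg2, *var1;
    CHECK(cudaMalloc((void **)&arg1, size));      // device input
    CHECK(cudaMalloc((void **)&arg2, size));      // device output
    CHECK(cudaMallocHost((void **)&var1, size));  // page-locked host buffer
    CHECK(cudaMemset(arg1, 0, size));

    cudaStream_t cstream1;
    CHECK(cudaStreamCreate(&cstream1));

    // Launch in the stream, then check for launch errors.
    cudaKernel<<<(n + 127) / 128, 128, 0, cstream1>>>(arg1, arg2, n);
    CHECK(cudaGetLastError());

    // Copy direction is passed explicitly, followed by the stream.
    CHECK(cudaMemcpyAsync(var1, arg2, size, cudaMemcpyDeviceToHost, cstream1));

    // var1 is only safe to read once the stream has drained.
    CHECK(cudaStreamSynchronize(cstream1));
    printf("var1[0] = %d\n", var1[0]);

    CHECK(cudaStreamDestroy(cstream1));
    CHECK(cudaFreeHost(var1));
    CHECK(cudaFree(arg2));
    CHECK(cudaFree(arg1));
    return 0;
}
```

Swapping cudaMallocHost for plain malloc, or dropping the cudaMemcpyDeviceToHost argument, should make it easy to see which of the two assumptions (if either) triggers the "invalid argument" error.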