Async memory problems

Vista 64, SDK 2.0 Beta 2 (32-bit).

I am trying to free up the CPU for computation, but when I call cudaMemcpyAsync it returns the error “invalid argument”. I tried comparing my code with the asyncAPI sample, but could not find the difference :-S

Here is how I allocate page-locked memory:

cudaMallocHost((void**)&gpu[i].common_h, sizeof(md5_data));

I’ve double-checked that the pointer is not changed anywhere after allocation.

Any ideas what may cause the problem?

Also, the asyncAPI sample throws a lot of first-chance exceptions. Is that normal?

    cudaEvent_t stop;
    CUDA_SAFE_CALL( cudaEventCreate(&stop) );
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    CUDA_SAFE_CALL( cudaThreadSynchronize() );
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    printf("1: %f\n", getTimeDelta(tmp));

    // Synchronous host-to-device copy of the input keys
    cudaMemcpy(data_d, gpu[device_id].data_h, sizeof(int)*4*thread_n*grid_n*gpu[device_id].keys_per_thread, cudaMemcpyHostToDevice);
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    printf("2: %f\n", getTimeDelta(tmp));

    md5_gpu_bruteforce_thread<<<grid, threads>>>(data_d, common_d, perm::pwd_len, perm::gpu_len, perm::charset_len, gpu[device_id].keys_per_thread);
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    printf("3: %f\n", getTimeDelta(tmp));

    // Asynchronous device-to-host copy of the result; this is the call that reports "invalid argument"
    cudaMemcpyAsync(gpu[device_id].common_h, common_d, sizeof(md5_data), cudaMemcpyDeviceToHost, NULL);
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    while( cudaEventQuery(stop) == cudaErrorNotReady )
    {
        // CPU-side work would go here while the GPU finishes
    }

    CUT_CHECK_ERROR(CUDA_SAFE_CALL( cudaEventDestroy(stop) ));

This is still an open problem :-S
I tried recompiling for compute capability 1.1, but it didn’t help :-S

I still haven’t found a solution :-S

Are you using host-pinned memory (allocated via cudaMallocHost)? The Async calls require that CPU-side memory is allocated this way.
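For reference, here is a minimal sketch of the pattern being described: a pinned host buffer, an asynchronous device-to-host copy, and an event that the CPU polls. The `Result` struct and the `CHECK` macro are hypothetical stand-ins (the thread's `md5_data` and `CUDA_SAFE_CALL` are not shown in full), and the event is recorded explicitly with cudaEventRecord, which is an assumption about how the polling loop is meant to be used rather than something taken from the posted code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical payload type standing in for the thread's md5_data.
struct Result { unsigned int words[4]; };

// Hypothetical error-check helper (not the SDK's CUDA_SAFE_CALL).
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s failed: %s\n", #call,                \
                    cudaGetErrorString(err_));                       \
            return 1;                                                \
        }                                                            \
    } while (0)

int main() {
    Result *host = NULL;   // page-locked (pinned) host buffer
    Result *dev  = NULL;   // device buffer
    cudaEvent_t stop;

    // Pinned host memory: this is what the async copy expects on the CPU side.
    CHECK(cudaMallocHost((void**)&host, sizeof(Result)));
    CHECK(cudaMalloc((void**)&dev, sizeof(Result)));
    CHECK(cudaMemset(dev, 0, sizeof(Result)));
    CHECK(cudaEventCreate(&stop));

    // Queue the copy on the default stream, then record an event behind it.
    CHECK(cudaMemcpyAsync(host, dev, sizeof(Result),
                          cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));

    // The CPU is free to do other work until the event completes.
    unsigned long spins = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady) {
        ++spins;  // placeholder for useful CPU-side computation
    }
    printf("copy finished after %lu polling iterations\n", spins);

    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(host));
    return 0;
}
```

Note that the host pointer must come from cudaMallocHost (or cudaHostAlloc); a pointer from malloc or cudaMalloc is not pinned host memory.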

Yes, I was using cudaMallocHost((void**)&gpu[i].common_h, sizeof(md5_data));

The problem was that the allocation was in a .cpp file. It turned out that, in my setup, cudaMallocHost did not work when called from a file that was not compiled by nvcc. When I moved the call into a .cu file, everything finally worked :-)

BTW, you can also get this type of error if you access pinned memory from a different CUDA context than the one that performed the allocation. To work around this, you must use portable pinned memory:

cudaHostAlloc(&result, bytes, cudaHostAllocPortable)

Overhead from using portable pinned memory is fairly small versus regular pinned memory.
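A slightly fuller sketch of that call, with error checking and the matching free, might look like this. The buffer size is a hypothetical value chosen for illustration, and `result` is simply the name used in the line above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;  // hypothetical buffer size (1 MiB)
    void *result = NULL;

    // Portable pinned memory is treated as pinned by all CUDA contexts,
    // not only the one that performed the allocation.
    cudaError_t err = cudaHostAlloc(&result, bytes, cudaHostAllocPortable);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... use result as the host side of cudaMemcpyAsync from any context ...

    // Memory from cudaHostAlloc is released with cudaFreeHost.
    err = cudaFreeHost(result);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFreeHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```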


I am getting the same error: invalid argument.

What do you mean by a different CUDA context? Do you mean two CPU threads? One is allocating memory and the other is performing


I have the memory allocation in one .cu file and the memcpy is performed in another .cu file. Can this be a problem?


Thank you James!!

You saved my day! Actually, I was allocating the CPU memory with the usual cudaMalloc!