I'm using only the CUDA driver API. I don't want the device driver to block the CPU while a CUDA kernel is running on the GPU, because the CPU is also doing other computation work.
The solution I found is to use the CUDA driver API and set the CU_CTX_SCHED_YIELD flag in cuCtxCreate. Is that correct?
It doesn't seem to help, though: the system is still blocked while a CUDA kernel is running. There must be some Windows kernel-mode polling going on, which is not what I want.
My system is Windows XP SP2 32-bit, CUDA 2.0, 9800 GTX+ card with device driver 180.48_geforce_winxp_32bit_english_whql.exe.
Does anyone know how to solve this?
It does exactly what it claims: it spins, checking the lock, and then yields to let any other waiting threads run. I'm guessing it will use as much CPU time as is available, but it shouldn't render the machine unusable. We're looking into other synchronization options for the future.
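To make the spin-then-yield idea concrete, here is a minimal sketch in plain C. It is not the driver's actual implementation, only an illustration of the pattern described above. The names `done_fn`, `wait_spin_yield` and `fake_done` are hypothetical, and `sched_yield` is the POSIX call (on Windows, `SwitchToThread()` plays a similar role).

```c
#include <assert.h>
#include <sched.h>   /* sched_yield (POSIX) */
#include <stddef.h>

/* Hypothetical callback type: returns nonzero once the awaited work is done. */
typedef int (*done_fn)(void *state);

/* Spin-then-yield wait: check the condition and, if it is not met yet,
   give up the rest of the time slice so other ready threads can run.
   Returns the number of times the calling thread yielded. */
unsigned long wait_spin_yield(done_fn is_done, void *state)
{
    unsigned long yields = 0;
    assert(is_done != NULL);
    while (!is_done(state)) {
        sched_yield();
        ++yields;
    }
    return yields;
}

/* Stand-in for a device (assumption, for illustration only):
   reports "done" after a fixed number of unsuccessful polls. */
int fake_done(void *state)
{
    unsigned long *remaining = state;
    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

Note that a loop like this never sleeps. If no other thread is ready to run, the yield returns immediately and the waiting thread keeps a core busy, which matches the "as much CPU time as you have" remark.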
The only two synchronization options supported by cuCtxCreate are CU_CTX_SCHED_SPIN and CU_CTX_SCHED_YIELD. CU_CTX_SCHED_AUTO chooses between them heuristically, based on different conditions.
My CUDA kernel needs 2 seconds to run. My experiments show that whatever cuCtxCreate flag I use, the CPU is completely unusable during CUDA kernel execution (e.g. the CPU usage page of Task Manager (taskmgr.exe) just freezes). It looks like all user-mode threads are blocked.
My only goal is for the CPU to stay usable (by other threads of the program or by other programs) while the CUDA kernel is running.
Is there any way to meet this requirement with the current CUDA implementation? I assume cuCtxSynchronize works like the Windows API WaitForSingleObject. Below is my code:
void DoTheWork(CUfunction Function)
{
    CUdeviceptr DevicePtr;   /* declaration was missing from the posted snippet */
    unsigned int Result;

    if (cuMemAlloc(&DevicePtr, 1024) == CUDA_SUCCESS)
    {
        /* Set three 4-byte parameters, launch a single thread, wait, read back. */
        if (cuParamSeti(Function, 0, DevicePtr) == CUDA_SUCCESS &&
            cuParamSeti(Function, 4, 0x11111111) == CUDA_SUCCESS &&
            cuParamSeti(Function, 8, 0x22222222) == CUDA_SUCCESS &&
            cuParamSetSize(Function, 12) == CUDA_SUCCESS &&
            cuFuncSetBlockShape(Function, 1, 1, 1) == CUDA_SUCCESS &&
            cuLaunchGrid(Function, 1, 1) == CUDA_SUCCESS &&
            cuCtxSynchronize() == CUDA_SUCCESS &&   /* waits for the kernel to finish */
            cuMemcpyDtoH(&Result, DevicePtr, 4) == CUDA_SUCCESS)
        {
            /* Result now holds the first 4 bytes of the device buffer. */
        }
    }
}
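
One thing that might be worth trying instead of blocking in cuCtxSynchronize is to poll for completion and sleep between polls, so the waiting thread actually gives the CPU away. Below is a minimal sketch of that pattern in plain C. The names `query_fn`, `wait_poll_sleep`, `sleep_ms` and `fake_query` are hypothetical. With the driver API, the callback could wrap a non-blocking check such as cuStreamQuery or cuEventQuery, which return CUDA_ERROR_NOT_READY while work is still pending. Whether this avoids the stall on the setup described above is an assumption that has not been tested. `nanosleep` is POSIX; on Windows, `Sleep()` would take its place.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>    /* nanosleep (POSIX) */

/* Hypothetical callback type: returns nonzero once the work has completed. */
typedef int (*query_fn)(void *state);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Poll `query` up to `max_polls` times, sleeping `interval_ms` after each
   unsuccessful poll. Returns the number of polls it took to see completion,
   or -1 if the work was still pending after `max_polls` polls. */
long wait_poll_sleep(query_fn query, void *state, long interval_ms, long max_polls)
{
    long polls;
    assert(query != NULL && interval_ms >= 0);
    for (polls = 1; polls <= max_polls; ++polls) {
        if (query(state))
            return polls;
        sleep_ms(interval_ms);
    }
    return -1;
}

/* Stand-in for a device (assumption, for illustration only):
   reports "busy" for a fixed number of polls, then "done". */
int fake_query(void *state)
{
    int *busy_polls = state;
    if (*busy_polls > 0) {
        --*busy_polls;
        return 0;
    }
    return 1;
}
```

The trade-off is latency: completion is noticed up to one sleep interval late, which should be negligible for a kernel that runs for about 2 seconds.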