@txbob, the link was spot on. I managed to fix the issue by adding a cudaEventQuery() call before querying the variable value.
https://github.com/fangq/mcx/commit/6144ee479cba7cd4b5abd316626c37268c0bccfe
It appears that this also worked for other people who had the same issue on Windows.
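
For anyone landing here later, this is a minimal sketch of the pattern, not the code from the commit. It assumes the variable is a progress counter in mapped (zero-copy) host memory that the host polls while the kernel is still running. The names `work`, `hostProgress`, `devProgress`, and `done` are made up for illustration. As far as I understand it, querying the event makes the driver submit any queued work, so the kernel actually starts and the host can see the counter change.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: publishes its progress through mapped host memory.
__global__ void work(volatile int *progress, int steps) {
    for (int i = 1; i <= steps; ++i) {
        // ... the real per-step work would go here ...
        *progress = i;
        __threadfence_system();  // make the write visible to the host
    }
}

int main() {
    const int steps = 100;
    int *hostProgress = nullptr;
    int *devProgress = nullptr;

    // Allow host allocations to be mapped into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&hostProgress, sizeof(int), cudaHostAllocMapped);
    *hostProgress = 0;
    cudaHostGetDevicePointer((void **)&devProgress, hostProgress, 0);

    cudaEvent_t done;
    cudaEventCreate(&done);

    work<<<1, 1>>>(devProgress, steps);
    cudaEventRecord(done, 0);  // marks the end of the kernel in the default stream

    for (;;) {
        // Query the event BEFORE reading the variable (assumption: this is
        // what nudges the driver to submit the pending launch).
        cudaError_t status = cudaEventQuery(done);
        int current = *(volatile int *)hostProgress;
        printf("progress: %d/%d\n", current, steps);
        if (status != cudaErrorNotReady) break;  // finished, or an error occurred
    }

    cudaEventDestroy(done);
    cudaFreeHost(hostProgress);
    return 0;
}
```

Error checking is left out to keep the sketch short; real code should check the return value of every CUDA call.
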
Again, thanks a lot for the pointers!