I want to read data on the CPU while my GPU kernel is still executing, but this looks like an almost impossible task: 1) it doesn’t work through cudaMemcpyAsync, and 2) it works only sometimes through cudaHostAlloc(… cudaHostAllocMapped). That is the case at least on Windows 10; people report that 1) works on Linux (see “visual studio - Why GPU memory updates are not read?” on Stack Overflow).
What I mean by “sometimes” in 2) is that the code from “cuda - How can I check the progress of matrix multiplication?” (Stack Overflow) breaks very easily. It stops working after removing the printf from the kernel, after replacing “for (int i = 0; i < 10; i++)” with “for (int i = 0; i < 10000000; i++)” or “while (true)”, or after adding another file to the project (even though the added code is never called).
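For reference, the pattern in 2) can be sketched roughly as below. This is only a minimal sketch and not the code from the linked answer: the names worker and run_progress_demo, the spin delay, and the step count are hypothetical, and it assumes a device that supports mapped pinned memory. The kernel publishes a counter through a volatile pointer followed by __threadfence_system(), and the host polls the same location through a volatile pointer. The host loop calls cudaStreamQuery, which on Windows (WDDM) is commonly reported to also nudge the driver into submitting the queued launch; I am treating that as an assumption, not a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: does `steps` chunks of placeholder work and publishes
// the number of finished chunks to host-mapped memory.
__global__ void worker(volatile int *progress, int steps)
{
    for (int i = 0; i < steps; ++i) {
        // Placeholder work: spin for a fixed number of clock cycles.
        long long start = clock64();
        while (clock64() - start < 1000000LL) { }

        if (blockIdx.x == 0 && threadIdx.x == 0) {
            *progress = i + 1;        // volatile store, not kept in a register
            __threadfence_system();   // order the store with respect to the host
        }
    }
}

// Launches the kernel, polls the counter while it runs, and returns the
// final counter value (or -1 on any CUDA error).
int run_progress_demo(int steps)
{
    int *h_progress = nullptr;   // host view of the mapped allocation
    int *d_progress = nullptr;   // device view of the same memory

    // Ignored result: this flag is the default on systems with unified addressing.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    if (cudaHostAlloc((void **)&h_progress, sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    *h_progress = 0;
    if (cudaHostGetDevicePointer((void **)&d_progress, h_progress, 0)
            != cudaSuccess) { cudaFreeHost(h_progress); return -1; }

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFreeHost(h_progress);
        return -1;
    }

    worker<<<1, 128, 0, stream>>>(d_progress, steps);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Read through a volatile pointer so the compiler reloads every iteration.
    volatile int *seen = h_progress;
    int last = 0;
    while (ok && cudaStreamQuery(stream) == cudaErrorNotReady) {
        int now = *seen;
        if (now != last) {
            printf("progress: %d/%d\n", now, steps);
            last = now;
        }
    }

    // After synchronization all device writes are visible to the host.
    ok = ok && (cudaStreamSynchronize(stream) == cudaSuccess);
    int final_value = ok ? *seen : -1;

    cudaStreamDestroy(stream);
    cudaFreeHost(h_progress);
    return final_value;
}

int main()
{
    int result = run_progress_demo(8);
    printf("final counter: %d\n", result);
    return result == 8 ? 0 : 1;
}
```

How many intermediate “progress” lines appear depends on timing, so the sketch only relies on the final counter equalling the step count after cudaStreamSynchronize. Without the volatile qualifiers or the cudaStreamQuery call in the loop, this is the kind of code that I would expect to behave differently after small, unrelated changes.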
Is it a mistaken idea to read data from a running kernel in real time? Is it hard for drivers to support?
I used Windows 10 Pro 19044.2251, GF 2060 mobile, driver 517.00, CUDA Toolkit 10.2 and Visual Studio 2019 16.0.2. I then installed CUDA Toolkit 11.7 instead: nothing improved, and kernel debugging also broke. Before that, the debugger stopped on a breakpoint in the kernel as long as the kernel was not launched in a different stream (i.e. it worked for kernel<<<1, 128>>>(params); but not for kernel<<<1, 128, 0, stream2>>>(params);). Debugging still works in another project and in the project from the second link.