How to read data during kernel execution

I want to read data on the CPU while my GPU kernel is still executing, but this looks like an almost impossible task: 1) it doesn’t work through cudaMemcpyAsync, and 2) it works only sometimes through cudaHostAlloc(… cudaHostAllocMapped). That is at least the case on Windows 10; people report that 1) works on Linux (see visual studio - Why GPU memory updates are not read? - Stack Overflow).

What I mean by “sometimes” in 2) is that the code from cuda - How can I check the progress of matrix multiplication? - Stack Overflow breaks very easily: after removing printf from the kernel, after replacing “for (int i = 0; i < 10; i++)” with “for (int i = 0; i < 10000000; i++)” or “while (true)”, or after adding an additional file to the project (even though the added code is never called).

Is it a wrong idea to read data from a kernel in real time? Is it hard for the drivers?

I used Windows 10 Pro 19044.2251, GF 2060 mobile, driver 517.00, CUDA Toolkit 10.2 and Visual Studio 2019 16.0.2. I then installed CUDA Toolkit 11.7 instead: nothing improved, and kernel debugging also broke. Before that, the debugger stopped on a breakpoint in the kernel as long as the kernel was not launched in a different stream (i.e. it worked for kernel<<<1, 128>>>(params); but not for kernel<<<1, 128, 0, stream2>>>(params);). Debugging still works in another project and in the project from the second link.

Reading data from a running kernel is pretty easy on Linux. Your code works on Linux (for me and for another person, as indicated in the comments to your question). I’m not aware of anything needed beyond your posted code.

On Windows with the WDDM driver model, it is “more difficult”. Some of the reasons for that are given in the question you linked. The second kernel listed there does not have any printf in it and works fine, as attested by myself and others historically, even on Windows WDDM.
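For reference, here is a minimal sketch of the general pattern under discussion: a kernel without printf that publishes a progress counter through mapped pinned memory while the host polls it. This is not the code from either linked question. The names worker, progress and steps are hypothetical, the spin loop is only a placeholder for real work, and the kernel is deliberately kept short. Polling with cudaStreamQuery is an assumption on my part; it is often suggested as a way to get queued work submitted under WDDM, but I can’t promise it behaves the same on every driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: does placeholder work and publishes its progress
// through a pointer to mapped (zero-copy) host memory. No printf inside.
__global__ void worker(volatile int *progress, int steps)
{
    for (int i = 1; i <= steps; ++i) {
        // Placeholder for real work: spin for a fixed number of clock ticks.
        long long start = clock64();
        while (clock64() - start < 10000000LL) { }

        if (blockIdx.x == 0 && threadIdx.x == 0) {
            *progress = i;            // write goes to host memory
            __threadfence_system();   // order the write with respect to the host
        }
    }
}

int main()
{
    const int steps = 100;

    // Allow pinned host allocations to be mapped into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    volatile int *h_progress = nullptr;   // host view of the counter
    int *d_progress = nullptr;            // device view of the same memory
    if (cudaHostAlloc((void **)&h_progress, sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) {
        printf("cudaHostAlloc failed\n");
        return 1;
    }
    *h_progress = 0;
    cudaHostGetDevicePointer((void **)&d_progress, (void *)h_progress, 0);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    worker<<<1, 128, 0, stream>>>(d_progress, steps);

    // Poll the counter while the kernel is still running.
    // Assumption: querying the stream also helps the launch get submitted.
    int last = 0;
    while (cudaStreamQuery(stream) == cudaErrorNotReady) {
        int now = *h_progress;
        if (now != last) {
            printf("progress: %d / %d\n", now, steps);
            last = now;
        }
    }

    cudaError_t err = cudaStreamSynchronize(stream);
    printf("final: %d / %d (%s)\n", *h_progress, steps, cudaGetErrorString(err));

    int ok = (err == cudaSuccess && *h_progress == steps);
    cudaStreamDestroy(stream);
    cudaFreeHost((void *)h_progress);
    return ok ? 0 : 1;
}
```

The counter is declared volatile on both sides so that neither the device compiler nor the host compiler caches it in a register. If the host never sees intermediate values on Windows, that matches the WDDM difficulty described above rather than a bug in the pattern itself.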
