Last year, with CUDA Runtime 12.3, the NVIDIA deviceQuery sample reported that my GeForce RTX 3060 Laptop GPU has 5 copy engines. Now, with the 12.8 driver, it reports only one copy engine on the same GPU. Profiling with Nsight Systems confirms this: the asynchronous H2D and D2H transfers in my CUDA program no longer run concurrently, although they did last year (i.e., the concurrent data transfers worked as expected then).
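For reference, the copy-engine count that deviceQuery prints comes from the `asyncEngineCount` field of `cudaDeviceProp`. The sketch below reads that field directly so the value can be checked without the sample. It is a minimal sketch, not the deviceQuery source; the helper `canOverlapBothDirections` is a hypothetical name, based on the documented meaning of the field (2 = copies in both directions can overlap each other and a kernel, 1 = only one copy at a time can overlap a kernel, 0 = no overlap).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: asyncEngineCount >= 2 means H2D and D2H copies
// can run at the same time (in addition to kernel execution).
bool canOverlapBothDirections(int asyncEngineCount) {
    return asyncEngineCount >= 2;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        std::printf("Device %d: %s\n", dev, prop.name);
        // Same value deviceQuery reports as "... with N copy engine(s)".
        std::printf("  asyncEngineCount: %d\n", prop.asyncEngineCount);
        std::printf("  H2D/D2H overlap possible: %s\n",
                    canOverlapBothDirections(prop.asyncEngineCount) ? "yes" : "no");
    }
    return 0;
}
```

If this prints 1 for the RTX 3060 Laptop GPU, the runtime itself is advertising a single copy engine, which matches what Nsight Systems shows.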
How can I correct this problem?
If it were me, I would try the newest driver for the GPU. You don't mention whether you are on Windows or Linux; if on Linux, I would also try both drivers (open and legacy/proprietary). If the problem persists, file a bug.
Thanks for your comment.
The OS is now Windows 11 with CUDA SDK 12.8/12.9; last year, when everything was OK, it was Windows 10 with 12.8.
I removed CUDA SDK 12.8, installed the newest GPU driver from NVIDIA, and reinstalled CUDA SDK 12.9.
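After a driver/SDK reinstall, it may help to confirm which driver and runtime versions a program actually sees, since deviceQuery reports both. A minimal sketch using `cudaDriverGetVersion` and `cudaRuntimeGetVersion`; the helper `decodeCudaVersion` is a hypothetical name that unpacks the documented encoding `1000 * major + 10 * minor`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA encodes versions as 1000 * major + 10 * minor,
// so 12080 decodes to 12.8.
void decodeCudaVersion(int v, int* major, int* minor) {
    *major = v / 1000;
    *minor = (v % 1000) / 10;
}

int main() {
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);    // latest CUDA version the installed driver supports (0 if none)
    cudaRuntimeGetVersion(&runtimeVer);  // version of the CUDA Runtime this program was built against

    int majorV = 0, minorV = 0;
    decodeCudaVersion(driverVer, &majorV, &minorV);
    std::printf("Driver supports CUDA %d.%d\n", majorV, minorV);
    decodeCudaVersion(runtimeVer, &majorV, &minorV);
    std::printf("Runtime is CUDA %d.%d\n", majorV, minorV);
    return 0;
}
```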
However, the problem persists. It looks as if the GPU has gone back to Fermi or earlier…
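To re-check the symptom after each driver change without opening Nsight Systems, one option is to time an H2D copy alone, a D2H copy alone, and both issued together on separate streams. This is a minimal sketch under stated assumptions: the 128 MiB buffer size is arbitrary, and `transfersLookConcurrent` with its 0.75 threshold is a hypothetical heuristic (fully overlapped copies should take roughly the longer of the two, serialized ones roughly their sum). Pinned host memory is used because copies from pageable memory cannot overlap.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical heuristic: overlapped copies take clearly less than the sum.
bool transfersLookConcurrent(double msH2D, double msD2H, double msBoth) {
    return msBoth < 0.75 * (msH2D + msD2H);
}

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::printf("%s failed: %s\n", #call, cudaGetErrorString(e_)); return 1; } } while (0)

// Wall-clock time of the work enqueued by 'issue', waiting for the whole device.
template <typename F>
double timeMs(F issue) {
    auto start = std::chrono::steady_clock::now();
    issue();
    cudaDeviceSynchronize();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    const size_t bytes = size_t(128) << 20;  // 128 MiB per buffer (arbitrary)
    void *hSrc = nullptr, *hDst = nullptr, *dSrc = nullptr, *dDst = nullptr;
    // Page-locked host buffers, required for truly asynchronous copies.
    CHECK(cudaMallocHost(&hSrc, bytes));
    CHECK(cudaMallocHost(&hDst, bytes));
    CHECK(cudaMalloc(&dSrc, bytes));
    CHECK(cudaMalloc(&dDst, bytes));

    // One non-blocking stream per direction so the copies are independent.
    cudaStream_t up, down;
    CHECK(cudaStreamCreateWithFlags(&up, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&down, cudaStreamNonBlocking));

    auto h2d = [&] { cudaMemcpyAsync(dSrc, hSrc, bytes, cudaMemcpyHostToDevice, up); };
    auto d2h = [&] { cudaMemcpyAsync(hDst, dDst, bytes, cudaMemcpyDeviceToHost, down); };

    timeMs([&] { h2d(); d2h(); });  // warm-up, not measured
    double msH2D = timeMs(h2d);
    double msD2H = timeMs(d2h);
    double msBoth = timeMs([&] { h2d(); d2h(); });

    std::printf("H2D %.1f ms, D2H %.1f ms, both %.1f ms -> %s\n", msH2D, msD2H, msBoth,
                transfersLookConcurrent(msH2D, msD2H, msBoth) ? "overlapped" : "serialized");

    cudaStreamDestroy(up);
    cudaStreamDestroy(down);
    cudaFree(dSrc);
    cudaFree(dDst);
    cudaFreeHost(hSrc);
    cudaFreeHost(hDst);
    return 0;
}
```

Host-side timing with `cudaDeviceSynchronize` is coarse, but with transfers of this size it should be enough to tell roughly max(H2D, D2H) apart from H2D + D2H.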