FWIW, CUDA 12 introduced new CUDA Dynamic Parallelism (CDP) functionality. This may also be of interest.