I am using cudaMemcpy to copy an image to the device so that a CUDA kernel can process it, and it accounts for around 78% of the time spent in API calls, as reported by nvprof. Is there a way to reduce this overhead, or is there another API that would make my application run faster?