The code above creates a 3D array from memory that is already allocated on the GPU (the output of a kernel). The memory is ~105 MB in size. The profiler shows that, on average, the D to A (device-to-array copy) function takes 23 ms to complete. This corresponds to roughly 4.5 GB/s of GPU memory bandwidth (105 MB / 23 ms), which we think is very low.
Is there any clue as to what we should look for? (Our card is a C1060.)
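
For reference, the arithmetic behind the bandwidth figure can be sketched as follows. This is only a back-of-the-envelope check, assuming 1 MB = 10^6 bytes; the helper name `effective_bandwidth_gbps` is hypothetical. Since a device-to-array copy both reads and writes every byte, the second figure counts the traffic twice, which is one common way of comparing against a card's quoted peak bandwidth (on the order of 100 GB/s for the C1060, if we read the published specs correctly).

```python
def effective_bandwidth_gbps(num_bytes, seconds, count_read_and_write=False):
    """Return bandwidth in GB/s, taking 1 GB = 1e9 bytes (an assumption)."""
    # A copy reads each byte once and writes it once, so the total
    # memory traffic is twice the payload when both are counted.
    traffic = num_bytes * (2 if count_read_and_write else 1)
    return traffic / seconds / 1e9


size_bytes = 105e6  # ~105 MB, as reported above
elapsed_s = 23e-3   # ~23 ms average reported by the profiler

one_way = effective_bandwidth_gbps(size_bytes, elapsed_s)
two_way = effective_bandwidth_gbps(size_bytes, elapsed_s, count_read_and_write=True)

print(f"payload only: {one_way:.2f} GB/s")
print(f"read + write: {two_way:.2f} GB/s")
```

Either way of counting leaves the measured figure far below what the card should sustain for a plain device-to-device copy, which is why the number looks suspicious to us.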