I have been programming in CUDA and I keep stumbling upon the same problem: I can't get the speedups quoted in the SDK and the manuals, because those figures do not take the memory transfer times into account.
I am trying to transfer 10 arrays of doubles, each of a different size, to the GPU. I have done this with 10 separate cudaMemcpy calls, and the transfers take up more than 80% of the total computation time.
I tried a modification that unifies all the arrays into a single one and uses an offset, but I only get about an 8% improvement.
The next option I am considering is whether another function, in the style of cudaMemcpy2D or cudaMemcpyArrayToArray, could improve the transfer.
Can anyone shed some light on this matter, or do I have to assume that memory transfer is simply a major drawback of CUDA?
This is an example of my code: