I have two large arrays: one-dimensional, 32-bit integers, 10.4M elements each, taking up ~40 MB of RAM apiece. They have a few simple arithmetic operations performed on them. The same operations are applied to every element, and the result is stored at the same index (e.g. C[i] = A[i] % B[i]). With so few simple operations per element, is CUDA going to speed things up significantly? I would be able to copy both arrays into GPU memory once, let the threads complete, and copy the result back to system memory for further manipulation.
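To make the workflow concrete, here is a minimal sketch of what I have in mind, not a tuned implementation. The names `modKernel` and `modOnGpu` are hypothetical, the block size of 256 is an arbitrary choice, and I'm assuming B never contains a zero and that only the single `%` operation from the example is applied.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One thread per element: c[i] = a[i] % b[i].
// Assumption: b[i] != 0 for every i (integer modulo by zero is undefined).
__global__ void modKernel(const int* a, const int* b, int* c, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] % b[i];   // guard the partially filled last block
}

// Hypothetical host wrapper: copy both inputs to the GPU once,
// run the kernel, and copy the result back to system memory.
cudaError_t modOnGpu(const int* hA, const int* hB, int* hC, size_t n) {
    if (n == 0) return cudaSuccess;  // nothing to do; avoids a zero-block launch

    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    const size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);

    // Host -> device copies of the two input arrays.
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const unsigned threads = 256;  // arbitrary block size for this sketch
        const unsigned blocks = (unsigned)((n + threads - 1) / threads);
        modKernel<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();      // catches launch configuration errors
    }

    // Device -> host copy of the result; this cudaMemcpy waits for the kernel.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

With 10.4M elements this launches one thread per element and moves roughly 40 MB per array across the bus, two arrays in and one out, so I would want to time the three `cudaMemcpy` calls separately from the kernel itself.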
Cheers and Happy New Year!