This is my first post, so please excuse any mistakes.
I ran a kernel on a large file (200 MB). Each block has 256 threads, and each thread does some computation on 16 bytes of data. While experimenting with different file sizes, I came across an interesting result:
For file sizes greater than 256 MB, the overall execution time of the program is around 8 times longer than for file sizes below that. I cannot explain this.
Here is the information about my video card:
Global memory: 512 MB
No. of multiprocessors (MP): 2
Maximum No. of threads per block: 512
Maximum size of each dimension of a grid: 65535 x 65535 x 1
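To make the setup concrete, here is a rough sketch of the launch arithmetic. It is not my actual kernel code. It assumes one thread per 16-byte chunk, a one-dimensional grid, and that 1 MB means 2^20 bytes; the function names are made up for illustration. Under those assumptions a 200 MB file needs 51200 blocks, while a 256 MB file needs 65536 blocks, one more than the 65535 limit per grid dimension listed above, so the slowdown may be related to that limit.

```c
#include <stddef.h>

/* Values taken from the post; the one-thread-per-chunk, 1D-grid
   layout is an assumption, not the actual kernel configuration. */
#define THREADS_PER_BLOCK 256ULL
#define BYTES_PER_THREAD  16ULL
#define MAX_GRID_DIM      65535ULL

/* Number of blocks needed to cover file_bytes, rounding up so a
   partially filled last block is still counted. */
unsigned long long blocks_needed(unsigned long long file_bytes)
{
    unsigned long long bytes_per_block = THREADS_PER_BLOCK * BYTES_PER_THREAD; /* 4096 */
    return (file_bytes + bytes_per_block - 1) / bytes_per_block;
}

/* Returns 1 if the whole file fits in a single 1D grid launch, else 0. */
int fits_in_1d_grid(unsigned long long file_bytes)
{
    return blocks_needed(file_bytes) <= MAX_GRID_DIM;
}
```

For example, blocks_needed(200ULL << 20) gives 51200 and fits_in_1d_grid(256ULL << 20) gives 0. If the real code uses a two-dimensional grid or processes more than one chunk per thread, these numbers do not apply.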
Thanks in advance.