Tiled 2D convolution algorithm as slow as untiled 2D convolution algorithm

This issue has been solved.
Once I set the mask size and array sizes to constant values, my tiled algorithm was consistently twice as fast as the untiled one.

In hindsight the cause was obvious, and it also explains why this issue didn't show up with my 1D tiled and untiled algorithms. The problem was the range of random values for my maskWidth, which I had set to any odd number from 3 to 15 inclusive for both algorithms. This meant that sometimes my tiled program would use a 15x15 mask (225 elements!) while my untiled program used a 3x3 mask (just 9 elements). Because the work per output element grows with the number of mask elements, the tiled program was sometimes doing 25 times as many multiply-adds per output element, which unintentionally levelled the playing field between the two programs.

As for why the 1D tiled algorithm never looked slower than the 1D untiled one, my guess is that a²/b² shrinks much faster than a/b as b increases, where a is the maskWidth used by the untiled algorithm and b is the maskWidth used by the tiled algorithm. The ratio of untiled work to tiled work per output element is a/b in 1D but a²/b² in 2D, so a mismatched mask skews the 2D comparison far more. This probably means the playing field was being levelled for my 1D algorithms too, just much less noticeably.
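A quick back-of-the-envelope check of the ratio argument, written in Python purely as an illustration. The helper names are hypothetical and not from my actual convolution code. The sketch assumes each output element costs maskWidth multiply-adds in 1D and maskWidth² in 2D, and it ignores memory traffic, which is the thing tiling actually reduces.

```python
def work_per_output(mask_width: int, dims: int) -> int:
    """Multiply-adds needed to compute one output element (hypothetical helper).

    A 1D mask of width w touches w inputs; a 2D w x w mask touches w**2 inputs.
    """
    if mask_width < 1 or mask_width % 2 == 0:
        raise ValueError("maskWidth is assumed to be a positive odd number")
    return mask_width ** dims


def untiled_to_tiled_work_ratio(a: int, b: int, dims: int) -> float:
    """Untiled work divided by tiled work per output element.

    a: maskWidth drawn for the untiled run
    b: maskWidth drawn for the tiled run
    This equals a/b in 1D and a**2/b**2 in 2D.
    """
    return work_per_output(a, dims) / work_per_output(b, dims)


if __name__ == "__main__":
    a = 3  # smallest maskWidth the untiled run could draw
    # Walk b over every odd maskWidth in the original random range 3..15.
    for b in range(3, 16, 2):
        r1 = untiled_to_tiled_work_ratio(a, b, 1)
        r2 = untiled_to_tiled_work_ratio(a, b, 2)
        print(f"b={b:2d}  1D ratio={r1:.3f}  2D ratio={r2:.3f}")
```

For the worst case described above (a = 3, b = 15) the ratio is 0.2 in 1D but 0.04 in 2D, i.e. the tiled run does 5 times as much arithmetic per output element in 1D but 25 times as much in 2D. Fixing maskWidth to one value for both programs makes the ratio exactly 1 in either dimension.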