I have a question about those examples: in both, the dimensions of the matrices or of the filter kernel were fixed, so it was pretty straightforward how to use shared memory. My question is whether it is possible to use shared memory the same way when working with arbitrarily sized matrices or filter kernels?
I have been thinking about this problem quite a lot, because I have to implement functions for arbitrarily sized matrices and filter kernels, and I cannot find a feasible solution. What would the performance penalty be if I read everything from texture memory instead of shared memory? I know there is a graph comparing those two approaches in the convolutionSeparable SDK example, but does it still hold on up-to-date hardware? My solution is supposed to run exclusively on the Fermi architecture. What are your thoughts?
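For context, here is a minimal sketch of the kind of thing I have in mind (hypothetical names, 1D row convolution with a kernel radius known only at runtime): shared memory is sized dynamically via `extern __shared__` and the third kernel launch configuration parameter, rather than at compile time.

```cuda
// Sketch only: 1D convolution where the kernel radius is a runtime value.
// The shared-memory tile (block width plus a halo of `radius` on each side)
// is allocated dynamically at launch time.
__global__ void convolveRow(const float *in, float *out,
                            const float *kernel, int width, int radius)
{
    extern __shared__ float tile[];  // size set by the launch configuration

    int gx = blockIdx.x * blockDim.x + threadIdx.x;

    // Cooperatively load the tile plus halo, zero-padding at the borders.
    for (int i = threadIdx.x; i < blockDim.x + 2 * radius; i += blockDim.x) {
        int src = blockIdx.x * blockDim.x + i - radius;
        tile[i] = (src >= 0 && src < width) ? in[src] : 0.0f;
    }
    __syncthreads();

    if (gx < width) {
        float sum = 0.0f;
        for (int k = -radius; k <= radius; ++k)
            sum += tile[threadIdx.x + radius + k] * kernel[k + radius];
        out[gx] = sum;
    }
}

// Launch: the shared-memory size is the third configuration parameter.
// int threads = 256;
// size_t smem = (threads + 2 * radius) * sizeof(float);
// convolveRow<<<blocks, threads, smem>>>(d_in, d_out, d_kernel, width, radius);
```

This handles arbitrary kernel sizes, but the loop bounds are no longer compile-time constants, so it loses the unrolling that the fixed-size SDK examples rely on, which is exactly the performance trade-off I am asking about.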
Thanks in advance!