I’m new to CUDA. I’m just studying the platform and writing some examples for it.
So, for the moment, I’m writing a CUDA program that loads an image and applies a median/middle filter to it.
I already wrote the program without CUDA, and now I’m working on making it work with CUDA.
I think it should look like this:
1 Load an image and put it into an array in shared memory (inputArray).
2 Create a second array (outputArray) that will contain the result.
3 Execute the kernel code.
4 Each GPU thread works on a region of the array (a few elements of inputArray).
5 Each thread does some calculations and puts the result into outputArray.
6 Read outputArray and write it to the image.
I don’t know if this is the right way to proceed, but any comments on it would be appreciated.
I’m thinking of making a local copy per thread in the kernel code, so there should be a new step:
4.5 Put the elements of the region into the thread’s local memory.
What do you think about this?
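To make the plan concrete, here is a minimal sketch of how I picture steps 1 to 6, including 4.5. It rests on several assumptions of mine: the image is 8-bit grayscale, the filter window is 3x3, border pixels are handled by clamping coordinates, and each thread produces exactly one output pixel. The names medianFilter3x3 and runMedianFilter are hypothetical. The sketch also puts inputArray and outputArray in ordinary device (global) memory via cudaMalloc and cudaMemcpy rather than in shared memory, which is only my guess at how the copy would be done.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one thread computes one output pixel (steps 4, 4.5, 5).
__global__ void medianFilter3x3(const unsigned char* inputArray,
                                unsigned char* outputArray,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // threads outside the image do nothing

    // Step 4.5: copy the 3x3 region into a per-thread array.
    // Assumption: coordinates are clamped at the image border.
    unsigned char window[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = min(max(x + dx, 0), width - 1);
            int sy = min(max(y + dy, 0), height - 1);
            window[n++] = inputArray[sy * width + sx];
        }
    }

    // Step 5: insertion sort of the 9 values; the median is the middle one.
    for (int i = 1; i < 9; ++i) {
        unsigned char v = window[i];
        int j = i - 1;
        while (j >= 0 && window[j] > v) {
            window[j + 1] = window[j];
            --j;
        }
        window[j + 1] = v;
    }
    outputArray[y * width + x] = window[4];
}

// Hypothetical host wrapper covering steps 1, 2, 3 and 6.
// Returns true if every CUDA call succeeded.
bool runMedianFilter(const std::vector<unsigned char>& image,
                     std::vector<unsigned char>& result,
                     int width, int height)
{
    if (width <= 0 || height <= 0) return false;
    size_t bytes = static_cast<size_t>(width) * height;
    if (image.size() != bytes) return false;
    result.resize(bytes);

    unsigned char* dIn = nullptr;
    unsigned char* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    // Step 1: copy the loaded image to the device.
    cudaError_t err = cudaMemcpy(dIn, image.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Step 3: launch enough 16x16 blocks to cover the whole image.
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        medianFilter3x3<<<grid, block>>>(dIn, dOut, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // Step 6: copy the result back (this call also waits for the kernel).
        err = cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

Calling runMedianFilter(image, result, width, height) with the pixels loaded on the host would fill result with the filtered image, ready to be written back to a file. Whether the window array ends up in registers or in CUDA local memory is decided by the compiler, so I am not sure the local copy buys anything by itself.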
What do you think about the overall approach?
Thanks a lot to anyone who reads and replies to this post.