Why is it taking so much time?

Can you give any hints about my code (given at the beginning of this thread) to improve performance?

Does anyone have any ideas? Please help, I have the same problem.

Hi Manjunath Gudisi,

Actually, the problem in my code is accessing the value of Value. If I replace it with a shared array that does nothing useful (it just takes the value from texture memory and discards it), the time drops to 20 ms, which is an improvement. But so far I have not succeeded in using a shared array in this kernel. One of the problems is that the value here is an unsigned char.

Can anybody tell me how to use shared memory in my kernel in place of value?
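
As a possible starting point, here is a minimal sketch of staging unsigned char data in shared memory. It is not the kernel from this thread: the kernel name sum3, the 3-point sum, the block size of 256, and reading the input from global memory rather than texture memory are all assumptions made for illustration. The point it shows is that a __shared__ array can be declared with element type unsigned char directly, filled once per block, and then read by all threads after a __syncthreads().

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256   // threads per block (assumed value)

// Hypothetical kernel: out[i] = in[i-1] + in[i] + in[i+1], with zeros outside the array.
__global__ void sum3(const unsigned char* in, int* out, int n)
{
    // Shared tile of unsigned char: one element per thread plus one halo cell on each side.
    __shared__ unsigned char tile[BLOCK + 2];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int lid = threadIdx.x + 1;                        // index inside the tile

    // Each thread loads its own element once.
    tile[lid] = (gid < n) ? in[gid] : 0;

    // Thread 0 also loads the two halo cells of the block.
    if (threadIdx.x == 0) {
        int left  = gid - 1;
        int right = gid + blockDim.x;
        tile[0]              = (left >= 0) ? in[left] : 0;
        tile[blockDim.x + 1] = (right < n) ? in[right] : 0;
    }

    // Make sure the whole tile is written before anyone reads it.
    __syncthreads();

    if (gid < n)
        out[gid] = tile[lid - 1] + tile[lid] + tile[lid + 1];
}

// Runs the kernel on test data and compares against a CPU reference.
bool run_demo()
{
    const int n = 1000;
    std::vector<unsigned char> h_in(n);
    std::vector<int> h_out(n, -1);
    for (int i = 0; i < n; ++i) h_in[i] = (unsigned char)(i % 251);

    unsigned char* d_in = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(unsigned char)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(unsigned char), cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;
    sum3<<<blocks, BLOCK>>>(d_in, d_out, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // CPU reference check.
    for (int i = 0; i < n && ok; ++i) {
        int l = (i > 0) ? h_in[i - 1] : 0;
        int r = (i < n - 1) ? h_in[i + 1] : 0;
        if (h_out[i] != l + h_in[i] + r) ok = false;
    }
    return ok;
}

int main()
{
    printf("%s\n", run_demo() ? "match" : "mismatch");
    return 0;
}
```

Shared memory pays off only when several threads in a block reuse the same loaded value, as the neighbouring reads do here. If each value is read by exactly one thread, staging it in shared memory is unlikely to help by itself.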