Hello all.
I have a problem running image processing with CUDA.
The original image is 8196 * 1024 pixels (char), and the device is a 9800GT.
I want to multiply every pixel of the image by a particular value.
Example:
Original image ↓
(0,0)0 (1,0)255 (2,0)1 (3,0)50 (4,0)0 (5,0)0 ... (8196,0)0
(0,1)255 (1,1)1 (2,1)0 (3,1)0 (4,1)0 (5,1)0 ... (8196,1)0
...
...
(0,1023) ....
Multiply every pixel of the image by a particular value (0 in this example).
Processed image ↓
(0,0)0 (1,0)0 (2,0)0 (3,0)0 (4,0)0 (5,0)0 ... (8196,0)0
(0,1)0 (1,1)0 (2,1)0 (3,1)0 (4,1)0 (5,1)0 ... (8196,1)0
...
...
(0,1023) ....
So I tried to run this program using CUDA, but I couldn't get a reasonable processing speed.
I want to know the optimal thread configuration (block size, grid size, number of threads per block, shared memory per block, etc.).
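To make the question concrete, here is a minimal sketch of the operation described above, assuming one thread per pixel with a 1D launch and the image stored as a flat array of unsigned char. The names scalePixels and scaleImageOnGpu are hypothetical, and the block size of 256 is only an illustrative starting point, not a tuned value.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Multiply every 8-bit pixel by `factor` and clamp the result to [0, 255].
__global__ void scalePixels(unsigned char *pixels, int count, float factor)
{
    // Grid-stride loop: stays correct even if the grid has fewer threads than pixels.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < count;
         i += blockDim.x * gridDim.x) {
        float v = pixels[i] * factor;
        // The cast truncates toward zero after clamping.
        pixels[i] = (unsigned char)fminf(fmaxf(v, 0.0f), 255.0f);
    }
}

// Hypothetical host helper: copies the image to the device, runs the kernel
// in place, and copies the result back. Returns true on success.
bool scaleImageOnGpu(std::vector<unsigned char> &image, float factor)
{
    if (image.empty()) return true;  // nothing to do
    const int count = (int)image.size();

    unsigned char *d_pixels = NULL;
    if (cudaMalloc((void **)&d_pixels, count) != cudaSuccess) return false;
    if (cudaMemcpy(d_pixels, &image[0], count, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_pixels);
        return false;
    }

    // Assumed launch configuration: 256 threads per block, enough blocks
    // to cover every pixel (ceiling division).
    const int threadsPerBlock = 256;
    const int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
    scalePixels<<<blocks, threadsPerBlock>>>(d_pixels, count, factor);
    cudaError_t err = cudaGetLastError();  // reports an invalid launch configuration

    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(&image[0], d_pixels, count, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_pixels);
    return err == cudaSuccess;
}
```

For the 8196 * 1024 image, 256 threads per block gives 32,784 blocks, which stays under the 65,535 per-dimension grid limit of this generation of devices. No shared memory is used in this sketch because each pixel is read once and written once.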
Thanks for reading.