I’m using CUDA 2.3 with C++ to perform some calculations on pictures. For some calculations I can process all pixels in parallel, but for others I have to process the picture line by line.
My question is: how can I assign one line per thread and run all the lines in parallel?
Set up your thread blocks and grid as 1-dimensional, sized by the number of rows you want to process. Then, in your kernel code, treat each thread's global index as a row index and loop over the pixels of that row.
Something like:
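int row = blockIdx.x * blockDim.x + threadIdx.x; // global thread index = row index
if (row >= numRows) return; // numRows = image height; spare threads in the last block do nothing
for (int i = row * rowWidth; i < row * rowWidth + rowWidth; i++)
{
    // do stuff with pixel i
}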
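To make the one-thread-per-row idea concrete, here is a minimal sketch of a full kernel plus a host-side launch. It assumes the picture is a row-major array of `float` and uses a running sum along each row purely as a stand-in for the real per-line work. The names `rowPrefixSum`, `processRows`, and `numRows`, and the block size of 128, are illustrative choices, not anything from the original post.

```cuda
#include <cuda_runtime.h>

// One thread per row: each thread walks its own row from left to right.
// The running sum is only a placeholder for the real sequential per-line work.
__global__ void rowPrefixSum(float *img, int rowWidth, int numRows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= numRows) return;          // the last block may have spare threads

    float *line = img + row * rowWidth;  // start of this thread's row
    float acc = 0.0f;
    for (int x = 0; x < rowWidth; ++x) {
        acc += line[x];
        line[x] = acc;
    }
}

// Hypothetical host helper: copy the image to the device, launch one thread
// per row on a 1D grid, and copy the result back into hostImg.
cudaError_t processRows(float *hostImg, int rowWidth, int numRows)
{
    float *devImg = 0;
    size_t bytes = (size_t)rowWidth * numRows * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&devImg, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(devImg, hostImg, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threadsPerBlock = 128;       // assumed block size, tune as needed
        int blocks = (numRows + threadsPerBlock - 1) / threadsPerBlock;  // round up
        rowPrefixSum<<<blocks, threadsPerBlock>>>(devImg, rowWidth, numRows);
        err = cudaGetLastError();        // reports launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostImg, devImg, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devImg);
    return err;
}
```

Calling `processRows(img, width, height)` on a host array would then run every row in parallel. One caveat with this layout: neighbouring threads read addresses that are `rowWidth` elements apart, so global memory accesses are not coalesced, and storing the picture transposed may be worth trying if the kernel turns out to be memory bound.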