Launching threads for varying image size

What is the correct way to launch threads for varying image sizes? Can we use a 2D thread layout for this?

There are a ton of ways to do it “correctly”, mostly depending on how much data you wish to process per thread.

In the textbook example of 1 pixel per thread you would do something like:

// each block has 32*16 = 512 threads
const int DIM_X = 32;
const int DIM_Y = 16;

// round up so the grid still covers images whose size is not a multiple of the block size
int X_BLOCKS = (DIM_X + cols - 1)/DIM_X;
int Y_BLOCKS = (DIM_Y + rows - 1)/DIM_Y;

dim3 grid(X_BLOCKS, Y_BLOCKS);
dim3 block(DIM_X, DIM_Y);

your_kernel<<<grid, block>>>(....., rows, cols);

Inside “your_kernel” you would also need a boundary check:

__global__ void your_kernel(...., int rows, int cols)
{

int tx = threadIdx.x + blockIdx.x*blockDim.x;
int ty = threadIdx.x + blockIdx.y*blockDim.y;

if( tx < cols && ty < rows)
{

// your neat processing happens here

}


}

Thanks for your response…

A few more queries on this:

Following your code snippet, if we take a 500 x 500 image (rows = cols = 500), we get

X_BLOCKS = (32 + 500 - 1)/32 = 531/32 = 16.59

Y_BLOCKS = (16 + 500 - 1)/16 = 515/16 = 32.19

So how is this fractional value handled for X_BLOCKS and Y_BLOCKS? Will X_BLOCKS be 16 or 17?
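For what it is worth, no float ever appears here: X_BLOCKS, DIM_X, rows and cols are all declared as int in the snippet, so the C/C++ `/` operator performs integer division and simply drops the fraction. A minimal host-side sketch that checks this at compile time (the helper name `blocks_needed` is made up for illustration and is not part of the original snippet):

```cpp
// Host-side sketch only; compiles with any C++11 compiler, no GPU needed.
// blocks_needed is a hypothetical helper mirroring the X_BLOCKS / Y_BLOCKS formula.
constexpr int blocks_needed(int pixels, int threads_per_dim)
{
    // Both operands are int, so '/' discards the fractional part.
    return (pixels + threads_per_dim - 1) / threads_per_dim;
}

// 500 x 500 image with 32 x 16 threads per block
static_assert(blocks_needed(500, 32) == 16, "531 / 32 truncates to 16");
static_assert(blocks_needed(500, 16) == 32, "515 / 16 truncates to 32");

// The grid still covers the whole image: 16 * 32 = 512 and 32 * 16 = 512, both >= 500.
static_assert(blocks_needed(500, 32) * 32 >= 500, "columns are covered");
static_assert(blocks_needed(500, 16) * 16 >= 500, "rows are covered");
```

So for this image X_BLOCKS would be 16 and Y_BLOCKS would be 32. Adding DIM - 1 before dividing is what turns the truncating division into a round-up of cols/DIM_X and rows/DIM_Y.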

Inside your kernel,

int tx = threadIdx.x + blockIdx.x*blockDim.x;
int ty = threadIdx.x + blockIdx.y*blockDim.y;

should be

int tx = threadIdx.x + blockIdx.x*blockDim.x;
int ty = threadIdx.y + blockIdx.y*blockDim.y;

Is this correct?

Also, how can I find the exact GPU usage percentage on a Jetson TK1?

Thanks in advance.

Yes, that is correct. I had a typo: ty must use threadIdx.y.
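One way to convince yourself that the corrected indexing plus the boundary check touches every pixel exactly once is to replay the launch on the CPU. This is only a rough emulation under the assumption of one pixel per thread; the function name `covers_image_once` and the loop variables are hypothetical stand-ins for blockIdx and threadIdx, and no GPU is involved:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation of the 2D launch described in this thread.
// Returns true if every pixel of a rows x cols image is visited exactly once.
bool covers_image_once(int rows, int cols, int dim_x, int dim_y)
{
    const int x_blocks = (cols + dim_x - 1) / dim_x;  // same formula as X_BLOCKS
    const int y_blocks = (rows + dim_y - 1) / dim_y;  // same formula as Y_BLOCKS
    std::vector<int> hits(static_cast<std::size_t>(rows) * cols, 0);

    for (int by = 0; by < y_blocks; ++by)                 // stands in for blockIdx.y
        for (int bx = 0; bx < x_blocks; ++bx)             // stands in for blockIdx.x
            for (int thy = 0; thy < dim_y; ++thy)         // stands in for threadIdx.y
                for (int thx = 0; thx < dim_x; ++thx)     // stands in for threadIdx.x
                {
                    const int tx = thx + bx * dim_x;
                    const int ty = thy + by * dim_y;
                    if (tx < cols && ty < rows)           // same boundary check as the kernel
                        ++hits[static_cast<std::size_t>(ty) * cols + tx];
                }

    for (int h : hits)
        if (h != 1)
            return false;
    return true;
}
```

Calling `covers_image_once(500, 500, 32, 16)` replays the 500 x 500 case from above. The threads of the last block row and column that fall outside the image are the ones the boundary check filters out.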

Could you define what you mean by usage percentage?

Both compute utilization and bandwidth utilization can be retrieved from the Visual Profiler.

How do I remote profile an application in the Visual Profiler from a host machine?
After running the application through remote profiling in the Visual Profiler, I got this error: "The application being profiled received a signal and profiling failed."

Thanks.

Was the application killed or maybe something happened to its parent process? Did you try with a different application just to be sure it’s not the application itself?