# Launching threads for varying image size

What is the correct way to launch threads for images of varying size? Can we launch a 2D grid of threads for this?

There are many ways to do it "correctly", depending mostly on how much data you want each thread to process.
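As one illustration of a thread handling more than one pixel, a widely used CUDA pattern is the grid-stride loop: the grid has a fixed size, and each thread steps through the image by the total grid width and height, so the same launch works for any image size. The sketch below emulates that loop on the host in plain C++ so it can be checked without a GPU. The function name `grid_stride_visits` and its parameters are hypothetical; the loop variables stand in for `blockIdx`, `threadIdx`, `blockDim` and `gridDim`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a 2D grid-stride loop (illustration only).
// Returns how many times each pixel of a rows x cols image is visited,
// in row-major order.
std::vector<int> grid_stride_visits(int rows, int cols,
                                    int grid_x, int grid_y,    // blocks in the grid (fixed, independent of image size)
                                    int block_x, int block_y)  // threads per block
{
    assert(rows >= 0 && cols >= 0);
    assert(grid_x > 0 && grid_y > 0 && block_x > 0 && block_y > 0);

    std::vector<int> visits(static_cast<std::size_t>(rows) * cols, 0);
    const int stride_x = grid_x * block_x;  // gridDim.x * blockDim.x
    const int stride_y = grid_y * block_y;  // gridDim.y * blockDim.y

    for (int by = 0; by < grid_y; ++by)
        for (int bx = 0; bx < grid_x; ++bx)
            for (int thy = 0; thy < block_y; ++thy)
                for (int thx = 0; thx < block_x; ++thx) {
                    // What one GPU thread would do: start at its global (x, y)
                    // and keep striding until it leaves the image.
                    for (int y = thy + by * block_y; y < rows; y += stride_y)
                        for (int x = thx + bx * block_x; x < cols; x += stride_x)
                            ++visits[static_cast<std::size_t>(y) * cols + x];
                }
    return visits;
}
```

For example, a fixed 4 × 4 grid of 32 × 16 blocks visits every pixel of a 500 × 500 image exactly once, with each thread handling at most 4 columns × 8 rows (up to 32 pixels).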

In the textbook example of one pixel per thread, you would do something like:

```cuda
// one block with 32*16 = 512 threads
const int DIM_X = 32;
const int DIM_Y = 16;

// round up, so that a partially filled block covers the last columns/rows
int X_BLOCKS = (DIM_X + cols - 1)/DIM_X;
int Y_BLOCKS = (DIM_Y + rows - 1)/DIM_Y;

dim3 grid(X_BLOCKS, Y_BLOCKS);
dim3 block(DIM_X, DIM_Y);

your_kernel<<<grid, block>>>(....., rows, cols);
```
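Because the block counts are rounded up, the grid usually contains more threads than there are pixels. The plain C++ sketch below computes the same configuration as the snippet above and counts the surplus threads that a boundary check in the kernel has to filter out. `LaunchDims`, `one_pixel_per_thread` and `idle_threads` are hypothetical names, with `LaunchDims` standing in for the two `dim3` values.

```cpp
#include <cassert>

// Plain C++ stand-in for dim3 grid(...) / dim3 block(...) (hypothetical).
struct LaunchDims {
    int grid_x, grid_y;    // number of blocks in x and y
    int block_x, block_y;  // threads per block in x and y
};

// One thread per pixel: same rounded-up division as X_BLOCKS / Y_BLOCKS.
LaunchDims one_pixel_per_thread(int rows, int cols, int block_x = 32, int block_y = 16)
{
    assert(rows > 0 && cols > 0 && block_x > 0 && block_y > 0);
    return { (block_x + cols - 1) / block_x,
             (block_y + rows - 1) / block_y,
             block_x, block_y };
}

// Threads launched beyond the rows * cols pixels; these do no useful work.
long long idle_threads(const LaunchDims& d, int rows, int cols)
{
    const long long launched = 1LL * d.grid_x * d.block_x * d.grid_y * d.block_y;
    return launched - 1LL * rows * cols;
}
```

For a 500 × 500 image this gives 16 × 32 blocks (262,144 threads for 250,000 pixels, so 12,144 idle threads); for 1920 columns × 1080 rows it gives 60 × 68 blocks and 15,360 idle threads.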

Inside `your_kernel` you also need a boundary check, because rounding the block counts up can launch threads whose coordinates lie outside the image:

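```cuda
__global__ void your_kernel(...., int rows, int cols)
{
    // global pixel coordinates of this thread
    int tx = threadIdx.x + blockIdx.x*blockDim.x;   // column
    int ty = threadIdx.y + blockIdx.y*blockDim.y;   // row

    // skip threads of the rounded-up grid that fall outside the image
    if (tx < cols && ty < rows)
    {
        // your neat processing happens here
    }
}
```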
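To check that the index arithmetic plus the boundary check touches every pixel exactly once, the kernel's mapping can be emulated on the host. In this sketch (plain C++, hypothetical function name `one_pixel_visits`), ordinary loop variables play the roles of `blockIdx` and `threadIdx`, and the block counts use the same rounded-up division as `X_BLOCKS`/`Y_BLOCKS`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the one-pixel-per-thread kernel's indexing.
// Returns how many times each pixel is processed (row-major order).
std::vector<int> one_pixel_visits(int rows, int cols, int dim_x, int dim_y)
{
    assert(rows > 0 && cols > 0 && dim_x > 0 && dim_y > 0);
    const int x_blocks = (dim_x + cols - 1) / dim_x;
    const int y_blocks = (dim_y + rows - 1) / dim_y;
    std::vector<int> visits(static_cast<std::size_t>(rows) * cols, 0);

    for (int by = 0; by < y_blocks; ++by)
        for (int bx = 0; bx < x_blocks; ++bx)
            for (int thy = 0; thy < dim_y; ++thy)
                for (int thx = 0; thx < dim_x; ++thx) {
                    int tx = thx + bx * dim_x;   // threadIdx.x + blockIdx.x*blockDim.x
                    int ty = thy + by * dim_y;   // threadIdx.y + blockIdx.y*blockDim.y
                    if (tx < cols && ty < rows)  // the boundary check
                        ++visits[static_cast<std::size_t>(ty) * cols + tx];
                }
    return visits;
}
```

With `DIM_X = 32` and `DIM_Y = 16`, every entry of the result is 1 for a 500 × 500 image. Without the `if`, the surplus threads would write outside the image (into the next row, or past the end of the buffer), which on the GPU means out-of-bounds memory accesses.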

A few more questions on this:

Following your code snippet, consider a 500 × 500 image (rows = cols = 500):

X_BLOCKS = (32 + 500 - 1)/32 = 531/32 ≈ 16.59

Y_BLOCKS = (16 + 500 - 1)/16 = 515/16 ≈ 32.19

So how are these fractional values handled for X_BLOCKS and Y_BLOCKS? Will X_BLOCKS be 16 or 17?
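In C and C++ (and therefore in CUDA), `/` applied to two `int` operands is integer division and discards the fractional part, so no floating-point value is produced: `X_BLOCKS` is 16 and `Y_BLOCKS` is 32. Adding `DIM - 1` before dividing turns that truncation into rounding up, which is why 16 blocks of 32 threads (512 columns) are enough for 500 columns, while a plain `500/32` would give only 15 blocks (480 columns). A small host-side check, with hypothetical helper names:

```cpp
#include <cassert>

// Integer division truncates: 531 / 32 == 16, not 16.59.
int plain_blocks(int extent, int dim)
{
    assert(extent >= 0 && dim > 0);
    return extent / dim;
}

// Rounded-up form used for X_BLOCKS / Y_BLOCKS in the launch code.
int rounded_up_blocks(int extent, int dim)
{
    assert(extent >= 0 && dim > 0);
    return (dim + extent - 1) / dim;
}
```

Here `rounded_up_blocks(500, 32)` is 16 and `rounded_up_blocks(500, 16)` is 32, while `plain_blocks(500, 32)` is 15 and would leave the last 20 columns unprocessed.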

```cuda
int tx = threadIdx.x + blockIdx.x*blockDim.x;
int ty = threadIdx.x + blockIdx.y*blockDim.y;
```

should be

```cuda
int tx = threadIdx.x + blockIdx.x*blockDim.x;
int ty = threadIdx.y + blockIdx.y*blockDim.y;
```

Is this correct?
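The effect of using `threadIdx.x` in the `ty` formula can be checked with a host-side emulation of the index mapping. In this sketch (plain C++; `Coverage`, `coverage` and the `use_thread_x_for_ty` flag are hypothetical), the first version maps all `DIM_Y` threads that share a `threadIdx.x` onto the same pixel, so each 32 × 16 block writes at most 32 distinct pixels, each of them 16 times, and most of the image is never processed. The second version covers each pixel exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Coverage {
    long long missed;      // pixels never processed
    long long duplicated;  // pixels processed by more than one thread
};

// Emulates the one-pixel-per-thread launch on the host and counts coverage
// problems. use_thread_x_for_ty selects the first formula (threadIdx.x in ty).
Coverage coverage(int rows, int cols, int dim_x, int dim_y, bool use_thread_x_for_ty)
{
    assert(rows > 0 && cols > 0 && dim_x > 0 && dim_y > 0);
    const int x_blocks = (dim_x + cols - 1) / dim_x;
    const int y_blocks = (dim_y + rows - 1) / dim_y;
    std::vector<int> visits(static_cast<std::size_t>(rows) * cols, 0);

    for (int by = 0; by < y_blocks; ++by)
        for (int bx = 0; bx < x_blocks; ++bx)
            for (int thy = 0; thy < dim_y; ++thy)
                for (int thx = 0; thx < dim_x; ++thx) {
                    int tx = thx + bx * dim_x;
                    int ty = (use_thread_x_for_ty ? thx : thy) + by * dim_y;
                    if (tx < cols && ty < rows)
                        ++visits[static_cast<std::size_t>(ty) * cols + tx];
                }

    Coverage c{0, 0};
    for (int v : visits) {
        if (v == 0) ++c.missed;
        else if (v > 1) ++c.duplicated;
    }
    return c;
}
```

For a 500 × 500 image with 32 × 16 blocks, `coverage(..., false)` reports no missed or duplicated pixels, while `coverage(..., true)` reports well over half of the pixels as missed.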

How can I find the exact GPU usage percentage on a Jetson TK1?

Yes, that is correct; I had a typo.