Please read the first paragraph on page 8 (above Figure 6) of the convolution separable document. They are using 16x48 threads in a thread block. My question is: how is that possible? A thread block can hold at most 512 threads, but they are using 16x48 = 768 threads. Or have I not understood it correctly? Please tell me what the thread block size is in this case.
In the SDK, the thread block size they used for the column kernel is:
#define COLUMNS_BLOCKDIM_X 16
#define COLUMNS_BLOCKDIM_Y 8
And for the row kernel they used:
#define ROWS_BLOCKDIM_X 16
#define ROWS_BLOCKDIM_Y 4
I am not able to understand the 16x48 thread block given in the PDF.
Thanks in advance