About the DCT kernel: some questions about CUDA

__constant__ float DCTv8matrix[] = {

	0.3535533905932738f,  0.4903926402016152f,  0.4619397662556434f,  0.4157348061512726f,  0.3535533905932738f,  0.2777851165098011f,  0.1913417161825449f,  0.0975451610080642f, 

	0.3535533905932738f,  0.4157348061512726f,  0.1913417161825449f, -0.0975451610080641f, -0.3535533905932737f, -0.4903926402016152f, -0.4619397662556434f, -0.2777851165098011f, 

	0.3535533905932738f,  0.2777851165098011f, -0.1913417161825449f, -0.4903926402016152f, -0.3535533905932738f,  0.0975451610080642f,  0.4619397662556433f,  0.4157348061512727f, 

	0.3535533905932738f,  0.0975451610080642f, -0.4619397662556434f, -0.2777851165098011f,  0.3535533905932737f,  0.4157348061512727f, -0.1913417161825450f, -0.4903926402016153f, 

	0.3535533905932738f, -0.0975451610080641f, -0.4619397662556434f,  0.2777851165098009f,  0.3535533905932738f, -0.4157348061512726f, -0.1913417161825453f,  0.4903926402016152f, 

	0.3535533905932738f, -0.2777851165098010f, -0.1913417161825452f,  0.4903926402016153f, -0.3535533905932733f, -0.0975451610080649f,  0.4619397662556437f, -0.4157348061512720f, 

	0.3535533905932738f, -0.4157348061512727f,  0.1913417161825450f,  0.0975451610080640f, -0.3535533905932736f,  0.4903926402016152f, -0.4619397662556435f,  0.2777851165098022f, 

	0.3535533905932738f, -0.4903926402016152f,  0.4619397662556433f, -0.4157348061512721f,  0.3535533905932733f, -0.2777851165098008f,  0.1913417161825431f, -0.0975451610080625f

};

What is this matrix for?

__global__ void CUDAkernel1DCT(float *Dst, int ImgWidth, int OffsetXBlocks, int OffsetYBlocks)

{

    // Block index

	const int bx = blockIdx.x + OffsetXBlocks;

	const int by = blockIdx.y + OffsetYBlocks;

As I understand it, each CUDA block has an index (x, y) in the grid. Now we want the index of the 8x8 pixel block (not the CUDA block index). But as mentioned in the DCT documentation, there is one CUDA block for every macroblock, which consists of 4 or 16 vertical 8x8 pixel blocks.

So how do we get the index of the right 8x8 pixel block?

// Texture coordinates

	const float tex_x = (float)( (bx << BLOCK_SIZE_LOG2) + tx ) + 0.5f;

	const float tex_y = (float)( (by << BLOCK_SIZE_LOG2) + ty ) + 0.5f;

Can you explain what the << operator means? In other words, what does the expression

bx << BLOCK_SIZE_LOG2

do?

And what is the “0.5f” for?

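<< is a bit shift to the left. For non-negative integers it gives the same result as multiplying by a power of two, and it has traditionally been used because a shift is cheaper than a multiply (compilers usually make this substitution on their own for constant powers of two).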

x<<1 = x*2

x<<2 = x*4

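x<<log2(n) = x*n (when n is a power of two)

So bx << BLOCK_SIZE_LOG2 is bx * BLOCK_SIZE, i.e. bx * 8 if BLOCK_SIZE_LOG2 is 3, as it would be for 8x8 blocks.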

0.5f is added so that the coordinate points at the center of the texel rather than at its corner.
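To make the two points above concrete, here is a small host-side sketch (plain C++, so it can also be pasted into a .cu file). The value 3 for BLOCK_SIZE_LOG2 is an assumption based on the 8x8 block size, and the helper names block_origin and texel_center are made up for this illustration; they are not part of the sample.

```cpp
// Assumed value: 8x8 pixel blocks, so log2(8) = 3.
const int BLOCK_SIZE_LOG2 = 3;

// First pixel covered by block `b`: b << 3 gives the same result as b * 8.
int block_origin(int b)
{
    return b << BLOCK_SIZE_LOG2;
}

// Texture coordinate of pixel `t` inside block `b`. The extra 0.5f moves
// the coordinate from the corner of the texel to its center.
float texel_center(int b, int t)
{
    return (float)(block_origin(b) + t) + 0.5f;
}
```

For example, block 2 starts at pixel 16, so pixel 3 inside it is pixel 19 of the image, and the texture is sampled at 19.5.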

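I haven’t checked the numbers, but my guess is that they’re the cosine coefficients used to perform the 2D DCT. The 2D DCT of an 8x8 block is separable, so it can be computed as two matrix multiplies with such a coefficient matrix (one along the rows, one along the columns).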
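One way to check this is to generate the coefficients and compare them with the table. The sketch below assumes the usual orthonormal 8-point DCT-II and a layout where the row is the sample index n and the column is the frequency index k. The layout is inferred from the values in the table, and the function name build_dct8_matrix is made up for this example.

```cpp
#include <cmath>

const int DCT_SIZE = 8;

// Fill m (row-major, 64 floats) with
//   m[n * 8 + k] = c(k) * cos((2n + 1) * k * pi / 16)
// where c(0) = sqrt(1/8) and c(k) = sqrt(2/8) = 0.5 for k > 0.
void build_dct8_matrix(float m[DCT_SIZE * DCT_SIZE])
{
    const double pi = 3.14159265358979323846;
    for (int n = 0; n < DCT_SIZE; ++n) {
        for (int k = 0; k < DCT_SIZE; ++k) {
            double scale = (k == 0) ? std::sqrt(1.0 / DCT_SIZE)
                                    : std::sqrt(2.0 / DCT_SIZE);
            double angle = (2 * n + 1) * k * pi / (2.0 * DCT_SIZE);
            m[n * DCT_SIZE + k] = (float)(scale * std::cos(angle));
        }
    }
}
```

With this layout the first row comes out as 0.35355, 0.49039, 0.46194, 0.41573, ... which agrees with the first row of DCTv8matrix to float precision.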

However, the amount of computation can be reduced at the cost of higher register usage. Search for “AAN DCT” (the Arai-Agui-Nakajima fast DCT).

Hope that helps.