Questions regarding the DCT8x8 sample

fontbona · November 23, 2013, 1:36am

Hello, I have some questions about the NVIDIA sample called DCT8x8, which applies the DCT to an image in parallel. More info: http://developer.download.nvidia.com/compute/DevZone/C/html/C/src/dct8x8/doc/dct8x8.pdf

The code executes the forward DCT and its inverse on a BMP image.

My first question is: is there a way to calculate only the forward transform to obtain the JPG?

Second, there are several parts of the code that I don’t understand. I hope someone who is familiar with DCT and CUDA can help me with them.

First: in the file dct8x8_gold.cpp the program uses the following matrices:

const float DCTv8matrix[BLOCK_SIZE2] =
{
    0.3535533905932738f,  0.4903926402016152f,  0.4619397662556434f,  0.4157348061512726f,  0.3535533905932738f,  0.2777851165098011f,  0.1913417161825449f,  0.0975451610080642f,
    0.3535533905932738f,  0.4157348061512726f,  0.1913417161825449f, -0.0975451610080641f, -0.3535533905932737f, -0.4903926402016152f, -0.4619397662556434f, -0.2777851165098011f,
    0.3535533905932738f,  0.2777851165098011f, -0.1913417161825449f, -0.4903926402016152f, -0.3535533905932738f,  0.0975451610080642f,  0.4619397662556433f,  0.4157348061512727f,
    0.3535533905932738f,  0.0975451610080642f, -0.4619397662556434f, -0.2777851165098011f,  0.3535533905932737f,  0.4157348061512727f, -0.1913417161825450f, -0.4903926402016153f,
    0.3535533905932738f, -0.0975451610080641f, -0.4619397662556434f,  0.2777851165098009f,  0.3535533905932738f, -0.4157348061512726f, -0.1913417161825453f,  0.4903926402016152f,
    0.3535533905932738f, -0.2777851165098010f, -0.1913417161825452f,  0.4903926402016153f, -0.3535533905932733f, -0.0975451610080649f,  0.4619397662556437f, -0.4157348061512720f,
    0.3535533905932738f, -0.4157348061512727f,  0.1913417161825450f,  0.0975451610080640f, -0.3535533905932736f,  0.4903926402016152f, -0.4619397662556435f,  0.2777851165098022f,
    0.3535533905932738f, -0.4903926402016152f,  0.4619397662556433f, -0.4157348061512721f,  0.3535533905932733f, -0.2777851165098008f,  0.1913417161825431f, -0.0975451610080625f
};

const float DCTv8matrixT[BLOCK_SIZE2] =
{
    0.3535533905932738f,  0.3535533905932738f,  0.3535533905932738f,  0.3535533905932738f,  0.3535533905932738f,  0.3535533905932738f,  0.3535533905932738f,  0.3535533905932738f,
    0.4903926402016152f,  0.4157348061512726f,  0.2777851165098011f,  0.0975451610080642f, -0.0975451610080641f, -0.2777851165098010f, -0.4157348061512727f, -0.4903926402016152f,
    0.4619397662556434f,  0.1913417161825449f, -0.1913417161825449f, -0.4619397662556434f, -0.4619397662556434f, -0.1913417161825452f,  0.1913417161825450f,  0.4619397662556433f,
    0.4157348061512726f, -0.0975451610080641f, -0.4903926402016152f, -0.2777851165098011f,  0.2777851165098009f,  0.4903926402016153f,  0.0975451610080640f, -0.4157348061512721f,
    0.3535533905932738f, -0.3535533905932737f, -0.3535533905932738f,  0.3535533905932737f,  0.3535533905932738f, -0.3535533905932733f, -0.3535533905932736f,  0.3535533905932733f,
    0.2777851165098011f, -0.4903926402016152f,  0.0975451610080642f,  0.4157348061512727f, -0.4157348061512726f, -0.0975451610080649f,  0.4903926402016152f, -0.2777851165098008f,
    0.1913417161825449f, -0.4619397662556434f,  0.4619397662556433f, -0.1913417161825450f, -0.1913417161825453f,  0.4619397662556437f, -0.4619397662556435f,  0.1913417161825431f,
    0.0975451610080642f, -0.2777851165098011f,  0.4157348061512727f, -0.4903926402016153f,  0.4903926402016152f, -0.4157348061512720f,  0.2777851165098022f, -0.0975451610080625f
};

float Q[BLOCK_SIZE2] =
{
    32.f,  33.f,  51.f,  81.f,  66.f,  39.f,  34.f,  17.f,
    33.f,  36.f,  48.f,  47.f,  28.f,  23.f,  12.f,  12.f,
    51.f,  48.f,  47.f,  28.f,  23.f,  12.f,  12.f,  12.f,
    81.f,  47.f,  28.f,  23.f,  12.f,  12.f,  12.f,  12.f,
    66.f,  28.f,  23.f,  12.f,  12.f,  12.f,  12.f,  12.f,
    39.f,  23.f,  12.f,  12.f,  12.f,  12.f,  12.f,  12.f,
    34.f,  12.f,  12.f,  12.f,  12.f,  12.f,  12.f,  12.f,
    17.f,  12.f,  12.f,  12.f,  12.f,  12.f,  12.f,  12.f
};

float C_a = 1.387039845322148f; //!< a = (2^0.5) * cos(    pi / 16);  Used in forward and inverse DCT.
float C_b = 1.306562964876377f; //!< b = (2^0.5) * cos(    pi /  8);  Used in forward and inverse DCT.
float C_c = 1.175875602419359f; //!< c = (2^0.5) * cos(3 * pi / 16);  Used in forward and inverse DCT.
float C_d = 0.785694958387102f; //!< d = (2^0.5) * cos(5 * pi / 16);  Used in forward and inverse DCT.
float C_e = 0.541196100146197f; //!< e = (2^0.5) * cos(3 * pi /  8);  Used in forward and inverse DCT.
float C_f = 0.275899379282943f; //!< f = (2^0.5) * cos(7 * pi / 16);  Used in forward and inverse DCT.

Can someone please explain why these values are used?

Also, in the file dct8x8_kernel_quantization.cu there is another Q matrix. My guess is that it specifies the quantization thresholds. If so, why those values?

__constant__ short Q[] =
{
    32,  33,  51,  81,  66,  39,  34,  17,
    33,  36,  48,  47,  28,  23,  12,  12,
    51,  48,  47,  28,  23,  12,  12,  12,
    81,  47,  28,  23,  12,  12,  12,  12,
    66,  28,  23,  12,  12,  12,  12,  12,
    39,  23,  12,  12,  12,  12,  12,  12,
    34,  12,  12,  12,  12,  12,  12,  12,
    17,  12,  12,  12,  12,  12,  12,  12
};

My last question: I have the feeling that these values are specific to the “barbara.bmp” image. If that is true, I will not be able to use an image other than the default one, which is what I want to do, besides understanding the code.

Thank you very much for your help!

Saul
