DCT calculation (full-field)

Dear all,

How would I best implement the standard discrete (or fast) cosine transform using CUDA? I am aware of the DCT8X8 example in the SDK. However, I would like to perform a ‘full-field’ version of it: a single transform over the whole 640x480 matrix, not one that works on local 8x8 blocks.

Has this been done before? What would be the best approach: a direct implementation, or some sort of adaptation of the CUFFT routines?

Thanks in advance,




There are many implementations and research papers.

Thanks for your response, but those are examples of local DCTs, i.e. DCTs on blocks of 8x8 or 16x16 pixels, all executed in parallel. These are often used in data compression, hence the large amount of information you can find on them.

What I’m looking for is a DCT that transforms all elements of the image in one go. My application requires this because the images I’m working on have poor local frequency modulation, but significant frequency modulation across the entire image.

So what I’m looking for is an algorithm that does what CUFFT does, except that its basis functions are only the real (cosine) waves rather than the complex ones…
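
To make the "adapt an FFT routine" idea concrete, here is a minimal sketch of one well-known identity: the unnormalized DCT-II of a length-N signal can be read off the DFT of the signal mirrored to length 2N, after a phase correction. It is written in plain Python for readability, not as a CUDA implementation. The function names are hypothetical, and the naive `dft` is only a stand-in for a real FFT library call (on the GPU that would presumably be a CUFFT transform). The 2D version assumes the transform is applied separably, first along rows and then along columns.

```python
import cmath
import math


def dft(seq):
    """Naive O(N^2) DFT; a stand-in for an FFT library call."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dct_ii_direct(x):
    """Unnormalized DCT-II from its definition, used as a reference."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
        for k in range(n)
    ]


def dct_ii_via_fft(x):
    """Unnormalized DCT-II from the DFT of the mirrored signal."""
    n = len(x)
    # Even extension: x followed by its reverse, length 2N.
    mirrored = list(x) + list(reversed(x))
    spectrum = dft(mirrored)
    # Undo the half-sample phase shift and keep the real part.
    return [
        0.5 * (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


def dct_ii_2d(image):
    """Separable full-field 2D DCT-II: transform rows, then columns."""
    rows = [dct_ii_via_fft(row) for row in image]
    cols = [dct_ii_via_fft(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]
```

For a 640x480 image, the same structure would mean one batch of 1D transforms along the rows and a second batch along the columns. The mirroring doubles the transform length, so this is a sketch of the principle rather than the most memory-efficient formulation.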

Attempt to bump again. Anyone?