Know of a 2D DCT library?

I’m doing some performance and algorithm optimization.

In addition to having better conditions, on paper my algorithm would run faster with a tuned implementation of the 2D DCT (Discrete Cosine Transform). This is mostly motivated by needing to do a frequency shift on real-valued input data.

Does anybody know of a tuned, supported implementation of the 2D DCT that I could drop into my code in the same way people use cuFFTW? I understand how to roll my own, but I worry that it won’t be optimized.

The NPP library has various functions to facilitate DCTs:

NVIDIA 2D Image and Signal Processing Performance Primitives (NPP)

and the JPEG sample app may help you get started:

CUDA Samples :: CUDA Toolkit Documentation

I took a look at NPP, and its DCTs seem to be aimed at JPEG and to have fixed sizes, while I want to do a cosine transform on an arbitrary (or large) size image, in the same way I would use MATLAB’s dct2 command.

For example, I see in NPP:

7.50.2.2 NppStatus nppiDCTInitAlloc (NppiDCTState**  ppState)
Input is expected in 8x8 macro blocks and output is expected to be in 64x1 macro blocks.

What I want is:

http://www.mathworks.com/help/images/ref/dct2.html
B = dct2(A) returns the two-dimensional discrete cosine transform of A. The matrix B is the same size as A and contains the discrete cosine transform coefficients B(k1,k2).
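
To pin down what "the same as dct2" means, here is a minimal reference sketch in plain Python. It assumes the orthonormal 2D DCT-II convention that the MathWorks page describes (scale factor sqrt(1/M) for the zero-frequency row or column, sqrt(2/M) otherwise). The name dct2_reference is made up for illustration, and the quadruple loop is only meant as a correctness baseline for checking a tuned version, not as something to run on large images.

```python
import math


def dct2_reference(a):
    """Direct orthonormal 2D DCT-II of a list-of-lists matrix (any M x N size).

    Hypothetical helper: O(M^2 * N^2), intended only as a correctness check.
    """
    m, n = len(a), len(a[0])

    def alpha(k, size):
        # Orthonormal scaling: sqrt(1/size) for k == 0, sqrt(2/size) otherwise.
        return math.sqrt((1.0 if k == 0 else 2.0) / size)

    b = [[0.0] * n for _ in range(m)]
    for p in range(m):
        for q in range(n):
            acc = 0.0
            for i in range(m):
                ci = math.cos(math.pi * (2 * i + 1) * p / (2 * m))
                for j in range(n):
                    cj = math.cos(math.pi * (2 * j + 1) * q / (2 * n))
                    acc += a[i][j] * ci * cj
            b[p][q] = alpha(p, m) * alpha(q, n) * acc
    return b
```

With this scaling the transform preserves the sum of squares of the input, which is a cheap sanity check: dct2_reference([[1.0, 2.0], [3.0, 4.0]]) comes out as approximately [[5, -1], [-2, 0]], and both matrices have a sum of squares of 30.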

Hi misha695,

Have you resolved your issue (with a well-optimized DCT/IDCT built on cuFFT)?
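
For anyone reading along, the usual way to get a DCT out of an FFT library is an even/odd reordering of the input followed by one same-length FFT and a phase twiddle (often attributed to Makhoul). Below is a rough sketch of that idea in plain Python, not a tuned implementation. The names dft, dct_via_fft and dct2_via_fft are made up for illustration, and the naive dft function is only a stand-in for the spot where a real FFT call (for example a cuFFT plan on the GPU) would go. The output uses the same orthonormal scaling as dct2.

```python
import cmath
import math


def dft(v):
    """Naive O(N^2) DFT. Placeholder only: swap in a real FFT library call here."""
    n = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def dct_via_fft(x):
    """Orthonormal 1D DCT-II of a real sequence using one length-N DFT."""
    n = len(x)
    v = [0.0] * n
    # Reorder: even-indexed samples first, odd-indexed samples reversed at the end.
    for i in range((n + 1) // 2):
        v[i] = x[2 * i]
    for i in range(n // 2):
        v[n - 1 - i] = x[2 * i + 1]
    spec = dft(v)
    out = []
    for k in range(n):
        # Quarter-sample phase shift turns the DFT bin into a cosine sum.
        twiddle = cmath.exp(-1j * cmath.pi * k / (2 * n))
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * (spec[k] * twiddle).real)
    return out


def dct2_via_fft(a):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_via_fft(list(r)) for r in a]
    cols = [dct_via_fft(list(c)) for c in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]
```

Because the 2D transform is separable, the row pass and the column pass are each a batch of independent 1D transforms, which is the shape batched FFT plans are designed for. The reordering and twiddle steps are simple elementwise operations that would be small kernels of their own in a GPU version.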