I’m getting back into GPU computing and need to solve a problem that involves parallel calculations on image sequences.
For each calculation I need to access the color values of a range of images and compute statistics on them.
Since all my kernels need to access several pictures, what is the best way to access them? Would all of the pictures be copied to memory on the card? What are my options for accessing the color values? Are there existing methods to read them with sub-pixel accuracy (using linear interpolation in hardware)?
Any advice on how to start on a problem like this would be greatly appreciated, as would pointers to tutorials or examples covering similar problems!
So, depending on the format of your image input, you’ll first want to access the raw RGB raster, which is essentially one long, contiguous block of memory representing an RGB matrix.
The pixels of the raster can, for example, be stored interleaved as RGBARGBARGBA…, where A stands for the alpha channel. Note that some webcams deliver the channels in a different order (e.g. BRG), and often without an A channel.
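To make the layout concrete, here is a minimal sketch of how one channel byte of a pixel is located in a tightly packed interleaved raster. `PixelLayout`, `channelOffset`, and `redAt` are hypothetical names for illustration. The sketch assumes 8 bits per channel and no padding between rows; many capture APIs pad each row to a stride, in which case `width * channels` has to be replaced by that stride.

```cuda
#include <cstddef>
#include <cstdint>

// Hypothetical description of an interleaved 8-bit pixel format: bytes per
// pixel, plus the byte position of each channel inside one pixel.
struct PixelLayout {
    int channels;  // bytes per pixel: 4 for RGBA, 3 for BRG
    int r, g, b;   // position of each colour byte within a pixel
    int a;         // position of the alpha byte, or -1 if there is none
};

constexpr PixelLayout kRGBA = {4, 0, 1, 2, 3};   // R G B A R G B A ...
constexpr PixelLayout kBRG  = {3, 1, 2, 0, -1};  // B R G B R G ... (no alpha)

// Byte offset of one channel of pixel (x, y) in a tightly packed raster,
// i.e. rows follow each other with no padding (row pitch = width * channels).
__host__ __device__ inline size_t channelOffset(int x, int y, int width,
                                                PixelLayout layout, int channelPos) {
    return (static_cast<size_t>(y) * width + x) * layout.channels + channelPos;
}

// Read the red value of pixel (x, y), whatever the channel order is.
__host__ __device__ inline uint8_t redAt(const uint8_t* raster, int x, int y,
                                         int width, PixelLayout layout) {
    return raster[channelOffset(x, y, width, layout, layout.r)];
}
```

Because the channel order is data rather than code, the same kernel can serve an RGBA image file and a BRG webcam frame.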
So I suggest extracting the RGB channels on the host side, whether they come from a camera, image files, or recorded footage, using the appropriate API, and then loading the entire raster of each channel onto the GPU.
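A minimal host-side sketch of that step follows. It assumes the frame has already been decoded into an interleaved 8-bit buffer by whatever camera or image API you use. `splitChannels` and `uploadPlane` are hypothetical helpers: the first converts the buffer to three planar `float` channels in [0, 1] for an arbitrary channel order, and the second copies one plane into device memory with `cudaMalloc`/`cudaMemcpy`.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Split an interleaved 8-bit raster (e.g. RGBA or BRG) into three planar float
// buffers normalised to [0, 1]. rOff/gOff/bOff are the byte positions of each
// colour inside one pixel, so any channel order can be handled.
void splitChannels(const uint8_t* src, int width, int height, int channels,
                   int rOff, int gOff, int bOff,
                   std::vector<float>& r, std::vector<float>& g, std::vector<float>& b) {
    const size_t n = static_cast<size_t>(width) * height;
    r.resize(n);
    g.resize(n);
    b.resize(n);
    for (size_t i = 0; i < n; ++i) {
        const uint8_t* px = src + i * channels;  // start of pixel i
        r[i] = px[rOff] / 255.0f;
        g[i] = px[gOff] / 255.0f;
        b[i] = px[bOff] / 255.0f;
    }
}

// Copy one planar channel into a fresh device buffer; returns nullptr on failure.
// The caller owns the returned pointer and must release it with cudaFree.
float* uploadPlane(const std::vector<float>& plane) {
    float* d = nullptr;
    const size_t bytes = plane.size() * sizeof(float);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return nullptr;
    if (cudaMemcpy(d, plane.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return nullptr;
    }
    return d;
}
```

For a BRG source you would call `splitChannels(frame, w, h, 3, 1, 2, 0, r, g, b)`, and for RGBA `splitChannels(frame, w, h, 4, 0, 1, 2, r, g, b)`. Converting to `float` on the host keeps the kernels simple but transfers four times as many bytes as the 8-bit source; doing the conversion on the device instead is an equally valid choice.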
You would allocate the images as CUDA arrays, copy the data from the host into those arrays on the device, bind them as textures, and then sample them in the kernel with tex2D (which, yes, accepts sub-pixel coordinates and can optionally apply hardware bilinear filtering). There are plenty of samples that do this sort of thing; search the samples directory for e.g. “cudaBindTexture”.
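The sketch below shows the array → texture → `tex2D` path. `cudaBindTexture` belongs to the older texture-reference API, which was deprecated and then removed in CUDA 12, so this sketch uses texture objects (`cudaCreateTextureObject`) instead. `makeLinearTexture` and `sampleShifted` are hypothetical names, error checking is omitted for brevity, and the image is assumed to be a single-channel `float` plane, such as one channel uploaded as described above.

```cuda
#include <cuda_runtime.h>

// Sample every pixel shifted by a fractional offset (dx, dy), in pixels.
// With cudaFilterModeLinear the texture unit bilinearly interpolates the four
// nearest texels in hardware. The +0.5f is needed because texel centres lie
// at half-integer coordinates when normalizedCoords is 0.
__global__ void sampleShifted(cudaTextureObject_t tex, float* out,
                              int width, int height, float dx, float dy) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    out[y * width + x] = tex2D<float>(tex, x + dx + 0.5f, y + dy + 0.5f);
}

// Copy a host float image into a CUDA array and wrap it in a texture object
// configured for bilinear filtering with clamp-to-edge addressing.
cudaTextureObject_t makeLinearTexture(const float* hostImg, int width, int height,
                                      cudaArray_t* outArray) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaMallocArray(outArray, &desc, width, height);
    cudaMemcpy2DToArray(*outArray, 0, 0, hostImg, width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *outArray;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;  // clamp x outside [0, width)
    td.addressMode[1] = cudaAddressModeClamp;  // clamp y outside [0, height)
    td.filterMode = cudaFilterModeLinear;      // hardware bilinear interpolation
    td.readMode = cudaReadModeElementType;     // return the stored float values
    td.normalizedCoords = 0;                   // coordinates in pixels, not [0, 1]

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}
```

The hardware computes the interpolation weights in low-precision fixed point (8 fractional bits), so if you need more accurate sub-pixel values, doing the bilinear interpolation manually in the kernel may be preferable. Release the resources with `cudaDestroyTextureObject` and `cudaFreeArray` when you are done.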
I’d also take a look at the NVIDIA Performance Primitives library (https://developer.nvidia.com/npp), which provides optimised functions for common operations such as histograms.
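NPP’s histogram functions are the first thing to try. For comparison, or if you need a statistic NPP does not provide, a hand-written 256-bin histogram of one 8-bit channel fits in a short kernel. This is a generic shared-memory technique, not NPP’s implementation, and `histogram8u` is a hypothetical name.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

constexpr int kBins = 256;  // one bin per possible 8-bit value

// Each block builds a private histogram in shared memory (cheap atomics), then
// merges it into the global histogram once. A grid-stride loop lets any grid
// size cover all n pixels.
__global__ void histogram8u(const unsigned char* data, size_t n, unsigned int* hist) {
    __shared__ unsigned int local[kBins];
    for (int i = threadIdx.x; i < kBins; i += blockDim.x) local[i] = 0;
    __syncthreads();

    for (size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x; i < n;
         i += static_cast<size_t>(blockDim.x) * gridDim.x) {
        atomicAdd(&local[data[i]], 1u);
    }
    __syncthreads();

    // Merge this block's counts; skipping empty bins saves global atomics.
    for (int i = threadIdx.x; i < kBins; i += blockDim.x) {
        if (local[i] != 0) atomicAdd(&hist[i], local[i]);
    }
}
```

The global `hist` buffer must be zeroed (e.g. with `cudaMemset`) before the launch, since the kernel only adds to it.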
You might also consider using OpenCV (http://opencv.org/) rather than raw CUDA. It’s easier to work with and has a very large number of built-in functions, from primitives like add/subtract through to highly complex image processing operations. You may find what you need there, or be able to build it by composing simpler operations.