Best Nvidia GPU for Image Rotation

Hi All,

I am not sure if this is the correct place to post this question; if not, please point me to the correct location.

We are currently developing an application that requires fast (1 sec or less) image rotation between -5 and +5 degrees. The images are 1-bit TIFFs and can be up to 2GB in size (20in x 30in @ 5080 dpi). We wish to use the CUDA platform for the software side but have not yet been able to spec the right GPU. Any suggestions will be greatly appreciated.

Thanks and all the best,

AK

This task will probably be memory-bandwidth bound, at least on GPUs of compute capability 2.0 or higher. So look at the memory throughput needed (about 4GB/s: 2GB read plus 2GB written within one second) and compare it with the documented values in the specs of the GPUs you are considering. Leave a bit of safety margin. The __ballot() intrinsic function of CC 2.0+ devices will probably come in handy for generating the 1-bit output image.

Do you need to transfer the images over PCIe and back, and decompress/compress them, all within the 1 sec? If so, you will want to spec the host system accordingly as well.

For an overview of NVIDIA GPUs, have a look at this wiki page: http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units.

As tera said, if you want the PCIe transfer to fit within 1 second, you should spec the host system as well. PCIe v3 will come in handy in this case, which means you are limited to the latest GTX 600 series (and Titan, of course).
A GPU with more than 4GB of memory (2GB for input, 2GB for output, plus some extra for driver overhead etc.) is hard to find (only Titan), so you will have to split the image into tiles and process them one at a time. As an extra benefit, you can overlap the computation on one tile with the transfer of another.

With value-for-money in mind, something like the GTX 660 looks like a good candidate for your application.

Hi,

Thank you, tera and Gert-Jan, for your responses; I appreciate your time. I am fairly new to GPU computing and not yet well versed in the lingo.

tera:
The 1 sec spec does NOT include the compression/decompression time but does include the transfer over PCIe time.

Gert-Jan:
One of our requirements is also to have this GPU in a rack mount configuration. Given that this will be part of a larger product, the price of the board is a non-issue.

Breaking up the image into parts and processing it is definitely an option.

Based on this, what would you guys suggest? Also, which platform would be best suited from a programming point of view?

Thanks,

AK

If price is not an issue, and the GPU will be mounted in a rack, I would have a look at Tesla cards (NVIDIA’s professional range of products): http://www.nvidia.com/object/tesla-servers.html.

Unfortunately there are no Tesla cards with PCIe v3, which means you will have to make do with PCIe v2 and its maximum throughput of about 6GB/s. Luckily, if I remember correctly, Tesla cards can transfer in both directions simultaneously (GeForce cards cannot). Any Tesla card with one Fermi or one Kepler GPU will probably be (more than) sufficient for your application.

You can achieve simultaneous bidirectional copies with GeForce cards as well, if at least one of the copies is performed by a kernel (using mapped memory).

That’s a good tip, tera. Thanks!

Hi Guys,

Thank you very much for all the information. Gert-Jan and tera, you have been very helpful, thank you!
Based on your recommendations and those from Nvidia, we have decided to go with the Tesla K10 card, which will suffice for our application.

Thanks again!

AK

They will certainly suffice, given they are the second most expensive cards Nvidia has to offer. ;)

If the price/performance ratio is important to you, there are definitely cheaper cards that would fully meet your requirements.

Hi Tera,

The card is part of a larger imaging system (for direct copper imaging in PCB manufacturing) and is only one small component of the whole. At this point the price of the card is not that critical, and we want to go with the best :)

What software platform/libraries would you recommend for the software side of this image manipulation requirement on the K10?

Thanks,

AK