I write computer vision software and algorithms for image analysis. All my software runs on Linux, CentOS 5.4/5.5 64-bit, with a middle-aged 2.6.18 kernel and some third-party hardware. Everything is deployed on a Dell T5500 workstation with single or dual Xeon processors, a minimum of 12 GB of RAM, and a PCIe bus. I want to move to CUDA, but I have a short list of requirements, which may grow later, and a few questions.
My current main goal is to reduce math operations from ~400 microseconds to as fast as I can get them. (When you do the same operation 360,000 times, 400 microseconds each adds up to roughly 144 seconds.)
- Must all be real time; this runs in a state machine, as close to an RTOS as a standard distro will get.
- Must have C++ support; this is critical - gcc 4.1.
- Java support would be nice, but not critical - Java 1.6.
- Need the ability to do math operations in the CUDA environment, e.g. multiplying matrices as large as 4,920x3,264 using standard data types (char, int, float, double).
- Would be nice to have built-in image manipulation abilities such as binarization, dilation, erosion, rotation, etc.
- Advanced image processing such as edge detection and scaling would be really, really nice.
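For reference, the matrix-math requirement maps fairly directly onto cuBLAS, which ships with the CUDA toolkit. Below is a minimal sketch of a double-precision multiply with the cuBLAS v2 API; the dimensions are illustrative (a 4,920x3,264 matrix times a 3,264x4,920 one), and error checking and host-to-device copies are trimmed for brevity.

```cuda
// Sketch only: C = A * B in double precision via cuBLAS.
// Assumes the CUDA toolkit (with cuBLAS) is installed; compile with nvcc
// and link against -lcublas. Matrices are stored column-major.
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int m = 4920, k = 3264, n = 4920;
    const double alpha = 1.0, beta = 0.0;

    // Device buffers (~128 MB each for these sizes in double precision).
    double *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(double) * m * k);
    cudaMalloc((void**)&dB, sizeof(double) * k * n);
    cudaMalloc((void**)&dC, sizeof(double) * m * n);
    // ... fill dA/dB from host buffers with cudaMemcpy ...

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Note that double-precision throughput varies a lot between GPU generations, so the card choice interacts with this requirement; float is much faster on most consumer parts.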
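Some of the image operations on the wish list are simple enough to write as custom kernels. As a rough sketch, here is a minimal binarize kernel for an 8-bit grayscale image; the names, threshold, and launch configuration are illustrative, not from any particular library.

```cuda
// Sketch only: threshold an 8-bit grayscale image on the GPU,
// one thread per pixel. Compile with nvcc.
#include <cuda_runtime.h>

__global__ void binarize(const unsigned char* in, unsigned char* out,
                         int n, unsigned char threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (in[i] >= threshold) ? 255 : 0;
}

// Example launch for a 4,920 x 3,264 image already resident on the device:
//   int n = 4920 * 3264;
//   binarize<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 128);
//   cudaDeviceSynchronize();
```

For the heavier primitives (dilation, erosion, filtering), NVIDIA's NPP library, which also ships with the toolkit, covers many standard image-processing operations, so those may not need hand-written kernels at all.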
So, knowing the requirements, my questions are:
- Which NVIDIA/CUDA card should I be looking at for best results, and with how much memory?
- Which kernel version would be the most optimal for this environment?
- Which CUDA development environment would be most efficient?
- Which compilers/versions are required to make this work, and which would be best?
- Is there any other third-party debugging/profiling software besides gdb, Valgrind, Electric Fence, and Purify that I should be looking at?
I’d like to get a card on order next week and get started, so any insights would be greatly appreciated!