MATLAB GPU parallel computing, and buying or building an optimal computer for it.

Hi,

I am a researcher looking into ways to speed up our lab's data analysis. We use MATLAB as our programming language.
Our data analysis consists of taking a movie of ~600 images, extracting some regions of interest (ROIs), about 600 per image, and fitting the intensity profiles from those ROIs with a 2D Gaussian. The program should be fairly easy to parallelize, since it runs mostly in independent for loops. I know that MATLAB supports CUDA-enabled NVIDIA GPUs (MATLAB GPU Computing Support for NVIDIA CUDA-Enabled GPUs - MATLAB & Simulink).

My questions:
If I use a parfor loop to parallelize my analysis, will my computer automatically use the GPUs?
If not, how can I do it?

Additional questions (maybe not relevant to the topic):
I would like to buy a computer for parallel computing using GPUs. If possible, I would like to avoid building it myself, even at an increased price, unless you highly recommend it.
Some info about the movies: each movie has about 600 images, and each image has 600 ROIs to fit with a Gaussian. I imagine that, optimally, I would need something that can first process the 600 images in parallel to extract the ROIs, and then run the 600 fits in parallel. It should be a computer with GPUs totaling ~600 cores.
Would you suggest any computer?
If you recommend building it myself, could you elaborate on how to do it?
Budget is not a big issue.

Thank you a lot for your time.
NOTE: JUST KEEP IN MIND THAT MY EXPERTISE WITH GPUs IS VERY LIMITED. I THINK I COULD ASSEMBLE A HOME-BUILT COMPUTER RELATIVELY EASILY WITH THE HELP OF GUIDES/YOUTUBE TUTORIALS, BUT I WOULD NOT LIKE TO SPEND MORE THAN A FEW DAYS ON IT.

MATLAB parfor loops generally provide only CPU acceleration: iterations are distributed across CPU workers in a parallel pool, not across GPU cores. Refer to MATLAB GPU Computing Support for NVIDIA CUDA Enabled GPUs - MATLAB & Simulink for accelerating MATLAB code by moving variables/computations to CUDA-enabled GPUs.
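A minimal sketch of the distinction (the `analyzeFrame` function and `frames`/`nFrames` variables are placeholders for your own analysis, and this assumes Parallel Computing Toolbox plus a supported Image Processing Toolbox function):

```matlab
% parfor runs iterations on CPU workers from a parallel pool --
% it does not touch the GPU by itself:
parfor k = 1:nFrames
    results(k) = analyzeFrame(frames(:,:,k));   % hypothetical per-frame analysis
end

% To use the GPU, move the data there explicitly with gpuArray.
% Many built-in functions then execute on the device automatically:
frameGpu = gpuArray(frames(:,:,1));    % transfer one frame to the GPU
smoothed = imgaussfilt(frameGpu, 2);   % runs on the GPU for gpuArray input
result   = gather(smoothed);           % transfer the result back to host memory
```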

If single-precision arithmetic is acceptable, you can work out which recent NVIDIA GPU gives the best performance in terms of GFLOPS per dollar. Most likely this would be a newer-generation Pascal or Volta card. As long as you have a PCI-E slot that can house the GPU and a relatively modern desktop capable of PCI-E 2.0 or even 3.0 speeds, you should reap the benefits of the GPU.

Note that MATLAB-level GPU operations like the ones in Benchmarking A\b on the GPU - MATLAB & Simulink Example are generally slower than writing your own CUDA kernels, because of the overhead of having MATLAB do the JIT compilation for you. Even so, I've managed to get a fair amount of speedup (at least 10x) for a compute-bound algorithm with this approach alone, versus naive single-threaded MATLAB code.
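If you want to measure what this buys you on your own hardware, MATLAB provides `timeit` and `gputimeit` for fair benchmarking (`gputimeit` handles GPU synchronization correctly, unlike a naive tic/toc). The matrix size below is an arbitrary example:

```matlab
N = 4096;                       % hypothetical problem size; adjust to your data
A = rand(N, 'single');
B = rand(N, 'single');

tCpu = timeit(@() A * B);       % CPU timing

Ag = gpuArray(A);
Bg = gpuArray(B);
tGpu = gputimeit(@() Ag * Bg);  % GPU timing, with proper device synchronization

fprintf('CPU: %.4f s, GPU: %.4f s, speedup: %.1fx\n', tCpu, tGpu, tCpu / tGpu);
```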

I'd say go for a workstation-class computer with a single or dual Xeon CPU and ECC memory, provided by an experienced system integrator. Inside could be a Quadro-class GPU (e.g. a Quadro P5000).

If you also need high speed for double-precision arithmetic, go for a higher-end Tesla card, e.g. a Tesla P100. For deep learning applications a Tesla V100 might be more suitable. Warning: Tesla cards are expensive, and they don't have any video output (most of them are also passively cooled).

Image analysis not involving any deep learning is probably handled just fine by a Quadro (or even a GeForce GPU, but most system integrators prefer to sell the more expensive Quadros and Teslas).

Putting more than one GPU into the workstation does not automatically make your code (or Matlab) faster. Multi-GPU support has to be explicitly programmed into code.
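A rough sketch of the standard pattern, assuming Parallel Computing Toolbox: open one worker per GPU and have each worker bind itself to its own device before processing its share of the data.

```matlab
% One parallel worker per available GPU
nGpus = gpuDeviceCount;
parpool(nGpus);

spmd
    gpuDevice(labindex);   % bind this worker to GPU number labindex
    % ... each worker can now process its share of the movie frames
    % on its own GPU, e.g. by indexing into the frame stack by labindex.
end
```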

Related stackoverflow thread here: image processing - Using more than one GPU in matlab - Stack Overflow

Note that GPUs have thousands of cores, but they won't be working on thousands of images in parallel. What you generally do is work on thousands of pixels of the same image in parallel, completing the task in a fraction of the time a CPU would take; the images are still handled sequentially. Be aware that those images have to be transferred to the GPU first over PCI Express, which can cause non-negligible overhead.
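One way to keep that transfer overhead down, sketched here with hypothetical 32x32 ROIs, is to move a whole stack of ROIs to the device in a single `gpuArray` call rather than transferring them one at a time, and then do all the per-ROI arithmetic on the device before gathering the results:

```matlab
rois = rand(32, 32, 600, 'single');        % hypothetical stack of 600 ROIs
roisGpu = gpuArray(rois);                  % one PCIe transfer for the whole stack

% Per-ROI work stays on the device; here, total intensity per ROI:
sumsGpu = squeeze(sum(sum(roisGpu, 1), 2));

sums = gather(sumsGpu);                    % one transfer back for all 600 results
```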

Something like this might suffice:

your choice of AMD EPYC or Xeon Scalable SP CPUs…