Career in CUDA and the future of Parallel programming


I want to know about careers in CUDA technology: what the job opportunities are, and especially the future growth scope. Do you think it will last in the market, and will CUDA experts remain in demand? (I read somewhere that NVIDIA says CUDA will be the most jazzy thing to possess in 2010.) What about five years down the line?


There will be a demand for people who have an understanding of parallel programming. Knowledge of some specific language is not that important.

You mean that if I learn CUDA today, it will definitely help me even if CUDA is replaced tomorrow by some other language or some other CUDA-like platform? (CUDA is also parallel programming, and parallel programming will be in demand in the coming years. Am I right?)

Of course it will help you. I don’t think CUDA will be replaced by a completely different parallel model in the near future. Actually, I am quite optimistic because some commercial companies like Adobe have already started to tune their software using CUDA.

What if I modify the original question slightly: “Given that parallel programming is the direction we are headed, what is the best parallel paradigm/language to invest time learning now to be better equipped?”

(Did I get it right, cudacuda2009?)

I honestly don’t know if I can answer that. I’d say OpenCL or Ct, but I can’t project those very far either.

Yes, laxsu19. But tell me one thing. OpenCL is not that easy to learn and, moreover, it is not yet mature. So suppose I learn CUDA now and gradually shift to OpenCL. Is this gradual migration from CUDA to OpenCL possible? I mean, will my knowledge of CUDA improve my understanding of OpenCL, so that as soon as demand for OpenCL arrives in the industry, I can easily shift from CUDA to OpenCL with little effort?

Yes. The way OpenCL works is very similar to CUDA. The same concepts apply; they’re just called differently (e.g. work item vs. thread, work group vs. block). Launching a kernel takes more micromanagement than with CUDA’s Runtime API (about the same as with the Driver API), so learning CUDA’s high-level API is a great introduction to massively parallel programming. If you go further and learn the Driver API, learning OpenCL for GPUs should be a breeze. I’d even go so far as to say learn CUDA first even if what you want is to use OpenCL, because CUDA is so similar yet much less buggy at this time.

Just note that this applies only to OpenCL on GPUs. When you code for a Cell BE or a multicore CPU using OpenCL, it’s different. Even though there’s a common API, you’ll likely be using different parts of it for GPUs, CPUs and other architectures.
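To make the terminology mapping concrete, here is a minimal sketch in plain C++ that emulates a 1-D kernel launch on the CPU, so it runs without a GPU. It is not real CUDA or OpenCL code: `LaunchIndex`, `global_id`, `vec_add_kernel` and `vec_add` are hypothetical names made up for illustration. The comments name the CUDA built-in and the OpenCL C function that each field stands in for, assuming a 1-D launch with no global offset.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 1-D index values a GPU hands to each thread.
// Comments give the CUDA built-in and the OpenCL C function each one mimics.
struct LaunchIndex {
    std::size_t block_idx;   // CUDA blockIdx.x  | OpenCL get_group_id(0)
    std::size_t block_dim;   // CUDA blockDim.x  | OpenCL get_local_size(0)
    std::size_t thread_idx;  // CUDA threadIdx.x | OpenCL get_local_id(0)
};

// CUDA:   blockIdx.x * blockDim.x + threadIdx.x
// OpenCL: get_global_id(0), assuming a zero global offset
std::size_t global_id(const LaunchIndex& ix) {
    return ix.block_idx * ix.block_dim + ix.thread_idx;
}

// "Kernel" body: each thread (OpenCL: work item) adds exactly one element.
void vec_add_kernel(const LaunchIndex& ix, const float* a, const float* b,
                    float* c, std::size_t n) {
    const std::size_t i = global_id(ix);
    if (i < n) {  // threads past the end of the data do nothing
        c[i] = a[i] + b[i];
    }
}

// Serial emulation of a launch. A GPU would run the iterations of these two
// loops in parallel. Assumes b.size() == a.size() and block_dim > 0.
std::vector<float> vec_add(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::size_t block_dim = 4) {
    const std::size_t n = a.size();
    std::vector<float> c(n);
    // Round up so every element is covered by some block (work group).
    const std::size_t num_blocks = (n + block_dim - 1) / block_dim;
    for (std::size_t block = 0; block < num_blocks; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            vec_add_kernel(LaunchIndex{block, block_dim, thread},
                           a.data(), b.data(), c.data(), n);
        }
    }
    return c;
}
```

With 6 elements and a block size of 4, for example, two blocks are launched and the last two threads of the second block are skipped by the bounds check. The same pattern shows up in both real CUDA and real OpenCL kernels; what differs is mostly the names and the amount of host-side setup.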


IMHO, NVIDIA is right. But I think other technologies will come along in the next 5 years. It is like the progression ASM -> C -> C++ -> C#, etc.

As I see it, CUDA is like assembler for the GPU. The next step will be in some other direction.

In my opinion gradual migration from CUDA to OpenCL is possible, but probably not desirable unless there’s a major incentive (e.g. strategic partner investment). CUDA is by far easier to learn and develop with, in addition to being superior to OpenCL in various ways (with one major exception: it is nVidia-specific).

As it stands, all the companies I’m aware of (that is, professionally, ones I have real-life experience with) are using CUDA over OpenCL. Having used both, I personally agree as well. CUDA is just better: it performs better than all existing OpenCL implementations (read: nVidia’s and AMD’s), it is far faster to develop with (this is the #1 reason I’m sticking with it), it is far more mature, it has far more documentation (first and third party) due to its maturity, and there are hundreds if not thousands of examples, articles, and applications using CUDA.

Another major win for CUDA is, ironically, also its major downfall: being tied to a single vendor means you only have to worry about a single set of debugging and profiling tools, for a single set of hardware. With OpenCL, if you want to seriously contend with the performance CUDA has to offer, you have a dozen-odd tools for various platforms (CPU/GPU, nVidia/AMD|ATI/Intel/PowerVR/IBM/etc). And it only gets worse: you generally have to write several implementations of the same kernel, targeting different architectures, to really get the performance you need (this is especially true between CPUs and GPUs, and between embedded DSPs and GPUs), primarily because OpenCL is so loose on hardware and implementation requirements and minimum resources.

That’s not to say OpenCL is bad, though. If you only care about GPUs, you can generally get away with a single OpenCL kernel for both AMD and nVidia GPUs (Intel is another story, but again you can generally live with the same kernel), and in this case it’s possibly just as good as CUDA (minus the language pitfalls and the array of profiling/debugging tools you’ll have to familiarize yourself with).

That said, I sadly see OpenCL toppling CUDA in the long term, once OpenCL matures (primarily because people like open standards, and people certainly don’t like being tied to a single IHV).

One trend we are definitely seeing is a move toward hybrid computing, where CPUs are paired with some kind of specialized processing devices for data parallel tasks. General purpose CPUs are not terribly efficient at vector floating point calculation when you compare them to something less general like Cell or the current generation of GPUs. In FLOPS per watt and FLOPS per dollar, Cell and GPUs are an order of magnitude improvement. Pairing Opterons with PowerXCell 8i chips is what helped RoadRunner (the cluster here at LANL that I never get to use :) ) jump to the top of the supercomputer charts.

So I think learning how to deal with any one of these architectures (Cell, CUDA, OpenCL) is an excellent base for future HPC work. I think the era of just throwing more GHz and CPUs at the largest problems is coming to an end. It’s becoming more cost effective to spend transistors on less flexible architectures with more throughput.