I’m a C/C++ programmer who has recently become interested in Nvidia’s CUDA. It seemed like an interesting and powerful technology, but the reality is disappointing. In a week I didn’t manage to parallelize a simple program of fewer than 20 short lines. The quality of the CUDA utilities (especially the compiler) and documentation is quite poor. If I didn’t know it was a final product, I’d say it was in beta. Here are a few examples of what happened:
device global variables == catastrophe
In my first attempt to use device memory, I made a few variables global so I wouldn’t have to pass them as arguments to my kernel. I know global variables are ugly, but I just wanted to write a short test. The compiler had no objections; the only hint that something was wrong was “Advisory: Cannot tell what pointer points to, assuming global memory space”, which referred to lines where I assigned things to shared memory (a local variable). The program didn’t work, of course. I spent many hours figuring out what the problem was. Google makes me think I’m the only one who has tried this. I suggest improving the compiler messages and/or the documentation.
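For the record, here is a minimal sketch of the kind of code that provokes that advisory — this is my reconstruction of the pattern, not the original test program. The names `g_data`, `s_buf`, and `kernel` are made up; the point is the generic pointer `p`, which nvcc (in the 2.x toolkits) cannot classify as pointing to shared or global memory:

```cuda
#include <cstdio>

__device__ float g_data[256];     // device global variable, not passed as an argument

__global__ void kernel()
{
    __shared__ float s_buf[256];  // per-block shared memory (a "local" declaration)

    // Assigning a shared-memory address to a plain pointer is what triggers
    // "Advisory: Cannot tell what pointer points to, assuming global memory
    // space": nvcc then compiles accesses through 'p' as *global* loads and
    // stores, so the shared buffer is silently bypassed and the result is wrong.
    float *p = s_buf;
    p[threadIdx.x] = g_data[threadIdx.x];
    __syncthreads();
    g_data[threadIdx.x] = p[255 - threadIdx.x];
}

int main()
{
    kernel<<<1, 256>>>();
    cudaThreadSynchronize();      // CUDA 2.2-era host sync call
    return 0;
}
```

The fix is to index `s_buf` directly (or keep the pointer’s memory space obvious to the compiler) instead of going through a generic pointer.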
Just a few days were enough to write a program that crashed my computer.
I’ve never written a program for the CPU that could crash my computer (in more than 4 years of coding, tens of thousands of LOC).
Interesting side-effect when synchronizing threads.
Try something like this: declare a shared variable, write to it in one thread, and read it in another. An easy task? A volatile modifier or __threadfence_block() should be enough? No. The first option doesn’t seem to work at all, and the second works only if the writing thread is still running at the point of synchronization. I think synchronization should also cover threads that have already returned (or there should be a warning that it doesn’t — I didn’t notice one in the programming guide).
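A sketch of the working variant, under my assumptions about the scenario (the kernel name `handoff` and the value 42 are invented). `__threadfence_block()` only orders memory operations; it is `__syncthreads()` that is both a barrier and a point where the write becomes visible — and it is only well-defined if *every* thread in the block reaches it, which is exactly why a writer that has already returned breaks the scheme:

```cuda
__global__ void handoff(int *out)
{
    __shared__ int flag;      // shared variable used to pass a value between threads

    if (threadIdx.x == 0)
        flag = 42;            // writer thread

    // Barrier + visibility point. Every thread of the block must execute this;
    // if the writer had already returned (e.g. after an early 'return' above),
    // the behavior of this barrier is undefined.
    __syncthreads();

    if (threadIdx.x == 1)
        *out = flag;          // reader thread now reliably sees 42
}
```

In other words, the safe pattern is to keep all threads alive until the `__syncthreads()` that publishes the data, and only let them diverge or exit afterwards.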
If I encounter anything else, I’ll post it here. I think I’ll give CUDA a few more days.
I’m using the CUDA 2.2 toolkit.