I’m working through the Intro to Parallel Programming (https://www.udacity.com/course/intro-to-parallel-programming--cs344) course on Udacity. I discovered that when I compile with -O0, I get the correct output for the first homework. However, when I compile with -O3, I get white spots in the output. Is something wrong with my driver and/or CUDA installation?
I would rather suspect that missing synchronization is responsible for this.
Race conditions often do not show up as wrong behavior when the code is compiled without optimization, because execution is slower. Unoptimized code can also use more resources (registers, for example), so fewer blocks execute simultaneously and threads have fewer chances to interfere with each other.
If the course asked you to add synchronization anywhere, that is a good place to start looking.
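As an illustration of the kind of bug meant here, the sketch below (not from the course) reverses an array inside a single block through shared memory, once without and once with `__syncthreads()`, the CUDA block-level barrier. The kernel names `reverse_racy` and `reverse_synced`, the helper `run_reverse`, and the block size `N = 256` are hypothetical choices for the demo. Only the synchronized kernel is guaranteed to be correct; the racy one may print "correct" or "WRONG" depending on the GPU and timing, which is exactly how such a bug can seem to come and go with compiler flags.

```cuda
// Hypothetical demo (not from the course): reversing an array inside one block
// through shared memory, with and without a __syncthreads() barrier.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256;  // assumption: a single block of N threads

// Racy version: a thread may read tile[N - 1 - t] before the thread that
// owns that slot has written it, so the result is not guaranteed.
__global__ void reverse_racy(const int *in, int *out) {
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = in[t];
    out[t] = tile[N - 1 - t];  // no barrier: data race on tile[]
}

// Fixed version: __syncthreads() makes every thread in the block wait until
// all writes to tile[] are done before any thread reads from it.
__global__ void reverse_synced(const int *in, int *out) {
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = in[t];
    __syncthreads();
    out[t] = tile[N - 1 - t];
}

// Runs one of the kernels on 0..N-1 and returns true if the output is the
// reversed sequence. Returns false if a CUDA call fails.
bool run_reverse(bool use_barrier) {
    int h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, N * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, N * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    if (use_barrier) reverse_synced<<<1, N>>>(d_in, d_out);
    else             reverse_racy<<<1, N>>>(d_in, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (h_out[i] != N - 1 - i) return false;
    return true;
}

int main() {
    printf("with __syncthreads():    %s\n", run_reverse(true) ? "correct" : "WRONG");
    // The racy kernel may or may not print "correct"; that depends on timing.
    printf("without __syncthreads(): %s\n", run_reverse(false) ? "correct" : "WRONG");
    return 0;
}
```

It can be built with something like `nvcc reverse.cu -o reverse`. The barrier is the only difference between the two kernels: without it, nothing orders one thread's write to `tile[]` before another thread's read.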
That sounds logical. The course doesn’t cover barriers until after the first problem set though.