I’m curious how most users compile their CUDA code, because there seem to be so many ways to do it. For example, the makefiles that come packaged with the CUDA SDK 2.3 appear to use level 3 optimization (the -O3 flag) together with -fno-strict-aliasing.
The nvopts.sh that comes packaged with Matlab CUDA 1.1 appears to use the -O3 flag as well, along with -funroll-loops.
What is the most stable way to compile CUDA code?
I’m a little confused about why these optimization options are there, since they appear to break with newer gcc versions such as 4.4.1.
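For context, my understanding (which may be incomplete) is that -fno-strict-aliasing is there to protect host code that reinterprets memory through pointer casts, since gcc turns on strict-aliasing optimizations at -O2 and above. Here is a minimal sketch in C of the pattern I mean and a portable alternative. The helper name float_bits is hypothetical (mine, not from the SDK), and the bit patterns assume IEEE 754 single-precision floats:

```c
#include <stdint.h>
#include <string.h>

/*
 * The risky pattern (shown only as a comment):
 *
 *     uint32_t u = *(uint32_t *)&f;
 *
 * Reading a float through a uint32_t pointer violates C's strict aliasing
 * rule, so an optimizing gcc may reorder or drop the access unless
 * -fno-strict-aliasing is passed.
 */

/* Hypothetical helper: returns the raw bit pattern of a float.
 * memcpy is well defined at any optimization level, so this version
 * does not depend on -fno-strict-aliasing. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* copy the 4 bytes of f into u */
    return u;
}
```

If the code being built only uses the memcpy style, then as far as I can tell the -fno-strict-aliasing flag should not change its behavior, which is part of why I’m unsure whether the flag is still needed.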