Single-precision arithmetic on Fermi architectures: nvcc flags?

How can I use single-precision arithmetic for floating-point operations on a GTX 480?

I compile with the flag -arch sm_20. Without it my CUDA application will not run on the GTX 480.

I was wondering whether some function I am using in my code
requires double-precision arithmetic.

Does anyone have any idea how to go about this?
I would like to use single-precision arithmetic to see whether my application runs faster that way.

Thank you in advance for the answers!

What happens exactly? Is there any system message? Do you check the error codes returned by all CUDA function calls and kernel launches?

Have you tried to debug the program step-by-step?
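
Checking the return codes asked about above could look roughly like the sketch below. The helper cuda_ok and the kernel scale are hypothetical names made up for illustration; they are not from the original poster's program. The runtime calls themselves (cudaMalloc, cudaMemset, cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are standard CUDA runtime API. It assumes CUDA 4.0 or later; older toolkits use cudaThreadSynchronize instead of cudaDeviceSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA runtime call and returns false.
static bool cuda_ok(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical single-precision kernel, used only to have something to launch.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    float *d_data = NULL;

    // If the driver or device setup is broken, the first runtime call
    // (often a cudaMalloc) is typically where the error shows up.
    if (!cuda_ok(cudaMalloc((void **)&d_data, n * sizeof(float)), "cudaMalloc")) return 1;
    if (!cuda_ok(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset")) return 1;

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);

    // Launch errors are reported by cudaGetLastError; errors that occur
    // while the kernel runs surface at the next synchronizing call.
    if (!cuda_ok(cudaGetLastError(), "kernel launch")) return 1;
    if (!cuda_ok(cudaDeviceSynchronize(), "kernel execution")) return 1;

    if (!cuda_ok(cudaFree(d_data), "cudaFree")) return 1;
    printf("all CUDA calls succeeded\n");
    return 0;
}
```

The string printed for the failing call usually narrows things down much faster than a silent crash.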

The program fails at the first cudaMalloc it encounters.

Try updating your NVIDIA driver to the current one, e.g. from…aspx?lang=en-us
Which CUDA version do you use?

For test purposes you can compile with -arch=compute_12 -code=compute_12, which will demote double to float and then force a dynamic recompilation when executed on a compute 2.x device.
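
To stay in single precision when building for sm_20 itself, the usual place to look is the source: in C/C++ an unsuffixed literal such as 0.5 has type double, so any expression it touches is evaluated in double. The sketch below is only an illustration; half_sin_mixed, half_sin_single and apply are hypothetical names, not anything from the original poster's code. It contrasts a function that silently uses double with an all-float version.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical example functions for illustration only.

// 2.0 and 0.5 are double literals, so both multiplications and the
// sin() call are carried out in double, then converted back to float.
__host__ __device__ float half_sin_mixed(float x) {
    return 0.5 * sin(2.0 * x);
}

// The f suffix and sinf() keep the whole expression in single precision.
__host__ __device__ float half_sin_single(float x) {
    return 0.5f * sinf(2.0f * x);
}

__global__ void apply(const float *in, float *out_mixed, float *out_single, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out_mixed[i]  = half_sin_mixed(in[i]);
        out_single[i] = half_sin_single(in[i]);
    }
}

int main() {
    const int n = 256;
    const size_t bytes = n * sizeof(float);
    float h_in[n], h_mixed[n], h_single[n];
    for (int i = 0; i < n; ++i) h_in[i] = 0.01f * i;

    float *d_in = NULL, *d_mixed = NULL, *d_single = NULL;
    cudaError_t err = cudaMalloc((void **)&d_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_mixed, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_single, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        apply<<<(n + 127) / 128, 128>>>(d_in, d_mixed, d_single, n);
        err = cudaGetLastError();
    }
    // cudaMemcpy synchronizes with the kernel before copying back.
    if (err == cudaSuccess) err = cudaMemcpy(h_mixed, d_mixed, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(h_single, d_single, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report how far the two variants differ on this input.
    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = fmaxf(max_diff, fabsf(h_mixed[i] - h_single[i]));
    printf("max |mixed - single| = %g\n", max_diff);

    cudaFree(d_in); cudaFree(d_mixed); cudaFree(d_single);
    return 0;
}
```

Timing the two variants separately on the GTX 480 would show whether the double-precision path matters for a given kernel; the sketch itself makes no claim about the speed difference.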