How can I use single-precision arithmetic for floating-point operations on the GTX 480 architecture?
When I compile, I use the flag -arch sm_20. Without it, my CUDA application will not run on the GTX 480.
I was wondering whether some function I am using in my code
makes double-precision arithmetic mandatory.
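
To make the question concrete, here is a minimal sketch of the kind of pattern I suspect. It is not my actual code: the helper names (half_promoted, half_single, scaled_root_single) are hypothetical, and I am assuming a C++11-capable compiler. It relies only on the standard C/C++ rule that an unsuffixed literal such as 0.5 has type double, which as far as I understand applies to device code as well.

```cpp
#include <math.h>
#include <type_traits>

// Outside nvcc the CUDA qualifiers do not exist, so define them away.
// This lets the same file build as plain C++ or as CUDA C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: the unsuffixed literal 0.5 is a double, so the
// multiplication is done in double precision and narrowed on return.
__host__ __device__ inline float half_promoted(float x) {
    return x * 0.5;
}

// Same computation kept in single precision by using the f suffix.
__host__ __device__ inline float half_single(float x) {
    return x * 0.5f;
}

// Hypothetical helper: sqrtf plus a float literal stays in float.
__host__ __device__ inline float scaled_root_single(float x) {
    return sqrtf(x) * 2.0f;
}

// Compile-time check of the promotion rule described above.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float * unsuffixed literal is evaluated as double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float * f-suffixed literal stays float");
```

If my kernels contain expressions like the one in half_promoted, I assume that is where double precision could be creeping in when building with -arch sm_20.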
Does anyone have any idea how to go about this?
I would like to use single-precision arithmetic to see whether my application runs faster that way.
Thank you in advance for the answers!