Hi,
I have tested some programs on CUDA (CUDA 2.3, using device emulation). I notice a minor difference between the output of a simple C program and the corresponding CUDA program (the programs contain float calculations).
I want to know whether I am getting this difference because of device emulation mode, or for some other reason?
First of all, there shouldn’t be any device emulation actually enabled on 3.2; the toolkit is supposed to just ignore that flag. A lot of people will be happy to hear otherwise (there has been a lot of noise about it).
Second, what do you mean by “minor difference”? NEVER expect two different floating-point codes to produce identical results. Different compilers, different optimization flags, and certainly different hardware will give you different results. In general, with floating-point numbers, (a + b) + c != a + (b + c).
These days, (ANSI C) compilers are not even allowed to optimize based on that assumption. It does mean, though, that reordering code, or a different choice of MAD vs. FMAD vs. separate mul + add, will give different results.
Lastly, NVIDIA hardware is not fully IEEE-754 compliant (the last bit may give different results).