I have a very strange issue with a program I wrote. It's supposed to be a parametric surface raytracer, but it's not finished yet. The way I wrote it is fully deterministic: each frame should be the same if the parameters are the same. But for some inputs, namely Chmutov's surface (defined with Chebyshev polynomials, see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.9031&rep=rep1&type=pdf), the results are nondeterministic. This doesn't happen with other surfaces.
These are two screenshots taken from the program. The input values were exactly the same:
Perhaps such nondeterminism is expected, but personally I haven't come across any documentation saying so.
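To put a number on "the frames differ", one option is to copy two frames rendered with identical parameters back to the host and compare them bit by bit. Below is a minimal host-side sketch of such a check. The name `first_mismatch` and the assumption that a frame is a flat buffer of floats are only for illustration; this is not code from the program itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the index of the first element whose bit
// pattern differs between two frames, or -1 if the frames are bitwise
// identical. If one frame is a prefix of the other, the length of the
// shorter frame is returned.
long long first_mismatch(const std::vector<float>& a,
                         const std::vector<float>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t ua, ub;
        // Compare raw bits, so +0.0f and -0.0f count as different and
        // two NaNs with the same payload count as equal.
        std::memcpy(&ua, &a[i], sizeof ua);
        std::memcpy(&ub, &b[i], sizeof ub);
        if (ua != ub) return static_cast<long long>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<long long>(n);
}
```

If this returns -1 for two consecutive frames, the output is bitwise reproducible for that run. Any other value points at the first pixel component worth inspecting.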
To make things even weirder, this behaviour is correlated with code performance. This little table sums up my experiments:
#n  | gencode opts | compiler options                 | FPS | effects
----+--------------+----------------------------------+-----+--------------------------------------------
1.  | sm_21        | -use_fast_math                   |  32 | very bad visuals
2.  | sm_21        | ftz=1 prec-sqrt=1                |  41 | same as above
3.  | sm_21        | ftz=0 prec-sqrt=1                | 187 | mostly fine, minor artifacts
4.  | sm_21        |                                  | 187 | same as above
5.  | sm_21        | -use_fast_math ftz=0 prec-sqrt=1 | 403 | lots of artifacts, but not as many as in #2
6.  | (none given) |                                  | 298 | at first the view is perfect, but moving around shows a small number of mostly steady artifacts (only small variations on the screen, not huge black patches)
7.  | (none given) | ftz=1 prec-sqrt=1                | 287 | same as above
8.  | (none given) | ftz=0 prec-sqrt=1                | 282 | same as above
9.  | (none given) | -use_fast_math                   | 193 | ** perfect render ** - no artifacts anywhere
10. | (none given) | -use_fast_math ftz=0 prec-sqrt=1 | 374 | another ** perfect render ** combination - also notice the high framerate
This whole thing is arcane. I thought that compiling for sm_21 would give the same results as leaving the compilation up to the JIT, but obviously I was mistaken. It seems that -use_fast_math without any gencode options yields the best and most correct code, which is strange.
Below is my system setup. I also tested the code on other machines with different graphics cards, and the artifacts were still there.
OS: Arch Linux, 64 bit, kernel 2.6.36-ARCH
CPU: i7 870 @ 2.93GHz
Graphics driver: NVIDIA UNIX x86_64 Kernel Module 260.19.36
Graphics card: nVidia Corporation GF104 [GeForce GTX 460] (rev a1)
Memory (I can’t remember the exact hardware now, sorry):
# dmidecode 2.11
SMBIOS 2.6 present.
Handle 0x0008, DMI type 5, 24 bytes
Memory Controller Information
Error Detecting Method: 64-bit ECC
Error Correcting Capabilities:
None
Supported Interleave: One-way Interleave
Current Interleave: One-way Interleave
Maximum Memory Module Size: 2048 MB
Maximum Total Memory Size: 8192 MB
Supported Speeds:
Other
Supported Memory Types:
DIMM
SDRAM
Memory Module Voltage: 3.3 V
Associated Memory Slots: 4
0x0009
0x000A
0x000B
0x000C
Enabled Error Correcting Capabilities:
None
Handle 0x0009, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: DIMM0
Bank Connections: 0 1
Current Speed: Unknown
Type: DIMM SDRAM
Installed Size: 2048 MB (Double-bank Connection)
Enabled Size: 2048 MB (Double-bank Connection)
Error Status: OK
Handle 0x000A, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: DIMM1
Bank Connections: 2 3
Current Speed: Unknown
Type: DIMM SDRAM
Installed Size: 2048 MB (Double-bank Connection)
Enabled Size: 2048 MB (Double-bank Connection)
Error Status: OK
Handle 0x000B, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: DIMM2
Bank Connections: 4 5
Current Speed: Unknown
Type: DIMM SDRAM
Installed Size: 2048 MB (Double-bank Connection)
Enabled Size: 2048 MB (Double-bank Connection)
Error Status: OK
Handle 0x000C, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: DIMM3
Bank Connections: 6 7
Current Speed: Unknown
Type: DIMM SDRAM
Installed Size: 2048 MB (Double-bank Connection)
Enabled Size: 2048 MB (Double-bank Connection)
Error Status: OK
There is no way I can provide you with an isolated test case; the whole program is far too complex. But I can provide the full source code if you wish.