Hello,
I have a CUDA kernel that I just ported to OpenCL, and I get different results.
More precisely, the results are identical everywhere on a 100x50x50 3D matrix, except for two columns. Let me summarize:
This is what the input looks like (the actual values are not 1 and 0.1):
1 1 1 1 1 0.1 0.1 0.1 0.1 0.1
1 1 1 1 1 0.1 0.1 0.1 0.1 0.1
1 1 1 1 1 0.1 0.1 0.1 0.1 0.1
1 1 1 1 1 0.1 0.1 0.1 0.1 0.1
1 1 1 1 1 0.1 0.1 0.1 0.1 0.1
The output matrix is computed by finite differences (degree 1 in space)
1 1 1 1 X X 0.1 0.1 0.1 0.1
1 1 1 1 X X 0.1 0.1 0.1 0.1
1 1 1 1 X X 0.1 0.1 0.1 0.1
1 1 1 1 X X 0.1 0.1 0.1 0.1
1 1 1 1 X X 0.1 0.1 0.1 0.1
The X mark the places where CUDA and OpenCL give different values, with a relative error of about 10^-6.
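To put that 10^-6 figure in perspective, it can help to express the mismatch in ULPs (units in the last place) rather than only as a relative error. Here is a minimal sketch in plain C, assuming both result arrays have been copied back to the host as finite single-precision floats. The helper names float_to_ordered, ulp_distance and relative_error are made up for illustration and are not part of CUDA or OpenCL.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that grows monotonically
   with the float value (assumes IEEE 754 binary32, finite inputs). */
static int64_t float_to_ordered(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* safe type punning */
    if (bits & 0x80000000u)                  /* negative: sign-magnitude */
        return -(int64_t)(bits & 0x7FFFFFFFu);
    return (int64_t)bits;
}

/* Number of representable floats between a and b (hypothetical helper). */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}

/* Relative error of got with respect to a nonzero reference value. */
static double relative_error(float ref, float got)
{
    double d = ((double)got - (double)ref) / (double)ref;
    return d < 0 ? -d : d;
}
```

One ULP in single precision corresponds to a relative spacing between 2^-24 and 2^-23 (roughly 6e-8 to 1.2e-7), so a relative error near 10^-6 is on the order of ten ULPs. Calling ulp_distance(cuda_value, opencl_value) on each X cell would show whether the gap is a few roundings wide or something larger.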
I tried changing the compilation options (with and without --use_fast_math for CUDA, -cl-fast-relaxed-math for OpenCL…), but I always get different results.
The computation involves :
(CUDA <-> OpenCL)
[*]fmaxf <-> max
[*]fminf <-> min
[*]sqrtf <-> sqrt
[*]*, /, +
I wonder whether this is normal behavior. Perhaps the sqrt implementation is different?
There are several sqrt variants in OpenCL (sqrt, native_sqrt, half_sqrt); which one should I choose to get the same results?
I read that CUDA implements sqrt as 1/rsqrt; should I explicitly write 1.0f/rsqrt(x) in my OpenCL code?
Does it come from the optimization step? (involving fmad or something like that)
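To illustrate why a multiply-add merged by the compiler could matter, here is a small host-side sketch in plain C. It only assumes IEEE 754 single precision. The function names mul_then_add and fused_like are hypothetical, and fused_like merely emulates a single-rounding multiply-add through double precision; it is not the actual GPU instruction.

```c
/* Multiply, round to float, then add and round again (two roundings). */
static float mul_then_add(float a, float b, float c)
{
    /* volatile forces the product to be rounded to float
       before the addition, so the two steps cannot be fused */
    volatile float p = a * b;
    return p + c;
}

/* Emulated fused multiply-add: the product of two floats is exact in
   double, so the low bits of a*b survive until the final conversion. */
static float fused_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1.000244140625f (that is, 1 + 2^-12) and c = -1.0f, mul_then_add loses the 2^-24 term of the product while fused_like keeps it, so the two results differ in the low-order bits. If one compiler merges a multiply and an add and the other does not, a mismatch of a few ULPs confined to the cells where large terms nearly cancel would not be surprising.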
Does it come from me? (This is a simple kernel without any synchronization needs, so translating it to OpenCL is little more than a ‘sed’.)
Any comment appreciated :)
Thank you!
PS: I am using CUDA 3.1, driver 256.40.