Hello,
I am getting non-deterministic output from my CUDA code.
I have written some CUDA code and tested it in debug mode (with the -g -G flags). I verified that the output is deterministic by launching the same code many times in a loop: the output is always the same.
So I assume the code works correctly, even if it is naturally a bit slow.
To speed up the code I removed the -g -G flags (so the -O3 flag is set by default, right?). The code is noticeably faster, but it is no longer deterministic!
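For reference, the loop test described above amounts to something like the following sketch. It is plain host-side C++ (so it should also compile under nvcc), not my actual test code. The names `count_mismatching_runs` and `run_once` are made up for illustration, `run_once` stands in for "launch the kernel, synchronize, copy the result back to the host", and float output is assumed.

```cpp
// Host-side determinism check: run a computation several times and compare
// every result bit-for-bit against the first run.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Returns how many of the `runs` executions produced output that differs
// bitwise from the first execution.
// `run_once` is a hypothetical stand-in for the real kernel launch plus
// copy-back; it is an assumption, not part of any CUDA API.
std::size_t count_mismatching_runs(
    const std::function<std::vector<float>()>& run_once, std::size_t runs)
{
    assert(runs > 0);
    const std::vector<float> reference = run_once();  // first run is the baseline
    std::size_t mismatches = 0;
    for (std::size_t i = 1; i < runs; ++i) {
        const std::vector<float> out = run_once();
        // memcmp compares raw bits, so it also catches last-bit differences
        // that an epsilon-based comparison would hide.
        const bool same =
            out.size() == reference.size() &&
            (out.empty() ||
             std::memcmp(out.data(), reference.data(),
                         out.size() * sizeof(float)) == 0);
        if (!same) ++mismatches;
    }
    return mismatches;
}
```

With something like this, a return value of 0 corresponds to what I see in the debug build, and a nonzero count corresponds to the failing runs in the optimized build.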
Using the NVCC flags:
NVCCFLAGS = --compiler-options -fno-strict-aliasing --ptxas-options=-v -use_fast_math
the problem occurs very often (about 10% of the code executions in the loop).
Using these flags instead:
NVCCFLAGS = --compiler-options -fno-strict-aliasing --ptxas-options=-v -use_fast_math -prec-div=true -ftz=false -prec-sqrt=true -fmad=false
the problem occurs less often (about 0.5% of executions or less), but it still occurs!
Any ideas?
Thanks for the help.
EDIT: for completeness, here is the part of my .pro file that compiles the CUDA code (I am working in Qt):
CUDA_SOURCES += cuda_test.cu
CUDA_DIR = /usr/local/cuda-7.5/
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing --ptxas-options=-v -use_fast_math -prec-div=true -ftz=false -prec-sqrt=true -fmad=false
INCLUDEPATH += $$CUDA_DIR/include
INCLUDEPATH += $$CUDA_DIR/samples/common/inc
QMAKE_LIBDIR += $$CUDA_DIR/lib64
LIBS += -L/usr/local/cuda-7.5/lib64/ \
-lcuda \
-lcudart
CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')
cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda