Hey gentle people again,
I currently have a working project (a video decoder). It is a C++/CUDA integration project that uses OpenGL<->CUDA interop. I don't read or write outside the buffers, and it launches 3 kernels for every frame.
It shows the complete video frame when I run in device emulation (deviceemu), but when I switch to the release settings it looks to me as if cudaThreadSynchronize(); gets ignored when compiling: the top 75% of the frame is correct and the rest gets incorrect values.
After running 30-40 frames I get a kernel launch failure (timeout). I think too many kernels get queued, and some still need to complete while the rest of my program is already writing new data into one of the buffers, or something like that? I added cudaThreadSynchronize(); after each kernel invocation (in the main CUDA file, the file that launches the kernels…)
This is why I think cudaThreadSynchronize gets compiled out. Could that be the case or not?
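One way to test whether the synchronize call is really being skipped is to check its return value, together with cudaGetLastError() right after the launch. The following is only a minimal sketch under assumptions: decodeStage, checkCuda and runStage are hypothetical names standing in for the real kernels and host code, and the kernel body is a placeholder.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the three per-frame kernels.
__global__ void decodeStage(unsigned char *frame, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                        // guard against writing past the buffer
        frame[i] = 255 - frame[i];    // placeholder work
}

// Returns true on success; otherwise prints which step failed and why.
static bool checkCuda(cudaError_t err, const char *what)
{
    if (err == cudaSuccess)
        return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Launches one stage and waits for it, reporting launch errors and
// execution errors separately.
static bool runStage(unsigned char *d_frame, int n)
{
    assert(d_frame != NULL && n > 0);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    decodeStage<<<blocks, threads>>>(d_frame, n);

    // Errors detected at launch time (e.g. a bad configuration) show up here.
    if (!checkCuda(cudaGetLastError(), "decodeStage launch"))
        return false;

    // Errors during execution, including a launch timeout, show up here.
    // Later toolkits deprecate cudaThreadSynchronize() in favor of
    // cudaDeviceSynchronize().
    return checkCuda(cudaThreadSynchronize(), "decodeStage execution");
}
```

If runStage() reports a failure on the first frame that looks wrong, the synchronize call is running and returning an error rather than being removed by the compiler.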
Anyway, here is the command line:
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MTd -I"$(CUDA_INC_PATH)" -I…/include/ -I…/src/ -I…/…/…/libs/baseDecoder/include/ -o $(ConfigurationName)\cuda.obj cuda.cu
Any thoughts?
Thx in advance and best regards,
Niels
P.S.:
System setup: Intel Core 2 CPU 6300 @ 1.86 GHz
1 GB RAM
GeForce 8800 GTS