My kernel runs fine and produces the right results in both EmuDebug and EmuRelease mode. However, when run in Debug or Release mode, I get wrong results.
All CUDA calls return success.
Any leads as to where to look first? My guess is that I am using too much shared memory or have bank conflicts.
Can I change all my __shared__ arrays to global device memory, at the cost of performance only, to investigate the first hypothesis?
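To make the shared-versus-global swap concrete, here is a minimal sketch of what I have in mind, not my real kernel. The kernel `reverseTile`, the `USE_GLOBAL_SCRATCH` macro and the `globalScratch` buffer are made-up names for illustration. The idea is that the same kernel body uses either a __shared__ array or a per-block slice of a cudaMalloc'd buffer as scratch space, selected at compile time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK   64                 // threads per block (illustrative value)
#define NBLOCKS 4                  // number of blocks (illustrative value)
#define N       (BLOCK * NBLOCKS)  // total number of elements

// Hypothetical kernel: reverses each block's tile of the input.
// Compile with -DUSE_GLOBAL_SCRATCH to replace the __shared__ tile
// with a per-block slice of a global-memory buffer.
__global__ void reverseTile(const int *in, int *out, int *globalScratch)
{
#ifdef USE_GLOBAL_SCRATCH
    int *tile = globalScratch + blockIdx.x * BLOCK;  // slower, but uses no shared memory
#else
    __shared__ int tile[BLOCK];
    (void)globalScratch;                             // unused in this variant
#endif
    int i = blockIdx.x * BLOCK + threadIdx.x;
    tile[threadIdx.x] = in[i];
    __syncthreads();  // all tile writes of this block are visible after the barrier
    out[i] = tile[BLOCK - 1 - threadIdx.x];
}

// Runs the kernel once and returns the number of wrong output elements,
// or -1 if any CUDA call failed.
int runReverseTile()
{
    int h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) { h_in[i] = i; h_out[i] = -1; }

    int *d_in = 0, *d_out = 0, *d_scratch = 0;
    cudaError_t err = cudaMalloc((void **)&d_in, N * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_out, N * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_scratch, N * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverseTile<<<NBLOCKS, BLOCK>>>(d_in, d_out, d_scratch);
        err = cudaGetLastError();  // the launch itself returns nothing
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);       // cudaFree on a null pointer does nothing
    cudaFree(d_out);
    cudaFree(d_scratch);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int wrong = 0;
    for (int i = 0; i < N; ++i) {
        int blockStart = (i / BLOCK) * BLOCK;
        int expected = h_in[blockStart + BLOCK - 1 - (i % BLOCK)];
        if (h_out[i] != expected) ++wrong;
    }
    return wrong;
}

int main()
{
    int wrong = runReverseTile();
    printf("mismatches: %d\n", wrong);
    return wrong != 0;
}
```

If both variants give the same wrong answer, shared memory usage is probably not the cause. If only the __shared__ variant fails, that would support the first hypothesis.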
A weird bug prevents me from investigating effectively. Under VS 2005, when I modify the kernel code, it seems that either the changes are not recompiled into the new executable or the GPU keeps a trace of the old code/data. Is there a way to completely “flush” the graphics card, both memory and kernel code? It acts as if it were still using the old code. Example: I set the int pointed to by *ptr on my device with *ptr = 66, run the app, and get the proper result, 66, on the host. Then I remove that line (or replace it with *ptr = 888), clean and rebuild (I even manually delete the “debug” folder in the project folder, together with the .obj code generated by nvcc), and still get 66 as the result from the new executable.
This doesn’t happen in emu mode: changes to the kernel are immediately taken into account and are reflected in the new executable.
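One check I can think of, to tell "old kernel code is still running" apart from "the kernel never ran and I am reading leftover device memory", is to fill the device int with a sentinel before the launch and to query the launch status explicitly. This is only a sketch under assumptions: `writeValue`, `SENTINEL_BYTE` and `runOnce` are illustrative names, and I have not confirmed that leftover memory is what happens in my case. As far as I know, a <<<...>>> launch has no return value, so its status has to be fetched with cudaGetLastError().

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define SENTINEL_BYTE 0xAB  // arbitrary byte pattern, assumed not to be a real result

// Stand-in for the real kernel: writes a recognizable value.
__global__ void writeValue(int *ptr)
{
    *ptr = 888;
}

// Launches the kernel once. On success returns 0 and stores the value read
// back from the device in *result and the untouched pattern in *sentinel.
int runOnce(int *result, int *sentinel)
{
    int *d_ptr = 0;

    // What the int looks like if the kernel never writes to it.
    memset(sentinel, SENTINEL_BYTE, sizeof(int));
    *result = *sentinel;

    cudaError_t err = cudaMalloc((void **)&d_ptr, sizeof(int));
    // Overwrite whatever a previous run may have left in this allocation.
    if (err == cudaSuccess)
        err = cudaMemset(d_ptr, SENTINEL_BYTE, sizeof(int));
    if (err == cudaSuccess) {
        writeValue<<<1, 1>>>(d_ptr);
        err = cudaGetLastError();  // launch status has to be queried explicitly
    }
    // This copy waits for the kernel and can also report execution errors.
    if (err == cudaSuccess)
        err = cudaMemcpy(result, d_ptr, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_ptr);  // cudaFree on a null pointer does nothing

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    int result, sentinel;
    if (runOnce(&result, &sentinel) != 0)
        return 1;
    if (result == sentinel)
        printf("kernel did not write: sentinel still in place\n");
    else
        printf("kernel wrote %d\n", result);
    return 0;
}
```

If this prints the sentinel or an error, the kernel is not running at all in Debug/Release, and the 66 I keep seeing could simply be old data. If it prints a value from a previous version of the kernel, the stale code really is ending up in the executable.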