App runs fine in emulation mode, but not on the card

My kernel runs fine and produces the right results in both EmuDebug and EmuRelease mode. However, when run in Debug or Release mode, I get the wrong results.
All CUDA calls return success.

Any leads as to where to look first? I guess I must be using too much shared memory or have bank conflicts?

Can I change all my __shared__ variables to device (global) memory, at just a loss of performance, to investigate the first hypothesis?

A weird bug prevents me from investigating effectively. Under VS 2005, it seems that when I modify the kernel code, either the changes are not compiled into the new executable or the GPU keeps a trace of the old code/data. Is there a way to completely “flush” the graphics card, both memory and kernel code? It acts as if it is still using the old code. For example: I set the int pointed to by ptr on my device with *ptr = 66, run the app, and get the proper result (66) on the host. Then I remove that line (or put in *ptr = 888 instead), clean, and rebuild (I even manually delete the “debug” folder in the project folder, together with the .obj code generated by nvcc), and I still get 66 as the result from the new executable.
This doesn’t happen in emu mode: changes to the kernel are immediately taken into account and reflected in the new executable.

If you use more shared memory than is available, the kernel will not launch, so if you were able to launch it, this was certainly not the problem. Bank conflicts have nothing to do with this either: they just make code execution slow, but won’t produce ‘wrong’ numbers.
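
To confirm that the kernel really launched, it may help to check the launch itself, since a kernel launch does not return a status the way the other runtime calls do. Below is a minimal sketch, not your actual code: the kernel fillShared, the helper launchAndCheck, and the sizes are hypothetical, and it assumes a current toolkit where cudaDeviceSynchronize is available (older toolkits used cudaThreadSynchronize for the same purpose).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stages its index in dynamically sized
// shared memory, then copies it out to global memory.
__global__ void fillShared(int *out)
{
    extern __shared__ int scratch[];
    scratch[threadIdx.x] = threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = scratch[threadIdx.x];
}

// Hypothetical helper: launches the kernel with the requested amount of
// dynamic shared memory and reports whether launch and execution succeeded.
static bool launchAndCheck(int *d_out, int threads, size_t sharedBytes)
{
    fillShared<<<1, threads, sharedBytes>>>(d_out);

    // The launch itself returns nothing; a failed launch is reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed (%lu bytes shared): %s\n",
               (unsigned long)sharedBytes, cudaGetErrorString(err));
        return false;
    }

    // Errors raised while the kernel runs only show up after synchronizing.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    printf("kernel ran (%lu bytes shared)\n", (unsigned long)sharedBytes);
    return true;
}

int main()
{
    const int threads = 64;
    int *d_out = NULL;

    if (cudaMalloc((void **)&d_out, threads * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // A request that fits: one int per thread.
    launchAndCheck(d_out, threads, threads * sizeof(int));

    // A deliberately oversized request (64 MB), assumed to exceed what any
    // block can get; the launch is expected to be rejected.
    launchAndCheck(d_out, threads, (size_t)64 * 1024 * 1024);

    cudaFree(d_out);
    return 0;
}
```

If your own launch passes both checks, shared memory size is very unlikely to be the cause, and I would look at the build instead.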

It sounds like you have your executables mixed up. Are you sure you are running the code you intend to run? What you describe cannot possibly happen as far as I can see (a value of 888 that magically turns into 66). Perhaps an old version of the executable is lying around?
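
One way to check which build you are actually running is to put a marker in both the host and the device code. The sketch below is only an illustration: KERNEL_STAMP and writeStamp are hypothetical names, and the idea is to change the stamp by hand before each rebuild. The host value is preset to a sentinel so that leftover data cannot be mistaken for a fresh result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical marker: change this value by hand before each rebuild.
#define KERNEL_STAMP 888

// Hypothetical kernel: writes the compile-time marker to device memory.
__global__ void writeStamp(int *ptr)
{
    *ptr = KERNEL_STAMP;
}

int main()
{
    // Shows when the host side of this executable was compiled.
    printf("host code built on %s at %s\n", __DATE__, __TIME__);

    int *d_ptr = NULL;
    int h_val = -1;  // sentinel: stays -1 if the kernel never writes

    if (cudaMalloc((void **)&d_ptr, sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // Overwrite the device location with the sentinel before the launch.
    cudaMemcpy(d_ptr, &h_val, sizeof(int), cudaMemcpyHostToDevice);

    writeStamp<<<1, 1>>>(d_ptr);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
    }

    // This blocking copy also waits for the kernel to finish.
    err = cudaMemcpy(&h_val, d_ptr, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("copy back failed: %s\n", cudaGetErrorString(err));
    }

    printf("device stamp: %d (this source expects %d)\n", h_val, KERNEL_STAMP);

    cudaFree(d_ptr);
    return h_val == KERNEL_STAMP ? 0 : 1;
}
```

If the printed build time does not change after a rebuild, or the device stamp does not match the value in the source you just edited, then the executable being run is probably not the one that was just built. A stamp of -1 would instead suggest the kernel never wrote anything.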