Hey, I have written the following code and it works fine on inputs of size 10, 20, 30 and 40, but at size 50 and above it throws a first-chance exception. Could anyone run this program on their PC and tell me whether they get the output or whether it crashes as well?
Here is the program (press Enter 3 times when running it, because I have some breaks in there):
#include <iostream>
#include <fstream>
using namespace std;
You are probably hitting the watchdog timer timeout. If not, run your kernel under cuda-memcheck to check for stray memory accesses. See my signature for links.
The thing is that it breaks when copying data from the CPU to the GPU, at least that's where Visual Studio indicates. So I think it does not even get to the kernel.
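One way to narrow this down is to check the return code of every CUDA runtime call around the copy, and to size the host buffer from the input size rather than from a fixed constant. The sketch below is only an illustration: the `CUDA_CHECK` macro, the variable names and the `n * n` layout are assumptions, not taken from the posted program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original program): print the CUDA error
// string with file and line, then exit, if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main() {
    const int n = 50;  // the input size reported to fail

    // Assumed n x n layout. The host buffer is sized from n, so it cannot
    // overflow the way a fixed-size array could when n grows.
    std::vector<float> h_data(static_cast<size_t>(n) * n, 1.0f);
    const size_t bytes = h_data.size() * sizeof(float);

    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(reinterpret_cast<void **>(&d_data), bytes));

    // If the copy fails, the macro reports the reason instead of crashing.
    CUDA_CHECK(cudaMemcpy(d_data, h_data.data(), bytes,
                          cudaMemcpyHostToDevice));

    CUDA_CHECK(cudaFree(d_data));
    std::printf("copied %zu bytes\n", bytes);
    return 0;
}
```

If the copy still faults with checks like these in place, the likely suspects are the byte count passed to `cudaMemcpy` or a host array that is smaller than the requested copy, since both only show up once the input grows.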
I have changed that and it still produces the error.
HOWEVER, I have done extensive research and I THINK I know what is wrong.
In my first kernel I have a for-loop, and in the CPU code I then call cudaThreadSynchronize(), because I need all threads to finish this kernel before going on to the next one. Is that the right way to synchronize it? Also, in this kernel I have a conditional statement that will be evaluated differently for EVERY thread. Can this have any impact?
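For what it's worth, kernels launched into the same stream run in launch order, so the second kernel does not start before the first one has finished even without an explicit host-side synchronize. The synchronize call is still useful for making the host wait and for surfacing errors from the kernel. A conditional that differs per thread normally costs performance (the threads of a warp take both paths one after the other) rather than correctness, as long as no `__syncthreads()` sits inside the divergent branch. A minimal sketch follows; the kernel names and bodies are made up for illustration, and it uses `cudaDeviceSynchronize()`, which is what newer toolkits call `cudaThreadSynchronize()`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical first kernel: a per-thread loop followed by a branch that
// differs between neighbouring threads (divergent, but still correct).
__global__ void firstKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // guard against out-of-range threads
    float acc = 0.0f;
    for (int k = 0; k <= i; ++k) acc += 1.0f;
    if (i % 2 == 0) data[i] = acc;
    else            data[i] = -acc;
}

// Hypothetical second kernel: consumes what the first kernel wrote.
__global__ void secondKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Report and exit on any CUDA error (illustrative helper).
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const int n = 50;
    const int threads = 32;
    const int blocks = (n + threads - 1) / threads;  // round up

    float *d_data = nullptr;
    check(cudaMalloc(reinterpret_cast<void **>(&d_data), n * sizeof(float)),
          "cudaMalloc");

    firstKernel<<<blocks, threads>>>(d_data, n);
    check(cudaGetLastError(), "firstKernel launch");
    // Optional for ordering, but it surfaces errors from firstKernel here
    // rather than at some later, unrelated call.
    check(cudaDeviceSynchronize(), "firstKernel execution");

    secondKernel<<<blocks, threads>>>(d_data, n);
    check(cudaGetLastError(), "secondKernel launch");
    check(cudaDeviceSynchronize(), "secondKernel execution");

    float h_data[n];
    check(cudaMemcpy(h_data, d_data, n * sizeof(float),
                     cudaMemcpyDeviceToHost), "cudaMemcpy");
    std::printf("h_data[0] = %f\n", h_data[0]);

    check(cudaFree(d_data), "cudaFree");
    return 0;
}
```

Checking `cudaGetLastError()` right after each launch also helps with the earlier symptom: an error caused by a kernel is otherwise reported by whichever runtime call comes next, which can make a later `cudaMemcpy` look like the culprit.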