kernel is not executing

Hello all,
I have a project in which part of the code is compiled by the C/C++ compiler and part by nvcc. The C/C++ code calls a wrapper, which includes the kernel. Here are my files

The project builds without any errors, and I can call the wrapper from the main file. By checking the output, I can tell that the wrapper (BThreadCall) is working and that there is no problem with copying the data. But the kernel inside the wrapper is not working: either the kernel is skipped, or it does not execute the code inside it when the program runs.
Can someone give me some idea of how to solve this problem? Thank you very much!

  1. Check the return values of all CUDA functions for error codes.
  2. Put a cudaGetLastError right after the kernel call to check for launch errors.
  3. How big is N? [Edit:] Oh, I just saw that you redefine N in BThreadCall. The 4096 blocks you end up with should at least be fine.
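
To make the first two suggestions concrete, here is a minimal sketch of what the checks could look like. It is not your code: `CUDA_CHECK`, `fillKernel` and `runFill` are hypothetical names made up for illustration, and the 256 threads per block is an assumption. Only `cudaGetLastError`, `cudaDeviceSynchronize` and the other `cuda*` calls are real runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error and exit if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",      \
                         cudaGetErrorName(err_), __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder kernel (not the poster's): each thread stores its global index.
__global__ void fillKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Hypothetical wrapper: launches the kernel with full error checking and
// returns the last element copied back from the device.
int runFill(int n)
{
    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // round up

    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    fillKernel<<<blocks, threads>>>(d_out, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch errors, e.g. a bad configuration
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel runs

    int last = -1;
    CUDA_CHECK(cudaMemcpy(&last, d_out + (n - 1), sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));
    return last;
}
```

If the launch itself is rejected, the `cudaGetLastError` check should report it right away. If the kernel starts but then fails, the error normally shows up at the `cudaDeviceSynchronize` check instead. Without either check, a failed kernel looks exactly like one that was silently skipped.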