Kernel is not executing

Hello all,
I have a project in which BThread.cu is compiled by the C/C++ compiler and BWrapper.cu is compiled by nvcc. BThread.cu calls the wrapper in BWrapper.cu, which launches the kernel. Here are my files:
BThread.cu:

BWrapper.cu:

The project builds without errors, and I can call the wrapper from the main file. From the output in BThread.cu I can tell that the wrapper (BThreadCall) runs and that the data copying works. However, the kernel inside the wrapper does not seem to run: either the launch is skipped or the code inside the kernel is never executed at runtime.
Can anyone suggest how to solve this problem? Thank you very much!

  1. Check the return values of all CUDA functions for error codes.
  2. Call cudaGetLastError() right after the kernel launch to check for launch errors.
  3. How big is N? [Edit:] Oh, I just saw that you redefine N in BThreadCall. The 4096 blocks you end up with should be fine, at least.
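Points 1 and 2 might look roughly like this inside a wrapper file compiled by nvcc. This is only a sketch, since the original files aren't shown: the kernel `addOne` and the wrapper `addOneWrapper` are hypothetical stand-ins for your kernel and BThreadCall, and `extern "C"` assumes the caller is compiled as C. Note that `cudaGetLastError()` only catches errors at launch time; faults that occur while the kernel is running are reported by the next synchronizing call, such as `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: adds 1 to every element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical wrapper in the style of BThreadCall. extern "C" gives it C
// linkage so a file compiled by a plain C compiler can call it.
// Every CUDA call is checked; the first error stops the chain and is returned.
extern "C" cudaError_t addOneWrapper(int *host, int n) {
    if (n <= 0) return cudaErrorInvalidValue;
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dev = nullptr;

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        addOne<<<blocks, threads>>>(dev, n);
        // Launch errors, e.g. an invalid configuration or no kernel image
        // compiled for this GPU's architecture.
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // Errors raised while the kernel runs only surface after a sync.
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    if (dev) cudaFree(dev);
    return err;
}
```

The caller then only needs to compare the return value against `cudaSuccess`. If the printed message is "no kernel image is available for execution on the device", the kernel was compiled for a different GPU architecture than the one installed, which is a common reason for a kernel that silently does nothing; recompiling with a matching nvcc `-arch` setting usually fixes it.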