Kernel executes but I still get "invalid device function"


I am new to CUDA. I am writing a small kernel, but I keep getting the "invalid device function" error message.

  • Operating System
    MS Vista 64 bit Home Premium

  • Synopsis of the problem
    I execute my code (which compiles without issues) and I see the printf output that follows every kernel launch (my kernel is called multiple times in the host code).
    After all the iterations are complete (all printf output is on screen), the "invalid device function" error is flagged.

I read the forum posts about the following possible causes:

  1. mistakenly passing a host pointer
  2. using the same device pointer somewhere else
  3. specifying the machine type incorrectly
  4. having a race condition in the kernel

I checked and tested my code for all of these issues. For the fourth one, I reduced the kernel to a plain matrix addition, but it still does not work.
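
To make the checks above concrete, here is a rough sketch of how the error could be pinned to a specific launch and how the machine type (item 3) could be compared against the device. This is not the attached code: `dummyKernel`, `cubinCanRun`, `checkedLaunch` and `printCapability` are hypothetical names, and `cudaDeviceSynchronize` is the current runtime call (older toolkits use `cudaThreadSynchronize` instead).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the real one is in the attached file.
__global__ void dummyKernel(float *out) { out[threadIdx.x] = 1.0f; }

// Binary-compatibility rule from the CUDA programming guide: a cubin built
// for compute capability X.y runs only on devices of capability X.z, z >= y.
bool cubinCanRun(int buildMajor, int buildMinor, int devMajor, int devMinor) {
    return buildMajor == devMajor && devMinor >= buildMinor;
}

// Launches the kernel once and reports any error right away, so a failure
// is tied to this launch instead of surfacing at a later runtime call.
cudaError_t checkedLaunch(float *d_out, int threads) {
    dummyKernel<<<1, threads>>>(d_out);
    cudaError_t err = cudaGetLastError();   // errors raised by the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel ran
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}

// Prints the compute capability the runtime reports for device 0.
cudaError_t printCapability() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err == cudaSuccess)
        printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return err;
}
```

Calling `printCapability()` once at startup and `checkedLaunch()` in place of each bare launch should show whether the error belongs to the first launch or only appears at the end. Under the rule in `cubinCanRun`, for example, a binary built only for 1.3 would not load on a device that reports 1.2.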

I request the experienced forum members to please help me out.

  • Detailed description of the problem


I have posted the .cu file as a .txt file; the header file is also attached.

  • CUDA toolkit release version

  • SDK release version
    2.2.1 (64 bit windows vista)

  • Compiler for CPU host code
    Microsoft Visual C++ 2005 (64 bit windows vista)

  • System description including:
    NVIDIA GT220 (Compute Capability 1.3) running on an Intel quad-core x64 system with 4 GB RAM

Kindly help me out


posted.txt (8.09 KB)
matrixMul.h (3.86 KB)