Why can't a __global__ function be loaded correctly on my laptop?

Hi all, I’m quite new to CUDA programming. I recently found that my laptop, which has a GT 330M and an integrated graphics chip on the mainboard, cannot properly execute a __global__ kernel function in which I have implemented a complex algorithm. I used CUDA events (start and end, created with cudaEventCreate) to record the execution time of this kernel and found that it took only 0.023 ms, which suggests that the kernel is not even being loaded onto the GT 330M and executed. Oddly, though, I can step through the kernel function with Parallel Nsight 2.2.

Could someone please help me with this?

Please note that the same __global__ function executes correctly on my desktop PC, which has one GTX 560 Ti, and that I have installed the CUDA driver, SDK, and toolkit on my laptop. Simple CUDA programs such as matrix multiplication do run correctly on the laptop.
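A very short event time on its own does not say why the kernel did not run; checking the error codes around the launch usually does. The sketch below shows the event-based timing described in the post with error checks added. The kernel `myKernel`, the buffer size, the launch configuration, and the `CHECK` macro are all placeholders for illustration, not the code from the post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel from the post.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f;
}

// Illustrative helper: print the CUDA error string and bail out of main.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            return 1;                                             \
        }                                                         \
    } while (0)

int main()
{
    const int n = 1 << 20;            // assumed problem size
    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    // A launch that is rejected (e.g. invalid configuration) is reported here.
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, 0));
    // Errors raised while the kernel runs are reported by the synchronize call.
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("kernel time: %.3f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_data));
    return 0;
}
```

If the launch is rejected, the message printed by `cudaGetErrorString` names the reason, which is more informative than the elapsed time alone.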

Have you installed the CUDA laptop driver?

Yes, I think I did. I can also run the samples from the CUDA SDK on my laptop, so I just can’t figure out where the problem is.

Best regards.

Perhaps the code uses a kernel launch configuration that the laptop GPU cannot handle, in which case the kernel will not start. Try changing the configuration (e.g. threads per block, amount of shared memory, etc.).
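One way to test this suggestion is to print the limits of the GPU and the resource needs of the kernel, then compare them with the intended launch configuration. The following is a minimal sketch; `myKernel` and the value of `threadsPerBlock` are hypothetical placeholders, and device 0 is assumed to be the GT 330M. As far as I know, compute capability 1.x GPUs such as the GT 330M allow at most 512 threads per block, while 2.x GPUs such as the GTX 560 Ti allow 1024, so a configuration tuned on the desktop may not be valid on the laptop.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    // Limits of the device itself (device 0 is an assumption).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("device: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);
    printf("registers per block: %d\n", prop.regsPerBlock);

    // Resource usage of this particular kernel on the current device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, myKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernel registers per thread: %d\n", attr.numRegs);
    printf("kernel static shared memory: %lu bytes\n",
           (unsigned long)attr.sharedSizeBytes);
    printf("kernel max threads per block: %d\n", attr.maxThreadsPerBlock);

    // Assumed block size, as it might have been chosen on the desktop GPU.
    const int threadsPerBlock = 1024;
    if (threadsPerBlock > attr.maxThreadsPerBlock) {
        printf("a block of %d threads is too large for this kernel on this GPU\n",
               threadsPerBlock);
    }
    return 0;
}
```

The per-kernel limit in `attr.maxThreadsPerBlock` can be lower than the device limit when the kernel uses many registers, so it is the more useful number to compare against.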

Actually, I can use Parallel Nsight 2.2 to debug the kernel from the beginning to the end of the code, which I think means the configuration is OK. Everything went well during debugging, but the kernel still fails when I run it directly… sigh~