I ran into a strange issue recently. When I analyzed my app in the Visual Profiler, I found that a kernel ran much faster there than when the app is launched from the command line (cmd), even for an empty kernel. The launch looks like this:
cu_plane_handleOut<<< gridSize, blockSize >>> (*this, particles);
and the kernel is empty:
__global__ void cu_plane_handleOut (cuPlane plane, cuParticle particles)
The macros TIMING_START_GPU_NONAME and TIMING_END_GPU_NONAME output the elapsed time. In the profiler the time is about 4 us, which looks correct. But when run from the command line it outputs 245 us. I don't think that is right, but I have no idea what is wrong with it.
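The definitions of the timing macros are not shown here. As a point of comparison, a minimal sketch of event-based timing for an empty kernel might look like the following. It assumes the measurement is done with CUDA events; emptyKernel, the 1x1 launch configuration, and the run count are hypothetical stand-ins, not the code from my app. It does a warm-up launch first, because the first launch in a process also pays one-time initialization costs, and then averages over many launches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the empty kernel in the post.
__global__ void emptyKernel() {}

int main() {
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess ||
        cudaEventCreate(&stop) != cudaSuccess) {
        fprintf(stderr, "failed to create CUDA events\n");
        return 1;
    }

    // Warm-up launch so one-time initialization is not included in the timing.
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    const int runs = 100;  // assumed run count, chosen only for illustration

    // Record events on the default stream around a batch of launches.
    cudaEventRecord(start);
    for (int i = 0; i < runs; ++i) {
        emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);

    // Wait until the stop event has actually been reached on the GPU.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("average per launch: %.3f us\n", ms * 1000.0f / runs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Running a sketch like this both from cmd and under the profiler would show whether the gap also appears with plain event timing, or only with the macros.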
My operating system is Windows 7 64-bit, installed on an SSD. The graphics card is a GTX 780 with the latest driver. The app is 32-bit, so I tried both the 32-bit and the 64-bit cmd, with the same results. I also tried running cmd as administrator.
Can anybody help me out? Thanks in advance.