I have had a similar error with some of the other examples in the CUDA 2.1 SDK, using Visual C++ Express Edition in emu mode. I managed to work around it. I am not sure it works for the deviceQuery example, but maybe you will be able to use a similar approach.
The trouble I had with many examples in emulation mode on my old laptop (i.e. without CUDA hardware) was that they aborted at the first call to cutilCheckMsg( … ) with a report of an initialization error.
I tracked this back to the call
    cudaSetDevice( cutGetMaxGflopsDeviceId() );
and, more specifically, to the cutil function cutGetMaxGflopsDeviceId(), defined in cutil_inline.h.
The problem here is that this function calls cudaGetDeviceCount( … ), which according to the 2.1 reference manual returns 0 if a 1.0 compatible hardware device exists and 1 if no such device exists (as when running in emulation without proper hardware). No check of the return value is made, and thus cutGetMaxGflopsDeviceId() tries to query hardware devices that do not exist.
However, the implementation of cudaGetDeviceCount( … ) does not seem to follow this specification and returns 0 even if no device exists, so a run-time check does not seem possible at the moment. (Probably part of the “silly behavior” that tmurray talked about earlier in this thread. :) )
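For completeness, here is roughly what the missing guard could look like once a trustworthy device list is available. This is only a sketch and not the SDK's code: DeviceInfo is a hypothetical stand-in for the two cudaDeviceProp fields the original reads (so it compiles without the CUDA headers), and pickMaxGflopsDevice is a name I made up. It does not solve the emulation problem, since the count itself cannot be trusted there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two cudaDeviceProp fields used by the
// original function; not part of the CUDA SDK.
struct DeviceInfo {
    int multiProcessorCount;
    int clockRate;  // kHz, the unit cudaDeviceProp reports
};

// Returns the index of the device with the largest
// multiProcessorCount * clockRate product, or -1 if the list is empty.
// Ties keep the lowest index, like the strict '>' in the original loop.
inline int pickMaxGflopsDevice(const std::vector<DeviceInfo>& devices)
{
    if (devices.empty())
        return -1;  // nothing to query; the caller decides how to react

    std::size_t best = 0;
    long long best_score = -1;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        // Widen before multiplying so large clock rates cannot overflow int.
        long long score =
            static_cast<long long>(devices[i].multiProcessorCount) *
            devices[i].clockRate;
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    assert(best < devices.size());  // internal sanity check
    return static_cast<int>(best);
}
```

A caller would fill the vector from cudaGetDeviceProperties( … ) for each reported device and treat -1 as "no usable device" instead of passing it on to cudaSetDevice( … ).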
I got around this by checking the preprocessor define __DEVICE_EMULATION__, which is passed to nvcc in emulation mode, and modified cutGetMaxGflopsDeviceId() in cutil_inline.h as follows:
// This function returns the best GPU (with maximum GFLOPS)
// Modified to handle emulation
inline int cutGetMaxGflopsDeviceId()
{
#if __DEVICE_EMULATION__
    return 0;
#else
    // ORIGINAL cutGetMaxGflopsDeviceId() code BEGIN
    int device_count = 0;
    cudaGetDeviceCount( &device_count );

    cudaDeviceProp device_properties;
    int max_gflops_device = 0;
    int max_gflops = 0;

    int current_device = 0;
    cudaGetDeviceProperties( &device_properties, current_device );
    max_gflops = device_properties.multiProcessorCount * device_properties.clockRate;
    ++current_device;

    while( current_device < device_count )
    {
        cudaGetDeviceProperties( &device_properties, current_device );
        int gflops = device_properties.multiProcessorCount * device_properties.clockRate;
        if( gflops > max_gflops )
        {
            max_gflops = gflops;
            max_gflops_device = current_device;
        }
        ++current_device;
    }

    return max_gflops_device;
    // END original code
#endif
}
Everything between the ORIGINAL BEGIN and END comments is the original code, i.e. the only real change is the #if that checks for emulation mode and returns 0 without performing any device queries. In hardware mode the original code runs.
This does the trick for simple examples such as bitonic, scalarProd, etc., that have their main defined in a .cu file. However (yes, there is another however), it seems that the Visual Studio emulation build targets do not define __DEVICE_EMULATION__ for the .cpp files. For the projects where cutGetMaxGflopsDeviceId is called in a .cpp file, the VC++ compiler is used instead of nvcc. As the flag is not defined there, the same error occurs. The fix is to edit the build target properties for the emu builds and add __DEVICE_EMULATION__ to the list of preprocessor definitions.
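If you are unsure whether a given .cpp file actually sees the flag after changing the project settings, a tiny helper like the one below can be dropped into it and printed at startup. The function names are my own invention, not part of cutil. It uses #ifdef rather than #if so that it also behaves when the macro is defined without a value.

```cpp
// Reports whether __DEVICE_EMULATION__ was defined when THIS translation
// unit was compiled. The answer can differ between .cu files (built by
// nvcc) and .cpp files (built by the VC++ compiler).
inline bool emulationFlagVisible()
{
#ifdef __DEVICE_EMULATION__
    return true;
#else
    return false;
#endif
}

// Human-readable form of the same check, handy for a startup message.
inline const char* emulationFlagStatus()
{
    return emulationFlagVisible() ? "defined" : "not defined";
}
```

Printing emulationFlagStatus() from main in an emu build should then tell you whether the preprocessor definition reached that file.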
I have not tested this for all examples, but it seems to work for most.
Hope it helps someone.
.Lukas