CUDA SDK 2.1 breaks emulation features when no CUDA hardware is installed

CUDA Toolkit 2.1 and CUDA SDK 2.1 installed on a Windows Vista Business laptop without nVidia hardware.

Compiling deviceQuery SDK sample in EmuDebug mode.

I am getting the following message:

cudaSafeCall() Runtime API error in file <>, line 59: feature is not yet implemented.

line 59 of that piece of code says

cutilSafeCall(cudaGetDeviceProperties(&deviceProp, dev));

With SDK versions 1.1 and 2.0 I was able to run the SDK samples just fine even when no nVidia hardware
was present. SDK version 2.1 took away that capability.


Download the 181.x driver, change the .inf file, and add your device to it.

No, the idea was to do development on a laptop with Intel graphics, in emulation mode.

That’s what emulation mode is for, right? Developing without having the actual hardware.


oh. hrmmmmm. I bet it’s related to polling the watchdog timer. sounds like a bug, thanks for the heads-up.

I am a C / C++ / Java developer, starting with CUDA. I have a problem that may or may not be related to this thread.

  • I have a dev PC with an NVidia GeForce 7950 GX2, running Windows XP SP2 Professional, Intel Core™ 2 CPU 6600 @ 2.4 GHz.
  • Initially I did not appreciate that the GX2 was not CUDA compatible.
  • I downloaded and installed the current driver, toolkit and SDK, in the order suggested (NVIDIA Driver for Microsoft Windows XP with CUDA Support (181.20), CUDA Toolkit version 2.1 for Windows XP, CUDA SDK 2.1 for Windows XP)
  • I realised the chip was not compatible, and downloaded the emulator, setting it to emulate a G80
  • I compiled deviceQuery with MSVC Express Edition 2008 - no errors - in EmuDebug and EmuRelease.
  • on executing, in either debug or release mode, I get:
    cudaSafeCall() Runtime API error in file <>, line 59: initialization error.
  • line 59, in turn, is the second cuda call in the code below:

int main( int argc, char** argv)
{
    int deviceCount;
    cudaGetDeviceCount(&deviceCount);                              // first CUDA call
    if (deviceCount == 0)
        printf("There is no device supporting CUDA\n");
    int dev;
    for (dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp deviceProp;
        cutilSafeCall(cudaGetDeviceProperties(&deviceProp, dev));  // line 59: second CUDA call

The first call returns deviceCount as 1.
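For what it's worth, the deviceQuery sample doesn't rely on the count alone: if I read the 2.x sample correctly, it detects the emulation-only device by the compute capability it reports (major and minor both come back as 9999). The check boils down to something like this plain-C mock (`mock_prop` and `classify` are my names for illustration, not the SDK's):

```c
#include <string.h>

/* Mock of the two cudaDeviceProp fields the sample inspects. */
struct mock_prop { int major; int minor; };

/* Mirrors (as far as I can tell) the deviceQuery test: a device that
 * reports compute capability 9999.9999 is the emulation-only device,
 * i.e. there is no CUDA hardware present. */
static const char *classify(struct mock_prop p)
{
    if (p.major == 9999 && p.minor == 9999)
        return "device emulation only";
    return "CUDA capable";
}
```

So even with deviceCount reported as 1, a properties query that succeeds should still reveal whether device 0 is real hardware or the emulated device.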

How can I set this right?

Many thanks


Any workaround for this problem? I’m new and trying to get started while waiting for my 260 to be delivered by the end of the week.

I tried downloading the CUDA Toolkit version 2.0 for Windows XP but the link is broken.

Should I go back to version 1.1, or am I better off waiting for the actual hardware?


Debug and release both try to use the GPU, which they can’t do in your case. That sounds like a bug in the sample (because right now cudaGetDeviceCount has very silly behavior).

I have had a similar error with some of the other examples from the CUDA 2.1 SDK and Visual C++ Express Edition in emu mode. I managed to work around it; I’m not sure if it works for the deviceQuery example, but maybe you will be able to use a similar approach.

The trouble I had with many examples in emulation mode on my old laptop (i.e. without hardware) was that they aborted at the first call to cutilCheckMsg( … ) with a report of an initialization error.

I tracked this back to the call

cudaSetDevice( cutGetMaxGflopsDeviceId() );

And the cutil function cutGetMaxGflopsDeviceId(), defined in cutil_inline.h

The problem here is that this function calls cudaGetDeviceCount( … ), which according to the 2.1 reference manual reports the number of compute-capability-1.0 hardware devices, and reports a count of 1 if no such device exists (as when running in emulation without proper hardware). No check of the returned count is made, so cutGetMaxGflopsDeviceId() tries to make device queries on hardware devices which do not exist.

However the implementation of cudaGetDeviceCount( … ) does not seem to follow this specification, and returns 0 even if no device exists, so a run time check does not seem possible at the moment. (Probably part of the “silly behavior” that tmurray talked about earlier in this thread. :) )

I got around this by checking the preprocessor define DEVICE_EMULATION that nvcc is passed in emulation mode,

thus modifying cutGetMaxGflopsDeviceId() in cutil_inline.h as

// This function returns the best GPU (with maximum GFLOPS)
// Modified to handle emulation
inline int cutGetMaxGflopsDeviceId()
{
#ifdef DEVICE_EMULATION
	// In emulation there is no hardware to query; just use device 0.
	return 0;
#else
	// ORIGINAL cutGetMaxGflopsDeviceId() code BEGIN
	int device_count = 0;
	cudaGetDeviceCount( &device_count );

	cudaDeviceProp device_properties;
	int max_gflops_device = 0;
	int max_gflops = 0;

	int current_device = 0;
	cudaGetDeviceProperties( &device_properties, current_device );
	max_gflops = device_properties.multiProcessorCount * device_properties.clockRate;

	while( current_device < device_count )
	{
		cudaGetDeviceProperties( &device_properties, current_device );
		int gflops = device_properties.multiProcessorCount * device_properties.clockRate;
		if( gflops > max_gflops )
		{
			max_gflops		= gflops;
			max_gflops_device = current_device;
		}
		++current_device;
	}

	return max_gflops_device;
	// END original code
#endif
}



Everything between the ORIGINAL BEGIN and END comments is the original code, i.e. the only real change is the #ifdef checking for emulation mode, which returns 0 without performing any device queries. In hardware mode the original code runs.
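As an aside, the hardware branch is just an argmax over multiProcessorCount * clockRate. Stripped of the CUDA calls, the selection logic reduces to this plain-C sketch (`mock_prop` and `max_gflops_device` are my stand-in names; the struct mocks the two cudaDeviceProp fields the heuristic reads):

```c
/* Stand-in for the two cudaDeviceProp fields the heuristic reads. */
struct mock_prop { int multiProcessorCount; int clockRate; };

/* Same argmax as the hardware branch of cutGetMaxGflopsDeviceId(),
 * but over an array instead of cudaGetDeviceProperties() calls. */
static int max_gflops_device(const struct mock_prop *props, int count)
{
    int best_device = 0;
    int max_gflops = props[0].multiProcessorCount * props[0].clockRate;
    for (int dev = 1; dev < count; ++dev) {
        int gflops = props[dev].multiProcessorCount * props[dev].clockRate;
        if (gflops > max_gflops) {
            max_gflops = gflops;
            best_device = dev;
        }
    }
    return best_device;
}
```

This also shows why the unpatched function crashes in emulation: with a count of 0 it still reads props[0], i.e. queries a device that does not exist.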

This does the trick for simple examples such as bitonic, scalarProd, etc., that have their main defined in a .cu file. However (yes, there is another however), it seems the Visual Studio emulation build targets do not define DEVICE_EMULATION for the .cpp files. For the projects where cutGetMaxGflopsDeviceId is called from a .cpp file, the VC++ compiler is used instead of nvcc; since the flag is not defined there, the same error occurs. The fix is to edit the build target properties for the emu builds and add DEVICE_EMULATION to the list of preprocessor definitions.
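The same guard can also be applied at the call site in a .cpp file. A minimal sketch (`pick_device` and the stub are hypothetical names; the stub stands in for cutGetMaxGflopsDeviceId(), which would normally come from cutil_inline.h):

```c
/* Stub standing in for cutGetMaxGflopsDeviceId(); in a real build
 * this would come from cutil_inline.h. */
static int stub_max_gflops_device(void) { return 1; }

/* With DEVICE_EMULATION in the emu build's preprocessor definitions
 * (Project Properties -> C/C++ -> Preprocessor), this compiles to
 * "use device 0"; in the hardware builds it asks the heuristic. */
static int pick_device(void)
{
#ifdef DEVICE_EMULATION
    return 0;  /* emulated device */
#else
    return stub_max_gflops_device();
#endif
}
```

Either way, the important part is that the emu configurations and the hardware configurations end up taking different paths at compile time, since the run-time check is unreliable on 2.1.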

I have not tested this for all examples, but it seems to work for most.

Hope it helps someone.


Thanks m95lag for the trick. Does it apply only to VC++ users? I had the same problem, but even after applying the modification you suggested I still get

[codebox]eu@aer:~/NVIDIA_CUDA_SDK/bin/linux/emurelease$ ./particles

grid: 64 x 64 x 64 = 262144 cells

cutilCheckMsg() CUTIL CUDA error: integrate kernel execution failed in file <>, line 147 : initialization error.[/codebox]


for many examples (not for simpleGL). Driver 180.22, CUDA 2.1.

Any tip?

I need emulation 'cause I don’t have CUDA enabled hardware (GeForce Go 7300).