app on ATI card ends at cudaMalloc()


I’m taking my first steps with CUDA. I like it a lot. It’s easy to use, and you can hardly do anything wrong.


Have a look at this code:


#if __DEVICE_EMULATION__

bool InitCUDA(void){return true;}

#else

bool InitCUDA(void)
{
	int count = 0;
	int i = 0;

	cudaGetDeviceCount(&count);
	if(count == 0) {
		fprintf(stderr, "There is no device.\n");
		return false;
	}

	for(i = 0; i < count; i++) {
		cudaDeviceProp prop;
		if(cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
			if(prop.major >= 1) {
				break;
			}
		}
	}
	if(i == count) {
		fprintf(stderr, "There is no device supporting CUDA.\n");
		return false;
	}
	cudaSetDevice(i);

	printf("CUDA initialized.\n");
	return true;
}

#endif


/* Example */

__global__ static void HelloCUDA(char* result, int num)
{
	int i = 0;
	char p_HelloCUDA[] = "Hello CUDA!";
	for(i = 0; i < num; i++) {
		result[i] = p_HelloCUDA[i];
	}
}

/* HelloCUDA */



int main(int argc, char* argv[])
{
	if(!InitCUDA()) {
		return 0;
	}

	char	*device_result	= 0;
	char	host_result[12]	={0};

	CUDA_SAFE_CALL( cudaMalloc((void**) &device_result, sizeof(char) * 11));
	/* ... */
}


This is exactly the code (cut at the end) that came out of the app wizard (I like this wizard, thanks, kyzhao!).

In my computer there’s an ATI Radeon 9250 PCI (high-end card!!! ;) ).

Now when I step through the code with F11 (in VS2005), a device is reported as found, although I don’t have an NVIDIA card. But at the cudaMalloc() call my app suddenly terminates.

I used the Debug configuration set up by the wizard.

What’s going on inside here?

WOULD it work on my ATI card, too (I don’t think so)?

It definitely will not work on an ATI card. You have to use emulation mode to emulate a CUDA device on your CPU.

Seems to be a bug with the wizard’s code. Do you mind figuring out what it is?

cudaGetDeviceCount() is a tricky function.

When you have “n” CUDA-enabled devices, it returns “n”.
When you have “0” CUDA-enabled devices (which is your case), it returns “1” and uses the “EMULATOR Device” as device 0.

So if your application is NOT compiled with the “-deviceemu” option, your cudaMalloc() will fail. Since you don’t have any actual CUDA device, you should compile your code with device emulation for your cudaMalloc() to succeed.

Hope this helps

Best regards,

That doesn’t sound right.

It should only report the emulation device 0 when you compile for emulation.

The behavior I’m sure NVIDIA intended was for emulation to always work as a fallback, no matter how you compile or whether there is CUDA hardware.

Since that important functionality hasn’t yet been implemented, NVIDIA must change cudaGetDeviceCount() to work sensibly.

(Also, the wizard should include a workaround for the time being.)

The workaround is simple: just check the compute capability of each device. The emulation device will have a compute capability of 0.x or 0.0, AFAIR.
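
The compute-capability check suggested above could look roughly like the sketch below. It is not the wizard's code. It assumes the emulation device reports a major version below 1 (as recalled above) and, as an extra precaution, also rejects the 9999.9999 placeholder that some toolkit versions report when no CUDA hardware is present. The helper names `isRealCudaDevice` and `pickCudaDevice` are made up for illustration, and the part that calls the CUDA runtime is only compiled when building with nvcc.

```cpp
#include <cstdio>

// Hypothetical helper: decides whether a reported compute capability belongs
// to real CUDA hardware. Assumption: the emulation device reports either a
// major version below 1 or the placeholder capability 9999.9999.
bool isRealCudaDevice(int major, int minor)
{
    if (major < 1) return false;                       // 0.x: treated as emulation device
    if (major == 9999 && minor == 9999) return false;  // placeholder: no real hardware
    return true;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Hypothetical helper: returns the index of the first device that passes
// isRealCudaDevice(), or -1 if the runtime reports no such device.
int pickCudaDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess &&
            isRealCudaDevice(prop.major, prop.minor)) {
            return i;
        }
    }
    return -1;
}
#endif
```

InitCUDA() could then pass a non-negative result to cudaSetDevice() and print "There is no device supporting CUDA." otherwise, so the program stops before it ever reaches cudaMalloc().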

I would not think so, because cudaMalloc() fails on the emulation device anyway if your code is NOT compiled with “-deviceemu”…

They are just exposing it as a hint: emulation is supported, so go recompile your code and come back…