Problem report: cudaMalloc() returning "cudaErrorUnknown"

- Operating System

Windows XP 64-bit SP1

- Synopsis description of the problem

I have been running into cudaMalloc() problems on Windows XP 64-bit. This is not specific to any particular application. Even simple applications that just allocate a little memory and free it start failing after being launched around 15 times in succession.

- Detailed description of the problem

I have a Windows XP 64-bit setup on which I run 32-bit apps using the 32-bit cudart.dll. I am not sure, at the moment, whether this problem is specific to this setup.
To reproduce, write a simple application that does a “cudaMalloc(), then a cudaFree(), and exits”. Launch it more than 15 times… You will find that cudaMalloc() returns cudaErrorUnknown at some point. Make sure your application is compiled as 32-bit and that you use a 32-bit cudart.dll.
Note that my development machine is 32-bit. So I write programs on this 32-bit machine, ship them with cudart.dll, and run them on the 64-bit machine. You may need to copy the Windows CRT dir (VCDIR/VC/redist/x86/Microsoft.VC80.CRT…) to the place where you run your app. The app is compiled in release mode. I have not tried the debug configuration yet.
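For anyone who wants to try this, the reproduction described above could look roughly like the sketch below. It is not the original poster's program: the 1 KB allocation size, the messages, and the exit codes are assumptions chosen for illustration. Only cudaMalloc(), cudaFree(), and cudaGetErrorString() from the CUDA runtime are used.

```cpp
// Minimal repro sketch: allocate a small device buffer, free it, and exit.
// The size, messages, and exit codes are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    void* devPtr = 0;
    const size_t kBytes = 1024;  // arbitrary small allocation (assumption)

    // First CUDA call in the process; this is where the thread reports
    // cudaErrorUnknown after enough successive launches.
    cudaError_t err = cudaMalloc(&devPtr, kBytes);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaFree(devPtr);
    if (err != cudaSuccess) {
        std::printf("cudaFree failed: %s\n", cudaGetErrorString(err));
        return 2;
    }

    std::printf("cudaMalloc/cudaFree OK\n");
    return 0;
}
```

Built as a 32-bit executable and saved under a hypothetical name such as repro.exe, it can be launched repeatedly from a Windows command prompt with `for /L %i in (1,1,20) do repro.exe`. A non-zero exit code marks the first launch that fails.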

- CUDA toolkit release version

CUDA 2.0 WinXP 64-bit, applocal cudart.dll is 32-bit CUDA 2.0

- SDK release version

The one that comes with CUDA 2.0 from the official site.

- Compiler for CPU host code

VS8 2005 SP1 (32-bit development machine)
VS9 2008 (64-bit testing machine), just in case this matters. I don’t use this for compilation; it is merely installed on the testing machine.

- System description including CPU type, CPU speed, installed system RAM, system type and model, video cards installed in the system, chipset type

CPU type: AMD64
CPU Speed: 2.41GHz (there are two of them)
System type and model: not sure (I am away from the system at the moment)
Video cards: 8800 GTX (the only card, primary)
chipset type: Model: AMD690G/V ; Manufacturer: Gigabyte:
Award BIOS Code: 08/06/2007 - RS690V - SB600 - 6A669G01C-00

I am attaching a bug report with the same details in case that would be handy.
cudaMallocBugReport.txt (2.06 KB)

Just bumping this to the top to get someone’s attention…

Appreciate a response…

Yet another user has also reported the same issue:

Look @ Malang’s comments @

Kindly do something about this…

Thanks for reporting this, we’re looking into it. Does it happen with the CUDA 2.1 beta too?

Thank you Simon! I am relieved now…

I have not tested CUDA 2.1 Beta yet.

Since some of our prospective customers are on CUDA 2.0, we won’t be testing CUDA 2.1 in the near future.

Just my 2 cents here:

I am sure you guys must be experts at this, but here are my two cents anyway; they could help somebody…

I read through the following in some msdn blog. Hope you find this useful!

  1. While compiling an Application/DLL in Visual Studio – you can choose the platform to compile for.
    The default is ANY_CPU; But you could be specific saying “x86” or “x64” as well.

  2. The problem is:
    a) If you compile your DLL as x86, you are marking it as strictly 32-bit, i.e. a 64-bit application developed on XP 64-bit CANNOT use your DLL.
    b) If you compile your DLL as x64, you are marking it as strictly 64-bit, i.e. a 32-bit application developed on XP 32-bit CANNOT use your DLL.
    c) If you compile your DLL as “ANY_CPU”, you are marking it as general, i.e. both 64-bit and 32-bit applications can use your DLL.

So, if you do NOT have anything specific to 64-bit in your cudart (not sure what could be specific to 64-bit…), you should consider compiling your DLLs as “ANY_CPU”.

That way, you won’t have the problem of shipping a separate 32-bit cudart.dll on 64-bit platforms…

Best Regards,

I tested it on Windows XP 64-bit SP2 – The problem persists

I tested it with the driver that tmurray released (the one that fixed the Watchdog issues). The problem persists…

My intuition says it is a very, very simple loophole… some “if” condition or something really basic that comes into play between the 32-bit cudart and the 64-bit driver… some 32-bit vs. 64-bit issue…

Okay, I have reproduced the problem and filed a bug. I’m not sure if this is even supposed to work, but it reproduces with the CUDA 2.1 beta too.

BTW, you should apply as a registered developer and then you’d be able to file your own bugs and get quicker turnaround.

Appreciate your time on this…

I don’t understand why you say “I am not sure if this is even supposed to work…”. Doesn’t the 32-bit cudaRT work with 64-bit drivers? Or am I misunderstanding your statement?

I will look into the registered developer thing for sure. (I went through it a year back, I remember, then dropped the idea for some reason… but I think now is a good time.)

Since this is caused by mixing the 32-bit cudaRT with the 64-bit driver, it could be a very simple, basic issue… (for example, sizeof(void*) being 64 bits in the driver and 32 bits in cudart, etc.)
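To make the sizeof(void*) remark concrete, here is a small host-only sketch that reports the pointer width of whatever binary it is compiled into. The function names are hypothetical, and it says nothing about what the driver actually does internally. It only shows that the same source yields different pointer sizes in a 32-bit and a 64-bit build.

```cpp
#include <climits>
#include <cstdio>

// Pointer width, in bits, of the binary this code is compiled into.
// A 32-bit build yields 32 and a 64-bit build yields 64.
inline unsigned pointer_bits() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}

// Prints the width so a log shows which flavour of the app actually ran.
inline void report_pointer_bits() {
    std::printf("this binary uses %u-bit pointers\n", pointer_bits());
}
```

Calling report_pointer_bits() at startup is a cheap way to confirm that the app running on the 64-bit machine really is the 32-bit build.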

I hope things work fine with the 64-bit cudart on the 64-bit driver :-)

I’m having the same problem but with the lower level driver API. It looks like the driver is leaking resources and not freeing anything even when my program closes. This is with the same setup (XP64 with the 32 bit runtime). Our program seems to work fine in Vista64, though (again, with the 32bit runtime).

I’m a bit confused by this though:

Are you saying that using the 32 bit runtime is not supported under 64 bit operating systems?


It looks like we already have a fix for this. I’ll let you know as soon as a driver with the fix is ready.

Wow! That’s great! It must have been a very trivial bug to fix (I guess…)

@aziwoqpd (hmm… how do I pronounce that? :) It would make a better password than a username (just kidding…) ),

32-bit cudart.dll is officially supported on 64-bit machines… Just that it has this cudaMalloc() problem!

As Simon has pointed out, the fix will be ready soon.

Best Regards,



I totally forgot to thank Simon. Sorry about that.


Thanks a TON for your prompt response and action! I was in a hurry to reply to aziwoqpd…

Sorry for digging up an old problem; maybe I failed to find a solution in the forums…

Is there any fix for the problem yet?

I ran into the same problem today, using the same architecture (32-bit on XP64)…

Thank you for any answer

Good that you are digging it out. Did you try CUDA 2.1? It should come with the latest driver.


Could you please confirm the status of this bug?

Thank you

Best regards,


Thank you for the answer,

First I tried 2.0, but never got to the point where this bug could show up, because I ran into other problems with 2.0 caused by using VS Express 2008. After that I moved to 2.1, which works fine until it runs into the problem described here. I have installed the latest driver, 180.60, which is the one suggested by NVIDIA’s download site. I also found the forum thread where links to some 181.xx drivers are given, but I was unsure about using them…

I’m really wondering why sample projects like MatrixMul do not trigger this bug. Is it because they launch a kernel in addition to allocating memory? Maybe I will give it a try today (but it’s never that easy, is it?).

Maybe interesting: cuMemGetInfo showed me that about 30 MB are lost with every run of the program. Since the only card I can use at my workstation is an NVS290, I only get about 6-7 launches… :-(
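A per-run measurement like this can be taken with a few lines of host code. The sketch below is an assumption-laden stand-in, not the poster's code: it uses the runtime call cudaMemGetInfo() rather than the driver-API cuMemGetInfo mentioned above, and it assumes a toolkit recent enough to provide that runtime call with size_t arguments. The output format is also made up for illustration.

```cpp
// Sketch: print free and total device memory so successive runs can be compared.
// Assumes the toolkit provides cudaMemGetInfo(size_t*, size_t*).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0;
    size_t totalBytes = 0;

    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const double kMiB = 1024.0 * 1024.0;
    std::printf("free: %.1f MiB of %.1f MiB\n",
                freeBytes / kMiB, totalBytes / kMiB);
    return 0;
}
```

If the free figure drops by a similar amount on every launch and never recovers after the process exits, that matches the leak described in this thread.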

bye so far,

Oops… So 2.1 does NOT solve this then… :-(

We lived with this problem for a long time in the hope that CUDA 2.1 would come soon. We went back to 32-bit XP just a week before the launch of 2.1.

Anyway, Good that we moved to 32-bit.

@Simon (or @Tim),

Kindly throw some light on this issue please.

Best Regards,