Limitations on CUDA for a 32-bit app running on Windows XP 64-bit: performance loss? How much GPU memory?

I know I could check this myself, but I would be very happy if someone (Sarnath?) has an answer for me.
I want to add CUDA support to a 32-bit application that I cannot port to 64-bit. The application is running on Windows XP 64-bit. My question is: should I expect limitations on the maximum GPU memory that will be addressable, or any other kind of limitation, compared with a 64-bit CUDA application? If I should expect a performance loss related to CUDA driver internals, how significant will it be?

I am not an expert in this, but I will write from my limited experience.


32-bit CUDA apps run fine on Windows XP 64-bit. By 64-bit, I don't mean Itanium and the like. I am referring to x86-64 processors such as Xeon, which run 32-bit apps fine. (I heard that Itanium runs 32-bit code via emulation at 400 MHz speed or so…)

When running on XP 64-bit, you need to make sure that you ship the 32-bit cudart.dll along with your application. Your app won't work with the 64-bit cudart.dll. This applies only if you use the “runtime” API. If you use only the “driver API”, you don't need to worry.

As you might already know, 32-bit drivers WON'T work on 64-bit platforms, so you need to install the 64-bit CUDA drivers.

As far as pointers go, the GPU is still only 32-bit, with a maximum of 4 GB RAM in TESLA products. Since your application is a 32-bit application, you should NOT find any pointer-related problems.
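The 4 GB figure lines up with what a 32-bit pointer can address. Here is a minimal sketch of that arithmetic in plain C++. It makes no CUDA calls, and the function names are made up for illustration, so treat it as a back-of-the-envelope check rather than anything the driver does.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width, in bits, of a host pointer in the current build
// (32 for a 32-bit application, 64 for a 64-bit one).
// Hypothetical helper, not part of CUDA.
constexpr unsigned host_pointer_bits() {
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}

// Number of distinct byte addresses a pointer of the given width can name.
// Only valid for widths below 64, because 2^64 does not fit in 64 bits.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// True when a memory of `mem_bytes` bytes can be fully addressed
// by a pointer that is `bits` bits wide.
constexpr bool fits_in_pointer(std::uint64_t mem_bytes, unsigned bits) {
    return bits >= 64 || mem_bytes <= addressable_bytes(bits);
}
```

For example, `fits_in_pointer(4ULL * 1024 * 1024 * 1024, 32)` is true, so a 4 GB board is exactly covered by 32-bit addresses, while anything larger would not be.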

Note that there was a problem with cudaMalloc() failures when running 32-bit CUDA applications on 64-bit. It goes like this: you run your application repeatedly, and after some 14 or 15 runs you find that cudaMalloc() starts failing, and you need to reboot your app to fix the problem. I got frustrated with this problem (even with CUDA 2.1, if I remember correctly) and installed WinXP 32-bit. Watch out for this problem. I reported it in the forum and Simon said he had taken note of it. I am not sure what happened after that.
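One way to notice this kind of failure early is to check the status of every allocation instead of assuming it succeeded. Below is a minimal sketch in plain C++, written so that it compiles without the CUDA toolkit: the allocator is passed in as a callable, and `try_alloc` is a made-up helper name, not part of CUDA. It assumes the callable returns 0 on success, which matches `cudaSuccess` in the runtime API.

```cpp
#include <cstddef>
#include <cstdio>

// Calls alloc(out, bytes) and treats any non-zero status as a failure.
// On failure it logs the status to stderr, nulls the output pointer and
// returns false, so the caller can bail out cleanly.
// Hypothetical helper for illustration only.
template <typename AllocFn>
bool try_alloc(AllocFn alloc, void** out, std::size_t bytes, const char* what) {
    const int status = static_cast<int>(alloc(out, bytes));
    if (status != 0) {
        std::fprintf(stderr, "%s: allocating %lu bytes failed (status %d)\n",
                     what, static_cast<unsigned long>(bytes), status);
        *out = nullptr;
        return false;
    }
    return true;
}
```

In a CUDA build you would pass something like `[](void** p, std::size_t n) { return cudaMalloc(p, n); }` as the allocator, and you could print `cudaGetErrorString()` for the returned code instead of the raw number.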

I was even able to demo this to a potential customer and he was quite happy with the performance. (‘terribly excited’ was the exact word he used). Note that our library does NOT use “driver API” at all. It uses only the runtime API.

Finally, you may need to redistribute your CRT libraries along with your 32-bit EXE when you deploy it on other machines.

Apart from this, I vaguely remember some people complaining about slowdowns. I don't know anything about that, and I have not encountered it myself.

Good Luck!


Hope this helps.

Best regards,


Sarnath.

Thank you very much for your detailed answer! :thumbup:

Sure, you are welcome! And please do post an update with your findings if you get a chance to work on XP 64-bit.


Best regards,