32-bit CUDA WinXP app on WinXP 64-bit: deployment considerations


I have a library written on top of CUDA 2.0 on a WinXP 32-bit platform.

I want to deploy this library on a WinXP 64-bit platform with the same CUDA and VS versions. (The VS version being the same rules out any CRT-related issues.)

But I am wondering whether this will work seamlessly or whether I will hit any CUDA-related issues. I am not sure whether the CUDA libraries link statically or dynamically. I think they load dynamically… In that case, will a 64-bit CUDA library work fine with my 32-bit DLL?

Can someone help?


Best Regards,


If you have any answers, I’d be very interested :)

I have a CUDA app compiled on a WinXP 32-bit platform. A user told me that this application runs on a WinXP 64-bit machine, but much more slowly (with roughly the same GPU).

Are there any issues with using a 32-bit CUDA program with the 64-bit version of the display drivers?


This is what I found on the net: "64-bit" can mean two things for Intel

  1. Intel Xeon 64-bit – which can run native 32-bit apps as quickly as any fast CPU
  2. Intel Itanium – which runs 32-bit applications using a hardware-software combo emulator at 400 MHz speed.

I am not interested in Itanium at all…

I am interested to know what kind of dynamic linking problems can be expected at run time, especially CUDA-related ones.

Since my app is linked with the 32-bit CUDA DLL, will it work fine on a 64-bit machine with “CUDA on 64-bit”?
Is it possible to install “CUDA for Windows XP” on a Windows XP 64-bit box? I understand that these 64-bit boxes NEED 64-bit drivers. So, is it possible to install the 64-bit driver for WinXP 64-bit and install the toolkit and SDK from normal (32-bit) WinXP?

Any help is greatly appreciated.

Best Regards,

Driver API: CUDA will just work seamlessly, no problem.
Runtime API: for now, redistribute cudart.dll and place it in the same directory as the executable. We are making improvements to this.
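One way to see what a 32-bit process actually ends up with is to print the driver and runtime versions at startup. This is only a minimal sketch, assuming a toolkit that provides `cudaDriverGetVersion` and `cudaRuntimeGetVersion` (as far as I know these arrived after CUDA 2.0, so treat their availability as an assumption). `cuda_major` and `cuda_minor` are hypothetical helper names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Versions are encoded as 1000 * major + 10 * minor (e.g. 2020 means 2.2).
static int cuda_major(int v) { return v / 1000; }
static int cuda_minor(int v) { return (v % 1000) / 10; }

int main() {
    int driverVersion = 0;
    int runtimeVersion = 0;

    // Reports the CUDA version supported by the installed display driver
    // (0 if no driver is installed).
    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Reports the version of the cudart library this process loaded,
    // i.e. the copy redistributed next to the executable.
    err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Pointer width tells you whether this binary was built as 32- or 64-bit.
    std::printf("process is %d-bit\n", (int)(sizeof(void*) * 8));
    std::printf("driver supports CUDA %d.%d\n",
                cuda_major(driverVersion), cuda_minor(driverVersion));
    std::printf("cudart is CUDA %d.%d\n",
                cuda_major(runtimeVersion), cuda_minor(runtimeVersion));
    return 0;
}
```

Running this on both the development machine and the 64-bit target should show whether the same cudart.dll is being picked up in both places.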

Wow! Thanks for this! I am delighted to know that the 32-bit CUDART will work seamlessly with the 64-bit driver.

At the moment, this is a boon.

Thank you,

Good luck on the improvements!

Best Regards,


Thanks for this answer, tmurray :). And what about performance issues? Has anybody encountered such a thing?

My app is an OpenGL/CUDA program that makes heavy use of functions like cudaGLMapBufferObject. I’ve seen in the forum that this function can be slow. Can it be more or less efficient depending on the GPU used?

My development config : GeForce 8800 GTX --> 20 fps
User config : Quadro FX 4600 --> less than 3 fps :(

I’m using CUDA 2.0, by the way.


My programs are running normally. I use the 32-bit cudart.dll on Windows XP 64-bit.


I get “memory allocation issues” after a few invocations. My app never calls cudaFree at the end, as I was told that device memory is freed automatically when the program exits. This used to work fine before…

But now this is getting to be a messier problem! The exact error I am getting is 30 == “cudaErrorUnknown”

I am going to use the latest driver from tmurray’s driver update post… Let’s see if that fixes the issue…

By the way, tmurray, can you tell me the reason behind this strange behaviour of cudaMalloc()?

I even tried with CUDA apps that allocate very little (not even an MB). They fail after 3 to 4 invocations… Very strange…

I recently noticed that this failure occurs exactly on the 15th invocation (regardless of the application and the memory request size).
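To narrow down a failure like this, it may help to run a tiny program that allocates, frees explicitly, prints the exact error string, and is then invoked repeatedly. The following is a sketch for diagnosis only, not a fix for the problem described above. `alloc_and_release` is a made-up name, and `cudaDeviceReset` is the current spelling of what CUDA 2.0 called `cudaThreadExit`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Allocates `bytes` of device memory, touches it, and frees it again.
// Returns the first CUDA error encountered, or cudaSuccess.
static cudaError_t alloc_and_release(size_t bytes) {
    void* d_buf = NULL;

    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemset(d_buf, 0, bytes);

    // Free explicitly instead of relying on cleanup at process exit.
    cudaError_t freeErr = cudaFree(d_buf);
    return (err != cudaSuccess) ? err : freeErr;
}

int main() {
    const size_t bytes = 256 * 1024;  // well under 1 MB, as in the report above

    cudaError_t err = alloc_and_release(bytes);
    if (err != cudaSuccess) {
        // Print both the numeric code and the readable string.
        std::fprintf(stderr, "CUDA error %d: %s\n", (int)err, cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    // Tear down the context explicitly before exiting
    // (cudaThreadExit in toolkits as old as CUDA 2.0).
    err = cudaDeviceReset();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceReset: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    std::puts("allocation and release succeeded");
    return EXIT_SUCCESS;
}
```

If this program also starts failing after a fixed number of runs, that would point away from the application code and towards the driver or runtime.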

Where can I download the old drivers? I would probably try moving to the 177 series…

tmurray or any NVIDIA person,
Can you comment on the issues I have stated in the post above?

Thank you

Well, I see this “cudaMalloc()” problem even with the driver that tmurray posted recently (the one with the watchdog fix)…

Hmm… I have no clue what’s going on… Did any of you try the XP 64-bit driver listed on the CUDA website, or the one that tmurray posted recently? Are things going well for you?

If so, this must be a problem with a 32-bit DLL on a 64-bit driver… Sigh… I hope it gets resolved soon.

I have the same problem here. Even if I cudaFree all allocated memory, my app stops working after it has been run several times (one run after another, not simultaneously).
cudaMalloc gives back NULL and cudaGLMapBufferObject also starts to fail. Once this happens, no other CUDA app (even apps from the SDK) works anymore and I have to restart.

I am now going to test it with the CUDA SDK apps, too. However, even if it is a programming mistake in my app, it should not affect processes that are started later, when the faulty process is already gone.

32-bit Windows app with 32-bit CUDA on a Windows XP x64 platform.

Thank you Malang!!

I hope that at least now the NVIDIA people will wake up and look into this…

I am looking at deploying a test version to a customer in 10 days’ time and most likely a production version in a month.

I would appreciate it if the NVIDIA people looked into this annoying problem!

Best Regards,

If you don’t mind rewriting a bit of your app, you could take a slightly different approach…

I’m mostly a .NET programmer, so I interface all my CUDA DLLs into .NET via Interop Services. I’ve been doing something like the following:

  • Compile CUDA code to PTX
  • Embed PTX strings for various kernels into my .NET application.
  • Load whatever data is necessary onto the card using driver functions + .NET interop services
  • Call the kernel via the driver API + .NET interop services (there is a function that accepts a string containing PTX instructions and loads the kernel from it, ready to launch)
  • Retrieve results from card memory via driver API + .NET interop

This makes things a bit simpler, since you don’t need to recompile .NET apps for different platforms, and if you only compile your kernels to PTX code, that is portable as well (across any OS/architecture). You also get the benefit of .NET technology, so you can easily do things like retrieve/store data from a database or SOAP services, etc.
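For readers who have not used the driver API, the steps listed above could look roughly like this on the native side. It is a sketch under several assumptions: it uses current driver API names (`cuDevicePrimaryCtxRetain`, `cuLaunchKernel`), which postdate CUDA 2.0, where the equivalent steps were `cuCtxCreate` and `cuFuncSetBlockShape`/`cuParamSet*`/`cuLaunchGrid`. The kernel name `scale`, its signature and the helper `blocks_for` are hypothetical, and the PTX is read from a file given on the command line in place of a string embedded in the application.

```cuda
// Assumed kernel, compiled separately with `nvcc -ptx scale.cu -o scale.ptx`:
//   extern "C" __global__ void scale(float* data, float factor, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) data[i] *= factor;
//   }
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cuda.h>

// Abort with the numeric CUresult if a driver API call fails.
#define CU_CHECK(call)                                                      \
    do {                                                                    \
        CUresult r_ = (call);                                               \
        if (r_ != CUDA_SUCCESS) {                                           \
            std::fprintf(stderr, "%s failed with CUresult %d\n", #call,     \
                         (int)r_);                                          \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

// Number of thread blocks needed to cover n elements.
static unsigned blocks_for(int n, unsigned block) {
    return ((unsigned)n + block - 1) / block;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s scale.ptx\n", argv[0]);
        return EXIT_FAILURE;
    }

    // Stand-in for the embedded PTX string: read the PTX text into memory.
    std::ifstream in(argv[1]);
    if (!in) {
        std::fprintf(stderr, "cannot open %s\n", argv[1]);
        return EXIT_FAILURE;
    }
    std::stringstream ss;
    ss << in.rdbuf();
    const std::string ptx = ss.str();

    CU_CHECK(cuInit(0));
    CUdevice dev;
    CU_CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CU_CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CU_CHECK(cuCtxSetCurrent(ctx));

    // The driver JIT-compiles the NUL-terminated PTX text for this GPU.
    CUmodule mod;
    CU_CHECK(cuModuleLoadData(&mod, ptx.c_str()));
    CUfunction fn;
    CU_CHECK(cuModuleGetFunction(&fn, mod, "scale"));

    // Load the input data onto the card.
    int n = 1024;
    float factor = 2.0f;
    std::vector<float> host(n, 1.0f);
    CUdeviceptr d_data;
    CU_CHECK(cuMemAlloc(&d_data, n * sizeof(float)));
    CU_CHECK(cuMemcpyHtoD(d_data, host.data(), n * sizeof(float)));

    // Launch: kernel arguments are passed as an array of pointers to them.
    void* args[] = { &d_data, &factor, &n };
    const unsigned block = 256;
    CU_CHECK(cuLaunchKernel(fn, blocks_for(n, block), 1, 1, block, 1, 1,
                            0, NULL, args, NULL));
    CU_CHECK(cuCtxSynchronize());

    // Retrieve the results from card memory.
    CU_CHECK(cuMemcpyDtoH(host.data(), d_data, n * sizeof(float)));
    std::printf("host[0] = %f\n", host[0]);

    CU_CHECK(cuMemFree(d_data));
    CU_CHECK(cuModuleUnload(mod));
    CU_CHECK(cuDevicePrimaryCtxRelease(dev));
    return EXIT_SUCCESS;
}
```

This links against the driver library only (for example `-lcuda`), so no cudart.dll has to be shipped. One caveat: PTX generated by nvcc records an address size, so PTX built for a 64-bit host process is not necessarily interchangeable with PTX built for a 32-bit one.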

Thanks for your inputs. I have a long standing question on .NET… What is .NET? Why was it introduced? What problem does it solve? – Can you give a small gist? Appreciate your time on this.

I am not conversant with the driver API. I am more comfortable with the runtime API. It saves time and it’s cool. I won’t mind using cudart for production code. I don’t see any change in speedups despite cudaMalloc, memcpy etc… But what you have said is a very valid point for beginners.

Do you use GASS for your .NET - CUDA interoperability?

btw, I do the following to please my .NET master… -->

My DLL code is in C++ and my customer expects it in C#. So, I ship:

  • C++ DLLs
  • A C# bridge DLL that bridges my DLL entry points with C# using Interop Services
  • A C# application
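As an illustration of the C++ DLL side of such a setup, an exported entry point with C linkage might look like the sketch below. Everything named here (`MYLIB_API`, `mylib_scale`, `scaleKernel`) is hypothetical and not taken from the actual library. The function returns 0 on success and the `cudaError_t` value otherwise, so that the caller never has to deal with C++ exceptions.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// C linkage keeps the exported name unmangled so a bridge DLL can find it.
#if defined(_WIN32)
#define MYLIB_API extern "C" __declspec(dllexport)
#else
#define MYLIB_API extern "C" __attribute__((visibility("default")))
#endif

// Multiplies every element of data by factor.
__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Scales hostData in place on the GPU.
// Returns 0 (cudaSuccess) on success, otherwise the cudaError_t value.
MYLIB_API int mylib_scale(float* hostData, int n, float factor) {
    if (hostData == NULL || n <= 0) return (int)cudaErrorInvalidValue;

    const size_t bytes = (size_t)n * sizeof(float);
    float* d_data = NULL;

    cudaError_t err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) return (int)err;

    err = cudaMemcpy(d_data, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int block = 256;
        const int grid = (n + block - 1) / block;
        scaleKernel<<<grid, block>>>(d_data, factor, n);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hostData, d_data, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_data);  // always release the device buffer
    return (int)err;
}
```

On the C# side, the bridge would presumably declare this with `DllImport` and `CallingConvention.Cdecl`, since that is the default convention for a function exported this way on 32-bit Windows.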

Best Regards,


Is Interop Services also portable across Windows/Linux/Mac OS?

And what about using Managed C++? It does everything that C# can, and it doesn’t have any problem interfacing to dlls.

Maybe it’s even possible to get nvcc to compile managed code?

I have only just dipped my hands into C++… What is this Managed C++? This is definitely the first time I am hearing of it…

I don’t think C# is portable to Linux and the like… So it’s a pain, I guess… I would appreciate it if someone could enlighten me.

C# is portable to Linux via the Mono framework. New stuff like WPF doesn’t work, but a lot of things do. I’m just wondering whether Interop Services also works.

Managed C++ is a lot like C#, but backward-compatible with C++. You can recompile any ordinary C program as .NET code, and you can add .NET features to a C program. The .NET code can then run on Mono.

It’d be very cool if you could tell nvcc to tell visual c++ to compile the C code to .NET bytecode instead of x86, and basically automatically create cross-platform CUDA programs.

Thanks! Do you know any good online books for C# and .NET?

Online books? Not really. But you can bittorrent real books online… :)

We have a bug filed on this issue and are looking into it. Thanks!