cuInit(0) returns CUDA_ERROR_NO_DEVICE in 64-bit builds

I have an odd problem here. When I run my program as a 32-bit build, cuInit(0) returns 0 (CUDA_SUCCESS); when I run the same program as a 64-bit build, it returns 100 (CUDA_ERROR_NO_DEVICE). Oddly enough, other programs, including the CUDA examples, build and run in 64-bit without any problems. The call fails even when cuInit(0) is the very first thing my program does.

Has anyone seen this before? Could I have some compiler or linker options set wrong? I’m building in Visual Studio 2013 with the multithreaded DLL runtime and Unicode support, on Windows 10 with the latest stable NVIDIA drivers and the CUDA 7.0 SDK. The hardware is a Retina MacBook Pro with a 750M. For what it’s worth, the same code works on Mac OS X.

Update: it turned out to be a broken installation. An outdated 64-bit version of nvcuda.dll was in the library search path and was loaded instead of the one in the windows\system32 directory.
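
In case someone else runs into this, here is a rough sketch of how one might look for stray copies of nvcuda.dll. It is not part of my original debugging, just an illustration. The helper names (find_dll_copies, approximate_search_dirs) are made up for this example, and the directory order is a simplified assumption about the default Windows DLL search order (application directory, System32, Windows directory, current directory, then PATH); it ignores things like SetDllDirectory and manifests. Run it with a 64-bit Python, since a 32-bit process sees a redirected System32.

```python
import os
import sys


def find_dll_copies(dll_name, search_dirs):
    """Return every existing copy of dll_name, in the order the directories are given."""
    matches = []
    seen = set()
    for directory in search_dirs:
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.abspath(os.path.join(directory, dll_name))
        key = os.path.normcase(candidate)
        if key in seen:
            continue  # same directory listed twice
        seen.add(key)
        if os.path.isfile(candidate):
            matches.append(candidate)
    return matches


def approximate_search_dirs(app_dir):
    """Assumed, simplified search order; the real loader has more rules than this."""
    windir = os.environ.get("SystemRoot", r"C:\Windows")
    dirs = [app_dir, os.path.join(windir, "System32"), windir, os.getcwd()]
    dirs += os.environ.get("PATH", "").split(os.pathsep)
    return dirs


if __name__ == "__main__":
    # First argument: the directory containing the failing executable (assumption).
    app_dir = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    copies = find_dll_copies("nvcuda.dll", approximate_search_dirs(app_dir))
    for index, path in enumerate(copies):
        label = "probably found first" if index == 0 else "later in search order"
        print(f"{label}: {path}")
    if not copies:
        print("no copy of nvcuda.dll found in the approximated search path")
```

If the first entry printed is anything other than the copy under windows\system32, that is a good hint that an old driver DLL is shadowing the real one, which is what happened in my case.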