I have an odd problem here. When I run my program as a 32-bit build, cuInit(0) returns 0; when I run the same program as a 64-bit build, it returns 100 (CUDA_ERROR_NO_DEVICE). Oddly enough, other programs, including the CUDA examples, build and run as 64-bit without any problems. The call fails even when cuInit(0) is the very first thing my program does.
Has anyone seen this before? Could it be that I have some compiler or linker options wrong? I’m building in Visual Studio 2013 with the multithreaded DLL runtime and Unicode support, on Windows 10 with the latest stable Nvidia drivers and the CUDA 7.0 SDK. The hardware is a Retina MacBook Pro with a 750M. For what it’s worth, the same code works on Mac OS X.