I just ran into what seems like the same interop bug.
I’m using GLFW with CUDA 7.5 and the 358.x driver in a Debug build.
The first CUDA runtime API call after the initial cudaSetDevice() crashes inside the CUDA runtime.
The non-Debug build appears to work fine.
Reverting from 358.x to 355.x solved the problem.
int
main(int argc, char* argv[])
{
  //
  // INIT GLFW
  //
  ...

  //
  // INIT CUDA DEVICE
  //

  // enable pinned/write-combined allocations
  cuda(SetDeviceFlags(cudaDeviceMapHost | cudaDeviceScheduleBlockingSync));
  cuda(SetDevice(cuda_device_id));

  // force initialization
  cuda(Free(0)); // <--- Debug build crashes hard here deep in the CUDA RT on a NULL pointer.
                 //      It fails on the first call after cudaSetDevice() whether it's this
                 //      initialization idiom or something innocuous like cudaDeviceGetLimit().
  ...
}
I tried both Quadro and GeForce drivers in 358.x and 355.x.
355.x works for me.
System: Win7/x64 | Quadro K620 | GTX 750 Ti | GTX 980 | CUDA 7.5
Update: the problem might be in the PTX compilation phase.
For Debug builds I am witnessing the following behavior:
358.91: -gencode=arch=compute_52,code=compute_52   CRASH in CUDA runtime function
358.91: -gencode=arch=compute_52,code=sm_52        OK
354.42: -gencode=arch=compute_52,code=compute_52   OK
354.42: -gencode=arch=compute_52,code=sm_52        OK
Just tested 359.00 – my Debug GLFW interop app still crashes in the first CUDA function after cudaSetDevice():
359.00: -gencode=arch=compute_52,code=compute_52   CRASH in CUDA runtime function
359.00: -gencode=arch=compute_52,code=sm_52        OK
358.91: -gencode=arch=compute_52,code=compute_52   CRASH in CUDA runtime function
358.91: -gencode=arch=compute_52,code=sm_52        OK
354.42: -gencode=arch=compute_52,code=compute_52   OK
354.42: -gencode=arch=compute_52,code=sm_52        OK