I have run into a problem with CUDA and Visual Studio Express 2008 on Windows XP: when my managed C++ (.NET) program starts, it shows a message box reporting an assertion failure on _CrtIsValidHeapPointer.

So, let me explain:
I have a Visual Studio project that loads cudart.dll lazily at run time. It works well: the program runs on a machine with or without CUDA installed, and reports whether it is installed.
Then I wrote a minimal kernel.cu.
In the CUDA SDK, I found a cuda.rules file that I could import into my VS project, so that it knows what to do with kernel.cu (calling nvcc and so on).
Thus, my project still compiles and links correctly, with kernel.cu being handled by nvcc.
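
For readers unfamiliar with the lazy-loading approach mentioned above, here is a minimal sketch of how such a run-time probe might look. It is not the code from the attached project: the function name count_cuda_devices is hypothetical, the return type of cudaGetDeviceCount is approximated as int, and the export is assumed to be available under its undecorated name. Only the Windows path is implemented; on other platforms the sketch simply reports the library as unavailable.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#define CUDART_CALL __stdcall   // the CUDA runtime uses stdcall on Windows
#else
#define CUDART_CALL
#endif

// Assumed shape of cudaGetDeviceCount; cudaError_t is an enum, so int stands in for it.
typedef int (CUDART_CALL *GetDeviceCountFn)(int*);

// Hypothetical helper: returns the device count, or -1 when the library
// or the symbol cannot be found, or when the call reports an error.
int count_cuda_devices(const char* library_name) {
#ifdef _WIN32
    HMODULE lib = LoadLibraryA(library_name);
    if (!lib) return -1;                       // CUDA runtime not installed
    GetDeviceCountFn fn = reinterpret_cast<GetDeviceCountFn>(
        GetProcAddress(lib, "cudaGetDeviceCount"));
    if (!fn) {
        FreeLibrary(lib);                      // library present, symbol missing
        return -1;
    }
    int count = -1;
    if (fn(&count) != 0) count = -1;           // 0 corresponds to cudaSuccess
    // The library is intentionally left loaded so later calls can reuse it.
    return count;
#else
    (void)library_name;                        // Windows-only sketch
    return -1;
#endif
}
```

A caller would pass "cudart.dll" and treat -1 as "CUDA is not available". Because nothing links against cudart.lib, the executable still starts on machines without CUDA.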

But when I launch my program (in debug mode), the _CrtIsValidHeapPointer assertion fails. The kernel is not even called; there is no reference to it in my C++ code.

I could ignore it, but… I would prefer to be sure that my base code is not broken.
Any idea what is going on?

I have tried to match the usual configuration found in the CUDA SDK (multithreaded debug CRT DLL (/MDd), libcmt excluded or not, and so on…)


Pierre Chatelier

I feel rather isolated. My problem seems very uncommon…
So, here is a minimal sample project (Visual C++ 2008 Express) that reproduces the error.
Everything compiles and links OK (except that /MTd won’t work, and /MDd produces some warnings).
But the program won’t run: the _CrtIsValidHeapPointer assertion fails (in Debug mode), and the program won’t go any further anyway (even in Release).

I suspect an MSVCRT mess (as usual since VS 2008).

Any clue that could help?


Pierre Chatelier
TestCUDA.zip (17.8 KB)

I have the same issue. It seems that if the common language runtime is used, it is not possible to use CUDA.

In my experience, we get this error even if a totally empty .cu file is compiled with NVCC. However, excluding all .cu files from compilation but still linking with cudart.lib (through, for example, #pragma comment(lib, "cudart.lib")) does work. cudaRegisterAll seems to fail and the application dies. Very, very annoying. I’ll see if MSVC2005 does any better.

>  msvcr90d.dll!_msize_dbg(void * pUserData=0x4250b1c5, int nBlockUse=2)  Line 1511 + 0x30 bytes	C++
  msvcr90d.dll!_dllonexit_nolock(int (void)* func=0x420163c5, void (void)* * * pbegin=0x0012f000, void (void)* * * pend=0x0012eff8)  Line 295 + 0xd bytes	C
  msvcr90d.dll!__dllonexit(int (void)* func=0x420163c5, void (void)* * * pbegin=0x0012f000, void (void)* * * pend=0x0012eff8)  Line 273 + 0x11 bytes	C
  rabgraph.exe!_onexit(int (void)* func=0x0051d200)  Line 110 + 0x1b bytes	C
  rabgraph.exe!atexit(void (void)* func=0x0051d200)  Line 127 + 0x9 bytes	C
inky_kernel_cpp1_ii_626cb103()  Line 9 + 0x19 bytes	C++
  [Managed to Native Transition]	
  rabgraph.exe!_initterm(void** pfbegin = 0x00521248, void pfend = ) Line 130	C++
  rabgraph.exe!<CrtImplementationDetails>::LanguageSupport::InitializeNative() Line 555	C++
  rabgraph.exe!<CrtImplementationDetails>::LanguageSupport::_Initialize() Line 678	C++
  rabgraph.exe!<CrtImplementationDetails>::LanguageSupport::Initialize() Line 876	C++
  rabgraph.exe!?.cctor@@$$FYMXXZ() Line 922 + 0x9 bytes	C++
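
Read bottom-up, the trace suggests why an empty .cu file is enough to trigger the failure: the module's static constructor (.cctor) walks the native initializers through _initterm, one of them belongs to the nvcc-generated code for the .cu file, and that initializer calls atexit, which is where the debug CRT heap check fires. The plain C++ sketch below only illustrates this general pattern of a static initializer registering an exit handler before main() runs. All names in it are hypothetical and it is not the actual nvcc output.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for what a generated host stub might do.
static bool g_cleanup_registered = false;

// Would release per-module resources when the process exits.
static void unregister_module() {}

struct ModuleRegistrar {
    ModuleRegistrar() {
        // Runs during static initialization, before main() and before
        // any user code has had a chance to call into the module.
        g_cleanup_registered = (std::atexit(unregister_module) == 0);
    }
};

// One such object per compiled module; the C runtime constructs it
// while walking its table of initializers at startup.
static ModuleRegistrar g_registrar;
```

Since the registration happens at startup, it runs whether or not the kernel is ever called, which would match the observation that the assertion appears without any reference to the kernel in the host code.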

Hi Chacha, I think I might have found a “solution” to this issue. Only in quotation marks, as it’s a PITA, but as far as I have tested, it works.

What got me thinking is that (at least in my application) the CUDA runtime crashes in cudaRegisterAll()… so what if we used the Driver API?

So far, so good :)

There are several issues though you need to be aware of:

  1. nvcc must be passed the -cubin parameter, which strips ALL host code from your .cu file and keeps only the GPU code

  2. because we are using the driver API, no emulation mode is possible :(

  3. we need to use the more cumbersome syntax: cuInit(), context management, passing parameters, setting up the launch, cuLaunch(), etc.

In my case, I needed to remove all host code from the .cu file and duplicate the code that is used in both device and host mode (or at least inline it in the header file), but the application at least runs :)

If I link against cudart.lib as well (next to cuda.lib), it seems I can also keep using cudaMalloc(), cudaMemcpy(), etc. and pass their pointers as parameters to the kernel function, e.g.:

CUdeviceptr d_ptr;
cudaMalloc((void**)&d_ptr, 16);
cuParamSeti(kernel, 0, d_ptr);
// ... set up and launch the kernel here ...
cudaMemcpy(dst, (void*)d_ptr, 16, cudaMemcpyDeviceToHost);

I am not too happy, but at least the kernel runs :)


Thanks for all your experiments! At least they reassure me that I did not do anything wrong: this is just an interoperability limitation on Windows (how surprising…)
Your solution is too restrictive a workaround for my needs, but it is great that you found it.
I hope that NVIDIA will find a fix for the next versions.


Pierre Chatelier

Well, it depends. All your kernel calls can stay exactly the same, even all your structs/mixed functions if they are in a header file that both MSVC and NVCC compile. The only additional work you have to do is the slightly different initialisation and execution. Not saying it is a good solution, but at least it lets me continue my research :)

Does anyone else have any “solutions” that still use the CUDA runtime exclusively?

So, can nobody on the CUDA team say whether this issue is being addressed?

Gee… I found a solution that I do not even understand.
In the project options, under Linker, change:
- (Advanced) the Entry Point from “main” to empty
- (System) the SubSystem from “SUBSYSTEM:WINDOWS” to “Not Set”.

Well well well… I need a coffee.

We’re working on a solution, and hopefully you’ll see something with CUDA 2.2.

That would be great! Any ETA on CUDA 2.2?


I confirm that with CUDA 2.2, I no longer have the problem.
Now I have another one: loading cudart.dll dynamically.


Pierre Chatelier

CUDA 2.2 didn’t solve this problem for my project. I always get the same _CrtIsValidHeapPointer assertion in debug mode.

Is there any news? I am running out of time… :-/

additional post: http://forums.nvidia.com/index.php?showtopic=100619