simpleD3D Sample Disaster

I am trying to use DirectX to display on screen some results that I calculated with CUDA.

I have these CUDA functions:


CopyResultToDisplay (int2 A_PatternSize, float4 * A_Color, float4 * A_Result, IDirect3DVertexBuffer9 * A_VB)


	if (A_VB==NULL)



    ColorVertex * MapVB;

    CUDA_SAFE_CALL(cudaD3D9MapVertexBuffer((void**)&MapVB, A_VB));

	GridDim g = CalculateGridDim (A_PatternSize);

    CopyIntoDirectX<<< g.grid, g.threads >>>(A_PatternSize, A_Color, A_Result, MapVB);




extern "C" void

InitDevice (IDirect3D9 * A_D3DObject, IDirect3DDevice9 * A_Device)



    A_D3DObject->GetAdapterIdentifier(D3DADAPTER_DEFAULT, 0, &AdapterId);

    int Device;

    if (cudaSuccess == cudaD3D9GetDevice(&Device, AdapterId.DeviceName))




extern "C" void

CleanupD3D9 ()




__global__ void

CopyIntoDirectX (int2 A_Size, float4 * A_Color, float4 * A_Result, ColorVertex * A_VB)


	long tx = threadIdx.x+blockIdx.x*20;

	long ty = threadIdx.y+blockIdx.y*20;

//	A_VB[tx+ty*A_Size.x].Color = 100;

	A_VB[0].Color = 100;


If I comment out the line “A_VB[0].Color = 100;”, everything works as usual.

I see the vertex buffer displayed in the window, but without the values written by the CUDA kernel.

If I leave the A_VB[0].Color line in, it seems that the GPGPU function I wrote in CUDA (not CopyIntoDirectX) is not being called. I can tell because the FPS becomes a lot higher.

However, when I exit the program I get a memory exception of some sort.

I don’t know why it doesn’t work.
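
One thing that may be worth checking, separately from the A_VB[0] write, is the index arithmetic in the commented-out line. The kernel multiplies blockIdx by a hardcoded 20, which only matches the launch if CalculateGridDim really returns 20x20 thread blocks; that function is not shown, so this is an assumption. The host-side sketch below simulates the indexing for a hypothetical 40x40 pattern launched with 16x16 blocks and counts how many cells get written with the hardcoded stride versus the real block size. The names Dim2, Coverage and Simulate are made up for illustration, and the bounds check is something the kernel above does not have.

```cpp
#include <vector>

// Hypothetical host-side stand-in for a 2D size such as blockDim or gridDim.
struct Dim2 { int x, y; };

// Result of one simulated launch.
struct Coverage {
    int written;  // distinct pattern cells that received a write
    int skipped;  // threads whose index fell outside the pattern
};

// Simulates every thread of a launch on the host.
// hardcoded == true  : index = threadIdx + blockIdx * 20 (as in the kernel above)
// hardcoded == false : index = threadIdx + blockIdx * blockDim
Coverage Simulate(Dim2 size, Dim2 block, Dim2 grid, bool hardcoded) {
    std::vector<int> hits(size.x * size.y, 0);
    int skipped = 0;
    const int strideX = hardcoded ? 20 : block.x;
    const int strideY = hardcoded ? 20 : block.y;
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int t_y = 0; t_y < block.y; ++t_y)
                for (int t_x = 0; t_x < block.x; ++t_x) {
                    const int tx = t_x + bx * strideX;
                    const int ty = t_y + by * strideY;
                    // Without this guard the kernel would write past the
                    // end of the mapped vertex buffer.
                    if (tx >= size.x || ty >= size.y) { ++skipped; continue; }
                    ++hits[tx + ty * size.x];
                }
    int written = 0;
    for (int h : hits)
        if (h > 0) ++written;
    return Coverage{written, skipped};
}
```

Calling Simulate({40, 40}, {16, 16}, {3, 3}, true) leaves gaps in the pattern, while passing false covers every cell. If the real block size is 20x20 the two variants are identical, so this would not by itself explain the failure.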

Also, when I try to run the simpleD3D program in release mode it gets stuck (in Emu mode it works).

The release build of simpleD3D worked for me before (I think), but now it doesn’t.

I will check again, maybe it never works in release mode.

How can I tell what I am doing wrong? Are there no errors that tell me when I did something wrong?

Thank you.

It is quite a disaster really.
I was supposedly able to do the GPGPU calculation and display it with DirectX.
However, sometimes it works perfectly and I see the result displayed.
And sometimes there is a big slowdown: I can see my vertex buffer displayed in the window, but it is not filled with the results of the CUDA calculations.
Sometimes this also happens when I run the simpleD3D sample.
However, in the case of simpleD3D I am not sure whether it only happens after I have run my own program or whether it happens by itself as well.
What could cause the CUDA sample itself to not work?
Even if I am doing something wrong, why does it affect the sample from NVIDIA?
And even if I am doing something wrong, is there no tool to detect a CUDA memory leak or whatever is causing this?

I have tried running simpleD3D in the debug configuration, and I hit a breakpoint every time I run it this way.

The breakpoint is in the file free.c, somewhere in this code:

#endif  /* CRTDLL */

        else    //  __active_heap == __SYSTEM_HEAP

#endif  /* _WIN64 */


            retval = HeapFree(_crtheap, 0, pBlock);

            if (retval == 0)


                errno = _get_errno_from_oserr(GetLastError());




This breakpoint is reached after I close the program.

I hit a breakpoint in my own software as well, but it’s a different one.

Is the NVIDIA sample supposed to reach this breakpoint?

Or does it mean there is a bug in the sample?

Why does this happen?