SSAO in CUDA & Depth Buffer Access

I’m writing my own version of AO in CUDA.

The kernel is working nicely, and I’m working on some optimizations.

The problem is that I don’t know how to capture the depth buffer from the Direct3D application I’m integrating all this into.

It’s a Direct3D 8 program, and I don’t know how to copy the depth buffer from the application to system memory.

Is there a function in the NVIDIA driver to copy the depth buffer to system memory? I heard that one was mentioned on
another board, but the user who posted it does not remember the details.

Thank you very much!

CUDA only supports interop with OpenGL, Direct3D 9, and Direct3D 10.

For D3D 9 you’re interested in cu(da)D3D9RegisterResource & cu(da)D3D9MapResources.


I found a ‘partial’ solution:

IDirect3DTexture8 *depthStencil_Texture;
m_pIDirect3DDevice8->CreateTexture(width, height, 1, D3DUSAGE_DEPTHSTENCIL, ((D3DFORMAT) MAKEFOURCC('I','N','T','Z')), D3DPOOL_DEFAULT, &depthStencil_Texture);

IDirect3DSurface8 *depthStencil_Surface;
depthStencil_Texture->GetSurfaceLevel(0, &depthStencil_Surface);

m_pIDirect3DDevice8->SetRenderTarget(renderTarget, depthStencil_Surface);

// Do the normal render of the game …

At the end of the rendering (when the game calls the ::Present() method), I do this:

m_pIDirect3DDevice8->SetTexture(0, depthStencil_Texture);

Then I set a new D3DFMT_A8R8G8B8 render target and a pixel shader that does a 'texld r0, t0'.

At the end I have a render target with the INTZ values, and I can post-process it later with my CUDA kernel.

But there is a huge problem: multisampling does NOT work! :(

I need to turn off multisampling to bind 'depthStencil_Surface' together with my 'renderTarget'. :(

So, at the moment, this works, but it’s useless for my project, because I lose the FSAA.

Also, I still don’t know how to convert the INTZ values, but at least I can read them.
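
For what it’s worth, here is a minimal sketch of one possible conversion. It assumes two things that are not confirmed anywhere in this thread: that the value read back is a plain 24-bit unsigned depth, and that the game uses a standard Direct3D perspective projection whose near and far planes are known. The function names depth24ToUnit and linearizeDepth are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: the raw depth is an unsigned 24-bit integer in the low 24 bits.
// Maps it to a normalized depth-buffer value in [0, 1].
inline double depth24ToUnit(std::uint32_t raw24) {
    return static_cast<double>(raw24 & 0xFFFFFFu) / 16777215.0;  // 2^24 - 1
}

// ASSUMPTION: standard Direct3D perspective projection, where
//   zBuf = farZ * (zView - nearZ) / ((farZ - nearZ) * zView)
// Solving for zView gives the linear view-space depth returned below.
inline double linearizeDepth(double zBuf, double nearZ, double farZ) {
    assert(nearZ > 0.0 && farZ > nearZ);   // planes must be valid
    assert(zBuf >= 0.0 && zBuf <= 1.0);    // depth-buffer range
    return (nearZ * farZ) / (farZ - zBuf * (farZ - nearZ));
}
```

With these, a texel would be converted as linearizeDepth(depth24ToUnit(raw), nearZ, farZ). If the game uses a different projection (or a W-buffer), the formula does not apply as written.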

Any tips?

The only tip I can really give you is to not use CUDA for post processing DirectX 8 apps…

a) You’re limiting yourself to NVIDIA GPUs (GeForce 8 series and above only, as well)
b) There’s no direct interop between D3D8 and CUDA, so you’re going to have to manually copy from host<->device, or hope that D3D8 textures stay compatible with D3D9 textures (as you’re doing now).
c) Cg and/or HLSL will make things a ton easier for you, taking care of memory access patterns, etc…

I have a CPU path too, but you are right. It’s only going to run decently on NVIDIA cards.

About the interop, I’m using copies between the device & host. This is not optimal, but it works. You can check a working sample here:…st&p=516242

I can’t sacrifice the AA. It cost me a lot of work.

I’m thinking about writing a wrapper from DX8 to DX10. I don’t know if this is possible, but it’s probably my best option.