SSAO in CUDA & Depth Buffer Access

walterman · April 7, 2009, 12:13am

I’m writing my own version of AO in CUDA.

The kernel is working nicely, and I’m working on some optimizations.

The problem is that I don’t know how to capture the depth buffer from the Direct3D application I’m integrating all this into.

It’s a Direct3D 8 program, and I don’t know how to copy the depth buffer from the application to system memory.

Is there a function in the nVidia driver that copies the depth buffer to system memory? I heard that one was mentioned on another board, but the user who posted about it doesn’t remember the details.

Thank you very much!

Smokey · April 7, 2009, 12:23am

CUDA only supports interop with OpenGL, Direct3D 9, and Direct3D 10.

For D3D 9 you’re interested in cu(da)D3D9RegisterResource & cu(da)D3D9MapResources.

walterman · April 8, 2009, 11:44am

Ok,

I found a ‘partial’ solution:

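// Create a depth-stencil texture using the INTZ FOURCC format
IDirect3DTexture8 *depthStencil_Texture;
m_pIDirect3DDevice8->CreateTexture(width, height, 1, D3DUSAGE_DEPTHSTENCIL, ((D3DFORMAT) MAKEFOURCC('I','N','T','Z')), D3DPOOL_DEFAULT, &depthStencil_Texture);

// Get the surface of mip level 0 so it can be bound as the depth buffer
IDirect3DSurface8 *depthStencil_Surface;
depthStencil_Texture->GetSurfaceLevel(0, &depthStencil_Surface);

// Render the scene with the INTZ surface as the depth-stencil buffer
m_pIDirect3DDevice8->SetRenderTarget(renderTarget, depthStencil_Surface);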

// Do the normal render of the game …

At the end of the frame (when the game calls the ::Present() method), I do this:

m_pIDirect3DDevice8->SetTexture(0, depthStencil_Texture);

Then I set a new D3DFMT_A8R8G8B8 render target and a pixel shader that does a ‘texld r0, t0’.

In the end I have a render target holding the INTZ values, and I can post-process it later with my CUDA kernel.

But there is a huge problem: multisampling does NOT work!

I have to turn off multisampling in order to bind ‘depthStencil_Surface’ together with my ‘renderTarget’.

So, for now, this works, but it’s useless for my project because I lose the FSAA.

Also, I still don’t know how to convert the INTZ values, but at least I can read them.

Any tips?

Smokey · April 8, 2009, 11:15pm

The only tip I can really give you is to not use CUDA for post-processing DirectX 8 apps…

a) You’re limiting yourself to nVidia GPUs (and only the 8000 series and above, at that).
b) There’s no direct interop between D3D8 and CUDA, so you’re going to have to manually copy between host and device, or hope that D3D8 textures stay compatible with D3D9 textures (as you’re doing now).
c) Cg and/or HLSL will make things a ton easier for you, taking care of memory access patterns, etc.

walterman · April 10, 2009, 12:16am

I have a CPU path too, but you are right. It’s only going to run decently on nVidia cards.

About the interop, I’m using copies between the device and the host. This is not optimal, but it works. You can check a working sample here:

http://forums.nvidia.com/index.php?s=&…st&p=516242
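For the host-side half of such a copy, here is a minimal C++ sketch of one detail that tends to matter: a locked Direct3D surface exposes its rows with a pitch that can be larger than width × bytes per pixel, so the rows need to be repacked before a plain linear upload. This assumes the render target has already been brought into system memory (for example with CopyRects into a system-memory surface followed by LockRect; the thread does not say which path is actually used). packRows is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a pitched image (rows separated by pitchBytes, as in a locked surface)
// into a tightly packed buffer with no padding between rows.
std::vector<std::uint8_t> packRows(const std::uint8_t* src, std::size_t pitchBytes,
                                   std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel) {
    const std::size_t rowBytes = width * bytesPerPixel;  // payload bytes per row
    std::vector<std::uint8_t> dst(rowBytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Skip the padding at the end of each source row.
        std::memcpy(dst.data() + y * rowBytes, src + y * pitchBytes, rowBytes);
    }
    return dst;
}
```

The packed buffer can then be handed to cudaMemcpy with cudaMemcpyHostToDevice. Alternatively, cudaMemcpy2D accepts a source pitch directly, which would avoid the intermediate copy.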

I can’t sacrifice the AA. It cost me a lot of work.

I’m thinking about writing a wrapper from DX8 to DX10. I don’t know if this is possible, but it’s probably my best option.
