Raycasting performance on GPU

OK, I'm finishing my raycaster, but I'm curious what performance others of you are getting in this field.
My results so far (for the 'sponza' model, http://hdri.cgtechniques.com/~sponza/files/):

GFX card: NVIDIA GeForce GTX 280 @ 1.3 GHz
KD-tree construction with SAH (depth = 20) → 1.3 sec (which is quite OK compared to an optimized CPU solution; a rough sketch of the SAH cost being minimized follows below)

Average traversal performance → 2.7 MRays/second
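(For context, the surface area heuristic cost that such a builder minimizes at each candidate split looks roughly like the snippet below; the traversal/intersection constants here are illustrative assumptions, not values from this build.)

// Rough sketch of the SAH cost evaluated for each candidate split plane.
float SahCost(float areaLeft, float areaRight, float areaParent,
              int numLeft, int numRight)
{
    const float costTraverse  = 1.0f;   // assumed relative cost of one traversal step
    const float costIntersect = 1.5f;   // assumed relative cost of one triangle test
    float pLeft  = areaLeft  / areaParent;   // probability a ray entering the parent hits the left child
    float pRight = areaRight / areaParent;   // probability it hits the right child
    return costTraverse + costIntersect * (pLeft * numLeft + pRight * numRight);
}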

IMHO the performance of traversal is poor ;(
Currently I'm using a push-down with short-stack traversal algorithm.
The stack is 8 items per thread, which gives 192 threads per block (shared memory is so small); each thread calculates one ray… and the whole thing suffers a lot from warp divergence.
Packetized traversal eats too many registers, and in the end the result is worse
(I've experimented with 2x2 packets).
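(For illustration, here is a rough sketch of a per-ray push-down + short-stack kd-tree traversal kernel in the spirit described above. The node layout, stack-entry format, leaf test and restart handling are my assumptions for the sketch, not the poster's actual kernel, and degenerate cases such as rays parallel to a split plane are glossed over.)

#include <cuda_runtime.h>

#define SHORT_STACK_SIZE 8           // 8 entries per thread, as in the post

struct KdNode {                      // assumed node layout
    float split;                     // split position (internal nodes)
    int   axis;                      // 0/1/2 = x/y/z split axis, 3 = leaf
    int   left;                      // left/near child index (or first triangle for a leaf)
    int   right;                     // right/far child index (or triangle count for a leaf)
};

struct StackEntry { int node; float tMin, tMax; };

// Placeholder leaf test: a real version intersects the leaf's triangles and
// accepts the closest hit with t inside [tMin, tMax].
__device__ bool IntersectLeaf(const KdNode &n, float3 o, float3 d,
                              float tMin, float tMax, float *tHit)
{
    return false;                    // triangle tests omitted in this sketch
}

// One ray per thread. The shared-memory stacks are sized at launch as
// blockDim.x * SHORT_STACK_SIZE * sizeof(StackEntry).
__global__ void TraceKernel(const KdNode *nodes, const float3 *orig,
                            const float3 *dir, float *hitT,
                            float sceneMin, float sceneMax, int numRays)
{
    extern __shared__ StackEntry smem[];
    StackEntry *stk = smem + threadIdx.x * SHORT_STACK_SIZE;

    int ray = blockIdx.x * blockDim.x + threadIdx.x;
    if (ray >= numRays) return;

    float3 o = orig[ray], d = dir[ray];
    float tMin = sceneMin, tMax = sceneMax;  // ray assumed clipped to the scene box
    float tHit = 1e30f;
    int node = 0;                            // current node (root = 0)
    int count = 0, top = 0;                  // circular short stack: entry count and write pos
    int pushdownNode = 0;                    // restart point for the push-down rule
    bool pushdown = true;

    for (;;) {
        KdNode n = nodes[node];

        if (n.axis == 3) {                               // leaf
            if (IntersectLeaf(n, o, d, tMin, tMax, &tHit)) break;   // closest hit found
            if (count > 0) {                             // pop the nearest pending far child
                top = (top - 1 + SHORT_STACK_SIZE) % SHORT_STACK_SIZE;
                node = stk[top].node;
                tMin = stk[top].tMin;
                tMax = stk[top].tMax;
                --count;
            } else if (tMax < sceneMax) {                // stack empty but ray not finished:
                node = pushdownNode;                     // restart from the push-down node
                tMin = tMax;
                tMax = sceneMax;
                pushdown = true;
            } else break;                                // ray left the scene, no hit
            continue;
        }

        // Internal node: where does the ray cross the split plane?
        float oa = (n.axis == 0) ? o.x : (n.axis == 1) ? o.y : o.z;
        float da = (n.axis == 0) ? d.x : (n.axis == 1) ? d.y : d.z;
        float tSplit = (n.split - oa) / da;
        int nearChild = (oa < n.split) ? n.left  : n.right;
        int farChild  = (oa < n.split) ? n.right : n.left;

        if (tSplit >= tMax || tSplit < 0.0f) {
            node = nearChild;                            // only the near child is touched
        } else if (tSplit <= tMin) {
            node = farChild;                             // only the far child is touched
        } else {
            stk[top].node = farChild;                    // push the far child; when the
            stk[top].tMin = tSplit;                      // stack is full this overwrites
            stk[top].tMax = tMax;                        // the oldest (farthest) entry,
            top = (top + 1) % SHORT_STACK_SIZE;          // which a later restart recovers
            if (count < SHORT_STACK_SIZE) ++count;
            node = nearChild;
            tMax = tSplit;
            pushdown = false;                            // both children needed from here on
        }
        if (pushdown) pushdownNode = node;               // node still spans [tMin, sceneMax]
    }
    hitT[ray] = tHit;
}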

You might want to check out this guy's implementation; he has some good tips for CUDA optimization:
bouliiii's blog, "Cuda real time ray tracing - 100 millions ray/s?": http://bouliiii.blogspot.com/2008/08/real-...h-cuda-100.html

NVIDIA demoed a raytracer at NVISION (and other events) that uses a BVH (possibly with kd-trees inside the BVH, btw). As far as I remember, they also do packet traversal.

No, I believe it’s a simple thread-per-ray.

The presentation is here:
http://developer.nvidia.com/object/nvision08-IRT.html

I think you are right. I had it in my mind that they tried to group threads (read: rays) that follow comparable paths together in blocks, but that indeed seems not to be correct.

After some tweaking & slashing I've gone from 2.7 MRays/second to ~12 MRays/second
(I've reduced the number of if-blocks to the minimum; some work is done redundantly now, but the divergence is much lower).
Another thing I've discovered is an almost 100% texture cache miss rate when sampling kd-tree branches, so now I'm trying to rearrange the node table to be more cache friendly :)
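(For illustration, one simple way to make the node table more cache friendly is a depth-first re-layout, so that each node's near child immediately follows it in memory and a texture fetch of the parent tends to pull its first child into the same cache line. The node format below is an assumption for the sketch, not the poster's actual scheme.)

#include <vector>

struct KdNode {
    float split;
    int   axis;        // 0/1/2 = split axis, 3 = leaf
    int   left, right; // child indices (internal) or triangle range (leaf)
};

// Depth-first re-layout: the left/near child always lands at parent index + 1.
static int Relayout(const std::vector<KdNode> &in, int node,
                    std::vector<KdNode> &out)
{
    int slot = (int)out.size();
    out.push_back(in[node]);
    if (in[node].axis != 3) {                   // internal node: re-emit children
        int l = Relayout(in, in[node].left,  out);
        int r = Relayout(in, in[node].right, out);
        out[slot].left  = l;
        out[slot].right = r;
    }
    return slot;
}

std::vector<KdNode> MakeCacheFriendly(const std::vector<KdNode> &nodes)
{
    std::vector<KdNode> out;
    out.reserve(nodes.size());
    Relayout(nodes, 0, out);                    // root assumed at index 0
    return out;
}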

For the sake of experiment I've set the camera FOV to 1.0 (all rays should take almost the same path, so no divergence), and in this case performance was ~70 MRays/second.

All tests are done looking from the corner of the Sponza scene, so the whole atrium is visible to the camera.

Hi,

I am also starting a ray tracing project and would like to know how you debug your CUDA code. I’ve tried setting up emuDebug, but in the C++ code, when setting up the D3D texture I get an error message saying “this feature is not yet implemented”. I also noticed in the CUDA SDK that the other D3D texture examples didn’t have any debug builds.

Please let me know how you handled this,

Thanks,

Seb

"Not yet implemented" usually means you have a bad toolkit/driver combo. Some of the stuff in the toolkit isn't in the driver.
Go to the CUDA download page and download the most recent stuff there if you can.

With the latest driver & toolkit (I'm using the x64 Windows Vista version) there is no problem with binding memory to textures in emulation.

The other side of the coin is that some CUDA/DX9 interop calls do not work
(cudaD3D9(Register/Map)Resource), and you need to emulate this by manually locking the texture to get a pointer, then binding it as a texture or passing it to the kernel.
But that's only a few additional lines of code, so you can live with it :)

OK, thanks guys. DarkAr, any chance you could give an example of how to lock the texture manually? I'm pretty much a noob when it comes to CUDA programming at the mo!

here you go:

void c_KDTREE_GPU::RayCastGPU(c_D3DTEXTURE *rt)
{
#ifndef __DEVICE_EMULATION__
    // device build: map the D3D9 texture through the CUDA interop API
    IDirect3DResource9 *rttex = (IDirect3DResource9*)rt->GetTexture();
    if (cudaD3D9MapResources(1, &rttex) != cudaSuccess)
        freaked_error("Cannot map Cuda resource !");

    void   *TexData;
    size_t  TexPitch;
    cudaD3D9ResourceGetMappedPointer(&TexData, rttex, 0, 0);
    cudaD3D9ResourceGetMappedPitch  (&TexPitch, NULL, rttex, 0, 0);
#else
    // emulation build: lock the texture manually to get a plain pointer
    void *TexData;
    int   TexPitch;
    if (rt->Lock(0, D3DLOCK_DISCARD, &TexData, &TexPitch) == FALSE)
        return;
#endif

    // .... call kernel

#ifndef __DEVICE_EMULATION__
    if (cudaD3D9UnmapResources(1, &rttex) != cudaSuccess)
        freaked_error("Cannot unmap Cuda resource !");
#else
    rt->Unlock(0);
#endif

    return;
}

In other words, in emulation mode you need to obtain the pointers to the DX9 objects yourself ;)
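(For reference, a minimal sketch of what such a manual lock boils down to on a raw IDirect3DTexture9. It assumes the texture was created lockable, e.g. with D3DUSAGE_DYNAMIC, and is an illustration rather than DarkAr's actual c_D3DTEXTURE wrapper.)

#include <d3d9.h>

// Lock mip level 0 of a D3D9 texture and return the raw pointer + row pitch,
// so the emulated kernel can write into it directly.
bool LockTexture(IDirect3DTexture9 *tex, void **data, int *pitch)
{
    D3DLOCKED_RECT rect;
    if (FAILED(tex->LockRect(0, &rect, NULL, D3DLOCK_DISCARD)))
        return false;                 // lock failed
    *data  = rect.pBits;              // pointer passed to the emulated kernel
    *pitch = rect.Pitch;              // row pitch in bytes
    return true;
}

void UnlockTexture(IDirect3DTexture9 *tex)
{
    tex->UnlockRect(0);               // release the lock after the kernel runs
}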

Hello, thanks for your reply, but I'm still having trouble. Do you think you could tell me what rt is? I don't know which of my members has Lock(). Here's my code:

bool g_bDone = false;
IDirect3D9*       g_pD3D;        // Used to create the D3DDevice
IDirect3DDevice9* g_pD3DDevice;

const unsigned int g_WindowWidth  = 512;
const unsigned int g_WindowHeight = 512;

// Data structure for 2D texture shared between DX9 and CUDA
struct
{
    IDirect3DTexture9* pTexture;
    int width;
    int height;
} g_texture_2d;

// The CUDA kernel launchers that get called
extern "C"
{
    void runTest(void* surface, size_t width, size_t height, size_t pitch, float t);
}

//-----------------------------------------------------------------------------
// Forward declarations
//-----------------------------------------------------------------------------
HRESULT InitD3D( HWND hWnd );
HRESULT InitTextures();
void RunKernels();
void DrawScene();
void Cleanup();
void Render();
LRESULT WINAPI MsgProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

int main(int argc, char* argv[])
{
    // create window
    // Register the window class
    WNDCLASSEX wc = { sizeof(WNDCLASSEX), CS_CLASSDC, MsgProc, 0L, 0L,
                      GetModuleHandle(NULL), NULL, NULL, NULL, NULL,
                      "CUDA Raytracing Test", NULL };
    RegisterClassEx( &wc );

    // Create the application's window
    HWND hWnd = CreateWindow( wc.lpszClassName, "CUDA Raytracing Test",
                              WS_OVERLAPPEDWINDOW, 0, 0, g_WindowWidth, g_WindowHeight,
                              NULL, NULL, wc.hInstance, NULL );
    ShowWindow(hWnd, SW_SHOWDEFAULT);
    UpdateWindow(hWnd);

    // Initialize Direct3D
    if( SUCCEEDED( InitD3D(hWnd) ) && SUCCEEDED( InitTextures() ) )
    {
        // register the Direct3D resources that we'll use
        // we'll read to and write from g_texture_2d, so don't set any special map flags for it
#ifndef __DEVICE_EMULATION__
        cudaD3D9RegisterResource(g_texture_2d.pTexture, cudaD3D9RegisterFlagsNone);
        CUT_CHECK_ERROR("cudaD3D9RegisterResource (g_texture_2d) failed");

        // Initialize this texture to be black
        {
            cudaD3D9MapResources(1, (IDirect3DResource9 **)&g_texture_2d.pTexture);
            void* data;
            size_t size;
            cudaD3D9ResourceGetMappedPointer(&data, g_texture_2d.pTexture, 0, 0);
            cudaD3D9ResourceGetMappedSize(&size, g_texture_2d.pTexture, 0, 0);
            cudaMemset(data, 0, size);
            cudaD3D9UnmapResources(1, (IDirect3DResource9 **)&g_texture_2d.pTexture);
        }
#else
        void* data;
        int   pitch;
        // <-- this is where I'm stuck: nothing here has a Lock() method
        if(Lock(0, D3DLOCK_DISCARD, &data, &pitch) == FALSE)
            return 0;
#endif
    }

And then later on I do this: (I don’t know if this needs to be locked as well)

////////////////////////////////////////////////////////////////////////////////
//! Run the Cuda part of the computation
////////////////////////////////////////////////////////////////////////////////
void RunKernels()
{
    static float t = 0.0f;

    // populate the 2d texture
    {
        void*  pData;
        size_t pitch;
        cudaD3D9ResourceGetMappedPointer(&pData, g_texture_2d.pTexture, 0, 0);
        cudaD3D9ResourceGetMappedPitch(&pitch, NULL, g_texture_2d.pTexture, 0, 0);
        runTest(pData, g_texture_2d.width, g_texture_2d.height, pitch, t);
    }
    t += 0.01;
}

Oops, just saw the declaration of rt. But I would still be grateful if you could point out what I need to do to my code, lol, because it's a bit different from yours. Do I need to change it a bit so I can still lock the texture? It's still not working even though I have upgraded the SDK/toolkit and drivers.

Can anyone help me?