Game Physics experiment with CUDA

timorgizer · May 11, 2009, 8:24pm

Hello all

As a school project I’ve made an experiment - an attempt to speed up game physics calculations using CUDA.
I’d like to share the results and get feedback about this stuff.

For this experiment I used the code of the Cyclone Physics Engine, a physics engine written for learning purposes
that accompanies Ian Millington’s book Game Physics Engine Development.

Info can be found here:

http://www.procyclone.com/

I’ve converted Rigid Body Integration function and collision detection functions to CUDA kernels and compared the
running times of the functions on GPU and CPU.

In the first test I compared the rigid body integration performance. Each frame, the workflow copied the forces and torques
to the device (two arrays, 4 floats per body in each), ran the kernel, and copied the transform matrices back to the host
for rendering the bodies with GLUT (16 floats per body, a single array of matrices).

The execution times were as follows:

External Media

In the next test I compared the overall performance of a workflow that included rigid body integration as before,
sphere-halfspace collision detection, and contact resolution (which runs on the CPU). The memory copies per frame were the forces and
torques as before, the generated contact data (an array of structures, 36 bytes per contact), the body state data
(16 floats per body) copied to the host, and the body state adjustments copied back to the device (16 floats per body).
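
As a rough back-of-the-envelope check of the per-frame transfer volume described for the two tests, the byte counts can be tallied as below. This is only a sketch: the struct and function names are made up for illustration, 4-byte floats are assumed, and two directions are my reading of the description rather than something stated outright (transform matrices travelling device to host for rendering, and contact data travelling device to host because contact resolution runs on the CPU).

```cpp
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Hypothetical tally of bytes moved across the bus in one frame.
struct FrameTraffic {
    std::size_t toDevice;  // bytes uploaded per frame
    std::size_t toHost;    // bytes downloaded per frame
};

// Test 1: forces and torques up (2 arrays x 4 floats per body),
// transform matrices down (16 floats per body).
inline FrameTraffic integrationOnly(std::size_t bodies) {
    return FrameTraffic{bodies * 2 * 4 * sizeof(float),
                        bodies * 16 * sizeof(float)};
}

// Test 2: forces and torques up, contacts down (36 bytes each),
// body state down and body state adjustments up (16 floats per body each).
inline FrameTraffic fullWorkflow(std::size_t bodies, std::size_t contacts) {
    const std::size_t forces = bodies * 2 * 4 * sizeof(float);
    const std::size_t state = bodies * 16 * sizeof(float);
    return FrameTraffic{forces + state, contacts * 36 + state};
}
```

Under these assumptions the volume per body is small (tens of bytes each way), which suggests that the number of separate copies per frame may matter as much as the total size.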

The performance comparison gave the following results:

External Media

I am still a CUDA newbie and I know there are still a lot of optimizations and further work to be done, so
all this is just the beginning.

Anyway, I’d like to hear your suggestions, comments and feedback.

Timur

cbuchner1 · May 11, 2009, 9:17pm

Cyclone, that rang a bell. As you said, it’s the physics engine from Ian Millington’s book “Game Physics Engine Development”. I was reading this book just recently.

Are you willing to share some of your CUDA code (now, or maybe later on)?

Christian

timorgizer · May 11, 2009, 10:13pm

Not a problem. I would just like to clean up the mess in the code first (and before I submit the work), which will take me a week or so. Then I can send it to everyone who is interested.

My mail is:

timorgizer@gmail.com

_Big_Mac · May 12, 2009, 10:40am

Great work!

I know PhysX has yet to get rigid bodies and collision detection on GPU, presently only supporting fluids, cloths and soft-bodies.

Perhaps it would be a good idea to limit CPU<->GPU transfers. It’s gonna be tricky, especially since some parts are still calculated on the CPU and bottom line is, you have to copy back relevant stuff to do game logic, rendering etc. Yet, there might be some room for optimization. Perhaps you’ve already thought of or even implemented this:

Most of the bodies don’t move all the time and can effectively be put to sleep. You’d need a mechanism to signal which bodies have been updated on the GPU and require copyback, and which are asleep.
There’s probably no need to copy all the forces and torques every frame. You can keep them in device memory and only update those affected by the player (e.g. if he presses “forward”, send a small update saying that his body just got a forward force vector). You also don’t need to copy forces from GPU to CPU unless game logic specifically needs them, and even then you might copy just the necessary ones. Forces are invisible, and so are velocities; you’re more concerned with positions and orientations, which you will need for rendering.
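
A minimal sketch of the first suggestion, written as plain host-side C++ so it can be run without a GPU. Everything here is hypothetical: `Transform`, `AwakeUpdate`, `compactAwake` and `applyUpdate` are illustrative names, and `compactAwake` is only a CPU stand-in for whatever the device would do to gather the transforms of awake bodies into one contiguous buffer before a single copy back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Transform { float m[16]; };  // 16 floats per body, as in the first post

// Hypothetical packed result of one frame: only the bodies that moved.
struct AwakeUpdate {
    std::vector<int> bodyIndex;        // which bodies the entries belong to
    std::vector<Transform> transform;  // their new transforms, in the same order
};

// CPU stand-in for a device-side compaction: keep entries whose flag is set.
AwakeUpdate compactAwake(const std::vector<int>& awake,
                         const std::vector<Transform>& all) {
    assert(awake.size() == all.size());
    AwakeUpdate out;
    for (std::size_t i = 0; i < all.size(); ++i) {
        if (awake[i] > 0) {
            out.bodyIndex.push_back(static_cast<int>(i));
            out.transform.push_back(all[i]);
        }
    }
    return out;
}

// Host side: patch the render-side mirror; sleeping bodies keep their old transform.
void applyUpdate(std::vector<Transform>& mirror, const AwakeUpdate& update) {
    assert(update.bodyIndex.size() == update.transform.size());
    for (std::size_t k = 0; k < update.bodyIndex.size(); ++k) {
        mirror[static_cast<std::size_t>(update.bodyIndex[k])] = update.transform[k];
    }
}
```

The host keeps a full mirror of the transforms for rendering, and each frame only the packed entries would need to cross the bus. Whether this pays off depends on how many bodies are actually asleep.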

I’d also be interested in seeing the code and/or some documentation

timorgizer · May 12, 2009, 11:53am

Indeed, such a sleep system was implemented in the original Cyclone Physics engine.

At this stage, however, I was interested in measuring performance when the CPU and GPU DO have to do the work, so I left it out.

The main problem is that it is tricky to come up with a mechanism that selects the forces and torques that have changed, copies them in one go, and applies them to the appropriate bodies on the device. Copying a lot of data with a single cudaMemcpy is much faster than copying the same data with several cudaMemcpy calls (which is really slow),

so this is something to think about.
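
One possible shape for such a mechanism, sketched in plain host-side C++: collect the changed forces into two contiguous arrays (body indices and values) so that each array could go over in a single cudaMemcpy, then scatter them into place on the device. The names `Float4`, `ForceUpdates` and `scatterForces` are hypothetical, and `scatterForces` is only a CPU stand-in for a scatter kernel that would run one thread per update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };  // stand-in for CUDA's float4

// Hypothetical host-side accumulator for the forces changed this frame.
class ForceUpdates {
public:
    void set(int body, Float4 force) {
        index_.push_back(body);
        value_.push_back(force);
    }
    // Two contiguous arrays, so each can be uploaded with one copy.
    const std::vector<int>& indices() const { return index_; }
    const std::vector<Float4>& values() const { return value_; }
    void clear() { index_.clear(); value_.clear(); }

private:
    std::vector<int> index_;
    std::vector<Float4> value_;
};

// CPU stand-in for the device-side scatter of the uploaded updates.
void scatterForces(std::vector<Float4>& deviceForces,
                   const std::vector<int>& indices,
                   const std::vector<Float4>& values) {
    assert(indices.size() == values.size());
    for (std::size_t k = 0; k < indices.size(); ++k) {
        deviceForces[static_cast<std::size_t>(indices[k])] = values[k];
    }
}
```

In this loop the last update for a body wins. A parallel scatter would give no such ordering guarantee, so a real version would need at most one update per body per frame.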

_Big_Mac · May 12, 2009, 2:38pm

Perhaps zero-copy (aka mapped memory) that was introduced in CUDA 2.2 would be an answer here, at least for those that have G200 cards or newer. I believe you could then simply write

__global__ void dataCopyBack(int changed[], float4 d_V[], float4 d_P[], float4 hm_V[], float4 hm_P[])
{
	int idx = threadIdx.x + ...

	// Copy back only the bodies flagged as changed.
	if(changed[idx] > 0)
	{
		hm_V[idx] = d_V[idx];	// hm_* point to mapped (zero-copy) host memory
		hm_P[idx] = d_P[idx];
	}
}

Assuming hm_V and hm_P are pointers to mapped host memory. You could write to such host memory directly from a kernel. The reads are coalesced. PCI logic should take care of coalescing the scattered writes into bursts. No explicit memcpys. This could work the other way around too, perhaps you’d memcpy to device an array of what has changed, and then let threads selectively read in only the changed float4s.

I only have an 8800 and cannot check this.

changed could be a bool array, I went with ints for coalescing.

MisterAnderson42 · May 12, 2009, 2:59pm

Brilliant! Why didn’t I think of that? Here, I’ve been wracking my brain for effective ways to use zero-copy in my app and never came up with this one.

Let’s just hope that the PCIe burst logic is smart enough to effectively deal with “coalesced” reads/writes with gaps like this. If it does, this could mean much better multi-GPU scaling for me :) Unfortunately, I’m on vacation and won’t be able to try it out immediately :(

jph4599 · May 12, 2009, 5:04pm

On vacation and still checking CUDA forums…now that’s dedication :lol:

dneckels · May 14, 2009, 6:20pm

Nice project, thanks for posting the results!

I was just wondering about this myself: is it worth offloading some of the game physics to the graphics card? After all, if you are doing complex rendering (shadow maps, HDR, etc…), isn’t it natural to be overlapping this work with the collisions, contacts, and rigid body calcs? In which case it would make more sense to keep those on the CPU?

Just a thought. I guess, though, that the Unreal engine uses PhysX (physics on the card), so maybe my thinking is wrong??

timorgizer · May 15, 2009, 3:11pm

I’ve recently been playing the game Cryostasis - The Sleep of Reason, and the power of PhysX is really visible there.

The game itself runs really slow, at least on my PC, but that’s probably because of the cold effects there.

A friend of mine has a 9800, and he says the game runs fine on his PC. Anyway, looks like it does make sense

to use GPU for physics, at least in case of Cryostasis it looks impressive.

As for the suggestion above to use zero-copy memory, unfortunately I can’t try that out either: I have an 8800 too…

cbuchner1 · June 17, 2009, 1:46pm

As far as I know, all available physics engines that run on the GPU only use the GPU for non-critical visual physics effects, such as cloth, glass shattering, liquid simulations and the like, but not for actual in-game physics such as obstacles or stackable crates that interact directly with the player’s character and may influence the player’s progress in the game. (Big Mac also stated this above in this thread.) I believe this was done to avoid any dependency on a fully programmable GPU series, which is not available in many gamers’ PCs (many gamers still use older hardware that would not support CUDA, for example).

It also allows for exactly the same behavior in multiplayer games, where every player’s physics runs in much the same way on the CPU. GPUs might differ a tiny bit in precision, giving a little competitive edge to some players.
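
One way such small differences can arise, independent of any particular hardware, is that floating-point addition is not associative, so a parallel sum that groups terms differently can round differently. The sketch below is a generic illustration of that effect, not a claim about how any specific GPU or engine behaves. The function names are made up.

```cpp
// Same three values, two summation orders.
float sumLeftToRight(float a, float b, float c) {
    float ab = a + b;  // (a + b) first
    return ab + c;
}

float sumRightToLeft(float a, float b, float c) {
    float bc = b + c;  // (b + c) first
    return a + bc;
}
```

With strict IEEE 754 single-precision evaluation, calling both with (1e8f, -1e8f, 1.0f) gives 1 for the first order and 0 for the second, because 1.0f is lost when added to -1e8f. In a physics step, tiny discrepancies like this can grow over many frames.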

cbuchner1 · June 29, 2009, 12:23am

I agree. Has there been any further progress on this project since?

I am very interested in getting some GPU-accelerated physics, because I haven’t been able to simulate more than about 400 coins in realtime on a Core 2 Duo using the open-source Bullet physics library ( http://www.youtube.com/watch?v=0bLAq7btZbU&fmt=22 ). And this simulation still has too many flaws to be usable.

So I am just about to start implementing a little GPU-based physics library myself, following Ian Millington’s book as a guideline. My approach will be entirely based on the Thrust template library. All of the physics state variables will be stored in device_vectors. The C++ physics objects (particles, rigid bodies, etc.) will only contain an index to the respective position in the vector. The vectors themselves will be shared by all objects and grow as needed as new objects are created.

The device_vector storage should allow for coalesced memory access during most physics computations, which is nice. The physics calculations will be implemented as operators, as demonstrated in the “complicated transformations” Thrust code sample. I intend not to implement any dedicated CUDA kernels (no low-level programming). Hopefully Thrust will be versatile enough to do all I need, and if not, I will carefully try to extend Thrust. Thrust might be extended later to also do the computation on the CPU (maybe even supporting an SSE2-accelerated code path), so there would be no strict CUDA dependence.
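
A rough sketch of that layout in plain C++, using std::vector and std::transform as stand-ins for thrust::device_vector and thrust::transform so it runs without a GPU. All names (`ParticleStore`, `Particle`, `AddScaled`, `integrate`) are hypothetical, the state is one-dimensional to keep it short, and the simple Euler-style step is just a placeholder for the real physics.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Shared state arrays; in the design described these would live on the device.
struct ParticleStore {
    std::vector<float> position;
    std::vector<float> velocity;
    std::vector<float> acceleration;

    // Appends one particle and returns its slot in the arrays.
    std::size_t add(float p, float v, float a) {
        position.push_back(p);
        velocity.push_back(v);
        acceleration.push_back(a);
        return position.size() - 1;
    }
};

// The C++ object holds only an index into the shared arrays.
class Particle {
public:
    Particle(ParticleStore& store, float p, float v, float a)
        : store_(&store), index_(store.add(p, v, a)) {}
    float position() const { return store_->position[index_]; }
    float velocity() const { return store_->velocity[index_]; }

private:
    ParticleStore* store_;
    std::size_t index_;
};

// A physics step written as an operator applied element-wise.
struct AddScaled {
    float dt;
    float operator()(float value, float rate) const { return value + rate * dt; }
};

void integrate(ParticleStore& s, float dt) {
    AddScaled step{dt};
    // velocity += acceleration * dt
    std::transform(s.velocity.begin(), s.velocity.end(),
                   s.acceleration.begin(), s.velocity.begin(), step);
    // position += (updated) velocity * dt
    std::transform(s.position.begin(), s.position.end(),
                   s.velocity.begin(), s.position.begin(), step);
}
```

Because each quantity sits in its own contiguous array, neighbouring threads would touch neighbouring elements, which is what makes coalesced access plausible. With Thrust the operator would additionally need to be callable on the device.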

Christian

timorgizer · July 19, 2009, 9:49am

Just submitted it, and I’m thinking about some ideas to expand it. So far it implements only the rigid body integration

and fine collision detection from Cyclone on CUDA.

I can send you my code if you’ll post your email address, or send me an email.

Starting from scratch seems like a lot of work. At the least, you’ll have to implement the fine collision detection

between two cylinders (coins) yourself, and keep in mind that resizing vectors (memory allocations) has a serious impact on performance. Also, I’d go for an implementation of the coarse collision detection phase using BSP trees. That would allow you

to group contacts and implement the contact resolution routines on the GPU efficiently, which may be a considerable performance gain!

That’s one of the ideas I’m considering for further work.

Good luck!
