How to utilise GPU purely for physics simulation

Hi everyone,

I have briefly looked into parts of the PhysX SDK, but apart from that I am a complete noob when it comes to PhysX. Having said that, I am not incompetent (I am studying Computer Science and work as a software developer), so please don’t take my ignorance of PhysX as ignorance of programming.

To get straight to the point and cut a long story short, I am currently using Blender (via its Python API) as a physics engine from a very high level: I load in OBJ files, set some of them as collision objects, set others as cloths, run the simulation and then export the resulting OBJ file (all via Python commands).

This is great, but the problem is that it’s too slow for my requirements, and I wondered if I could use the power of GPUs to accelerate the computation. I did some research and found that the physics engine Blender uses (Bullet Physics) is currently developing code to utilise the GPU via OpenCL.

Seeing as this is a work in progress, I did more research and was led to believe that NVIDIA already utilises GPUs for physics simulation via PhysX, which brings me here.

The level of documentation I have been looking for has been hard to come by, so I defaulted to looking at the PhysX source code and experimenting for myself. It seems there is the ability to use the GPU for physics simulation, but when I tried it I kept getting error messages along the lines of:
“Failed to create cloth GPU. Falling back to CPU implementation”, even though my GPU is an NVIDIA GTX 750 Ti (which supposedly supports PhysX). I should mention that I am running Linux x64.

So my question is: can I use PhysX to run physics engine code on a Linux x64 OS with an NVIDIA GTX 750 Ti GPU?
And why am I getting those error messages? I wish they were more detailed so I could figure out what’s causing them.

P.S. I have the latest NVIDIA drivers installed, as well as CUDA Toolkit 6.

Saying it’s too slow doesn’t give any indication as to a potential cause.

- How many rigid bodies are you simulating?
- How many particles/fluids/cloths are you simulating?
- Have you tried threading your simulation loop?

I have run simulations with hundreds of rigid bodies plus a few cloths, a particle system and fluids, and the performance was rock solid. Again, you did not mention your performance metric or requirement. With that said, PhysX is capable of running parts of the simulation on the GPU; however, this doesn’t guarantee that your application will perform any better if certain guidelines are not followed.
As for the example failing, was this the SDK example or your modified version?
I was able to run my cloth simulation on 64-bit Ubuntu 14.04 without CPU fallback, so GPU simulation should be possible. In fact, my GPU is an old mobile 8600M GT.

I apologise for the lack of information, so allow me to answer your questions.

There is one rigid body and one cloth in each simulation (but there may be more than one cloth in the future).

What exactly do you mean by “…threading your simulation loop”? I’m using Blender and have never noticed any notion of threading at the user/Python API level. Are you saying that chunks of frames could be simulated in parallel? I assumed Blender would be doing something like this under the hood, but maybe it’s worth a shot from the user level.

It takes around 1 - 5 minutes to complete the simulation on my machine.
I want to try and get the simulations down to less than 10 seconds (at the absolute most) if possible.
The simulations essentially consist of a cloth being draped over a humanoid rigid body.

Could you please elaborate on this: “…if certain guidelines are not being followed”?

I was able to run the SampleCharacterCloth example from the SDK (though I’m not sure whether it was running on the CPU or the GPU). The failure and error message I refer to came from the SnippetCloth example in the SDK, to which I added a line like so:

gCloth.setClothFlag(PxClothFlag::eGPU, true);

…and that gave me the error message about falling back to the CPU.
Maybe the error message is appearing because I can’t just set that flag like that? I tried moving the line around in the code (e.g. to after the cloth had been operated on a bit) but kept getting the error message.
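For context, the relevant part of my modified snippet looks roughly like this (paraphrased from memory against the 3.x cloth API, so the names may not match the stock SnippetCloth exactly):

// meshDesc and particles are built earlier in the snippet from the cloth mesh
PxClothFabric* fabric = PxClothFabricCreate(*gPhysics, meshDesc, PxVec3(0.0f, -9.81f, 0.0f));
PxCloth* gCloth = gPhysics->createCloth(PxTransform(PxVec3(0.0f)), *fabric, particles, PxClothFlags());
gCloth->setClothFlag(PxClothFlag::eGPU, true); // the line I added
gScene->addActor(*gCloth);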

Thank you for your interest in helping me thus far.

I’m not familiar with Blender plugins, and have little experience with Python. I use the PhysX SDK natively (C++). However, the simulation needs to run at a fairly frequent rate to minimize jitter and other undesirable update behavior. How often are you updating the simulation?

The native SDK allows threading of the simulation update; the SDK includes a sample that illustrates the topic.
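Roughly, the pattern is to kick the step off with simulate(), do your own work, then block in fetchResults() (a sketch of the usual split, not the sample verbatim; doApplicationWork() is just a placeholder for your own per-frame work):

// At scene creation, give the SDK worker threads to spread the work over:
sceneDesc.cpuDispatcher = PxDefaultCpuDispatcherCreate(4); // e.g. 4 workers

// Then each frame:
gScene->simulate(1.0f / 60.0f); // returns immediately; the step runs on the workers
doApplicationWork();            // this thread is free while the step runs
gScene->fetchResults(true);     // block until the step has completed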

1-5 minutes for a single cloth and one rigid body seems like an awfully long time for just two objects. Is your humanoid rigid body a composite (consisting of multiple shapes)?

The SDK outlines a few issues that may arise if certain criteria or constraints are not met.
For example, the simulation is very sensitive to the time-step update: failing to update the time step correctly may lead to rigid bodies penetrating each other, and to joints and other constraints not behaving correctly. For the most part the SDK documents these behaviors, so if you are implementing a particular feature, I highly recommend scouring the documentation for the use case.
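A common way to keep the time step fixed regardless of how fast your application loop runs is an accumulator along these lines (a generic sketch, not code from the SDK; elapsedFrameTime is a placeholder for your measured frame time):

const PxReal fixedDt = 1.0f / 60.0f;  // always step the simulation at 60 Hz
static PxReal accumulator = 0.0f;

accumulator += elapsedFrameTime;      // wall-clock time since the last frame
while (accumulator >= fixedDt)
{
    gScene->simulate(fixedDt);
    gScene->fetchResults(true);       // block so each fixed step completes in order
    accumulator -= fixedDt;
}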

The setClothFlag call works as expected without error in a native context (well, in my use case). Did you create a CudaContextManager for your simulation?
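For reference, in my codebase the wiring looks roughly like this (from memory, and the exact PxCreateCudaContextManager signature varies between SDK versions, so treat it as approximate; gFoundation and gPhysics are the usual globals):

PxCudaContextManagerDesc cudaDesc;
PxCudaContextManager* cudaContextManager = PxCreateCudaContextManager(*gFoundation, cudaDesc, NULL); // NULL = no profile zone manager

PxSceneDesc sceneDesc(gPhysics->getTolerancesScale());
// ... the usual scene setup ...
if (cudaContextManager && cudaContextManager->contextIsValid())
    sceneDesc.gpuDispatcher = cudaContextManager->getGpuDispatcher();

// Without a gpuDispatcher on the scene, setClothFlag(PxClothFlag::eGPU, true)
// has nothing to dispatch to, and the SDK falls back to the CPU solver.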

I took a look at the SnippetMultithreading example in the SDK (as I assumed you were referring to that one) and I think I understand what you mean by using multithreading. You meant: use multithreading to perform heavy computations between each simulate() call, to decrease the gap between simulate() calls, right?

Well, the thing is, as my intentions are purely physics based and don’t require any rendering, there is no need for me to perform any computation between frame simulations at all. I.e. my simulation loop is literally something like this:

import bpy

# frame_set() advances the scene and triggers physics evaluation for that frame
while bpy.context.scene.frame_current < bpy.context.scene.frame_end:
    bpy.context.scene.update()
    bpy.context.scene.frame_set(bpy.context.scene.frame_current + 1)

… where that update() call would be equivalent to PhysX’s simulate() call.

So there shouldn’t be any delay between calls.

Yeah, 1-5 minutes is really undesirable for my requirements. My humanoid and cloth models all consist of quads, so no, they are not multi-shaped models, if that answers your question. I should mention that the humanoid models have around 13,000 faces each and the cloth models around 5,500 faces each.
Does 1-5 minutes still sound ridiculously slow for meshes of that size?

I was reading through your responses to other questions on this forum, found one mentioning CudaContextManager, and gathered that it is required in addition to the setClothFlag() call to enable GPU computation. So no, I did not add a CudaContextManager to that cloth snippet. I will try that and hopefully it will work.

If 1-5 minutes sounds really slow to you (even with the mesh resolutions I mentioned), it sounds like PhysX could really improve my simulation times (even without GPU computation). The thing is, though, I am not familiar with the SDK at all and don’t want to spend all my time porting a solution over only to find that it doesn’t give me much improvement at all.
Do you have any tips on good places to start for PhysX newbies (I am not new to C++, just to PhysX)?
I guess my biggest question in this regard is how to build my own PhysX-based application. I had a look at the Makefiles for the Samples/Snippets and they look complex, and I can find no documentation on starting from scratch and building on Linux. I found some documentation for building on Windows with Visual Studio, but that doesn’t help me.
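To make it concrete, the first thing I’d want to get building and linking on Linux is something as bare-bones as this (a rough sketch pieced together from the User’s Guide, so the exact calls and library names may be off):

#include <PxPhysicsAPI.h>
using namespace physx;

static PxDefaultAllocator gAllocator;
static PxDefaultErrorCallback gErrorCallback;

int main()
{
    PxFoundation* foundation = PxCreateFoundation(PX_PHYSICS_VERSION, gAllocator, gErrorCallback);
    PxPhysics* physics = PxCreatePhysics(PX_PHYSICS_VERSION, *foundation, PxTolerancesScale());

    PxSceneDesc sceneDesc(physics->getTolerancesScale());
    sceneDesc.gravity = PxVec3(0.0f, -9.81f, 0.0f);
    sceneDesc.cpuDispatcher = PxDefaultCpuDispatcherCreate(2); // two worker threads
    sceneDesc.filterShader = PxDefaultSimulationFilterShader;
    PxScene* scene = physics->createScene(sceneDesc);

    for (int i = 0; i < 60; ++i) // step one second of simulation at 60 Hz
    {
        scene->simulate(1.0f / 60.0f);
        scene->fetchResults(true);
    }

    scene->release();
    physics->release();
    foundation->release();
    return 0;
}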

Forgive me if I am wrong, but it seems like I can’t create a CudaContextManager on Linux.
I see this everywhere:

#if defined(PX_WINDOWS) && !defined(PX_WINMODERN)

…and I can’t seem to use the function “PxCreateCudaContextManager”, because it appears to be Windows-only.

Oh… sorry, slight moment of insanity: you are indeed correct, the CudaContextManager is only available on Windows. My codebase has preprocessor guards surrounding those tidbits (see the example at the end of this post), so I never really gave it any thought. Again, my apologies.

As far as documentation goes, there isn’t a lot out there. The SDK documentation takes some getting used to, but it’s the only thing I tend to use, along with the User’s Guide. This forum is a big help, though it may take a while to get a response to a question; in any case, it doesn’t hurt to ask.
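For what it’s worth, the guards in my code look something along these lines (illustrative, not lifted from my code verbatim):

#if defined(PX_WINDOWS) && !defined(PX_WINMODERN)
    // GPU path: only compiled in on Windows in this SDK version
    PxCudaContextManagerDesc cudaDesc;
    gCudaContextManager = PxCreateCudaContextManager(*gFoundation, cudaDesc, NULL);
    if (gCudaContextManager && gCudaContextManager->contextIsValid())
        sceneDesc.gpuDispatcher = gCudaContextManager->getGpuDispatcher();
#endif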