PhysX SPH Fluid Sim and WDDM Timeout

We’re currently using PhysX 3.4’s SPH particle solution to simulate fluids, running the scene on the GPU via CUDA. Sometimes, in very complex scenes, a GPU kernel takes more than 2 seconds, triggering the WDDM timeout.

We can work around this ourselves easily enough by changing the registry entry, but we plan on distributing this application to other end-users. Changing their registry keys is something we’d like to avoid.
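
For context, the timeout in question is Windows’ documented Timeout Detection and Recovery (TDR) mechanism; the 2-second default is controlled by the `TdrDelay` value. A sketch of the registry workaround we use locally (machine-wide, requires admin rights and a reboot) looks like this:

```
Windows Registry Editor Version 5.00

; TdrDelay: seconds the GPU may remain unresponsive before Windows
; resets the display driver (default is 2). 0x3c = 60 seconds.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
```

As said, this is exactly what we want to avoid shipping to end-users; it’s included only for completeness.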

Is there any way to split up the workload/configure the PhysX library to avoid triggering this timeout?


For GPU particles, we have split the pipeline into many small CUDA kernels so that most users are unlikely to hit the 2-second timeout limit. Is it possible for you to provide a repro so that I can investigate the issue? I would also like to know the system configuration you are running on. You can get the system information by launching the NVIDIA Control Panel → Help → System Information; there is a Save button to capture the details into a txt file.

Providing an exact repro isn’t possible, but I can tell you that a single fluid had around 48,000 particles, with the following properties:

Grid Size: 3.0
Max Motion Distance: 0.3
The total area the particles covered was about 10m x 10m

The kicker was that we had about 6 static triangle-mesh rigid bodies in the scene with ~49,000 vertices each. The scene became very expensive to compute as more particles were added, triggering the timeout at around the 48,000-particle mark. Removing those meshes made the scene much faster to compute.

If it is indeed just becoming too complex, is there a way to configure the pipeline to break the jobs up even smaller, such that it’s guaranteed never to hit the timeout on higher-end systems (Kepler or newer)?

I’m guessing that this system is tooled for real-time applications, but we’re using it for offline rendering/simulation, so the more unconstrained we can make the system, the better.

System Information
NVIDIA System Information report created on: 05/17/2017 11:21:38

Operating System: Windows 8.1, 64-bit
DirectX version: 11.0
GPU processor: GeForce GTX 670MX
Driver version: 382.05
Direct3D API version: 11.2
Direct3D feature level: 11_0
CUDA Cores: 960
Core clock: 601 MHz
Memory data rate: 2800 MHz
Memory interface: 192-bit
Memory bandwidth: 67.20 GB/s
Total available graphics memory: 9191 MB
Dedicated video memory: 3072 MB GDDR5
System video memory: 0 MB
Shared system memory: 6119 MB
Video BIOS version:
IRQ: Not used
Bus: PCI Express x16 Gen2
Device Id: 10DE 11A1 10AD1043
Part Number: 2051 0003


PhysX 09.17.0329 NVIDIA PhysX

Based on your description, it is very likely the TDR is happening in the triangle-collision kernel due to the huge number of vertices. The collision kernel processes everything in a single pass. One possible solution would be to make the collision kernel do multiple passes; unfortunately, we currently have no plans to add this feature in the near future. If you can live with some compromises, I would suggest implementing a level-of-detail scheme for the triangle meshes. You can also try converting the meshes into convexes or height fields to reduce the computational load.


Could you give us some details on the rough number of triangles per cubic meter?
You may already have tried something like this, but here is a thought:
It looks like your (mainly 2D) area would consist of roughly 10–20 grid cells (edge length 3.0 m).
We use the grid cells as a broad phase to the triangle collision. Within a grid cell, it’s pretty much brute force, every particle against every triangle. Smaller grid cells should give you considerable speedups, but there is a catch: there is a limit of ~1k active grid cells (containing particles), beyond that the collisions will start to fail. However, you seem to be on the lower end of active cells, so there might be some room for improvement.

Please note that PhysX particles are deprecated in 3.4 and will be removed in 3.5. A highly recommended alternative is our FLEX library, which features position-based dynamics and a much more robust parameter space.