Simulation not nearly using all available CPU resources [3.4.0] C++

So I recently started working with PhysX in order to test its usability for the simulation of an automated high rack warehouse.
The main criterion in this case is the performance of the simulation when simulating huge facilities.
While I am aware that I probably won’t be able to simulate a whole facility at once (I already have a concept to split it across different servers), I still want to see how far I can go with one simulation.

What I’ve done so far is create “roll conveyors” (you can easily google what they look like) by adding a few kinematic capsules to the scene in a row and rotating them around their longitudinal axis. I can then drop boxes onto them, which then get carried away.

So far so good.
Now with a simulation step time of 33ms I can already create around 28000 rolls (a few boxes on them are still fine, I haven’t tested further yet) without the calculation time exceeding the 33ms.
But that’s about it… 28000 rolls are the limit. With more rolls the calculation time starts increasing beyond the 33ms mark. Which would be fine, 28k is already a lot.
But the thing is, with 28k rolls the program is only using around 30% of the CPU capacity, and it won’t go much higher than that even when I create more rolls (even though the calculation time keeps increasing).
Now it seems odd to me that PhysX is not using all of the CPU capacity if it clearly needs it!

The weirdest part is that I can easily run 2 instances of the simulation at the same time with 28k rolls each (which does indeed result in doubled CPU usage). So how is it not possible to run ONE simulation with 56k rolls when I can run 2 with 28k?

I am currently working on a Windows system with 4 CPU cores.
I am using one CpuDispatcher (default) with 4 worker threads. I already tried fewer, and even more, threads but the result hardly changed. I also experimented with the affinity masks but that made no difference either.
No matter what I did, the program just won’t use all of the CPU’s capacity.

Another thing I should mention is that I am running the program as a console application. There’s no renderer (I can turn one on for debugging) and no game engine or anything, and there won’t be one in the future, so no worries about that.

Does anyone have an explanation for this “bottleneck”?
It’d be perfect if there were a solution for it, but I’d already be happy if I at least knew for sure that what I want to do is not possible.

The only way to tell is to profile the application and find where the hotspots are.

Can you connect PVD to your application, using a profile build of PhysX? There are examples of how to connect to PVD in the PhysX snippets. Please provide a profile-only capture (pass the flag ePROFILE rather than eALL to the connection code). This should let us see where the bottlenecks are, and perhaps we can make some suggestions to improve things. eALL sends geometric information to the tool, which lets us see what you are simulating, but it also adds a large amount of overhead when simulating a lot of bodies/shapes and can completely skew the performance figures.

Unfortunately, it’s not always possible to multi-thread every component of an engine, so it’s by no means guaranteed that PhysX will use all cores, all the time. PhysX does scale to lots of threads when there is a lot of simulation work to be done (collision detection or constraint solving, for example), but it sounds from your description that this might not be the case because you have tens of thousands of user-controlled kinematic actors rotating.

It would be great if you could share the PVD capture so we can see what’s happening. If you could provide a capture showing a few hundred frames of simulation, that should tell us a lot about where the time is going.

Thanks a lot for the quick answer!
I tried using the profiler and will upload the files. I had one problem, though!
I cannot start the current version of PVD as it always closes instantly on my machine, so I was using an older version (3.2016.04.20614672) which seems to be a little buggy…
At least I can’t really inspect the profile plot, as it won’t show me anything past the 5ms mark, even though one of the bars I can see has a duration of around 40ms (when clicking on it).
There doesn’t seem to be a scroll bar either.

You can download the file anyway from the link below. I hope it works.

.pxd2 file:
https://drive.google.com/file/d/11ec8O-mgJMS1Ixaa0ntf8PQ9ciLOWgu2/view?usp=sharing