PhysX on nVidia GPU? Does it work yet, and what is it worth?

As far as I understand NVIDIA’s announcements, PhysX will be integrated into CUDA and will then allegedly run faster on an ordinary GPU like the 8600GT than it did on a dedicated Ageia chip. How much of that can be believed? It sounds like a marketing gimmick to say: “Ageia’s specialized PhysX hardware was unnecessary. We can run it on our standard GPU while still doing all the other necessary GPU computations and still be faster than the specialized Ageia chip and a GPU that can focus on the graphics.”

Can that be true? News on this appears hard to come by. What is the actual performance of the supported CUDA NVIDIA GPUs like the 8600 or 8800 compared to a real Ageia chip? Or is support for the NVIDIA GPUs still in the works, meaning that we are currently living in a black hole, with Ageia chip production stopped but NVIDIA PhysX support not yet working?

I don’t understand why physics-light games like

Airborne
GRAW
Switchball
…use the Ageia PhysX chip, while physics-heavy games like Half-Life 2, Crysis and F.E.A.R. run better without even having to use any special PPU chip. It’s crazy. :o

Well, you could run the SDK demos on either of those cards and see if you are impressed or not. That will give you an idea of whether the GPU is faster or slower than the PPU.

What “NVIDIA announcements” are you referring to? The only information so far available regarding CUDA and PhysX are unconfirmed rumors.

Regarding your question, though:

[quote]

“We can run it on our standard GPU while still doing all the other necessary GPU computations and still be faster than the specialized Ageia chip and a GPU that can focus on the graphics.”

Can that be true?[/quote]

Yes, this certainly can be true. Physics calculations don’t take very much processing power. I would guess a big bottleneck with the PPU->GPU solution is that all of the data calculated on the PPU has to be copied over to the GPU, and PCI-express is very slow compared to the memory on the GPU. With everything running on the GPU, these copies are not needed. Note that this is just speculation on my part.
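To make the copy argument concrete, here is a minimal CUDA sketch of my own (made-up names like updateParticles, not anything from PhysX): a trivial per-frame integration kernel whose output either has to cross the bus every frame (the PPU-style path) or can simply stay in device memory for rendering.

[code]
#include <cuda_runtime.h>

#define NUM_PARTICLES (256 * 1024)

// Toy per-frame physics step: advance each particle by its velocity.
__global__ void updateParticles(float4 *pos, const float4 *vel, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < NUM_PARTICLES) {
        float4 p = pos[i];
        float4 v = vel[i];
        p.x += v.x * dt;  p.y += v.y * dt;  p.z += v.z * dt;
        pos[i] = p;
    }
}

// PPU-style path: results cross the bus every frame before rendering, e.g.
//   cudaMemcpy(h_pos, d_pos, NUM_PARTICLES * sizeof(float4),
//              cudaMemcpyDeviceToHost);
// That is 4 MB per frame over PCI / PCI Express, every single frame.
//
// GPU-resident path: d_pos stays in device memory (or a mapped vertex
// buffer), so no per-frame copy is needed at all.
[/code]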

And since the PhysX card is PCI-based, its device<=>host bandwidth is even worse. As for the PPU itself, the PhysX PPU uses a different architecture from NVIDIA’s (rumor says it’s Cell-based). Considering its on-board memory bandwidth and the fact that it has a smaller die (fewer transistors), we can suspect its processing power is much lower than a G80/G92-based GPU, even though they are different architectures (Cell vs. stream processors).

Can’t wait to see PhysX running on CUDA!! Hope NVIDIA releases it soon. The release of a CUDA-based PhysX engine will take game development to the next level and make existing G80/G92 cards more appealing to gamers, which will push GPU sales! (3-way SLI?)

Not entirely; Dave Hoff has confirmed it in these forums: http://forums.nvidia.com/index.php?showtopic=66096&hl=physx

I should have been more specific: I was referring to the OP’s claim of performance numbers as rumors. As you pointed out, the fact that PhysX is being ported to CUDA is confirmed.

Oh, okay :) That has indeed never been confirmed. The only thing I know of is that they gave a nice demo on a 9800GTX that kicked the hell out of an Intel demo, but I think that was pure CUDA code.

Anyway, I think it is a very smart move by NVIDIA in general.

Has PhysX support even been implemented into the released versions of CUDA yet?

I downloaded and installed the XP x64 v174.55 beta driver (w/ CUDA 2.0 support) and Warmonger. My 8800GT 512MB wasn’t detected as an Ageia PPU and Warmonger still ran like arse.

My specs:- Core 2 Duo E6400@3.0GHz, 4GB DDR2-667@752, 8800GT 512MB, XP x64.

When the GeForce line becomes Ageia compatible we will know it, because NVIDIA will most likely SCREAM IT OUT LOUD so it can wipe the floor with ATI.

You are wrong. Check here.

If that were true, then there would never have been any need for PhysX hardware support at all. Programmers would simply do it on the main CPU. The Ageia PhysX processor was specialized hardware for physics computations. If this were an easy task to compute, Ageia’s hardware would never have made any significant difference.

Likewise, a GPU is specialized hardware for video-related computations. Theoretically you could do those on the main processor too, only the performance would be horrible, as the main CPU is not tailored to this special type of computation.

I can imagine nvidia implementing the Ageia hardware features as additional features of their future GPUs. However, I have a hard time believing that their current GPUs, which contain no PhysX-related designs whatsoever, should be able to excel here.

If I am not much mistaken, PhysX is about more than just accurately displaying small pieces of dust flying around. There is also the aspect of, for example, lots of NPC characters wandering around with far better pathing than regular games can compute for them, creating a world that feels much more real. This is not really a graphical aspect; it directly interacts with gameplay. Constantly updating the locations of all the NPCs in the area does not take much PCI bandwidth. Computing those locations, and choosing unobstructed paths for the NPCs, is the challenge that is covered by the PhysX engine.

Whoever states that physics doesn’t consume much processing power obviously:

1. has no idea what he is talking about
2. doesn’t acknowledge the existence of physics in games
3. never played a physics-heavy game
4. never had fun stacking wooden crates to run them over at high speed
5. never blew up a building in Crysis
6. never shot down a helicopter and had it smash into a building

…need I say more…

Yeah, physics doesn’t take much CPU power, if you run through a level and shoot all the guys and keep going without stopping to enjoy all the creative destruction you have on your hands :P

A GPU was specialized hardware for video-related computations. Nowadays you can do all kinds of (often physics-related) calculations on them. As far as I can see, there is no need for any special hardware features, as those types of calculations are already very much possible on current GPUs.

Also regarding the simple physics calculations that you seem to object to:

This is a forum about using CUDA; there are a lot of people here doing far, far more difficult calculations on CUDA than the simple calculations needed for physics. If you cannot accept the fact that physics calculations are simple, you are better off reading the gaming forums, not the HPC ones about CUDA.

GPUs can also do that, but they still excel at what they are tailored to with their many pipelines, hardware shaders and other features: graphics computations.

You appear to be talking about the complexity of programming such a thing, which totally misses my point. Physics calculations may (partially) be easy to describe, but the sheer number of them needed for modern games makes them a challenge for the computer.
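To put a rough, made-up number on that: with, say, 5,000 interacting objects, a naive pairwise collision check means 5,000 × 4,999 / 2 ≈ 12.5 million tests per frame, which at 60 fps is roughly 750 million tests per second, before any narrow-phase geometry or constraint solving even starts. Each individual test is trivial; the volume is the problem.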

As a rule of thumb, modelling a problem in hardware is way faster than using a generic CPU like the main processor. Theoretically you could design hardware for every problem. You could even invent a “Windows CPU” which has all the program logic of Windows hardwired. But besides losing your flexibility when it comes to modifying the Windows code (e.g. when fixing a bug), this would also be pointless, as generic CPUs are fast enough nowadays.

In former times, sound creation and ISDN communications were done using dedicated “active” chips that relieved the main CPU from these tasks. Today we have passive ISDN chipsets and AC97 (or other) sound chips that leave this work to the main CPU, as main CPUs have improved so much speed-wise that these tasks no longer pose a challenge to them.

Why do you think that, under these circumstances, Ageia developed a dedicated PhysX chipset? If it were all about smart programming, they could simply have written a driver that runs on the main CPU.

No, physics calculations are quite obviously still so complex that it is significantly faster to process them on dedicated hardware and move the results over a slow PCI bus than to have your Core Quad compute them. Under these circumstances I think it is understandable that I am puzzled when NVIDIA claims that their generic GPUs could do it even faster without any hardware adjustments whatsoever.

And since, behind the link I gave above, NVIDIA announced that PhysX will be coded in CUDA, and this is a CUDA programming forum, I do not think I am off-topic in asking about this here.

Now for a reality check, here is a quote from the link you posted:
The multithreaded PhysX engine (originally from AGEIA) was designed specifically for hardware acceleration in massively parallel environments. While AGEIA’s PhysX processor had tens of cores, NVIDIA’s GPUs, have as many as 128 cores today, so they are well-suited to take advantage of PhysX software. More importantly, the GPU architecture is a more natural fit than a CPU because of the highly parallel and interactive nature of game physics. PhysX will provide gamers even more value utilizing either today’s or tomorrow’s GPUs.

So basically a GPU is like a few PhysX chips put together. The mathematical operations implemented on these kinds of chips are the same: a sqrt is a sqrt, sin is sin, cos is cos.

NVIDIA has developed dedicated hardware-multithreading chips. They have years of experience in designing and fabricating silicon. I think it’s a fair assumption that they are in a position to create better “value” chips than Ageia, hence the possibility of the whole thing working out in the end.

No, the biggest GPUs have 16 multiprocessors. They have 8 ALUs per multiprocessor (which together execute a warp of 32 threads), and that’s where the number 128 you read around comes from (16 x 8). But calling a GeForce a 128-core processor is akin to calling an Intel Core 2 Quad a “16-core CPU” because the SIMD unit found on each of its cores (SSE) can process up to 4 floats in parallel.

This has implications for diverging code paths: all units in the same multiprocessor execute the same instruction, so if code paths diverge, you can’t run the two different paths in parallel inside the same SIMD processor.
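As a quick illustration (my own toy kernel, not anything from PhysX): in the snippet below, even and odd threads within the same warp take different branches, so the hardware has to run the sqrtf path and the sinf path one after the other instead of in parallel.

[code]
__global__ void divergentKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Threads of one warp disagree on this branch, so the two bodies
    // are executed serially for that warp (each half sits idle in turn).
    if (i % 2 == 0)
        data[i] = sqrtf(data[i]);
    else
        data[i] = sinf(data[i]);
}
[/code]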

And although execution may diverge between the 16 multiprocessors, the architecture is still SPMD (single program, multiple data), meaning that all processors run the same kernel anyway.

That’s why, I think, NVIDIA engineers don’t seem as fond of ray tracing as Intel’s are.

Processors like the Cell and the PhysX PPU follow a different architecture. Each separate core can run a different program. Each core has a small, very fast local memory (similar to the GPU), and each core can quickly exchange data with the others using DMA. This allows each core to execute a different step of a pipeline, with data being streamed quickly across the whole stack. (This is something the GeForce can’t do at all. Its processors can only load/store data to device memory; they can’t exchange data with each other. And they can’t stream a constant flux of data, only apply a kernel to a block of memory defined beforehand [although the same effect can be reached by using lots of small buffers]. Also, because of the SPMD approach, they can’t have each processor run a separate step of a pipeline.)
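For what it’s worth, here is a rough sketch of that “lots of small buffers” workaround (my own illustration with made-up names; it assumes h_data was allocated with cudaMallocHost so the asynchronous copies can actually overlap with kernel execution):

[code]
#include <cuda_runtime.h>

// Stand-in for real per-element work on one chunk of the stream.
__global__ void processChunk(float *chunk, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] *= 2.0f;
}

// Feed the data through in chunks, ping-ponging between two streams so the
// copy of one chunk can overlap with the kernel running on the previous one.
void streamInChunks(float *h_data, float *d_buf[2], int total, int chunkSize)
{
    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);

    for (int off = 0, s = 0; off < total; off += chunkSize, s ^= 1) {
        int n = (total - off < chunkSize) ? (total - off) : chunkSize;
        cudaMemcpyAsync(d_buf[s], h_data + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream[s]);
        processChunk<<<(n + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], n);
    }
    cudaStreamSynchronize(stream[0]);
    cudaStreamSynchronize(stream[1]);
    cudaStreamDestroy(stream[0]);
    cudaStreamDestroy(stream[1]);
}
[/code]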

Each architecture has its advantages and drawbacks, and the conclusion about which architecture is best suited to which situation is better left to the professional engineers to discuss.

But I suspect that most of the work done by a physics engine doesn’t necessarily benefit from a Cell-/PhysX-like architecture at all, as collision detection seems to be just a big number-crunching problem with not that many discrete steps. It’s mostly lots of geometric computation, for which GPUs are already nicely optimised, and you get the added benefit that all the data is already on the GPU for subsequent rendering.
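To give an idea of what that kind of number crunching looks like, here is a deliberately naive sketch of my own (made-up names; a real engine would add a broad phase instead of testing every pair):

[code]
// One thread per sphere; xyz = center, w = radius.
__global__ void sphereOverlap(const float4 *spheres, int *hitCount, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 a = spheres[i];
    int hits = 0;
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float4 b = spheres[j];
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        float r  = a.w + b.w;
        // Pure geometry: compare squared distance against summed radii.
        if (dx * dx + dy * dy + dz * dz < r * r) ++hits;
    }
    hitCount[i] = hits;
}
[/code]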

The Cell-/PhysX-like architecture seems better suited to complex tasks involving, for example, decompressing video / applying filters / recompressing in real time.

But on the other hand, I’m no expert, and perhaps the CUDA implementation is faster only because of higher clock frequencies and the proximity between the physics computation and the renderer.

Considering the huge number of applications that people have already found for CUDA, and that work perfectly on current cards, do you really have a hard time believing it is suited to doing some physics number crunching?

Why is there even an argument here? Of course NVIDIA’s cards can do physics; they said they can, and as many other posters here have said, there is a vast range of far more complex things currently being executed on NVIDIA cards using CUDA.

As far as CPU-based physics engines go, stop saying it is too much effort for a CPU to do physics! Have you never played Half-Life 2? Have you never played Garry’s Mod in that engine? You can stack up LOADS of things, attach stuff to them and blow it all up! No framerate issues at all with a half-decent processor, and that’s an old game! Oh, and don’t even get me started on Crysis: it only uses one of the cores on my quad (I watch utilisation while playing), and I can blow up houses made of those bits of metal, the individual bits fly everywhere, all the stuff in the house like food, pots and pans etc. flies everywhere, and my CPU doesn’t break a sweat. And I haven’t even talked about flying helicopters into piles of houses in the sandbox… No, you have not convinced me that a CPU cannot do physics. Oh, and to further rub it in, go watch the tech demo for Alan Wake where the devs dedicated an entire core of a quad core to physics, and watch when the tornado they spawn throws the car in the air… watch and weep, for you are wrong.

Hellz ya!! Go watch that stuff and feel the wrath of qazax.