The proverbial "your whole scene has to fit in your RAM limit" argument

I have been researching GPU technology for an upcoming purchase for professional use (3ds Max Iray), and the memory argument for any card is always the same: "If you want to render a scene (with Iray, for example) your scene needs to 'fit' inside the X amount of RAM your card has, therefore more RAM is better!" Makes sense to me… until you start thinking about it and try to understand how much RAM really goes into managing a complex scene versus a simpler one, AND budget for that accordingly. I haven't found a reliable/scientific source that explains how to evaluate a scene's size accurately, or that breaks down RAM use and what gains it translates into. Does anybody have a benchmark or general rule of thumb for what 1 GB vs. 2 or 3 GB of RAM does in an Iray-type rendering environment? Thanks a bunch.

My guess is really simple:

iRay works with “light particles” which bounce around in the scene.

Each light particle needs to be stored in memory.

Calculations happen on the light particles.

However, retrieving/reading and storing/writing these light particles takes a lot of time (memory access time).

So the more memory your graphics card has, the more light particles can be stored in it. This is also good for performance, because the memory requests can be pipelined.

So iRay can "machine gun" the memory… it can send a lot of requests towards the memory… and this machine-gunning goes on and on and on, until the responses from memory come in.

This allows the memory latency to be hidden.

Of course memory/RAM is also needed for 3D coordinates, perhaps triangles, and probably textures.

Such things require a lot of memory, textures especially, since they are 2D.

So the more memory the graphics card has, the more complex the scene can be, which means more textures are possible.

On the graphics card itself, the GPU communicates with its own memory; this link has high bandwidth and high latency.

The high bandwidth is nice, so as long as everything can be loaded into the graphics card's memory, this is great.

However the PC also has its own memory, and data from it needs to travel to the graphics card via the PCI Express bus, which is limited to around 4 GB/sec.

A graphics card's own memory, meanwhile, can reach as much as 300 GB/sec.

So once everything is loaded into the graphics card, it can fly and do calculations faster on its own memory than it can via PC memory.
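To put the two bandwidth figures side by side, here is a quick calculation using the post's own numbers (which are illustrative, not measured) and a hypothetical 1.5 GB scene:

```python
# Back-of-the-envelope transfer times using the bandwidth figures above.
# Both numbers are the post's illustrative figures, not measurements.

PCIE_BANDWIDTH_GB_S = 4.0     # host -> GPU over PCI Express
VRAM_BANDWIDTH_GB_S = 300.0   # GPU <-> its own memory

def transfer_time_s(size_gb: float, bandwidth_gb_s: float) -> float:
    """Seconds to move size_gb at the given bandwidth."""
    return size_gb / bandwidth_gb_s

scene_gb = 1.5  # hypothetical scene size
print(f"Over PCIe: {transfer_time_s(scene_gb, PCIE_BANDWIDTH_GB_S) * 1000:.0f} ms")
print(f"In VRAM:   {transfer_time_s(scene_gb, VRAM_BANDWIDTH_GB_S) * 1000:.0f} ms")
```

With these numbers, touching the scene over PCIe takes nearly two orders of magnitude longer than touching it in the card's own memory, which is the whole argument for fitting the scene in VRAM.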

At least that’s my theory ;)

Your explanation makes perfect sense and I follow its "conceptual" logic, however… how do you figure out how big your scene EXACTLY is? (I'll settle for "roughly" if someone has a general rule of thumb! :) In other words, beyond the simplistic and vague "more memory is better", how do I choose the card and associated memory for the particular type of work I do (architectural viz in my case, interiors & exteriors)? Please dispense with the "I think" and "it must be": is there actually some literature, method, or recent test out there that could give a baseline for objective evaluation? The closest I found to a methodical approach is this pre-Iray, already old post: http://www.cgarchitect.com/news/Reviews/Review076_1.asp. It's OK if no one knows, we'll just all keep spending money without really knowing why!! Just teasing…

Good question. I have no experience with iRay (yet), unfortunately, so I am not sure whether iRay or 3ds Max displays how much graphics card memory it is using.

So perhaps you could take a look inside 3ds Max or ask others. If nobody knows, or if this information is not available, then I would recommend contacting the developers of 3ds Max or iRay and asking them to provide this kind of information while the product is running.

If the product could report how much graphics card memory it is using, that would give you "rendering" guys an idea of how much memory is actually being used and whether it would be beneficial to upgrade your systems/graphics cards.

Failing that… suppose they don't do this… perhaps there are external tools which can "spy" on the graphics card, for example by observing the amount of memory that's in use and the amount still available.

It's probably pretty easy to create such a tool… perhaps the deviceQuery sample for CUDA does this… if not, I could probably create one… I am not yet sure… but it's probably one little API call which returns "free memory" and "total memory". Assuming this is actual graphics card memory, and not something virtual/swapped, this would be good.

So what you would do is render your scene, run my tool while it renders, and then perhaps you would get some insight.

One potential problem is that the render might go so fast, or so slow, that a single measurement is not good.

So then I would alter my tool so it "spies" on the memory continuously… just like Task Manager does…

Then you could see how the memory fluctuates and get some idea of what is going on…
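A sketch of such a spying tool, as an assumption on my part rather than anything iRay documents: in CUDA the "one little API call" is `cudaMemGetInfo` (free and total device memory), and NVIDIA's `nvidia-smi` utility exposes the same numbers from the command line. A small Python wrapper could poll it; `query_gpu_memory` below obviously only works on a machine with an NVIDIA GPU and driver installed, while the parsing part is pure.

```python
import subprocess

def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one 'used, total' CSV line (values in MiB) from nvidia-smi."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total

def query_gpu_memory(gpu_index: int = 0) -> tuple[int, int]:
    """Ask nvidia-smi for used/total memory (MiB); needs an NVIDIA GPU + driver."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits", "-i", str(gpu_index)],
        text=True,
    )
    return parse_memory_line(out.strip().splitlines()[0])
```

Calling `query_gpu_memory()` in a once-a-second loop while a render runs would give exactly the fluctuation picture described above.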

Perhaps this is a good idea for NVIDIA as well.

So you could also try contacting NVIDIA developers, perhaps even Microsoft developers, so that they implement this feature in Task Manager or Resource Monitor.

Resource Monitor on Windows 7 Ultimate does not currently seem to have a pane/window for graphics card memory, so this could be a very good suggestion to implement, which would make rendering guys like you really happy too! ;) =D

One tool which you could maybe already use is NVIDIA's CUDA Toolkit 4.0 Visual Profiler.

It can be used to profile applications too. So perhaps you could use it to profile 3ds Max/iRay.

I haven't used it yet, since this is all pretty much new stuff! ;) =D

So much new stuff, so little time! ;) =D

But I am getting there eventually! ;) =D

I am trying out the Visual Profiler on a very simple Delphi application which happens to have debug information as well, and which executes a really simple "hello world" kernel.

So far it seems to be doing 18 steps, and it goes very slowly… I am not sure what it's doing, perhaps some kind of simulation… but so far the CPU usage is very low… so I wonder what's going on and why the CPU usage is so low… I did use some Windows critical sections in my code, maybe that has something to do with it.

Or maybe it's some kind of bug in the Visual Profiler…

Anyway, when it's done, and if it shows something interesting, I might upload some screenshots ;)

So I am not sure if this tool is actually an option for you, since my app is pretty simple but 3ds Max is huge…

OK, it seems it might be a problem with my application, which is a console application waiting for input from the user… the manual said there was an option for that… but it's not there… I just tried another simple CUDA application and this time it was lightning fast… so you could give it a try! ;)

It doesn't show much information about used or peak memory, but it does show "transfer size" in the MemCopy table; that could be an indication of how much memory is being used.
So this tool is not the greatest yet… it's apparently quite new… maybe in the future it will show details like that…

NVIDIA has other graphics card spying tools though… I can't remember their names at the moment, they were meant for OpenGL/DirectX inspection or so, so game developers can see what happens and how it executes… multiple different tools are available, and perhaps some of these will show graphics card memory usage, which might also be the same as the "CUDA memory being used"! So that could be interesting.

Hi,

I will try to answer your question.

In order to use iRay efficiently you need to be able to store the whole scene in the GPU RAM.

A scene is the geometry representation of the "world".

Now, I have no experience with iRay, but I can tell you from the ray-tracing point of view. The problem you are dealing with is that you need to construct an acceleration hierarchy over the geometry (the scene/world) in order to do fast queries (intersection/collision tests). This means you need somewhat more memory than the raw geometry takes. How much more depends on which acceleration structure you pick and how optimized you want it to be: you trade memory for performance.
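That overhead can be sketched with a back-of-the-envelope calculation (my own illustration, not anything Iray-specific): a binary BVH with L leaves has 2L − 1 nodes in total (L leaves plus L − 1 internal nodes). The node size of 32 bytes and the triangles-per-leaf count below are assumptions.

```python
def bvh_node_bytes_estimate(num_triangles: int, tris_per_leaf: int = 4,
                            node_bytes: int = 32) -> int:
    """Rough node memory for a binary BVH: with L leaves there are
    2L - 1 nodes in total (L leaves + L - 1 internal nodes).
    node_bytes=32 assumes a 24-byte AABB plus 8 bytes of indices."""
    leaves = max(1, -(-num_triangles // tris_per_leaf))  # ceil division
    return (2 * leaves - 1) * node_bytes

# Raw triangle: 3 vertices x 3 floats x 4 bytes = 36 bytes (no vertex sharing).
n = 1_000_000
raw = n * 36
bvh = bvh_node_bytes_estimate(n)
print(f"raw geometry: {raw / 1e6:.0f} MB, BVH nodes: {bvh / 1e6:.0f} MB")
```

With these assumed sizes the hierarchy adds a few tens of percent on top of the raw geometry; packing fewer triangles per leaf pushes the overhead up, packing more pushes it down.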

Now the question is: "What is a complex scene?"

One way to think about a complex scene is to look at how much memory your GPU offers. If you have a GPU with 1 GB of RAM, then a complex scene for you is slightly less than 1 GB of geometry. On the other hand, that scene is not complex for a GPU that has 3 GB of RAM.

Another way is to think about who is modeling the scene. If the scene is used for 3D games, then the designer needs to make sure that the scene fits in the GPU RAM of the people who play the game.

Now, when talking about scenes and geometry, there are many tricks one can use. Imagine a model of a sphere. The sphere is built up of triangles. One can construct this sphere with a low number of triangles, but this would lead to a non-smooth surface when looking at it up close. One could then increase the number of triangles for the sphere and thus increase the scene size (this grows to infinity if you want a perfectly smooth surface on the sphere).

Suppose that you want to model a scene containing 20 spheres. One way is to replicate the spheres and store them one by one (20 × sizeof(sphere)); a more clever way is to store one sphere model and reference it with a translation at 20 points (sizeof(sphere) + 20 × sizeof(float3)).
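The two layouts can be compared with a quick calculation. The mesh size here is made up purely for illustration; `float3` is 3 × 4 bytes:

```python
def replicated_bytes(sphere_bytes: int, count: int) -> int:
    """Store a full copy of the sphere mesh for every instance."""
    return count * sphere_bytes

def instanced_bytes(sphere_bytes: int, count: int, float3_bytes: int = 12) -> int:
    """Store the mesh once plus one translation (float3) per instance."""
    return sphere_bytes + count * float3_bytes

sphere = 100_000  # hypothetical mesh size in bytes
print(replicated_bytes(sphere, 20))  # 2_000_000 bytes
print(instanced_bytes(sphere, 20))   # 100_240 bytes
```

Instancing here costs roughly 5% of the replicated layout, and the gap only widens as the instance count grows.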

Bottom line:

The designer is the one controlling the size of the scene.

There is a small memory overhead for the acceleration data structure in use.

You also need memory to store the pixel buffers (this depends on the resolution).

Thanks a lot for trying, guys! I learned a lot of interesting stuff that I can start applying…

@SKYBUCK: in Windows 7 there are actually free "widgets" that can help you monitor the amount of memory used by your graphics card (one is called GPU Meter, though I'm not sure it's accurate), and the higher-tech version of what you propose would be awesome to have. But that's AFTER-the-fact benchmarking, and when you're trying to figure out which card to buy there's no spec or reliable info that equates, in detail, RAM to X number of polygons and textures, etc.

@BRANO: I understand what you're saying, but it sounds like testing (again, after the fact; even the "masters" do the same) and hoping your card works is the reality we "rendering" guys are faced with. I am being a little ironic about this because I kind of knew I would need a "monster" card, but it's an interesting issue: the "race for memory" obscures, in my opinion, the real question of how things work… Anyway, thanks very much for the help, and anybody who wants to contribute some more is welcome! Cheers!

When it comes to selecting the best card, it's of course also about how much heat your PC can handle.

But what you should be looking for is:

the maximum number of CUDA cores.

Most sellers will probably mention the total number of CUDA cores.

Sometimes they will also mention the number of multiprocessors.

But don't multiply those two numbers, since that was already done.

The hardware itself only reports the number of multiprocessors; software needs a lookup table to work out how many CUDA cores that corresponds to.

The number of multiprocessors is also a good indication of how fast the card is.

So look for:

Highest multiprocessor count.
Highest CUDA core count.

And ultimately also:
Highest bandwidth.
Most memory.

Clock frequencies can also come into play.

For example, multiplying CUDA cores by processor frequency could give some indication.
Likewise, multiplying multiprocessors by processor frequency could give some indication.

For example, dividing memory bandwidth by memory frequency would give, roughly speaking, how many memory transactions it can do (divide that number by an extra 4 or 8 for more accuracy).

Ultimately, look at the TDP (watts) as well, which is an indication of how much heat the card produces: the higher, the worse (more watts also means a higher electricity bill).

How this translates to polygons and such is hard to say, and it is probably not so relevant unless you want to compare CUDA with CPUs, but these figures give some impression of that. Memory access times are different, though: probably something like 10–100 ns for CPU↔memory and 600 ns for GPU↔memory, but this latency can be hidden, and I am not sure those numbers are correct.
But it's probably safe to assume that GPU memory is about 6 times slower than main memory in access time; again, I am not sure about that.

Perhaps this is all too technical… one last resort would be to look at benchmarks and see how many things the card can render per second.

Benchmarks like the 2006 and 2011 ones (can't remember the product's name, but it has the year in its name).

NVIDIA and review sites also give out specs like "pixel fill rate", "texel fill rate" and "vertex rate", which might also be some indication of polygons/sec, especially the vertex rate… but these are probably old numbers based on old technology and may not apply to CUDA anymore. Unless the software uses CUDA in combination with textures, which is possible, so it might still be relevant.

Perhaps you also want to know how more memory translates into more points, or vertices, or polygons. It's probably mostly about points. This is a bit hard to say because it depends on the data structure being used.

But it's at least an x, y, z per vertex, which probably also carries some texture-mapping information (another two or three elements), plus some lighting information perhaps, plus maybe normals, plus it needs to be stored in a polygon structure.

So let's take it a bit roomy: say 20 elements of 32 bits (4 bytes) each, which is near 80 bytes, and let's relax it further to 100 bytes per vertex (20 bytes for overhead or so).

So if you have 1 GB of memory, this means roughly 10 million vertices can be stored on the graphics card, though this wouldn't leave much room for anything else.

So the more memory the better, it seems ;)

10 million vertices might seem like a lot, but it isn't really… if this were a 3D grid of points it would be roughly 215×215×215, which is a pretty small volume/grid ;)
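The arithmetic above is easy to check (the 100-bytes-per-vertex figure is of course just this post's rough guess, not an Iray spec):

```python
def vertex_budget(vram_bytes: int, bytes_per_vertex: int = 100) -> int:
    """How many vertices of the assumed size fit in a given amount of VRAM."""
    return vram_bytes // bytes_per_vertex

GB = 10**9
n = vertex_budget(1 * GB)      # 10 million vertices in 1 GB at 100 B each
side = round(n ** (1 / 3))     # grid side if arranged as a cubic point grid
print(n, side)

# Doubling every grid dimension costs 8x the memory:
assert vertex_budget(8 * GB) == 8 * n
```

The same function also shows why the jumps below go 1 GB → 8 GB → 64 GB: each step doubles the side of that notional cube.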

Also, since modelling is really a 3D problem, the memory needs to be 8× as big to double all three dimensions.

So only an upgrade from 1 GB to 8 GB would really give a significantly bigger scene! ;) :)

The next upgrade would be 8 GB to 64 GB, and the one after that 64 GB to 512 GB.

I expect graphics cards to hit 1 TB pretty soon, within a couple of years probably, just watch and see! ;) :)
