Can your GPU run it?

It’s a 1000 × 1000 × 1000 cube rotation inside a GPU kernel.
GPUKernel<<<1000000, 1000>>>
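For reference, here is roughly what that launch shape implies; a minimal sketch in which the kernel name, float3 element type, and body are placeholders (the actual code is in the zip below):

#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel: one thread per cell of an N x N x N cube (N = 1000).
__global__ void rotateCube(float3* positions, int n)
{
    // 1,000,000 blocks x 1,000 threads = 1e9 threads, one per cell
    long long idx = (long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= (long long)n * n * n) return;
    // ... rotation math on positions[idx] goes here ...
}

int main()
{
    const int n = 1000;
    float3* d_positions = nullptr;
    // 1000^3 cells * sizeof(float3) = ~12 GB, already beyond most consumer GPUs
    if (cudaMalloc(&d_positions, (size_t)n * n * n * sizeof(float3)) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }
    rotateCube<<<1000000, 1000>>>(d_positions, n);
    cudaDeviceSynchronize();
    cudaFree(d_positions);
    return 0;
}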

My GTX 1060 3 GB definitely can't run this; the cube doesn't even show up.

If you can run this, please share your PC specs. I want that huge resolution because I need it for my 3D engine project.

CUDA 11.1 Runtime1.zip (7.7 KB)

You could try using cudaMallocHost or cudaMallocManaged instead of cudaMalloc; that allows part of the data to reside in CPU RAM while remaining accessible to the GPU. I do not know, however, whether this works on Windows.
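Something along these lines, as a minimal sketch; the float3 element type and buffer size are placeholders, not the actual particle type from the zip:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const long long n = 1000LL * 1000 * 1000;   // 1000^3 cells
    float3* particles = nullptr;

    // cudaMallocManaged returns a single pointer usable from both host and
    // device; pages migrate on demand, so the allocation can exceed VRAM
    // (oversubscription support varies by OS and driver).
    cudaError_t err = cudaMallocManaged(&particles, n * sizeof(float3));
    if (err != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... launch kernels on `particles` exactly as with a cudaMalloc pointer ...

    cudaFree(particles);
    return 0;
}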

How can users tell whether it is running successfully? When I compile and run the posted code, it writes the GPUs present in my system to the console, and opens a window called “Circle Window”. It also appears to chew through all 32 GB of my system memory, at which point I terminated the application.

You might want to add a command-line argument for the cube size to the application; then it will be easy to test with successively larger cube sizes until it fails. BTW, I don't see status checking on the CUDA API calls; you might want to add that as well and terminate the application if one of those fails.
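For example, a sketch along these lines; CHECK_CUDA is a common hand-rolled macro rather than part of the CUDA API, and float4 is just a placeholder element type:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Wrap every CUDA API call and terminate on failure.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

int main(int argc, char** argv)
{
    // Cube edge length from the command line, e.g. ./app 500
    int dim = (argc > 1) ? atoi(argv[1]) : 500;
    size_t bytes = (size_t)dim * dim * dim * sizeof(float4);

    void* d_buf = nullptr;
    CHECK_CUDA(cudaMalloc(&d_buf, bytes));
    printf("Allocated %zu bytes for a %d^3 cube\n", bytes, dim);
    CHECK_CUDA(cudaFree(d_buf));
    return 0;
}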

Here’s the report of the test:
cudaMallocHost() allocates in RAM and shared VRAM, so the cube cannot be displayed since my shared VRAM is limited to 8 GB.

cudaMallocManaged() allocates in RAM and dedicated VRAM; same issue as with cudaMallocHost() and cudaMalloc().

Note that both of those allocators cut CUDA performance by 70%.

Thanks anyway.

I have a full CPU version of my engine, and the 1000^3 cube displays even with only 16 GB of RAM.

If you see a green cube on the screen, it’s working.
But at that resolution (1000^3), you will mostly see a green screen.

I fail at 600^3 resolution with cudaMallocHost() and 500^3 with cudaMalloc().

No thank you, I dislike doing error handling; if I get an error, I fix it and move on to the next one until I have none.

I guess you are just testing the largest size you can cudaMalloc, but you should be able to estimate how much memory you need without having to crowdsource it. The sizeof your particle is 32 bytes, so you need (x*x*x*32 + some buffer) < GPU memory size. For x=1000 you're going to need a GPU with more than 32 GB. There aren't too many of those; the Quadro RTX8000 comes to mind. Perhaps you should see if you can reduce the size of your particle.
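You can even check that estimate at runtime against cudaMemGetInfo; a sketch, using the 32-byte particle size:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t particleBytes = 32;                 // sizeof the posted particle struct
    const size_t x = 1000;
    const size_t needed = x * x * x * particleBytes; // 32,000,000,000 bytes (~29.8 GiB)

    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);                 // free/total memory on the current device
    printf("Need %zu bytes; GPU reports %zu free of %zu total\n", needed, freeB, totalB);
    printf("Fits: %s\n", needed < freeB ? "yes" : "no");
    return 0;
}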

Because your particle struct uses a __m128 datatype, which must be aligned on a 16-byte boundary, the struct is forced to occupy 32 bytes even though it only “needs” 16 + 4 bytes. You could cut your memory consumption almost in half just by rearranging from AoS to SoA. (And do you really need 16 bytes for coordinates? It doesn't look like it; if you drop the W element, you could cut your memory consumption in half.)
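To illustrate, a sketch with made-up field names; any struct containing a 16-byte-aligned __m128 member gets padded this way:

#include <xmmintrin.h>   // __m128
#include <cstdio>

// AoS: the 16-byte alignment of __m128 pads each element out to 32 bytes.
struct ParticleAoS {
    __m128 pos;    // 16 bytes (x, y, z, w), 16-byte aligned
    int    state;  // 4 bytes
    // 12 bytes of compiler-inserted padding follow
};

// SoA: parallel arrays carry no per-element padding. Keeping x,y,z,w + state
// costs 20 bytes per particle (almost half of 32); this version also drops
// the unused W component, for 16 bytes per particle (half).
struct ParticlesSoA {
    float* x;
    float* y;
    float* z;      // one float per particle per coordinate
    int*   state;  // one int per particle
};

int main()
{
    printf("sizeof(ParticleAoS) = %zu\n", sizeof(ParticleAoS)); // prints 32
    return 0;
}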

I can get to at least a SpaceDim of 600 on my RTX 2070 and get about 40 FPS.
