Hello,
my question concerns CUDA programming and the implementation of Unified Memory.
I'm using a single NVIDIA Quadro K4000 GPU on a 64-bit Linux system (openSUSE 13.1). I installed the required graphics driver and the CUDA 6.5 Toolkit. Everything works well!
In a project I am working with a particle simulation in DualSPHysics (open source). The GPU code I am trying to modify is written in C++ and CUDA 4.0 (it uses "cudaMalloc" instead of "cudaMallocManaged"). Currently the simulations are at the limit of our 3 GB of GPU RAM. Now we want to use the full power of our GPU, so we tried to add Unified Memory support to the simulation tool in order to access the extra 64 GB of system RAM on our workstation.
I have already compiled and run the "Unified Memory Streams" sample successfully, with no errors or complications.
To modify the simulation code I replaced every single "cudaMalloc" with a "cudaMallocManaged". I know there are more things to consider (e.g. removing the "cudaMemcpy" calls), but as far as I can see that's not the main problem.
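For illustration, this is the kind of change I made throughout the code, here with the return code checked so a failed allocation is reported instead of ignored (the array name and particle count are just placeholders, not the actual DualSPHysics variables):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t npart = 1 << 20;               // placeholder particle count
    const size_t bytes = npart * sizeof(float4);

    // Before: cudaMalloc(&d_pos, bytes);
    float4 *d_pos = nullptr;
    cudaError_t err = cudaMallocManaged(&d_pos, bytes, cudaMemAttachGlobal);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged(%zu bytes) failed: %s\n",
                bytes, cudaGetErrorString(err));
        return 1;
    }

    // Managed memory is accessible from host code directly,
    // so an explicit cudaMemcpy should no longer be needed here.
    d_pos[0] = make_float4(0.f, 0.f, 0.f, 0.f);

    cudaFree(d_pos);
    return 0;
}
```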
The code compiles successfully without complications. When I simulate a case that requires less than the maximum accessible GPU RAM (3 GB), everything works well. When I simulate a case that requires more than that (i.e. the simulation needs to fall back on Unified Memory to reach the system RAM), there is no success and the required RAM cannot be allocated. I assume there is a problem with accessing the Unified Memory via "cudaMallocManaged"…!?
Do you have an idea how I can solve this problem?
Is there a tool or something similar with which I can test my Unified Memory?
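For example, would a minimal check like the following be a valid test? I assume "cudaDevAttrManagedMemory" is the right attribute to query, and the 4 GB size is just an arbitrary value larger than our 3 GB of GPU RAM:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0, managed = 0;
    cudaGetDevice(&dev);

    // Does the device report managed-memory (Unified Memory) support?
    cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, dev);
    printf("cudaDevAttrManagedMemory = %d\n", managed);

    // How much GPU memory is there in total / still free?
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("GPU memory: %zu MB free / %zu MB total\n",
           freeB >> 20, totalB >> 20);

    // Try a managed allocation larger than the GPU RAM (4 GB here)
    // to see whether it succeeds on this hardware.
    void *p = nullptr;
    const size_t big = 4ull << 30;
    cudaError_t err = cudaMallocManaged(&p, big, cudaMemAttachGlobal);
    printf("cudaMallocManaged(4 GB): %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(p);
    return 0;
}
```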
Could this be a hardware restriction?
How much effort would it take to modify the tool successfully?
Please excuse my poor description - programming is not my profession.
Regards from Germany