I’m working on a real-time system, which means I need control over the memory that gets allocated, thread priorities, and thread affinities. I’ve run into a few issues that maybe you guys have seen.
I’m running on the Red Hat 5.8 real-time distribution.
libcuda.so spawns one worker thread per GPU I use. Does anybody have any idea how to set affinities/priorities on these worker threads? Is there something in the API I am missing?
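Right now the only workaround I can think of is to walk /proc/self/task after CUDA has initialized and pin anything I didn’t create myself. Here’s a rough sketch of what I mean (the my_tids bookkeeping and the choice of CPU/priority are placeholders, and this assumes g++ on Linux so the CPU_SET macros are available) -- I’d love to hear if there’s a proper API for this instead:

#include <sys/types.h>
#include <dirent.h>
#include <sched.h>
#include <cstdlib>
#include <set>

// Pin every thread in this process that we did not create ourselves to 'cpu'
// and give it a low real-time priority. 'my_tids' holds the TIDs of the
// threads the application created (recorded via syscall(SYS_gettid) at
// thread start) -- that bookkeeping is up to the application.
void pin_unknown_threads( const std::set<pid_t>& my_tids, int cpu )
{
    DIR* dir = opendir( "/proc/self/task" );
    if ( !dir ) return;

    while ( dirent* entry = readdir( dir ) )
    {
        if ( entry->d_name[0] == '.' ) continue;            // skip "." and ".."
        pid_t tid = static_cast<pid_t>( std::atol( entry->d_name ) );
        if ( my_tids.count( tid ) ) continue;                // leave our own threads alone

        cpu_set_t mask;
        CPU_ZERO( &mask );
        CPU_SET( cpu, &mask );
        sched_setaffinity( tid, sizeof( mask ), &mask );     // confine the worker thread

        sched_param param;
        param.sched_priority = 1;                            // lowest SCHED_FIFO priority
        sched_setscheduler( tid, SCHED_FIFO, &param );       // per-thread on Linux
    }
    closedir( dir );
}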
When I call cudaSetDevice( 0 ), for example, my OS reports that my process has 100+ GB of virtual memory reserved! That seems like a lot, and since I’m using multiple devices it tends to climb to 200 or 300 GB. I’ve never seen it actually paged into physical memory, but it is frightening that some worst-case scenario could trigger that without me knowing about it. Has anyone seen this, or does anyone know why it happens?
#include <iostream>
#include "cuda_runtime_api.h"

int main( int argc, char* argv[] )
{
    // Initializing the runtime on device 0 is all it takes to trigger the
    // large virtual address reservation.
    cudaError_t err = cudaSetDevice( 0 );
    std::cout << "cudaSetDevice returned: " << cudaGetErrorString( err ) << std::endl;
    std::cout << "Break here and check the OS-reported memory usage" << std::endl;
    return 0;
}
Before cudaSetDevice I have in the neighborhood of 300 MB of virtual memory allocated; after cudaSetDevice, 107 GB is reported.
This is outside my area of expertise, but as far as I am aware, in order to create a unified virtual address space across all CPUs and all GPUs in the system, the driver needs to reserve enough virtual address space to map all of the host’s system memory plus the memory of all attached GPUs.
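If you want to sanity-check that theory on your box, something like this rough sketch (just /proc parsing plus cudaGetDeviceProperties; the exact accounting is an assumption on my part) puts VmSize and “host RAM + total GPU memory” side by side. The numbers won’t match exactly, but they should be in the same ballpark:

#include <cstdio>
#include <cstring>
#include <iostream>
#include "cuda_runtime_api.h"

// Read a field like "VmSize:" or "MemTotal:" from a /proc file (value in kB).
static long read_kb_field( const char* path, const char* field )
{
    FILE* f = std::fopen( path, "r" );
    if ( !f ) return -1;
    char line[256];
    long kb = -1;
    while ( std::fgets( line, sizeof( line ), f ) )
    {
        if ( std::strncmp( line, field, std::strlen( field ) ) == 0 )
        {
            std::sscanf( line + std::strlen( field ), "%ld", &kb );
            break;
        }
    }
    std::fclose( f );
    return kb;
}

int main()
{
    cudaSetDevice( 0 );   // triggers driver initialization and the big reservation

    long vm_kb  = read_kb_field( "/proc/self/status", "VmSize:" );
    long ram_kb = read_kb_field( "/proc/meminfo",     "MemTotal:" );

    int count = 0;
    cudaGetDeviceCount( &count );
    long gpu_kb = 0;
    for ( int i = 0; i < count; ++i )
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties( &prop, i );
        gpu_kb += static_cast<long>( prop.totalGlobalMem / 1024 );
    }

    std::cout << "VmSize:             " << vm_kb / 1024 / 1024 << " GB" << std::endl;
    std::cout << "Host RAM + GPU mem: " << ( ram_kb + gpu_kb ) / 1024 / 1024 << " GB" << std::endl;
    return 0;
}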
That never even crossed my mind! That’s a great thought. I wonder, is there a way to turn that “off”? I know UVA hasn’t been around forever in CUDA, so maybe there’s a knob somewhere that I can find. I guess it’s time to do some documentation spelunking.
I don’t know of a way to turn this off, nor do I think this would make much sense. UVA was the first step in creating a seamless heterogeneous computing platform, and has been around for at least 3.5 years:
You’re right, of course. I believe I rely on UVA to allow P2P transfers, so turning it off would make no sense for me anyway.
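For context, this is roughly the dependency I mean; the device IDs 0 and 1 are just placeholders for my actual topology:

#include <iostream>
#include "cuda_runtime_api.h"

int main()
{
    // P2P copies go through the unified address space, so check that UVA is
    // active on the device and that the two devices can reach each other.
    cudaDeviceProp prop;
    cudaGetDeviceProperties( &prop, 0 );
    std::cout << "Device 0 UVA enabled: " << prop.unifiedAddressing << std::endl;

    int canAccess = 0;
    cudaDeviceCanAccessPeer( &canAccess, 0, 1 );
    if ( canAccess )
    {
        cudaSetDevice( 0 );
        cudaDeviceEnablePeerAccess( 1, 0 );   // enable 0 -> 1; flags must be 0
        // From here, cudaMemcpyPeer() (or a plain cudaMemcpy between the two
        // devices' allocations) can go directly over the bus.
    }
    return 0;
}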
As for why it matters: I, the GPU developer, understand that it probably doesn’t. My customer, on the other hand, only sees a massive amount of memory being allocated and fears a worst-case scenario in which the OS somehow tries to bring all of it into physical memory, causing swap thrashing and blowing the timing of the computation. I think the best course of action is customer education.
I agree that customer education is probably the best course of action here. BTW, sorry for copy/pasting the wrong link into my previous response and creating a circular reference. I have fixed the link; it now points to a thread about the large virtual allocation that includes a response from a relevant NVIDIA engineer shortly after UVA was first deployed.