Cannot free all the CPU (host) memory when using the cudaMalloc function?

Hello there!
I have run into a problem with the cudaMalloc function.

When my program calls cudaMalloc, for example cudaMalloc((void **)&da, N*sizeof(int)), and the C++ program uses about 27% or more of the CPU (host) memory, it cannot free all of that host memory. Whether or not I call cudaFree, the "top" command always shows 27% or more still in use. I also tried other ways of filling memory, such as a vector or reading SQL data, and got the same result.

Thanks a lot!

machine 1:
OS: Ubuntu 16.04
local memory: 32GB
GPU: Tesla 16280MiB, Driver Version: 390.46
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

machine 2:
OS: Ubuntu 16.04
local memory: 32GB
GPU: TITAN X (Pascal) 12196MiB, Driver Version: 396.26
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

#include <cstdio>
#include <iostream>
#include <cuda_runtime.h>

#define N 1000
using namespace std;

// Kernel: each block writes twice one element of a into b.
__global__
void add(int *a, int *b) {
    int i = blockIdx.x;
    if (i < N) {
        b[i] = 2 * a[i];
    }
}

int main() {
    int ha[N], hb[N];
    int stop_int;
    int *da, *db;
    cudaMalloc((void **)&da, N*sizeof(int));

    // cudaMalloc((void **)&db, N*sizeof(int));
    // for (int i = 0; i<N; ++i) {
    //     ha[i] = i;
    // }
    // cudaMemcpy(da, ha, N*sizeof(int), cudaMemcpyHostToDevice);
    // add<<<N, 1>>>(da, db);
    // cudaMemcpy(hb, db, N*sizeof(int), cudaMemcpyDeviceToHost);
    // printf("-----------------prepare free----------------------\n");
    // cudaFree(da);
    // cudaFree(db);

    // Large "2D" host array: one pointer per row, each row allocated separately.
    int row = 414730000;
    int col = 2;
    int **arr = new int *[row];
    for (int i = 0; i < row; i++) {
        arr[i] = new int[col];
    }
    for (int i = 0; i < row; i++) {
        for (int j = 0; j < col; j++) {
            arr[i][j] = i + j;
        }
    }
    printf("------------ prepare free  array ---------------");
    std::cin >> stop_int;
    // free array
    delete[] arr;

    printf("------------free array done---------------");
    std::cin >> stop_int;
    return 0;
}

Things that help:

  • Do you get any sort of error message?
  • Are you doing CUDA error checking? Google for a macro called CUDA_SAFE_CALL and wrap cudaMalloc with it.
  • Are you running this cudaMalloc over and over without a system restart? If so, it could be that it can’t allocate a contiguous space anymore (leaking memory).
  • Try to use thrust::host_vector and thrust::device_vector, it will do safe memory management for you.
  • Last but not least, look at the lines where you allocate a “2D” host array. You have to deallocate it in a loop too: the single delete[] arr only releases the array of row pointers, so every row allocated with new int[col] is leaked. If you intend to pass this to a CUDA kernel at some point, I wish you good luck. Just in case you haven’t tried it before, search for “flattened 1d array”.