Memory alloc limited to less than half available RAM

I am trying to use a tool that requires ~2GB of GPU memory in a single allocated array. With a fresh install of the latest JetPack and the X server disabled, I am unable to allocate more than 1986MB with cudaMalloc, and I am not sure why.

I wrote a simple test program and repeatedly changed the value of size until the allocation failed.

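#include <cstdio>     // fprintf
#include <iostream>
#include <stdexcept>  // std::runtime_error

// Run a CUDA runtime call; on failure, print the error and throw.
#define CUDA_SAFE_CALL(call)                                            \
do {                                                                    \
    cudaError_t err = (call);                                           \
    if (cudaSuccess != err) {                                           \
        const char * errorString = cudaGetErrorString(err);             \
        fprintf(stderr,                                                 \
            "CUDA error in func '%s' at line %i : %d:%s.\n",            \
            __FUNCTION__, __LINE__, err, errorString);                  \
        throw std::runtime_error(errorString);                          \
    }                                                                   \
} while (0)

int main(void) {
  void *x;
  size_t size = (size_t)1987 * 1048576;  // 1987 MB; 64-bit arithmetic on any platform
  CUDA_SAFE_CALL(cudaMalloc(&x, size));
  CUDA_SAFE_CALL(cudaFree(x));           // release the buffer before exiting
  return 0;
}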
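
The manual search (editing size and recompiling until the allocation fails) could also be automated with a bisection. The following is only a minimal sketch: `max_alloc_mb` and `cuda_try_alloc` are hypothetical helper names, the 4096 MB upper bound is an arbitrary choice, and the search assumes that once a size fails, every larger size fails too. The CUDA part is guarded by `__CUDACC__`, so it is only compiled when the file is built with nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Returns the largest size in MB within [0, hi_mb] for which try_alloc(size)
// succeeds. Assumes success is monotonic: if N MB fails, so does anything larger.
template <typename TryAlloc>
std::size_t max_alloc_mb(TryAlloc try_alloc, std::size_t hi_mb) {
  assert(hi_mb + 1 > hi_mb);     // guard against overflow of the upper bound
  std::size_t lo = 0;            // largest size treated as succeeding (0 MB, not probed)
  std::size_t hi = hi_mb + 1;    // smallest size treated as failing
  while (hi - lo > 1) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (try_alloc(mid)) {
      lo = mid;                  // mid MB works, search above it
    } else {
      hi = mid;                  // mid MB fails, search below it
    }
  }
  return lo;
}

#ifdef __CUDACC__
// Hypothetical probe: try one cudaMalloc of `mb` megabytes and free it again.
static bool cuda_try_alloc(std::size_t mb) {
  void *p = nullptr;
  if (cudaMalloc(&p, mb * 1048576ull) != cudaSuccess) {
    cudaGetLastError();          // reset the recorded error before the next probe
    return false;
  }
  cudaFree(p);
  return true;
}

int main() {
  std::printf("largest single cudaMalloc: %zu MB\n",
              max_alloc_mb(cuda_try_alloc, 4096));
  return 0;
}
#endif
```

Keeping the probe separate from the search means the bisection can be checked on a machine without a GPU by passing any callable that simulates a limit.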

To make sure that as little RAM as possible is in use elsewhere, I cleared the buffers and cache and enabled an 8GB swap file just in case, which gives the following stats:

ubuntu@tegra-ubuntu:~/$ free -m && sync && sudo /bin/sh -c 'echo 3 > /proc/sys/vm/drop_caches' && free -m
              total        used        free      shared  buff/cache   available
Mem:           3994         699        3083          21         212        3718
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:           3994         201        3676          21         117        3718
Swap:          8191           0        8191

When I compile and run it, it fails:

ubuntu@tegra-ubuntu:~/$ nvcc -std=c++11 test.cu && ./a.out
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
CUDA error in func 'main' at line 21 : 2:out of memory.
terminate called after throwing an instance of 'std::runtime_error'
  what():  out of memory
Aborted

Does anybody have any idea why I am unable to allocate more than 1986MB for a single array when there is 3718MB of RAM available?

Hi,

We will look into this issue and update you soon.
Thanks.

Hi,

This is a known limitation on L4T.
Currently, the maximum size of a single allocation is limited to half of the physical memory.
That is, on the TX1 you can't allocate a single GPU buffer larger than ~2GB.
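
As a rough illustration of that rule, the sketch below compares a requested size against half of the total memory. The helper names `single_chunk_limit` and `fits_single_chunk` are hypothetical, and treating the total reported by `cudaMemGetInfo` as the physical memory is an assumption. The cutoff observed above (1986 MB succeeds, 1987 MB fails) sits slightly below half of the 3994 MB that `free` reports, so the real threshold evidently differs a little from this simple estimate.

```cpp
#include <cstddef>
#include <cstdio>

// Estimated largest single allocation under the "half of physical memory" rule.
constexpr std::size_t single_chunk_limit(std::size_t total_bytes) {
  return total_bytes / 2;
}

// True if a request of `request_bytes` is within that estimated limit.
constexpr bool fits_single_chunk(std::size_t request_bytes, std::size_t total_bytes) {
  return request_bytes <= single_chunk_limit(total_bytes);
}

#ifdef __CUDACC__
int main() {
  std::size_t free_bytes = 0, total_bytes = 0;
  // cudaMemGetInfo reports free and total device memory in bytes.
  if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) return 1;
  std::printf("total: %zu MB, estimated single-chunk limit: %zu MB\n",
              total_bytes / 1048576, single_chunk_limit(total_bytes) / 1048576);
  return 0;
}
#endif
```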

This limitation will be removed in our next release.
Please wait for our announcement and then update.

Thanks.

Hi,

This fix is available now.
Please check https://developer.nvidia.com/embedded/jetpack