cuda-memcheck and available memory

I’m trying to debug/optimize my code but am running into a problem on the Jetson TX1.

When running “cuda-memcheck thingtodebug”, the available memory is drastically reduced.

device_props.totalGlobalMem returns 3982 MB normally, but only 995 MB with cuda-memcheck.
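
A minimal sketch of how these figures can be read, assuming the CUDA runtime API and that the integrated GPU is device 0. The `cudaMemGetInfo` call is an addition for comparison (the check described above only uses `totalGlobalMem`), and the `CHECK_CUDA` macro is a hypothetical helper, not part of CUDA:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CHECK_CUDA(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

int main() {
    const int device = 0;  // assumption: the integrated GPU is device 0

    // Static device properties, including the total global memory size.
    cudaDeviceProp device_props;
    CHECK_CUDA(cudaGetDeviceProperties(&device_props, device));

    // Free and total memory as reported for the current context.
    size_t free_bytes = 0, total_bytes = 0;
    CHECK_CUDA(cudaSetDevice(device));
    CHECK_CUDA(cudaMemGetInfo(&free_bytes, &total_bytes));

    const double mib = 1024.0 * 1024.0;
    std::printf("totalGlobalMem: %.0f MB\n", device_props.totalGlobalMem / mib);
    std::printf("cudaMemGetInfo: %.0f MB free of %.0f MB\n",
                free_bytes / mib, total_bytes / mib);
    return 0;
}
```

Building this with nvcc and running it once directly and once under cuda-memcheck should show whether the reduced figure also appears outside the full application.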

I’m fairly new to CUDA, so this might be pretty obvious?

I’ve done the following:

  • Recompiled the kernel with swap enabled, and moved everything to an SSD to give it swap space
  • Wrapped my code of interest with cudaProfilerStart() / cudaProfilerStop()
  • Disabled X11 and killed unneeded processes (this is why I have 3982 MB available normally).
$ free
              total        used        free      shared  buff/cache   available
Mem:        4078456      175060     3433532        1920      469864     3844104
Swap:      16777212       38940    16738272



Could you share the cuda-memcheck log?
Swap is not GPU-accessible memory. The maximum GPU memory on the TX1 should be ~4 GB.

You can also monitor the Jetson status with our script:

sudo ~/tegrastats
RAM 1366/7844MB (lfb 1023x4MB) CPU [0%@345,0%@346,0%@345,0%@345,1%@345,0%@338] EMC_FREQ 4%@40 GR3D_FREQ 0%@114 APE 150 MTS fg 0% bg 0% BCPU@36C MCPU@36C GPU@42C PLL@36C AO@35.5C Tboard@32C Tdiode@34.25C PMIC@100C thermal@35.9C VDD_IN 1146/1241 VDD_CPU 229/229 VDD_GPU 152/152 VDD_SOC 305/343 VDD_WIFI 0/0 VDD_DDR 96/144


System idle:

RAM 257/3983MB (lfb 626x4MB) SWAP 37/16384MB (cached 6MB) cpu [100%,0%,0%,0%]@1734 EMC 0%@1600 APE 25 GR3D 0%@76

Application running (without cuda-memcheck, it sees 3982 MB):

RAM 1335/3983MB (lfb 479x4MB) SWAP 37/16384MB (cached 6MB) cpu [0%,30%,3%,0%]@102 EMC 27%@1600 APE 25 GR3D 99%@998

When attempting to run “cuda-memcheck --log-file log.txt build/myapplication”, the log file says:

========= ERROR SUMMARY: 0 errors

The log is empty because my program quits with exit(1): under cuda-memcheck it sees only 995 MB of GPU memory, which is not enough for it to run.
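
A sketch of what such a startup check might look like, with the reported size printed before exiting so the reason for the early exit is visible on the console. The threshold `kRequiredBytes` and the function `has_enough_memory` are hypothetical; the real application's requirement is not stated here:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical requirement (2 GB); the real threshold is not given in this thread.
static const size_t kRequiredBytes = 2048ull * 1024ull * 1024ull;

// Returns true if the device reports at least `required` bytes of global memory.
// On failure, prints what was found to stderr instead of exiting silently.
static bool has_enough_memory(int device, size_t required) {
    cudaDeviceProp props;
    cudaError_t err = cudaGetDeviceProperties(&props, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    if (props.totalGlobalMem < required) {
        std::fprintf(stderr, "Not enough GPU memory: need %zu MB, device reports %zu MB\n",
                     required >> 20, props.totalGlobalMem >> 20);
        return false;
    }
    return true;
}

int main() {
    if (!has_enough_memory(0, kRequiredBytes)) {
        std::exit(1);  // same exit path as described above, but with a message first
    }
    // ... the actual application work would start here ...
    return 0;
}
```

With a message like this, a run under cuda-memcheck would at least show the memory size the program saw, even though the tool itself reports 0 errors.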


We'd like to know more about your problem.

Could you try running your application directly, without cuda-memcheck?
That will help narrow down whether the issue comes from cuda-memcheck or from the application.



I did, see #1 and #3. Without cuda-memcheck, the application runs.


Thanks for your feedback.

We'd like to reproduce this issue internally.
Can it be reproduced with one of our official CUDA samples?

If not, could you share your application with us?