cuDeviceTotalMem returns a maximum of 4GB

I'm trying to retrieve the total device memory of a Titan X card using cuDeviceTotalMem. The cuDeviceTotalMem function returns 4GB total memory. I'm on a 64-bit GNU/Linux system with CUDA 7.5 installed. How do I retrieve the correct amount of memory (12GB)?

Do you have more than 1 GPU in the system?

Are you passing a size_t variable for the memory size to cuDeviceTotalMem?

When you print out the total, are you printing it out as a 64-bit unsigned quantity?

Yes, I pass a size_t variable and print it as a 64-bit unsigned quantity.

What happens if you run the deviceQuery sample code? How much memory does it report?

What linux distro and version are you using?

Can you provide a short, complete code sample that just does this operation and demonstrates the issue?

What is the output of nvidia-smi for your system?

Both the deviceQuery sample code and nvidia-smi report 12GB.

I’m using Ubuntu 15.10.

My code reports 4294967295 bytes using cuDeviceTotalMem. Note that the deviceQuery sample code does not use cuDeviceTotalMem.

Ubuntu 15.10 is not an officially supported OS for CUDA 7.5 toolkit.

The deviceQueryDrv sample code does use cuDeviceTotalMem; try running that.

If it reports the correct amount, you most likely have a bug in your code. Note that 4294967295 is exactly 2^32 - 1 (UINT_MAX), which suggests the value is being truncated to 32 bits somewhere, e.g. by storing or printing it as a 32-bit unsigned int.
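For comparison, a minimal sketch of the driver-API call sequence (cuInit, cuDeviceGet, then cuDeviceTotalMem with a size_t out-parameter, printed with %zu) looks like this; device ordinal 0 is assumed, and the program must be linked against the driver library with -lcuda:

```c
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUresult rc;
    CUdevice dev;
    size_t total = 0;  /* must be size_t, not unsigned int */

    /* The driver API requires cuInit before any other call. */
    rc = cuInit(0);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed: %d\n", (int)rc);
        return 1;
    }

    /* Get a handle to device 0 (assumed to be the Titan X here). */
    rc = cuDeviceGet(&dev, 0);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuDeviceGet failed: %d\n", (int)rc);
        return 1;
    }

    rc = cuDeviceTotalMem(&total, dev);
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuDeviceTotalMem failed: %d\n", (int)rc);
        return 1;
    }

    /* %zu prints the full 64-bit size_t on a 64-bit system. */
    printf("Total memory: %zu bytes\n", total);
    return 0;
}
```

Build with something like `gcc totalmem.c -o totalmem -I/usr/local/cuda/include -lcuda`. Checking every CUresult matters: an unchecked failure can leave the output variable holding garbage that looks like a plausible (wrong) size.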

Here is typical output from deviceQueryDrv on a 12GB card (K40c):

$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQueryDrv
 /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 3 CUDA Capable device(s)

Device 0: "Tesla K40c"
  CUDA Driver Version:                           7.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 11520 MBytes (12079136768 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
  GPU Max Clock rate:                            876 MHz (0.88 GHz)