Data being sent to both GPUs despite only selecting one

Specs (I have the full spec sheet we used to purchase the machine if needed):

  • OS: Red Hat Enterprise Linux version 8.9 (Ootpa)
  • GPU: 2x NVIDIA RTX A6000
  • NVIDIA SMI info: NVIDIA-SMI 545.23.08, Driver Version: 545.23.08, CUDA Version: 12.3

Hi, we recently ordered and received a new machine with 2x NVIDIA RTX A6000 cards, which gives us 96 GB of VRAM to work with. However, we noticed a strange issue with how memory is being allocated on the GPUs.

I was working on some experiments in PyTorch and bumped my batch size up high enough to use about 32 of the 48 GB on GPU 0. When I did, I saw that for some reason both GPUs were allocating the same amount of memory. To make sure it was not a visual glitch in nvidia-smi, I ran the same model at the same time on GPU 1 and it ran out of memory. With 96 GB of VRAM in total, I should have been able to run the model on both GPUs simultaneously, but the data appears to be copied to both GPUs.

I was not sure if this was a PyTorch issue or not so I put together a small script to test:

import numpy as np
from numba import cuda

cuda.select_device(0)  # we tested with device 0 and with device 1

data = np.ones(1000000000)  # ~8 GB of float64
d_data = cuda.to_device(data)

# hang so the allocation can be inspected
while True:
    a = 1

All this code does is send the data to the GPU specified in the cuda.select_device() line (in our case either device 0 or 1) and hang until the user quits.
After doing so, I was able to capture screenshots of each GPU's memory allocation using nvtop:

The yellow line in each image represents the memory allocation for that GPU. The first image shows the run with cuda.select_device(0) and the second the run with cuda.select_device(1). You can see that in each case the data is sent to both GPUs, even though the code only selects one. I have also tried using the CUDA_VISIBLE_DEVICES environment variable. Setting it to 0 causes a segmentation fault with no further output. Setting it to 1 runs, but still puts the data on both GPUs. Setting it to anything else fails with a message that no CUDA devices are available.
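For completeness, this is how we set the variable from inside the script. As far as we understand, CUDA_VISIBLE_DEVICES is read once, when the process first initializes CUDA, so it has to be set before numba.cuda makes any CUDA call; it also renumbers the devices, so with "1" the remaining GPU becomes logical device 0 inside the process (the GPU-touching lines are shown as comments since they need the hardware):

```python
import os

# Must be set before any CUDA initialization in the process -- the
# variable is read when the CUDA context is created, so setting it
# after a CUDA call has already run has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only physical GPU 1

# Only now import and use the GPU library:
# from numba import cuda
# cuda.select_device(0)          # logical device 0 == physical GPU 1
# d_data = cuda.to_device(data)  # should land on physical GPU 1 only
```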

We have never seen anything like this before, and none of us have worked with Red Hat Linux either, so we are wondering whether this is an OS issue or possibly a hardware issue. Either way, this bug effectively limits us to 48 of the machine's 96 GB of VRAM because the data is copied to both GPUs. If anyone has any insight on how to go about diagnosing and fixing this issue, it would be much appreciated, thank you!

Hello @marcbaltes98 and welcome to the NVIDIA developer forums.

I am sorry to hear that you have this kind of issue and apologies for the late reply.

I recommend we move this post to the CUDA forums, but first I would like to ask you to try a CUDA-only sample app that uses the actual CUDA interface, cudaSetDevice(), just to rule out any issues with Numba on Red Hat multi-GPU installations. I am not familiar enough with CUDA to recommend anything specific, but you can check out the CUDA samples on GitHub.
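As a very rough sketch (untested, and assuming libcudart.so from your CUDA 12.3 installation is on the loader path), something like the following calls cudaSetDevice() and cudaMalloc() through the CUDA runtime directly, with no Numba in the loop:

```python
import ctypes

def cuda_only_alloc(device: int, nbytes: int) -> None:
    """Allocate nbytes on `device` via the CUDA runtime (no Numba),
    then wait so the allocation can be inspected with nvidia-smi."""
    # Depending on the install you may need the versioned name,
    # e.g. "libcudart.so.12".
    rt = ctypes.CDLL("libcudart.so")
    assert rt.cudaSetDevice(ctypes.c_int(device)) == 0, "cudaSetDevice failed"
    ptr = ctypes.c_void_p()
    assert rt.cudaMalloc(ctypes.byref(ptr), ctypes.c_size_t(nbytes)) == 0, \
        "cudaMalloc failed"
    input(f"{nbytes} bytes allocated on device {device}; "
          "check nvidia-smi, then press Enter to free")
    rt.cudaFree(ptr)

# Needs a GPU, so not called here:
# cuda_only_alloc(0, 8 << 30)  # 8 GiB on device 0
```

If both GPUs still show the allocation with this, the problem is below Numba (driver, OS, or hardware).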

Then the best forum for this would be CUDA Programming and Performance - NVIDIA Developer Forums