This issue has already been posted about at "cuda - 'cudaMalloc' unintentionally allocating memory on multiple GPUs instead of just 1 - Stack Overflow" and "linux - Unwanted Pytorch data duplication across multiple GPUs - Stack Overflow", but I'll write a summary here.
At my work we have 3 machines with very similar setups; let's call them a, b, and c. On machine b we don't seem to be able to get CUDA code to run on a single GPU. The person who first noticed this helpfully wrote the following program to isolate the issue:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <GPU_ID> <Memory_Size_GB>\n", argv[0]);
        return 1;
    }

    int gpu_id = atoi(argv[1]);
    size_t memory_size_gb = atoll(argv[2]);

    // Select the requested GPU
    cudaError_t cudaStatus = cudaSetDevice(gpu_id);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed! Do you have a CUDA-capable GPU installed?\n");
        return 1;
    }

    // Convert GB to bytes for the allocation
    size_t size = memory_size_gb * 1024 * 1024 * 1024;

    // Allocate memory on the GPU
    void *gpu_memory;
    cudaStatus = cudaMalloc(&gpu_memory, size);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed to allocate %zu bytes!\n", size);
        return 1;
    }

    printf("Allocated %zu GB of memory on GPU %d\n", memory_size_gb, gpu_id);
    printf("Press Enter to free the memory and exit...\n");
    getchar(); // Block until Enter so the allocation stays live while the GPUs are inspected

    // Free the memory when done
    cudaStatus = cudaFree(gpu_memory);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaFree failed!\n");
        return 1;
    }

    return 0;
}
If this is compiled as gtest and run as ./gtest 0 5 or ./gtest 1 5 on machines a and c, then 5 gigabytes of memory are allocated specifically on GPU 0 or GPU 1 respectively, as expected. On machine b, however, either command causes 5 gigabytes to be allocated on both graphics cards.
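(Aside: the duplicate allocation is easy to see in nvidia-smi while gtest is waiting at the getchar(). For completeness, here is a minimal sketch, not from the original debugging session, of a second program that reports per-GPU memory from another process via cudaMemGetInfo. Note that the query itself creates a context on each GPU, which consumes a small amount of memory of its own.)

#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        size_t free_bytes = 0, total_bytes = 0;
        if (cudaSetDevice(i) != cudaSuccess)
            continue;
        // cudaMemGetInfo reports device-wide usage, so allocations made
        // by other processes (like gtest) show up here too
        err = cudaMemGetInfo(&free_bytes, &total_bytes);
        if (err != cudaSuccess) {
            fprintf(stderr, "GPU %d: cudaMemGetInfo failed: %s\n", i, cudaGetErrorString(err));
            continue;
        }
        printf("GPU %d: %zu MiB free / %zu MiB total\n",
               i, free_bytes >> 20, total_bytes >> 20);
    }
    return 0;
}

While gtest sits at its prompt, running this from a second terminal shows which card(s) the 5 GB actually landed on.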
If used in conjunction with CUDA_VISIBLE_DEVICES, then running CUDA_VISIBLE_DEVICES=0 ./gtest 0 5 and CUDA_VISIBLE_DEVICES=1 ./gtest 0 5 behaves as you'd expect on machines a and c, but on b, specifying 0 gives a segmentation fault and 1 allocates 5 gigabytes on both graphics cards.
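(To sanity-check whether the mask itself is being honored on machine b, a sketch along these lines, again not from the original debugging, prints what the runtime actually enumerates. With CUDA_VISIBLE_DEVICES=1 it should report exactly one device, and the PCI bus ID says which physical card that is:)

#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Runtime sees %d device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        // The PCI bus ID identifies the physical card, independent of any
        // renumbering done by CUDA_VISIBLE_DEVICES
        printf("  device %d: %s (PCI bus %02x:%02x.0)\n",
               i, prop.name, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}

Given the lspci output below, a correctly honored mask should show a single card at PCI bus 01:00.0 or 02:00.0.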
Not sure if it's relevant, but the driver versions and cards on machine a are:
NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
on machine b:
NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GP102GL [Quadro P6000] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GP102GL [Quadro P6000] (rev a1)
and on machine c:
NVIDIA-SMI 510.108.03 Driver Version: 510.108.03 CUDA Version: 11.6
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1)
On one of the posts above, someone suggested checking SLI. I noticed in Xorg.0.log on machine b there was an error about Auto not being a supported mode, so I changed it to mosaic, with no effect. But I later noticed that machine c has the same error in its log and is working fine.
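(For reference, that change was to the SLI option in the nvidia Device section of xorg.conf, roughly like the following; this is a reconstruction rather than a copy of the actual file, and the Identifier is a placeholder for whatever the existing config uses:)

Section "Device"
    Identifier "Device0"        # placeholder; match your existing config
    Driver     "nvidia"
    Option     "SLI" "Mosaic"   # was "Auto", which Xorg.0.log complained about
EndSection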