I’m trying to run a simple program with a 3-dimensional grid, but when I launch it under cuda-memcheck it hangs and is terminated once the timeout expires. A too-short timeout is not the cause: I raised it to 60 seconds specifically to rule that out.
The code launches a grid of 45x1575x1575 blocks that runs an empty __global__ function. Additional info: my device has compute capability 2.1, and I compile with -maxrregcount=24 to limit the number of registers the device functions can use (in another program of mine, the occupancy calculator showed that this value gives the best results).
Here’s my code:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

// Empty kernel: it does no work, the parameter is unused.
__global__ void stam(int a){
}

int main()
{
    // Choose which GPU to run on, change this on a multi-GPU system.
    cudaError_t cudaStatus = cudaSetDevice(0);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed! Do you have a CUDA-capable GPU installed?\n");
        return 1;
    }

    // 45 x 1575 x 1575 blocks, 224 threads per block.
    dim3 gridSize(45,1575,1575);
    stam<<<gridSize,224>>>(4);

    cudaStatus = cudaDeviceSynchronize(); // This function gets stuck
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSynchronize failed: %s\n", cudaGetErrorString(cudaStatus));
        return 1;
    }

    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed!\n");
        return 1;
    }
    return 0;
}
Isn’t the max grid size 65535x65535x65535? What is the problem here?
It only crashes when I compile with the -G flag. Otherwise it’s just slow, but it doesn’t exceed the 60 seconds. Also, I have everything up to date. Any ideas?