Hi,
Today I encountered a very strange problem, possibly a bug in nvcc/ptxas. When I added one line of code calling fmaxf() inside a kernel function, ptxas memory usage rose to 1.5 GB and it hung indefinitely (more than 2 hours) while keeping the CPU busy. -ptxas-options=-v does not print any information in the hanging case.
A sketch of the code:
__global__ void __launch_bounds__(64) standardize(Variance* __restrict__ cvariance, ....)
{
    float cvar[40*64] = {0};
    float gvar[40] = {0};
    for (int layer = 0; layer < 40; layer++)
    {
        for (int c = 0; c < 64; c++)
        {
            for (int x = 0; x < 360; x++)
            {
                cvar[layer*64+c] = fmaxf(cvar[layer*64+c], cvariance[layer].entry(x,c,1));
            }
            gvar[layer] = fmaxf(gvar[layer], cvar[layer*64+c]); // problematic line
        }
    }
    ....
}
Note that gvar is not referenced anywhere else.
However, if I comment out the problematic line or replace fmaxf() with the (> ? :) operator, compilation finishes within minutes, and I never see ptxas show up in the process list (I use the nvcc command).
My setup is Windows 7 64-bit, CUDA Toolkit 10.1.105_418.96, and Visual Studio 2017 Community Edition v15.8.0.
The compilation command is:
nvcc --gpu-architecture=sm_61 --default-stream=per-thread --optimize=3 --x=cu *.cpp --use_fast_math --library=cuda,cudart_static
Any comments? Thank you very much.