GeForce RTX 2080 shows ERR! in nvidia-smi

We have two RTX 2080 8GB graphics cards.

The card in the second slot is giving us the error shown below (ERR! in the fan and power columns).
Fri Mar 15 11:43:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.57       Driver Version: 410.57                              |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:17:00.0  On |                  N/A |
|  0%   48C    P5     7W / 225W |    509MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    Off  | 00000000:B6:00.0 Off |                  N/A |
| ERR!  48C    P8   ERR! / 225W |     23MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1006      G   /usr/lib/xorg/Xorg                           238MiB |
|    0      5146      G   compiz                                       159MiB |
|    0      5292      G   /home/gpu/anaconda3/envs/py35/bin/python3    135MiB |
|    0     31017      G   /home/gpu/anaconda3/envs/py35/bin/python3     12MiB |
+-----------------------------------------------------------------------------+

We are using these cards to run deep learning models in Python.

The problem usually occurs when the system is idle and not running any code, or while some code is initializing or exiting.

Steps taken so far to debug:

  1. We swapped the slots of the two cards, but the error still appears on whichever card is in the second slot (the second entry in nvidia-smi). This suggests that both cards are fine and the problem lies elsewhere.
  2. We also checked the other components, such as the power supply and fans, and they are working properly.

The cards idle at around 45 degrees Celsius and run at around 75-82 degrees Celsius under load. The error, however, only occurs while a particular program is initializing or exiting (the code itself does not seem to be at fault, as we have tried different versions).

Can you help us understand when nvidia-smi returns this type of error message?
Our current workaround is to restart the system; it then works for some time but fails again later.
nvidia-bug-report.log.gz (1.46 MB)
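
For anyone who wants to catch this state automatically rather than by eye, here is a minimal sketch that scans the text of an nvidia-smi table and reports which GPU entries contain ERR!. It assumes the table layout shown in this post; the name find_err_gpus and the trimmed sample text are illustrative only, not part of any NVIDIA tool.

```python
import re

# A GPU header row carries the index followed (later in the row) by a PCI bus id.
BUS_ID = r"[0-9A-Fa-f]{8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f]"
HEADER_ROW = re.compile(r"\|\s*(\d+)\s+.*?" + BUS_ID)


def find_err_gpus(smi_text):
    """Return the indices of GPUs whose status row contains 'ERR!'."""
    failing = []
    current = None  # index of the GPU whose rows are being read
    for line in smi_text.splitlines():
        header = HEADER_ROW.match(line)
        if header:
            current = int(header.group(1))
        elif "ERR!" in line and current is not None and current not in failing:
            failing.append(current)
    return failing


# Rows trimmed from the output in this post (hypothetical test input).
SAMPLE = """\
| 0 GeForce RTX 2080 Off | 00000000:17:00.0 On | N/A |
| 0% 48C P5 7W / 225W | 509MiB / 7949MiB | 0% Default |
| 1 GeForce RTX 2080 Off | 00000000:B6:00.0 Off | N/A |
| ERR! 48C P8 ERR! / 225W | 23MiB / 7952MiB | 0% Default |
"""

print(find_err_gpus(SAMPLE))
```

Run periodically against captured nvidia-smi output, something like this could help narrow down when the second entry flips to ERR!.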

Please use gpu-burn to check the card for a hw fault.

Hi, can you point me to documentation or the steps to follow to run gpu-burn?

There’s not much to it.


Follow the link in the README for an example.

PATH=.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games /usr/local/cuda-10.0/bin/nvcc -arch=compute_20 -ptx compare.cu -o compare.ptx
nvcc fatal : Value ‘compute_20’ is not defined for option ‘gpu-architecture’
Makefile:10: recipe for target ‘drv’ failed
make: *** [drv] Error 1

We get this error when trying to run the gpu-burn test.

Kindly help with this.

I don’t know where you got that old Makefile from.
-arch=compute_20
is not supported in CUDA 10; the current Makefile of gpu-burn uses
-arch=compute_30
Clone the current repo or just change the Makefile accordingly.
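
If cloning again is not convenient, the Makefile change can be sketched as a plain text substitution. This is only an illustration: patch_arch is a hypothetical helper, and the sample line is shortened from the failing command quoted earlier in the thread.

```python
def patch_arch(makefile_text, old="compute_20", new="compute_30"):
    """Swap the -arch value passed to nvcc, leaving everything else untouched."""
    return makefile_text.replace("-arch=" + old, "-arch=" + new)


# Shortened from the failing build command (illustrative input).
line = "nvcc -arch=compute_20 -ptx compare.cu -o compare.ptx"
print(patch_arch(line))
```

To apply it to a real checkout, read the Makefile into a string, pass it through patch_arch, and write the result back.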

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games nvcc -I/usr/local/cuda-10.0/include -arch=compute_30 -ptx compare.cu -o compare.ptx
/bin/sh: 1: nvcc: not found
Makefile:10: recipe for target ‘drv’ failed
make: *** [drv] Error 127

We have now encountered this error.

You did not set up the correct $PATH for CUDA:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
Put those exports into your user’s ~/.bashrc to have them added automatically on login.
Previously, you used the full path to start nvcc: /usr/local/cuda-10.0/bin/nvcc
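
A quick way to confirm whether the shell would find nvcc is sketched below. It assumes the install prefix /usr/local/cuda-10.0 seen in the commands above, and the export lines follow the pattern from the linked installation guide; cuda_exports and nvcc_on_path are hypothetical helper names.

```python
import shutil


def cuda_exports(cuda_home="/usr/local/cuda-10.0"):
    """Build the export lines for ~/.bashrc for a given CUDA install prefix."""
    return [
        "export PATH=%s/bin${PATH:+:${PATH}}" % cuda_home,
        "export LD_LIBRARY_PATH=%s/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" % cuda_home,
    ]


def nvcc_on_path(path=None):
    """Return the resolved nvcc location, or None if it is not on the search path."""
    return shutil.which("nvcc", path=path)


location = nvcc_on_path()
if location is None:
    print("nvcc not found on PATH; consider adding these lines to ~/.bashrc:")
    for export_line in cuda_exports():
        print(export_line)
else:
    print("nvcc found at", location)
```

If nvcc is still reported as missing after editing ~/.bashrc, open a new shell (or source the file) so the exports take effect.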

OK, thanks for that. We have set the environment variables now.

What value should we pass to the gpu-burn test?

For example, I am running: # make && ./gpu_burn 3600
I have two RTX 2080 GPUs.

Kindly suggest.

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-10.0/bin:.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-10.0/bin nvcc -I/usr/local/cuda-10.0/include -arch=compute_30 -ptx compare.cu -o compare.ptx
/usr/local/cuda-10.0/nvvm/bin/cicc: IO error: Error opening output file ‘compare.ptx’: Permission denied
g++ -O3 -Wno-unused-result -I/usr/local/cuda-10.0/include -c gpu_burn-drv.cpp
gpu_burn-drv.cpp: In member function ‘void GPU_Test::initCompareKernel()’:
gpu_burn-drv.cpp:222:14: warning: ‘CUresult cuParamSetSize(CUfunction, unsigned int)’ is deprecated [-Wdeprecated-declarations]
checkError(cuParamSetSize(d_function, __alignof(T*) + __alignof(int*) + __alignof(size_t)), "set param size");
^
In file included from gpu_burn-drv.cpp:48:0:
/usr/local/cuda-10.0/include/cuda.h:10998:36: note: declared here
__CUDA_DEPRECATED CUresult CUDAAPI cuParamSetSize(CUfunction hfunc, unsigned int numbytes);
^
gpu_burn-drv.cpp:223:14: warning: ‘CUresult cuParamSetv(CUfunction, int, void*, unsigned int)’ is deprecated [-Wdeprecated-declarations]
checkError(cuParamSetv(d_function, 0, &d_Cdata, sizeof(T*)), "set param");
^
In file included from gpu_burn-drv.cpp:48:0:
/usr/local/cuda-10.0/include/cuda.h:11099:36: note: declared here
__CUDA_DEPRECATED CUresult CUDAAPI cuParamSetv(CUfunction hfunc, int offset, void *ptr, unsigned int numbytes);
^
gpu_burn-drv.cpp:224:14: warning: ‘CUresult cuParamSetv(CUfunction, int, void*, unsigned int)’ is deprecated [-Wdeprecated-declarations]
checkError(cuParamSetv(d_function, __alignof(T*), &d_faultyElemData, sizeof(T*)), "set param");
^
In file included from gpu_burn-drv.cpp:48:0:
/usr/local/cuda-10.0/include/cuda.h:11099:36: note: declared here
__CUDA_DEPRECATED CUresult CUDAAPI cuParamSetv(CUfunction hfunc, int offset, void *ptr, unsigned int numbytes);
^
gpu_burn-drv.cpp:225:14: warning: ‘CUresult cuParamSetv(CUfunction, int, void*, unsigned int)’ is deprecated [-Wdeprecated-declarations]
checkError(cuParamSetv(d_function, __alignof(T*) + __alignof(int*), &d_iters, sizeof(size_t)), "set param");
^
In file included from gpu_burn-drv.cpp:48:0:
/usr/local/cuda-10.0/include/cuda.h:11099:36: note: declared here
__CUDA_DEPRECATED CUresult CUDAAPI cuParamSetv(CUfunction hfunc, int offset, void *ptr, unsigned int numbytes);
^
gpu_burn-drv.cpp:227:14: warning: ‘CUresult cuFuncSetBlockShape(CUfunction, int, int, int)’ is deprecated [-Wdeprecated-declarations]
checkError(cuFuncSetBlockShape(d_function, g_blockSize, g_blockSize, 1), "set block size");
^
In file included from gpu_burn-drv.cpp:48:0:
/usr/local/cuda-10.0/include/cuda.h:10932:36: note: declared here
__CUDA_DEPRECATED CUresult CUDAAPI cuFuncSetBlockShape(CUfunction hfunc, int x, int y, int z);
^
gpu_burn-drv.cpp: In member function ‘void GPU_Test::compare()’:
gpu_burn-drv.cpp:233:14: warning: ‘CUresult cuLaunchGrid(CUfunction, int, int)’ is deprecated [-Wdeprecated-declarations]
checkError(cuLaunchGrid(d_function, SIZE/g_blockSize, SIZE/g_blockSize), "Launch grid");
^
In file included from gpu_burn-drv.cpp:48:0:
/usr/local/cuda-10.0/include/cuda.h:11175:36: note: declared here
__CUDA_DEPRECATED CUresult CUDAAPI cuLaunchGrid(CUfunction f, int grid_width, int grid_height);
^
(The same deprecation warnings repeat for the template instantiations with T = double and T = float.)
Assembler messages:
Fatal error: can’t create gpu_burn-drv.o: Permission denied
Makefile:10: recipe for target ‘drv’ failed
make: *** [drv] Error 1

We got this error after running gpu-burn.

Kindly suggest what to do now.

You’re running this in a directory that your user doesn’t have write access to.
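
Both "Permission denied" messages in the log (compare.ptx and gpu_burn-drv.o) come from the build trying to create files in the current directory. A minimal sketch for checking that before running make, with can_build_here as a hypothetical helper name:

```python
import os
import tempfile


def can_build_here(directory="."):
    """True if the current user may create files in `directory`.

    Creating a file needs both write and search (execute) permission
    on the directory itself.
    """
    return os.access(directory, os.W_OK | os.X_OK)


print("current directory writable:", can_build_here("."))

# A freshly created scratch directory belongs to the current user.
with tempfile.TemporaryDirectory() as scratch:
    print("scratch directory writable:", can_build_here(scratch))
```

If the check fails for the gpu-burn checkout, copying or cloning it somewhere under the home directory is usually simpler than building as root.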

g++ -o gpu_burn gpu_burn-drv.o -O3 -lcuda -L/usr/local/cuda-10.0/lib64 -L/usr/local/cuda-10.0/lib -Wl,-rpath=/usr/local/cuda-10.0/lib64 -Wl,-rpath=/usr/local/cuda-10.0/lib -lcublas -lcudart -o gpu_burn
GPU 0: GeForce RTX 2080 (UUID: GPU-03cff016-0cf3-20f7-30de-9f13b9d7dc1b)
GPU 1: GeForce RTX 2080 (UUID: GPU-8fad4f80-1bb4-d803-e64a-b8ffcb393306)
Initialized device 1 with 7952 MB of memory (7747 MB available, using 6972 MB of it), using FLOATS
Initialized device 0 with 7949 MB of memory (5397 MB available, using 4857 MB of it), using FLOATS
10.5% proc’d: 10234 (9444 Gflop/s) - 10392 (9393 Gflop/s) errors: 0 - 0 temps: 68 C - 68 C
Summary at: Fri Mar 15 18:51:03 IST 2019

21.0% proc’d: 21672 (9453 Gflop/s) - 21650 (9284 Gflop/s) errors: 0 - 0 temps: 74 C - 73 C
Summary at: Fri Mar 15 18:51:24 IST 2019

31.5% proc’d: 33411 (9362 Gflop/s) - 32908 (9257 Gflop/s) errors: 0 - 0 temps: 79 C - 76 C
Summary at: Fri Mar 15 18:51:45 IST 2019

42.0% proc’d: 44849 (9291 Gflop/s) - 44166 (9181 Gflop/s) errors: 0 - 0 temps: 82 C - 79 C
Summary at: Fri Mar 15 18:52:06 IST 2019

52.5% proc’d: 55685 (9220 Gflop/s) - 55424 (9149 Gflop/s) errors: 0 - 0 temps: 85 C - 80 C
Summary at: Fri Mar 15 18:52:27 IST 2019

62.5% proc’d: 66521 (9052 Gflop/s) - 65816 (9122 Gflop/s) errors: 0 - 0 temps: 85 C - 82 C
Summary at: Fri Mar 15 18:52:47 IST 2019

73.0% proc’d: 77357 (8675 Gflop/s) - 77074 (9067 Gflop/s) errors: 0 - 0 temps: 87 C - 84 C
Summary at: Fri Mar 15 18:53:08 IST 2019

83.5% proc’d: 87892 (8725 Gflop/s) - 88332 (9045 Gflop/s) errors: 0 - 0 temps: 86 C - 85 C
Summary at: Fri Mar 15 18:53:29 IST 2019

94.0% proc’d: 98427 (8602 Gflop/s) - 99157 (8925 Gflop/s) errors: 0 - 0 temps: 86 C - 85 C
Summary at: Fri Mar 15 18:53:50 IST 2019

100.0% proc’d: 105049 (8610 Gflop/s) - 105652 (8907 Gflop/s) errors: 0 - 0 temps: 86 C - 85 C
Killing processes… done

Tested 2 GPUs:
GPU 0: OK
GPU 1: OK

Kindly check the results for both GPUs.

Looks good. You might want to run it for about 10 minutes and watch whether the temperature of GPU 1 rises more than that of GPU 0, since it is always the second slot that is affected. Furthermore, you should start nvidia-persistenced on boot so that the driver keeps the GPUs initialized. Unfortunately, you used the .run installer instead of the Ubuntu driver packages from the PPA, so you will have to set this up manually.
Did you ever try running with just one GPU in the second slot to rule out a mainboard failure?
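
To make the temperature comparison less of an eyeball exercise, the "temps:" field of the gpu-burn progress lines can be parsed. The sketch below assumes the line format shown in the run above and that the first value belongs to GPU 0 and the second to GPU 1; temperature_pairs and max_gap are hypothetical names.

```python
import re

TEMPS = re.compile(r"temps:\s*(\d+)\s*C\s*-\s*(\d+)\s*C")


def temperature_pairs(log_text):
    """Extract (first GPU, second GPU) temperatures from gpu-burn progress lines."""
    return [(int(a), int(b)) for a, b in TEMPS.findall(log_text)]


def max_gap(pairs):
    """Largest amount by which the second GPU was hotter than the first."""
    return max(second - first for first, second in pairs)


# Progress lines trimmed from the run posted above (illustrative input).
SAMPLE = """\
errors: 0 - 0 temps: 68 C - 68 C
errors: 0 - 0 temps: 85 C - 80 C
errors: 0 - 0 temps: 86 C - 85 C
"""

pairs = temperature_pairs(SAMPLE)
print(pairs)
print("second GPU hotter by at most", max_gap(pairs), "C")
```

A consistently positive gap over a longer run would point at cooling in the second slot; a gap near zero or negative, as in these sample lines, would not.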

Hi,

We have already tested with a single GPU and it works fine; we found no problem there. We are now running the test for 10 minutes and will share the result with you.

Hi, we are still waiting for a response from your end.

g++ -o gpu_burn gpu_burn-drv.o -O3 -lcuda -L/usr/local/cuda-10.0/lib64 -L/usr/local/cuda-10.0/lib -Wl,-rpath=/usr/local/cuda-10.0/lib64 -Wl,-rpath=/usr/local/cuda-10.0/lib -lcublas -lcudart -o gpu_burn
GPU 0: GeForce RTX 2080 (UUID: GPU-03cff016-0cf3-20f7-30de-9f13b9d7dc1b)
GPU 1: GeForce RTX 2080 (UUID: GPU-8fad4f80-1bb4-d803-e64a-b8ffcb393306)
Initialized device 0 with 7949 MB of memory (7350 MB available, using 6615 MB of it), using FLOATS
Initialized device 1 with 7952 MB of memory (7747 MB available, using 6972 MB of it), using FLOATS
10.1% proc’d: 58362 (8125 Gflop/s) - 61486 (8646 Gflop/s) errors: 0 - 0 temps: 86 C - 86 C
Summary at: Sat Mar 16 11:57:05 IST 2019

20.1% proc’d: 114669 (8410 Gflop/s) - 122106 (8589 Gflop/s) errors: 0 - 0 temps: 87 C - 85 C
Summary at: Sat Mar 16 11:59:05 IST 2019

30.2% proc’d: 171798 (8028 Gflop/s) - 182726 (8656 Gflop/s) errors: 0 - 0 temps: 86 C - 86 C
Summary at: Sat Mar 16 12:01:06 IST 2019

40.2% proc’d: 228516 (8286 Gflop/s) - 243779 (8693 Gflop/s) errors: 0 - 0 temps: 87 C - 86 C
Summary at: Sat Mar 16 12:03:07 IST 2019

50.3% proc’d: 285234 (8354 Gflop/s) - 304399 (8625 Gflop/s) errors: 0 - 0 temps: 87 C - 87 C
Summary at: Sat Mar 16 12:05:08 IST 2019

60.3% proc’d: 341541 (8069 Gflop/s) - 364586 (8648 Gflop/s) errors: 0 - 0 temps: 87 C - 84 C
Summary at: Sat Mar 16 12:07:08 IST 2019

70.4% proc’d: 398670 (8026 Gflop/s) - 425206 (8703 Gflop/s) errors: 0 - 0 temps: 87 C - 87 C
Summary at: Sat Mar 16 12:09:09 IST 2019

80.5% proc’d: 455388 (8074 Gflop/s) - 485826 (8665 Gflop/s) errors: 0 - 0 temps: 87 C - 86 C
Summary at: Sat Mar 16 12:11:10 IST 2019

90.6% proc’d: 512517 (8076 Gflop/s) - 546879 (8684 Gflop/s) errors: 0 - 0 temps: 87 C - 87 C
Summary at: Sat Mar 16 12:13:11 IST 2019

100.0% proc’d: 565947 (8184 Gflop/s) - 604035 (8618 Gflop/s) errors: 0 - 0 temps: 87 C - 85 C
Killing processes… done

Tested 2 GPUs:
GPU 0: OK
GPU 1: OK

We have now run the test for more than 10 minutes, and the temperatures above are what we observed. We have also tested the PCI slot as you suggested. Kindly suggest the next step.

Please enable nvidia-persistenced to start on boot and check if that resolves the issue.

Hi, we have enabled persistence mode. Please find the details for both GPUs:
Persistence Mode : Enabled
Persistence Mode : Enabled

What would be the next step?
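
The two "Persistence Mode" lines can also be checked mechanically, for example after a reboot. This is a minimal sketch that assumes the lines come from nvidia-smi query output in the "Persistence Mode : Enabled" form quoted above; persistence_modes and all_persistent are hypothetical names.

```python
import re


def persistence_modes(query_text):
    """Return the persistence mode reported for each GPU, in order."""
    return re.findall(r"Persistence Mode\s*:\s*(\w+)", query_text)


def all_persistent(query_text):
    """True only if at least one GPU is listed and every one reports Enabled."""
    modes = persistence_modes(query_text)
    return bool(modes) and all(mode == "Enabled" for mode in modes)


# The two lines quoted above (illustrative input).
SAMPLE = """\
Persistence Mode : Enabled
Persistence Mode : Enabled
"""

print(persistence_modes(SAMPLE))
print(all_persistent(SAMPLE))
```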

Have your normal workload run and see if the issue reappears.

1: We have enabled persistence mode, but the display manager (Xorg) is still using GPU memory.

Fri Mar 29 13:51:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    On   | 00000000:17:00.0  On |                  N/A |
|  0%   38C    P8     1W / 225W |   3684MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    On   | 00000000:B6:00.0 Off |                  N/A |
|  0%   38C    P8    10W / 225W |      1MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1012      G   /usr/lib/xorg/Xorg                          3365MiB |
|    0      2158      G   compiz                                       241MiB |
|    0     92750      G   …-token=4C60B7791AF8E491E7C05F3B4C0ACB0D      69MiB |
|    0    113775      G   unity-control-center                           6MiB |
+-----------------------------------------------------------------------------+

Kindly check the output.
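
To see at a glance which processes account for the memory on GPU 0, the process table can be parsed and summed. This is a sketch that assumes the "Processes" table layout shown above; memory_by_process and total_per_gpu are hypothetical names, and the sample uses three of the four rows.

```python
import re
from collections import defaultdict

# GPU index, PID, type (C, G or C+G), process name, usage in MiB.
PROCESS_ROW = re.compile(r"\|\s*(\d+)\s+(\d+)\s+([CG+]+)\s+(.+?)\s+(\d+)MiB\s*\|")


def memory_by_process(smi_text):
    """Return (gpu, pid, name, MiB) for every row of the Processes table."""
    rows = []
    for line in smi_text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            gpu, pid, _kind, name, mib = match.groups()
            rows.append((int(gpu), int(pid), name, int(mib)))
    return rows


def total_per_gpu(rows):
    """Sum the per-process usage for each GPU index."""
    totals = defaultdict(int)
    for gpu, _pid, _name, mib in rows:
        totals[gpu] += mib
    return dict(totals)


# Rows taken from the table above (illustrative input).
SAMPLE = """\
| 0 1012 G /usr/lib/xorg/Xorg 3365MiB |
| 0 2158 G compiz 241MiB |
| 0 113775 G unity-control-center 6MiB |
"""

rows = memory_by_process(SAMPLE)
for gpu, pid, name, mib in sorted(rows, key=lambda row: -row[3]):
    print(gpu, pid, name, mib)
print(total_per_gpu(rows))
```

In this output, Xorg alone holds most of the memory in use on GPU 0, while GPU 1 has no processes listed.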