We have eight Tesla P4 cards in our server:
[root@localhost ~]# nvidia-smi
Tue Apr 23 14:15:28 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:18:00.0 Off |                    0 |
| N/A   72C    P0    28W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 00000000:19:00.0 Off |                    0 |
| N/A   71C    P0    26W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P4            Off  | 00000000:5F:00.0 Off |                    0 |
| N/A   68C    P0    25W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P4            Off  | 00000000:86:00.0 Off |                    2 |
| N/A   39C    P8     7W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla P4            Off  | 00000000:87:00.0 Off |                    0 |
| N/A   72C    P0    27W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla P4            Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   60C    P0    25W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla P4            Off  | 00000000:B0:00.0 Off |                    0 |
| N/A   62C    P0    25W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla P4            Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   70C    P0    25W /  75W |   5413MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    136366      C   java                                        5403MiB |
|    1    152242      C   java                                        5403MiB |
|    2    158402      C   java                                        5403MiB |
|    4    140028      C   java                                        5403MiB |
|    5    147174      C   java                                        5403MiB |
|    6    141910      C   java                                        5403MiB |
|    7    144020      C   java                                        5403MiB |
+-----------------------------------------------------------------------------+
But something seems wrong with card No. 3.
I wrote a small demo to test it. The code is here:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sstream>

int dataSize = 16;

int main(int argc, char *argv[])
{
    if (argc < 3)
    {
        printf("please enter device dataSize\n");
        return 0;
    }

    // Parse the device index and the element count from the command line.
    int device = 0;
    std::stringstream convert;
    convert << argv[1];
    convert >> device;
    convert.clear();
    convert << argv[2];
    convert >> dataSize;

    // Select the GPU to test.
    if (cudaSuccess != cudaSetDevice(device))
    {
        printf("cuda set device error\n");
        return -1;
    }

    // Allocate and immediately free a small buffer on that GPU.
    int *pGpuDistance;
    if (cudaSuccess != cudaMalloc((void **)&pGpuDistance, sizeof(int) * dataSize))
    {
        printf("cuda malloc error\n");
        return -1;
    }
    if (cudaSuccess != cudaFree(pGpuDistance))
    {
        printf("cuda free error\n");
        return -1;
    }

    if (cudaSuccess != cudaDeviceReset())
    {
        printf("cuda device reset error\n");
        return -1;
    }
    return 0;
}
When I run this demo, it crashes with a segmentation fault on card No. 3, while the other cards run without errors:
[root@localhost ~]# ./a.out 3 100
Segmentation fault
[root@localhost ~]# ./a.out 2 100
You have new mail in /var/spool/mail/root
[root@localhost ~]# ./a.out 4 100
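The runs above show that device 3 crashes, but not which runtime call crashes. One way to narrow it down might be a diagnostic variant like the sketch below. It is not part of the original demo: the file name check_device.cu and the CHECK macro are made up for illustration, and the buffer size of 100 ints simply mirrors the runs above. It prints a marker to stderr before each call, reports cudaGetErrorString for any call that fails, and prints the PCI bus ID of the chosen device. The bus ID matters because CUDA's default device order is not guaranteed to match the order nvidia-smi shows unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set.

```cuda
// Diagnostic sketch (hypothetical file name: check_device.cu).
// Build: nvcc check_device.cu -o check_device
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: print the failing call and the runtime's error
// string, then leave main() with a failure code.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            return EXIT_FAILURE;                                  \
        }                                                         \
    } while (0)

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return EXIT_FAILURE;
    }
    int device = atoi(argv[1]);

    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    fprintf(stderr, "CUDA sees %d device(s)\n", count);

    // Map the CUDA index to a PCI bus ID for comparison with nvidia-smi.
    char busId[32];
    CHECK(cudaDeviceGetPCIBusId(busId, (int)sizeof(busId), device));
    fprintf(stderr, "device %d is at PCI bus ID %s\n", device, busId);

    // stderr is unbuffered, so the last marker printed before a crash
    // identifies the call that was in progress.
    fprintf(stderr, "calling cudaSetDevice\n");
    CHECK(cudaSetDevice(device));

    fprintf(stderr, "calling cudaMalloc\n");
    int *p = NULL;
    CHECK(cudaMalloc((void **)&p, sizeof(int) * 100));

    fprintf(stderr, "calling cudaFree\n");
    CHECK(cudaFree(p));

    fprintf(stderr, "all calls succeeded\n");
    return EXIT_SUCCESS;
}
```

Running it as ./check_device 3 and comparing the printed bus ID with 00000000:86:00.0 from nvidia-smi would confirm whether the crashing CUDA device is the same physical card that reports the uncorrectable ECC count of 2.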
The Linux version is:
[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
Can anyone tell me what's wrong?
Thanks!!