We traced the problem to a possible graphics driver issue: the CUDA driver is not working well with the kernel…
It appears to come down to the calls the driver makes for interrupts…
The large waits are on these calls:
open("/dev/nvidia0", O_RDWR)
open("/dev/nvidia1", O_RDWR)
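
To put a number on those waits outside the application, something like the following could time the same open() calls directly. This is only a rough sketch: the helper name time_open is made up for illustration, and it assumes the device nodes are /dev/nvidia0 and /dev/nvidia1 as in the calls above.

```python
import os
import time


def time_open(path, flags=os.O_RDWR):
    """Time a single open() on `path` (hypothetical helper).

    Returns (elapsed_seconds, error); error is None when the open succeeded.
    """
    start = time.perf_counter()
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        # Still report how long the failed call took.
        return time.perf_counter() - start, exc
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed, None


if __name__ == "__main__":
    # Device paths taken from the slow calls noted above.
    for dev in ("/dev/nvidia0", "/dev/nvidia1"):
        seconds, err = time_open(dev)
        status = "ok" if err is None else f"failed: {err}"
        print(f"{dev}: {seconds:.3f}s ({status})")
```

Running it a few times under each driver/kernel combination should give comparable numbers for the open() latency.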
The kernel versions we use are:
4.4.0-154-generic
4.4.0-141-generic
- Historically we have used only 3 major versions of the NVIDIA graphics driver (384, 410, 418), none of which specifically supports the TITAN V.
- When selecting a driver, you need to choose based on GPU (TITAN V), OS version (Ubuntu 16.04), and toolkit (CUDA 9). According to NVIDIA, the only version specifically released for that combination is 387.34.
- We tried installing that version but were unsuccessful (due to some compilation issue with the kernel).
- The next likely candidate driver version Mike tried was 430, which supports CUDA 10. Again, it was very slow.
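
Since the outcome depends on the exact kernel/driver pairing, it may help to record both on each machine before and after a driver change. The sketch below is an assumption-laden illustration: it assumes the NVIDIA kernel module exposes its version in /proc/driver/nvidia/version and that the first three-digit "NNN.NN" token in that file is the driver version. The function names are hypothetical.

```python
import platform
import re
from pathlib import Path

# Assumed location of the loaded NVIDIA kernel module's version string.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Return the first token that looks like a driver version, or None.

    Assumption: the version is the first three-digit 'NNN.NN' style token.
    """
    match = re.search(r"\b(\d{3}\.\d+(?:\.\d+)?)\b", text)
    return match.group(1) if match else None


def collect_versions():
    """Gather the running kernel release and the NVIDIA driver version."""
    try:
        driver = parse_driver_version(NVIDIA_VERSION_FILE.read_text())
    except OSError:
        driver = None  # module not loaded, or file not readable
    return {"kernel": platform.release(), "driver": driver}


if __name__ == "__main__":
    info = collect_versions()
    print(f"kernel: {info['kernel']}")
    print(f"nvidia driver: {info['driver'] or 'not detected'}")
```

Logging this next to the open() timings makes it easier to see which kernel and driver combinations are slow.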