Amazon P3 instances don't work on Ubuntu 17.04

How to reproduce:

  1. Launch an AWS Ubuntu 17.04 instance (ami-32e7464a) in region us-west-2 on a p3.2xlarge.

  2. Follow the Linux installation guide.

  3. Build the CUDA samples.

  4. Run deviceQuery = PASS.

  5. Run deviceQuery again = CRASHES THE SERVER.

The first time any sample kernel is launched, CUDA works fine. Any subsequent launch crashes the server.
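The samples aren't needed to trigger this; a minimal program of my own (the kernel and names below are not from the CUDA samples) shows the same pattern — run the binary once and it completes, run it a second time and the instance hangs, just like deviceQuery:

```cuda
// minimal_launch.cu — minimal repro sketch (my own code, not from the samples).
// Build with: nvcc -o minimal_launch minimal_launch.cu
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: each thread writes its global index into the output array.
__global__ void fill(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

int main()
{
    int *d_out = NULL;
    cudaError_t err = cudaMalloc(&d_out, 256 * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill<<<1, 256>>>(d_out);        // single kernel launch in this process
    err = cudaDeviceSynchronize();  // on the affected driver, the second *run* of the binary hangs here
    printf("launch: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

First invocation prints "launch: no error" and exits cleanly; the second invocation never returns from the launch and takes the instance down with it.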

I have tried:

  • many AMIs,
  • many regions,
  • Ubuntu 16.04 and 17.04,
  • all install methods (deb/runfile, etc.),

and always with the same result.

Amazon or NVIDIA, please fix this! I've been waiting weeks for this to work.

ubuntu@ip-172-31-xx-xx:~/samples/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla V100-SXM2-16GB"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    7.0
  Total amount of global memory:                 16152 MBytes (16936861696 bytes)
  (80) Multiprocessors, ( 64) CUDA Cores/MP:     5120 CUDA Cores
  GPU Max Clock rate:                            1530 MHz (1.53 GHz)
  Memory Clock rate:                             877 Mhz
  Memory Bus Width:                              4096-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 30
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS
ubuntu@ip-172-31-xx-xx:~/samples/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

(system hangs)

Driver version:

ubuntu@ip-172-31-xx-xx:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  387.26  Thu Nov  2 21:20:16 PDT 2017
GCC version:  gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2)

nvcc version:

ubuntu@ip-172-31-xx-xx:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

The issue now appears to be understood and affects all currently available r387 drivers.

It should be fixed in a future r387 driver, 387.41 or later.

As an interim workaround, use the CUDA 9.0 / r384 drivers on AWS P3 instances.
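On Ubuntu 16.04 the workaround looks roughly like this. The repo URL and package filename below are from memory and may have changed, so verify them against NVIDIA's CUDA 9.0 download archive before running anything:

```shell
# Install the CUDA 9.0 apt repo package (URL/filename are assumptions —
# check the CUDA 9.0 download archive for the current ones).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update

# Request CUDA 9.0 explicitly so apt does not pull in CUDA 9.1 and the
# broken r387 driver.
sudo apt-get install -y cuda-9-0

# Confirm an r384 driver is loaded before touching the GPU.
cat /proc/driver/nvidia/version
```

The key point is the explicit `cuda-9-0` package name; a bare `apt-get install cuda` would upgrade to 9.1/r387 and reintroduce the crash.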

A 390.12 beta driver, posted on January 4th, is believed to fix this issue.

Further 390.xx drivers should also be posted. It now appears there may not be any further 387.xx drivers, but any 390.xx driver should fix this issue.