RuntimeError: parallel_for failed: no kernel image is available for execution on the device

I’m intrigued by the NLP research paper “Leveraging Graph to Improve Abstractive Multi-Document Summarization”, but I’m having trouble testing the code found on GitHub. Here’s the error I get: RuntimeError: parallel_for failed: no kernel image is available for execution on the device. According to the web, people usually encounter this error because their GPU is too old, with a low CUDA Capability value. However, I have a brand new GeForce RTX 3090 with a CUDA Capability value of 8.6. Perhaps the log line below yields a clue? I’m kinda lost.

W1023 10:20:11.511365 3292334 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 86, Driver API Version: 11.1, Runtime API Version: 10.0

$ nvidia-smi
Fri Oct 23 14:31:32 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   32C    P0    62W / 350W |      0MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

My guess would be that this is a function of the version of paddlepaddle you are using. Your log line reports Runtime API Version 10.0, so that build is using CUDA 10.0. It apparently ships some compiled objects, or does runtime compilation, without providing either:

  1. embedded PTX, which the driver could JIT-compile for a newer GPU
  2. the compile switches needed to generate SASS for a cc 8.6 architecture (which would not be possible anyway with CUDA 10.0)
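
To make the two cases concrete, here is a minimal Python sketch of the compatibility rules as I understand them: SASS runs only on a device of the same major compute capability with an equal or higher minor version, while PTX can be JIT-compiled for any device at or above its target. The function name `can_run` and the example target lists are made up for illustration; I have not inspected what paddlepaddle actually embeds.

```python
def can_run(device_cc, sass_targets, ptx_targets):
    """Return how a kernel could be loaded on the device, or None if it cannot.

    All arguments use (major, minor) tuples, e.g. (8, 6) for an RTX 3090.
    This is a simplified model, not the driver's real loading logic.
    """
    dev_major, dev_minor = device_cc
    # SASS is binary-compatible only within the same major version,
    # and only on devices whose minor version is >= the build target's.
    for major, minor in sass_targets:
        if major == dev_major and minor <= dev_minor:
            return "sass"
    # PTX can be JIT-compiled by the driver for any device whose
    # compute capability is at or above the PTX target.
    for cc in ptx_targets:
        if cc <= device_cc:
            return "ptx-jit"
    return None


# Hypothetical CUDA 10.0-era build: SASS up to cc 7.5, no embedded PTX.
print(can_run((8, 6), [(6, 0), (6, 1), (7, 0), (7, 5)], []))  # None
# Same build, but with cc 7.5 PTX embedded: the driver could JIT it.
print(can_run((8, 6), [(7, 5)], [(7, 5)]))  # ptx-jit
```

The first call corresponds to the "no kernel image is available" situation: nothing in the binary matches an 8.x device and there is no PTX to fall back on.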

If you can find a version of paddlepaddle that uses CUDA 11.0 or 11.1, that may fix it.
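
As a quick sanity check, something like the following Python sketch could be run against the log line from the question. It parses the compute capability and the runtime version, then checks whether that CUDA release can generate native code usable on the device. The table of minimum toolkit versions is my own summary (cc 8.0 needs CUDA 11.0, cc 8.6 needs CUDA 11.1), the helper names are hypothetical, and the check deliberately ignores the PTX JIT path.

```python
import re

# Earliest CUDA toolkit release able to generate SASS for each
# compute capability (assumed summary, keyed as major*10 + minor).
MIN_TOOLKIT = {
    60: (8, 0), 61: (8, 0), 70: (9, 0),
    75: (10, 0), 80: (11, 0), 86: (11, 1),
}

LOG = ("W1023 10:20:11.511365 3292334 device_context.cc:236] Please NOTE: "
       "device: 0, CUDA Capability: 86, Driver API Version: 11.1, "
       "Runtime API Version: 10.0")


def parse(line):
    """Extract (capability, driver_version, runtime_version) from the log line."""
    m = re.search(
        r"CUDA Capability: (\d+), Driver API Version: (\d+)\.(\d+), "
        r"Runtime API Version: (\d+)\.(\d+)", line)
    if m is None:
        raise ValueError("log line not in the expected format")
    cc, dmaj, dmin, rmaj, rmin = map(int, m.groups())
    return cc, (dmaj, dmin), (rmaj, rmin)


def runtime_can_target(cc, runtime):
    """True if this toolkit can build SASS that runs on the device.

    SASS for the same major version and an equal or lower minor version
    is enough, so CUDA 11.0 (cc 8.0) already covers a cc 8.6 device.
    """
    dev_major, dev_minor = divmod(cc, 10)
    for target, needed in MIN_TOOLKIT.items():
        major, minor = divmod(target, 10)
        if major == dev_major and minor <= dev_minor and runtime >= needed:
            return True
    return False


cc, driver, runtime = parse(LOG)
print(cc, driver, runtime)              # 86 (11, 1) (10, 0)
print(runtime_can_target(cc, runtime))  # False
print(runtime_can_target(cc, (11, 0)))  # True
```

With the values from the log, the runtime (10.0) predates any 8.x target, while a CUDA 11.0 or 11.1 build would pass this check.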