RuntimeError: parallel_for failed: no kernel image is available for execution on the device

I’m intrigued by the NLP research paper Leveraging Graph to Improve Abstractive Multi-Document Summarization, but I’m having trouble testing the code found on GitHub. Here’s the error I get: RuntimeError: parallel_for failed: no kernel image is available for execution on the device. From what I’ve read online, people usually hit this error because their GPU is too old and has a low CUDA compute capability. However, I have a brand new GeForce RTX 3090 with a compute capability of 8.6. Perhaps this log line offers a clue? I’m kind of lost.

W1023 10:20:11.511365 3292334 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 86, Driver API Version: 11.1, Runtime API Version: 10.0

$ nvidia-smi
Fri Oct 23 14:31:32 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   32C    P0    62W / 350W |      0MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

My guess is that this depends on the version of paddlepaddle you are using. Your log line reports Runtime API Version: 10.0, and the build apparently contains compiled objects, or does runtime compilation, without specifying either of the following:

  1. embedded PTX
  2. the compile switches needed to generate SASS for a cc 8.6 architecture (which would not be possible anyway with CUDA 10.0)

If you can find a version of paddlepaddle that uses CUDA 11.0 or 11.1, that may fix it.
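The reasoning above can be sketched as a small check against the log line from the question. This is only an illustration: the helper names `parse_device_log` and `runtime_can_target` are hypothetical, the regular expression assumes the exact message format shown in the question, and the table lists only the first CUDA toolkit release that added each of a few recent compute capabilities.

```python
import re

# First CUDA toolkit release able to generate SASS for each compute capability.
# Deliberately partial; only a few recent architectures are listed.
MIN_TOOLKIT_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30 series)
}

# Assumed format, copied from the device_context.cc message in the question.
LOG_PATTERN = re.compile(
    r"CUDA Capability: (\d+), Driver API Version: (\d+)\.(\d+), "
    r"Runtime API Version: (\d+)\.(\d+)"
)


def parse_device_log(line):
    """Return (compute capability, driver version, runtime version) as tuples."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like the device_context.cc message")
    cc_digits = match.group(1)  # e.g. "86" means compute capability 8.6
    cc = (int(cc_digits[:-1]), int(cc_digits[-1]))
    driver = (int(match.group(2)), int(match.group(3)))
    runtime = (int(match.group(4)), int(match.group(5)))
    return cc, driver, runtime


def runtime_can_target(cc, runtime):
    """True/False if the toolkit version can target cc, None if cc is not in the table."""
    needed = MIN_TOOLKIT_FOR_CC.get(cc)
    if needed is None:
        return None
    return runtime >= needed


line = ("W1023 10:20:11.511365 3292334 device_context.cc:236] Please NOTE: "
        "device: 0, CUDA Capability: 86, Driver API Version: 11.1, "
        "Runtime API Version: 10.0")
cc, driver, runtime = parse_device_log(line)
print(cc, driver, runtime)              # (8, 6) (11, 1) (10, 0)
print(runtime_can_target(cc, runtime))  # False
```

A `True` result is necessary but not sufficient: a package built against CUDA 11.1 can still fail with the same error if it was not compiled with cc 8.6 SASS or embedded PTX.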

I’m facing the same problem with a 3090. Is there any way to fix it? I believe I installed the newest paddle, which now supports CUDA 11.1.

You would need to find a version of paddlepaddle that is compiled for cc 8.6.
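To make "compiled for cc 8.6" concrete, here is a rough sketch of the nvcc `-gencode` switches that embed SASS, and optionally PTX, for a given architecture. The helper `gencode_flags` is hypothetical and written only for illustration; how paddlepaddle's own build system passes such switches is not covered in this thread.

```python
def gencode_flags(sass_archs, ptx_arch=None):
    """Build nvcc -gencode switches for the given compute capabilities.

    sass_archs: iterable of (major, minor) tuples to compile SASS for.
    ptx_arch:   optional (major, minor) tuple to embed PTX for, which lets
                the driver JIT-compile for newer GPUs at load time.
    """
    flags = []
    for major, minor in sass_archs:
        arch = f"{major}{minor}"
        # code=sm_XY embeds machine code (SASS) for that exact architecture.
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    if ptx_arch is not None:
        arch = f"{ptx_arch[0]}{ptx_arch[1]}"
        # code=compute_XY embeds PTX instead of SASS.
        flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
    return flags


# SASS for Turing and for the RTX 3090, plus PTX for cc 8.6.
for flag in gencode_flags([(7, 5), (8, 6)], ptx_arch=(8, 6)):
    print(flag)
```

A build whose newest target is cc 7.5 SASS with no PTX has nothing the driver can run on a cc 8.6 device, which is the situation the error message describes.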

You might wish to file an issue against paddlepaddle. This isn’t something that can be fixed by anyone except the paddlepaddle developers, or someone who is compiling/building paddlepaddle from source. This may be of interest, although I can’t read the language it is written in: paddle 2.0 运行失败 (“paddle 2.0 fails to run”)

I won’t be able to respond to further questions requesting help/support for paddlepaddle here.