Nvcc compilation stuck at ptxas

Hello, I am using LibTorch 2.4.0 and CUDA 11.8 to develop a deep learning project. The project includes custom forward and backward CUDA functions written in .cu files. It used to compile and run successfully, but after I added more code to the .cu file, compilation gets stuck and never returns (no error code, it just hangs). Using “top”, I found the command “ptxas -arch sm_89 -m64 -v /tmp/tmpxft_00006356_00000000-6_main.ptx -o /tmp/tmpxft_00006356_00000000-8_main.cubin” running. To confirm that ptxas is the culprit, I ran this command manually (i.e. typed it into a terminal), and it never returned either.

My guess is that some limit (register count, device code size, or constant memory size) is being exceeded. Unfortunately, ptxas produces no output because it never returns, so I have no clue how to optimize my code.

For your information, my CUDA code is around 800 lines, and I use cuco::static_map to accelerate it.

I would really appreciate any help. Thanks in advance!

It’s possible that a particular piece of code causes ptxas to spend a very long time during compilation. There is no way to diagnose such cases without an actual, complete example.
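
To tell "very slow" apart from "never finishes", one rough approach is to rerun the same ptxas command under a wall-clock limit, once as-is and once with optimization turned off. The sketch below is only an illustration: the helper name run_with_limit, the 600-second limit, and the output path /tmp/main_O0.cubin are assumptions, not anything from the original post. It relies on the GNU coreutils `timeout` tool and on ptxas's -O0 optimization-level option.

```shell
# Run a command with a wall-clock limit and report how it ended.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# (run_with_limit is a hypothetical helper name, not part of CUDA.)
run_with_limit() {
    limit="$1"
    shift
    status=0
    # GNU timeout kills the command after $limit seconds and then exits with 124.
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s"
    else
        echo "exited with status $status"
    fi
}

# Example invocations (commented out). The .ptx path is the one from the post
# above; nvcc normally deletes such intermediate files when it exits, so this
# assumes the file still exists (for instance because nvcc was run with --keep).
#
# Default optimization level, as in the original command:
# run_with_limit 600 ptxas -arch sm_89 -m64 -v \
#     /tmp/tmpxft_00006356_00000000-6_main.ptx -o /tmp/tmpxft_00006356_00000000-8_main.cubin
#
# Optimization disabled, writing to a hypothetical output path:
# run_with_limit 600 ptxas -arch sm_89 -m64 -v -O0 \
#     /tmp/tmpxft_00006356_00000000-6_main.ptx -o /tmp/main_O0.cubin
```

If the -O0 run finishes quickly while the default run hits the limit, that suggests (but does not prove) that the time is going into ptxas's optimization passes rather than into a hard resource limit. Either way, the .ptx file that reproduces the problem is the kind of complete example needed for a real diagnosis.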

Since you mention CUDA 11.8, you may wish to try the ptxas command that takes a long time (“stuck”) with the latest version of CUDA, currently 12.6.

Luckily, my code compiles successfully under CUDA 12.4. I still don’t know why, but thanks!