I’m trying to use nsight compute to profile a large language model program. And I use ncu --kernel-name my_kernel --launch-skip 86 --launch-count 1 --target-processes all python run.py to obtain the kernel result.
But I always got this error: ==PROF== Target process 112708 terminated before first instrumented API call.
which results in ==WARNING== No kernels were profiled.
What can I do to solve this problem?
Any help would be so appreciated!
Sorry for the issue you met !
Would you please try a simple CUDA sample to see if NCU works firstly ? This will help us to isolate if this is an ENV issue or specific sample issue.