Hello everyone, I have a problem profiling with Nsight Systems. The CLI is:
CUDA_VISIBLE_DEVICES=2 nsys profile --trace=cuda,cudnn,cublas -o p1 ./main
but an error was returned. I tried changing the --trace arguments, but that didn't help. Also, without nsys, ./main runs fine.
How did you resolve this error? I didn’t understand what you meant by “Just using the nsys in the system path (installed with the CUDA driver), not standalone installed.”
Thank you
I believe, based on what they were saying, that they had multiple versions of Nsight Systems installed and that they were using the installed version that did not match their driver version.
Technically it is the CUDA toolkit version that needs to match the driver version. Most people achieve this by getting their drivers from the CUDA toolkit. Nsys will work on any driver/CUDA combination from CUDA 8.0 on (although we only test back to 10.0).
I am pretty sure my driver and toolkit match; both cudaRuntimeGetVersion and cudaDriverGetVersion give 11020 (a minimal check is sketched at the end of this post). I cannot use the nsys that comes with the driver, since it was simply not installed with the driver (by my system administrator); running
/usr/local/cuda/bin/nsys
simply gives
Error: Nsight Systems 2020.4.3 hasn't been installed with CUDA Toolkit 11.2
I have tried installing nsys 2020.4.1 and the latest 2023.2; both give a similar error.
I am on CentOS 7, running onnxruntime built from C++ source on an A10 GPU. The program runs normally when not launched under nsys profile.
There are also several more “Cannot find string for an exterior index” errors identical to the one posted above. The error list is always one “Unknown runtime API function index: 406” followed by several “Cannot find string for an exterior index” messages.
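For reference, the version check I mentioned is just two CUDA runtime calls; a minimal sketch (standard CUDA runtime API, the file name is my own), built with nvcc:
// check_versions.cu - print the CUDA runtime and driver versions (e.g. 11020 means 11.2)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtimeVer = 0, driverVer = 0;
    cudaRuntimeGetVersion(&runtimeVer);  // CUDA runtime version the app links against
    cudaDriverGetVersion(&driverVer);    // latest CUDA version supported by the installed driver
    std::printf("runtime: %d, driver: %d\n", runtimeVer, driverVer);
    return 0;
}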
Thank you. This is a bit strange: looking up the mapping between function index and CUDA runtime API, 406 corresponds to cudaGetDriverEntryPoint, which has only existed since CUDA 11.3 (a short illustration of the API is at the end of this post). I'm not sure how your application could trigger this API while your system is on CUDA 11.2. (That is also why nsys reports the error: this function index is unexpected under your driver version.)
Could you also check nvcc --version? Is there any chance that the app was built with CTK 11.3 or higher even though the driver is CUDA 11.2?
Also, is it possible for you to update the CUDA driver to 11.3 or higher?
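For context, cudaGetDriverEntryPoint is the runtime API for looking up a CUDA driver symbol at run time (added in CUDA 11.3). A rough illustration of how a library might call it, using the CUDA 11.3 signature; this is only a sketch, not code from your app:
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime for the address of a driver API symbol (available since CUDA 11.3).
    void *fn = nullptr;
    cudaError_t st = cudaGetDriverEntryPoint("cuDriverGetVersion", &fn, cudaEnableDefault);
    if (st != cudaSuccess || fn == nullptr) {
        std::printf("symbol lookup failed: %s\n", cudaGetErrorString(st));
        return 1;
    }
    // Call the driver API through the returned pointer.
    int driverVersion = 0;
    reinterpret_cast<CUresult (*)(int *)>(fn)(&driverVersion);
    std::printf("driver CUDA version: %d\n", driverVersion);
    return 0;
}
A library built against CUDA 11.3 or newer can make calls like this internally, which is the kind of thing that would show up as an unexpected function index on an 11.2 system.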
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
I tried to find cudaGetDriverEntryPoint in my code, but there is no matching result. I’ll try later to find out if there are any third-party libraries that might call this API.
Speaking of the “mapping between the function index and CUDA runtime API”, has the mapping been documented somewhere?
Updating the CUDA driver to 11.3 is possible on my development machine, but not on the Kubernetes cluster where my application will be deployed, and I don’t want my development environment to differ from the production servers.
You can search for the cupti_runtime_cbid.h header in your CTK installation folder. For example, on my system, it’s at /usr/local/cuda-12.1/targets/x86_64-linux/include/cupti_runtime_cbid.h.
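The CBIDs in that header should be a plain enum with explicit numeric values, so a text search for the index is enough; if you want to stay in C++, a throwaway scan like this works (the path is the one from my system, adjust it for yours):
// cbid_lookup.cpp - print the cupti_runtime_cbid.h lines that assign a given CBID value
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char **argv) {
    const std::string needle = "= " + std::string(argc > 1 ? argv[1] : "406");  // index to look up
    std::ifstream hdr("/usr/local/cuda-12.1/targets/x86_64-linux/include/cupti_runtime_cbid.h");
    std::string line;
    while (std::getline(hdr, line))
        if (line.find(needle) != std::string::npos)
            std::cout << line << '\n';  // the enum entry assigned that value
    return 0;
}
A plain text search for "= 406" in the same file gives the same answer.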
It turns out the cuDNN lib was not built against CUDA 11.2. Downgrading cuDNN solved the problem. Thanks very much for your quick reply and detailed explanation :)
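In case it helps anyone else: one way to confirm which CUDA toolkit a cuDNN build targets is cudnnGetCudartVersion() from the cuDNN API; a minimal sketch (link it against the cuDNN build you want to check):
// cudnn_build_check.cpp - report the cuDNN version and the CUDA runtime it was built against
#include <cstdio>
#include <cudnn.h>

int main() {
    std::printf("cuDNN version: %zu\n", cudnnGetVersion());                     // e.g. 8201 for 8.2.1
    std::printf("built against CUDA runtime: %zu\n", cudnnGetCudartVersion());  // e.g. 11020 for 11.2
    return 0;
}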
output of nvidia-smi:
NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
output of nsys --version:
NVIDIA Nsight Systems version 2023.1.1.127-32365746v0
There is indeed a mismatch between the CUDA versions reported by nvidia-smi and nvcc --version. However, if I use only one process (I run a PyTorch model and use --nproc_per_node=N to set the number of processes and GPUs), the aforementioned error does not occur. If I set --nproc_per_node to more than one, the error always occurs.
Could you give a hint about the cause of the error? Thanks a lot!