Nsys profile error: InvalidArgumentException, Unknown API driver activity

Hello everyone, I have a problem profiling with Nsight Systems. The command line is:
CUDA_VISIBLE_DEVICES=2 nsys profile --trace=cuda,cudnn,cublas -o p1 ./main, but it returns an error.

I tried changing the trace arguments, but it didn’t help. Without nsys, ./main runs successfully.

**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/fast/src/Alt/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(47): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_error_text*] = Unknown API driver activity\n[boost::errinfo_errno_*] = 626, \"Unknown error 626\"\n"
      }
    }
  }
}

Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/fast/src/Alt/QuadD/Host/Analysis/Modules/StringStorage.cpp(150): Throw in function QuadDCommon::StringId QuadDAnalysis::StringStorage::GetKeyForExteriorId(QuadDAnalysis::GlobalProcess, QuadDAnalysis::StringStorage::ExteriorId) const\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::LogicException>\nstd::exception::what: LogicException\n[QuadDCommon::tag_error_text*] = Cannot find bucket for a bucket index\n"
      }
    }
  }
}

=======================
I found the solution: use the nsys on the system path (installed with the CUDA driver), not the standalone installation.

Hi

How did you resolve this error? I didn’t understand what you meant by “Just using the nsys in system path(installed with cuda driver), not standalone installed.”
Thank you

I believe, based on what they were saying, that they had multiple versions of Nsight Systems installed and that they were using the installed version that did not match their driver version.

Are you seeing the same issue?

Hi, how did you resolve this error? I hit this error when I use nsys profile. I tried different CUDA versions, but it didn’t help.

How can I make sure that the nsight systems version matches the driver version?

Technically, it is the CUDA toolkit version that needs to match the driver version. Most people achieve this by getting their drivers from the CUDA toolkit. Nsys works with any driver/CUDA combination from CUDA 8.0 onward (although we only test back to 10.0).

See Installation Guide :: Nsight Systems Documentation for specifics on CUDA driver/CTK versions.
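To make “match” concrete: cudaRuntimeGetVersion and cudaDriverGetVersion both return an integer encoded as 1000 * major + 10 * minor, so 11020 means CUDA 11.2. Below is a minimal Python sketch (the helper names are hypothetical) that decodes these integers and applies a simplified rule: the driver’s CUDA version must be at least the runtime’s. It ignores CUDA minor-version compatibility and forward-compatibility packages, so treat a False result as a prompt to check the Installation Guide rather than a final verdict.

```python
def decode_cuda_version(v: int) -> tuple[int, int]:
    """Decode the integer from cudaRuntimeGetVersion/cudaDriverGetVersion.

    CUDA encodes versions as 1000 * major + 10 * minor, e.g. 11020 -> (11, 2).
    """
    return v // 1000, (v % 1000) // 10


def driver_supports_runtime(driver_version: int, runtime_version: int) -> bool:
    """Simplified rule: the driver's CUDA version must be >= the runtime's.

    Assumption: ignores minor-version compatibility and forward-compatibility
    packages, which can relax this rule in practice.
    """
    return decode_cuda_version(driver_version) >= decode_cuda_version(runtime_version)


if __name__ == "__main__":
    # Both calls returned 11020 for one poster later in this thread.
    print(decode_cuda_version(11020))             # (11, 2)
    print(driver_supports_runtime(11020, 11020))  # True
    print(driver_supports_runtime(11020, 11030))  # False: runtime newer than driver
```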

Hi, I am having a similar issue. The error is

**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(62): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_error_text*] = Unknown API runtime activity\n[boost::errinfo_errno_*] = 406, \"Unknown error 406\"\n"
      }
    }
  }
}
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(62): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_error_text*] = Unknown API runtime activity\n[boost::errinfo_errno_*] = 406, \"Unknown error 406\"\n"
      }
    }
  }
}

I am pretty sure my driver and toolkit match; both cudaRuntimeGetVersion and cudaDriverGetVersion return 11020. I cannot use the nsys that comes with the driver because it was simply not installed with the driver (by my system administrator); running

/usr/local/cuda/bin/nsys

simply gives

Error: Nsight Systems 2020.4.3 hasn't been installed with CUDA Toolkit 11.2

I have tried installing nsys 2020.4.1 and the latest 2023.2; both give a similar error.

I am on CentOS 7 and running onnxruntime built from C++ source on an A10 GPU. The program runs normally without nsys profile.

Can someone tell me where I went wrong?

@liuyis can you take a look

Hi @yuc8939, can you please share the error you hit when using the Nsys 2023.2 release?

The error you shared is because Nsys 2020.4.1 doesn’t support CUDA 11.2, but that should not happen with Nsys 2023.2.

Hi @liuyis , 2023.2 gives

**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::wrapexcept<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown runtime API function index: 406\n"
      }
    }
  }
}
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/StringStorage.cpp(155): Throw in function QuadDCommon::StringId QuadDAnalysis::StringStorage::GetKeyForExteriorId(QuadDAnalysis::GlobalProcess, QuadDAnalysis::StringStorage::ExteriorId) const\nDynamic exception type: boost::wrapexcept<QuadDCommon::LogicException>\nstd::exception::what: LogicException\n[QuadDCommon::tag_message*] = Cannot find string for an exterior index\n"
      }
    }
  }
}

There are also several more “Cannot find string for an exterior index” errors identical to the one posted above. The error list is always one “Unknown runtime API function index: 406” followed by several “Cannot find string for an exterior index”.

Thank you. Could you share the results of nvidia-smi and nsys --version?

nvidia-smi gives

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A10                 On   | 00000000:41:00.0 Off |                    0 |
|  0%   42C    P0    42W / 150W |  18499MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  A10                 On   | 00000000:42:00.0 Off |                    0 |
|  0%   36C    P8    21W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  A10                 On   | 00000000:61:00.0 Off |                    0 |
|  0%   39C    P8    21W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  A10                 On   | 00000000:62:00.0 Off |                    0 |
|  0%   36C    P8    15W / 150W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

(FYI I was running on GPU #1)
and nsys --version

NVIDIA Nsight Systems version 2023.2.1.122-32598524v0
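When comparing reports like these across machines, it can help to turn the nsys --version string into a comparable tuple. A small Python sketch, assuming the “NVIDIA Nsight Systems version X.Y.Z.B-…” format shown above; the (2023, 2) threshold in the example is only the release suggested in this thread for CUDA 11.2, not an official minimum (see the Installation Guide for that).

```python
import re


def nsys_version(text: str) -> tuple[int, ...]:
    """Parse e.g. 'NVIDIA Nsight Systems version 2023.2.1.122-32598524v0'
    into (2023, 2, 1, 122). Raises ValueError if no version is found."""
    m = re.search(r"Nsight Systems version (\d+(?:\.\d+)*)", text)
    if m is None:
        raise ValueError("no Nsight Systems version found")
    return tuple(int(part) for part in m.group(1).split("."))


if __name__ == "__main__":
    v = nsys_version("NVIDIA Nsight Systems version 2023.2.1.122-32598524v0")
    print(v)               # (2023, 2, 1, 122)
    print(v >= (2023, 2))  # True: tuples compare element by element
```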

Thank you. This is a bit strange: according to the mapping between function indices and CUDA runtime APIs, 406 corresponds to cudaGetDriverEntryPoint, which only exists since CUDA 11.3. I’m not sure why your application would trigger this API while your system is using CUDA 11.2. (That is also why Nsys reports the error: this function index is unexpected under your driver version.)

Could you also check nvcc --version? Is there any chance that the app was built with CTK 11.3 or higher even though the driver is CUDA 11.2?

Also, is it possible for you to update the CUDA driver to 11.3 or higher?

nvcc --version gives

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
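The toolkit-versus-driver comparison can also be scripted from command-line output like the above. A rough Python sketch with hypothetical helper names, assuming the formats quoted in this thread: “release X.Y” from nvcc --version, and “CUDA Version: X.Y” from nvidia-smi, which is the highest CUDA version the installed driver supports. It only checks the nvcc on your PATH; a prebuilt library such as cuDNN can still have been built against a different toolkit.

```python
import re


def _major_minor(pattern: str, text: str) -> tuple[int, int]:
    """Return (major, minor) from the first match of pattern in text."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern {pattern!r} not found")
    return int(m.group(1)), int(m.group(2))


def nvcc_release(nvcc_output: str) -> tuple[int, int]:
    # e.g. "Cuda compilation tools, release 11.2, V11.2.152"
    return _major_minor(r"release (\d+)\.(\d+)", nvcc_output)


def smi_cuda_version(smi_output: str) -> tuple[int, int]:
    # e.g. "Driver Version: 460.91.03    CUDA Version: 11.2"
    return _major_minor(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)


def toolkit_within_driver(nvcc_output: str, smi_output: str) -> bool:
    """Simplified check: toolkit CUDA version <= highest version the driver supports."""
    return nvcc_release(nvcc_output) <= smi_cuda_version(smi_output)


if __name__ == "__main__":
    nvcc_out = "Cuda compilation tools, release 11.2, V11.2.152"
    smi_out = "NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2"
    print(toolkit_within_driver(nvcc_out, smi_out))  # True
```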

I searched my code for cudaGetDriverEntryPoint but found no match. I’ll try later to find out whether any third-party libraries might call this API.
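One rough way to look for third-party libraries that reference an API is to scan their binaries for the function name. Below is a minimal Python sketch of that heuristic (not an nsys or CUDA tool; the function name and script usage are hypothetical): it walks a directory and reports shared-library files whose bytes contain a given string. A hit only suggests that the library references cudaGetDriverEntryPoint, and a miss does not rule it out. Each file is read fully into memory, which is acceptable for a one-off check but slow for very large libraries.

```python
import os
import sys


def find_files_containing(root: str, needle: bytes, marker: str = ".so") -> list[str]:
    """Return sorted paths under root whose name contains `marker`
    (so libfoo.so and versioned libfoo.so.8 both match) and whose
    raw bytes contain `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if marker not in name:
                continue
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # skip broken symlinks and special files
                continue
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: ignore rather than abort the scan
            if needle in data:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python find_symbol.py /path/to/libs
    for p in find_files_containing(sys.argv[1], b"cudaGetDriverEntryPoint"):
        print(p)
```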
Speaking of the “mapping between the function index and CUDA runtime API”, has the mapping been documented somewhere?

Updating the CUDA driver to 11.3 is possible on my development machine, but not on the Kubernetes cluster where my application will be deployed, and I don’t want my development environment to differ from the production servers.

You can search for the cupti_runtime_cbid.h header in your CTK installation folder. For example, on my system, it’s at /usr/local/cuda-12.1/targets/x86_64-linux/include/cupti_runtime_cbid.h.
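A minimal Python sketch of that lookup, assuming the enum entries in cupti_runtime_cbid.h and cupti_driver_cbid.h have the form CUPTI_RUNTIME_TRACE_CBID_<name> = <index> (or CUPTI_DRIVER_TRACE_CBID_<name> = <index>). The SAMPLE text is an illustrative excerpt in that format, not copied from a specific CTK release; on a real system, pass in the contents of your own headers. Runtime entry names carry a _vNNNNN suffix, which appears to encode a CUDA version in the same 1000 * major + 10 * minor form (11030 would be 11.3).

```python
import re
from typing import Optional

# Matches enum entries such as
#   CUPTI_RUNTIME_TRACE_CBID_cudaDriverGetVersion_v3020 = 1,
# The trailing \b rejects hex values like FORCE_INT = 0x7fffffff.
_CBID_RE = re.compile(r"CUPTI_(RUNTIME|DRIVER)_TRACE_CBID_(\w+)\s*=\s*(\d+)\b")
_SUFFIX_RE = re.compile(r"^(\w+?)_v(\d+)$")

# Illustrative excerpt only; read your real header instead.
SAMPLE = """
    CUPTI_RUNTIME_TRACE_CBID_INVALID = 0,
    CUPTI_RUNTIME_TRACE_CBID_cudaDriverGetVersion_v3020 = 1,
    CUPTI_RUNTIME_TRACE_CBID_cudaGetDriverEntryPoint_v11030 = 406,
    CUPTI_RUNTIME_TRACE_CBID_FORCE_INT = 0x7fffffff
"""


def load_cbids(header_text: str) -> dict[tuple[str, int], str]:
    """Map ('runtime' | 'driver', index) -> entry name from header text."""
    return {(kind.lower(), int(value)): name
            for kind, name, value in _CBID_RE.findall(header_text)}


def split_version_suffix(name: str) -> tuple[str, Optional[int]]:
    """Split 'cudaGetDriverEntryPoint_v11030' into ('cudaGetDriverEntryPoint', 11030)."""
    m = _SUFFIX_RE.match(name)
    if m is None:
        return name, None
    return m.group(1), int(m.group(2))


if __name__ == "__main__":
    table = load_cbids(SAMPLE)
    print(split_version_suffix(table[("runtime", 406)]))  # ('cudaGetDriverEntryPoint', 11030)
```

For a driver index such as the 711 reported later in this thread, the same function can be applied to the contents of cupti_driver_cbid.h from the same include folder and queried with ("driver", 711).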


It turns out the cuDNN library was not built with CUDA 11.2. Downgrading cuDNN solved the problem. Thanks very much for your quick reply and detailed explanation :)


You are welcome!

Hello, I encountered a similar error:

Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::wrapexcept<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown driver API function index: 711\n"
      }
    }
  }
}
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/StringStorage.cpp(155): Throw in function QuadDCommon::StringId QuadDAnalysis::StringStorage::GetKeyForExteriorId(QuadDAnalysis::GlobalProcess, QuadDAnalysis::StringStorage::ExteriorId) const\nDynamic exception type: boost::wrapexcept<QuadDCommon::LogicException>\nstd::exception::what: LogicException\n[QuadDCommon::tag_message*] = Cannot find string for an exterior index\n"
      }
    }
  }
}
...
...

output of nvidia-smi:
NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2

output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

output of nsys --version:
NVIDIA Nsight Systems version 2023.1.1.127-32365746v0

There is indeed a mismatch between the CUDA versions reported by nvidia-smi and nvcc --version. However, if I use only one process (I run a PyTorch model and use --nproc_per_node=N to set the number of processes and GPUs to use), the error above does not occur. If I set --nproc_per_node to more than one, the error always occurs.

Could you give some hint of the cause of the error? Thanks a lot!