CUDA Runtime API error for CUDA Graph and OpenCV

Hello there,

I created a CUDA graph using the runtime API by recording the operations; for now it’s just a simple loop of resize operations from OpenCV.

I then try to inspect the graph with a simple operation, calling:

auto getKernelParams(cudaGraphNode_t &n) -> cudaKernelNodeParams {
    cudaKernelNodeParams par;
    cudaGraphKernelNodeGetParams(n, &par);
    return par;
}

but the call to cudaGraphKernelNodeGetParams returns the error “cudaErrorInvalidDeviceFunction”.

What are the possible reasons for this kind of error?
CUDA is installed using SDK Manager: Ubuntu 18.04, JetPack 4.6, and CUDA 10.2.



You can find the detailed error information in our document below:
cudaErrorInvalidDeviceFunction = 98
The requested device function does not exist or is not compiled for the proper device architecture.

The error indicates your CUDA app has not been compiled for the NX architecture.
Please make sure you have added this to the nvcc configuration.

For example:

$ nvcc -gencode arch=compute_72,code=sm_72


Thanks @AastaLLL for the answer;

The problem is that I don’t compile anything with nvcc; moreover, OpenCV is installed using your JEP script (so the architecture is correct). The error comes directly from the call to the API.

I’m using CMakeLists to find CUDA and the linkage is correct. Also, according to the documentation, the set/get functions for kernel parameters are the only two functions that can actually return that specific error.

So I’m curious about other reasons for the error apart from the wrong device architecture.


You can set GPU architecture in CMakeLists as well.
Here is an example for your reference:

Please note that the GPU architecture of Xavier NX is sm_72.
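A minimal CMakeLists.txt fragment showing how the sm_72 gencode flags could be passed to nvcc; the project and file names are placeholders, not taken from the attached sample:

```cmake
cmake_minimum_required(VERSION 3.10)
project(sample LANGUAGES CXX CUDA)

# Xavier NX is sm_72: forward the gencode flags to nvcc so device
# code is compiled for the correct architecture.
set(CMAKE_CUDA_FLAGS
    "${CMAKE_CUDA_FLAGS} -gencode arch=compute_72,code=sm_72")

add_executable(sample sample.cu)
```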

Hi @AastaLLL,
I implemented your suggestion but it didn’t change anything. I made a small sample to reproduce the “error”; it might help to understand better what is going on. I tried to stay close to the structure of the main project, but there are some differences in the code since we are using C++20 and GCC 11.x.

The sample compiles with GCC 7.5, and the build commands are in the bash script. (6.3 KB)


Thanks for sharing the source with us.

We are going to check it internally.
Will get back to you later.


Thanks for your patience.

We tried to reproduce this issue in a JetPack 4.6.1 environment,
but compilation fails due to a missing OpenCV header, as below:

In file included from /home/nvidia/topic_215408/sample/SampleLib/processor.hpp:4:0,
                 from /home/nvidia/topic_215408/sample/exec/sample.cpp:1:
/home/nvidia/topic_215408/sample/SampleLib/image_processing.hpp:4:10: fatal error: opencv2/cudawarping.hpp: No such file or directory
 #include <opencv2/cudawarping.hpp>
compilation terminated.

Is this reproducible with the default OpenCV package?
If not, could you share which OpenCV version you use?



The OpenCV package is built from source using your (I guess) JEP script. The version is 4.5.4 and it is built with CUDA support.

IIRC there might be some differences, like not building for Python and/or using ccache; I’ll attach the generated makefile for comparison if needed.
makefile_text.txt (332.0 KB)


Here are some updates for you:

We can reproduce this issue internally.
It looks like the issue is related to CUDA Graph, but we need more time to investigate.
We will give you an update once we have further information.


Thanks @AastaLLL

I’ll wait for the update.


Thanks for your patience.
This issue is related to the CUDA runtime API.

If you create the CUDA graph inside a dynamic library and try to introspect it outside the library, the query functions may fail because the nodes reference CUDA C++ symbols that belong to a different runtime instance and are not in the local runtime’s symbol map.

The current workaround is to use the driver APIs instead.
We can run your sample without issue after switching to the driver APIs.

Here is our change for your reference: driverAPIs.patch (3.4 KB)
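For readers without the patch, here is a hedged sketch of what the driver-API query looks like. It is not the attached patch; it only illustrates the idea. The driver-API equivalent of the failing call is cuGraphKernelNodeGetParams, which works on a CUgraphNode handle (interchangeable with cudaGraphNode_t) and does not depend on the local runtime’s symbol map:

```cpp
#include <cuda.h>    // CUDA driver API
#include <cstdio>

// Query kernel-node parameters through the driver API instead of
// cudaGraphKernelNodeGetParams.  The node handle recorded via the
// runtime API can be passed to the driver API directly.
auto getKernelParams(CUgraphNode node) -> CUDA_KERNEL_NODE_PARAMS {
    CUDA_KERNEL_NODE_PARAMS params{};
    CUresult res = cuGraphKernelNodeGetParams(node, &params);
    if (res != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(res, &msg);
        std::fprintf(stderr, "cuGraphKernelNodeGetParams failed: %s\n",
                     msg ? msg : "unknown error");
    }
    // params.func is a CUfunction handle, which stays valid across
    // shared-library boundaries.
    return params;
}
```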


Thanks to you.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.