CUDA Error Prev


I am currently trying to get a CUDA-enabled Docker container working on my Jetson Nano (for balenaOS). I am using two different detection networks: one with darknet and one with OpenCV + cuDNN.

The detections themselves work fine, but when I first run a detection with OpenCV and then one with darknet, I get the following error:

 CUDA Error Prev: operation not supported

Does anybody know what could cause this?

- L4T 32.4.2
- OpenCV 4.5.0


Usually, “operation not supported” comes from an implementation issue.
For example, running GPU inference on a buffer that lives in CPU memory.

Could you try the following experiments first?

1. Please check whether the GPU works correctly inside the Docker container.
You can test this with the deviceQuery app in the CUDA samples folder.

2. Could you check whether your detection runs well on the GPU outside of Docker?
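
The first check can be scripted roughly as follows. This is only a minimal sketch: it assumes the CUDA samples are installed under /usr/local/cuda/samples (the path that shows up in the build log later in this thread), and the function name `run_device_query` is made up for illustration.

```shell
#!/bin/sh
# Build and run the deviceQuery sample inside the container.
# Assumption: the samples live under /usr/local/cuda/samples unless
# another directory is passed as the first argument.
run_device_query() {
    samples_dir="${1:-/usr/local/cuda/samples}"
    dq_dir="$samples_dir/1_Utilities/deviceQuery"

    # Bail out early if the sample sources are not present.
    if [ ! -d "$dq_dir" ]; then
        echo "deviceQuery sample not found in $dq_dir" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dq_dir" && make && ./deviceQuery)
}
```

Calling `run_device_query` with no arguments inside the container should end with "Result = PASS" if the GPU is visible there.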


Thanks. Something does seem to be faulty with my CUDA installation: building the sample fails.

root@7c839ba:/usr/local/cuda/samples/1_Utilities/deviceQuery# make
/usr/local/cuda-10.2/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o 
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
collect2: error: ld returned 1 exit status
Makefile:303: recipe for target 'deviceQuery' failed
make: *** [deviceQuery] Error 1

Note: this might only be relevant to deviceQuery.
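
The two linker errors above mean that ld could not find libcudadevrt.a and libcudart_static.a, the static libraries behind -lcudadevrt and -lcudart_static. A minimal sketch for checking whether they are present, assuming they normally sit in /usr/local/cuda/targets/aarch64-linux/lib (the directory mentioned further down in this thread); the function name `missing_cuda_static_libs` is hypothetical.

```shell
#!/bin/sh
# Print the name of each static library from the linker errors that is
# missing from the given directory (one per line, nothing if all exist).
# Assumption: the default directory is the aarch64 target lib folder.
missing_cuda_static_libs() {
    lib_dir="${1:-/usr/local/cuda/targets/aarch64-linux/lib}"
    for lib in libcudadevrt.a libcudart_static.a; do
        if [ ! -f "$lib_dir/$lib" ]; then
            echo "$lib"
        fi
    done
}
```

If `missing_cuda_static_libs` prints anything, the sample cannot link until those files are restored.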

My fault, I deleted some files to reduce disk space.
./deviceQuery returns:

root@7c839ba:/usr/local/cuda/samples/1_Utilities/deviceQuery# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3961 MBytes (4153769984 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

I also ran the detection using --runtime nvidia, and the detection frequency is the same.

Well, this seems to fix my problem as well.
I used "rm -rf /usr/local/cuda/targets/aarch64-linux/lib/*.a" to clear some space, and I never had the issue when using --runtime nvidia.
Thanks for the help.

Good to know this.
Thanks for the feedback.