CUDA error 100 & “Decoder not initialized” when running the decPerf sample

We set up another DeepStream development environment, and when I compile and run the decPerf sample, it logs:

[DEBUG][14:31:09] =========== Video Parameters Begin =============
[DEBUG][14:31:09]       Video codec     : AVC/H.264
[DEBUG][14:31:09]       Frame rate      : 30/1 = 30 fps
[DEBUG][14:31:09]       Sequence format : Progressive
[DEBUG][14:31:09]       Coded frame size: [1280, 720]
[DEBUG][14:31:09]       Display area    : [0, 0, 1280, 720]
[DEBUG][14:31:09]       Chroma format   : YUV 420
[DEBUG][14:31:09] =========== Video Parameters End   =============
[ERROR][14:31:09] CUDA error 100 at line 165 in file src/nvDecLite.cpp
[ERROR][14:31:09] Decoder not initialized.

The main Ubuntu 16.04 development toolchain is
TensorRT 3.0 + CUDA 9.0 + cuDNN 7.0 + Tesla P4 + Driver 384.90
and the second toolchain, installed at /home/admin/opt/TensorRT2.1, is
DeepStream 1.0 + Video Codec SDK 8.0.14 + TensorRT 2.1.2 + CUDA 8.0 + Tesla P4 + Driver 384.90
and this is the nvidia-smi output:

Tue Dec 19 14:45:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:02:00.0 Off |                    0 |
| N/A   75C    P8     9W /  75W |      0MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 710      Off  | 00000000:03:00.0 N/A |                  N/A |
| 30%   34C    P8    N/A /  N/A |    312MiB /   979MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

How can I fix this? I already ran the decPerf sample successfully in another environment two weeks ago.

Hi,

Have you installed Video_Codec_SDK_7.1.9 in your environment?
Please also remember to set the path in Makefile.sample_decPerf to the location where you installed it:

VIDEOSDK_INSTALL_PATH = /path/to/the/Video_Codec_SDK_7.1.9

Thanks.

I downloaded Video_Codec_SDK_7.1.9 from the NVIDIA download archive and unzipped it to

/home/admin/src/Video_Codec_SDK_7.1.9

then changed the Makefile.sample_decPerf content to

VIDEOSDK_INSTALL_PATH = /home/admin/src/Video_Codec_SDK_7.1.9

but the compiled target still reports “Decoder not initialized.” when running.

Hi,

This error indicates CUDA_ERROR_NO_DEVICE when calling cuvidCreateDecoder().

From the nvidia-smi log, you have two GPU cards on the machine.
Could you first check whether the decoder is being created on the correct GPU, i.e. the Tesla P4?
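For reference, here is a minimal sketch (not taken from the DeepStream source; the file name, compile command, and matching the GPU by name are assumptions) that enumerates the CUDA devices with the driver API and creates the context on the Tesla P4, which the decoder must then use. If this already fails with 100 at cuInit or cuCtxCreate, the problem is below the decoder, in the driver installation.

// check_device.cpp -- hypothetical file name; compile with, e.g.:
//   g++ check_device.cpp -I/usr/local/cuda/include -lcuda -o check_device
#include <cstdio>
#include <cstring>
#include <cuda.h>

int main() {
    CUresult r = cuInit(0);
    if (r != CUDA_SUCCESS) { printf("cuInit failed: %d\n", r); return 1; }

    int count = 0;
    cuDeviceGetCount(&count);
    CUdevice p4 = -1;
    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        char name[128] = {0};
        cuDeviceGet(&dev, i);
        cuDeviceGetName(name, sizeof(name), dev);
        printf("Device %d: %s\n", i, name);
        if (strstr(name, "Tesla P4")) p4 = dev;      // pick the P4 by name
    }
    if (p4 < 0) { printf("Tesla P4 not visible to the driver API\n"); return 1; }

    CUcontext ctx;
    r = cuCtxCreate(&ctx, 0, p4);                    // the decoder must be created in this context
    printf("cuCtxCreate on Tesla P4: %d\n", r);      // 0 = CUDA_SUCCESS
    cuCtxDestroy(ctx);
    return 0;
}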

Thanks.

I’m compiling “cuda-8.0/samples/3_Imaging/cudaDecodeGL” and got:

/usr/bin/ld: cannot find -lnvcuvid

so I’m searching for the nvcuvid library now.
Update (14:44 +08:00):
I changed one line in “cuda-8.0/samples/3_Imaging/cudaDecodeGL/Makefile”:

# Common includes and paths for CUDA
INCLUDES  := -I../../common/inc
LIBRARIES := -L/usr/lib/nvidia-384

and compiled the cudaDecodeGL sample successfully, but it still fails with error 100 at runtime:

[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = ../../bin/x86_64/linux/release/cudaDecodeGL
[cudaDecodeGL]: input file: <./data/plush1_720p_10s.m2v>
	VideoCodec      : MPEG-2
	Frame rate      : 30000/1001fps ~ 29.97fps
	Sequence format : Progressive
	Coded frame size: [1280, 720]
	Display area    : [0, 0, 1280, 720]
	Chroma format   : 4:2:0
	Bitrate         : 14116kBit/s
	Aspect ratio    : 16:9

argv[0] = ../../bin/x86_64/linux/release/cudaDecodeGL

> Device 0: <        Tesla P4 >, Compute SM 6.1 detected
reshape() glViewport(0, 0, 1280, 720)
>> initGL() creating window [1280 x 720]
> Using CUDA/GL Device [0]: Tesla P4
> Using GPU Device: Tesla P4 has SM 6.1 compute capability
  Total amount of global memory:     7606.3750 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file:              NV12ToARGB_drvapi64.ptx >
   CUDA Kernel Function (0x021c4c60) = <   NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file:              NV12ToARGB_drvapi64.ptx >
   CUDA Kernel Function (0x021c86d0) = <     Passthru_drvapi >
cuvidCtxLockCreate failed: 100
cudaDecodeGL: videoDecodeGL.cpp:1050: void initCudaVideo(): Assertion `0' failed.
Aborted (core dumped)

I first ran cudaDecodeGL through a VNC viewer, but even after connecting a monitor to the GeForce GT 710, the error persists, with or without the device option:

argv[0] = ../../bin/x86_64/linux/release/cudaDecodeGL
argv[1] = device=1

gpuDeviceInitDRV() Using CUDA Device [1]: GeForce GT 710
> Device 0: <        Tesla P4 >, Compute SM 6.1 detected
reshape() glViewport(0, 0, 1280, 720)
>> initGL() creating window [1280 x 720]
gpuDeviceInitDRV() Using CUDA Device [1]: GeForce GT 710
> Using GPU Device: GeForce GT 710 has SM 3.5 compute capability
  Total amount of global memory:     979.8125 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file:              NV12ToARGB_drvapi64.ptx >
   CUDA Kernel Function (0x019e7dd0) = <   NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file:              NV12ToARGB_drvapi64.ptx >
   CUDA Kernel Function (0x019f0cc0) = <     Passthru_drvapi >
cuvidCtxLockCreate failed: 100
cudaDecodeGL: videoDecodeGL.cpp:1050: void initCudaVideo(): Assertion `0' failed.
Aborted (core dumped)

I am now looking into the gpuDeviceInitDRV() call; to be continued.
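A minimal standalone check like the following (not part of the SDK samples; the file name, include path, and build flags are assumptions for this setup) can show whether cuvidCtxLockCreate() returns 100 even outside cudaDecodeGL, which would point at the driver/nvcuvid installation rather than at the sample code:

// lockcheck.cpp -- hypothetical file name; build flags are a guess for this setup:
//   g++ lockcheck.cpp -I/usr/local/cuda/include -I<Video_Codec_SDK>/Samples/common/inc \
//       -L/usr/lib/nvidia-384 -lnvcuvid -lcuda -o lockcheck
#include <cstdio>
#include <cuda.h>
#include <nvcuvid.h>

int main() {
    CUresult r = cuInit(0);
    printf("cuInit            : %d\n", r);

    CUdevice dev;
    r = cuDeviceGet(&dev, 0);            // device 0 = Tesla P4 on this machine
    printf("cuDeviceGet(0)    : %d\n", r);

    CUcontext ctx = NULL;
    r = cuCtxCreate(&ctx, 0, dev);
    printf("cuCtxCreate       : %d\n", r);

    CUvideoctxlock lock = NULL;
    r = cuvidCtxLockCreate(&lock, ctx);  // a 100 here points at the driver/nvcuvid stack
    printf("cuvidCtxLockCreate: %d\n", r);

    if (lock) cuvidCtxLockDestroy(lock);
    if (ctx)  cuCtxDestroy(ctx);
    return 0;
}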

Hi,

Could you try driver 375 rather than driver 384?
Thanks.

(Left blank; the installation with the .run file succeeded.)

Hi AastaLLL, I couldn’t downgrade the driver via deb or PPA, so I reinstalled Ubuntu and installed driver 375.66 and CUDA 8.0 from the .run files. The “cuvidCtxLockCreate failed: 100” problem has disappeared and everything works well:

hello@admin-PT6630GC:~/src/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL$ ../../bin/x86_64/linux/release/cudaDecodeGL 
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = ../../bin/x86_64/linux/release/cudaDecodeGL
[cudaDecodeGL]: input file: <./data/plush1_720p_10s.m2v>
	VideoCodec      : MPEG-2
	Frame rate      : 30000/1001fps ~ 29.97fps
	Sequence format : Progressive
	Coded frame size: [1280, 720]
	Display area    : [0, 0, 1280, 720]
	Chroma format   : 4:2:0
	Bitrate         : 14116kBit/s
	Aspect ratio    : 16:9

argv[0] = ../../bin/x86_64/linux/release/cudaDecodeGL

> Device 0: <        Tesla P4 >, Compute SM 6.1 detected
reshape() glViewport(0, 0, 1280, 720)
>> initGL() creating window [1280 x 720]
> Using CUDA/GL Device [0]: Tesla P4
> Using GPU Device: Tesla P4 has SM 6.1 compute capability
  Total amount of global memory:     7606.3750 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file:              NV12ToARGB_drvapi64.ptx >
   CUDA Kernel Function (0x0126b5d0) = <   NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file:              NV12ToARGB_drvapi64.ptx >
   CUDA Kernel Function (0x0126ec10) = <     Passthru_drvapi >
  Free memory:     7491.3750 MB
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder

setTextureFilterMode(GL_NEAREST,GL_NEAREST)
ImageGL::CUcontext = 00b646e0
ImageGL::CUdevice  = 00000000
reshape() glViewport(0, 0, 1280, 720)
[cudaDecodeGL] - [Frame: 0016, 00.0 fps, frame time: 94619467776.00 (ms) ]
[cudaDecodeGL] - [Frame: 0032, 305.9 fps, frame time: 3.27 (ms) ]
[cudaDecodeGL] - [Frame: 0048, 16.0 fps, frame time: 62.48 (ms) ]
[cudaDecodeGL] - [Frame: 0064, 360.3 fps, frame time: 2.78 (ms) ]
[cudaDecodeGL] - [Frame: 0080, 16.0 fps, frame time: 62.37 (ms) ]
[cudaDecodeGL] - [Frame: 0096, 370.8 fps, frame time: 2.70 (ms) ]
[cudaDecodeGL] - [Frame: 0112, 15.9 fps, frame time: 63.02 (ms) ]
[cudaDecodeGL] - [Frame: 0128, 266.1 fps, frame time: 3.76 (ms) ]
[cudaDecodeGL] - [Frame: 0144, 16.2 fps, frame time: 61.83 (ms) ]
[cudaDecodeGL] - [Frame: 0160, 341.3 fps, frame time: 2.93 (ms) ]
[cudaDecodeGL] - [Frame: 0176, 280.3 fps, frame time: 3.57 (ms) ]
[cudaDecodeGL] - [Frame: 0192, 16.0 fps, frame time: 62.61 (ms) ]
[cudaDecodeGL] - [Frame: 0208, 308.8 fps, frame time: 3.24 (ms) ]
[cudaDecodeGL] - [Frame: 0224, 16.1 fps, frame time: 62.29 (ms) ]
[cudaDecodeGL] - [Frame: 0240, 512.3 fps, frame time: 1.95 (ms) ]
[cudaDecodeGL] - [Frame: 0256, 435.0 fps, frame time: 2.30 (ms) ]
[cudaDecodeGL] - [Frame: 0272, 15.7 fps, frame time: 63.49 (ms) ]
[cudaDecodeGL] - [Frame: 0288, 451.9 fps, frame time: 2.21 (ms) ]
[cudaDecodeGL] - [Frame: 0304, 220.3 fps, frame time: 4.54 (ms) ]
[cudaDecodeGL] - [Frame: 0320, 354.1 fps, frame time: 2.82 (ms) ]

[cudaDecodeGL] statistics
	 Video Length (hh:mm:ss.msec)   = 00:00:07.779
	 Frames Presented (inc repeats) = 329
	 Average Present Rate     (fps) = 42.29
	 Frames Decoded   (hardware)    = 329

except that I can’t see the video content even though the window pops up, and even when I set the device=1 option. I will open another topic if it turns out to be a real problem. I have heard that driver 375.66, or maybe some older release, has a memory-leak problem, so after I finish several critical tasks I will update 375.66 via a .run file to a newer release within the same major version to avoid it.
I’ll go back to verifying the DeepStream environment soon.

Hi AastaLLL, I just created an environment exactly matching “2.1 SYSTEM REQUIREMENTS” in “Chapter 2. INSTALLATION” of the <DEEPSTREAM SDK DU-08633-001_v03.1 | June 2017 User Guide>, and the DeepStream decPerf sample runs well.

[DEBUG][14:45:51] Device name: Tesla P4
[DEBUG][14:45:52] Video [0]:  Decode Performance: 427.82 frames/second || Decoded Frames: 500

I will try to freeze this environment for further development. Thanks.

Hi,

Thanks for your feedback.
After aligning the environment with 2.1 SYSTEM REQUIREMENTS, are you able to launch the DeepStream sample successfully?

Thanks.

Hi, yes.
We installed the matching packages from .run or tar files, and the decPerf sample runs successfully. Soon we will run the nvDecInfer_detection sample, since it is the full-featured one to use as a reference.
I will let you know when we have results. Thanks.

We launched the DeepStream nvDecInfer_detection sample successfully; we can view the video with bounding boxes when we set the following:

DISPLAY_GPU=1 # GeForce GT 710 connected with monitor
INFER_GPU=0 # Tesla P4
-gui=1

and we got an “Analysis Pipeline Performance” average of around 100 fps (the sample’s maximum analysis speed is paced for playback with a sleep_for call, so please don’t take this result as the Tesla P4’s throughput!).
Thanks.

Thanks for the feedback.
Happy Holiday : )