CUDA Error Running nvDecInfer_detection

pramod.babu · May 20, 2018, 4:11pm

All,

I am trying to install and run the DeepStream SDK on a GeForce GTX 1080 Ti GPU.
The decPerf and nvDecInfer_classification sample applications run well, but when I try to run
nvDecInfer_detection, I get the error below. Am I missing anything?

[DEBUG][21:34:52] Device ID for display [0]: GeForce GTX 1080 Ti
[DEBUG][21:34:52] Device ID for inference [0]: GeForce GTX 1080 Ti
[DEBUG][21:34:52] Video channels: 4
[ERROR][21:34:52] Warning: No mean files.
[DEBUG][21:34:52] GUI enabled.
[DEBUG][21:34:52] Endless Loop: 0
[DEBUG][21:34:52] Device name: GeForce GTX 1080 Ti
[DEBUG][21:34:53] Use INT8 data type.
Cuda error in file src/implicit_gemm.cu at line 1214: invalid configuration argument
customWinogradConvActLayer.cpp (301) - Cuda Error in execute: 9
customWinogradConvActLayer.cpp (301) - Cuda Error in execute: 9
sample_detection: src/nvInferLite.cpp:444: void NvInferLite::caffeToTensorRTModel(const char*, const char*, nvcaffeparser1::ICaffeParser*, size_t): Assertion `pEngine_' failed.
./run.sh: line 43: 28496 Aborted (core dumped) …/bin/sample_detection -devID_display=${DISPLAY_GPU} -devID_infer=${INFER_GPU} -nChannels=${CHANNELS} -fileList=${FILE_LIST} -deployFile=${DEPLOY} -modelFile=${MODEL} -labelFile=${LABEL} -int8=1 -calibrationTableFile=${CALIBRATION} -tileWidth=${TILE_WIDTH} -tileHeight=${TILE_HEIGHT} -tilesInRow=${TILES_IN_ROW} -fullscreen=0 -gui=1 -endlessLoop=0

AastaLLL · May 21, 2018, 3:23am

Hi,

CUDA error 9 indicates invalid configuration:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038

cudaErrorInvalidConfiguration = 9
This indicates that a kernel launch is requesting resources that can never be satisfied by the 
current device. Requesting more shared memory per block than the device supports will trigger this 
error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.
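One way to see the limits a kernel launch is checked against is the deviceQuery CUDA sample, which prints the maximum threads per block, the maximum block and grid dimensions, and the shared memory per block. A minimal sketch, assuming the CUDA 9.0 samples have been built; DEVICE_QUERY and show_launch_limits are hypothetical names, and the grep patterns assume deviceQuery's usual output wording, which may differ between releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the device limits relevant to cudaErrorInvalidConfiguration.
# DEVICE_QUERY is an assumption; point it at your built deviceQuery sample
# (samples/1_Utilities/deviceQuery in the CUDA 9.0 samples).
DEVICE_QUERY=${DEVICE_QUERY:-./deviceQuery}

show_launch_limits() {
  if [ ! -x "$DEVICE_QUERY" ]; then
    echo "deviceQuery not found or not executable: $DEVICE_QUERY" >&2
    return 1
  fi
  # Keep only the per-block thread, dimension and shared-memory limits.
  "$DEVICE_QUERY" | grep -E 'threads per block|dimension size|shared memory per block'
}
```

Usage: `DEVICE_QUERY=/path/to/deviceQuery show_launch_limits` after sourcing the snippet. The launch configuration that fails inside TensorRT is not visible from outside, so this only shows which limits such a launch would have to respect.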

Could you monitor the GPU memory usage with nvidia-smi?
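Besides running `watch -n 1 nvidia-smi`, nvidia-smi's query mode can log memory use once per second while the sample starts. A minimal sketch; watch_gpu_memory is a hypothetical name, and the query fields used are standard `--query-gpu` fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print GPU memory use as CSV once per second (Ctrl+C to stop).
watch_gpu_memory() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  # -l 1 repeats the query every second until interrupted.
  nvidia-smi --query-gpu=timestamp,name,memory.used,memory.total --format=csv -l 1
}
```

Start it in one terminal and launch run.sh in another; the memory.used column then shows how much the sample allocates before it aborts.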

Thanks.

pramod.babu · May 21, 2018, 3:27am

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1268      G   /usr/lib/xorg/Xorg                            59MiB |
+-----------------------------------------------------------------------------+

pramod.babu · May 21, 2018, 3:29am

Hi,

This is the usage while the process starts up, just before it crashes with the assertion.
Every 1.0s: nvidia-smi Mon May 21 08:58:41 2018

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1268      G   /usr/lib/xorg/Xorg                            59MiB |
|    0     30513      C   …/bin/sample_detection                       225MiB |
+-----------------------------------------------------------------------------+

Thanks

pramod.babu · May 22, 2018, 1:19am

Hi,

Removing -int8=1 from run.sh bypasses this error. Does this give any clue? For reference, this is the original command in run.sh (with -int8=1):

…/bin/sample_detection -devID_display=${DISPLAY_GPU} \
    -devID_infer=${INFER_GPU} \
    -nChannels=${CHANNELS} \
    -fileList=${FILE_LIST} \
    -deployFile=${DEPLOY} \
    -modelFile=${MODEL} \
    -labelFile=${LABEL} \
    -int8=1 \
    -calibrationTableFile=${CALIBRATION} \
    -tileWidth=${TILE_WIDTH} \
    -tileHeight=${TILE_HEIGHT} \
    -tilesInRow=${TILES_IN_ROW} \
    -fullscreen=0 \
    -gui=1 \
    -endlessLoop=0
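Since the crash only appears with INT8 enabled, one option is to make that flag switchable in run.sh instead of editing the command each time. A minimal sketch, assuming bash and that the sample does not depend on flag order; USE_INT8, DETECTION_ARGS and build_detection_args are hypothetical names, the other variables are the ones run.sh already uses, and only -int8=1 is toggled, matching the experiment described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill DETECTION_ARGS with the sample_detection flags,
# adding -int8=1 only when USE_INT8=1 (INT8 is the mode that crashed here).
build_detection_args() {
  DETECTION_ARGS=(
    "-devID_display=${DISPLAY_GPU}"
    "-devID_infer=${INFER_GPU}"
    "-nChannels=${CHANNELS}"
    "-fileList=${FILE_LIST}"
    "-deployFile=${DEPLOY}"
    "-modelFile=${MODEL}"
    "-labelFile=${LABEL}"
    "-calibrationTableFile=${CALIBRATION}"
    "-tileWidth=${TILE_WIDTH}"
    "-tileHeight=${TILE_HEIGHT}"
    "-tilesInRow=${TILES_IN_ROW}"
    -fullscreen=0
    -gui=1
    -endlessLoop=0
  )
  if [ "${USE_INT8:-0}" = "1" ]; then
    DETECTION_ARGS+=(-int8=1)
  fi
}
```

In run.sh this would be followed by a call to build_detection_args and then the sample binary (path as already used in run.sh) invoked with `"${DETECTION_ARGS[@]}"`. Using an array keeps each flag a single argument even if a path contains spaces.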

Thanks

AastaLLL · May 24, 2018, 6:25am

Hi,

Could you share your CUDA/cuDNN/TensorRT/DeepStream versions with us?

For DeepStream-1.5, please remember to install the recommended packages for compatibility.

2.1 >> SYSTEM REQUIREMENTS
► Ubuntu 16.04 LTS (with GCC 5.4)
► NVIDIA Display Driver R384
► NVIDIA VideoSDK 8.0
► NVIDIA CUDA® 9.0
► cuDNN 7 & TensorRT 3.0
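A quick way to collect these versions for comparison with the requirements list is a small report function. A minimal sketch, assuming default DEB install locations (for example /usr/include/cudnn.h for cuDNN 7) and TensorRT installed from .deb packages; report_versions is a hypothetical name, the DeepStream and Video SDK versions are not probed, and each section prints a note instead of failing when a tool or file is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the component versions DeepStream 1.5 depends on.
# Each section falls back to a short note when the tool or file is missing.
report_versions() {
  echo "== OS =="
  if [ -r /etc/lsb-release ]; then grep DISTRIB_DESCRIPTION /etc/lsb-release; else echo "/etc/lsb-release not found"; fi

  echo "== GCC =="
  if command -v gcc >/dev/null 2>&1; then gcc --version | head -n 1; else echo "gcc not found"; fi

  echo "== Display driver =="
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader
  else
    echo "nvidia-smi not found"
  fi

  echo "== CUDA toolkit =="
  # nvcc is often only on PATH after adding /usr/local/cuda/bin.
  if command -v nvcc >/dev/null 2>&1; then nvcc --version | grep release; else echo "nvcc not found"; fi

  echo "== cuDNN =="
  # cuDNN 7 keeps its version macros in cudnn.h (default DEB location assumed).
  if [ -r /usr/include/cudnn.h ]; then
    grep -E '^#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) ' /usr/include/cudnn.h
  else
    echo "/usr/include/cudnn.h not found"
  fi

  echo "== TensorRT packages =="
  if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -E 'tensorrt|libnvinfer' || echo "no TensorRT packages installed"
  else
    echo "dpkg not found"
  fi
}
```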

Thanks.

pramod.babu · May 24, 2018, 6:32am

Hi,

Ubuntu: DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"

I also tried TensorRT 3.0.4 and still get the same error.

Thanks

AastaLLL · May 25, 2018, 8:16am

Hi,

Could you try a CUDA sample to help us narrow down the issue?
Thanks.

pramod.babu · May 28, 2018, 1:07am

Hi,

Is there a specific sample program whose output you are looking for?

Thanks
Pramod

AastaLLL · May 28, 2018, 7:42am

Hi,

Could you test a sample that launches a CUDA kernel?
For example, vectorAdd, located at 'samples/0_Simple/vectorAdd'.
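Building and running the sample can be wrapped as follows. A minimal sketch; run_vector_add is a hypothetical name, and the default SAMPLES_DIR is an assumption based on where cuda-install-samples-9.0.sh copies the samples when given $HOME as the target, so adjust it to your own copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build and run the vectorAdd sample.
# SAMPLES_DIR is an assumption; override it if your samples live elsewhere.
SAMPLES_DIR=${SAMPLES_DIR:-$HOME/NVIDIA_CUDA-9.0_Samples}

run_vector_add() {
  local dir="$SAMPLES_DIR/0_Simple/vectorAdd"
  if [ ! -d "$dir" ]; then
    echo "vectorAdd sample not found in $dir" >&2
    return 1
  fi
  # Build in a subshell so the caller's working directory is unchanged;
  # the sample prints "Test PASSED" when the kernel result is correct.
  (cd "$dir" && make && ./vectorAdd)
}
```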

Thanks.

pramod.babu · May 28, 2018, 7:46am

CUDA sample outputs:

./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

./vectorAddDrv

Using Device 0: "GeForce GTX 1080 Ti" with Compute 6.1 capability
findModulePath found file at <./vectorAdd_kernel64.ptx>
initCUDA loading module: <./vectorAdd_kernel64.ptx>
PTX JIT log:

Result = PASS

AastaLLL · May 30, 2018, 8:06am

Hi,

Based on the log, the CUDA toolkit and display driver work correctly in your environment.
Could you test the cuDNN functionality further?

cp -r /usr/src/cudnn_samples_v7/ .
cd cudnn_samples_v7/mnistCUDNN/
make
./mnistCUDNN

Thanks.

pramod.babu · May 31, 2018, 2:13am

./mnistCUDNN
cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : 7005 (7.0.5)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28 Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11171, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm …
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.027648 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.036864 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.036864 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.070240 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.096064 time requiring 203008 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm …
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.020480 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.023552 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.036864 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.068576 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.094080 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

AastaLLL · June 4, 2018, 8:19am

Hi,

It looks like the error comes from the TensorRT package.
Could you try reinstalling TensorRT from our website?
https://developer.nvidia.com/tensorrt

For compatibility with CUDA 9.0, please remember to choose the TensorRT 3.0.4 DEB local repo package for Ubuntu 16.04 and CUDA 9.0.
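The reinstall can be sketched as a short sequence of package commands. This is an assumption-laden outline, not NVIDIA's official procedure: reinstall_tensorrt is a hypothetical name, its argument is whatever repo .deb you downloaded, and depending on the release the TensorRT installation guide may ask for extra steps (such as adding the repository key), so follow the guide wherever it differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reinstall TensorRT from a downloaded DEB local repo package.
# $1 is the path to the TensorRT 3.0.4 repo .deb for Ubuntu 16.04 + CUDA 9.0.
reinstall_tensorrt() {
  local deb="${1:-}"
  if [ ! -f "$deb" ]; then
    echo "usage: reinstall_tensorrt /path/to/tensorrt-local-repo.deb" >&2
    return 1
  fi
  # Register the local repository, refresh the index, install TensorRT,
  # then list the TensorRT-related packages to confirm version 3.0.4.
  sudo dpkg -i "$deb" &&
    sudo apt-get update &&
    sudo apt-get install -y tensorrt &&
    dpkg -l | grep -E 'tensorrt|libnvinfer'
}
```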
Thanks and please let us know the results.
