CUDA error running YOLO-TensorRT-GIE- and ZED

Hi,
I’m working on a TX2 and I’m building an application using a ZED camera and the YOLO port for TensorRT (https://github.com/TLESORT/YOLO-TensorRT-GIE-), but I’m having lots of CUDA errors at runtime.
I got the following errors:

ZED (Init) >> Video mode: HD720@30
[TensorRtEngine] loading yolo_small.prototxt yolo_small.caffemodel
in sl::ERROR_CODE sl::Mat::setFrom(const sl::Mat&, sl::COPY_TYPE) : cuda error [4]: unspecified launch failure.
in sl::ERROR_CODE sl::Mat::updateGPUfromCPU() : cuda error [4]: unspecified launch failure.
cudnnFullyConnectedLayer.cpp (282) - Cuda Error in rowMajorMultiply: 13
[TensorRtEngine] Cuda failure: 4

Sometimes I also encounter the error “cudnnConvolutionLayer.cpp (213) - Cuda Error in execute: 7”

The problem seems to be that the YOLO computation runs in one thread while the frame grabbing runs in a different thread.

Has anyone encountered this problem?

Thank you!

Hi,

Could you share more information about your environment, including the JetPack version, CUDA version, cuDNN version, and TensorRT version?

Thanks.

Thank you, AastaLLL!
I’m using JetPack 3.1, so CUDA 8 and cuDNN 6. I can’t find the TensorRT version, but it’s the one that comes with the JetPack.

Thank you!

Hi,

Could you try TensorRT 3.0 from JetPack 3.2:
https://devtalk.nvidia.com/default/topic/1027301/jetson-tx2/jetpack-3-2-mdash-l4t-r28-2-developer-preview-for-jetson-tx2/

Thanks.

Hi,
At the moment I can’t try JetPack 3.2; I will try it as soon as possible.
Thanks

Hi, I hope to test JetPack 3.2 in a few days…
In the meantime, I’m having trouble configuring Eclipse on the TX2 to build (and to get autocompletion working) with the PCL library, OpenCV 3, and the ZED SDK… Does anyone develop directly on the TX2? If so, which IDE do you use?

Thank you!

Hi,
I have installed JetPack 3.2. I’m having problems with the ZED SDK: the latest version should support CUDA 9, but the installer says the SDK is compatible with CUDA 8 only.
I installed it anyway, but I can’t build any of the SDK samples due to the CUDA 8 requirement.

Can someone help me?

Thanks!

Hi,

We don’t have much experience with IDEs on the TX2.

If cross-compiling is acceptable, you can set up Nsight on a desktop to get an IDE environment.

If you prefer to use an IDE directly on the TX2, check this topic for more information:
https://devtalk.nvidia.com/default/topic/1016158/recommended-ide-for-jetson-tx2-/

For the ZED SDK, please contact the developer about CUDA 9.0 support.

Thanks.

Thank you, AastaLLL!
I have sent an email to Stereolabs support… I’ll keep this thread updated!

I have installed the ZED SDK 2.3.1 (beta) as Stereolabs suggested: it works with JetPack 3.2.
However, I’m still getting CUDA errors at runtime; with this configuration I get the following error:

cudnnActivationLayer.cpp (93) - Cuda Error in execute: 8

Any idea?

Thank you!

Hi,

Could you run the native TensorRT sample with your model first?
This check can help narrow down whether the issue is in TensorRT or in your application.

cp -r /usr/src/tensorrt/ .
cd tensorrt/samples/
make
cd ../bin/
./giexec --deploy=/path/to/prototxt --output=/name/of/output

Thanks.

Hi,
the output of the command you suggested is the following:

nvidia@tegra-ubuntu:~/tensorrt/bin$ ./giexec --deploy=/home/nvidia/Desktop/tensorrt-test/yolo_small_modified.prototxt --output=result
deploy: /home/nvidia/Desktop/tensorrt-test/yolo_small_modified.prototxt
output: result
Input "data": 3x448x448
Output "result": 1470x1x1
name=data, bindingIndex=0, buffers.size()=2
name=result, bindingIndex=1, buffers.size()=2
Average over 10 runs is 97.824 ms.
Average over 10 runs is 98.756 ms.
Average over 10 runs is 98.4284 ms.
Average over 10 runs is 97.9406 ms.
Average over 10 runs is 98.6445 ms.
Average over 10 runs is 98.3895 ms.
Average over 10 runs is 97.8623 ms.
Average over 10 runs is 98.3947 ms.
Average over 10 runs is 98.4803 ms.
Average over 10 runs is 97.7743 ms.

Is this OK?

I sent an email to Stereolabs support explaining what is happening, and this is their answer:

The issue probably comes from having two different CUDA contexts in a single application.
The solution is to initialize the ZED with the CUDA context from YOLO, or to initialize YOLO with the CUDA context created by the ZED.
You can set the ZED SDK context using InitParameters.sdk_cuda_ctx, or let it create one and get it with getCUDAContext().

I’m trying to do that, but the type of the context used by the ZED SDK is a CUcontext (which is a struct CUctx_st*). In my TensorRT code, the only ‘context’ I use is the IExecutionContext.
Can it be converted to a CUcontext (if that is even possible)?
Is there a way to try what Stereolabs support suggests?
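In case it helps anyone reading later: an IExecutionContext is a TensorRT object, not a CUDA context, so it cannot be converted to a CUcontext. TensorRT (through the CUDA runtime) simply uses whatever CUDA context is current on the calling thread. A minimal, untested sketch of what Stereolabs seems to suggest, assuming the ZED SDK 2.x getCUDAContext() call and the TensorRT 3 execute(batchSize, bindings) call (the function and variable names here are my own):

```cpp
#include <cuda.h>          // CUDA driver API: CUcontext, cuCtxSetCurrent
#include <sl/Camera.hpp>   // ZED SDK 2.x
#include <NvInfer.h>       // TensorRT

// Hypothetical inference thread: make the ZED's CUDA context current on
// this thread before any TensorRT / CUDA runtime call, so both libraries
// share a single context instead of each using their own.
void inferenceThread(sl::Camera &zed, nvinfer1::IExecutionContext *trt,
                     void **buffers, int batchSize)
{
    CUcontext zedCtx = zed.getCUDAContext(); // context created by the ZED SDK
    cuCtxSetCurrent(zedCtx);                 // bind it to this thread

    // ... copy the grabbed frame into the input binding here ...
    trt->execute(batchSize, buffers);        // inference now runs in zedCtx
}
```

The other direction (creating a context yourself and passing it to the ZED through InitParameters.sdk_cuda_ctx, as Stereolabs mentioned) should work the same way, as long as the TensorRT thread makes that same context current.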

Thank you!

Hi,

Very interesting use case.

Could you share sample code with us that reproduces it without a ZED camera?
For example, by creating a dummy CUcontext and feeding it into TensorRT.

This will help us check if there is an alternative for this issue.
Thanks.

Hi AastaLLL,
I’m not sure I understand…
I can share sample code using the ZED SDK and YOLO_with_TensorRT (but I need to remove some parts).
Is that OK, or do you need something else?

Thanks

Hi,

We want to reproduce this issue internally, but we don’t have a ZED camera.
Can this error be reproduced without one?

Thanks.

Hi,
I don’t know how to reproduce this error without a ZED camera.
Maybe you could create two threads, one running a CUDA application and the other running TensorRT inference.
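A minimal sketch of that two-thread reproduction, with no ZED hardware involved: one thread creates its own driver-API context (the way the ZED SDK does internally) and does CUDA work in it, while the other thread would run TensorRT against the runtime’s default context. All names here are my own, the TensorRT call is left as a comment, and this is untested:

```cpp
#include <cuda.h>     // CUDA driver API
#include <thread>

// Thread A: simulates the camera side with its own CUcontext,
// analogous to the context the ZED SDK creates internally.
void cameraLikeThread(CUdevice dev)
{
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);           // dummy "camera" context
    CUdeviceptr buf;
    cuMemAlloc(&buf, 1280 * 720 * 4);    // pretend frame buffer
    // ... loop: cuMemcpyHtoD / kernel launches, like frame grabbing ...
    cuMemFree(buf);
    cuCtxDestroy(ctx);
}

// Thread B: would run TensorRT inference. Without cuCtxSetCurrent it uses
// the CUDA runtime's primary context, i.e. a *different* context than
// thread A — which is the situation suspected of causing the errors.
void inferenceThread(/* nvinfer1::IExecutionContext *trt */)
{
    // trt->execute(batchSize, buffers);
}

int main()
{
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    std::thread a(cameraLikeThread, dev);
    std::thread b(inferenceThread);
    a.join();
    b.join();
    return 0;
}
```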
Thanks

Hi,

Sorry, we have no experience with the ZED SDK.
Could you provide a sample for us? (Perhaps the ZED team can help?)

Thanks.