NVCaffe 0.17 crashes when used in two plugins in the same pipeline

frederickk · January 29, 2019, 2:42am

When used in only one plugin, NVCaffe 0.17 works fine.
When I use it in two plugins in the same pipeline, I get:

F0129 02:34:36.384413 15692 cudnn_conv_layer.cu:55] Check failed: error == cudaSuccess (4 vs. 0) unspecified launch failure

The last line of the excerpt below is line 55 of cudnn_conv_layer.cu:

```cpp
} else {
  // "old" path
  for (int i = 0; i < bottom.size(); ++i) {
    const Ftype* bottom_data = bottom[i]->gpu_data();
    Ftype* top_data = top[i]->mutable_gpu_data();
    // Forward through cuDNN in parallel over groups.
    const size_t gsize = ws->size() / ws_groups();
    CHECK(is_even(gsize));
    for (int g = 0; g < groups(); ++g) {
      void* pspace = static_cast<unsigned char*>(ws->data()) + gsize * idxg(g);
      // Filters.
      CUDNN_CHECK(cudnnConvolutionForward(Caffe::cudnn_handle(idxg(g)),
          cudnn::dataType<Ftype>::one, fwd_bottom_descs_[i], bottom_data + bottom_offset_ * g,
          fwd_filter_desc_, weight + this->weight_offset_ * g,
          fwd_conv_descs_[i], fwd_algo_[i], pspace, gsize,
          cudnn::dataType<Ftype>::zero, fwd_top_descs_[i], top_data + top_offset_ * g));
    }
    // NOLINT_NEXT_LINE(whitespace/operators)
    for (int ig = 0; ig < ws_groups(); ++ig) {
      CUDA_CHECK(cudaStreamSynchronize(Caffe::thread_stream(ig)));  // line 55
```

frederickk · January 29, 2019, 5:16am

Some additional information:

The crash happens while both GStreamer plugins are executing inside Net::Forward.

I run the pipeline with gst-launch.

AastaLLL · January 29, 2019, 5:58am

Hi,

CUDA error 4 is cudaErrorLaunchFailure:
An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. All existing device memory allocations are invalid. To continue using CUDA, the process must be terminated and relaunched.

Could you share more information about the ‘plugin’?
Is it a custom layer in your Caffe framework, or are you using TensorRT?

Thanks.

frederickk · January 29, 2019, 6:13am

NVCaffe is used in a simple way: load the model and run inference.

They are part of two separate GStreamer/DeepStream plugins (elements).

I run them in a GStreamer pipeline using gst-launch. The first plugin receives video frames from uridecodebin and sends them to nvtracker, which passes them on to the second plugin. The sink is a fakesink.

Each plugin is a separate *.so shared library loaded by gst-launch.

frederickk · January 29, 2019, 6:15am

To clarify, I use the term “plugin” in the GStreamer/DeepStream sense.

frederickk · January 29, 2019, 6:20am

I also ran simple experiments to rule out memory corruption: each plugin just repeatedly feeds the same cv::Mat to Net::Forward.

Same error.

Both plugins seem to be executing code in cudnn_conv_layer.cu at the same time.

frederickk · January 30, 2019, 10:47am

I am also getting this:

W0130 10:43:05.240928 23861 gpu_memory.cpp:129] Lazily initializing GPU Memory Manager Scope on device 0. Note: it’s recommended to do this explicitly in your main() function.

I'm not sure whether it is related to the crash, but how do I initialize the “GPU Memory Manager Scope” explicitly?

frederickk · January 31, 2019, 12:59am

I did more digging and found that test_mem_req_all_grps_ is a static member of CuDNNConvolutionLayer.

So my question is: is NVCaffe's cudnn_conv_layer (.cu, .hpp, .cpp) safe to use from two separate Net objects running inference in separate threads?

Also, is there a better forum for this question?

frederickk · January 31, 2019, 1:02am

I checked the mainline version of Caffe. Its CuDNNConvolutionLayer has no static data members.

frederickk · January 31, 2019, 7:00am

Discussion continued here
https://github.com/NVIDIA/caffe/issues/555
