Multiple streams don't speed up processing

eitan · February 22, 2017, 12:26pm

Hi All,

I’m trying to process video frames with multiple CUDA streams,
but I see only a small improvement on my cloud GPU:

vc@atropos:/$ lspci | grep -i nvidia
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)

Here is the code. Can someone run it and tell me what times they get?

I’m getting:
processing video_small_01.mp4 streams 2
1M resize on video_small_01.mp4 finished on 25.7128
10k resize on video_small_01.mp4 finished on 0.002545

===================

Compilation and run with:

#export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:/usr/local/boost/lib"
#export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/boost/lib"

g++ test_decoding.cpp -o test_decoding `pkg-config --cflags --libs opencv` -I/usr/local/cuda/include/ -I/usr/local/boost/include -L/usr/local/cuda/lib64/ -L/usr/local/boost/lib/ -lboost_thread -lboost_system

#./test_decoding video_small_01.mp4 video_small_02.mp4 video_small_03.mp4 video_small_04.mp4
./test_decoding video_small_01.mp4 2

NOTE: Replace video_small_01.mp4 with a real video file.

=================== test_decoding.cpp ===================

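//
// g++ test_CV.cpp -o test_CV `pkg-config --cflags --libs opencv`
// -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/
//

#include <boost/thread.hpp>
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <opencv/ml.h>
#include <opencv/cxcore.h>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>
#include "opencv2/opencv.hpp"
#include "opencv2/cudaimgproc.hpp"
#include "opencv2/cudawarping.hpp"
#include "opencv2/cudaobjdetect.hpp"
#include "opencv2/cudafilters.hpp"
#include "opencv2/cudacodec.hpp"
#include "opencv2/cudaarithm.hpp"

// Pool of CUDA streams; the loops below use s[1] .. s[sn].
cv::cuda::Stream s[100];
//cv::Ptr<cv::cudacodec::VideoReader> GpuCap[4]; //= cv::cudacodec::createVideoReader();

void processfilter(std::string fname, int sn)
{
    // Decoded frames and the resize outputs all live in GPU memory.
    std::vector<cv::cuda::GpuMat> frames(1000);
    std::vector<cv::cuda::GpuMat> outresize(2 * sn);
    cv::Size ksize(256, 256);
    cv::Ptr<cv::cudacodec::VideoReader> GpuCap = cv::cudacodec::createVideoReader(fname);
    std::clock_t full_time;

    // Decode 1000 frames up front so that only the resize calls are timed.
    for (int i = 0; i < 1000; i++) {
        GpuCap->nextFrame(frames[i]);
    }

    // 1000 x 1000 resizes, spread round-robin over the sn streams.
    // Note: the timer is stopped without waiting for the streams to finish.
    full_time = std::clock();
    for (int j = 0; j < 1000; j++) {
        for (int i = 0; i < 1000; i++) {
            cv::cuda::resize(frames[i], outresize[i % sn], ksize, 0, 0, cv::INTER_LINEAR, s[i % sn + 1]);
        }
    }
    std::cout << "1M resize on " << fname << " finished on " << float(std::clock() - full_time) / CLOCKS_PER_SEC << std::endl;

    // Note: this loop issues 10 x 10 = 100 resizes, although the label says 10k.
    full_time = std::clock();
    for (int j = 0; j < 10; j++) {
        for (int i = 0; i < 10; i++) {
            cv::cuda::resize(frames[i], outresize[i % sn], ksize, 0, 0, cv::INTER_LINEAR, s[i % sn + 1]);
        }
    }
    std::cout << "10k resize on " << fname << " finished on " << float(std::clock() - full_time) / CLOCKS_PER_SEC << std::endl;
}

int main( int argc, char** argv )
{
    if (argc < 3) {
        std::cerr << "usage: " << argv[0] << " <video file> <number of streams>" << std::endl;
        return 1;
    }
    int sn = std::atoi(argv[2]);
    // s[] has 100 entries and the loops index s[1] .. s[sn], so sn must be in 1..99.
    if (sn < 1 || sn > 99) {
        std::cerr << "number of streams must be between 1 and 99" << std::endl;
        return 1;
    }
    std::cout << "processing " << argv[1] << " streams " << sn << std::endl;
    processfilter(argv[1], sn);
    return 0;
}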
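
A side note on the timing in the code above, offered as a sketch rather than a diagnosis: clock() reports processor time used by the process, not elapsed wall-clock time, and a cv::cuda::resize call that is given a stream may return before the GPU work has finished unless cv::cuda::Stream::waitForCompletion() is called before the timer stops. The stand-alone snippet below only illustrates the first point on a POSIX system. The Timing struct and time_blocked_wait function are hypothetical names made up for this illustration, and the sleep merely stands in for a host thread that is blocked while work happens elsewhere.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: two measurements of the same interval, in seconds.
struct Timing {
    double cpu_seconds;   // from std::clock(): processor time used by the process
    double wall_seconds;  // from steady_clock: elapsed real time
};

// Times an interval during which the host thread is blocked (sleeping).
// The sleep stands in for a host that waits on work done elsewhere.
Timing time_blocked_wait(int wait_ms) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();

    std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));

    const auto w1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();

    Timing t;
    t.cpu_seconds = static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(w1 - w0).count();
    return t;
}
```

Calling time_blocked_wait(50) should give a wall_seconds of at least about 0.05 while cpu_seconds stays far smaller, so the two clocks can disagree whenever the host thread is blocked rather than busy.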

gu_xiangtao · February 24, 2017, 1:04am

Maybe you can refer to this: https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-cuda-7-streams-simplify-concurrency/

eitan · February 27, 2017, 2:10pm

Hi Gu Xiangtao, thanks for the comment.
Are you suggesting using --default-stream per-thread?
How can this be used with OpenCV?
As you can see, I’m working with cv::cudacodec::createVideoReader and cv::cuda::resize.

Thanks, Eitan

BulatZiganshin · March 1, 2017, 1:34pm

Multiple streams don’t speed up processing

GPU kernels already run on all GPU cores simultaneously, so streams can improve speed only in specific situations, such as when there is a lot of device-CPU copying that can overlap with computation.
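
A rough way to see this is a back-of-the-envelope model, sketched below under stated assumptions: the work is split into n chunks, each chunk needs one host-to-device copy followed by one kernel, every kernel occupies the whole GPU, and there is a single copy engine. The function names serial_time and overlapped_time are hypothetical and the numbers are not measurements.

```cpp
#include <algorithm>

// Simplified timing model (an assumption for illustration, not a measurement):
// n chunks, each needing `copy` seconds of transfer followed by `kernel`
// seconds of compute that fills the whole GPU.

// One stream: every copy and every kernel runs back to back.
double serial_time(int n, double copy, double kernel) {
    return n * (copy + kernel);
}

// Several streams, one copy engine: the copy of chunk i+1 can overlap the
// kernel of chunk i, but kernels still run one after another because each
// one already occupies all cores.
double overlapped_time(int n, double copy, double kernel) {
    if (n <= 0) return 0.0;
    return copy + (n - 1) * std::max(copy, kernel) + kernel;
}
```

For 10 chunks with copy = 1 and kernel = 2 the model gives 30 for one stream and 21 with overlap. With copy = 0, which is the situation in the resize loop above where all frames are already in GPU memory, both functions return 20, so in this model extra streams gain nothing.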

S1NH · June 17, 2017, 1:44pm

I have the same problem. How can --default-stream per-thread be used with OpenCV?

Thanks
