GPU Pro Tip: CUDA 7 Streams Simplify Concurrency

anon95180265 · June 2, 2015, 1:41pm

Could it be this issue? https://github.com/thrust/t... It's fixed in the Thrust trunk, but not in the version included in CUDA 7. You could try pulling the latest from GitHub to compare.

anon84996908 · June 3, 2015, 10:53am

Indeed that fixed it. Thanks!

anon46359231 · July 13, 2015, 8:18am

How can I enable this feature in a Visual Studio 2012 solution with CUDA Runtime 7.0? I define the CUDA_API_PER_THREAD_DEFAULT_STREAM preprocessor macro before I include the CUDA headers in my kernel.cu file, but it does not work. I want to know why.

anon95180265 · July 13, 2015, 11:46pm

I don't have a Windows machine in front of me, but I know that Visual Studio lets you add custom command-line flags, so you could set `--default-stream per-thread` on the command line. But the preprocessor macro should work too -- without seeing your code/project, I can't tell for sure why it's not working for you. Did you set it in one of the files, or in the project settings? I would set it in the environment box of the CUDA C++ project settings dialog.

anon46359231 · July 14, 2015, 1:33am

I set it in the project: Project -> Properties -> CUDA C/C++ -> Host -> Preprocessor Definitions -> CUDA_API_PER_THREAD_DEFAULT_STREAM

The code is the same as yours, and it is in a .cu file.

anon23045104 · July 30, 2015, 7:10am

I wonder whether this method works on Windows? I have the same question as Wei: I can't enable this feature in VS2010.

anon95180265 · July 31, 2015, 2:05am

It should work -- I realized there was a typo in one spot where I had `--default-stream-per-thread` when it should be `--default-stream per-thread` (note space). Can you make sure you set the flag correctly?
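For anyone comparing setups, here is a minimal sketch of the kind of test program this discussion is about. The file name `ptds_demo.cu`, the kernel `busy_kernel`, and the helper `worker` are made up for illustration. One detail that may matter for the macro approach: as far as I know, nvcc implicitly includes `cuda_runtime.h` at the top of every `.cu` file, so a `#define CUDA_API_PER_THREAD_DEFAULT_STREAM` written inside the `.cu` file comes too late, and the option has to be passed on the compiler command line instead.

```cuda
// Build (assumed file name ptds_demo.cu):
//   nvcc --default-stream per-thread -std=c++11 ptds_demo.cu -o ptds_demo
// or, equivalently, define the macro on the command line:
//   nvcc -DCUDA_API_PER_THREAD_DEFAULT_STREAM -std=c++11 ptds_demo.cu -o ptds_demo
// On Linux you may also need: -Xcompiler -pthread
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel that just keeps the GPU busy for a while.
__global__ void busy_kernel(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = sqrtf(powf(3.14159f, (float)(i % 32)));
}

// Each host thread launches into "the default stream". With per-thread
// default streams enabled, that is a separate stream for every host thread.
void worker(int n)
{
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return;

    busy_kernel<<<(n + 255) / 256, 256>>>(d, n);  // no explicit stream argument

    // Wait only for this thread's default stream.
    cudaStreamSynchronize(cudaStreamPerThread);
    cudaFree(d);
}

int main()
{
    const int n = 1 << 20;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) threads.emplace_back(worker, n);
    for (auto &th : threads) th.join();

    cudaDeviceReset();  // flush profiler data before exit
    return 0;
}
```

If the flag is taking effect, the profiler timeline should show the four kernels in four different streams rather than all in the legacy default stream.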

anon1959062 · September 10, 2015, 11:34pm

Hello, I'm not sure if this thread is the right place for this kind of question; if not, please excuse me. I am compiling Caffe with cuDNN for use with DIGITS, and I got this error:

src/caffe/layers/cudnn_conv_layer.cu(56): error: identifier "cudaStreamLegacy" is undefined

src/caffe/layers/cudnn_conv_layer.cu(137): error: identifier "cudaStreamLegacy" is undefined

Could you please give any advice about it?

Thanks a lot.

anon95180265 · September 10, 2015, 11:56pm

Make sure you have CUDA 7.0 or later -- cudaStreamLegacy is new in CUDA 7.
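One way to check which toolkit the compiler is really picking up is a small test file like the sketch below. It assumes (and this is only a guess about the cause) that headers from an older toolkit might be found first on the include path. `CUDART_VERSION` is the version macro from the runtime headers and is 7000 for CUDA 7.0.

```cuda
// Build: nvcc check_version.cu -o check_version   (file name is an assumption)
#include <cstdio>
#include <cuda_runtime.h>

// Fail at compile time if the headers being used predate CUDA 7.0,
// where cudaStreamLegacy and cudaStreamPerThread were introduced.
#if CUDART_VERSION < 7000
#error "These headers are older than CUDA 7.0: cudaStreamLegacy is not available"
#endif

int main()
{
    int runtime_version = 0, driver_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    cudaDriverGetVersion(&driver_version);

    printf("compiled against CUDART_VERSION %d\n", CUDART_VERSION);
    printf("runtime version %d, driver version %d\n",
           runtime_version, driver_version);

    // cudaStreamLegacy is an explicit handle to the legacy default stream.
    cudaError_t err = cudaStreamSynchronize(cudaStreamLegacy);
    printf("cudaStreamSynchronize(cudaStreamLegacy): %s\n",
           cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

If the `#error` fires, the build is using pre-7.0 headers even though a newer toolkit is installed.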

anon1959062 · September 11, 2015, 3:29am

I installed these two versions:

cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
cudnn-7.0-linux-x64-v3.0-prod.tgz

anon51788664 · November 9, 2015, 12:43pm

Hi Mark,

I am working on an SpMV. So far it works fine with matrices that fit into GPU memory. I am about to try a larger matrix that exceeds my GPU's 4 GB of memory.
Can streams be used in that scenario? At the moment I copy the CSR arrays to the GPU, and once the GPU has them it performs the SpMV. I am not sure how that would be possible with streams.

anon41583842 · December 30, 2015, 11:48pm

Hi Mark,

I'm a little late to this, but does this apply to Thrust? That is, with CUDA 7, does running Thrust algorithms on different threads mean those algorithms run on their per-thread default stream?

EDIT: Never mind, I see Omar's question below.

anon93414281 · January 29, 2016, 5:37pm

My CUDA version is 7.5, I am using Visual Studio 2013, and the device is a GTX 850M, but when I run the program, nvvp shows no timeline. What could be the problem?

And when I run nvprof, it says:

==8128== NVPROF is profiling process 8128, command: CudaTest.exe
==8128== Profiling application: CudaTest.exe
==8128== Profiling result:
No kernels were profiled.

==8128== API calls:
No API activities were profiled.

anon47460824 · April 16, 2016, 8:06am

I tested the first code example on my PC. On Linux it works fine. But on Windows with VS2013, the streams don't run concurrently, even with the "--default-stream per-thread" flag. I also compiled the code on the command line with "nvcc --default-stream per-thread ./stream_test.cu -o stream_per-thread", and it doesn't run concurrently either.

anon92410033 · October 10, 2016, 8:44am

Dear Mr. Harris,

I have tried out your instructions above, and then I came up with an idea like this:

void threadExecute(void *input_data, int nx)
{
    cufftComplex *data = (cufftComplex*)input_data;
    cufftHandle plan;
    cufftPlan1d(&plan, nx, CUFFT_C2C, 1);
    // this_stream: the stream assigned to this CPU thread (not defined here; see my question below)
    cufftSetStream(plan, this_stream);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);
    cufftDestroy(plan);
    cudaStreamSynchronize(0);
}

As I understand it, each CPU thread will be given its own stream on the GPU. Is that correct?

If it is correct, how can I get the stream that is assigned to each CPU thread so that I can pass it to cufftSetStream()?

If it is not correct, how can I use the cuFFT API with multiple CPU threads and multiple streams?

Could you please help me with this?

I would very much appreciate it.

anon95180265 · October 12, 2016, 9:28pm

See my answer above regarding the NPP library, which is similar. If you follow the instructions in the post and compile your code to use "-default-stream per-thread", you should be able to pass cudaStreamPerThread to cufftSetStream() so that it uses the default stream in each thread. Does this work?
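A minimal sketch of what that could look like, reusing the `threadExecute` name from the question. The buffer sizes, thread count, and file name are arbitrary assumptions, and most error checking is left out for brevity. Because `cudaStreamPerThread` is an explicit handle, each host thread that passes it to `cufftSetStream()` gets its own stream.

```cuda
// Build (assumed file name fft_threads.cu):
//   nvcc --default-stream per-thread -std=c++11 fft_threads.cu -lcufft -o fft_threads
#include <thread>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Runs one in-place forward FFT in the calling host thread's own stream.
void threadExecute(cufftComplex *data, int nx)
{
    cufftHandle plan;
    if (cufftPlan1d(&plan, nx, CUFFT_C2C, 1) != CUFFT_SUCCESS) return;

    // cudaStreamPerThread refers to the calling thread's default stream.
    cufftSetStream(plan, cudaStreamPerThread);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    // Wait for this thread's work before releasing the plan.
    cudaStreamSynchronize(cudaStreamPerThread);
    cufftDestroy(plan);
}

int main()
{
    const int nx = 1 << 16;      // FFT length (arbitrary)
    const int num_threads = 4;   // number of host threads (arbitrary)

    // One device buffer per host thread so the FFTs are independent.
    std::vector<cufftComplex*> buffers(num_threads, nullptr);
    for (int t = 0; t < num_threads; ++t) {
        cudaMalloc(&buffers[t], nx * sizeof(cufftComplex));
        cudaMemset(buffers[t], 0, nx * sizeof(cufftComplex));
    }

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(threadExecute, buffers[t], nx);
    for (auto &th : threads) th.join();

    for (int t = 0; t < num_threads; ++t) cudaFree(buffers[t]);
    cudaDeviceReset();
    return 0;
}
```

Note that the synchronization here comes before `cufftDestroy()`, unlike in the snippet in the question, so the plan is not released while the transform may still be running.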

anon87304771 · March 22, 2017, 12:58am

Hello Mark,
I understand that, as you mentioned, enabling PTDS for your compilation units doesn't enable it for libraries that are compiled separately.
But I wish to enable PTDS for the Thrust library.
How do I call thrustSetStream()?
Thank you.

anon95180265 · March 22, 2017, 1:58am

To set a stream for a Thrust algorithm you need to use the .on() method on the cuda::par execution policy, like so:

thrust::sort(thrust::cuda::par.on(stream), begin, end, comparator);
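Here is a slightly fuller sketch of the same idea as a complete program. The vector size and the choice of `thrust::greater<int>` as the comparator are arbitrary assumptions for illustration.

```cuda
// Build (assumed file name thrust_stream.cu):
//   nvcc -std=c++11 thrust_stream.cu -o thrust_stream
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/system/cuda/execution_policy.h>

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    bool ok = false;
    {
        const int n = 1 << 20;
        thrust::device_vector<int> keys(n);

        // Fill with 0, 1, ..., n-1, issuing the work on our stream.
        thrust::sequence(thrust::cuda::par.on(stream), keys.begin(), keys.end());

        // Sort in descending order on the same stream.
        thrust::sort(thrust::cuda::par.on(stream),
                     keys.begin(), keys.end(), thrust::greater<int>());

        cudaStreamSynchronize(stream);

        // After a descending sort the largest key should come first.
        ok = (keys.front() == n - 1);
    }   // the device_vector is freed here, before the stream is destroyed

    cudaStreamDestroy(stream);
    return ok ? 0 : 1;
}
```

To target the per-thread default stream instead of a stream you created yourself, the same pattern should work with `thrust::cuda::par.on(cudaStreamPerThread)`.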

anon78416780 · July 10, 2017, 11:10am

Hi Mark,
The example above works perfectly on my Ubuntu system as well.
I could even reproduce Figure 2 shown above using the command
nvcc --default-stream per-thread

However, when I try to do the same thing in the Nsight editor, I do not see the effect of the --default-stream per-thread flag.
I entered the flag in the Command box, as "nvcc --default-stream per-thread", under Project Properties -> Settings -> Tool Settings in the NVCC Compiler section.

I doubt that this is the correct place to put the flag. I also tried putting it under Build Stages -> Preprocessor options (-Xcompiler), but that did not work either.

Could you please tell me where I should put this flag in the Nsight editor?

Thanks and Warm Regards
Amit Gurung

anon95180265 · July 10, 2017, 5:49pm

Hi Amit,

Try adding the flag in Project Properties -> Settings -> Tool Settings -> NVCC Compiler -> Expert Setting:

${COMMAND} --default-stream per-thread ${FLAGS} ${OUTPUT_FLAG} ${OUTPUT_PREFIX} ${OUTPUT} ${INPUTS}
