Is the speed of a program running on multiple SMs similar to its speed on a single SM?

Julier · September 25, 2021, 3:25am

Hello, everyone.
According to my understanding of SMs, they all run in parallel and independently of each other. So when I moved my program from a single SM to multiple SMs, I expected it to run about 82x faster than before (my GPU is a 3090, which has 82 SMs). But I see no difference between the two versions, and I don't know why. Is my understanding wrong?

Robert_Crovella · September 25, 2021, 3:42pm

Please do not post pictures of code. Post the code as text, and use the available text formatting tools in the edit box.

Perhaps you are confusing SMs with streams. They are essentially unrelated.

I see no evidence in the code you have posted that in one case you are using a single SM, and in another case you are using multiple SMs.

A single SM launch might look like this:

cudaHashRandom <<<1, THREAD_3090, 0, stream[i]>>>(...);

A multiple SM launch might look like:

cudaHashRandom <<<160, THREAD_3090, 0, stream[i]>>>(...);
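To see the difference between those two launch shapes, something like the following rough sketch could be used to time them. The kernel `busyKernel`, the helper `timeLaunch`, the block size of 256, and the iteration count are all made up for illustration, since the real `cudaHashRandom` and `THREAD_3090` were not posted as text. Each thread does a fixed amount of work, so if the many-block launch takes roughly as long as the one-block launch, it has processed that many times more threads in the same time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for cudaHashRandom: every thread runs the same
// fixed-length arithmetic loop, so total work scales with the grid size.
__global__ void busyKernel(unsigned int *out, int itersPerThread)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int x = idx;
    for (int i = 0; i < itersPerThread; ++i)
        x = x * 1664525u + 1013904223u;  // LCG step, only there to keep the SM busy
    out[idx] = x;                        // store the result so the loop is not optimized away
}

// Time a single launch of busyKernel (default stream) in milliseconds.
static float timeLaunch(int blocks, int threads, unsigned int *d_out, int iters)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    busyKernel<<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int threads = 256;                             // assumed block size
    const int manyBlocks = 2 * prop.multiProcessorCount; // enough blocks to reach every SM
    const int iters = 1 << 20;                           // assumed per-thread work

    unsigned int *d_out = nullptr;
    cudaMalloc(&d_out, (size_t)manyBlocks * threads * sizeof(unsigned int));

    timeLaunch(1, threads, d_out, iters);                // warm-up, result discarded
    float msOne  = timeLaunch(1, threads, d_out, iters);
    float msMany = timeLaunch(manyBlocks, threads, d_out, iters);

    printf("SMs on this device: %d\n", prop.multiProcessorCount);
    printf("1 block:    %.3f ms\n", msOne);
    printf("%d blocks: %.3f ms (%dx the threads)\n", manyBlocks, msMany, manyBlocks);

    cudaFree(d_out);
    return 0;
}
```

Error checking is left out to keep the sketch short. A real test should check the return value of every runtime call.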

You haven’t shown any code that sets BLOCK_NUM and even if you did, I see no conditional behavior inside the while loop that is using alternate values.

If your question is actually about stream usage, that is unclear. But I will repeat, there is no connection between stream usage and SM usage. A single kernel launch running in a single stream can easily use all the SMs in your device.
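One way to check this on your own device is to have each block record the SM it ran on. The sketch below is only an illustration: `recordSm`, `currentSm`, and `countDistinct` are hypothetical names, and it reads the PTX special register `%smid` through inline assembly. One kernel is launched into one stream, and the host then counts how many distinct SM ids show up. With more blocks than SMs, the count would typically match the number of SMs on the device, although the block scheduler does not guarantee any particular placement.

```cuda
#include <cstdio>
#include <set>
#include <vector>
#include <cuda_runtime.h>

// Read the id of the SM the calling thread is currently running on.
__device__ unsigned int currentSm()
{
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// One thread per block writes down which SM the block landed on.
__global__ void recordSm(unsigned int *smOfBlock)
{
    if (threadIdx.x == 0)
        smOfBlock[blockIdx.x] = currentSm();
}

// Host helper: number of distinct values in the list.
static int countDistinct(const std::vector<unsigned int> &ids)
{
    return (int)std::set<unsigned int>(ids.begin(), ids.end()).size();
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    const int blocks = 4 * prop.multiProcessorCount;  // more blocks than SMs

    unsigned int *d_sm = nullptr;
    cudaMalloc(&d_sm, blocks * sizeof(unsigned int));

    // A single stream and a single kernel launch.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    recordSm<<<blocks, 128, 0, stream>>>(d_sm);
    cudaStreamSynchronize(stream);

    std::vector<unsigned int> h_sm(blocks);
    cudaMemcpy(h_sm.data(), d_sm, blocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    printf("SMs on device: %d\n", prop.multiProcessorCount);
    printf("distinct SMs used by one launch in one stream: %d\n",
           countDistinct(h_sm));

    cudaStreamDestroy(stream);
    cudaFree(d_sm);
    return 0;
}
```

Changing `blocks` to 1 in the same program should report a single SM, which is the case the grid size controls. The stream argument plays no part in it.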
