How can I print a number pyramid in CUDA?

user366312 · August 14, 2023, 4:27pm

I wanted to print the following number pyramid in CUDA:

    1
   2 2
  3 3 3
 4 4 4 4
5 5 5 5 5

The following listing is the CPU version, which works properly.

Host-only listing:

#include <stdio.h>

#define N 5 // Number of rows in the pyramid

void printPyramid()
{
    for (int tid = 0; tid < N; tid++)
    {
        // Calculate the number of spaces before the current row
        int numSpaces = N - tid - 1;

        // Calculate the starting number for the current row
        int startNum = tid + 1;

        // Print the spaces before the numbers
        for (int i = 0; i < numSpaces; i++)
        {
            printf(" ");
        }

        // Print the numbers for the current row
        for (int i = 0; i < startNum; i++)
        {
            printf("%d ", startNum);
        }

        printf("\n");
    }
}

int main()
{
    printPyramid();
    
    return 0;
}

However, my CUDA listing doesn’t work as expected and doesn’t produce the proper output:

user_name@192:~/CUDA_$ ./exe
          3 2 3 1 2 3 5 4 5 4 5 4 5 4 5




user_name@192:~/CUDA_$

CUDA listing:

#include <stdio.h>

#define N 5 // Number of rows in the pyramid

__global__ void printPyramid()
{
    int tid = threadIdx.x;
    
    // Calculate the number of spaces before the current row
    int numSpaces = N - tid - 1;
    
    // Calculate the starting number for the current row
    int startNum = tid + 1;
    
    // Print the spaces before the numbers
    for (int i = 0; i < numSpaces; i++)
    {
        printf(" ");
    }
    
    // Print the numbers for the current row
    for (int i = 0; i < startNum; i++)
    {
        printf("%d ", startNum);
    }
    
    printf("\n");
}

int main()
{
    // Launch the CUDA kernel with 1 block and N threads per block
    printPyramid<<<1, N>>>();
    
    // Wait for the kernel to finish
    cudaDeviceSynchronize();
    
    // Check for any errors during the kernel launch
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        return 1;
    }
    
    return 0;
}

How can I fix this listing?

Robert_Crovella · August 14, 2023, 4:49pm

CUDA doesn’t impose any particular thread execution ordering (unless you impose it). Therefore we would have to approach such a problem carefully to write “proper” code. We’d like to avoid using extensive serialization of threads in a warp; if we had extensive serialization there (which is certainly one way to solve a number of issues) we might just as well write a single-threaded realization.
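For comparison, here is a rough sketch of what the heavily serialized variant could look like. The names `printPyramidSerialized` and `leadingSpaces` are invented for this illustration and do not come from the thread. Each thread owns one row, but a barrier per row means only one thread prints at a time, so the parallelism buys nothing. Like any in-kernel `printf` approach, it assumes the output shows up in the order the calls were issued across the barriers.

```cuda
#include <cstdio>

const int N = 5; // number of rows; one thread per row, so N must be 1024 or less

// Hypothetical helper: spaces printed before 0-based row `row` of an n-row pyramid.
__host__ __device__ inline int leadingSpaces(int row, int n)
{
    return n - row - 1;
}

__global__ void printPyramidSerialized()
{
    for (int row = 0; row < N; row++) {
        // Only the thread that owns this row prints; all others wait at the barrier.
        if (threadIdx.x == row) {
            for (int i = 0; i < leadingSpaces(row, N); i++) printf(" ");
            for (int i = 0; i <= row; i++) printf("%d ", row + 1);
            printf("\n");
        }
        __syncthreads(); // reached by every thread on every iteration
    }
}

int main()
{
    printPyramidSerialized<<<1, N>>>();
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors; also flushes printf
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This is essentially the CPU loop with the loop index replaced by a thread index, which is why a single-threaded version would do just as well.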

Therefore, I suggest it will be important to break the print-out into phases. We will need to identify chunks of work that are independent, that is, that don’t depend on ordering, and then impose just enough ordering to handle the rest.

Looking at your desired diagram, I note the following:

We must have ordering from line to line. We want the line of 2 to print after the line of 1, for example.
We have the issue of the variability of the number of leading spaces on the line.
For the actual non-whitespace print-out, we can see some opportunity for parallel/independent work. For example, each thread that is responsible for printing could print its number plus a single space. That level of print-out is independent of the ordering among threads.

Based on those ideas I come up with this:

# cat t9.cu
#include <cstdio>
const int N = 5; // must be 1024 or less

__global__ void k(){

  for (int i = 1; i < N+1; i++) {
    if ((threadIdx.x) > (i-1)) printf(" "); // print leading spaces "in parallel"
    __syncthreads();                             // before printing any digits
    if (threadIdx.x < i) printf("%d ", i);  // print the digits for each line "in parallel"
    __syncthreads();                             // before terminating the line
    if (!threadIdx.x) printf("\n");         // terminate each line (using only 1 thread)
    __syncthreads();                             // before printing the next line
  }
}

int main(){

  k<<<1,N>>>();
  cudaDeviceSynchronize();
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) printf("%s\n", cudaGetErrorString(err));
}
# nvcc -o t9 t9.cu
# ./t9
    1
   2 2
  3 3 3
 4 4 4 4
5 5 5 5 5
#

The formatting choices might need to be adjusted for N > 9.
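One possible adjustment, sketched under the assumption that N stays at 999 or below: print every number in a fixed-width cell (`"%3d "`, four characters) and let each non-printing thread contribute half a cell (two spaces) of padding, so the rows stay centered. The names `printWidePyramid`, `leadingSpaces`, and `HALF_CELL` are made up for this sketch. The phase structure is the same as in the kernel above.

```cuda
#include <cstdio>

const int N = 12;        // number of rows; this sketch assumes N <= 999
const int HALF_CELL = 2; // each cell is "%3d " = 4 characters, so half a cell is 2 spaces

// Hypothetical helper: total leading spaces before 1-based row `row`.
__host__ __device__ inline int leadingSpaces(int row, int n)
{
    return (n - row) * HALF_CELL;
}

__global__ void printWidePyramid()
{
    for (int row = 1; row <= N; row++) {
        // The N - row threads with no number to print each emit half a cell
        // of padding, which adds up to leadingSpaces(row, N) characters.
        if (threadIdx.x >= row) printf("  ");
        __syncthreads();                     // before printing any numbers
        if (threadIdx.x < row) printf("%3d ", row);
        __syncthreads();                     // before terminating the line
        if (threadIdx.x == 0) printf("\n");
        __syncthreads();                     // before printing the next line
    }
}

int main()
{
    printWidePyramid<<<1, N>>>();
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Each number is emitted by a single `printf` call so that its padding and digits cannot be split apart by output from other threads.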

user366312 · August 14, 2023, 6:00pm

If CUDA doesn’t guarantee thread ordering, why does the following listing print Fibonacci numbers in perfect order?

#include <stdio.h>
#include <stdlib.h> // malloc, free

__global__ void fibonacci(int *fib, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    
    if (tid <= 1) {
        fib[tid] = tid;
        return;
    }
    
    int previous = 0;
    int current = 1;
    int next;
    
    for (int i = 2; i <= tid; i++) {
        next = previous + current;
        previous = current;
        current = next;
    }
    
    fib[tid] = current;
}

int main() {
    int n;
    
    printf("Enter the number of Fibonacci numbers to generate: ");
    scanf("%d", &n);
    
    int *d_fib;
    int *h_fib = (int *)malloc(n * sizeof(int));
    
    cudaMalloc(&d_fib, n * sizeof(int));
    
    fibonacci<<<1, n>>>(d_fib, n);
    
    cudaMemcpy(h_fib, d_fib, n * sizeof(int), cudaMemcpyDeviceToHost);
    
    printf("Fibonacci Series: ");
    for (int i = 0; i < n; i++) {
        printf("%d ", h_fib[i]);
    }
    
    free(h_fib);
    cudaFree(d_fib);
    
    return 0;
}

Robert_Crovella · August 14, 2023, 6:03pm

Because the kernel you have now shown has entirely independent work between threads. The threads can execute in any order, and the results will be calculated properly.

The same statement is not true for the mass of printout in your pyramid example. The various pieces of printout must be done in a particular order.

The “thread ordering” I referred to previously might be clearer if I say “thread execution ordering”: CUDA does not guarantee the order in which threads will execute.
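The same idea that makes the Fibonacci kernel work can be applied to the pyramid. The following is a minimal sketch, not something posted in the thread: each thread writes its own row into a separate slot of a device buffer, which is independent work, and the host then prints the rows in index order. The names `buildRow`, `buildPyramid`, and `ROW_LEN` are invented for the example, and it assumes single-digit values (N of 9 or less).

```cuda
#include <cstdio>

const int N = 5;               // number of rows; this sketch assumes N <= 9
const int ROW_LEN = 2 * N + 1; // longest row is 2*N characters plus the terminating '\0'

// Hypothetical helper: write 0-based row r of an n-row pyramid into `row`.
__host__ __device__ inline void buildRow(char *row, int r, int n)
{
    int pos = 0;
    for (int i = 0; i < n - r - 1; i++) row[pos++] = ' ';  // leading spaces
    for (int i = 0; i <= r; i++) {                         // the digit r+1, repeated r+1 times
        row[pos++] = (char)('0' + r + 1);
        row[pos++] = ' ';
    }
    row[pos] = '\0';
}

// Each thread fills only its own slot, so execution order does not matter.
__global__ void buildPyramid(char *rows, int n, int rowLen)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) buildRow(rows + tid * rowLen, tid, n);
}

int main()
{
    char h_rows[N * ROW_LEN];
    char *d_rows = nullptr;

    cudaError_t err = cudaMalloc(&d_rows, sizeof(h_rows));
    if (err == cudaSuccess) {
        buildPyramid<<<1, N>>>(d_rows, N, ROW_LEN);
        err = cudaGetLastError();
    }
    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_rows, d_rows, sizeof(h_rows), cudaMemcpyDeviceToHost);
    cudaFree(d_rows);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ordering is imposed here, on the host, by the loop index.
    for (int r = 0; r < N; r++) printf("%s\n", h_rows + r * ROW_LEN);
    return 0;
}
```

The trade-off is that the device does no printing at all. The ordering comes from the buffer layout and the host loop rather than from barriers inside the kernel.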

