Best practices for CPU vs GPU programming in C?

So, I’ve been able to think of a number of naïve ways to accomplish this, but does anyone know of any resources/documents that apply to CPU vs GPU programming in C?

By this, I mean let’s say my program has a matmul function; however, I don’t know, nor can I presume, that the person using it has a CUDA-enabled device.

I know how to perform the needed detection as the program starts, and if a GPU is found I could set some necessary flags, then in the matmul function have an if/else or switch on how the operation is performed (i.e. I want to avoid having separate matmulCPU and matmulGPU functions, and of course the detection part would only be done once at the start).
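The one-time-detection-plus-flag approach described above can be sketched roughly as follows. This is a minimal illustration, not an official pattern; the `USE_CUDA` guard and the `matmul_gpu_impl` placeholder are assumptions, added so the same file also builds and runs on machines without the CUDA toolkit:

```c
#include <stdio.h>
#include <stdbool.h>

#ifdef USE_CUDA
#include <cuda_runtime.h>
#endif

/* Set once at program start; read-only afterwards. */
static bool g_have_gpu = false;

void detect_gpu(void)
{
#ifdef USE_CUDA
    int count = 0;
    /* cudaGetDeviceCount fails cleanly when no driver/device is present. */
    if (cudaGetDeviceCount(&count) == cudaSuccess && count > 0)
        g_have_gpu = true;
#endif
}

/* Single public entry point; the CPU/GPU decision is internal. */
void matmul(const float *a, const float *b, float *c, int n)
{
    if (g_have_gpu) {
        /* matmul_gpu_impl(a, b, c, n);  hypothetical: copy to device,
           launch kernel, copy result back */
    } else {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                float s = 0.0f;
                for (int k = 0; k < n; ++k)
                    s += a[i * n + k] * b[k * n + j];
                c[i * n + j] = s;
            }
    }
}
```

Built without `-DUSE_CUDA` this degenerates to the plain CPU path, which is exactly the behavior wanted when the user has no CUDA device.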

I just didn’t know whether there was a more elegant/preferred way of handling this type of situation?

This may be of interest, including the links. I’m not sure exactly what constitutes “elegant/preferred”, but the stdpar and OpenACC methods have various benefits: concise expression, portability, CPU or GPU operation, mostly unified code paths, etc.

@Robert_Crovella thanks, I will have a look.

By more ‘elegant/preferred’, I just meant that the easiest way would be to create global variables to hold the GPU configuration (and whether a GPU is present or absent); but these days using global vars is mostly seen as a bit ‘verboten’.

At the same time, passing configuration pointers to many disparate functions that will not use them could start to seem tedious;

That is just why I wondered whether there was some other way, or what is seen as best practice.
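A common middle ground between globals and threading the configuration through every call is a context struct created once at startup and passed as the first argument to the compute routines. A minimal sketch, where `compute_ctx`, `ctx_init`, and `scale` are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical context: the GPU configuration lives in one object that is
   created once at startup and passed explicitly, instead of in globals. */
typedef struct {
    bool have_gpu;   /* result of the one-time detection */
    int  device_id;  /* -1 when no CUDA device was found */
} compute_ctx;

compute_ctx ctx_init(void)
{
    compute_ctx ctx = { false, -1 };
    /* ...run the CUDA detection here and fill in the fields... */
    return ctx;
}

/* Compute routines take the context as their first argument and branch on
   it internally, so callers still see a single entry point per operation. */
void scale(const compute_ctx *ctx, float *data, int n)
{
    if (ctx->have_gpu) {
        /* GPU path */
    } else {
        for (int i = 0; i < n; ++i)
            data[i] *= 2.0f;
    }
}
```

Only the functions that actually dispatch need the context; pure helpers stay untouched, which limits the tedium of passing it everywhere.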

Please do not forget that for many kernels copying from host (CPU) memory to device (GPU) memory and back is the limiting factor, if done for each operation.

So think about a way to use classes (if C++) or e.g. function pointers in structs (C) to abstract the location of your data.
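The function-pointers-in-structs idea can be sketched as a small vtable bound at creation time, so callers never branch on where the data lives. All names here (`buffer`, `buffer_ops`, `buffer_create_cpu`) are hypothetical, and only the CPU implementation is filled in; a GPU variant would populate the same table with `cudaMemcpy`/`cudaFree` wrappers:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct buffer buffer;

/* Operation table: one instance per backend (CPU here, GPU elsewhere). */
typedef struct {
    void (*upload)(buffer *dst, const float *src, size_t n);
    void (*download)(const buffer *src, float *dst, size_t n);
    void (*release)(buffer *buf);
} buffer_ops;

struct buffer {
    const buffer_ops *ops;  /* bound once at creation */
    float *data;            /* host pointer or device pointer */
    size_t n;
};

/* --- CPU implementation --- */
static void cpu_upload(buffer *dst, const float *src, size_t n)
{
    memcpy(dst->data, src, n * sizeof(float));
}
static void cpu_download(const buffer *src, float *dst, size_t n)
{
    memcpy(dst, src->data, n * sizeof(float));
}
static void cpu_release(buffer *buf) { free(buf->data); }

static const buffer_ops cpu_ops = { cpu_upload, cpu_download, cpu_release };

buffer buffer_create_cpu(size_t n)
{
    /* Error handling (NULL from malloc) omitted for brevity. */
    buffer b = { &cpu_ops, malloc(n * sizeof(float)), n };
    return b;
}
```

Because the kernels only ever see a `buffer`, data can stay resident on the device across many operations, avoiding the per-operation host/device copies mentioned above.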

That is perhaps even more critical for seamless operation than how the kernels are called. If you forget one kernel call, then your program may merely be slow when the data normally resides on the CPU; but if you move the data to the GPU, you could get undefined behavior when accessing it.

Managed memory is a possible solution, but it is slow compared to more direct implementations.
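For the managed-memory route, an allocation wrapper can hide the choice. A hedged sketch: the `USE_CUDA` guard and the `shared_alloc`/`shared_free` names are assumptions, with plain `malloc` as the CPU-only fallback; `cudaMallocManaged` returns memory that both host and device can access, with pages migrated on demand (hence the convenience, and the performance cost relative to explicit copies):

```c
#include <stdlib.h>

#ifdef USE_CUDA
#include <cuda_runtime.h>
#endif

/* Allocate memory reachable from both CPU and GPU where available;
   plain malloc otherwise, so a CPU-only build still works. */
float *shared_alloc(size_t n)
{
#ifdef USE_CUDA
    float *p = NULL;
    if (cudaMallocManaged((void **)&p, n * sizeof(float),
                          cudaMemAttachGlobal) == cudaSuccess)
        return p;   /* driver migrates pages on demand */
    return NULL;
#else
    return malloc(n * sizeof(float));
#endif
}

void shared_free(float *p)
{
#ifdef USE_CUDA
    cudaFree(p);
#else
    free(p);
#endif
}
```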

@Curefab thank you for your perspective and ‘pointers’.

In this case it concerns a small community project of which I am not the lead, and C has been selected as the language (honestly I’d much rather have a ‘real’ full OOP environment), so I was just trying to work out/through my options.

Best,
-A

There are some C projects with better abstraction than the average C++ program. But achieving it in C is far less nice.

I would distinguish between the abstraction of the operations (device functions running on the GPU) and the abstraction of calling those functions (preparatory work on the CPU and perhaps the global function on the GPU itself).

For the device functions I would try to share as much code 1:1 as possible.
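Sharing device-function code 1:1 is commonly done by marking the function for both compilers. A small sketch, assuming the `HD` macro name; under `nvcc` the `__CUDACC__` branch adds the `__host__ __device__` qualifiers, while a plain C compiler sees an ordinary function:

```c
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

/* Written once; callable from CPU code and from inside GPU kernels. */
HD float dot3(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```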

For the dispatch code (preparatory work and global functions) I would try to make it as similar as possible between the functions that are ultimately called. Either use conditionals, macros, or code generation, or write it manually.
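As one illustration of the macro option, a single macro can stamp out a uniform dispatch wrapper for each operation, so every public entry point has an identical shape. The `DEFINE_DISPATCH` name and the `negate` backends are hypothetical, and the GPU backend here just falls through to the CPU path in lieu of a real kernel launch:

```c
#include <stdbool.h>

/* Generates "void name(bool on_gpu, float *, int)" dispatching to
   name##_cpu / name##_gpu, keeping all entry points structurally equal. */
#define DEFINE_DISPATCH(name)                        \
    void name(bool on_gpu, float *data, int n)       \
    {                                                \
        if (on_gpu)                                  \
            name##_gpu(data, n);                     \
        else                                         \
            name##_cpu(data, n);                     \
    }

/* Example backends for one operation. */
static void negate_cpu(float *data, int n)
{
    for (int i = 0; i < n; ++i)
        data[i] = -data[i];
}
static void negate_gpu(float *data, int n)
{
    /* would launch a kernel; falls back to CPU in this sketch */
    negate_cpu(data, n);
}

DEFINE_DISPATCH(negate)
```

Each new operation then needs only its two backends plus one `DEFINE_DISPATCH(...)` line, which keeps the dispatch layer uniform without hand-writing every wrapper.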